# Lecture 19 -- Graphs Traversal and Shortest Paths
## Motivating Questions
How can we find the *shortest* path between two vertices in a graph?
## Conceptual Study Questions
- Why does breadth-first traversal always yield a shortest path, but depth-first might not?
- If we had used a `HashSet` to store `toCheck` instead of a `Stack`, `Queue`, or `List`, would we still have gotten shortest paths? Why or why not?
- Imagine that you had a graph and wanted to compute a route from vertex X to vertex Y. One time, you compute the route using depth-first traversal; another time, you compute it using breadth-first traversal. Would both attempts have visited all the same vertices (even in different orders)? Either explain why or draw a graph that shows why not.
- These notes state that returning a route from breadth-first traversal requires an additional data structure for the "came in from" information. Is an additional data structure also needed for depth-first traversal? Why or why not?
# An Alternate Way to Write `canReach`
Our current recursive version of `canReachVisit` works fine, but we could also write this method non-recursively. The key to seeing how it works is to realize that the recursive calls and the for loop set up an order in which the vertices will be explored. Let's look at the program trace again:
```=java=
man.canReachVisited(har, {})
bos.canReachVisited(har, {man})
pvd.canReachVisited(har, {man, bos})
return false
wor.canReachVisited(har, {man, bos, pvd})
har.canReachVisited(har, {man, bos, pvd, wor})
return true
```
When the method gets called from `pvd`, Java remembers that the for-loop still needs to process `wor`. Rather than have Java remember that, let's modify our code to maintain an explicit list of the vertices to process in order. We'll call this list `toCheck`. When we get to the neighbors, we'll add them to `toCheck`. Then, we keep taking vertices off of `toCheck` and process them until there are none left.
Here's the code. It uses a stack to store `toCheck` (recall from lab that a stack is like a list in which we only add elements to the front and remove them from the front):
```=java=
public class CityVertex {
...
public boolean canReachStack(CityVertex dest) {
HashSet<CityVertex> visited = new HashSet<CityVertex>();
Stack<CityVertex> toCheck = new Stack<CityVertex>();
// put the start node into the toCheck Stack
toCheck.push(this);
// as long as we have unvisited nodes to check, keep processing them
while (!toCheck.isEmpty()) {
CityVertex check = toCheck.pop();
visited.add(check);
if (check == dest) {
// found a solution! Cancel the while loop and return
return true;
} else {
// add each unvisited node to the stack
for (CityVertex neighbor : check.toCities) {
if (!visited.contains(neighbor))
toCheck.push(neighbor);
}
}
}
return false;
}
}
```
Note that this program is not recursive. But it parallels what happens in the recursive call by maintaining the list of vertices that need to be processed, then processing them all in turn.
Why would we bother writing this version? Because being explicit about the order in which the vertices are explored turns out to be important in understanding how to process graphs. We'll see why in a moment.
## Finding Shortest Paths
Up until now, we've asked "is there a route". We haven't asked for the route itself. In many cases, however, we do want to know how to get from point A to point B in a graph. Moreover, we often want to get there in as few steps as possible. Thus, we are turning our attention to finding a *shortest path* between two cities.
To work through this problem, let's consider a different graph. Assume we are trying to find the shortest route from A to F. We determine "shortest" by the number of edges in the route.

By inspecting the graph, we see that there are no routes from A to F of length 1. There are, however, two routes from A to F of length 2: A-C-F and A-B-F. Either one of these could be returned as a shortest route from A to F.
### Computing shortest paths
As we have done recently, let's try working out a plan for computing shortest routes. Here's one proposal:
**Plan to Find Shortest Route by Enumerating All Paths**
1. Generate a list of all routes in the graph
2. Find those that go from A to F
3. Sort them in order of length
4. Return one of the shortest ones
What do we think of this approach? Generating a list of all the routes seems complicated, as well as overkill. In a large graph, it would also be a pretty long list, and sorting that list would get expensive (either $r^2$ or $r * log(n)* where $r$ is the number of routes). This really isn't viable, but we show it here just to show the contrast with a better approach.
**Plan to Find Shortest Route by Checking Paths in Order of Length**
If we instead try to mimic the explanation we gave just below the graph, we might try an approach like the following:
1. Generate all paths of length 1 from A; check whether any goes to F
2. Generate all paths of length 2 from A; check whether any goes to F
3. repeat at increasing lengths until path found or out of paths
This is an improvement -- by checking the routes in order of length until we find one that works, we avoid having to generate all of them or having to sort them. We just need a way to generate the paths in order of length.
How might we go about this?
We start with the start vertex (here A). The routes of length 1 from A can be generated by taking each neighbor of A and creating a path from A to that vertex. This would yield routes A-D, A-C, and A-B. The routes of length 2 could be computed by taking each route of length 1 and extending it with the neighbors of the last vertex on the route. Thus:
- A-D would yield new route A-D-E
- A-C would yield new routes A-C-G and A-C-F
- A-B would yield new route A-C-F
Summing up, we would like to consider routes in this order:
A, A-D, A-C, A-B, A-D-E, A-C-G, A-C-F (stop here, since have route from A to F)
**Stop and make sure you understand this approach**
Actually creating and storing all of these paths would end up using up a fair bit of memory. It would be nice to optimize this. Rather than building the whole path, we will take two steps:
1. have `toCheck` store the vertices from which to expand potential routes, not the full paths
- vertices will be explored in the order A, D, C, B, E, G, F
2. track which vertex was responsible for adding each new vertex to `toCheck`. This information looks like:
- D came in from A
- C came in from A
- B came in from A
- E came in from D
- G came in from C
- F came in from C
For step 1, all we have to do is switch the Stack in our previous code to a Queue: that way, we would explore the nodes closer to A before nodes farther from A. Here's the code with step 1 implemented:
```=java=
public class CityVertex {
...
// different approach to canReach -- we make the call stack explicit
public boolean canReachQueue(CityVertex dest) {
HashSet<CityVertex> visited = new HashSet<CityVertex>();
Queue<CityVertex> toCheck = new LinkedList<>();
// put the start node into the toCheck Stack
toCheck.add(this);
// as long as we have unvisited nodes to check, keep processing them
while (!toCheck.isEmpty()) {
CityVertex check = toCheck.remove();
visited.add(check);
if (check == dest) {
// found a solution! Cancel the while loop and return
return true;
} else {
// add each unvisited node to the (end of the) queue
for (CityVertex neighbor : check.toCities) {
if (!visited.contains(neighbor))
toCheck.add(neighbor);
}
}
}
return false;
}
}
```
For step 2, we have to add some data structure to our code that will store the "came from" entries. We're leaving this as an exercise for you on the project.
### Reconstructing Paths
Assume we had stored the following information.
- D came in from A
- C came in from A
- B came in from A
- E came in from D
- G came in from C
- F came in from C
To reconstruct the route, we start with F in this list of information and work back to A, noting each vertex we encounter. Here, we see that F came in from C, which came in from A. Thus, our path must be A-C-F.
Our implementation to construct a path would be a separate method that we call once we determine that a path exists. Here's an outline of the code one last time:
```=java=
public class CityVertex {
...
// different approach to canReach -- we make the call stack explicit
public LinkedList<CityVertex> shortestRoute(CityVertex dest) {
HashSet<CityVertex> visited = new HashSet<CityVertex>();
Queue<CityVertex> toCheck = new LinkedList<>();
??? cameFrom; // a data structure storing how vertices came in
// put the start node into the toCheck Stack
toCheck.add(this);
// as long as we have unvisited nodes to check, keep processing them
while (!toCheck.isEmpty()) {
CityVertex check = toCheck.remove();
visited.add(check);
if (check == dest) {
// found a solution!
// Cancel the while loop and reconstruct the route
return this.reconstructRoute(dest, cameFrom);
} else {
// add each unvisited node to the (end of the) queue
for (CityVertex neighbor : check.toCities) {
if (!visited.contains(neighbor))
toCheck.add(neighbor);
}
}
}
// return empty list or throw exception if no route found
return new LinkedList<>();
}
}
```
Figuring out what `cameFrom` looks like and how to reconstruct the route from it is part of your work on an upcoming assignment, so we won't give you the details here.