Abhishek Pathfinding Algorithms

8/13/2019 Abhishek Pathfinding Algorithms


Beginners Guide to Pathfinding Algorithms


Basic Search

Senior Diablo

Search methods aren't the perfect solution for every problem, but applied creatively they can solve many. If search is an appropriate solution, which of the many methods do you use? Which method is guaranteed to find the solution? Which is the most efficient?

The following pseudocode represents the most basic form of tree/network searching. It contains no heuristics or assumptions. It will be modified to demonstrate each of the search methods discussed.

Pseudocode:

This assumes that the following have been passed to the function:

A start node of the network or tree, denoted S

A goal node, denoted G

It also assumes there is a tree or network containing the paths to be followed.

create a list P
add the start node S to P, giving it one element
until the first path of P ends with G, or P is empty
    extract the first path from P
    extend the path one step to all neighbors, creating X new paths
    reject all paths with loops
    add each remaining new path to P
if G found -> success; else -> failure

We'll follow a simple example:

http://ai-depot.com/Features/Print.html?id=41



















Each node is denoted with a letter. The start node is S and the goal node is G. A and B are nodes representing intermediate destinations. Each line connecting two nodes represents the path which must be followed to arrive at that new node. Notice that each path has a number, or weight, associated with it. This number can represent simple distance, travel difficulty (difficult terrain, for example), or travel time.

First, we pass the start node, S and the goal node G to our search function. S is added to the list first

thing and then we enter the main loop.

S is extracted from the list and extended one step to each of its neighbors. The only neighbor S has is A

so this results in one path, S->A, which is then added back into the list which now contains only that

element (we removed S, remember?)

Next time through the loop, it extracts S->A and extends one step to each of A's neighbors. This gives us three more paths: S->A->S, S->A->B and S->A->G. S->A->S is a loop, so we discard it. This leaves us with S->A->B and S->A->G, each of which is pushed into the list. The list now contains two elements, the ones we just added.

The next iteration proceeds in a similar manner: S->A->B is extracted and extended to its neighbors, giving us S->A->B->A and S->A->B->G. The first is a loop (see A in there twice?) and so is eliminated. S->A->B->G is added to the list. The list now contains two elements: S->A->G and S->A->B->G.

At last, we're done. The loop condition checks the first element of the list, which is S->A->G. Since this path ends at the destination, G, the loop exits. Having reached G, the search returns success.



Is this the most efficient method? That depends. It looks like it: there are two possible paths, S->A->G, which has only two steps (S to A and A to G), and S->A->B->G, which has three (S to A, A to B, and B to G). It has fewer steps, so it's the shortest path, right?

Not necessarily. If you add up the weights along each possible path, you get 7 for S->A->G and only 6 for S->A->B->G. So the second is actually shorter, even though the first has fewer steps. Which one you want is up to you. This most basic method will give you the fewest steps, but not necessarily the lowest-cost (or shortest) path.

Another possibility we can explore is this:

Notice there is no possible path from S to G. What will happen in this case is:

S will be expanded to S->A

S->A will be expanded to S->A->B

S->A->B cannot be expanded, so nothing is added and the list is left empty

The loop will end because the list is empty. G has not been found, so the search returns failure.
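
The basic pseudocode can be sketched in Python as follows. This is a minimal sketch rather than the author's code: the function name `basic_search` and both adjacency dictionaries are assumptions, reconstructed from the walkthrough (S connects to A; A connects to B and G; B connects to G) and from the no-path case. New paths are appended to the back of the list, which is the behavior the walkthrough shows.

```python
def basic_search(graph, start, goal):
    """Return the first path found from start to goal, or None on failure."""
    paths = [[start]]                      # the list P, holding one path: [S]
    while paths and paths[0][-1] != goal:
        path = paths.pop(0)                # extract the first path from P
        for neighbor in graph[path[-1]]:   # extend one step to every neighbor
            if neighbor not in path:       # reject paths with loops
                paths.append(path + [neighbor])
    return paths[0] if paths else None

# Assumed adjacency lists for the first example (weights are ignored here).
EXAMPLE = {"S": ["A"], "A": ["S", "B", "G"], "B": ["A", "G"], "G": ["A", "B"]}
# Assumed adjacency lists for the case where G cannot be reached.
NO_ROUTE = {"S": ["A"], "A": ["S", "B"], "B": ["A"], "G": []}

print(basic_search(EXAMPLE, "S", "G"))   # ['S', 'A', 'G']
print(basic_search(NO_ROUTE, "S", "G"))  # None
```

The first call returns the two-step path from the walkthrough; the second empties the list and reports failure by returning `None`.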

Next we'll make our search method a little more complicated and explore depth-first search (DFS), breadth-first search (BFS) and random search.



Assume you know nothing about the graph/tree/network/whatever being searched. (I'm just going to call it a tree from now on because, once you reject all loops, a graph or network is basically just a tree anyway. This is oversimplified, and there's a mathematical proof of it, but that only complicates the issue.) You don't know how many neighbors each node has until you get there. From the starting node, S, you know what neighbors it has (say A and B), but you don't know how many neighbors A and B have until you get there.

Even without knowing things like that, DFS and BFS are guaranteed to find a path if one exists.

BFS is fairly similar to the basic search I described previously. This uses a data structure called a queue.

You add the newly formed paths to the back of the list. When you remove them to expand them, you

remove them from the front.

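create a queue P
add the start node S to P, giving it one element
until the first path of P ends with G, or P is empty
    extract the first path from P
    extend the path one step to all neighbors, creating X new paths
    reject all paths with loops
    add each remaining new path to the BACK of P
if G found -> success; else -> failure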


BFS explores the tree uniformly: it checks all paths one step away from the start, then two steps, then three, and so on.
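
A minimal BFS sketch in Python, using `collections.deque` as the queue so that removing from the front is cheap. The function name `bfs` and the adjacency dictionary are assumptions; the dictionary mirrors the first example (S connects to A; A connects to B and G; B connects to G).

```python
from collections import deque

def bfs(graph, start, goal):
    """Breadth-first search over paths; returns a fewest-steps path or None."""
    queue = deque([[start]])
    while queue and queue[0][-1] != goal:
        path = queue.popleft()                  # remove from the FRONT
        for neighbor in graph[path[-1]]:
            if neighbor not in path:            # reject paths with loops
                queue.append(path + [neighbor]) # add to the BACK
    return queue[0] if queue else None

# Assumed adjacency lists for the first example graph.
GRAPH = {"S": ["A"], "A": ["S", "B", "G"], "B": ["A", "G"], "G": ["A", "B"]}

print(bfs(GRAPH, "S", "G"))  # ['S', 'A', 'G']
```

Because every path of length n is examined before any path of length n + 1, the first path that ends at G has the fewest steps.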

DFS differs from BFS only in how the new paths are added to the list. In this case, it uses a stack rather than a queue. In a stack, new elements are added to the front of the list rather than the back, but when you remove the paths to expand them, you still remove them from the front. The result of this is that DFS explores one path, ignoring alternatives, until it either finds the goal or it can't go anywhere else.

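create a stack P
add the start node S to P, giving it one element
until the first path of P ends with G, or P is empty
    extract the first path from P
    extend the path one step to all neighbors, creating X new paths
    reject all paths with loops
    push each remaining new path to the FRONT of P
if G found -> success; else -> failure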


For this example, we'll have to make our graph a little more complicated to better demonstrate. I have left the weights off the node connections to make it a little simpler.

This example begins as the first one did: we add S to the list (in this case a stack). Then we enter the

main loop:

S is removed and expanded to each neighbor, resulting in the paths S->A and S->D, which are then added to the FRONT of the list. (The order in which the individual paths are added is a matter of preference. For this example, I'm adding them so S->A is first and S->D is second.)

S->A is extracted and expanded giving us: S->A->B, S->A->D, and S->A->S. S->A->S gives us a loop, so we

reject it leaving us with the following list of paths: S->A->B, S->A->D, S->D

We extract S->A->B and expand it to the following: S->A->B->A (loop), S->A->B->C, S->A->B->D. Rejecting

the loop and adding the others to our list gives our list the following contents: S->A->B->C, S->A->B->D,

S->A->D, S->D

Next, we look at S->A->B->C and expand it one step which only takes us back to B in a loop. So, we reject

it and move on - it was a dead end. Our list now contains: S->A->B->D, S->A->D, S->D

Expanding S->A->B->D adds only S->A->B->D->E to our list, having ignored the three loops: S->A->B->D->S, S->A->B->D->A, and S->A->B->D->B.

Finally we're in the home stretch. Over the next few passes through the loop it expands S->A->B->D->E

to S->A->B->D->E->F and finally to S->A->B->D->E->F->G which gives us a path to our goal.



BFS and DFS are guaranteed to find a path to the goal (if one exists) but not necessarily the most efficient one. In this case, DFS gave us a path, but one that had an unnecessary detour through A and B.
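
The DFS walkthrough can be reproduced with a short Python sketch. The function name `dfs` is an assumption, and the adjacency dictionary is inferred from the expansions listed in the walkthrough (for example, D's neighbors are S, A, B and E); neighbors are listed alphabetically so that S->A is tried before S->D, as in the text.

```python
def dfs(graph, start, goal):
    """Depth-first search over paths; returns the first path found or None."""
    stack = [[start]]
    while stack and stack[0][-1] != goal:
        path = stack.pop(0)                     # remove from the FRONT
        new_paths = [path + [n] for n in graph[path[-1]] if n not in path]
        stack = new_paths + stack               # push new paths to the FRONT
    return stack[0] if stack else None

# Adjacency lists inferred from the walkthrough (an assumption, since the
# figure itself is not reproduced here).
GRAPH = {
    "S": ["A", "D"],
    "A": ["B", "D", "S"],
    "B": ["A", "C", "D"],
    "C": ["B"],
    "D": ["A", "B", "E", "S"],
    "E": ["D", "F"],
    "F": ["E", "G"],
    "G": ["F"],
}

print("->".join(dfs(GRAPH, "S", "G")))  # S->A->B->D->E->F->G
```

The result includes the detour through A and B, even though S->D->E->F->G would have reached the goal in fewer steps.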

So if both are guaranteed to reach the destination, which one do we use? The answer depends on the tree itself. BFS is bad for trees that have a high branching factor, meaning that each node has a lot of neighbors: use DFS for these. DFS is bad for trees that have a lot of very long paths: use BFS for these.

So what do you do if you don't know the branching factor or the length of the paths? Use a non-deterministic random search. With this, you simply add the paths to random parts of the list. It doesn't necessarily help, but it doesn't hurt either.

Next: a little knowledge can go a long way with heuristics.

DFS, BFS and nondeterministic random searches are all fine and good if you don't know anything about the tree you're searching. If you know even a little bit, though, that knowledge can help you immensely. For one thing, if you know the branching factor and the average length of the paths, you can decide whether you should use BFS or DFS.

If you know more than that, for example the distance to the goal, you can use that to greatly improve the efficiency of your algorithm.

Given our previous graph, let's assume each node has the following distances from the goal:

Note, though, that the gray lines represent distance, not actual paths. Using these distance measurements and DFS, we produce a method called Hill Climbing. The pseudocode for Hill Climbing follows:

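create a stack P
add the start node S to P, giving it one element
until the first path of P ends with G, or P is empty
    extract the first path from P
    extend the path one step to all neighbors, creating X new paths
    reject all paths with loops
    if any new paths exist
        sort them by the distance from their last node to the goal
    push each remaining new path to the FRONT of P
if G found -> success; else -> failure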


Note the added sorting step. It determines the order in which the new paths are added to the stack. (Remember before, in DFS, when I said the order was a matter of preference? It was then, but now things have changed.)

Now we look at the previous example again, with the distances added:

We add S (distance of 20) to the stack and enter the main loop.

Remove S(20) and expand it to S->A(15) and S->D(10). We sort these so the path with the shortest remaining distance goes first and add them to the stack. So, our stack is now S->D(10) and S->A(15).

Remove the first one, S->D(10), expand it to S->D->A(15), S->D->B(9) and S->D->E(8), and add them (sorted) to the stack, which now has the following: S->D->E(8), S->D->B(9), S->D->A(15) and S->A(15).

Expand S->D->E(8) to S->D->E->F(3), which is still the closest to the goal, so it then gets expanded to S->D->E->F->G and we've reached our goal. Note that this avoids the unnecessary detour brought up in the DFS discussion. This does not mean that Hill Climbing solves all the problems of DFS; it just happened to find a more efficient path in this one example. In its worst-case scenario, Hill Climbing behaves as DFS.
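
Here is a minimal Python sketch of Hill Climbing as described: DFS with the new paths sorted by estimated distance before they are pushed. The names `hill_climb`, `GRAPH` and `H` are assumptions. The adjacency lists are inferred from the walkthroughs, and the distances are the ones quoted in the example. The text gives no distance for C, so this sketch treats any unlisted node as infinitely far away rather than inventing a value.

```python
def hill_climb(graph, h, start, goal):
    """DFS that tries the neighbor with the smallest estimate h first."""
    far = float("inf")                          # fallback for unlisted nodes
    stack = [[start]]
    while stack and stack[0][-1] != goal:
        path = stack.pop(0)
        new_paths = [path + [n] for n in graph[path[-1]] if n not in path]
        new_paths.sort(key=lambda p: h.get(p[-1], far))  # closest first
        stack = new_paths + stack               # push to the FRONT
    return stack[0] if stack else None

# Adjacency lists inferred from the walkthrough (an assumption).
GRAPH = {
    "S": ["A", "D"], "A": ["B", "D", "S"], "B": ["A", "C", "D"], "C": ["B"],
    "D": ["A", "B", "E", "S"], "E": ["D", "F"], "F": ["E", "G"], "G": ["F"],
}
# Estimated distances to the goal quoted in the example; G is 0 by definition.
H = {"S": 20, "A": 15, "B": 9, "D": 10, "E": 8, "F": 3, "G": 0}

print("->".join(hill_climb(GRAPH, H, "S", "G")))  # S->D->E->F->G
```

Sorting only the newly created paths is what separates this from Best First Search, which re-sorts the whole list.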

While the Hill Climbing method improves the efficiency of DFS, BFS has a potential improvement as well: Beam Search. Beam Search artificially limits the branching factor of the tree to some arbitrary value (for example, 2). This value is denoted W, for the width of the beam.

The pseudocode for beam search is as follows:

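create a queue P
add the start node S to P, giving it one element
until the first path of P ends with G, or P is empty
    extract all paths from P
    extend ALL PATHS one step to all neighbors, creating X new paths
    reject all paths with loops
    sort all paths by estimated distance to goal
    discard all but the closest W paths
    push each remaining new path to the BACK of P
if G found -> success; else -> failure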


The effect of this is that it limits the paths that must be explored at each level to only those that are closest to the goal. The estimated remaining distance is, in most cases, the straight-line distance. However, because beam search discards potential paths which it never examines again, it may discard paths which would prove more efficient later on or, in the worst case, discard the only paths to the goal. A very simple example can demonstrate this:

The numbers represent the estimated straight-line distance to the goal. (These, and most values like this, are completely arbitrary numbers I concocted for demonstration purposes.) For this example, we'll assume W is 1, so we only want to explore the shortest estimated distance and ignore everything else.

We start with S, which is expanded to S->A. S->A is expanded to S->A->B and S->A->C. We eliminate all but the closest W paths. Since W = 1, we keep only the single closest path. S->A->B has an estimated 3 to go, while S->A->C has only 1. We put S->A->C into the list and discard S->A->B. Then, on the next loop, we find that S->A->C can't be expanded, so we return a failure indicating that no path exists. A path DOES exist, but our beam search failed to find it. This is where beam search can break down.
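
The failure case can be reproduced with a small Python sketch. The names `beam_search`, `GRAPH` and `H` are assumptions. The graph is inferred from the description, with the added assumption that G hangs off B (the text only says a path exists). Only the estimates for B and C are given in the example, so unlisted nodes are treated as infinitely far away.

```python
def beam_search(graph, h, start, goal, width):
    """Level-by-level search that keeps only the `width` closest paths."""
    far = float("inf")                      # fallback for unlisted nodes
    paths = [[start]]
    while paths:
        for path in paths:
            if path[-1] == goal:
                return path
        # Extend ALL current paths one step, rejecting loops.
        new_paths = [p + [n] for p in paths for n in graph[p[-1]] if n not in p]
        new_paths.sort(key=lambda p: h.get(p[-1], far))
        paths = new_paths[:width]           # discard all but the closest W
    return None

# Assumed adjacency lists: C is a dead end, and G is reachable only via B.
GRAPH = {"S": ["A"], "A": ["S", "B", "C"], "B": ["A", "G"], "C": ["A"], "G": ["B"]}
# Estimates from the example; G is 0 by definition.
H = {"B": 3, "C": 1, "G": 0}

print(beam_search(GRAPH, H, "S", "G", width=1))  # None
print(beam_search(GRAPH, H, "S", "G", width=2))  # ['S', 'A', 'B', 'G']
```

With a width of 1 the search keeps S->A->C, hits the dead end and gives up; widening the beam to 2 keeps S->A->B alive long enough to reach G.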

Hill Climbing and Beam Search both have inherent problems, and unless special care is taken (and sometimes it's not practical to monitor the search and make sure it's working correctly) they may not find a path, even if one exists. So, there must be a way to use heuristic knowledge that still guarantees a path will be found. The solution to this is Best First Search. The pseudocode follows:

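create a list P
add the start node S to P, giving it one element
until the first path of P ends with G, or P is empty
    extract the first path from P
    extend the path one step to all neighbors, creating X new paths
    reject all paths with loops
    add each remaining new path to P
    sort the entire list P by estimated distance to goal
if G found -> success; else -> failure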


This is guaranteed to find a path to the goal if any path exists and is likely (though not guaranteed) to do so efficiently. It may follow some unnecessary twists and turns but is still more efficient than BFS or DFS in most cases. In its worst-case scenario, however, it behaves just like BFS.
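
A minimal Best First Search sketch in Python, run on the small graph that defeated beam search with W = 1. This sketch expands only the first (best) path on each pass and then re-sorts the whole list. The names `best_first`, `GRAPH` and `H` are assumptions, as is the edge from B to G; nodes without a stated estimate are treated as infinitely far away.

```python
def best_first(graph, h, start, goal):
    """Always expand the path whose last node looks closest to the goal."""
    far = float("inf")                      # fallback for unlisted nodes
    paths = [[start]]
    while paths and paths[0][-1] != goal:
        path = paths.pop(0)
        paths += [path + [n] for n in graph[path[-1]] if n not in path]
        paths.sort(key=lambda p: h.get(p[-1], far))   # sort the ENTIRE list
    return paths[0] if paths else None

# Assumed adjacency lists: C is a dead end, and G is reachable only via B.
GRAPH = {"S": ["A"], "A": ["S", "B", "C"], "B": ["A", "G"], "C": ["A"], "G": ["B"]}
# Estimates from the beam search example; G is 0 by definition.
H = {"B": 3, "C": 1, "G": 0}

print(best_first(GRAPH, H, "S", "G"))  # ['S', 'A', 'B', 'G']
```

The search still tries the dead end S->A->C first, but because nothing is thrown away, S->A->B is waiting in the list when C fails.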

Blind searches will find ANY path. Heuristic searches will (usually) find ANY path, but will do so faster

(usually) than blind search.

Sometimes it's OK to find just ANY path to the goal as long as you get there. But sometimes you want to find the BEST path to the goal. The fastest, cheapest, or easiest route to take is oftentimes more important than finding SOME path. That's where optimal search comes in. The methods that follow are intended to find the optimal path.

The first method is an exhaustive search. This method is guaranteed to find the best path, but is often quite inefficient. The method is simple and, at first glance, logical: explore every possible path and return the shortest one. One way to do this is to do BFS or DFS, but don't stop when the goal is reached. Continue until EVERY path has been explored. During this, though, keep track of the distance travelled on each path and return the shortest one.

This is practical only for small problems, as it can get computationally expensive very fast. However, this is not much different from blind search, so why not add a bit of heuristic tuning to improve efficiency as we have done before? We can expand the shortest paths first (as in Best-First Search), and we can stop exploring a path if it hasn't reached the goal yet but is already longer than an existing complete path. The end result of this is called Branch and Bound search.

I'll demonstrate on a new graph:



First, S is added with distance 0. Note that this is the distance TRAVELLED now, rather than the

estimated distance to goal.

S is expanded to S->A (distance travelled = 3) and S->D (distance = 4). These are then sorted so the shortest current path is first: S->A(3), then S->D(4).

S->A(3) is expanded to S->A->B(7) (3, plus 4 for the distance from A to B) and S->A->D(8). Both are added to the list, which is then sorted: S->D(4), S->A->B(7), S->A->D(8).

Expanding the shortest one, S->D(4), yields S->D->A(9) and S->D->E(6). The latter of these two happens

to have the shortest path now, so we expand that one on the next iteration yielding: S->D->E->B(11) and

S->D->E->F(10)

S->A->B is the shortest now, with 7. Expanding this gives us S->A->B->C(11) and S->A->B->E(12).

(for brevity, I'll list the following steps in this form: [Shortest path] expands to [list of resulting paths])

S->A->D(8) yields S->A->D->E(10)

S->D->A(9) yields S->D->A->B(13)

S->D->E->F(10) yields S->D->E->F->G(13) which means we've found the goal with a cost of 13!

Now, as we begin expanding the rest of the paths, we can discard any that give us a cost greater than 13. S->A->D->E(10) is the current cheapest at this step. We expand it to S->A->D->E->B(15) and S->A->D->E->F(14). Since both are greater than 13, we ignore them.

And so on. It turns out that 13 IS the shortest cost path and our Branch and Bound program found it. I know you've been dying for the Branch and Bound pseudocode, so here it is:

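create a list P
add the start node S to P, giving it one element
until the first path of P ends with G, or P is empty
    extract the first path from P
    extend the first path one step to all neighbors, creating X new paths
    reject all paths with loops
    add each remaining new path to P
    sort all paths by current distance travelled, shortest first
if G found -> success; else -> failure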
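
A minimal Python sketch of Branch and Bound. Instead of re-sorting the list on every pass, it uses the standard-library `heapq` module so the cheapest path is always removed first; that substitution, and the names `branch_and_bound`, `EDGES` and `GRAPH`, are assumptions of this sketch. The edge weights are inferred from the running totals in the walkthrough (for example, S->A = 3 and A->B = 4).

```python
import heapq

# Edge weights inferred from the walkthrough's running totals (an assumption).
EDGES = {("S", "A"): 3, ("S", "D"): 4, ("A", "B"): 4, ("A", "D"): 5,
         ("B", "C"): 4, ("B", "E"): 5, ("D", "E"): 2, ("E", "F"): 4,
         ("F", "G"): 3}
GRAPH = {}
for (u, v), w in EDGES.items():             # build an undirected graph
    GRAPH.setdefault(u, {})[v] = w
    GRAPH.setdefault(v, {})[u] = w

def branch_and_bound(graph, start, goal):
    """Return (cost, path) for the cheapest path, or None if none exists."""
    heap = [(0, [start])]                   # (distance travelled, path)
    while heap:
        cost, path = heapq.heappop(heap)    # cheapest partial path first
        if path[-1] == goal:
            return cost, path
        for n, w in graph[path[-1]].items():
            if n not in path:               # reject paths with loops
                heapq.heappush(heap, (cost + w, path + [n]))
    return None

print(branch_and_bound(GRAPH, "S", "G"))  # (13, ['S', 'D', 'E', 'F', 'G'])
```

Because the cheapest path is always expanded first and no weight is negative, every path still in the heap costs at least as much as the first complete path removed, so the explicit "discard anything over 13" step is not needed here.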


Now we're going to improve Branch and Bound by adding an underestimate of the remaining distance

to the goal. This gives us an underestimate of the total distance of the path:

Total underestimate = current distance travelled + underestimate of distance remaining

We can therefore stop expanding a path if its total underestimate is greater than the length of a complete path already found. The underestimate must never exceed the true remaining distance. The shortest distance between two points is a straight line, so we use the straight-line distance as our underestimate heuristic.

We'll use the graph above, plus the following straight-line distances:

Now, we keep track of a total underestimate: the total distance travelled, plus the straight-line distance

from the last node in the path to the goal.

We expand S (0 distance travelled + 11 underestimate = 11) to S->A (3 + 10.4 = 13.4) and S->D(12.9).

S->D(12.9) has a smaller total underestimate than S->A(13.4), so we expand that one. S->D->A(19.4) and S->D->E(12.9) result.

S->D->E(12.9) expands to S->D->E->B(17.7) and S->D->E->F(13).



S->D->E->F(13) expands to the goal at S->D->E->F->G(13), and we've reached our goal with a cost of 13. This is the same result we got before, but this time it took a lot fewer steps and didn't explore nearly as many paths. After we reach the goal with a cost of 13, all other partially explored paths have a total underestimate greater than 13, so they can't possibly have a lower cost than the path we've already found. So, we can safely ignore them.
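
The totals in this walkthrough can be checked with a few lines of Python. The helper name `total_underestimate` is an assumption. The edge weights are inferred from the Branch and Bound walkthrough, and the straight-line distances are recovered by subtracting the distance travelled from each quoted total (for example, 12.9 - 4 = 8.9 for D); only the values for S and A are stated outright in the text.

```python
# Edge weights inferred from the Branch and Bound walkthrough (an assumption).
EDGES = {("S", "A"): 3, ("S", "D"): 4, ("A", "B"): 4, ("A", "D"): 5,
         ("B", "C"): 4, ("B", "E"): 5, ("D", "E"): 2, ("E", "F"): 4,
         ("F", "G"): 3}
# Straight-line distances to G, recovered from the quoted totals.
H = {"S": 11.0, "A": 10.4, "B": 6.7, "D": 8.9, "E": 6.9, "F": 3.0, "G": 0.0}

def weight(u, v):
    """Look up an undirected edge weight in either direction."""
    return EDGES[(u, v)] if (u, v) in EDGES else EDGES[(v, u)]

def total_underestimate(path):
    """Distance travelled so far plus the straight-line distance remaining."""
    travelled = sum(weight(u, v) for u, v in zip(path, path[1:]))
    return round(travelled + H[path[-1]], 1)

for name in ["SA", "SD", "SDA", "SDE", "SDEB", "SDEF", "SDEFG"]:
    print("->".join(name), total_underestimate(list(name)))
```

Every partial path other than S->D->E->F ends up with a total above 13, which is why none of them needs to be expanded once the goal has been reached at a cost of 13.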

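create a list P
add the start node S to P, giving it one element
until the first path of P ends with G, or P is empty
    extract the first path from P
    extend the first path one step to all neighbors, creating X new paths
    reject all paths with loops
    add each remaining new path to P
    sort all paths by total underestimate, shortest first
if G found -> success; else -> failure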


The total underestimate, again, is: distance of partial path travelled + straight-line distance from last

node in path to goal

This seems pretty good and, as it turns out, this is the optimal path for this graph. There is another way

to improve efficiency, however, and that is to avoid doing the same work twice. We can do this using

Dynamic Programming.

If we implement Branch and Bound with Dynamic Programming, we can see this in action (using the

same graph). In this example, we're looking only at distance travelled again. We'll return to

underestimates shortly.



As usual, we expand S to S->A and S->D. These have partial paths of 3 and 4 respectively.

S->A(3) expands to S->A->B(7) and S->A->D(8). We already had a path to D, though: S->D(4). Since we have a shorter path to D, we can ignore the longer one and discard S->A->D(8).

S->D(4) expands to S->D->A(9) and S->D->E(6). Again, S->D->A(9) is a longer path to A than simply S->A(3), so we can discard it.

S->D->E(6) expands to S->D->E->B(11) and S->D->E->F(10). We discard S->D->E->B(11) because we already have a shorter path to B, S->A->B(7), and continue.

S->A->B(7) expands to S->A->B->C(11) and S->A->B->E(12). Since we've already reached E with half that cost, via S->D->E(6), we discard S->A->B->E(12).

S->D->E->F(10) expands to S->D->E->F->G(13) and we've reached our goal, again with the shortest path.

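create a list P
add the start node S to P, giving it one element
until the first path of P ends with G, or P is empty
    extract the first path from P
    extend the first path one step to all neighbors, creating X new paths
    reject all paths with loops
    for all paths that end at the same node, keep only the shortest one
    add each remaining new path to P
    sort all paths by total distance travelled, shortest first
if G found -> success; else -> failure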
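
A minimal Python sketch of Branch and Bound with the dynamic programming rule. The `best` dictionary records the shortest distance found so far to each node, and any new path that does not beat it is dropped. The names `branch_and_bound_dp`, `EDGES`, `GRAPH` and `best` are assumptions, as is the use of `heapq` in place of re-sorting the list; the edge weights are inferred from the walkthrough.

```python
import heapq

# Edge weights inferred from the walkthrough's running totals (an assumption).
EDGES = {("S", "A"): 3, ("S", "D"): 4, ("A", "B"): 4, ("A", "D"): 5,
         ("B", "C"): 4, ("B", "E"): 5, ("D", "E"): 2, ("E", "F"): 4,
         ("F", "G"): 3}
GRAPH = {}
for (u, v), w in EDGES.items():             # build an undirected graph
    GRAPH.setdefault(u, {})[v] = w
    GRAPH.setdefault(v, {})[u] = w

def branch_and_bound_dp(graph, start, goal):
    """Cheapest-first search that keeps only the shortest path to each node."""
    best = {start: 0}                       # shortest known distance per node
    heap = [(0, [start])]
    while heap:
        cost, path = heapq.heappop(heap)
        node = path[-1]
        if node == goal:
            return cost, path
        if cost > best[node]:
            continue                        # a shorter path to node exists
        for n, w in graph[node].items():
            new_cost = cost + w
            if new_cost < best.get(n, float("inf")):
                best[n] = new_cost          # e.g. keeps S->D(4), drops S->A->D(8)
                heapq.heappush(heap, (new_cost, path + [n]))
    return None

print(branch_and_bound_dp(GRAPH, "S", "G"))  # (13, ['S', 'D', 'E', 'F', 'G'])
```

No separate loop check is needed in this version: with positive weights, a path that returns to a node it has already visited can never be shorter than the recorded distance to that node.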




Now, dynamic programming saved us some steps compared with the first example, but not as many as the second (with underestimates). Since we have two different methods of saving steps, wouldn't it be great if we could combine them somehow?

You (and the rest of the AI community) are in luck! Combining Branch and Bound with dynamic

programming and underestimates yields the favorite A* pathfinding algorithm.

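create a list P
add the start node S to P, giving it one element
until the first path of P ends with G, or P is empty
    extract the first path from P
    extend the first path one step to all neighbors, creating X new paths
    reject all paths with loops
    for all paths that end at the same node, keep only the shortest one
    add each remaining new path to P
    sort all paths by total underestimate, shortest first
if G found -> success; else -> failure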


If you've forgotten, the total underestimate is: distance of partial path travelled + straight-line distance from last node in path to goal

As you can see, A* is the culmination of several time- and step-saving techniques which have arisen over the years.
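
To close, here is a minimal A* sketch in Python that combines the pieces: cheapest-first expansion, the shortest-path-per-node rule, and the total underestimate as the sort key. The names `a_star`, `EDGES`, `GRAPH` and `H` are assumptions, and `heapq` stands in for re-sorting the list. The edge weights and straight-line distances are inferred from the walkthroughs. The text gives no straight-line distance for C, so unlisted nodes fall back to 0, which is always a valid underestimate.

```python
import heapq

# Edge weights inferred from the walkthroughs (an assumption).
EDGES = {("S", "A"): 3, ("S", "D"): 4, ("A", "B"): 4, ("A", "D"): 5,
         ("B", "C"): 4, ("B", "E"): 5, ("D", "E"): 2, ("E", "F"): 4,
         ("F", "G"): 3}
GRAPH = {}
for (u, v), w in EDGES.items():             # build an undirected graph
    GRAPH.setdefault(u, {})[v] = w
    GRAPH.setdefault(v, {})[u] = w
# Straight-line distances to G, recovered from the quoted totals.
H = {"S": 11.0, "A": 10.4, "B": 6.7, "D": 8.9, "E": 6.9, "F": 3.0, "G": 0.0}

def a_star(graph, h, start, goal):
    """Return (cost, path) for the cheapest path, or None if none exists."""
    best = {start: 0}                       # shortest known distance per node
    heap = [(h.get(start, 0.0), 0, [start])]  # (total underestimate, cost, path)
    while heap:
        _, cost, path = heapq.heappop(heap)
        node = path[-1]
        if node == goal:
            return cost, path
        if cost > best[node]:
            continue                        # a shorter path to node exists
        for n, w in graph[node].items():
            new_cost = cost + w
            if new_cost < best.get(n, float("inf")):
                best[n] = new_cost
                estimate = new_cost + h.get(n, 0.0)
                heapq.heappush(heap, (estimate, new_cost, path + [n]))
    return None

print(a_star(GRAPH, H, "S", "G"))  # (13, ['S', 'D', 'E', 'F', 'G'])
```

On this graph the search expands only S, S->D, S->D->E and S->D->E->F before reaching G, matching the underestimate walkthrough.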
