A* PATH FINDING ALGORITHM Presented by Daniel Natapov
May 12, 2015
PROBLEM DEFINITION
Find the shortest (weighted) path from a start node to a goal node in a graph (or grid).
What if we are “informed” with heuristics? Can we “look-ahead” and direct our search?
APPLICATION
Games – NPC movement. Needs to be smart and fast.
Games use grids to describe the environment. These slides do too.
Many other applications: network routing, image processing, A.I., path finding, ...
GRID = GRAPH
Grid allows movement between adjacent cells in 4 or 8 possible directions.
Each direction may have a different cost.
[Figure: grid = graph]
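A minimal sketch of how a grid can be read as a graph: each open cell is a node, and each allowed move is an edge with a cost. The list-of-strings grid, the `#` wall marker, the `neighbors` helper and the sqrt(2) diagonal cost are all assumptions made for illustration, not something the slides specify.

```python
from typing import Iterator, List, Tuple

Cell = Tuple[int, int]  # (row, col); a hypothetical cell representation

# Offsets for the 4 orthogonal and the 4 diagonal directions.
ORTHOGONAL = [(-1, 0), (1, 0), (0, -1), (0, 1)]
DIAGONAL = [(-1, -1), (-1, 1), (1, -1), (1, 1)]


def neighbors(grid: List[str], cell: Cell,
              allow_diagonal: bool = False) -> Iterator[Tuple[Cell, float]]:
    """Yield (neighbor, step_cost) pairs: the outgoing edges of `cell`."""
    rows, cols = len(grid), len(grid[0])
    moves = [(d, 1.0) for d in ORTHOGONAL]
    if allow_diagonal:
        # Assumed cost: a diagonal step is sqrt(2) times an orthogonal one.
        moves += [(d, 2 ** 0.5) for d in DIAGONAL]
    r, c = cell
    for (dr, dc), cost in moves:
        nr, nc = r + dr, c + dc
        # Keep only moves that stay on the grid and avoid walls ('#').
        if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] != "#":
            yield (nr, nc), cost


grid = ["...",
        ".#.",
        "..."]
print(sorted(n for n, _ in neighbors(grid, (0, 0))))             # [(0, 1), (1, 0)]
print(len(list(neighbors(grid, (0, 1), allow_diagonal=True))))   # 4
```

Changing the cost attached to each offset is one way to model the "each direction may have a different cost" case.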
EXAMPLE – GET FROM S TO T
[Figure: grid with start cell S and target cell T.]
EXAMPLE – EDGE WEIGHTS
In a game, edge weights depend on various factors, e.g. travel on road vs. grass.
For simplicity: let’s say all horizontal and vertical costs are the same.
Also assume no diagonal paths.
WWDD? – WHAT WOULD DIJKSTRA’S DO?
[Figure: Dijkstra’s search on the grid from S to T. Caption: Found it! (finally)]
WWDD?
Dijkstra’s algorithm guarantees the shortest path.
But it searches a lot of unneeded area.
We know where the destination node is (just not how to get there).
We can try to direct the search with greedy Best-First-Search.
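To see the "unneeded area" in numbers, here is a small sketch of Dijkstra's algorithm on an open grid. The grid encoding, the function name `dijkstra` and the unit step cost are assumptions for illustration; the function reports how many cells were handled before the target was reached.

```python
import heapq


def dijkstra(grid, start, goal):
    """Uniform-cost search on a 4-connected grid.

    Returns (path cost, number of cells handled), or (None, count) if
    the goal cannot be reached.
    """
    rows, cols = len(grid), len(grid[0])
    best = {start: 0}            # cheapest known distance from start
    frontier = [(0, start)]      # min-heap ordered by distance from start
    handled = set()
    while frontier:
        dist, cell = heapq.heappop(frontier)
        if cell in handled:
            continue             # stale heap entry, already handled
        handled.add(cell)
        if cell == goal:
            return dist, len(handled)
        r, c = cell
        for nxt in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            nr, nc = nxt
            if not (0 <= nr < rows and 0 <= nc < cols) or grid[nr][nc] == "#":
                continue
            new_dist = dist + 1  # assumed: every step costs 1
            if new_dist < best.get(nxt, float("inf")):
                best[nxt] = new_dist
                heapq.heappush(frontier, (new_dist, nxt))
    return None, len(handled)


open_grid = ["....."] * 5
cost, handled = dijkstra(open_grid, (2, 0), (2, 4))
print(cost, "steps;", handled, "of 25 cells handled")
```

Even though the target lies straight ahead, every cell closer to S than T is handled first, because the search has no sense of direction.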
BEST-FIRST-SEARCH
Similar to Dijkstra’s, but is informed.
Has some estimate of how far from the goal each vertex is: “look-ahead”.
This estimate is a heuristic.
It prioritizes vertices which it believes to be closest to the goal, as opposed to vertices closest to the start.
BEST-FIRST-SEARCH EXAMPLE
[Figure: Best-First-Search on the grid from S to T.]
HOW TO BREAK IT – OBSTACLES!
[Figure: Best-First-Search on the grid from S to T with obstacles. Caption: Found it! (could’ve taken a better route)]
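The effect can be reproduced on a small hand-made grid. This is a sketch under assumptions: the `trap` layout, the `#` wall marker and the function names are invented for illustration and are not the grid from the slides. Greedy Best-First-Search orders cells by h(n) alone, so it commits to the lower region, which looks closer to T, and never tries the shorter route along the top row.

```python
import heapq


def manhattan(a, b):
    """Estimated distance between two (row, col) cells."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def greedy_best_first(grid, start, goal):
    """Always handle the found cell with the smallest h(n).

    Returns the number of steps on the path it finds, or None.
    """
    rows, cols = len(grid), len(grid[0])
    parent = {start: None}       # also marks which cells were found
    frontier = [(manhattan(start, goal), start)]
    while frontier:
        _, cell = heapq.heappop(frontier)
        if cell == goal:
            steps = 0
            while parent[cell] is not None:   # walk back to the start
                cell = parent[cell]
                steps += 1
            return steps
        r, c = cell
        for nxt in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            nr, nc = nxt
            if not (0 <= nr < rows and 0 <= nc < cols) or grid[nr][nc] == "#":
                continue
            if nxt not in parent:             # first discovery wins
                parent[nxt] = cell
                heapq.heappush(frontier, (manhattan(nxt, goal), nxt))
    return None


trap = ["......",
        "S####.",
        "...#.T",
        "...#..",
        "......"]
print(greedy_best_first(trap, (1, 0), (2, 5)))  # 10; the top-row route takes 8
```

The first step of the short route moves away from T, so its h(n) is higher than anything in the lower region and it is never handled.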
BEST-FIRST-SEARCH
Best-First-Search works faster than Dijkstra’s.
But it does not guarantee an optimal path.
We want some combination of Dijkstra’s and Best-First-Search. Enter A*!
A* ALGORITHM
Prioritizes its search based on both the distance traveled (Dijkstra’s) and the distance remaining (Best-First-Search).
g(n) = Distance traveled from the start to a cell.
h(n) = Estimated distance from a cell to the target.
Value of a cell is f(n) = g(n) + h(n). The algorithm prioritizes cells whose f(n) is
lowest.
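The definitions of g(n), h(n) and f(n) translate fairly directly into code. The following is a minimal sketch, assuming a 4-connected grid stored as a list of strings with `#` for walls, unit step costs, and a heuristic passed in as `h`; the names `a_star` and `manhattan` are illustrative choices.

```python
import heapq


def manhattan(a, b):
    """One possible h: distance in rows plus distance in columns."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def a_star(grid, start, goal, h=manhattan):
    """Handle cells in order of f(n) = g(n) + h(n); return the path as a list of cells."""
    rows, cols = len(grid), len(grid[0])
    g = {start: 0}               # g(n): distance traveled from the start
    parent = {start: None}       # cell through which the best known path came
    frontier = [(h(start, goal), start)]   # min-heap ordered by f(n)
    handled = set()
    while frontier:
        _, cell = heapq.heappop(frontier)
        if cell in handled:
            continue             # stale entry from an earlier, worse g
        handled.add(cell)
        if cell == goal:
            path = []
            while cell is not None:        # trace parents back to the start
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        r, c = cell
        for nxt in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            nr, nc = nxt
            if not (0 <= nr < rows and 0 <= nc < cols) or grid[nr][nc] == "#":
                continue
            new_g = g[cell] + 1  # assumed: every step costs 1
            if new_g < g.get(nxt, float("inf")):
                g[nxt] = new_g
                parent[nxt] = cell
                heapq.heappush(frontier, (new_g + h(nxt, goal), nxt))
    return None                  # goal not reachable


grid = [".....",
        ".###.",
        "....."]
path = a_star(grid, (1, 0), (1, 4))
print(len(path) - 1)  # 6 steps: around the wall, above or below
```

Passing `h=lambda a, b: 0` turns the same loop into Dijkstra's algorithm, which is a convenient way to compare the two.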
WHAT’S ALL THIS TALK ABOUT ESTIMATES?
An estimate of the distance between a cell and a target is a heuristic.
May be able to estimate distance between two cells.
Choosing a good heuristic is important, and can be difficult.
In our simplified case it is easy: the Manhattan Distance:
h(n) = |cell.x - goal.x| + |cell.y - goal.y|
MANHATTAN DISTANCE
Good for our case. The actual distance can never be less than this estimate (more on this later).
Let’s go through a complete example.
A* EXAMPLE
[Figure sequence: step-by-step A* search on the grid from S to T. Each found cell is labeled with its g, h and f = g + h values. The first frame labels the four cells around S (g = 1 for all; h = 4, f = 5 for one cell and h = 6, f = 7 for the other three). The last frame reaches the cells next to T with g = 8, h = 1, f = 9.]
MORE ABOUT HEURISTICS
Depending on the heuristic, A* can be admissible. This guarantees an optimal solution, despite using an estimate.
For A* to be admissible and guarantee an optimal solution we need: ∀n, h(n) ≤ h*(n)
h*(n) is the actual distance.
If the heuristic overestimates the actual distance, an optimal solution is not guaranteed.
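On a small grid the condition h(n) ≤ h*(n) can be checked by brute force. The sketch below computes h*(n) with a breadth-first search outward from the goal (valid here because every step is assumed to cost 1) and compares it with the heuristic. The helper names and the `doubled` heuristic are made up for illustration.

```python
from collections import deque


def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def doubled(a, b):
    """A deliberately overestimating heuristic, for contrast."""
    return 2 * manhattan(a, b)


def true_distances(grid, goal):
    """h*(n) for every reachable cell: BFS from the goal, unit step cost."""
    rows, cols = len(grid), len(grid[0])
    dist = {goal: 0}
    queue = deque([goal])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] != "#" and (nr, nc) not in dist):
                dist[(nr, nc)] = dist[(r, c)] + 1
                queue.append((nr, nc))
    return dist


def is_admissible(h, grid, goal):
    """True if h never overestimates the actual distance to the goal."""
    return all(h(cell, goal) <= d
               for cell, d in true_distances(grid, goal).items())


grid = ["....",
        ".##.",
        "...."]
goal = (2, 3)
print(is_admissible(manhattan, grid, goal))  # True
print(is_admissible(doubled, grid, goal))    # False
```

The doubled heuristic already fails for the cell next to the goal: the actual distance is 1, the estimate is 2.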
MORE ABOUT HEURISTICS CONT’D
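To handle each node only once and still guarantee an optimal solution, we also need monotonicity (consistency).
The heuristic must satisfy the triangle inequality: h(n1) ≤ c(n1 → n2) + h(n2)
[Figure: triangle with corners n1, n2 and goal; sides labeled c(n1 → n2), h(n1) and h(n2).]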
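The triangle inequality can be tested edge by edge. This sketch assumes a 4-connected grid with unit edge costs; `grid_edges`, `is_consistent` and the `patchy` heuristic are hypothetical names introduced only for this example.

```python
def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def patchy(a, b):
    """Manhattan distance with one cell's estimate dropped to 0.

    It never overestimates, but it breaks the triangle inequality
    on the edges leading into that cell.
    """
    return 0 if a == (0, 1) else manhattan(a, b)


def grid_edges(grid):
    """Yield (n1, n2, cost) for every move between open cells."""
    rows, cols = len(grid), len(grid[0])
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == "#":
                continue
            for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] != "#":
                    yield (r, c), (nr, nc), 1   # assumed unit cost


def is_consistent(h, grid, goal):
    """Check h(n1) <= c(n1 -> n2) + h(n2) on every edge."""
    return all(h(n1, goal) <= cost + h(n2, goal)
               for n1, n2, cost in grid_edges(grid))


grid = ["...",
        "...",
        "..."]
goal = (2, 2)
print(is_consistent(manhattan, grid, goal))  # True
print(is_consistent(patchy, grid, goal))     # False
```

For `patchy`, the edge from (0, 0) to (0, 1) fails: h(n1) = 4, but c + h(n2) = 1 + 0.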
FIDDLING WITH THE HEURISTIC
Use the heuristic to balance speed vs. accuracy.
If h(n) = 0, then f(n) = g(n). In other words, A* becomes Dijkstra’s.
If h(n) >> g(n), g(n) can be ignored. f(n) ≈ h(n). A* becomes Best-First-Search.
In general:
The more weight g(n) carries relative to h(n), the more cells the search expands, which makes it slower.
The more weight h(n) carries, the more direct the search is, but better paths could be missed.
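One way to experiment with this balance is to scale the heuristic by a weight. The weight `w`, the `search` function and the `trap` grid below are assumptions for illustration, not part of the slides: w = 0 gives Dijkstra's, w = 1 gives plain A*, and a large w behaves like Best-First-Search.

```python
import heapq


def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def search(grid, start, goal, w):
    """Handle cells in order of f(n) = g(n) + w * h(n).

    Returns (path cost, number of cells handled).
    """
    rows, cols = len(grid), len(grid[0])
    g = {start: 0}
    frontier = [(w * manhattan(start, goal), start)]
    handled = set()
    while frontier:
        _, cell = heapq.heappop(frontier)
        if cell in handled:
            continue
        handled.add(cell)
        if cell == goal:
            return g[cell], len(handled)
        r, c = cell
        for nxt in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            nr, nc = nxt
            if not (0 <= nr < rows and 0 <= nc < cols) or grid[nr][nc] == "#":
                continue
            if nxt in handled:
                continue         # each cell is handled only once
            new_g = g[cell] + 1  # assumed unit step cost
            if new_g < g.get(nxt, float("inf")):
                g[nxt] = new_g
                heapq.heappush(frontier, (new_g + w * manhattan(nxt, goal), nxt))
    return None, len(handled)


trap = ["......",
        "S####.",
        "...#.T",
        "...#..",
        "......"]
for w in (0, 1, 100):
    cost, handled = search(trap, (1, 0), (2, 5), w)
    print("w =", w, "cost =", cost, "handled =", handled)
```

On this grid w = 0 and w = 1 both return the optimal cost of 8, while w = 100 returns 10: the heavily weighted search is drawn into the lower region and misses the shorter route along the top row.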
FIDDLING WITH THE HEURISTIC 2
If h(n) = h*(n), then A* will find the optimal solution and not expand anything unnecessary. Straight to the target.
Only possible with a good heuristic and no obstacles.
Can ‘fiddle’ with the heuristic and set it depending on the need.
Sometimes it is okay to accept an approximate solution in exchange for a speed-up.
SPEED-ACCURACY SEE-SAW
[Figure: see-saw with g(n) and h(n) on its two sides, balancing speed against accuracy.]
FORMAL DEFINITION
preCond: Input a grid/graph G with positive edge weights, a source node s, a target node t.
Also given an admissible heuristic for estimating distances.
postCond: Finds a shortest weighted path from s to t.
Loop Invariant: So far, the nodes have been handled in order of f(n), where f(n) = g(n) + h(n).
FORMAL DEFINITION CONT’D
Step: Handle the found (not handled) node with min f(n).
Store the parent for each cell – the cell through which the shortest path from s came.
Exit: Stop when t has been found.
Obtaining the post condition: LI + Exit + Code => PostCond.
Proving the path we traced back is shortest:
Prove there is a path of this length: We have one.
Prove there is no shorter path: ...
PROOF THAT THERE IS NO BETTER PATH
We know our heuristic is admissible, h(n) ≤ h*(n).
By LI, cells are handled in order of the minimum of f(n) = g(n) + h(n).
All unhandled nodes have an f(n) at least as large as ours.
We found t, so h(t) = 0 and f(t) is our actual cost g(t). In other words, our actual cost is no higher than the actual + estimated cost of any other found node.
The estimated cost is never more than the actual cost, meaning our actual cost is no higher than any other actual cost.
COMPLICATED SLIDE. PAY ATTENTION!
CONFUSED?
f(n) ≤ f(any) = g(any) + h(any) ≤ g(any) + actual(any)
where f(n) is our actual cost, f(any) is any other found estimate, and g(any) + actual(any) is any other ‘actual’.
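The same chain can be written out with the symbols defined earlier, as a sketch of the argument rather than a full proof. Here t is the target that was just handled, n is any other found but unhandled node, and h*(n) is the actual distance from n to t.

```latex
\begin{align*}
g(t) = f(t) &\le f(n) = g(n) + h(n)
  && \text{($t$ had the minimum $f$, and $h(t) = 0$)} \\
  &\le g(n) + h^*(n)
  && \text{(admissibility: $h(n) \le h^*(n)$)}
\end{align*}
```

The right-hand side is the cost of the best path to t that continues from the found route to n, so no such path can be cheaper than the one already found.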
WHAT IT ALL MEANS
If the heuristic is admissible, A* returns the shortest path.
It will find it by (likely) expanding and searching fewer cells than Dijkstra’s.
But if the condition that h(n) ≤ h*(n) is violated, we can no longer ensure optimality.
Should it always be optimal?
RUNNING TIME
Well.... Dijkstra’s: O(|E| + |V| log |V|)
V = number of vertices, E = number of edges.
Obviously it is possible for A* to search every edge as well, so we have no savings in the worst case.
Let’s focus on the nodes instead: Dijkstra’s O(V²)
RUNNING TIME – DIJKSTRA’S
[Figure: S and T with the area searched by Dijkstra’s.]
Area of circle is O(L²)
RUNNING TIME – A*
[Figure: S and T with the area searched by A*.]
Area of half ellipse: O(L∙H)
RUNNING TIME – A*
[Figure: S and T.]
In total: O((L/n)∙H∙n) = O(L∙H)
ALL DONE! Thank you. Questions?
The algorithm was first described in 1968 by Peter Hart, Nils Nilsson, and Bertram Raphael
References & Resources:
http://theory.stanford.edu/~amitp/GameProgramming/ (Great source)
http://www.policyalmanac.org/games/aStarTutorial.htm
http://en.wikipedia.org/wiki/A*_search_algorithm
http://www.cse.yorku.ca/course_archive/2008-09/W/3402/slides/Week3.pdf