

What is AI?

AI is the field of science and engineering which attempts to build

intelligent systems.

But what are intelligent systems?


What is AI (cont’d)

Definitions found in AI textbooks tend to fall into the following

categories:

• AI is the field of science and engineering which attempts to

build systems that act like humans.

• ... that think like humans.

• ... that think rationally.

• ... that act rationally.


Acting Like Humans: The Turing Test Approach

To pass the Turing test a computer should have the following

capabilities:

• natural language processing

• knowledge representation

• automated reasoning

• machine learning

• computer vision

• robotics

Within AI, there has not been a big effort to pass the Turing test.


Thinking Like Humans:

The Cognitive Modelling Approach

How do humans think?

There are two ways to find out:

• Introspection

• Psychological experiments

Example: The GPS program by Newell and Simon

In this tradition, psychology and cognitive science are very

relevant.


Thinking Rationally: The Laws of Thought Approach

What are the laws of thought? This question goes back to the

syllogisms of the Greek philosopher Aristotle.

The logicist tradition in AI has followed this approach.

Example: Early work on theorem proving

The emphasis in this tradition is on correct inference. As a result, related work from philosophy and logic is very important.


Acting Rationally: The Rational Agent Approach

In this approach the design of rational agents is the main

problem.

What is an agent?


Agents

An agent is anything that can be viewed as perceiving its

environment through sensors and acting upon that environment

through effectors or actuators.

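[Figure: an agent and its environment. The agent receives percepts from the environment through its sensors and acts on the environment through its effectors.]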


Examples of Agents

• Human agents

• Robotic agents

• Software agents (or software robots or softbots).


Rational Agents

A rational agent is one that acts so as to achieve the best

outcome or, when there is uncertainty, the best expected outcome.

Other properties of agents:

• autonomy

• social ability

• situatedness

• adaptivity

• ...


Acting Rationally (cont’d)

The study of AI as rational agent design is

• more general than the laws of thought approach

• “easier” than approaches based on human thought or human

behaviour

This is the approach that we will take in this course. We will

concentrate on general principles of rational agents and on

components for constructing them.


Foundations of AI

The following disciplines have contributed ideas, viewpoints and

techniques to AI.

• Philosophy

• Mathematics

• Economics

• Neuroscience

• Psychology and cognitive science

• Computer science and engineering

• Control theory and cybernetics

• Linguistics


History of AI

• Gestation (1943-1955)

Models of artificial neurons (McCulloch and Pitts, 1943).

Hebbian learning (Hebb, 1949).

The article “Computing Machinery and Intelligence” by Alan

Turing (1950).

Snarc: The first neural network computer (Minsky and

Edmonds, 1951).

• Birth (1956)

The Dartmouth workshop in the summer of 1956 (McCarthy,

Minsky, Newell, Simon).


History of AI (cont’d)

• Early enthusiasm, great expectations (1952-1969)

Logic Theorist, General Problem Solver, Geometry Theorem

Prover, game playing, Lisp, theorem proving, Shakey the robot,

micro-worlds, adalines, perceptrons.

• A dose of reality (1966-1973)

Programs with no domain knowledge, intractability problems.

Cancellation of big projects on machine translation (US),

Lighthill report (UK).



• Knowledge-based systems (1969-1979)

The role of domain-specific knowledge, expert systems.

Representation and reasoning languages (e.g., Prolog and frame-based

languages).

• AI becomes an industry (1980-present)

The first successful commercial expert system: R1 (McDermott, DEC).

The Japanese 5th generation project (1981) and its emphasis on logic

programming.

Microelectronics and Computer Technology Corporation (MCC) in the

U.S.

Alvey report in the U.K.

• The return of neural networks (1986-present)

Connectionism.



• AI becomes a science (1987-present)

Neats vs. scruffies.

Knowledge representation, speech recognition, neural networks

and data mining, Bayesian networks, robotics, computer vision.

• Intelligent agents (1995-present)

See the conference AAMAS

(http://www.aamas-conference.org/)

• Semantic Web (1998-present)

See the site http://www.semanticweb.org/.


State of the Art

• Autonomous planning and scheduling

See NASA’s Remote Agent

(http://ic.arc.nasa.gov/projects/remote-agent/).

• Game Playing

See IBM’s Deep Blue

(http://www.research.ibm.com/deepblue/).

• Autonomous control

See CMU’s NavLab computer controlled minivan

(http://www.ri.cmu.edu/labs/lab_28.html).

See DARPA’s grand challenge in autonomous ground vehicles

(http://www.stanfordracing.org/).


State of the Art (cont’d)

• Constraint solving software

See solvers by ILOG (http://www.ilog.com).


Readings

Chapters 1 and 2 (not in depth) of AIMA.


Solving Problems by Searching

• Agents, Goal-Based Agents, Problem-Solving Agents

• Search Problems

• Blind Search Strategies


Agents

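[Figure: an agent and its environment. The agent receives percepts from the environment through its sensors and acts on the environment through its effectors.]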

Definition. An agent is anything that can be viewed as

perceiving its environment through sensors and acting upon

that environment through effectors or actuators.


Examples of Agents

• Human agents

• Robotic agents

• Software agents (or software robots or softbots).


How Should Agents Act?

The behavior of an agent depends on the following:

• The environment. This is the world where the agent lives.

• The percept sequence. This is the complete history of everything the

agent has ever perceived.

The agent can be described (almost completely) by an agent function

that maps every given percept sequence to an action.

The agent function will be implemented by an agent program.

• The performance measure. This is the objective criterion for

success of an agent’s behavior. It is imposed by the agent designer. It

might not be easy to define the performance measure.

Example: The performance measure of an automatic taxi driver

should be ....


Goal-Based Agents

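[Figure: architecture of a goal-based agent. The Sensors tell the agent "What the world is like now", which it tracks using its State together with knowledge of "How the world evolves" and "What my actions do". It then predicts "What it will be like if I do action A" and uses its Goals to decide "What action I should do now", which the Effectors carry out in the Environment.]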

The behavior of an agent also depends on:

• The agent’s goals. A goal specifies what states of the

environment are desirable for the agent.


Problem-Solving Agents

Problem-solving agents are a class of goal-based agents.

Problem-solving agents decide what to do by finding sequences of actions that lead to desirable states.

Example: Consider an agent in the city of Arad, Romania. How can it get to Bucharest the next day, in time for its flight?


Problem-Solving Agents

Problem-solving agents work by carrying out the following tasks

repeatedly:

• Goal formulation: decide what the objective is.

• Problem formulation: decide what actions and states to

consider in order to meet the objective.

• Search: find a sequence of actions that achieves the goal.

• Execution: execute the chosen sequence of actions.


Example: Route Finding in Romania

[Figure: simplified road map of Romania showing the cities Arad, Bucharest, Craiova, Dobreta, Eforie, Fagaras, Giurgiu, Hirsova, Iasi, Lugoj, Mehadia, Neamt, Oradea, Pitesti, Rimnicu Vilcea, Sibiu, Timisoara, Urziceni, Vaslui and Zerind.]


Our First Agent Program

function SimpleProblemSolvingAgent(percept) returns an action
    static seq, state, goal, problem
    state ← UpdateState(state, percept)
    if seq is empty then
        goal ← FormulateGoal(state)
        problem ← FormulateProblem(state, goal)
        seq ← Search(problem)
    end
    action ← First(seq)
    seq ← Rest(seq)
    return action
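The agent program above can be sketched in Python roughly as follows. This is a minimal sketch, not a definitive implementation: the four callbacks (`update_state`, `formulate_goal`, `formulate_problem`, `search`) are hypothetical stand-ins for UpdateState, FormulateGoal, FormulateProblem and Search, and returning `None` when the search fails is an added assumption that the pseudocode does not spell out.

```python
class SimpleProblemSolvingAgent:
    """Keeps seq, state, goal and problem between calls (the 'static' variables)."""

    def __init__(self, update_state, formulate_goal, formulate_problem, search):
        self.seq = []
        self.state = None
        self.goal = None
        self.problem = None
        self.update_state = update_state
        self.formulate_goal = formulate_goal
        self.formulate_problem = formulate_problem
        self.search = search

    def __call__(self, percept):
        self.state = self.update_state(self.state, percept)
        if not self.seq:
            # No plan left: formulate a goal and a problem, then search.
            self.goal = self.formulate_goal(self.state)
            self.problem = self.formulate_problem(self.state, self.goal)
            self.seq = list(self.search(self.problem) or [])
        if not self.seq:
            return None  # assumption: no action when the search fails
        action, self.seq = self.seq[0], self.seq[1:]  # First(seq), Rest(seq)
        return action


# Toy usage (hypothetical world): positions on a line, the goal is position 3.
agent = SimpleProblemSolvingAgent(
    update_state=lambda state, percept: percept,
    formulate_goal=lambda state: 3,
    formulate_problem=lambda state, goal: (state, goal),
    search=lambda problem: ["right"] * (problem[1] - problem[0]),
)
actions = [agent(p) for p in (0, 1, 2)]
```

Note that the agent searches only when its current plan is exhausted; in between, it simply replays the stored sequence.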


Structure of Agents

Agent = Architecture + Program

The architecture makes the percepts from the sensors available to

the program, runs the program, and feeds the program’s action

choices to the effectors as they are generated.

We will only deal with agent programs in this course.


Search Problems

The basic elements of a search problem are:

• The initial state.

• The set of available actions. To specify the available actions

we use a successor function Succ which, given a state x,

returns a set of ordered pairs (action, successor state). This

set tells us what actions are possible in x and what states are

reachable from x by executing these actions.

The initial state and the successor function define the state space of a search problem: the set of all states reachable from the initial state by any sequence of actions.

A path in the state space is any sequence of states connected

by a sequence of actions.


Search Problems (cont’d)

• The goal to be achieved. The goal is a set of world states

called the goal states. Goals can be specified implicitly by a

goal test i.e., a test which can be applied to a state to

determine if it is a goal state.

• A path cost function is a function (usually denoted by g)

which assigns a numeric cost to each path. The path cost will

usually be the sum of the costs of the individual actions along

the path.

The step cost of taking action a to go from state x to state y

is denoted by c(x, a, y).

A solution to a search problem is a path from the initial state to a

goal state. A solution is optimal if it has the lowest path cost

among all solutions.


An Example

The route finding problem from Arad to Bucharest can formally be

specified as follows:

• The states specify the city we are in e.g., In(Arad).

• The only available action is GoTo e.g., GoTo(Sibiu).

• For every city the successor function gives us a set of pairs

(GoTo(.), In(.)). For example,

Succ(Arad) = {(GoTo(Sibiu), In(Sibiu)), (GoTo(Timisoara), In(Timisoara)), (GoTo(Zerind), In(Zerind))}.

• The initial state is In(Arad). The goal state is In(Bucharest).

• The path cost can be the road distance in kilometers.
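The formal specification above might be written down in Python as follows. This is only a sketch under stated assumptions: states and actions are encoded as tuples such as `("In", "Arad")` and `("GoTo", "Sibiu")`, only the three roads leaving Arad are included, and the distances are the ones usually given for this map in AIMA.

```python
# Fragment of the road map around Arad (distances in km, assumed from AIMA).
ROADS = {
    ("Arad", "Sibiu"): 140,
    ("Arad", "Timisoara"): 118,
    ("Arad", "Zerind"): 75,
}

INITIAL_STATE = ("In", "Arad")


def succ(state):
    """Return the set of (action, successor state) pairs for a state In(city)."""
    _, city = state
    pairs = set()
    for a, b in ROADS:
        if a == city:
            pairs.add((("GoTo", b), ("In", b)))
        elif b == city:
            pairs.add((("GoTo", a), ("In", a)))
    return pairs


def goal_test(state):
    """The goal is given implicitly by a test on the state."""
    return state == ("In", "Bucharest")


def step_cost(x, action, y):
    """c(x, a, y): the road distance between the two cities."""
    a, b = x[1], y[1]
    return ROADS.get((a, b), ROADS.get((b, a)))
```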


The 8-puzzle

[Figure: a start state and the goal state of the 8-puzzle, each a 3×3 board holding the tiles 1 to 8 and one blank.]


The 8-puzzle (cont’d)

Formal specification:

• States: a state description specifies the location of each tile and

blank

• Actions: blank moves L, R, U, D

• Goal state

• Path cost: length of the path
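As an illustration of this specification, a successor function for the 8-puzzle could look like the sketch below. The encoding is an assumption made for the example: a state is a tuple of nine numbers read row by row, with 0 standing for the blank.

```python
# Index offsets for moving the blank Left, Right, Up, Down on a 3x3 board.
MOVES = {"L": -1, "R": +1, "U": -3, "D": +3}


def successors(state):
    """Return (action, next_state) pairs for the legal moves of the blank."""
    i = state.index(0)
    row, col = divmod(i, 3)
    result = []
    for action, delta in MOVES.items():
        # Skip moves that would push the blank off the board.
        if (action == "L" and col == 0) or (action == "R" and col == 2):
            continue
        if (action == "U" and row == 0) or (action == "D" and row == 2):
            continue
        j = i + delta
        tiles = list(state)
        tiles[i], tiles[j] = tiles[j], tiles[i]  # slide the neighbouring tile
        result.append((action, tuple(tiles)))
    return result
```

With the blank in a corner there are two successors, and with the blank in the centre there are four.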


The 8-queens problem


The 8-queens problem (cont’d)


• States: any arrangement of 0 to 8 queens on board

• Actions: add a queen to any square

• Goal test: 8 queens on board, none attacked

• Path cost: zero

Size of state space: $64^8$


The 8-queens problem (cont’d)

Alternative specification:

• States: arrangements of 0 to 8 queens with none attacked

• Actions: place a queen in the left-most empty column such

that it is not attacked by any other queen.


• Path cost: zero

Size of state space: $8^8$
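The figure on the slide is best read as an upper bound (at most 8 choices in each of 8 columns). The number of states that are actually reachable under the alternative specification can be counted directly; the sketch below does so by enumerating every arrangement of non-attacking queens in the left-most columns. The function name and the encoding (a list holding one row index per filled column) are assumptions made for the example.

```python
def count_states(n=8):
    """Count arrangements of 0..n non-attacking queens in the left-most columns.

    Returns (number of states, number of complete solutions).
    """
    solutions = 0

    def extend(rows):
        nonlocal solutions
        col = len(rows)
        if col == n:
            solutions += 1
            return 1
        total = 1  # the current arrangement is itself a state
        for row in range(n):
            # The new queen must not share a row or a diagonal with earlier ones.
            if all(row != r and abs(row - r) != col - c
                   for c, r in enumerate(rows)):
                total += extend(rows + [row])
        return total

    return extend([]), solutions


states, solutions = count_states(8)
print(states, solutions)  # 2057 92
```

So only 2,057 states are reachable, which is the figure AIMA quotes for this formulation.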


Search Problems in the Real World

• Route finding

• Touring problems e.g., travelling salesman

• Robot navigation

• Automatic assembly sequencing

• Protein design

• Query optimisation problems in DBMS

• Internet searching

• Automatic workflow generation


Computational Complexity Note

Almost all of the problems presented above have NP-hard or

worse computational complexity.

Thus, we should not expect simple algorithms for search problems

to be efficient. This is a big problem for search problems; we will

try to find ways to tackle it!


Searching for Solutions

Example:

[Figure: partial search trees for the route-finding problem. (a) The initial state: Arad. (b) After expanding Arad: children Sibiu, Timisoara and Zerind. (c) After expanding Sibiu: children Arad, Fagaras, Oradea and Rimnicu Vilcea.]


Searching for Solutions (cont’d)

Comments:

• Finding a solution is done by searching through the state space.

The trick is to maintain and extend a set of partial solutions.

• The choice of which state to expand next is determined by the

search strategy.

• The search process is building up a search tree that is

superimposed over the state space.

• It is important to distinguish between the state space and the

search tree.


Searching for Solutions (cont’d)

function TreeSearch(problem, strategy) returns a solution or failure
    initialize the search tree using the initial state of problem
    loop do
        if there are no candidates for expansion then
            return failure
        choose a leaf node for expansion according to strategy
        if the node contains a goal state then
            return the corresponding solution
        else expand the node and add the resulting nodes to the search tree
    end


Search Tree Nodes

Nodes in the search tree can be represented by a data structure with

five components:

• State: the state to which the node corresponds.

• ParentNode: the node in the search tree that generated this

node.

• Action: the action that was applied to generate the node.

• PathCost: the cost of the path from the initial state to the node.

• Depth: the number of steps on the path from the root to this node (the root has depth 0).
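The five components could be collected in a small Python data structure, as in the sketch below. The field names and the helpers `child_node` and `solution` are assumptions made for illustration, and the root is given depth 0.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Node:
    state: Any                        # State
    parent: Optional["Node"] = None   # ParentNode
    action: Any = None                # Action that generated this node
    path_cost: float = 0.0            # PathCost, i.e. g(n)
    depth: int = 0                    # Depth (the root has depth 0)


def child_node(parent, action, state, step_cost):
    """Build the node reached from `parent` by `action`."""
    return Node(state, parent, action,
                parent.path_cost + step_cost, parent.depth + 1)


def solution(node):
    """Recover the action sequence by following parent pointers to the root."""
    actions = []
    while node.parent is not None:
        actions.append(node.action)
        node = node.parent
    return list(reversed(actions))


root = Node("In(Arad)")
child = child_node(root, "GoTo(Sibiu)", "In(Sibiu)", 140)
```

Storing the parent pointer is what lets a goal node be turned back into a solution path.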


The Fringe or Frontier

The set of nodes waiting to be expanded is called the fringe or frontier. It can be implemented as a queue with operations:

• MakeQueue(Elements)

• Empty?(Queue)

• RemoveFront(Queue)

• QueuingFn(Elements,Queue)


The General Tree Search Algorithm

function TreeSearch(problem, QueuingFn) returns a solution, or failure
    fringe ← MakeQueue(MakeNode(InitialState[problem]))
    loop do
        if fringe is empty then return failure
        node ← RemoveFront(fringe)
        if GoalTest[problem] applied to State[node] succeeds then
            return node
        fringe ← QueuingFn(fringe, Expand(node, problem))
    end

The function Expand is responsible for calculating each of the

components of the nodes it generates.
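A compact Python rendering of the general tree search is sketched below. It is a simplification under stated assumptions: a fringe entry is just a pair (state, actions so far) rather than a full node, the problem is passed as three separate arguments, and the two queuing functions are named after the ones used for breadth-first and depth-first search.

```python
from collections import deque


def tree_search(initial, goal_test, successors, enqueue):
    """General tree search; `enqueue` plays the role of QueuingFn."""
    fringe = deque([(initial, [])])       # MakeQueue(MakeNode(InitialState))
    while fringe:
        state, path = fringe.popleft()    # RemoveFront
        if goal_test(state):
            return path
        children = [(s, path + [a]) for a, s in successors(state)]  # Expand
        enqueue(fringe, children)
    return None                           # failure


def enqueue_at_end(fringe, nodes):
    """FIFO behaviour: gives breadth-first search."""
    fringe.extend(nodes)


def enqueue_at_front(fringe, nodes):
    """LIFO behaviour: gives depth-first search."""
    fringe.extendleft(reversed(nodes))
```

The only difference between the two strategies is where new nodes are placed in the queue; everything else in the loop is shared.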


Search Algorithms

We will consider two kinds of search algorithms:

• Uninformed or blind

• Informed or heuristic

Evaluation criteria for a search algorithm:

• Completeness

• Optimality

• Time complexity

• Space complexity


Uninformed Search Methods

• Breadth-first search

• Uniform-cost search

• Depth-first search

• Depth-limited search

• Iterative deepening search

• Bidirectional search


Breadth-first search (BFS)

function BreadthFirstSearch(problem) returns a solution, or failure
    return TreeSearch(problem, EnqueueAtEnd)


Breadth-first search (cont’d)

Evaluation:

• Complete? Yes, if the branching factor b is finite.

• Time: $O(b^{d+1})$ where b is the branching factor and d is the depth of the solution.

• Space: $O(b^{d+1})$. This is the biggest problem with BFS.

• Optimal? Yes, under the assumption that the path cost is a

non-decreasing function of the depth of the node (e.g., when all

actions have identical cost).

Note: BFS finds the shallowest goal state.


Uniform-cost search (UCS)

Modifies BFS by always expanding the lowest cost node on the

fringe (as measured by the path cost).

Example:

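[Figure: (a) a state space with start state S, goal state G and intermediate states A, B and C, with step costs S-A 1, S-B 5, S-C 15, A-G 10, B-G 5 and C-G 5. (b) The progress of the search: the root S with cost 0; then A, B and C with path costs 1, 5 and 15; after expanding A, a goal node G with cost 11; after expanding B, a second goal node G with cost 10, which is the one selected.]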


Uniform-cost search (cont’d)

Evaluation:

• Complete? Yes.

• Time: $O(b^{\lceil C^*/\varepsilon \rceil})$ where b is the branching factor, $C^*$ is the cost of the optimal solution and every action costs at least $\varepsilon > 0$.

• Space: same as time.

• Optimal? Yes.

Completeness and optimality hold under the assumption that the

branching factor is finite and the cost never decreases as we go

along a path i.e., g(Successor(n)) ≥ g(n) for every node n. The

last condition holds e.g., when each action costs at least ε > 0.

Note: BFS is UCS with g(n) = Depth(n).
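Uniform-cost search can be sketched with a priority queue ordered by path cost. The small graph below is an assumption: it is our reading of the example figure (start S, goal G, step costs S-A 1, S-B 5, S-C 15, A-G 10, B-G 5, C-G 5), and the function name is hypothetical.

```python
import heapq

# Directed example graph (step costs as read from the example figure; assumed).
GRAPH = {
    "S": {"A": 1, "B": 5, "C": 15},
    "A": {"G": 10},
    "B": {"G": 5},
    "C": {"G": 5},
    "G": {},
}


def uniform_cost_search(graph, start, goal):
    """Always expand the fringe node with the lowest path cost g(n)."""
    fringe = [(0, [start])]                  # entries: (path cost, path)
    while fringe:
        cost, path = heapq.heappop(fringe)
        state = path[-1]
        if state == goal:                    # goal test on expansion
            return cost, path
        for nxt, step in graph[state].items():
            heapq.heappush(fringe, (cost + step, path + [nxt]))
    return None


print(uniform_cost_search(GRAPH, "S", "G"))  # (10, ['S', 'B', 'G'])
```

The goal node reached through A (cost 11) is generated first, but it is never selected, because the cheaper path through B (cost 10) reaches the front of the queue earlier. Applying the goal test when a node is removed from the fringe, not when it is generated, is what makes this work.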


Depth-first search (DFS)

Depth-first search always expands one of the nodes at the deepest

level of the search tree.


Depth-first search (cont’d)

function DepthFirstSearch(problem) returns a solution, or failure
    return TreeSearch(problem, EnqueueAtFront)

Evaluation:

• Complete? No

• Time: $O(b^m)$ where b is the branching factor and m is the maximum depth of the search tree.

• Space: $O(bm)$, i.e., linear in m.

• Optimal? No


Depth-limited search (DLS)

Like DFS but imposes a depth limit on search. E.g., for the

“driving to Bucharest” example, a good depth-limit is 19 (we have

20 cities).

Evaluation:

• Complete? Yes, iff l ≥ d where l is the depth limit and d the

depth of a solution.

• Time: $O(b^l)$

• Space: $O(bl)$, i.e., linear in l.

• Optimal? No

Question: Can we always find a good depth-limit?


Iterative-deepening search (IDS)

IDS sidesteps the issue of choosing the best depth-limit by trying

all possible ones: 0,1,2 and so on.

function IterativeDeepeningSearch(problem) returns a solution sequence
    for depth ← 0 to ∞ do
        if DepthLimitedSearch(problem, depth) succeeds then
            return its result
    end
    return failure
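A recursive Python sketch of depth-limited search and of iterative deepening built on top of it is given below. The argument names are assumptions, and the `max_depth` cap is an addition that replaces the unbounded loop of the pseudocode so that the function always terminates.

```python
def depth_limited_search(state, goal_test, successors, limit):
    """Return a list of actions reaching a goal within `limit` steps, or None."""
    if goal_test(state):
        return []
    if limit == 0:
        return None          # cut off: depth limit reached
    for action, nxt in successors(state):
        rest = depth_limited_search(nxt, goal_test, successors, limit - 1)
        if rest is not None:
            return [action] + rest
    return None


def iterative_deepening_search(state, goal_test, successors, max_depth=50):
    """Try depth limits 0, 1, 2, ... until a solution is found."""
    for depth in range(max_depth + 1):
        result = depth_limited_search(state, goal_test, successors, depth)
        if result is not None:
            return result
    return None              # failure (within max_depth)


# Toy usage: an infinite binary tree whose states are strings of L/R moves.
def binary_successors(s):
    return [("L", s + "L"), ("R", s + "R")]


plan = iterative_deepening_search("", lambda s: s == "RL", binary_successors)
```

Plain depth-first search would never return on this infinite tree, whereas iterative deepening finds the goal at depth 2.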


Iterative-deepening search (cont’d)

Example:

[Figure: four iterations of iterative-deepening search on a search tree, with depth limits 0, 1, 2 and 3.]



Question: Is IDS wasteful?

Answer: No!

Let us assume that the solution is found when the last node at level d is expanded. Then the number of nodes generated in a BFS to depth d is

$1 + b + b^2 + \cdots + b^d + (b^{d+1} - b)$

The number of nodes generated in an IDS to depth d is

$(d+1) + d\,b + (d-1)\,b^2 + \cdots + 2\,b^{d-1} + 1\,b^d$

Using the above formulas we can see that BFS can actually be a lot more wasteful than IDS (e.g., try b = 10 and d = 5).
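The suggested exercise can be checked with a few lines of Python that evaluate the two formulas directly (the function names are hypothetical):

```python
def bfs_nodes(b, d):
    """1 + b + b^2 + ... + b^d + (b^(d+1) - b)."""
    return sum(b**i for i in range(d + 1)) + (b**(d + 1) - b)


def ids_nodes(b, d):
    """(d+1) + d*b + (d-1)*b^2 + ... + 2*b^(d-1) + 1*b^d."""
    return sum((d + 1 - i) * b**i for i in range(d + 1))


print(bfs_nodes(10, 5))  # 1111101
print(ids_nodes(10, 5))  # 123456
```

For b = 10 and d = 5, BFS generates about nine times as many nodes as IDS, because BFS also generates the children of the nodes at depth d before it stops.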



Evaluation:

• Complete? Yes, under the assumptions for BFS.

• Time: $O(b^d)$

• Space: $O(bd)$

• Optimal? Yes, under the assumptions for BFS.

IDS is the search algorithm of choice when the search space is large

and the depth of the search is not known.


Bidirectional search

Idea: Search both forward from the initial state and backward

from the goal. Stop when the two searches meet.

Motivation: $b^{d/2} + b^{d/2} \ll b^d$

Problems:

• What does it mean to search backwards from the goal?

• What if we have many possible goal states?

• Can we check efficiently that the two searches meet?

• What kind of search do we do in each half?


Bidirectional search (cont’d)

Evaluation:

• Complete? Yes, if the branching factor is finite and both

directions use BFS.

• Time: $O(b^{d/2})$

• Space: $O(b^{d/2})$

• Optimal? Yes, if both directions use BFS and under the

assumptions for BFS.


Avoiding Repeated States

Example:

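[Figure: a state space with states A, B, C and D, and the corresponding search tree, in which repeated states multiply: one node for A, two for B, four for C, and so on.]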


Avoiding Repeated States (cont’d)

• In this case the state space is a graph.

• A solution is to avoid generating any state that was generated

before. This can be enforced by keeping a list of the generated

states called the closed list. In this case the fringe of

unexpanded nodes is called the open list.

The closed list can be implemented by a hash-table for retrieval

in constant time. However, there is no easy way to avoid the

space penalty!


The General Graph Search Algorithm

function GraphSearch(problem, QueuingFn) returns a solution, or failure
    closed ← an empty set
    fringe ← MakeQueue(MakeNode(InitialState[problem]))
    loop do
        if fringe is empty then return failure
        node ← RemoveFront(fringe)
        if GoalTest[problem] applied to State[node] succeeds then
            return node
        if State[node] is not in closed then
            add State[node] to closed
            fringe ← QueuingFn(fringe, Expand(node, problem))
    end
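The graph search algorithm might be sketched in Python as follows. As a simplifying assumption, fringe entries are pairs (state, actions so far) and the closed list is a Python `set`, which gives the constant-time lookup mentioned earlier. The four-state example is hypothetical: a chain A, B, C, D in which every road can be travelled in both directions.

```python
from collections import deque


def graph_search(initial, goal_test, successors, enqueue):
    """Tree search plus a closed set, so that no state is expanded twice."""
    closed = set()
    fringe = deque([(initial, [])])
    while fringe:
        state, path = fringe.popleft()
        if goal_test(state):
            return path
        if state not in closed:
            closed.add(state)
            enqueue(fringe, [(s, path + [a]) for a, s in successors(state)])
    return None


def enqueue_at_front(fringe, nodes):
    """Depth-first queuing function."""
    fringe.extendleft(reversed(nodes))


# Hypothetical state space with cycles: A <-> B <-> C <-> D.
CHAIN = {
    "A": [("toB", "B")],
    "B": [("toA", "A"), ("toC", "C")],
    "C": [("toB", "B"), ("toD", "D")],
    "D": [("toC", "C")],
}

path = graph_search("A", lambda s: s == "D", CHAIN.__getitem__, enqueue_at_front)
```

Depth-first tree search would loop forever between A and B on this state space; with the closed set, the repeated copies of A and B are discarded and the goal is reached.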


Summary

• Agents, Goal-Based Agents, Problem-Solving Agents

• Search Problems

• Blind Search Strategies

Readings: Chapter 3, Sections 3.1-3.5 of AIMA


Informed (or Heuristic) Search Methods

• Heuristics

• Best-first search

• The algorithm A∗

• Properties of heuristic functions

• Branch-and-bound


Heuristics

All blind search algorithms that we discussed have time complexity of order $O(b^d)$ or something similar. This is unacceptable in real problems!

In large search spaces, one can do a lot better by using

domain-specific information to speed-up search.

Heuristics are “rules of thumb” for selecting the next node to be

expanded by a search algorithm.


Best-First Search

A blind search algorithm could be improved if we knew the best (or

“seemingly best”) node to expand.

function BestFirstSearch(problem, EvalFn) returns a solution sequence
    QueuingFn ← a function that orders nodes in ascending order of EvalFn
    return TreeSearch(problem, QueuingFn)

The function EvalFn is called the evaluation function.

Note: GraphSearch can be used instead of TreeSearch.


Evaluation Functions and Heuristic Functions

There is a whole family of best-first search algorithms with

different evaluation functions.

A key component of many of these algorithms is a heuristic

function h such that

h(n) = estimated cost of the cheapest path from the state

at node n to a goal state

h can be any function such that h(n) = 0 if n is a goal node. But in

order to find a good heuristic function, we need domain specific

information.


Greedy Best-First Search

GreedyBestFirstSearch tries to expand the node that is closest to the goal, on the grounds that this is likely to lead to a solution quickly. Thus nodes are evaluated using the heuristic function h alone, i.e., f(n) = h(n).

function GreedyBestFirstSearch(problem) returns a solution or failure
    return BestFirstSearch(problem, h)

The algorithm is greedy because it prefers to take the biggest

possible bite out of the remaining cost to reach the goal.


Example - On the Road to Bucharest

[Figure: road map of Romania, showing the same 20 cities as before, with the road distance in kilometres marked on each road.]


Example (cont’d)

hSLD(n) = straight-line distance between n and the goal location.

Straight-line distances to Bucharest are shown below:

City            hSLD    City            hSLD
Arad            366     Mehadia         241
Bucharest       0       Neamt           234
Craiova         160     Oradea          380
Dobreta         242     Pitesti         100
Eforie          161     Rimnicu Vilcea  193
Fagaras         176     Sibiu           253
Giurgiu         77      Timisoara       329
Hirsova         151     Urziceni        80
Iasi            226     Vaslui          199
Lugoj           244     Zerind          374


Example (cont’d)

[Figure: stages of a greedy best-first search for Bucharest, with the value of hSLD shown for each node.

(a) The initial state: Arad (366).

(b) After expanding Arad: Sibiu (253), Timisoara (329), Zerind (374).

(c) After expanding Sibiu: Arad (366), Fagaras (176), Oradea (380), Rimnicu Vilcea (193).

(d) After expanding Fagaras: Sibiu (253), Bucharest (0).]


Greedy Best-First Search (cont’d)

Evaluation:

• Complete? No (consider the problem of getting from Iasi to

Fagaras; search can oscillate between Iasi and Neamt).

• Time: $O(b^m)$ where m is the maximum depth of the search space.

• Space: $O(b^m)$

• Optimal? No (greedy search finds the path Arad-Sibiu-Fagaras-Bucharest, but the path Arad-Sibiu-Rimnicu Vilcea-Pitesti-Bucharest is shorter and is the optimal one).

A good choice of h can reduce space and time substantially.


The A∗ Search Algorithm

Greedy Best-First Search:

• Searches by minimizing the estimated cost h(n) to the goal

• Neither optimal nor complete

Uniform Cost Search:

• Searches by minimizing the cost g(n) of the path so far

• Optimal, complete

Can we combine the two algorithms?


The A∗ Algorithm (cont’d)

A∗ is a best-first search algorithm with evaluation function

f(n) = g(n) + h(n).

In this case f(n) is the estimated cost of the cheapest solution

through n.

function A∗Search(problem) returns a solution or failure
    return BestFirstSearch(problem, g + h)
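To make the definition concrete, here is a sketch of best-first tree search on the Romania map, with A∗ and greedy search obtained by plugging in different evaluation functions. The hSLD values are the straight-line distances to Bucharest listed earlier; the assignment of road distances to pairs of cities is an assumption taken from the standard AIMA map, and all function names are hypothetical.

```python
import heapq

# Road distances in km (pairing of distances and roads assumed from AIMA).
ROADS = [
    ("Arad", "Zerind", 75), ("Arad", "Sibiu", 140), ("Arad", "Timisoara", 118),
    ("Zerind", "Oradea", 71), ("Oradea", "Sibiu", 151),
    ("Timisoara", "Lugoj", 111), ("Lugoj", "Mehadia", 70),
    ("Mehadia", "Dobreta", 75), ("Dobreta", "Craiova", 120),
    ("Craiova", "Rimnicu Vilcea", 146), ("Craiova", "Pitesti", 138),
    ("Rimnicu Vilcea", "Sibiu", 80), ("Rimnicu Vilcea", "Pitesti", 97),
    ("Sibiu", "Fagaras", 99), ("Fagaras", "Bucharest", 211),
    ("Pitesti", "Bucharest", 101), ("Bucharest", "Giurgiu", 90),
    ("Bucharest", "Urziceni", 85), ("Urziceni", "Hirsova", 98),
    ("Hirsova", "Eforie", 86), ("Urziceni", "Vaslui", 142),
    ("Vaslui", "Iasi", 92), ("Iasi", "Neamt", 87),
]

# Straight-line distances to Bucharest.
H_SLD = {
    "Arad": 366, "Bucharest": 0, "Craiova": 160, "Dobreta": 242,
    "Eforie": 161, "Fagaras": 176, "Giurgiu": 77, "Hirsova": 151,
    "Iasi": 226, "Lugoj": 244, "Mehadia": 241, "Neamt": 234,
    "Oradea": 380, "Pitesti": 100, "Rimnicu Vilcea": 193, "Sibiu": 253,
    "Timisoara": 329, "Urziceni": 80, "Vaslui": 199, "Zerind": 374,
}

GRAPH = {}
for a, b, km in ROADS:  # roads can be driven in both directions
    GRAPH.setdefault(a, {})[b] = km
    GRAPH.setdefault(b, {})[a] = km


def best_first_search(start, goal, f):
    """Tree search that always expands the fringe node with the lowest f."""
    fringe = [(f(0, start), 0, [start])]      # entries: (f, g, path)
    while fringe:
        _, g, path = heapq.heappop(fringe)
        state = path[-1]
        if state == goal:
            return g, path
        for nxt, step in GRAPH[state].items():
            g2 = g + step
            heapq.heappush(fringe, (f(g2, nxt), g2, path + [nxt]))
    return None


def a_star(start, goal="Bucharest"):
    return best_first_search(start, goal, lambda g, s: g + H_SLD[s])  # f = g + h


def greedy(start, goal="Bucharest"):
    return best_first_search(start, goal, lambda g, s: H_SLD[s])      # f = h


print(a_star("Arad"))  # (418, ['Arad', 'Sibiu', 'Rimnicu Vilcea', 'Pitesti', 'Bucharest'])
print(greedy("Arad"))  # (450, ['Arad', 'Sibiu', 'Fagaras', 'Bucharest'])
```

Greedy search is lured towards Fagaras because it looks closest to Bucharest, and ends up with a 450 km route. A∗ also takes the 239 km already driven to Fagaras into account and finds the 418 km route through Rimnicu Vilcea and Pitesti.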


A∗ Goes to Bucharest

See illustration in accompanying file astar-progress.ps or in

AIMA.



Let us assume that A∗ uses TreeSearch as its main subroutine

and also:

• The function h is chosen such that it never overestimates the

cost to reach a goal. Such an h is called an admissible

heuristic.

If h is admissible then f(n) never overestimates the actual cost

of the best solution through n.

• The branching factor b is finite.

• Every action costs at least δ > 0.



Evaluation (under the previous assumptions):

• Complete? Yes.

• Time: exponential, unless the error in the heuristic function h

grows no faster than the logarithm of the actual path cost.

For most heuristics used in practice, the error is at least

proportional to the path cost.

But even when A∗ takes exponential time, it offers a huge

improvement compared to blind search.



Evaluation (cont’d):

• Space: $O(b^d)$.

This is the main drawback of A∗. The algorithm IDA∗

addresses the large space requirements of A∗.

• Optimal? Yes.


Optimality and Completeness of A∗

Proposition. A∗ is optimal.

Proof: Let us assume that the cost of the optimal solution is C∗

and a suboptimal goal node G2 appears on the fringe. Then

because G2 is suboptimal and h(G2) = 0, we have:

f(G2) = g(G2) + h(G2) = g(G2) > C∗

Now consider a fringe node n which is on the optimal path.

Because h does not overestimate the cost to the goal, we have:

f(n) = g(n) + h(n) ≤ C∗

So G2 will not be chosen for expansion!


Optimality and Completeness of A∗ (cont’d)

The proof of optimality breaks down when A∗ uses GraphSearch

as its main subroutine because GraphSearch can discard the

optimal path to a repeated state if it is not the first one to be

generated.

To guarantee optimality in this case, we have two options:

• Modify GraphSearch so that it discards the more expensive of any two paths found to the same node.

• Impose an extra requirement of consistency or monotonicity

on h.


Consistent Heuristics

Definition. A heuristic h is called consistent if for all nodes n, n′

such that n′ is a successor of n generated by any action a,

h(n) ≤ c(n, a, n′) + h(n′).

This is a form of the general triangle inequality: the sum of the

lengths of any two sides of a triangle is greater than the length of

the remaining side.

Proposition. Every consistent heuristic is also admissible.

Most admissible heuristics that one can think of are also consistent

(e.g., hSLD)!
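The proposition that consistency implies admissibility can be justified with a short chain of inequalities. The sketch below assumes, as stated earlier, that h is 0 at goal nodes; the symbol $h^*(n)$ for the true cost of a cheapest path from n to a goal is notation introduced here, not in the slides.

```latex
% Let n_0, n_1, ..., n_k be a cheapest path from n = n_0 to a goal n_k,
% generated by actions a_1, ..., a_k, and let h^*(n) denote its cost.
% Applying the consistency condition once per step gives:
\begin{align*}
h(n_0) &\le c(n_0, a_1, n_1) + h(n_1) \\
       &\le c(n_0, a_1, n_1) + c(n_1, a_2, n_2) + h(n_2) \\
       &\;\;\vdots \\
       &\le \sum_{i=1}^{k} c(n_{i-1}, a_i, n_i) + h(n_k)
        \;=\; h^*(n) + 0 \;=\; h^*(n).
\end{align*}
% Hence h never overestimates the cost to reach a goal, i.e. h is admissible.
```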



Proposition. If h is consistent then the values of f for nodes

expanded by A∗ along any path are non-decreasing.

Proof: Let n be a node and n′ its successor. Then

g(n′) = g(n) + c(n, a, n′)

for some action a, and we have

f(n′) = g(n′)+h(n′) = g(n)+c(n, a, n′)+h(n′) ≥ g(n)+h(n) = f(n).

Thus we can conceptually draw contours in the state space like

contours in a topographic map.


The behaviour of A∗

[Figure: map of Romania, with cities labelled by their initials, showing the contours f = 380, f = 400 and f = 420 around the start state Arad.]



A∗ search is complete: as we add contours of increasing f , we

must eventually reach a contour where f is equal to the cost of the

path to a goal state.

In fact, A∗ works as follows:

• It expands all nodes with f(n) < C∗

• It may then expand some of the nodes right on the “goal

contour”, for which f(n) = C∗, before selecting a goal node.



A∗ expands no nodes with cost f(n) > C∗ where C∗ is the cost of

the optimal solution.

There is no other optimal algorithm that is guaranteed to expand

fewer nodes than A∗.


Heuristic Functions

What is a good heuristic for the 8-puzzle problem?

[Figure: a start state and the goal state of the 8-puzzle, each a 3×3 board holding the tiles 1 to 8 and one blank.]


The 8-puzzle Problem


• States: a state description specifies the location of each tile and

blank

• Actions: blank moves L, R, U, D

• Goal state

• Path cost: length of the path


Heuristic Functions (cont’d)

Heuristics for the 8-puzzle:

• h1 = the number of tiles in the wrong position

• h2 = the sum of the horizontal and vertical distances of all tiles

from their goal positions (Manhattan distance).

Both heuristics are admissible. Which one is better?
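Both heuristics are easy to compute. The sketch below assumes that a state is a tuple of nine numbers read row by row with 0 for the blank, and that the goal has the blank in the top-left corner followed by tiles 1 to 8. The example start state is the one used in AIMA and is only an illustration.

```python
GOAL = (0, 1, 2, 3, 4, 5, 6, 7, 8)   # assumed goal layout


def h1(state, goal=GOAL):
    """Number of tiles (not counting the blank) in the wrong position."""
    return sum(1 for tile, target in zip(state, goal)
               if tile != 0 and tile != target)


def h2(state, goal=GOAL):
    """Sum of Manhattan distances of the tiles from their goal positions."""
    total = 0
    for idx, tile in enumerate(state):
        if tile == 0:
            continue                  # the blank is not a tile
        gidx = goal.index(tile)
        total += abs(idx // 3 - gidx // 3) + abs(idx % 3 - gidx % 3)
    return total


start = (7, 2, 4, 5, 0, 6, 8, 3, 1)
print(h1(start), h2(start))  # 8 18
```

Every misplaced tile contributes exactly 1 to h1 and at least 1 to h2, so h2(n) is at least h1(n) for every state n.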



A way of characterizing the quality of a heuristic is to find its

effective branching factor b∗.

If the total number of nodes expanded by A∗ for a particular problem is N, and the solution depth is d, then b∗ is the branching factor that a uniform tree of depth d would need in order to contain N nodes:

$N = 1 + b^* + (b^*)^2 + \cdots + (b^*)^d$

Usually, b∗ is fairly constant over a large number of instances. A well-designed heuristic would have a value of b∗ close to 1.
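The equation for b∗ has no closed-form solution in general, but it can be solved numerically. The following is a minimal sketch using bisection; the function name and the tolerance are assumptions.

```python
def effective_branching_factor(n_nodes, depth, tol=1e-9):
    """Solve N = 1 + b + b^2 + ... + b^d for b by bisection."""
    def total(b):
        return sum(b**i for i in range(depth + 1))

    lo, hi = 0.0, float(n_nodes)   # total(0) = 1 <= N <= total(N) for depth >= 1
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if total(mid) < n_nodes:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# A uniform binary tree of depth 3 has 1 + 2 + 4 + 8 = 15 nodes.
b_star = effective_branching_factor(15, 3)
```

Bisection works here because the right-hand side of the equation increases monotonically with b∗.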


Comparing A∗ and IDS

        Search Cost                   Effective Branching Factor
 d      IDS       A*(h1)   A*(h2)     IDS     A*(h1)   A*(h2)
 2      10        6        6          2.45    1.79     1.79
 4      112       13       12         2.87    1.48     1.45
 6      680       20       18         2.73    1.34     1.30
 8      6384      39       25         2.80    1.33     1.24
10      47127     93       39         2.79    1.38     1.22
12      364404    227      73         2.78    1.42     1.24
14      3473941   539      113        2.83    1.44     1.23
16      –         1301     211        –       1.45     1.25
18      –         3056     363        –       1.46     1.26
20      –         7276     676        –       1.47     1.27
22      –         18094    1219       –       1.48     1.28
24      –         39135    1641       –       1.48     1.26



If h2(n) ≥ h1(n) for all nodes n then h2 dominates h1 (or h2 is more informed than h1).

Example: In the 8-puzzle h2 dominates h1.

Theorem: If h2 dominates h1 then A∗ using h2 will expand fewer

nodes, on average, than A∗ using h1.

Lesson: It is always better to use an admissible heuristic

function with higher values.


Heuristic Functions: How do we find them?

It is possible to find heuristic functions by considering relaxed

versions of the given problem.

The cost of an optimal solution to a relaxed problem is an

admissible heuristic for the original problem.

Relaxed problems can sometimes be generated automatically and

then heuristics can be discovered automatically! Otherwise, we

have to consider the problem at hand carefully and use our brain!


Heuristic Functions: How do we find them? (cont’d)

If we have admissible heuristics h1, . . . , hn such that none of them dominates the others, then we can choose

h = max(h1, . . . , hn),

which is still admissible and dominates each of them.

Final note: The cost of computing the heuristic function for each

node must be taken into account.


Extensions of A∗

The main problem with A∗ is its excessive use of memory for large

problems. Several algorithms have been invented to tackle this

problem: IDA∗, RBFS, MA∗, SMA∗. See AIMA for more details.


Branch-and-Bound

Another class of traditional intelligent search algorithms pioneered

originally in the Operations Research community is

branch-and-bound.

Branch-and-bound has been designed for optimization

problems. The main idea is to eliminate parts of the search

space where we know that a solution cannot be found.

In Operations Research courses branch-and-bound is usually presented in the context of solving integer linear programming problems. We will present a general formulation of branch-and-bound and examples from the following book:

Christos H. Papadimitriou and Kenneth Steiglitz. Combinatorial Optimization: Algorithms and Complexity. Prentice-Hall, 1982.


Branch-and-Bound (cont’d)

In branch-and-bound the search space is organized as a tree with

the following two features:

• Branching or partitioning. Each node represents a set of

solutions which can be partitioned into mutually exclusive sets.

Each subset in the partition is represented by a child of the

node.

• Lower bounding. There is an algorithm for computing a

lower-bound on the cost of each solution in a given subset (i.e.,

obtained as a child of a node). Actually, we need a

lower-bound if we are minimizing but an upper bound if

we are maximizing.



The tree can be searched in any way we choose (e.g., DFS or BFS).

However, if we have already found a solution with cost c by

traversing the tree, and we are at a node with lower bound ≥ c,

then we can safely ignore (prune) this branch of the tree and

carry on our search with another branch.

This is the step in branch-and-bound where heuristic knowledge

about the problem domain is used.

Notice the differences with the Artificial Intelligence terminology

we have used so far:

• Partition the current solution set – refine – branch (OR).

• Extend a partial solution – create – generate-and-test (AI).



algorithm BranchAndBound(problem)
  activeset := {problem}
  U := ∞; currentbest := anything
  while activeset is not empty do
    choose a branching node k ∈ activeset
    remove node k from activeset
    generate the children of node k: child i, i = 1, . . . , n_k,
      and the corresponding lower bounds z_i
    for i = 1, . . . , n_k do
      if z_i ≥ U then kill child i
      else if child i is a complete solution and z_i < U then
        U := z_i; currentbest := i
      else add child i to activeset
    end
  end
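The pseudocode above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function and parameter names (`branch_and_bound`, `children`, `lower_bound`, `is_complete`) are hypothetical, the active set is handled as a stack (so the tree is searched depth-first), and the small graph is a made-up toy instance, not the example graph of these slides.

```python
import math

def branch_and_bound(root, children, lower_bound, is_complete):
    """Minimise cost over the complete solutions below root.

    children(node)    -> list of child nodes (the branching step)
    lower_bound(node) -> lower bound on the cost of every solution below node;
                         for a complete solution it must equal its cost
    is_complete(node) -> True if node is a complete solution
    """
    active = [root]                      # the active set, used as a stack (DFS)
    best_cost, best = math.inf, None     # U and currentbest in the pseudocode
    while active:
        node = active.pop()
        for child in children(node):
            z = lower_bound(child)
            if z >= best_cost:
                continue                 # kill the child: it cannot beat U
            if is_complete(child):
                best_cost, best = z, child
            else:
                active.append(child)
    return best_cost, best

# Hypothetical toy instance: shortest path from "s" to "t".
GRAPH = {
    "s": {"a": 2, "b": 5},
    "a": {"b": 1, "t": 7},
    "b": {"t": 2},
    "t": {},
}

def path_children(path):
    # Extend the partial path by one arc, never revisiting a node.
    return [path + (n,) for n in GRAPH[path[-1]] if n not in path]

def path_length(path):
    # Cumulative length of the partial path, used as the lower bound.
    return sum(GRAPH[u][v] for u, v in zip(path, path[1:]))

cost, path = branch_and_bound(("s",), path_children, path_length,
                              lambda p: p[-1] == "t")
print(cost, path)
```

Choosing a different rule for picking the branching node (for example, the node with the smallest lower bound) changes the search order but not the pruning test.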


Example

The shortest-path problem for directed weighted graphs.

Definition. Let G = (V,E) be a directed graph with non-negative

weight cj ≥ 0 associated with each arc ej ∈ E. The shortest-path

problem is to find a directed path from a distinguished source

node s to distinguished terminal node t, with the minimum total

weight.

Note that this is just an example. Dijkstra’s algorithm for the

shortest-path problem is more efficient than using branch and

bound and runs in time $O(n^2)$, where n is the number of nodes in

the graph.


Example Graph

[Figure: example weighted directed graph with source s and terminal t.]


Applying Branch-and-Bound

[Figure: the branch-and-bound search tree rooted at s; nodes are labelled with graph nodes and lower bounds.]


The Search Tree with Pruned Branches Shown

[Figure: the same search tree rooted at s, with the pruned branches shown.]


Applying Branch-and-Bound (cont’d)

At each node in the search tree of this example the following is true:

• Branching is determined by considering which arc to choose

to continue the path. I.e., a subset of the feasible solutions

corresponds to all paths from s to t that start by the choices

already made.

• The lower bound used is the cumulative length of the partial

path up to the current node.

Note: if we always choose as branching node the one with the

shortest partial path, we have an algorithm similar to UCS (or A∗

with h = 0).


Example: k-way Number Partitioning (kNUMP)

Definition. Let S be a finite bag (multi-set) of positive integers.

Partition S into k bags A1, . . . , Ak ⊆ S so as to minimize the

following difference:

$$\Delta(A_1, \ldots, A_k) = \max_i \sum_{x \in A_i} x \; - \; \min_i \sum_{x \in A_i} x$$

Let us deal with 2NUMP for simplicity.

Example: How do we partition the bag of numbers {8, 7, 6, 5, 4}

into two bags so that the difference between the sums of the numbers

in these two bags is minimized?


A Greedy Algorithm

Order the given numbers in descending order. Initially, the two

subsets are empty. Then, repeatedly take the next input number

and assign it to the subset with the smallest sum so far.

For the input set {8, 7, 6, 5, 4}, this greedy algorithm will return the

subsets

{8, 5} and {7, 6, 4}

with difference 4 which is not optimal. The optimal difference is 0.
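The greedy rule described above can be sketched as follows. The function name `greedy_partition` is hypothetical, and ties between equal sums are broken in favour of the first bag, which is an assumption: with a different tie-break the last number lands in the other bag (as in the subsets quoted above), but the difference is the same.

```python
def greedy_partition(numbers):
    """Largest-first greedy heuristic for 2-way number partitioning."""
    bags, sums = ([], []), [0, 0]
    for x in sorted(numbers, reverse=True):
        # Pick the bag with the smaller sum so far (ties go to the first bag).
        i = 0 if sums[0] <= sums[1] else 1
        bags[i].append(x)
        sums[i] += x
    return bags, abs(sums[0] - sums[1])

bags, diff = greedy_partition([8, 7, 6, 5, 4])
print(bags, diff)
```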


The Search Tree

{} {}
  {8} {}
    {8} {7}
      {8} {7,6}
        {8,5} {7,6}
          {8,5,4} {7,6}
      {8,6} {7}
        {8,6} {7,5}
          {8,6} {7,5,4}
    {8,7} {}
      {8,7} {6}
        {8,7} {6,5}
          {8,7} {6,5,4}


Executing the Search Algorithm

• We can search the tree in a DFS fashion.

• We can order the search by always trying first the branch

where the next number is put in the smallest subset.

• This search strategy will actually return the greedy solution

first. We can call this algorithm anytime: it starts with a

greedy solution, and given more time, it improves it until it

finds and proves the optimal solution.


Pruning the Search Tree by Branch and Bound

• Let us assume that we have already found a solution with

difference d and we are at a search tree node n with current

subset sum difference d′, sum of remaining numbers s and

d′ > s.

If d′ − s ≥ d then there is no need to explore the tree below n

because the best difference reachable below n, namely d′ − s, is not better than d.

If d′ − s < d, we can add the remaining numbers to the subset with the smaller

sum, and this is actually a better solution than the current one.


Pruning (cont’d)

• If at any point in the search we find a perfect partition then we

terminate the search.

A partition will be called perfect if it gives difference 0 when

the sum of the given numbers is even and 1 if the sum of the

given numbers is odd.

The difference corresponding to a perfect partition is a

lower-bound on any other possible difference.


Pruning (cont’d)

• The first number should be assigned only to one subset.

• The last number should only be assigned to the smallest subset.

• When the two current subsets have equal sums, the next

number should only be assigned to one subset.

Note: The above algorithm does not correspond exactly to the

algorithm BranchAndBound as given earlier. Thus

branch-and-bound should be understood to be a family of

algorithms with the features we presented (as opposed to a single

fixed algorithm).
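The pruning rules of the last few slides can be combined into one depth-first search. The sketch below is one possible reading, not a definitive implementation: the name `partition_bnb` and its internal structure are assumptions, and the bound test is applied whenever the current difference is at least the sum of the remaining numbers (this also covers the case where they are equal).

```python
def partition_bnb(numbers):
    """Depth-first search for 2-way number partitioning with pruning."""
    nums = sorted(numbers, reverse=True)
    total = sum(nums)
    perfect = total % 2                 # 0 if the total is even, 1 if it is odd
    best = {"diff": float("inf"), "bags": None}

    def search(i, a, b, sum_a, sum_b, remaining):
        if best["diff"] == perfect:
            return                      # a perfect partition was already found
        d_now = abs(sum_a - sum_b)
        if i == len(nums):              # complete partition
            if d_now < best["diff"]:
                best["diff"], best["bags"] = d_now, (list(a), list(b))
            return
        if d_now >= remaining:
            # Best completion: put everything that is left in the smaller bag.
            if d_now - remaining < best["diff"]:
                rest = nums[i:]
                if sum_a <= sum_b:
                    bags = (a + rest, list(b))
                else:
                    bags = (list(a), b + rest)
                best["diff"], best["bags"] = d_now - remaining, bags
            return
        x = nums[i]
        # Greedy ordering: try the bag with the smaller sum first.
        order = [0, 1] if sum_a <= sum_b else [1, 0]
        if sum_a == sum_b or i == len(nums) - 1:
            order = order[:1]           # symmetric branches, or the last number
        for side in order:
            if side == 0:
                search(i + 1, a + [x], b, sum_a + x, sum_b, remaining - x)
            else:
                search(i + 1, a, b + [x], sum_a, sum_b + x, remaining - x)

    search(0, [], [], 0, 0, total)
    return best["diff"], best["bags"]

print(partition_bnb([8, 7, 6, 5, 4]))
```

On the example bag the search first reaches the greedy solution with difference 4, then improves it to 2, and finally finds the perfect partition.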


Readings

• Chapter 4 of AIMA (Sections 4.1 and 4.2).

• Section 18.2 from

Christos Papadimitriou and Kenneth H. Steiglitz.

Combinatorial Optimization - Algorithms and

Complexity. Prentice-Hall, 1982.


Local Search and Optimization Problems

• Hill-climbing

• Simulated annealing

• Local beam search

• Genetic Algorithms


Local Search Algorithms

In many optimization problems the path to a goal state is

irrelevant. The goal state itself is the solution.

Example:

• Finding a configuration satisfying certain constraints, e.g., the

8-queens problem or a job-shop scheduling problem.

In such cases, we can use iterative improvement: start with a

single current state, and try to improve it!

The same framework is applicable to problems where the path

appears to be of interest (e.g., TSP) if these problems can be

cast in a more appropriate (but equivalent) way.


Local Search Algorithms (cont’d)

Local search algorithms work as follows:

• Pick a “solution” from the search space and evaluate it. Define

this as the current solution.

• Apply a transformation to the current solution to generate and

evaluate a new solution.

• If the new solution is better than the current solution then

exchange it with the current solution; otherwise discard the

new solution.

• Repeat the previous two steps until no transformation in the given set

improves the current solution.



Thus local search algorithms operate using a single current state

(rather than multiple paths as e.g., A∗) and generally move only to

neighbours of that state.

At each step of a local search algorithm we have a complete but

imperfect solution to a search problem. Other algorithms we saw

previously (e.g., A∗) work with partial solutions and extend

them to complete ones.

Good properties of local search algorithms:

• Constant space

• Suitable for on-line as well as off-line problems.

• Can find reasonable solutions in large solution spaces

where exhaustive search would fail miserably.


Iterative Improvement Algorithms

Idea: Start with a “solution” and make modifications until you

reach a solution. Graphically:

[Figure: evaluation plotted against the current state.]


Example: TSP

TSP: Let G be a (directed or undirected) graph of n nodes with

each edge assigned a non-negative cost. Find the lowest-cost tour

of G, i.e., a cycle that visits each node exactly once and returns to a given initial

node.


Algorithm 2-Opt

Let us view the solution set for TSP as the set of permutations of

the n cities.

Algorithm:

Start with an arbitrary complete tour T (i.e., a random

permutation).

The neighbourhood of T is defined as the set of all tours that can

be reached by exchanging two non-adjacent edges (this move is

called a 2-interchange).

Search in the neighbourhood of T for a new tour T ′. If this tour is

better than T (i.e., it has lower cost), then replace T with T ′.

If you cannot find a better tour, terminate. The resulting

permutation is called 2-optimal.
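A minimal sketch of 2-Opt follows, assuming symmetric costs so that a 2-interchange can be carried out by reversing a segment of the tour. The names `tour_cost` and `two_opt` and the four-city instance are hypothetical.

```python
import math

def tour_cost(tour, dist):
    # Cost of the closed tour, including the edge back to the start.
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]]
               for i in range(len(tour)))

def two_opt(tour, dist):
    """Apply improving 2-interchanges until the tour is 2-optimal."""
    tour = list(tour)
    improved = True
    while improved:
        improved = False
        n = len(tour)
        for i in range(n - 1):
            for j in range(i + 2, n):
                if i == 0 and j == n - 1:
                    continue        # these two edges share a node
                # Reversing tour[i+1..j] replaces the edges leaving
                # positions i and j by two new edges.
                candidate = tour[:i + 1] + tour[i + 1:j + 1][::-1] + tour[j + 1:]
                if tour_cost(candidate, dist) < tour_cost(tour, dist):
                    tour, improved = candidate, True
    return tour

# Hypothetical instance: the corners of a unit square.
points = [(0, 0), (1, 0), (1, 1), (0, 1)]
dist = [[math.dist(p, q) for q in points] for p in points]
start = [0, 2, 1, 3]                # a tour whose two diagonals cross
result = two_opt(start, dist)
print(result, tour_cost(result, dist))
```

Recomputing the whole tour cost for every candidate keeps the sketch short; a practical version would compare only the four edges involved.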


Example: 4-Queens

Idea: Start with 4 queens placed arbitrarily on the board. Then repeatedly

move a single queen to another square within its column.


The State Space Landscape

[Figure: a state space landscape plotting the objective function against the state space; it marks the current state, the global maximum, a local maximum, a "flat" local maximum and a shoulder.]


Hill-Climbing Search (Gradient Steepest Ascent)

function Hill-Climbing(problem)
    returns a state that is a local maximum
  inputs: problem, a problem
  local variables: current, a node
                   neighbour, a node

  current ← MakeNode(RandomState[problem])
  loop do
    neighbour ← a highest-valued successor of current
    if Value[neighbour] ≤ Value[current] then return current
    current ← neighbour
  end


Hill-Climbing Search (cont’d)

• Successors are searched in a systematic way.

• When choosing a highest-valued successor, break ties randomly.

• Change “highest-valued” to “lowest-valued” and ≤ to ≥ to get

“gradient steepest descent” (applicable to minimization

problems).


Example: 8-queens


• States: any arrangement of 8 queens on board

• Actions: Move a queen within its column


• Evaluation function (cost): Number of pairs of queens that

attack each other.

Thus we have a minimization problem: find a state that

minimizes the evaluation function.
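The formulation above can be turned into a small steepest-descent sketch. It assumes one queen per column, with `state[c]` holding the row of the queen in column c; the names `attacking_pairs` and `steepest_descent` are hypothetical.

```python
import random

def attacking_pairs(state):
    """Cost: number of pairs of queens that attack each other."""
    n = len(state)
    return sum(
        1
        for c1 in range(n)
        for c2 in range(c1 + 1, n)
        if state[c1] == state[c2] or abs(state[c1] - state[c2]) == c2 - c1
    )

def steepest_descent(state, rng):
    """Move one queen within its column to a best neighbour until stuck."""
    state = list(state)
    while True:
        current = attacking_pairs(state)
        best_cost, best_moves = current, []
        for col in range(len(state)):
            for row in range(len(state)):
                if row == state[col]:
                    continue
                neighbour = state[:col] + [row] + state[col + 1:]
                cost = attacking_pairs(neighbour)
                if cost < best_cost:
                    best_cost, best_moves = cost, [neighbour]
                elif cost == best_cost and best_moves:
                    best_moves.append(neighbour)
        if not best_moves:
            return state, current       # no neighbour is strictly better
        state = rng.choice(best_moves)  # break ties randomly

start = [0] * 8                         # all queens in the same row
final, final_cost = steepest_descent(start, random.Random(0))
print(final, final_cost)
```

The search stops at the first state with no strictly better neighbour, which may be a local minimum rather than a solution.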


Example: 8-queens (cont’d)

[Figure: an 8-queens board; each square is labelled with the cost obtained by moving the queen of that column to that square.]

The value of cost for the above state is 17. The numbers in the squares

show the new costs if a queen is moved within its column.


Hill-Climbing (cont’d)

Problems:

• Local optima

• Plateaux (flat local optima or shoulders)

• Ridges

How can we cope with these problems? The proper choice might be

problem dependent.


Example: 8-queens (cont’d)

The value of cost for the above state is 1. All the neighbours of this

state have cost > 1 thus we have a local minimum.


Hill-Climbing (cont’d)

[Figure: a state space landscape plotting the objective function against the state space; it marks the current state, the global maximum, a local maximum, a "flat" local maximum and a shoulder.]


Example: Ridges


Hill-Climbing for 8-queens

Let us start with a randomly generated 8-queens state. Then,

steepest ascent hill-climbing performs as follows:

• It solves 14% of the problems within 4 steps on average.

• It gets stuck in local optima 86% of the time within 3 steps on

average.

Reminder: Total state space $8^8$ ≈ 17 million states.


Sideways Moves

When hill-climbing reaches a plateau and there are no uphill moves

then it stops. Alternatively, we could resort to a sideways move:

a move to a state which has the same value as the current one.

However, we have to be careful so that we do not go into an infinite

loop (i.e., when we are on a plateau that is not a shoulder). An

idea that works in some cases is to limit the number of consecutive

sideways moves.

Example: If we limit the number of consecutive sideways moves to

100 in the 8-queens problem, this raises the percentage of solved

problems to 94%.


Variations of Hill-Climbing

• First-choice hill climbing: Generates successors randomly

until one is generated that is better than the current state.

This is a good strategy for states with many (e.g., thousands)

of successors.

• Stochastic hill climbing: Chooses randomly among the

uphill moves. The probability of selection can vary with

steepness.

This variation converges more slowly than steepest ascent, but

in some state landscapes it finds better solutions.


How do we Avoid Local Optima?

We will present two algorithms that avoid local optima:

• Random-Restart Hill-Climbing

• Simulated Annealing


Random-Restart Hill-Climbing

Advice: If at first you don’t succeed, try, try again!

Random-restart hill-climbing conducts a series of hill-climbing

searches from randomly generated initial states, stopping when a

goal is found.

With probability approaching 1, we will eventually generate a goal

state as the initial state.

If each hill-climbing search has a probability p of success, then the

expected number of restarts required to reach a solution is 1/p.


Random-Restart Hill-Climbing (cont’d)

Example: 8-queens

p ≈ 0.14

In this case we need roughly 7 iterations (6 failures and 1 success).

Expected number of steps: the number of steps of one

successful iteration plus (1 − p)/p times the number of steps of a

failed iteration. This is roughly 4 + (0.86/0.14) · 3 ≈ 22 steps in our case.
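As a quick check of this arithmetic, the sketch below counts one successful run plus the expected number of failed runs, using the 8-queens figures quoted earlier (about 4 steps for a successful run and about 3 for a failed one). The variable names are illustrative.

```python
p = 0.14                 # probability that a single hill-climbing run succeeds
steps_success = 4        # average number of steps of a successful run
steps_failure = 3        # average number of steps of a failed run

expected_runs = 1 / p                    # about 7 runs in total
expected_failures = (1 - p) / p          # about 6 of them fail
expected_steps = steps_success + expected_failures * steps_failure
print(round(expected_runs, 1), round(expected_steps, 1))
```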

The success of random-restart hill-climbing depends very much on

the shape of the state space (there are practical problems with

state spaces with very bad shapes).


Simulated Annealing

A hill-climbing algorithm that never makes “downhill” moves

towards states with lower value can be incomplete.

A random walk, i.e., moving to a successor chosen uniformly at

random from the set of successors, is complete (proof?) but

extremely inefficient.

How can we combine both?

This is a classical tradeoff between exploration of the search space

and exploitation of the imperfect solution at hand. How do we

resolve this tradeoff?


Simulated Annealing (cont’d)

Physical analogue: Annealing of metals is the process used to

temper or harden metals by heating them to a high temperature and

then gradually cooling them, thus allowing the material to coalesce into

a low-energy crystalline state.

The discovery of the simulated annealing algorithm is an instance of the

use of ideas from statistical mechanics (an area of condensed matter

physics) to solving large and complex optimization problems.

Statistical mechanics concentrates on analyzing aggregate properties of

large numbers of atoms to be found in samples of liquid or solid matter.

See the paper on simulated annealing by Kirkpatrick et al. in Science,

Volume 220, Number 4598, May 1983.


Statistical Mechanics and Optimization

Physical System      Optimization Problem
state                feasible solution
energy               evaluation function
ground state         optimal solution
quenching            local search
temperature          control parameter T
careful annealing    simulated annealing



Simulated annealing resolves the tradeoff between exploration and

exploitation as follows.

At every iteration, a random move is chosen. If it improves the

situation then the move is accepted, otherwise it is accepted with some

probability less than 1.

The probability decreases exponentially with the badness of the move.

It also decreases as the temperature parameter T is lowered.

Simulated annealing starts with a high value of T and then T is

gradually reduced. At high values of T , simulated annealing is like pure

random search. Towards the end of the algorithm when the values of T

are quite small, simulated annealing resembles ordinary hill-climbing.



function Simulated-Annealing(problem, schedule)
    returns a solution state
  inputs: problem, a problem
          schedule, a mapping from time to “temperature”
  local variables: current, a node
                   next, a node
                   T, the temperature

  current ← MakeNode(RandomState[problem])
  for t ← 1 to ∞ do
    T ← schedule[t]
    if T = 0 then return current
    next ← a randomly selected successor of current
    ∆E ← Value[next] − Value[current]
    if ∆E > 0 then current ← next
    else current ← next only with probability $e^{\Delta E/T}$
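The pseudocode above can be sketched in Python as follows. This is an illustration under assumptions: the helper `acceptance_probability`, the callback parameters and the toy problem (maximise −(x − 3)² over the integers 0 to 10 with a linear cooling schedule) are all hypothetical choices.

```python
import math
import random

def acceptance_probability(delta_e, temperature):
    # Uphill moves are always accepted; others with probability e^(dE/T).
    return 1.0 if delta_e > 0 else math.exp(delta_e / temperature)

def simulated_annealing(start, successors, value, schedule, rng):
    current = start
    t = 1
    while True:
        temperature = schedule(t)
        if temperature <= 0:
            return current
        nxt = rng.choice(successors(current))
        delta_e = value(nxt) - value(current)
        if rng.random() < acceptance_probability(delta_e, temperature):
            current = nxt
        t += 1

# Hypothetical toy problem.
def successors(x):
    return [n for n in (x - 1, x + 1) if 0 <= n <= 10]

def value(x):
    return -(x - 3) ** 2

def schedule(t):
    return max(0.0, 5.0 - 0.01 * t)     # linear cooling, reaches 0 near t = 500

result = simulated_annealing(9, successors, value, schedule, random.Random(0))
print(result, value(result))
```

The result depends on the random seed and on the schedule; with a slowly decreasing temperature the search usually ends at or near the maximum, but this is not guaranteed.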


Example

Let us assume that the current and next point in a search space differ by 13

(i.e., ∆E = −13). Then:

T          $e^{\Delta E/T}$
1          0.000002
5          0.0743
10         0.2725
20         0.52
50         0.77
$10^{10}$  0.9999...

Thus, at high values of T , simulated annealing behaves like a random walk;

at low values of T , it behaves like hill-climbing.



Simulated annealing finds a global optimum with probability

approaching 1 if the schedule lowers T slowly enough.

The exact bound for parameter t and schedule for T is usually

problem dependent. Thus we need to experiment heavily with

every new problem at hand to see whether simulated annealing

makes a difference.

Simulated annealing is a very popular algorithm and has been

used to solve various classes of interesting optimization problems

(e.g., VLSI layout problems, job-shop scheduling problems etc.)


Local Beam Search

Idea: Why not keep more than just one state (e.g., k states) in

memory?

At each iteration, all the successors of the k states are generated. If

one of them is a solution then we halt. Otherwise k states are

selected and the process is repeated. We expect good successors to

“attract the attention”.

Diversity is important so we don’t get stuck in bad regions of the

search space.

Stochastic beam search chooses k successors at random, with

the probability of choosing a successor being an increasing function

of its value.

Similar to natural selection?


Genetic Algorithms

A genetic algorithm (GA) is a variant of stochastic beam search

in which successor states are generated by combining two parent

states (sexual reproduction).

Concepts:

• Individuals represent states. They are denoted by strings over

an alphabet, usually {0, 1}.

• Populations are sets of individuals.

• Fitness function is an evaluation function for rating each

individual.


Genetic Algorithms

Operations:

• Reproduction: a new individual is born by combining two

parents.

• Mutation: a new individual is slightly modified.


Example: 8-queens

[Figure: two parent 8-queens boards combined into a child board (parent + parent = child).]


Example: 8-queens

(a) Initial Population   (b) Fitness Function   (c) Selection   (d) Cross-Over   (e) Mutation
24748552                 24   31%               32752411        32748552         32748152
32752411                 23   29%               24748552        24752411         24752411
24415124                 20   26%               32752411        32752124         32252124
32543213                 11   14%               24415124        24415411         24415417


A Genetic Algorithm

function Genetic-Algorithm(population, Fitness-Fn) returns an individual
  inputs: population, a set of individuals
          Fitness-Fn, a function that measures the fitness of an individual

  repeat
    new_population ← ∅
    loop for i from 1 to Size(population) do
      x ← Random-Selection(population, Fitness-Fn)
      y ← Random-Selection(population, Fitness-Fn)
      child ← Reproduce(x, y)
      if (small random probability) then child ← Mutate(child)
      add child to new_population
    population ← new_population
  until some individual is fit enough, or enough time has elapsed
  return the best individual in population, according to Fitness-Fn
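A compact Python sketch of this loop for 8-queens is given below. Several details are assumptions made for illustration: individuals are strings of eight digits (the row of the queen in each column), fitness is the number of non-attacking pairs (28 for a solution), selection is fitness-proportional with 1 added to every weight so that the weights never sum to zero, and the mutation rate and generation limit are arbitrary.

```python
import random

def fitness(individual):
    """Number of non-attacking pairs of queens (28 for a solution)."""
    rows = [int(ch) for ch in individual]
    attacks = sum(
        1
        for i in range(8)
        for j in range(i + 1, 8)
        if rows[i] == rows[j] or abs(rows[i] - rows[j]) == j - i
    )
    return 28 - attacks

def reproduce(x, y, rng):
    c = rng.randint(1, len(x) - 1)          # crossover point
    return x[:c] + y[c:]

def mutate(child, rng):
    i = rng.randrange(len(child))           # position to change
    return child[:i] + rng.choice("12345678") + child[i + 1:]

def genetic_algorithm(population, rng, generations=200, mutation_rate=0.1):
    for generation in range(generations):
        if fitness(max(population, key=fitness)) == 28:
            break                           # some individual is fit enough
        weights = [fitness(ind) + 1 for ind in population]
        new_population = []
        for _ in range(len(population)):
            x, y = rng.choices(population, weights=weights, k=2)
            child = reproduce(x, y, rng)
            if rng.random() < mutation_rate:
                child = mutate(child, rng)
            new_population.append(child)
        population = new_population
    return max(population, key=fitness)

initial = ["24748552", "32752411", "24415124", "32543213"]
best = genetic_algorithm(initial, random.Random(0))
print(best, fitness(best))
```

With a population of only four individuals the search is unlikely to be effective; the sketch is meant to show the control flow, not to be a tuned solver.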


Genetic Algorithms (continued)

Intuitively the advantage of genetic algorithms comes from the

ability of crossover to combine large blocks of letters that have

evolved independently to produce useful functions.

The theory of genetic algorithms explains how this works using the

idea of a schema.

Example: 246*****

Representation of instances is very important in genetic algorithms.


Readings

• Chapter 4, Section 4.3 of AIMA.

• Parts of Sections 3 and 5 of the book:

Z. Michalewicz and D. B. Fogel. How to Solve it:

Modern Heuristics. Springer, 2000.


Constraint Satisfaction Problems

• Constraint satisfaction problems

• Backtracking algorithms for CSP

• Heuristics

• Local search for CSP

• Problem structure and difficulty of solving


Search Problems

The formalism of search problems we have discussed so far is a very

powerful formalism that depends on the notion of state.

From the point of view of a search algorithm, a state is a black

box with no discernible internal structure.

A state can be represented by an arbitrary data structure and can

be accessed only by problem specific functions: successor, goal test,

heuristics etc.


Constraint Satisfaction Problems

The framework we will now present (constraint satisfaction

problems) admits a very simple standard representation.

This allows us to define search algorithms that take advantage of

this very simple representation and use general purpose

heuristics to enable solution of large problems.

The simple structure also allows us to define methods for problem

decomposition and offers us an intimate connection between the

structure of a problem and the difficulty of solving it.


Constraint Satisfaction Problems - Definitions

A constraint satisfaction problem (CSP) is defined by:

• A set of variables X1, . . . , Xn. Each variable has a domain Di

of possible values.

• A set of constraints C1, . . . , Cm. Each constraint involves

some subset of the variables and specifies the allowable

combinations of values for that subset.

Formally, a (k-ary) constraint C on a set of variables

X1, . . . , Xk is a subset of the Cartesian product D1 × · · · ×Dk.


Definitions (cont’d)

• A solution to a CSP is a complete assignment of values to

variables such that all the constraints are satisfied.

• A CSP is called consistent if it has a solution, otherwise it is

called inconsistent.


Example: Map Coloring

[Figure: map of Australia showing Western Australia, Northern Territory, South Australia, Queensland, New South Wales, Victoria and Tasmania.]


Example: Formal Definition

• Variables: WA,NT, SA,Q,NSW, V, T

• Domain (same for all variables): { red, green, blue }

• Constraints:

C(WA, NT) = { (red, green), (red, blue), (green, red),

(green, blue), (blue, red), (blue, green) }

More succinctly, WA ≠ NT. Similarly for the other pairs of

neighbouring regions.


Constraint Graphs

[Figure: constraint graph with one node per variable (WA, NT, SA, Q, NSW, V, T) and an edge for each binary constraint.]


Example: The 8-queens problem



• Variables:

Let variable Xi (i = 1, . . . , 8) represent the column that the

i-th queen occupies in the i-th row. If columns are represented

by numbers 1 to 8 then the domain of every variable Xi is

Di = {1, 2, . . . , 8}.



• Constraints:

There is a binary constraint C(Xi, Xj) for each pair of

variables. These constraints can be specified succinctly as

follows:

– For all variables Xi and Xj with i ≠ j, Xi ≠ Xj.

– For all variables Xi and Xj with i ≠ j, if Xi = a and Xj = b then

i − j ≠ a − b and i − j ≠ b − a.


Example: Cryptarithmetic

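(a)     T W O
      + T W O
      -------
      F O U R

(b) [Figure: constraint graph over the variables F, T, U, W, R, O and the auxiliary variables X1, X2, X3.]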



• Variables and domains:

F, T, U,W,R,O ∈ {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}

X1, X2, X3 ∈ {0, 1}

• Constraints:

alldiff(F, T, U,W,R,O)

O + O = R+ 10X1

X1 +W +W = U + 10X2

X2 + T + T = O + 10X3

X3 = F
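The constraints above can be checked by brute force. The sketch below enumerates all assignments of distinct digits to the six letters and all values of the three auxiliary variables; the function name `solve_two_plus_two` is hypothetical. It follows the constraints exactly as listed, so it does not forbid leading zeros.

```python
from itertools import permutations, product

def solve_two_plus_two():
    solutions = []
    for F, T, U, W, R, O in permutations(range(10), 6):   # alldiff
        for X1, X2, X3 in product((0, 1), repeat=3):       # auxiliary variables
            if (O + O == R + 10 * X1
                    and X1 + W + W == U + 10 * X2
                    and X2 + T + T == O + 10 * X3
                    and X3 == F):
                solutions.append(
                    {"F": F, "T": T, "U": U, "W": W, "R": R, "O": O})
    return solutions

solutions = solve_two_plus_two()
print(len(solutions), solutions[0])
```

Every assignment found this way satisfies TWO + TWO = FOUR as ordinary arithmetic, for example 734 + 734 = 1468.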


Constraints in Real Life

In many practical problems there are soft constraints in addition

to hard constraints. Soft constraints encode preferences.

Example: Timetabling for a University. What are the hard

constraints? What are the soft constraints?


Other Examples of CSPs

• The satisfiability problem in Boolean logic (also a CSP with

finite domains).

• Temporal reasoning (infinite domains).

• Timetabling

• Job-shop scheduling, airline crew scheduling

• Spatial reasoning

• Integer, linear and non-linear programming (operations

research).


CSP Technology: Practical, Successful and AI!

CSPs are among the most successful examples of ideas from AI,

with many applications. There are currently several companies

marketing such technology:

• www.ilog.com

• www.cosytec.com

• www.parc-technologies.com

• ...


A Taxonomy of CSPs

• Discrete vs. continuous variables

• Finite vs. infinite domains

• Explicit enumeration of allowed combinations of values vs.

constraint languages

• Linear vs. non-linear constraints

• Unary vs. binary vs. ... constraints

In the rest of this presentation, we will concentrate on search

algorithms for binary finite domain CSPs.

It is possible to reduce every higher-order finite-domain constraint

to a set of binary constraints if enough auxiliary variables are

introduced.


Search Algorithms for CSPs

Let us apply a general search algorithm to CSP:

• Initial state: all variables are unassigned.

• Actions: Assign to any unassigned variable Xi any value from

Di.

• Goal test: All variables are assigned and the constraints are

satisfied.

The branching factor at the root is $\sum_{i=1}^{n} |D_i|$, where $|D_i|$ is the

cardinality of domain $D_i$. The last level of the tree has at most

$\left(\sum_{i=1}^{n} |D_i|\right)^n$ nodes.


Search Algorithms for CSPs (cont’d)

Better approach: Order the variables! CSPs are commutative

search problems i.e., the order of application of actions does not

matter.

Characteristics:

• The size of the search space is finite. If the variables are

ordered as X1, . . . , Xn the number of nodes in the search tree is

$1 + \sum_{i=1}^{n} \left( |D_1| \cdots |D_i| \right)$.

When is the number of nodes in the search tree maximal?

Minimal?

• The depth of the search tree is fixed.

• There are similar subtrees.



Which of the search algorithms presented so far is appropriate for

solving CSPs:

• BFS?

No! BFS will not be effective because goal states are located at

the leaves of the search tree.

• DFS?

Better than BFS. But it wastes time searching when

constraints are already violated.



We will present variants of DFS for CSPs. These algorithms are

based on the idea of backtracking search: choose values for one

variable at a time, and backtrack when there are no more legal

values to assign.

We will see the following algorithms and their variants/hybrids:

• Simple or chronological backtracking (BT)

• Forward checking (FC)

• Backjumping (BJ)

• Conflict-directed Backjumping (CBJ)

• Maintaining Arc Consistency (MAC)


BT in Operation: Example

[Figure: part of the BT search tree for the map coloring problem]
(root)
  WA=red
    WA=red, NT=green
      WA=red, NT=green, Q=red
      WA=red, NT=green, Q=blue
    WA=red, NT=blue
  WA=green
  WA=blue


Chronological Backtracking (BT)

The basic idea in any backtracking algorithm is to start with a partial

solution and to extend it until we reach a complete solution.

BT follows this general method. Additionally, when it reaches a dead-end,

it always backtracks to the last decision made (hence its name!).

function Backtracking-Search(csp)


return Recursive-Backtracking({}, csp)


BT (cont’d)

function Recursive-Backtracking(assignment, csp)
  if assignment is complete then return assignment
  var ← Select-Unassigned-Variable(Variables[csp], assignment, csp)
  for each value in Order-Domain-Values(var, assignment, csp) do
    if value is consistent with assignment according to Constraints(csp) then
      add {var = value} to assignment
      result ← Recursive-Backtracking(assignment, csp)
      if result ≠ failure then return result
      remove {var = value} from assignment
  return failure
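A direct Python rendering of chronological backtracking for the map coloring example might look as follows. The adjacency table `NEIGHBOURS` encodes the usual borders between the Australian regions and is an assumption here, as are the function names; variables and values are tried in a fixed (static) order.

```python
# Assumed adjacency of the regions (each border gives one "different colour" constraint).
NEIGHBOURS = {
    "WA": ["NT", "SA"],
    "NT": ["WA", "SA", "Q"],
    "SA": ["WA", "NT", "Q", "NSW", "V"],
    "Q": ["NT", "SA", "NSW"],
    "NSW": ["Q", "SA", "V"],
    "V": ["SA", "NSW"],
    "T": [],
}
COLOURS = ["red", "green", "blue"]

def consistent(var, value, assignment):
    # The value is allowed if no assigned neighbour already uses it.
    return all(assignment.get(n) != value for n in NEIGHBOURS[var])

def recursive_backtracking(assignment):
    if len(assignment) == len(NEIGHBOURS):
        return assignment                       # complete assignment
    var = next(v for v in NEIGHBOURS if v not in assignment)   # static order
    for value in COLOURS:
        if consistent(var, value, assignment):
            assignment[var] = value
            result = recursive_backtracking(assignment)
            if result is not None:
                return result
            del assignment[var]                 # undo and try the next value
    return None                                 # failure

def backtracking_search():
    return recursive_backtracking({})

solution = backtracking_search()
print(solution)
```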


BT (cont’d)

Evaluation:

• Complete? Yes

• Time: $O(d^n e)$ where d is the maximum domain size, n is the

number of variables, and e is the number of constraints.

• Space: O(nd). This is the amount of space needed for storing

the domains of the variables.

The above time and space complexity bounds are based on the

assumption that constraints can be stored using a constant amount

of space, and constraint checks can be done in constant time.


BT (cont’d)

Backtracking can be improved if we give clever answers to the

following questions:

1. Which variable should be assigned next, and in what order

should its values be tried?

2. What are the implications of the current variable assignments

to the other unassigned variables?

3. When a path fails, can the search avoid this failure in

subsequent paths?


Variable Ordering Heuristics

By default the function Select-Unassigned-Variable selects

the next unassigned variable from the list Variables[csp]. This is

static variable ordering and seldom results in efficient search.

The minimum remaining values (MRV) heuristic: choose the

variable with the fewest remaining legal values (dynamic test).

Question: In the map coloring problem for Australia, what is the

variable to be assigned after the assignments

WA = red, NT = green

have been made?

Answer: SA because only the value blue is possible.
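The MRV choice in this example can be reproduced with a few lines of Python. The adjacency table and the names `legal_values` and `select_mrv` are assumptions introduced for illustration.

```python
# Assumed adjacency of the regions.
NEIGHBOURS = {
    "WA": ["NT", "SA"],
    "NT": ["WA", "SA", "Q"],
    "SA": ["WA", "NT", "Q", "NSW", "V"],
    "Q": ["NT", "SA", "NSW"],
    "NSW": ["Q", "SA", "V"],
    "V": ["SA", "NSW"],
    "T": [],
}
COLOURS = ["red", "green", "blue"]

def legal_values(var, assignment):
    # Colours not already used by an assigned neighbour.
    used = {assignment[n] for n in NEIGHBOURS[var] if n in assignment}
    return [c for c in COLOURS if c not in used]

def select_mrv(assignment):
    # Unassigned variable with the fewest remaining legal values.
    unassigned = [v for v in NEIGHBOURS if v not in assignment]
    return min(unassigned, key=lambda v: len(legal_values(v, assignment)))

partial = {"WA": "red", "NT": "green"}
print(select_mrv(partial), legal_values("SA", partial))
```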



[Figure: map of Australia showing Western Australia, Northern Territory, South Australia, Queensland, New South Wales, Victoria and Tasmania.]


Variable Ordering Heuristics (cont’d)

For coloring the map of Australia, how do we start our search?

The degree heuristic: choose the variable involved in the largest

number of constraints with other unassigned variables (static

test).


Question: In the map coloring problem for Australia, what is the

variable to be assigned first?

Answer: SA, which has degree 5. All the other variables have degree

2 or 3, except for T, which has degree 0.
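The degrees quoted in this answer can be computed from the constraint graph. The adjacency table and the names `degree` and `select_by_degree` below are assumptions introduced for illustration.

```python
# Assumed adjacency of the regions.
NEIGHBOURS = {
    "WA": ["NT", "SA"],
    "NT": ["WA", "SA", "Q"],
    "SA": ["WA", "NT", "Q", "NSW", "V"],
    "Q": ["NT", "SA", "NSW"],
    "NSW": ["Q", "SA", "V"],
    "V": ["SA", "NSW"],
    "T": [],
}

def degree(var, assignment):
    """Number of constraints between var and other unassigned variables."""
    return sum(1 for n in NEIGHBOURS[var] if n not in assignment)

def select_by_degree(assignment):
    unassigned = [v for v in NEIGHBOURS if v not in assignment]
    return max(unassigned, key=lambda v: degree(v, assignment))

print({v: degree(v, {}) for v in NEIGHBOURS})
print(select_by_degree({}))
```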


Variable Ordering Heuristics (cont’d)

MRV is usually more powerful than the degree heuristic. They can be used

together with the latter one playing the role of a tie-breaker when

the first one cannot make a distinction.

We will use just MRV to refer to this combination of heuristics.

Both heuristics enforce the well-known fail-first principle.


Evaluation

Problem    BT          BT+MRV      FC          FC+MRV   Min-con
USA        (>1000K)    (>1000K)    2K          60       64
n-Queens   (>40000K)   13500K      (>40000K)   817K     4K
Zebra      3859K       1K          35K         0.5K     2K


Value Ordering Heuristics

Once a variable has been selected, the algorithm must decide on

the order in which to examine values.

The least-constraining-value heuristic (LCV): prefer the value

that rules out the fewest choices for neighbouring variables in the

constraint graph. In other words, leave maximum flexibility for

subsequent variable assignments.

LCV is not useful if we are looking for all solutions or the problem

has no solution.


Value Ordering Heuristics (cont’d)


Question: In the map coloring problem for Australia, what is the

value to be assigned to variable Q after the assignments

WA = red, NT = green

have been made?

Answer: We can only have blue or red.

The value blue would be a bad choice according to LCV: it

eliminates the last legal value for SA.

red is better because it has only 1 conflict: with value red for

NSW .
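The LCV ordering for Q in this example can be reproduced as follows. The adjacency table and the names `legal_values`, `ruled_out` and `order_lcv` are assumptions introduced for illustration; a value counts as ruling out one choice for each unassigned neighbour that could still take it.

```python
# Assumed adjacency of the regions.
NEIGHBOURS = {
    "WA": ["NT", "SA"],
    "NT": ["WA", "SA", "Q"],
    "SA": ["WA", "NT", "Q", "NSW", "V"],
    "Q": ["NT", "SA", "NSW"],
    "NSW": ["Q", "SA", "V"],
    "V": ["SA", "NSW"],
    "T": [],
}
COLOURS = ["red", "green", "blue"]

def legal_values(var, assignment):
    used = {assignment[n] for n in NEIGHBOURS[var] if n in assignment}
    return [c for c in COLOURS if c not in used]

def ruled_out(var, value, assignment):
    """Number of neighbour choices eliminated by assigning var = value."""
    return sum(
        1
        for n in NEIGHBOURS[var]
        if n not in assignment and value in legal_values(n, assignment)
    )

def order_lcv(var, assignment):
    # Least-constraining value first.
    return sorted(legal_values(var, assignment),
                  key=lambda val: ruled_out(var, val, assignment))

partial = {"WA": "red", "NT": "green"}
print(order_lcv("Q", partial))
```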



[Figure: map of Australia showing Western Australia, Northern Territory, South Australia, Queensland, New South Wales, Victoria and Tasmania.]


Constraint Propagation

The main idea behind constraint propagation is to consider the

given constraints early in the search or even before the search has

started!

For example, we can prune the search space by examining the

consequences of partial assignments. The algorithms that have

been proposed to achieve this are sometimes referred to as

look-ahead algorithms.


Forward Checking

Forward Checking (FC) belongs to the family of backtracking

algorithms based on constraint propagation and maintains the

following invariant:

For every unassigned variable, there exists at least one

value in its domain which is compatible with the values

that have been assigned to other variables.


Forward Checking

FC works as follows: every time a value v is assigned to a variable,

FC will remove all values which are inconsistent with v from the

domains of the unassigned variables. If the domain of any of the

unassigned variables is reduced to an empty set, then v will be

rejected.

FC is typically used together with the MRV heuristic since

all the machinery required for the implementation of MRV is used

by FC as well.
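The pruning step of FC can be sketched for map coloring, where the only constraint is that neighbouring regions must differ. The function `forward_check` and its signature are hypothetical names chosen for this example; a full solver would call it from inside the backtracking loop.

```python
NEIGHBORS = {
    "WA": ["NT", "SA"],
    "NT": ["WA", "SA", "Q"],
    "SA": ["WA", "NT", "Q", "NSW", "V"],
    "Q": ["NT", "SA", "NSW"],
    "NSW": ["Q", "SA", "V"],
    "V": ["SA", "NSW"],
    "T": [],
}


def forward_check(var, value, domains, assignment, neighbors):
    """Return a pruned copy of the domains after var = value, or None if
    the domain of some unassigned neighbour becomes empty."""
    new_domains = {v: set(d) for v, d in domains.items()}
    new_domains[var] = {value}
    for n in neighbors[var]:
        if n in assignment:
            continue
        new_domains[n].discard(value)  # neighbours must take a different color
        if not new_domains[n]:
            return None  # value is rejected
    return new_domains


colors = {"red", "green", "blue"}
d0 = {v: set(colors) for v in NEIGHBORS}
d1 = forward_check("WA", "red", d0, {}, NEIGHBORS)
d2 = forward_check("Q", "green", d1, {"WA": "red"}, NEIGHBORS)
d3 = forward_check("V", "blue", d2, {"WA": "red", "Q": "green"}, NEIGHBORS)
print(d2["NT"], d2["SA"], d2["NSW"], d3)
```

After WA = red and Q = green the domains of NT and SA are both reduced to blue; trying V = blue then empties the domain of SA, so the call returns None and the value is rejected.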


FC in Operation: Example

                   WA      NT      Q       NSW     V       SA       T
Initial domains    R G B   R G B   R G B   R G B   R G B   R G B    R G B
After WA=red       R       G B     R G B   R G B   R G B   G B      R G B
After Q=green      R       B       G       R B     R G B   B        R G B
After V=blue       R       B       G       R       B       (empty)  R G B


Evaluation


Problem     BT          BT+MRV      FC          FC+MRV    Min-con
USA         (>1000K)    (>1000K)    2K          60        64
n-Queens    (>40000K)   13500K      (>40000K)   817K      4K
Zebra       3859K       1K          35K         0.5K      2K


Example: Map Coloring with FC

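[Figure: map of Australia showing Western Australia, Northern Territory, South Australia, Queensland, New South Wales, Victoria and Tasmania]

WA = red, Q = green implies NT = blue, SA = blue. But NT and SA are neighbours, so they cannot both be blue.

This inconsistency is not detected by FC.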


Example (cont’d)

The partial assignment

WA = red, Q = green

together with the problem constraints

WA ≠ NT, WA ≠ SA, Q ≠ NT, Q ≠ SA

WA, Q, NT, SA ∈ {red, blue, green}

imply

NT = blue, SA = blue.

These implied constraints together with the problem constraint NT ≠ SA are inconsistent.

FC's machinery does not allow it to make this second constraint propagation step.


Arc Consistency

We can improve this behavior of FC by devising more sophisticated

propagation steps. For example, the constraint propagation step of

forward checking can actually become stronger by using the

concept of arc consistency.

Definition. Let X, Y be variables of a CSP P and (X, Y) be a directed arc in the constraint graph for P. The arc (X, Y) is called consistent if for every value x of X, there is some value y of Y such that x is consistent with y.


Arc Consistency in Operation: Example

                   WA      NT      Q       NSW     V       SA       T
Initial domains    R G B   R G B   R G B   R G B   R G B   R G B    R G B
After WA=red       R       G B     R G B   R G B   R G B   G B      R G B
After Q=green      R       B       G       R B     R G B   B        R G B
After V=blue       R       B       G       R       B       (empty)  R G B


Example (cont’d)

Consider the third row of the previous table for the problem of coloring the map of Australia using FC. If the current domains of nodes SA and NSW are {blue} and {red, blue}, then arc (SA, NSW) is consistent: SA = blue is compatible with NSW = red.

Arc (NSW, SA) is inconsistent because the assignment NSW = blue has no consistent assignment for SA. In this case we should delete the value blue from the domain of NSW to make the arc consistent.


Example (cont’d)

Now consider the arc (SA, NT). In the same row, the domains of both SA and NT are {blue}, so this arc is inconsistent. To make the arc consistent, we remove the value blue from the domain of SA, leaving this domain empty.

Thus applying arc consistency resulted in earlier detection of an

inconsistency during search (a path that would not lead to a

solution).


Arc Consistency (cont’d)

In a backtracking algorithm arc consistency can be applied as:

• Preprocessing step before the search starts.

• Constraint propagation step after each assignment

repeatedly until no more arc inconsistencies remain. This

algorithm is known as MAC which stands for maintaining

arc consistency.

Example: If we use MAC after the assignment

WA = red, Q = green

in our map coloring example, we immediately discover the

impossibility of extending this assignment because arc (SA,NT ) is

inconsistent.


The Algorithm AC-3

function AC-3(csp) returns the CSP, possibly with reduced domains
  inputs: csp, a binary CSP with variables {X1, X2, ..., Xn}
  local variables: queue, a queue of arcs, initially all the arcs in csp

  while queue is not empty do
    (Xi, Xj) ← Remove-First(queue)
    if Remove-Inconsistent-Values(Xi, Xj) then
      for each Xk in Neighbors[Xi] do
        add (Xk, Xi) to queue


The Algorithm AC-3 (cont’d)

function Remove-Inconsistent-Values(Xi, Xj) returns true iff we remove a value
  removed ← false
  for each x in Domain[Xi] do
    if no value y in Domain[Xj] allows (x, y) to satisfy
       the constraint between Xi and Xj then
      delete x from Domain[Xi]; removed ← true
  return removed
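The AC-3 pseudocode translates almost line by line into Python. The sketch below assumes that domains are stored as a dictionary of sets and that one binary predicate `consistent(x, y)` describes every constraint; both are choices made for this illustration. It is run on the map-coloring domains obtained after WA = red and Q = green.

```python
from collections import deque

NEIGHBORS = {
    "WA": ["NT", "SA"],
    "NT": ["WA", "SA", "Q"],
    "SA": ["WA", "NT", "Q", "NSW", "V"],
    "Q": ["NT", "SA", "NSW"],
    "NSW": ["Q", "SA", "V"],
    "V": ["SA", "NSW"],
    "T": [],
}


def remove_inconsistent_values(domains, xi, xj, consistent):
    """Delete every value of xi that has no supporting value in xj."""
    removed = False
    for x in list(domains[xi]):
        if not any(consistent(x, y) for y in domains[xj]):
            domains[xi].remove(x)
            removed = True
    return removed


def ac3(domains, neighbors, consistent):
    """Make every arc consistent, modifying domains in place."""
    queue = deque((xi, xj) for xi in neighbors for xj in neighbors[xi])
    while queue:
        xi, xj = queue.popleft()
        if remove_inconsistent_values(domains, xi, xj, consistent):
            for xk in neighbors[xi]:
                queue.append((xk, xi))
    return domains


rgb = {"red", "green", "blue"}
domains = {
    "WA": {"red"}, "NT": {"blue"}, "Q": {"green"},
    "NSW": {"red", "blue"}, "V": set(rgb), "SA": {"blue"}, "T": set(rgb),
}
result = ac3(domains, NEIGHBORS, lambda x, y: x != y)
print(result)
```

At least one domain ends up empty, which signals that the partial assignment cannot be extended. Like the pseudocode, this sketch keeps processing arcs after a domain has become empty; a practical implementation would usually stop at that point.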


Stronger Notions of Consistency

There are stronger notions of consistency (3-consistency,

4-consistency ..., n-consistency) that can be employed in

backtracking algorithms in a similar way.

At each step of the search process, these consistency steps bring to light the implications of the problem constraints by examining 3 variables, 4 variables, ..., n variables at a time.

Important: The cost of constraint propagation steps needs

to be carefully balanced with their benefits. One way to determine

this is by carrying out experiments.


Intelligent Backtracking

Backtracking algorithms can become more effective by backtracking

in a more sophisticated way (intelligently!). The techniques

proposed to achieve this are sometimes referred to as look-back

techniques.

The idea here is to avoid backtracking chronologically like BT,

but rather backtrack in a clever way to the actual cause of the

failure.


Chronological Backtracking

[Figure: map of Australia showing Western Australia, Northern Territory, South Australia, Queensland, New South Wales, Victoria and Tasmania]

Chronological Backtracking (cont’d)

Let us consider BT in the problem of map coloring with a fixed variable ordering: Q, NSW, V, T, SA, WA, NT. Suppose we have generated the following partial assignment:

Q = red, NSW = green, V = blue, T = red

When we try the next variable, SA, we see that every value violates a constraint. Now BT tells us to backtrack and try a new color for Tasmania! This is not a good idea, because T is not involved in any constraint with SA.


Backjumping

Backjumping (BJ) is an intelligent backtracking algorithm.

When a dead-end occurs at variable x, BJ does not backtrack to the previous variable like BT. Instead, it backtracks to the deepest variable in the search tree (also called the most recent variable) which caused a value in the domain of x to be eliminated.

The set of all previously assigned variables that caused a value in the domain of x to be eliminated is called the conflict set of x.


Example with BJ

[Figure: map of Australia showing Western Australia, Northern Territory, South Australia, Queensland, New South Wales, Victoria and Tasmania]

BJ (cont’d)

Let us consider BJ in the problem of map coloring with a fixed

variable ordering:

Q, NSW, V, T, SA, WA, NT

Suppose we have generated the following partial assignment:

Q = red, NSW = green, V = blue, T = red

When we try the next variable, SA, we see that every value

violates a constraint.

The variables that cause elimination of all possible values for SA are {Q, NSW, V}. Now BJ tells us to backtrack to V.
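The conflict set and the backjump target in this example can be computed with a few lines of Python. The helper `conflict_set` is a hypothetical name, and the sketch relies on the map-coloring constraint that neighbours must have different colors.

```python
NEIGHBORS = {
    "WA": ["NT", "SA"],
    "NT": ["WA", "SA", "Q"],
    "SA": ["WA", "NT", "Q", "NSW", "V"],
    "Q": ["NT", "SA", "NSW"],
    "NSW": ["Q", "SA", "V"],
    "V": ["SA", "NSW"],
    "T": [],
}
COLORS = {"red", "green", "blue"}


def conflict_set(var, assignment, neighbors, colors):
    """Previously assigned variables that eliminate some value of var."""
    return {
        other
        for other in neighbors[var]
        if other in assignment and assignment[other] in colors
    }


order = ["Q", "NSW", "V", "T", "SA", "WA", "NT"]
assignment = {"Q": "red", "NSW": "green", "V": "blue", "T": "red"}

cs = conflict_set("SA", assignment, NEIGHBORS, COLORS)
# BJ jumps back to the most recently assigned variable in the conflict set.
target = max(cs, key=order.index)
print(sorted(cs), target)
```

T does not appear in the conflict set because it shares no constraint with SA, so BJ skips over it and returns to V.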


BJ vs. FC

When backjumping occurs, all values of a domain are in conflict

with the current assignment. This would have already been

detected by FC!

Proposition. Every branch of a search tree pruned by BJ is also

pruned by FC.

Thus BJ is redundant in a search using FC or a stronger constraint

propagation algorithm such as MAC.


Example

[Figure: map of Australia showing Western Australia, Northern Territory, South Australia, Queensland, New South Wales, Victoria and Tasmania]

From BJ to CBJ

Let us consider again BJ in the problem of map coloring with the

fixed variable ordering

WA, NSW, T, NT, Q, V, SA.

Suppose we have generated the following partial assignment:

WA = red, NSW = red

This assignment cannot lead us to a solution.

But let us assign T = red, and continue with NT, Q, V, SA. This is not going to work and, eventually, we run out of values at NT.

Where should we backtrack to?


From BJ to CBJ (cont’d)

BJ cannot tell us anything useful because the conflict set of NT is

empty (i.e., NT has values that are consistent with all previous

variables).

In this case, it is really the variables

NT,Q, V, SA

taken together that conflict with the previous variables.

This leads to a deeper notion of the conflict set of a variable x: it is the set of preceding variables that caused x, together with any subsequent variables, to lead to failure.

Under this definition, the conflict set for NT is {WA, NSW} and we should backtrack to NSW.

This is the idea behind conflict-directed backjumping.


Conflict-Directed Backjumping (CBJ)

In CBJ every variable has a conflict set as in BJ.

When a consistency check between an instantiation vi of the

current variable xi and an instantiation vk of a previously assigned

variable xk fails, then xk is added to the conflict set of xi.

When there are no more values to be tried for xi, CBJ backtracks to the deepest variable xh in the conflict set of xi. At the same time, the variables in the conflict set of xi (except xh itself) are added to the conflict set of xh so that no information about conflicts is lost. This is the important difference from BJ.


Hybrid Algorithms

The ideas presented in the previous algorithms can be combined to

create hybrid algorithms e.g., FC-CBJ or MAC-CBJ.

Heuristics are usually combined with these hybrid algorithms as

well.


Evaluating Backtracking Algorithms

Criteria:

1. Worst case time/space complexity

2. Run time

3. Number of nodes visited in the search tree

4. Number of consistency checks performed

5. Number of backtracks


Evaluating Backtracking Algorithms (cont’d)

Results:

• BT, FC, BJ, CBJ, MAC and their variants have exponential

worst case time complexity.

• Run time can vary depending on implementation details.

• Visited nodes

FC-CBJ ≤ FC-BJ ≤ FC ≤ BJ ≤ BT

CBJ ≤ BJ

MAC-CBJ ≤ MAC-BJ ≤ MAC

MAC ≤ FC


Evaluating Backtracking Algorithms (cont’d)

• Consistency checks

CBJ ≤ BJ ≤ BT

FC-CBJ ≤ FC-BJ ≤ FC

FC may perform more or fewer consistency checks than BJ and BT depending on the problem.

• Experimental results have shown that in most cases a good constraint propagation algorithm (like MAC or FC) with a good set of heuristics (like MRV and LCV) can go a long way in solving difficult CSPs.


Local Search Algorithms

All the search algorithms we have presented up to now (for general

search problems or CSPs) are systematic: they explore the search

space carefully keeping track of each path explored until they find a

solution.

We have already seen how to solve n-queens problems using local

search. Can we solve arbitrary CSPs using local search

algorithms?



Idea: Start with a “solution” and make modifications until you

reach a solution. Graphically:

[Figure: the evaluation function plotted over the state space, with the current state marked]

Local Search Algorithms for CSPs

Local search is particularly useful for CSPs. A local search algorithm starts

with a random assignment of values to variables and then modifies

(repairs) this assignment until it becomes a solution.

These algorithms are also referred to as heuristic repair algorithms in the

literature.

We have already seen the application of hill-climbing to the 8-queens

problem.


The Min-Conflicts Heuristic

In choosing a new value for a variable, a useful heuristic is to

choose the one that would cause the minimum number of

conflicts with the current assignment to the other

variables.

The above heuristic is called the min-conflicts heuristic and it is

surprisingly powerful for many CSPs.


Min-Conflicts and N-Queens

When given a reasonable initial state, min-conflicts can solve n-queens problems with millions of queens in around 50 steps (in fact, its running time is roughly independent of the problem size).

We can compute an initial state by iterating through the rows, placing each queen in the column where it conflicts with the fewest previously placed queens.

The repair can be accomplished in O(n) time by keeping a list of

all queens that are in conflict (i.e., are attacked by others) together

with counters showing the number of attacking queens for each

alternative position of these queens.


Min-Conflicts

function Min-Conflicts(csp, max-steps) returns a solution or failure
  inputs: csp, a constraint satisfaction problem
          max-steps, the number of steps allowed before giving up
  local variables: current, a complete assignment
                   var, a variable
                   value, a value for a variable

  current ← an initial complete assignment for csp
  for i = 1 to max-steps do
    if current is a solution for csp then return current
    var ← a randomly chosen, conflicted variable from Variables[csp]
    value ← the value v for var that minimizes Conflicts(var, v, current, csp)
    set var = value in current
  return failure
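A minimal Python sketch of Min-Conflicts specialised to n-queens follows. It assumes one queen per row (so `queens[r]` is the column of the queen in row r), a random initial assignment, and random tie-breaking among equally good columns; these are illustrative choices, not details given in the slides.

```python
import random


def conflicts(queens, row, col):
    """Number of queens attacking square (row, col), ignoring row itself."""
    return sum(
        1
        for r, c in enumerate(queens)
        if r != row and (c == col or abs(c - col) == abs(r - row))
    )


def min_conflicts(n, max_steps=10_000, seed=0):
    """Return a list of columns, one per row, or None on failure."""
    rng = random.Random(seed)
    queens = [rng.randrange(n) for _ in range(n)]  # initial complete assignment
    for _ in range(max_steps):
        conflicted = [r for r in range(n) if conflicts(queens, r, queens[r]) > 0]
        if not conflicted:
            return queens  # current is a solution
        row = rng.choice(conflicted)  # randomly chosen conflicted variable
        counts = [conflicts(queens, row, c) for c in range(n)]
        best = min(counts)
        # Break ties randomly so that the search does not cycle.
        queens[row] = rng.choice([c for c in range(n) if counts[c] == best])
    return None


solution = min_conflicts(8)
print(solution)
```

The function returns None when max_steps is exhausted, which corresponds to "failure" in the pseudocode. This sketch recomputes conflict counts from scratch at every step; the O(n) repair described earlier would instead maintain the counters incrementally.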


Example

[Figure: min-conflicts example; squares are annotated with the number of conflicts for each candidate position]

Evaluation


Problem     BT          BT+MRV      FC          FC+MRV    Min-con
USA         (>1000K)    (>1000K)    2K          60        64
n-Queens    (>40000K)   13500K      (>40000K)   817K      4K
Zebra       3859K       1K          35K         0.5K      2K


Min-Conflicts and Scheduling Problems

The min-conflicts heuristic has been used in observation scheduling

algorithms for the Hubble telescope reducing the scheduling time

from 3 weeks to 10 minutes!

The importance of local search algorithms in problems such as

scheduling is that on-line re-scheduling is an important

operation. This can be gracefully carried out by local search as well.

The local search techniques we have discussed (hill climbing,

simulated annealing etc.) can also be applied to constraint

optimization problems.


The Structure of Problems

There are ways to exploit the structure of a CSP to find

solutions quickly. For example:

• Independent subproblems

• Tree CSPs and directed arc consistency

• Tree decomposition


Readings

Chapter 5 of AIMA.


Knowledge-Based Agents

Knowledge-based agents are best understood as agents that

know about their world and reason about their courses of action.

Basic concepts:

• The knowledge-base (KB): a set of representations of facts

about the world.

• The knowledge representation language: a language

whose sentences represent facts about the world.


Knowledge-Based Agents (cont’d)

• TELL and ASK interface: operations for adding new

sentences to the KB and querying what is known. This is

similar to updating and querying in databases.

• The inference mechanism: a mechanism for determining

what follows from what has been TELLed to the knowledge

base. The ASK operation utilizes this inference mechanism.


A Generic Knowledge-based Agent

function KB-Agent(percept) returns an action
  static: KB, a knowledge-base
          t, a counter, initially 0, indicating time

  Tell(KB, Make-Percept-Sentence(percept, t))
  action ← Ask(KB, Make-Action-Query(t))
  Tell(KB, Make-Action-Sentence(action, t))
  t ← t + 1
  return action

This agent design is similar to the one for agents with internal

state.


Knowledge-based Agents (cont’d)

We can describe a knowledge-based agent at three levels:

• The knowledge level: In this level the agent is specified by

saying what it knows about the world and what its goals are.

• The logical level: This is the level at which the knowledge is

encoded into sentences of some logical language.

• The implementation level: This is the level where sentences

are implemented. This level runs on the agent architecture.

Note: Declarative vs. procedural way of system building


Knowledge-based Agents (cont’d)

Example:

• Knowledge level or epistemological level:

The automated taxi driver knows that Golden Gate Bridge

links San Francisco and Marin County.

• Logical level:

The automated taxi driver has the FOL sentence

Links(GGBridge, SF,Marin) in its KB.

• Implementation level:

The sentence Links(GGBridge, SF,Marin) is implemented by

a Pascal record (or a C structure).


Knowledge-Based Agents (cont’d)

We can build a knowledge-based agent by TELLing it what it

needs to know before it starts perceiving the world.

We can also design learning mechanisms that output general

knowledge about the environment given a series of percepts.

Autonomous agent = Knowledge-based agent + Learning mechanism


The Wumpus World (WW)

[Figure: a 4x4 wumpus world grid with rows and columns numbered 1 to 4, showing START, three pits (PIT), the Gold, and squares labelled Breeze and Stench]

The WW (cont’d)

• Environment: 4x4 grid of rooms with agent, wumpus, gold

and pits.

• Actuators: The agent can move forward, turn left or turn

right. The agent dies if it enters a room with a pit or a live

wumpus.

The agent also has the actions Grab and Shoot (one arrow only) at its disposal.

• Sensors: The percept is a list of 5 symbols:

(Stench, Breeze, Glitter, Bump, Scream)

Any of the above values can be None.


Reasoning and Acting in the WW

[Figure: two snapshots, (a) and (b), of the agent's knowledge of the 4x4 grid, with squares labelled (1,1) to (4,4). Legend: A = Agent, B = Breeze, G = Glitter, Gold, OK = Safe square, P = Pit, S = Stench, V = Visited, W = Wumpus]

Reasoning and Acting in the WW (cont’d)

[Figure: two later snapshots, (a) and (b), of the agent's knowledge of the 4x4 grid, with squares labelled (1,1) to (4,4). Legend: A = Agent, B = Breeze, G = Glitter, Gold, OK = Safe square, P = Pit, S = Stench, V = Visited, W = Wumpus]

KR languages: Syntax and Semantics

A KR language is defined by specifying its syntax and semantics.

The syntax of a KR language specifies the well-formed formulas

and sentences.

The semantics of a KR language defines a correspondence

between formulas/sentences of the language and facts in the world

to which these formulas/sentences refer.

A sentence of a KR language does not mean anything by itself.

The semantics or meaning of a sentence must be provided by its

writer by means of an interpretation.


Truth and Entailment

Truth. A sentence will be called true under a particular

interpretation if the state of affairs it represents is the case.

Entailment. We will write KB |= α to denote that whenever the

sentences of KB are true, then the sentence α is also true. In this

case we will say that the sentences of KB entail the sentence α.

Given a knowledge-base KB and a sentence α, how do we design

an algorithm that verifies whether KB |= α?


Entailment (cont’d)

[Figure: at the level of the Representation, Sentences entail a Sentence; at the level of the World, Facts are followed by a Fact; Semantics links each sentence to the fact it represents]

Inference, proof and proof-theory

Inference is the process of mechanically deriving sentences entailed by a knowledge-base. If sentence α is derived from KB using inference mechanism i then we will write KB ⊢i α.

An inference mechanism is called sound if it derives only sentences

that are entailed.

An inference mechanism is called complete if it derives all the

sentences that are entailed.

The sequence of steps used to derive a sentence α from a set of sentences KB is called a proof.

A proof theory is a set of rules for deriving the entailments of a

set of sentences.


Logic

The KR languages we will consider will be based on propositional

logic and first-order logic.

In general, a logic is a formal system consisting of:

• Syntax

• Semantics

• Proof theory

Why do we use logical languages for KR? Why don’t we use

natural language or programming languages?


Propositional Logic (PL): Syntax

The symbols of PL are:

• A countably infinite set of proposition symbols P1, P2, . . ..

This set will be denoted by P.

• The logical connectives ¬, ∧ , ∨ , =⇒ and ⇐⇒ .

• Parentheses: (, ).

Note: Logicians usually introduce only the connectives ¬ and ∨

and define the rest in terms of them.


PL: Syntax (cont’d)

The following context-free grammar defines the well-formed

sentences of propositional logic.

Sentence         → AtomicSentence | ComplexSentence
AtomicSentence   → True | False | Symbol
Symbol           → P1 | P2 | ···
ComplexSentence  → (Sentence) | ¬Sentence
                 | Sentence BinaryConnective Sentence
BinaryConnective → ∧ | ∨ | =⇒ | ⇐⇒

Precedence (from highest to lowest): ¬, ∧, ∨, =⇒ and ⇐⇒.


PL: Semantics

A proposition symbol can mean anything we want. Its

interpretation can be any arbitrary fact. This fact will be either

true or false in the world. This is not the same in other logics (e.g.,

fuzzy logic!).

This is formalized by introducing the notion of interpretation.

Definition. Let P be the set of proposition symbols. An

interpretation for P is a mapping

I : P → {true, false}.


PL: Semantics (cont’d)

The notion of interpretation can be extended to arbitrary

well-formed sentences of PL using the following recursive

definitions:

• I(True) = true.

• I(False) = false.

• I(¬φ) = true if I(φ) = false; otherwise it is false.

• I(φ1 ∧ φ2) = true if I(φ1) = true and I(φ2) = true; otherwise

it is false.

• I(φ1 ∨ φ2) = true if I(φ1) = true or I(φ2) = true; otherwise it

is false.



• I(φ1 =⇒ φ2) = true if I(φ1) = false or I(φ2) = true;

otherwise it is false.

Explanation: If φ1 and φ2 are both true then most people

would agree that φ1 =⇒ φ2 (φ1 implies φ2) should be true.

Example: For all integers x, if x is even then x + 2 is even. If we take x to be 6 then this says: “If 6 is even then 6+2 is even”.

But what about cases where the truth value of φ1 is false?

Example: If we take x to be 7 then the above formula says:

“If 7 is even then 7+2 is even”. Is this sentence true or false?

This is an instance of a “false implies false” implication.



We will take the above sentence to be true although some of us

might find it disconcerting. It would be wrong to take it to be false

given that the more general sentence of which it is an instance is

true.

We have similar difficulties for “false implying true”.

Example: If 1+1=3 then Athens is the capital of Greece.

The case “true implying false” is easier: most people would

accept such an implication to be false.

Thus we have taken “implication” to have the semantics of

material implication.



• I(φ1 ⇐⇒ φ2) = true if I(φ1) = I(φ2); otherwise it is false.
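The recursive definition of I can be written directly as a small evaluator. The encoding below is an assumption made for this sketch: proposition symbols are strings, True and False are the Python constants, and complex sentences are tuples such as ("implies", "P", "Q").

```python
def evaluate(sentence, interp):
    """Extend an interpretation (dict: symbol -> bool) to any PL sentence."""
    if sentence is True or sentence is False:
        return sentence  # I(True) = true, I(False) = false
    if isinstance(sentence, str):
        return interp[sentence]  # proposition symbol
    op, *args = sentence
    vals = [evaluate(a, interp) for a in args]
    if op == "not":
        return not vals[0]
    if op == "and":
        return vals[0] and vals[1]
    if op == "or":
        return vals[0] or vals[1]
    if op == "implies":
        return (not vals[0]) or vals[1]  # material implication
    if op == "iff":
        return vals[0] == vals[1]
    raise ValueError(f"unknown connective: {op}")


I = {"P": False, "Q": False}
print(evaluate(("implies", "P", "Q"), I))
```

The "false implies false" case evaluates to true, in line with the material-implication semantics adopted above.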


Compositionality of PL

A language is called compositional when the meaning of a

sentence is a function of the meaning of the parts.

Compositionality is a desirable property in formal languages.


The Ontological Commitments of PL

Ontological commitments have to do with the nature of reality.

PL assumes that the world consists of facts that either hold or do not hold.

Other logics, for example FOL, make more elaborate and detailed

ontological commitments.


Satisfaction and Models

Definition. Let φ be a PL sentence. If I is an interpretation such

that I(φ) = true then we say that I satisfies φ or I is a model of

φ.


Satisfiability

Definition. A sentence φ of PL is satisfiable if there is an

interpretation I such that I(φ) = true.

Examples: P, P ∨ Q, (P ∧ R) ∨ Q

Definition. A sentence φ of PL is unsatisfiable if there is no

interpretation I such that I(φ) = true.

Example: P ∧ ¬P


Validity

Definition. A sentence φ of PL is valid if for all interpretations

I, I(φ) = true.

Examples: P ∨ ¬P , ((P ∨ H) ∧ ¬H) =⇒ P

Valid statements in PL are also called tautologies.

Theorem. Let φ be a sentence of PL. If φ is unsatisfiable then its

negation ¬φ is valid. Proof?


Entailment

Definition. Let φ and ψ be sentences of PL. We will say that φ entails ψ (denoted by φ |= ψ) if for every interpretation I such that I(φ) = true, we also have I(ψ) = true.

Example: P ∧ Q |= P

The deduction theorem. Let φ and ψ be sentences of PL.

Then φ |= ψ iff φ =⇒ ψ is valid. Proof?

Example: (P ∧ Q) =⇒ P is a valid sentence.


Entailment and Unsatisfiability

Theorem. Let φ and ψ be sentences of PL. Then φ |= ψ iff

φ ∧ ¬ψ is unsatisfiable. Proof?

Example: P ∧ Q |= P

The above theorem is the essence of proofs by contradiction or

refutation.


Equivalence

Definition. Let φ and ψ be sentences of PL. We will say that φ

is equivalent to ψ (denoted by φ ≡ ψ) if φ |= ψ and ψ |= φ.

Example: ¬(P ∧ ¬Q) ≡ ¬P ∨ Q


Some Useful Equivalences

• (α ∧ β) ≡ (β ∧ α) commutativity of ∧

• (α ∨ β) ≡ (β ∨ α) commutativity of ∨

• ((α ∧ β) ∧ γ) ≡ (α ∧ (β ∧ γ)) associativity of ∧

• ((α ∨ β) ∨ γ) ≡ (α ∨ (β ∨ γ)) associativity of ∨

• ¬(¬α) ≡ α double-negation elimination

• (α⇒ β) ≡ (¬β ⇒ ¬α) contraposition


Some Useful Equivalences (cont’d)

• (α⇒ β) ≡ (¬α ∨ β) implication elimination

• (α⇔ β) ≡ ((α⇒ β) ∧ (β ⇒ α)) biconditional elimination

• ¬(α ∧ β) ≡ (¬α ∨ ¬β) de Morgan law

• ¬(α ∨ β) ≡ (¬α ∧ ¬β) de Morgan law

• (α ∧ (β ∨ γ)) ≡ ((α ∧ β) ∨ (α ∧ γ)) distribution of ∧ over ∨

• (α ∨ (β ∧ γ)) ≡ ((α ∨ β) ∧ (α ∨ γ)) distribution of ∨ over ∧


Truth Tables

A      ¬A
true   false
false  true

A      B      A ∧ B  A ∨ B  A =⇒ B  A ⇐⇒ B
false  false  false  false  true    true
false  true   false  true   true    false
true   false  false  true   false   false
true   true   true   true   true    true

Why are truth tables useful?


Truth tables (cont’d)

Example: A truth table for showing the validity of sentence

((P ∨ H) ∧ ¬H) =⇒ P .

P      H      P ∨ H  (P ∨ H) ∧ ¬H  ((P ∨ H) ∧ ¬H) =⇒ P
false  false  false  false         true
false  true   true   false         true
true   false  true   true          true
true   true   true   false         true
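The truth-table method can be mechanised by enumerating all interpretations. In the sketch below, a sentence is represented as a Python function of its proposition symbols; this encoding and the names `is_valid` and `entails` are assumptions made for the example. Entailment is checked through the deduction theorem.

```python
from itertools import product


def is_valid(formula, symbols):
    """True iff formula is true under every interpretation of symbols."""
    return all(
        formula(**dict(zip(symbols, values)))
        for values in product([False, True], repeat=len(symbols))
    )


def entails(phi, psi, symbols):
    """phi |= psi iff (phi implies psi) is valid (deduction theorem)."""
    return is_valid(lambda **i: (not phi(**i)) or psi(**i), symbols)


# ((P or H) and not H) implies P, written with material implication.
sentence = lambda P, H: (not ((P or H) and not H)) or P

print(is_valid(sentence, ["P", "H"]))
print(entails(lambda P, Q: P and Q, lambda P, Q: P, ["P", "Q"]))
```

The enumeration visits 2^n interpretations for n symbols, so it is practical only for small sentences.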


PL Satisfiability as a CSP

The satisfiability problem for PL is fundamental. The entailment

and validity problems can be rephrased as satisfiability problems.

Notice that the satisfiability problem for PL can be phrased as a

CSP. What are the variables, values and constraints?


Complexity of PL Satisfiability and Validity

Theorem. The problem of determining whether a sentence of PL

is satisfiable is NP-complete (Cook, 1971).

Corollary. The problem of determining whether a sentence of PL

is valid is co-NP-complete.

The above results mean that it is highly unlikely that we will ever

find a polynomial time algorithm for these problems.


Horn Sentences

Definition. A PL sentence will be called Horn if it is in one of

the following two forms:

Q

or

P1 ∧ P2 ∧ . . . ∧ Pn =⇒ Q

The second of the above forms is equivalent to

¬P1 ∨ ¬P2 ∨ . . . ∨ ¬Pn ∨ Q.

Theorem. If φ is a conjunction of Horn sentences then the

satisfiability of φ can be decided in polynomial time.


Inference Rules for PL

An inference rule is a rule of the form

    α1, α2, ..., αn
    ---------------
           β

where α1, α2, ..., αn are sentences called conditions and β is a sentence called the conclusion.

Whenever we have a set of sentences that match the conditions of an inference rule, we can conclude the sentence in the conclusion.


Inference Rules for PL (cont’d)

• Modus Ponens: from α =⇒ β and α, infer β.

• And-Elimination: from α1 ∧ α2 ∧ ... ∧ αn, infer αi.

• And-Introduction: from α1, α2, ..., αn, infer α1 ∧ α2 ∧ ... ∧ αn.

• Or-Introduction: from αi, infer α1 ∨ α2 ∨ ... ∨ αn.

• Double-Negation Elimination: from ¬¬α, infer α.

• Unit Resolution: from α ∨ β and ¬β, infer α.

• Resolution: from α ∨ β and ¬β ∨ γ, infer α ∨ γ.
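The resolution rule is easy to apply mechanically once sentences are written as clauses. The sketch below assumes a clause is a frozenset of literals and that a negative literal is a string starting with "~"; both conventions, and the names `negate` and `resolve`, are introduced only for this illustration.

```python
def negate(literal):
    """Return the complementary literal ("B" <-> "~B")."""
    return literal[1:] if literal.startswith("~") else "~" + literal


def resolve(c1, c2):
    """All clauses obtainable from c1 and c2 by one resolution step."""
    resolvents = set()
    for literal in c1:
        if negate(literal) in c2:
            resolvents.add(frozenset((c1 - {literal}) | (c2 - {negate(literal)})))
    return resolvents


# Resolution: from A ∨ B and ¬B ∨ C, infer A ∨ C.
print(resolve(frozenset({"A", "B"}), frozenset({"~B", "C"})))
# Unit resolution: from A ∨ B and ¬B, infer A.
print(resolve(frozenset({"A", "B"}), frozenset({"~B"})))
```

Unit resolution is the special case in which one of the two clauses consists of a single literal.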


Why Is Inference Important?

[Figure: input sentences describing the World are given to a system (drawn as a box marked "?"), which returns conclusions to the User]

Formalizing the WW in PL

[Figure: the agent's knowledge of the 4x4 grid after several moves, with squares labelled (1,1) to (4,4) and marked OK, V, B, S, P! and W!. Legend: A = Agent, B = Breeze, G = Glitter, Gold, OK = Safe square, P = Pit, S = Stench, V = Visited, W = Wumpus]

We can formalize the above situation in PL and use inference to

conclude that the wumpus is in room (1,3). Do it as an exercise!


Formalizing the WW in PL (cont’d)

Consider the following WW rule:

If a square has no smell, then neither the square nor any of

its adjacent squares can house a Wumpus.

How can we formalize this rule in PL?

We have to write one rule for every relevant square! For example:

¬S11 =⇒ ¬W11 ∧ ¬W12 ∧ ¬W21

This is a very disappointing feature of PL. There is no way in PL

to make a statement referring to all objects of some kind (e.g., to

all squares).

Not to worry: this can be done in first order logic!
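To see how many sentences this requires, the "no smell" rule can be generated for every square of the 4x4 grid. The helper `no_stench_rule` and the symbol naming scheme (S11, W12, ...) are assumptions that follow the notation of the example above.

```python
def no_stench_rule(x, y, size=4):
    """PL sentence: no stench in (x, y) implies no wumpus in (x, y) or in
    any adjacent square that lies inside the grid."""
    candidates = [(x, y), (x, y + 1), (x + 1, y), (x, y - 1), (x - 1, y)]
    squares = [(a, b) for a, b in candidates if 1 <= a <= size and 1 <= b <= size]
    conclusion = " ∧ ".join(f"¬W{a}{b}" for a, b in squares)
    return f"¬S{x}{y} =⇒ {conclusion}"


rules = [no_stench_rule(x, y) for x in range(1, 5) for y in range(1, 5)]
print(len(rules))
print(rules[0])
```

One English sentence turns into 16 propositional sentences for a 4x4 grid, and the number grows with the size of the world.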


A knowledge-based agent using PL

function Propositional-KB-Agent(percept) returns an action

  for each action in the list of possible actions do
    if Ask(KB, Make-Action-Query(t, action)) then

      t ← t + 1
      return action
  end


Readings

Chapter 7 of AIMA: Logical Agents, Sections 7.1 to 7.5.


First-Order Logic (FOL)

Ontological commitments of FOL:

• The world consists of objects i.e., things with individual

identities. Objects have properties that distinguish them

from other objects.

• Objects participate in relations with other objects. Some of

these relations are functions. Relations hold or do not hold.

These ontological commitments make FOL more powerful than PL.

FOL is here to stay!


FOL: Syntax

The symbols of FOL (with equality) are the following:

• Parentheses: (, ).

• The logical connectives ¬, ∧ , ∨ , =⇒ and ⇐⇒ .

• A countably infinite set of variables. This set will be denoted by Vars.

Examples: x, y, v, . . .


FOL: Syntax (cont’d)

• The quantifier symbols: ∀, ∃

• A countably infinite set of constant symbols.

Examples: John, Mary, 5, 6, Ball, . . .

• The equality symbol: =

• Predicate symbols: For each positive integer n, some set

(possibly empty) of symbols, called n-place predicate symbols.

Examples: Happy(.), Brother(., .), Arrives(., ., .), . . .

• Function symbols: For each positive integer n, some set

(possibly empty) of symbols, called n-place function symbols.

Examples: FatherOf(.), Cosine(.), . . .

Logicians usually introduce only the connectives ¬ and ∨ and one

of the quantifiers.



Terms are expressions of FOL that refer to objects. The set of all

terms will be denoted by Terms.

The following BNF grammar gives the syntax of terms:

Term → ConstantSymbol | Variable
     | FunctionSymbol(Term, ..., Term)

Examples:

John, x, FatherOf(John), WifeOf(FatherOf(x)), . . .



Atomic formulas are expressions of FOL that refer to simple

facts.

The following BNF grammar gives the syntax of atomic formulas:

AtomicFormula → Term = Term
              | PredicateSymbol(Term, ..., Term)

Examples:

John = ElderSonOf(FatherOf(John)), Happy(John), Lives(John, London), Arrives(John, Athens, Monday)



Well-formed formulas (wffs) are the most complex kind of

expressions in FOL. They can be used to refer to any complicated state

of affairs.

The following BNF grammar gives the syntax of wffs:

Wff → AtomicFormula | ( Wff ) | ¬ Wff
    | Wff BinaryConnective Wff
    | ( Quantifier Variable ) Wff



Examples of wffs:

• ¬Loves(Tony,Mary)

• Loves(Tony, Paula) ∨ Loves(Tony, Fiona)

• Loves(John, Paula) ∧ Loves(John, Fiona)

• (∀x)(SportsCar(x) ∧ HasDriven(Mike, x) =⇒ Likes(Mike, x))

• (∃x)(SportsCar(x) ∧ Owns(John, x))


Free Variables

The following recursive definition defines the notion of free variables

of a wff.

• If φ is an atomic formula, x occurs free in φ iff x is a symbol of φ.

• x occurs free in ¬φ iff x occurs free in φ.

• x occurs free in φ ∧ ψ iff x occurs free in φ or in ψ. Similarly for the

remaining binary connectives.

• x occurs free in (∀v)φ iff x occurs free in φ and x is different from

v. Similarly for ∃.

The opposite of free is bound.

Definition. If no variable occurs free in the wff φ, then φ is a

sentence.


Free Variables (cont’d)

Examples:

• x is free in Brother(x, John) but not in

(∀x)(Cat(x) =⇒ Mammal(x)).

• y is free in

(∀x)(Friend(x, y) =⇒ Loves(x, y))

but not in

(∀x)(∀y)(Friend(x, y) =⇒ Loves(x, y)).
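
The recursive definition of free variables can be sketched in Python as follows. This is only an illustration under assumptions that are not part of these notes: formulas are nested tuples, the connective and quantifier names ("not", "and", "forall", ...) are made up for the sketch, and a string is treated as a variable exactly when it starts with a lowercase letter.

```python
# Assumed encoding (hypothetical, for illustration only):
#   atomic formula:  ("Brother", "x", "John")  or  ("=", t1, t2)
#   negation:        ("not", phi)
#   binary:          ("and" | "or" | "implies" | "iff", phi, psi)
#   quantified:      ("forall" | "exists", "x", phi)
CONNECTIVES = {"and", "or", "implies", "iff"}
QUANTIFIERS = {"forall", "exists"}


def is_variable(t):
    # Assumed convention: variables start with a lowercase letter.
    return isinstance(t, str) and t[:1].islower()


def term_vars(t):
    """Variables occurring in a term."""
    if is_variable(t):
        return {t}
    if isinstance(t, tuple):  # compound term: (symbol, arg1, ..., argn)
        return set().union(*(term_vars(a) for a in t[1:]))
    return set()  # constant symbol


def free_vars(phi):
    """Set of variables that occur free in the wff phi."""
    op = phi[0]
    if op == "not":
        return free_vars(phi[1])
    if op in CONNECTIVES:
        return free_vars(phi[1]) | free_vars(phi[2])
    if op in QUANTIFIERS:
        # The quantified variable is bound inside the body.
        return free_vars(phi[2]) - {phi[1]}
    # Atomic formula: every variable occurring in it is free.
    return set().union(*(term_vars(t) for t in phi[1:]))


def is_sentence(phi):
    return not free_vars(phi)


friends = ("implies", ("Friend", "x", "y"), ("Loves", "x", "y"))
print(free_vars(("forall", "x", friends)))
```

With this encoding, the examples above can be checked mechanically: y is the only free variable of (∀x)(Friend(x, y) =⇒ Loves(x, y)), and quantifying y as well yields a sentence.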


FOL: Semantics

The meaning of FOL formulas is provided by interpretations.

An interpretation is a mapping between symbols of FOL and

objects, functions or relations in the world. More precisely:

• An interpretation maps each constant symbol to an object in

the world.

Example: In one particular interpretation the symbol John

might refer to John Major, the British PM. In another

interpretation it might refer to the evil King John, king of

England from 1199 to 1216.


FOL: Semantics (cont’d)

• An interpretation maps each predicate symbol to a relation in

the world.

Example: In one particular interpretation the symbol

Brother(., .) might refer to the relation of brotherhood. In a

world with three objects, King John, John Major, and Richard

the Lionheart, the relation of brotherhood is defined by the

following set of tuples:

{ 〈King John,Richard the Lionheart〉,

〈Richard the Lionheart,King John〉 }


FOL: Semantics (cont’d)

• An interpretation always maps the equality symbol to the

identity relation in the world. The identity relation is:

id = {〈o, o〉 : o is an object in the world}

• An interpretation maps each function symbol to a functional

relation (or function) in the world.

Example: In one particular interpretation the symbol

FatherOf(.) might refer to the relation of fatherhood.


FOL: Formal Semantics

An interpretation I is a function which makes the following

assignments to the symbols of FOL:

1. I assigns to the quantifier symbol ∀ a non-empty set |I| called

the universe or domain of I.

2. I assigns to each constant symbol c a member c^I of the

universe |I|.

3. I assigns to each n-place predicate symbol P an n-ary relation

P^I ⊆ |I|^n; i.e., P^I is a set of n-tuples of members of the

universe.

4. I assigns to each n-place function symbol f an n-ary function

f^I on |I|; i.e., f^I : |I|^n → |I|.


Satisfaction

Definition. A variable assignment is a function s : Vars → |I|

for some set of variables Vars and interpretation I.

Let φ be a wff of FOL, I an interpretation and s : Vars → |I| a

variable assignment.

We will define what it means for I to satisfy φ with variable

assignment s. This will be denoted by

|=I φ[s].

Intuitively |=I φ[s] if and only if the state of affairs denoted by φ is

true according to I (where any variable x which occurs in φ, stands

for s(x) wherever it occurs free).


Satisfaction (cont’d)

The formal definition of satisfaction proceeds as follows:

Terms. We define the function

s̄ : Terms → |I|

from the set of all terms Terms into the universe |I|. This function

is an extension of s, and maps each FOL term to the object in the

universe denoted by this term:

• For each variable x, s̄(x) = s(x).

• For each constant symbol c, s̄(c) = c^I.

• If t1, . . . , tn are terms and f is an n-place function symbol, then

s̄(f(t1, . . . , tn)) = f^I(s̄(t1), . . . , s̄(tn)).



Atomic formulas. The definition of satisfaction for atomic

formulas is as follows:

• For atomic formulas involving the equality symbol,

|=I t1 = t2[s] iff s̄(t1) is identical to s̄(t2), where s̄ is the extension of s to terms.

• For an n-place predicate symbol P,

|=I P(t1, . . . , tn)[s] iff 〈s̄(t1), . . . , s̄(tn)〉 ∈ P^I.



Other wffs.

• |=I ¬φ[s] iff ⊭I φ[s] (i.e., iff |=I φ[s] is not the case).

• |=I (φ ∧ ψ)[s] iff |=I φ[s] and |=I ψ[s].

• |=I (φ ∨ ψ)[s] iff |=I φ[s] or |=I ψ[s].

• |=I (φ =⇒ ψ)[s] iff ⊭I φ[s] or |=I ψ[s].

• |=I (φ ⇐⇒ ψ)[s] iff |=I φ[s] and |=I ψ[s], or ⊭I φ[s] and

⊭I ψ[s].



• |=I (∀x)φ [s] iff for all d ∈ |I|, we have |=I φ[s(x|d)].

The function s(x|d) is defined as follows:

s(x|d)(y) = s(y) if y ≠ x,

s(x|d)(y) = d if y = x.

• |=I (∃x)φ [s] iff there exists d ∈ |I| such that |=I φ[s(x|d)].
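
For a finite universe, the definition of satisfaction can be turned almost line by line into a recursive checker. The sketch below is an illustration only and rests on assumptions that are not in these notes: formulas are nested tuples, an interpretation is a Python dictionary with the hypothetical keys "universe", "constants", "functions" and "predicates", and variables are the strings that start with a lowercase letter. The example world is the three-object world used earlier for Brother.

```python
def is_variable(t):
    # Assumed convention: variables start with a lowercase letter.
    return isinstance(t, str) and t[:1].islower()


def eval_term(I, t, s):
    """The extension of the assignment s to all terms."""
    if isinstance(t, tuple):  # f(t1, ..., tn)
        args = tuple(eval_term(I, a, s) for a in t[1:])
        return I["functions"][t[0]][args]
    if is_variable(t):
        return s[t]
    return I["constants"][t]


def satisfies(I, phi, s):
    """True iff I satisfies phi with variable assignment s (finite universe)."""
    op = phi[0]
    if op == "=":
        return eval_term(I, phi[1], s) == eval_term(I, phi[2], s)
    if op == "not":
        return not satisfies(I, phi[1], s)
    if op == "and":
        return satisfies(I, phi[1], s) and satisfies(I, phi[2], s)
    if op == "or":
        return satisfies(I, phi[1], s) or satisfies(I, phi[2], s)
    if op == "implies":
        return (not satisfies(I, phi[1], s)) or satisfies(I, phi[2], s)
    if op == "iff":
        return satisfies(I, phi[1], s) == satisfies(I, phi[2], s)
    if op == "forall":  # {**s, x: d} plays the role of s(x|d)
        return all(satisfies(I, phi[2], {**s, phi[1]: d}) for d in I["universe"])
    if op == "exists":
        return any(satisfies(I, phi[2], {**s, phi[1]: d}) for d in I["universe"])
    # Atomic formula with a predicate symbol.
    args = tuple(eval_term(I, t, s) for t in phi[1:])
    return args in I["predicates"][op]


# Lowercase strings below are objects of the universe, not FOL symbols.
I = {
    "universe": {"king john", "john major", "richard"},
    "constants": {"John": "king john", "Richard": "richard"},
    "functions": {},
    "predicates": {
        "Brother": {("king john", "richard"), ("richard", "king john")},
    },
}

print(satisfies(I, ("exists", "x", ("Brother", "x", "John")), {}))
```

The quantifier cases enumerate the whole universe, so this approach only works when the universe is finite and small.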


Example: the WW in FOL

[Figure: the wumpus world as a 4×4 grid with rows and columns numbered 1 to 4. It shows the START square, the Gold, three PIT squares, and the squares marked Breeze and Stench.]


Example (cont’d)

If we want to formalize the WW in FOL, we can use the following

symbols:

• Constant symbols:

Agent,Wumpus,Gold,Breeze, Stench,Rm11, Rm12, . . . , Rm44

• Function symbols:

– The unary function symbol NorthOf to denote the unique

room which is north of the room denoted by the argument

of the function. For example, the room north of room 11 is

room 21.

– The unary function symbols SouthOf,WestOf,EastOf

with similar meanings.


Example (cont’d)

• Predicate symbols:

– The binary predicate Location will be used to denote the

location (i.e. room) of each object (agent, wumpus and

gold).

– The binary predicate Percept will be used to denote the

percept (i.e., breeze or stench) in each room.

– The unary predicate Bottomless will be used to denote that

a room contains a pit.


Example (cont’d)

Let us now provide an interpretation I of the above symbols which

corresponds to the previous picture:

• The universe of I is the objects we see in the picture:

|I| = {agent, wumpus, gold, breeze, stench, rm11, . . . , rm44}.

• I makes the following assignments to constant symbols:

Agent^I = agent, Wumpus^I = wumpus, Gold^I = gold,

Breeze^I = breeze, Stench^I = stench,

Rm11^I = rm11, . . . , Rm44^I = rm44


Example (cont’d)

• I assigns to the unary function symbol NorthOf the function

NorthOf^I : |I| → |I| which is defined as follows:

NorthOf^I(rm11) = rm21,

NorthOf^I(rm21) = rm31, . . . , NorthOf^I(rm34) = rm44

• I assigns to the unary function symbols

SouthOf, WestOf, EastOf the functions

SouthOf^I, WestOf^I, EastOf^I that are defined similarly to

NorthOf^I.


Example (cont’d)

• I assigns to the unary predicate symbol Bottomless the

following relation:

{〈rm13〉, 〈rm33〉, 〈rm44〉}

• I assigns to the binary predicate symbol Location the following

relation:

{〈agent, rm11〉, 〈wumpus, rm31〉, 〈gold, rm32〉}


Example (cont’d)

• I assigns to the binary predicate symbol Percept the following relation:

{〈rm12, breeze〉, 〈rm14, breeze〉, 〈rm21, stench〉, 〈rm23, breeze〉,

〈rm32, breeze〉, 〈rm32, stench〉, 〈rm34, breeze〉, 〈rm41, stench〉,

〈rm43, breeze〉}

Note: To describe interpretation I we used words like agent, breeze, etc.

which start with a lowercase letter. These are not symbols of our FOL

language; they are just English words referring to what is in the picture.

Instead, we could have drawn little pictures to describe the elements of the

universe of the interpretation.


Example (cont’d)

We now give examples of satisfaction:

• |=I x = y[s] for any variable assignment s which maps x and y

to identical objects of the universe (e.g.,

s(x) = s(y) = wumpus). Why?

Because if s(x) = s(y) = wumpus then s̄(x) = s(x) = wumpus

is identical to s̄(y) = s(y) = wumpus.

• |=I Agent = Agent[s] for any variable assignment s.

This is trivial.


Example (cont’d)

• |=I Rm21 = NorthOf(Rm11)[s] for any variable assignment s.

Why?

Because s̄(Rm21) = Rm21^I = rm21 and

s̄(NorthOf(Rm11)) = NorthOf^I(s̄(Rm11)) =

NorthOf^I(Rm11^I) = NorthOf^I(rm11) = rm21.

• |=I Rm21 = NorthOf(x)[s] for any variable assignment s such

that s(x) = rm11. Why?

Because s̄(Rm21) = Rm21^I = rm21 and

s̄(NorthOf(x)) = NorthOf^I(s̄(x)) =

NorthOf^I(s(x)) = NorthOf^I(rm11) = rm21.


Example (cont’d)

• |=I Bottomless(x)[s] for any variable assignment s such that

s(x) = rm13 or s(x) = rm33 or s(x) = rm44. Why?

Because if s(x) = rm13 then

〈s̄(x)〉 = 〈s(x)〉 = 〈rm13〉 ∈ Bottomless^I.

Similarly for the other cases.

• |=I Location(Agent, Rm11)[s] for any variable assignment s. Why?

Because

〈s̄(Agent), s̄(Rm11)〉 = 〈Agent^I, Rm11^I〉 = 〈agent, rm11〉 ∈ Location^I.


Example (cont’d)

• |=I ¬Location(Gold,Rm44)[s] for any variable assignment s. Why?

Because

〈s̄(Gold), s̄(Rm44)〉 = 〈Gold^I, Rm44^I〉 = 〈gold, rm44〉 ∉ Location^I

therefore ⊭I Location(Gold,Rm44)[s] for any variable assignment s.

• |=I Location(Gold,Rm32) ∨ Location(Gold,Rm44)[s] for any

variable assignment s. Why?

Because

〈s̄(Gold), s̄(Rm32)〉 = 〈Gold^I, Rm32^I〉 = 〈gold, rm32〉 ∈ Location^I

therefore |=I Location(Gold,Rm32)[s] for any variable assignment s.


Example (cont’d)

• |=I (∃x)Location(x,Rm11)[s] for any variable assignment s. Why?

Because for d = agent and s′ = s(x|agent) we have

〈s̄′(x), s̄′(Rm11)〉 = 〈agent, Rm11^I〉 = 〈agent, rm11〉 ∈ Location^I

thus |=I Location(x,Rm11)[s(x|agent)].

• ⊭I (∀x)Location(Wumpus, x)[s] for any variable assignment s.

Why?

Because for d = rm11 and s′ = s(x|rm11) we have

〈s̄′(Wumpus), s̄′(x)〉 = 〈Wumpus^I, rm11〉 =

〈wumpus, rm11〉 ∉ Location^I

thus ⊭I Location(Wumpus, x)[s(x|rm11)].



When we want to verify whether or not an interpretation satisfies a

wff φ with s, we do not really need all of the (infinite amount of)

information that s gives us. All that matters are the values of the

function s at the (finitely many) variables which occur free in φ. In

particular, if φ is a sentence, then s does not matter at all. This is

made formal by the following theorem.

Theorem. Let s1 and s2 be variable assignments from Vars into

|I| which agree at all variables (if any) which occur free in the wff

φ. Then

|=I φ[s1] iff |=I φ[s2].



The previous theorem has the following corollary.

Corollary. Let φ be a sentence and I an interpretation. Then,

either

(a) I satisfies φ with every variable assignment s : Vars → |I|, or

(b) I does not satisfy φ with any variable assignment.

The above corollary allows us to ignore variable assignments

whenever we talk about satisfaction of sentences. Thus if φ is a

sentence and I an interpretation we can just say that I satisfies

(or does not satisfy) φ.


Satisfiability

Definition. A formula φ is called satisfiable iff there exists an

interpretation I and variable assignment s such that |=I φ[s].

Otherwise, the formula is called unsatisfiable.

Examples: The formulas

Location(Wumpus,Rm31), Location(Agent, Rm11), (∃x)R(y, x)

are satisfiable. The following formulas are unsatisfiable:

P (x) ∧ ¬P (x), (∀x)P (x) ∧ ¬P (A)

Can you write an algorithm which discovers whether a given wff is

satisfiable?


Truth and Models

Definition. Let φ be a sentence and I an interpretation. If I

satisfies φ then we will say that φ is true in I or I is a model of φ.

Example: The interpretation I defined in the WW example is a

model of the following sentences:

Location(Wumpus,Rm31), Location(Agent, Rm11),

(∃x)Percept(x,Breeze)

Definition. An interpretation I is a model of a set of sentences

KB iff it is a model of every member of KB.


Entailment

Definition. Let KB be a set of wffs, and φ a wff. Then KB entails

φ, denoted by KB |= φ, iff for every interpretation I and every variable

assignment s : Vars → |I| such that I satisfies every member of KB

with s, I also satisfies φ with s.

Examples:

{ Happy(John), (∀x)(Happy(x) =⇒ Laughs(x)) } |= Laughs(John)

{WellPaid(John), ¬WellPaid(John) ∨ Happy(John) } |= Happy(John)

Can you give an algorithm that discovers whether a set of wffs entail a

wff?


Validity and Equivalence

Definition. A wff φ is valid iff for every interpretation I and

every variable assignment s : Vars → |I|, I satisfies φ with s.

Examples: The formulas

P (A) ∨ ¬P (A), P (A) =⇒ P (A), (∀x)P (x) =⇒ (∃x)P (x)

are valid.

Can you write an algorithm which discovers whether a given wff is

valid?

Definition. Two formulas φ and ψ will be called logically

equivalent, denoted by φ ≡ ψ, iff φ |= ψ and ψ |= φ.


Satisfiability, Equivalence and Validity

Theorem. φ |= ψ iff φ =⇒ ψ is valid. Proof?

Theorem. φ is unsatisfiable iff ¬φ is valid. Proof?

Theorem. φ ≡ ψ iff φ ⇐⇒ ψ is valid. Proof?


Some Important Logical Equivalences

Let φ and ψ be wffs. Then:

1. ¬(φ ∧ ψ) ≡ ¬φ ∨ ¬ψ

2. ¬(φ ∨ ψ) ≡ ¬φ ∧ ¬ψ

3. φ ∧ ψ ≡ ¬(¬φ ∨ ¬ψ)

4. φ ∨ ψ ≡ ¬(¬φ ∧ ¬ψ)

5. φ =⇒ ψ ≡ ¬φ ∨ ψ

6. φ ⇐⇒ ψ ≡ (φ =⇒ ψ) ∧ (ψ =⇒ φ)

Proofs?


Some Important Logical Equivalences (cont’d)

1. (∀x)φ ≡ ¬(∃x)¬φ

2. (∃x)φ ≡ ¬(∀x)¬φ

3. (∀x)¬φ ≡ ¬(∃x)φ

4. (∃x)¬φ ≡ ¬(∀x)φ

Proofs?


Some Important Logical Equivalences (cont’d)

1. (∃x)(φ ∨ ψ) ≡ (∃x)φ ∨ (∃x)ψ

2. (∃x)(φ ∧ ψ) |= (∃x)φ ∧ (∃x)ψ

3. (∀x)φ ∨ (∀x)ψ |= (∀x)(φ ∨ ψ)

4. (∀x)(φ ∧ ψ) ≡ (∀x)φ ∧ (∀x)ψ

Proofs?


An Exercise

Prove that

(∃x)(φ(x) ∧ ψ(x)) |= (∃x)φ(x) ∧ (∃x)ψ(x).

Proof: Let I be an interpretation such that

|=I (∃x)(φ(x) ∧ ψ(x)).

Then according to the definition of satisfaction for existential

statements, there exists a variable assignment s and d ∈ |I| such

that

|=I (φ(x) ∧ ψ(x))[s(x|d)].

Then according to the definition of satisfaction for conjunctive

statements, we have

|=I φ(x)[s(x|d)]


and

|=I ψ(x)[s(x|d)].

Now from the definition of satisfaction for existential statements

again, we have

|=I (∃x)φ(x)

and

|=I (∃x)ψ(x).

Now from the definition of satisfaction for conjunctive statements

we have:

|=I (∃x)φ(x) ∧ (∃x)ψ(x).

The proof is now finished.


Representing Knowledge Using FOL

Definition. In knowledge representation, a domain is a section

of the world about which we wish to express some knowledge.

• The domain of family relationships.

• The domain of sets.

• The wumpus domain.

• The domain of web resources (HTML pages, images, programs

etc. on the WWW)

• ...


Knowledge Engineering

The process of knowledge-base construction is called knowledge

engineering.

A knowledge engineer is someone who investigates a particular

domain, determines what concepts are important in that domain,

and creates a formal representation of the objects and relations in

that domain.

You will become knowledge engineers for some of the exercises!


Other Logics in Computer Science

FOL is certainly the most important logic in use today by

computer scientists. But there are others too:

• Second-order logic

• Modal logic (with operators such as “possible” and “certain”)

• Temporal logic (with operators such as “in the past”, etc.)

• Logics of knowledge and belief

• Logics for databases

• ....


Readings

Chapter 8 of AIMA: First-Order Logic

Other formal presentations of FOL can be found in:

1. M.R. Genesereth and N.J. Nilsson, “Logical Foundations of

Artificial Intelligence”, Morgan Kaufmann, 1987.

2. Any mathematical logic textbook. Most of the formal material

in these notes is from:

H.B. Enderton, “A Mathematical Introduction to Logic”,

Academic Press, 1972.


Inference in First-Order Logic

Inference (or reasoning) is the process of mechanically deriving

sentences entailed by other sentences.

We would like to find an inference mechanism i such that

KB |= α iff KB ⊢i α

for any set of FOL sentences KB and any sentence α.

If this inference mechanism can be implemented by a program then

it could form the basis of any knowledge-based agent!


A Brief History of Reasoning

450 B.C.  Stoics        propositional logic, inference (maybe)
322 B.C.  Aristotle     “syllogisms” (inference rules), quantifiers
1847      Boole         propositional logic (again)
1879      Frege         first-order logic
1922      Wittgenstein  proof by truth tables
1930      Gödel         ∃ complete algorithm for proofs in FOL
1930      Herbrand      complete algorithm for proofs in FOL (reduce to propositional)
1931      Gödel         ¬∃ complete algorithm for arithmetic proofs
1960      Davis/Putnam  “practical” algorithm for propositional logic
1965      Robinson      “practical” algorithm for FOL (resolution)


Inference rules for FOL

The following inference rules of PL are valid for FOL as well:

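• Modus Ponens: $\dfrac{\alpha,\quad \alpha \Rightarrow \beta}{\beta}$

• And-Elimination: $\dfrac{\alpha_1 \land \alpha_2 \land \dots \land \alpha_n}{\alpha_i}$

• And-Introduction: $\dfrac{\alpha_1,\ \alpha_2,\ \dots,\ \alpha_n}{\alpha_1 \land \alpha_2 \land \dots \land \alpha_n}$

• Or-Introduction: $\dfrac{\alpha_i}{\alpha_1 \lor \alpha_2 \lor \dots \lor \alpha_n}$

• Double-Negation Elimination: $\dfrac{\neg\neg\alpha}{\alpha}$

• Unit Resolution: $\dfrac{\alpha \lor \beta,\quad \neg\beta}{\alpha}$

• Resolution: $\dfrac{\alpha \lor \beta,\quad \neg\beta \lor \gamma}{\alpha \lor \gamma}$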


The Concept of Substitution

Definition. A substitution θ is a finite set of the form

{v1/t1, . . . , vn/tn} where

• each vi is a variable and each ti is a term distinct from vi,

• the variables v1, . . . , vn are distinct, and

• no variable vi occurs in any of the ti’s.

Each element ti is called a binding for vi. The variables with

bindings are called bound.

Definition. A substitution is called ground if the terms ti

contain no variables (i.e., they are ground terms).


The Concept of Substitution (cont’d)

The empty substitution will be denoted by {}.

Examples: The sets

{x/John, y/Mary} and {x/John, y/MotherOf(z)}

are substitutions.

The sets

{x/F (x)} and {x/G(y), y/F (x)}

are not substitutions.


Substitution (cont’d)

Definition. Let θ = {v1/t1, . . . , vn/tn} be a substitution and α

be any FOL term or formula without quantifiers. Then

SUBST (θ, α) is the expression obtained from α by replacing each

occurrence of the variable vi in α by the term ti (i = 1, . . . , n).

Example:

SUBST ({x/John, y/Mary}, Loves(x, y)) = Loves(John,Mary)

SUBST ({x/John, y/HouseOf(z)}, Likes(x, y)) =

Likes(John,HouseOf(z))

Note: SUBST ({}, α) = α for any FOL formula α.
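
The definitions of substitution and SUBST can be sketched in a few lines of Python. This is an illustrative sketch, not part of the original notes: it assumes that terms and atomic formulas are nested tuples (symbol, arg1, ..., argn), that a substitution is a dictionary from variables to terms, and that variables are exactly the strings starting with a lowercase letter. The helper names are hypothetical.

```python
def is_variable(t):
    # Assumed convention: variables start with a lowercase letter.
    return isinstance(t, str) and t[:1].islower()


def variables_of(e):
    """Variables occurring in a term or quantifier-free formula."""
    if is_variable(e):
        return {e}
    if isinstance(e, tuple):  # (symbol, arg1, ..., argn)
        return set().union(*(variables_of(a) for a in e[1:]))
    return set()


def is_substitution(theta):
    """Check the conditions of the definition: every key is a variable and
    no bound variable occurs in any of the binding terms."""
    bound = set(theta)
    return all(is_variable(v) for v in bound) and all(
        not (variables_of(t) & bound) for t in theta.values()
    )


def subst(theta, e):
    """SUBST(theta, e): replace each bound variable by its binding."""
    if is_variable(e):
        return theta.get(e, e)
    if isinstance(e, tuple):
        return (e[0],) + tuple(subst(theta, a) for a in e[1:])
    return e  # constant symbol


print(subst({"x": "John", "y": ("HouseOf", "z")}, ("Likes", "x", "y")))
```

Distinctness of the bound variables comes for free from using a dictionary, and the condition that ti differs from vi is implied by the check that no bound variable occurs in any binding.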


Inference rules for FOL

The three new inference rules are the following:

• Universal Elimination: For any sentence α, variable v and

ground term g:

$\dfrac{(\forall v)\,\alpha}{\mathrm{SUBST}(\{v/g\},\ \alpha)}$

Example: From (∀x)Likes(x, IceCream), we can use the

substitution {x/Ben} and infer Likes(Ben, IceCream).


Inference rules for FOL (cont’d)

• Existential Elimination: For any sentence α, variable v and

brand new constant symbol k:

$\dfrac{(\exists v)\,\alpha}{\mathrm{SUBST}(\{v/k\},\ \alpha)}$

Example: From (∃x)Likes(x, IceCream), we can infer

Likes(Somebody, IceCream) as long as Somebody is a brand

new constant that has not been used before.


Inference rules for FOL (cont’d)

• Existential Introduction: For any sentence α, variable v

that does not occur in α, and ground term g that does occur in

α:

$\dfrac{\alpha}{(\exists v)\,\mathrm{SUBST}(\{g/v\},\ \alpha)}$

Example: From Likes(John, IceCream), we can infer

(∃x)Likes(John, x).


An Example Proof

Let us consider the following text:

The law says that it is a crime for an American to sell

weapons to hostile nations. The country Nono, an enemy

of America, has some missiles, and all of its missiles were

sold to it by Colonel West, who is an American.

How can we formalize this text in FOL, and use the above inference

rules to conclude that West is a criminal?


Example: Formalization in FOL

• “... it is a crime for an American to sell weapons to hostile

nations”:

(∀x, y, z) (American(x) ∧ Weapon(y) ∧ Nation(z) ∧

Hostile(z) ∧ Sells(x, z, y) =⇒ Criminal(x))

• “Nono ... has some missiles”:

(∃x) (Owns(Nono, x) ∧ Missile(x))

• “All of its missiles were sold to it by Colonel West”:

(∀x) (Owns(Nono, x) ∧Missile(x) =⇒ Sells(West,Nono, x))


Example: Formalization in FOL (cont’d)

• Missiles are weapons:

(∀x) (Missile(x) =⇒ Weapon(x))

• An enemy of America is a “hostile nation”:

(∀x) (Enemy(x,America) =⇒ Hostile(x))

• “West, who is an American”: American(West)

• “The country Nono ...”: Nation(Nono)

• “Nono, an enemy of America ...”:

Enemy(Nono,America), Nation(America)


Example: Proof

• From (∃x) (Owns(Nono, x) ∧ Missile(x)) and Existential

Elimination:

Owns(Nono,M1) ∧ Missile(M1)

• From Owns(Nono,M1) ∧ Missile(M1) and And-Elimination:

Owns(Nono,M1), Missile(M1)

• From (∀x) (Missile(x) =⇒ Weapon(x)) and Universal

Elimination:

Missile(M1) =⇒ Weapon(M1)


Example: Proof (cont’d)

• From Missile(M1),Missile(M1) =⇒ Weapon(M1) and

Modus Ponens:

Weapon(M1)

• From

(∀x) (Owns(Nono, x) ∧Missile(x) =⇒ Sells(West,Nono, x))

and Universal Elimination:

Owns(Nono,M1) ∧Missile(M1) =⇒ Sells(West,Nono,M1)



• From Owns(Nono,M1) ∧ Missile(M1),

Owns(Nono,M1) ∧Missile(M1) =⇒ Sells(West,Nono,M1)

and Modus Ponens:

Sells(West,Nono,M1)

• From (∀x, y, z) (American(x) ∧ Weapon(y) ∧ Nation(z) ∧

Hostile(z) ∧ Sells(x, z, y) =⇒ Criminal(x)) and Universal

Elimination (three times):

American(West) ∧ Weapon(M1) ∧ Nation(Nono) ∧

Hostile(Nono) ∧ Sells(West,Nono,M1) =⇒ Criminal(West)



• From (∀x) (Enemy(x,America) =⇒ Hostile(x)) and

Universal Elimination:

Enemy(Nono,America) =⇒ Hostile(Nono)

• From Enemy(Nono,America),

Enemy(Nono,America) =⇒ Hostile(Nono)

and Modus Ponens:

Hostile(Nono)



• From

American(West), Weapon(M1), Nation(Nono),

Hostile(Nono), Sells(West,Nono,M1)

and And-Introduction:

American(West) ∧ Weapon(M1) ∧ Nation(Nono) ∧

Hostile(Nono) ∧ Sells(West,Nono,M1)



• From

American(West) ∧ Weapon(M1) ∧ Nation(Nono) ∧

Hostile(Nono) ∧ Sells(West,Nono,M1),

American(West) ∧ Weapon(M1) ∧ Nation(Nono) ∧

Hostile(Nono) ∧ Sells(West,Nono,M1) =⇒ Criminal(West)

and Modus Ponens:

Criminal(West)


Finding a Proof: a Search Problem

Finding a proof can be formalized as a search problem:

• Initial state: the initial KB.

• Operators: applicable inference rules

• Goal test: the KB contains the sentence we are trying to prove.

Then any search algorithm can be used to find a proof for a given

sentence. Unfortunately the search space is infinite!


Composition of Substitutions

Definition. Let θ1 = {u1/s1, . . . , um/sm} and

θ2 = {v1/t1, . . . , vn/tn} be substitutions such that no variable

bound in θ1 occurs anywhere in θ2. The composition

COMPOSE(θ1, θ2) of θ1 and θ2 is the substitution

{u1/SUBST (θ2, s1), . . . , um/SUBST (θ2, sm), v1/t1, . . . , vn/tn}.

Note: Other authors denote the composition of θ1 and θ2 by θ1θ2.


Examples of Composition

• Let θ1 = {x/y, z/G(w)} and θ2 = {y/A,w/D}. Then

COMPOSE(θ1, θ2) = {x/A, z/G(D), y/A,w/D}.

• Let θ1 = {x/y, z/G(w, y)} and θ2 = {y/v, w/D}. Then

COMPOSE(θ1, θ2) = {x/v, z/G(D, v), y/v, w/D}.

• Let θ1 = {x/y, z/G(w)} and θ2 = {y/x}. The composition is

not defined in this case because x is bound in θ1 and occurs in

θ2.

• Let θ1 = {x/F (y), z/y, w/D} and θ2 = {y/A, v/E}. Then

COMPOSE(θ1, θ2) = {x/F (A), z/A,w/D, y/A, v/E}.


Properties of Composition

Theorem. Let θ1, θ2 and θ3 be substitutions and φ a FOL

expression. Then

1. COMPOSE(θ1, {}) = COMPOSE({}, θ1) = θ1

2. SUBST (θ2, SUBST (θ1, φ)) = SUBST (COMPOSE(θ1, θ2), φ)

whenever the composition is defined.

3. COMPOSE(COMPOSE(θ1, θ2), θ3) =

COMPOSE(θ1, COMPOSE(θ2, θ3))

whenever the compositions are defined.
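
The definition of COMPOSE, including the side condition under which it is defined, can be sketched as follows. The encoding is an assumption made for this illustration only: substitutions are dictionaries, terms are nested tuples, variables are strings starting with a lowercase letter, and an undefined composition is reported by returning None.

```python
def is_variable(t):
    # Assumed convention: variables start with a lowercase letter.
    return isinstance(t, str) and t[:1].islower()


def variables_of(e):
    if is_variable(e):
        return {e}
    if isinstance(e, tuple):  # (symbol, arg1, ..., argn)
        return set().union(*(variables_of(a) for a in e[1:]))
    return set()


def subst(theta, e):
    if is_variable(e):
        return theta.get(e, e)
    if isinstance(e, tuple):
        return (e[0],) + tuple(subst(theta, a) for a in e[1:])
    return e


def compose(theta1, theta2):
    """COMPOSE(theta1, theta2), or None when the composition is not defined."""
    # All variables mentioned anywhere in theta2 (bound variables and bindings).
    in_theta2 = set(theta2)
    for t in theta2.values():
        in_theta2 |= variables_of(t)
    if in_theta2 & set(theta1):
        return None  # a variable bound in theta1 occurs in theta2
    result = {u: subst(theta2, s) for u, s in theta1.items()}
    result.update(theta2)
    return result


theta1 = {"x": "y", "z": ("G", "w")}
theta2 = {"y": "A", "w": "D"}
print(compose(theta1, theta2))
```

The examples of composition given above, and property 2 of the theorem on a sample expression, can be checked against this sketch.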


Unification

Definition. A set of expressions {φ1, . . . , φn} is unifiable if and

only if there is a substitution θ that makes the expressions

identical; i.e.,

SUBST (θ, φ1) = · · · = SUBST (θ, φn).

In such a case, θ is called a unifier for the set.

Examples:

• The set of expressions {P (A, y, z), P (x,B, z)} is unifiable. The

substitution {x/A, y/B, z/C} is a unifier. There are other

unifiers as well; e.g., {x/A, y/B, z/F (w)}, {x/A, y/B} etc.

• The set of expressions {P (F (x), A), P (y, F (w))} is not

unifiable.


Most General Unifiers

Definition. A most general unifier, or mgu, of a set of

expressions S is a unifier γ with the following property. For each

unifier σ of S, there exists a substitution θ such that

σ = COMPOSE(γ, θ).

Examples:

• A mgu of the set {P (A, y, z), P (x,B, z)} is {x/A, y/B}.

Notice that

{x/A, y/B, z/F (w)} = COMPOSE({x/A, y/B}, {z/F (w)}).

• A mgu of the set {P (F (x), z), P (y,A)} is {y/F (x), z/A}.


Most General Unifiers (cont’d)

Example: A mgu of P (x) and P (y) is {x/y}. Another mgu is

{y/x}.

A most general unifier is unique up to variable renaming. This is

why we usually speak of the most general unifier of a set of

expressions.

We will now present algorithm Unify which finds the most general

unifier of two input FOL expressions.


A Unification Algorithm

function Unify(x, y) returns the mgu of x and y

  if x = y then return {}
  if Variable(x) then return Unify-Var(x, y)
  if Variable(y) then return Unify-Var(y, x)
  if Constant(x) or Constant(y) then return failure
  if not(Length(x) = Length(y)) then return failure
  i ← 0; γ ← {}
  tag: if i > Length(x) then return γ
  σ ← Unify(Part(x, i), Part(y, i))
  if σ = failure then return failure
  γ ← COMPOSE(γ, σ)
  x ← SUBST(γ, x)
  y ← SUBST(γ, y)
  i ← i + 1
  goto tag


A Unification Algorithm (cont’d)

function Unify-Var(x, y) returns a substitution

if x occurs in y then return failure

return { x/y }



• Unify was first presented by Robinson in 1965.

• You should use the above algorithm to compute the mgu whenever

the given expressions are complex.

• The inputs of Unify can be constants, variables, terms or atomic

formulas.

• The length of a term or an atomic formula is the number of its

arguments.

• The top-level function or relation symbol in a term or atomic

formula is its 0-th part, and the arguments are the other parts.

• The condition in the if-statement in function Unify-Var is called

the occurs-check. It ensures that terms such as z and F (z) do

not unify.
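
A compact Python rendering of the same idea is sketched below. It is not a transcription of the pseudocode: the goto loop is replaced by a for loop over the arguments, failure is represented by None, and the encoding (nested tuples for terms and atomic formulas, dictionaries for substitutions, lowercase strings for variables) is an assumption made for this illustration.

```python
def is_variable(t):
    # Assumed convention: variables start with a lowercase letter.
    return isinstance(t, str) and t[:1].islower()


def variables_of(e):
    if is_variable(e):
        return {e}
    if isinstance(e, tuple):  # (symbol, arg1, ..., argn)
        return set().union(*(variables_of(a) for a in e[1:]))
    return set()


def subst(theta, e):
    if is_variable(e):
        return theta.get(e, e)
    if isinstance(e, tuple):
        return (e[0],) + tuple(subst(theta, a) for a in e[1:])
    return e


def unify_var(v, t):
    if v in variables_of(t):  # occurs-check
        return None
    return {v: t}


def unify(x, y):
    """Return an mgu of x and y as a dict, or None if they do not unify."""
    if x == y:
        return {}
    if is_variable(x):
        return unify_var(x, y)
    if is_variable(y):
        return unify_var(y, x)
    if not (isinstance(x, tuple) and isinstance(y, tuple)):
        return None  # two different symbols, or a constant against a compound
    if x[0] != y[0] or len(x) != len(y):
        return None  # different top-level symbol or number of arguments
    gamma = {}
    for a, b in zip(x[1:], y[1:]):
        # Apply the bindings found so far before unifying the next pair.
        sigma = unify(subst(gamma, a), subst(gamma, b))
        if sigma is None:
            return None
        # COMPOSE(gamma, sigma)
        gamma = {**{v: subst(sigma, t) for v, t in gamma.items()}, **sigma}
    return gamma


print(unify(("P", "A", "y", "z"), ("P", "x", "B", "z")))
```

Like the pseudocode, this sketch copies terms when it applies substitutions, so it shares the exponential worst case of the simple algorithm.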



The worst-case time complexity of Unify is exponential in the size

of the input expressions.

Example: If we unify the following two terms

H(x1, x2, . . . , xn, F (y0, y0), . . . , F (yn−1, yn−1), yn)

H(F(x0, x0), F(x1, x1), . . . , F(xn−1, xn−1), y1, . . . , yn, xn)

each xi and yi will be bound to a term with 2^(i+1) − 1 symbols.

The problem is that the mgu of these two terms contains many

duplicate copies of the same subterms.

There are better (linear time!) algorithms for unification. The

main idea in these algorithms is to use good data structures for

representing FOL expressions and to apply substitutions carefully.
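
The blow-up can be observed directly by writing out the term that xi is bound to. The small sketch below is an illustration under an assumed tuple encoding of terms. It builds F(t, t) repeatedly starting from x0 and counts symbols in the fully written-out term.

```python
def expanded(i):
    """The term bound to x_i once all bindings are written out."""
    t = "x0"
    for _ in range(i):
        t = ("F", t, t)  # both arguments refer to the same Python object
    return t


def symbol_count(t):
    """Number of symbols in the written-out term (its tree size)."""
    if isinstance(t, tuple):
        return 1 + sum(symbol_count(a) for a in t[1:])
    return 1


for i in range(5):
    print(i, symbol_count(expanded(i)))
```

Note that `expanded(i)` creates only i tuples, because the two arguments of each F share one object. The written-out size is exponential while the shared (graph) representation stays linear, which is the kind of data-structure choice the faster algorithms rely on.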


Generalized Modus Ponens (GMP)

For atomic formulas pi, p′i and q, where there is a substitution θ

such that SUBST (θ, p′i) = SUBST (θ, pi), for all i:

$\dfrac{p'_1,\ p'_2,\ \dots,\ p'_n,\quad (p_1 \land p_2 \land \dots \land p_n \Rightarrow q)}{\mathrm{SUBST}(\theta,\ q)}$

Example: From Missile(M1), Owns(Nono,M1) and

Missile(x) ∧ Owns(Nono, x) =⇒ Sells(West,Nono, x)

we can infer Sells(West,Nono,M1).

The substitution θ in this case is {x/M1}.


Generalized Modus Ponens (cont’d)

Comments:

• GMP does in one step what would otherwise require an

And-Introduction, Universal Elimination, and Modus-Ponens.

• GMP applies only to Horn formulas.


Literals

Definition. A literal is an atomic formula or a negated atomic

formula. In the first case we have a positive literal and in the

second a negative literal.

Examples:

Drives(John,BMW ), ¬Drives(John,BMW ), Drives(x,BMW ),

Loves(Mary, FatherOf(Mary)), ¬P (x, F (G(x)))


Horn Formulas

Definition. A FOL formula will be called Horn if it is a

disjunction of literals of which at most one is positive.

In other words, a Horn formula is in one of the following three

forms:

q

¬p1 ∨ ¬p2 ∨ . . . ∨ ¬pn ∨ q (or p1 ∧ p2 ∧ . . . ∧ pn =⇒ q)

¬p1 ∨ ¬p2 ∨ . . . ∨ ¬pn

where p1, . . . , pn, q are atomic formulas.


Horn Formulas

Horn formulas of the first kind are also called facts.

Horn formulas of the second kind are also called rules. In this case

q is called the head of the rule and p1 ∧ p2 ∧ . . . ∧ pn is

called the body of the rule.

Horn formulas of the last kind can be used as queries or integrity

constraints in logic programming systems.

In a Horn formula the (free) variables of p1, . . . , pn, q are

interpreted as being universally quantified in front of the formula.

If a Horn formula has exactly one positive literal, it is called

definite.


Examples

The formulas

Drives(John,BMW )

Drives(x,BMW )

Person(x) =⇒ Animal(x)

Person(x) ∧ Knows(John, x) =⇒ Loves(John, x)

¬Person(x) ∨ ¬Knows(John, x)

are Horn formulas. The first two are facts. The next two are

rules. The last one can be used as a query or integrity constraint.

The formula

Drives(John,BMW ) ∨ Drives(John, Porsche)

is not Horn.
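
The classification of Horn formulas can be sketched as a small Python function. This is an illustration only: a formula is assumed to be given as a list of literals (a disjunction), where a positive literal is an atom such as ("Drives", "John", "BMW") and a negative literal is ("not", atom). The function name and the returned labels are hypothetical.

```python
def classify_horn(literals):
    """Classify a disjunction of literals as a fact, rule, query or non-Horn."""
    positives = [lit for lit in literals if lit[0] != "not"]
    if len(positives) > 1:
        return "not Horn"
    if not positives:
        return "query"  # no positive literal: query or integrity constraint
    # Exactly one positive literal: a definite Horn formula.
    return "fact" if len(literals) == 1 else "rule"


examples = [
    [("Drives", "John", "BMW")],
    [("not", ("Person", "x")), ("Animal", "x")],  # Person(x) => Animal(x)
    [("not", ("Person", "x")), ("not", ("Knows", "John", "x"))],
    [("Drives", "John", "BMW"), ("Drives", "John", "Porsche")],
]
for clause in examples:
    print(classify_horn(clause))
```

A rule p1 ∧ . . . ∧ pn =⇒ q is passed in its disjunctive form ¬p1 ∨ . . . ∨ ¬pn ∨ q.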


Horn Formulas (cont’d)

Definition. A KB is in Horn form iff it consists only of Horn

formulas.

Can we solve our crime problem using GMP? Yes, if it is possible

to transform our KB into Horn form.

Later on, we will give a general algorithm for this transformation.


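Example: Horn Formulas

• “... it is a crime for an American to sell weapons to hostile

nations”:

American(x) ∧ Weapon(y) ∧ Nation(z) ∧

Hostile(z) ∧ Sells(x, z, y) =⇒ Criminal(x)

• “Nono ... has some missiles”:

Owns(Nono,M1), Missile(M1)

• “All of its missiles were sold to it by Colonel West”:

Owns(Nono, x) ∧ Missile(x) =⇒ Sells(West,Nono, x)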


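Example (cont’d)

• Missiles are weapons:

Missile(x) =⇒ Weapon(x)

• An enemy of America is a “hostile nation”:

Enemy(x,America) =⇒ Hostile(x)

• “West, who is an American”: American(West)

• “The country Nono ...”: Nation(Nono)

• “Nono, an enemy of America ...”:

Enemy(Nono,America), Nation(America)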

Example: Proof

• From Missile(M1) and Missile(x) =⇒ Weapon(x), using

GMP:

Weapon(M1)

• From Enemy(x,America) =⇒ Hostile(x) and

Enemy(Nono,America), using GMP:

Hostile(Nono)

• From Owns(Nono,M1), Missile(M1) and

Owns(Nono, x) ∧ Missile(x) =⇒ Sells(West,Nono, x),

using GMP:

Sells(West,Nono,M1)


Example: Proof

• From

American(West),Weapon(M1), Nation(Nono), Hostile(Nono),

Sells(West,Nono,M1) and

American(x) ∧ Weapon(y) ∧ Nation(z) ∧

Hostile(z) ∧ Sells(x, z, y) =⇒ Criminal(x),

using GMP:

Criminal(West)

This proof illustrates that reasoning using GMP is natural, and

easy to follow. We will now present two important reasoning

algorithms based on GMP. Because they use GMP these algorithms

are applicable only to KBs in Horn form.


Standardization of Variables

Example:

Likes(x,Mary)

Likes(John, x) =⇒ GreetsCheerfully(John, x)

We would expect the above KB to entail:

GreetsCheerfully(John,Mary)

But GMP cannot be used to infer it because Likes(x,Mary) and

Likes(John, x) do not unify!


Standardization of Variables (cont’d)

The previous problem can be solved by standardization of variables:

1. No variable should occur in more than one formula in the

initial KB.

2. Whenever we apply GMP we rename the variables in the

formulas involved. The new names must not include any

variable names already in the KB.
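
One possible renaming policy is sketched below: every call appends a fresh number to the variables of the formula it is given. This is an assumption made for illustration and not the policy of any particular system. Formulas are encoded as nested tuples, variables are strings starting with a lowercase letter, and the helper names are hypothetical.

```python
import itertools

_fresh = itertools.count(1)  # source of fresh suffixes


def is_variable(t):
    # Assumed convention: variables start with a lowercase letter.
    return isinstance(t, str) and t[:1].islower()


def variables_of(e):
    if is_variable(e):
        return {e}
    if isinstance(e, tuple):
        return set().union(*(variables_of(a) for a in e[1:]))
    return set()


def standardize_apart(formula):
    """Return a copy of formula whose variables carry a fresh suffix."""
    n = next(_fresh)
    renaming = {}

    def walk(e):
        if is_variable(e):
            # The same variable gets the same new name within one formula.
            return renaming.setdefault(e, f"{e}_{n}")
        if isinstance(e, tuple):  # keep the symbol, rename the arguments
            return (e[0],) + tuple(walk(a) for a in e[1:])
        return e  # constant symbol

    return walk(formula)


fact = standardize_apart(("Likes", "x", "Mary"))
rule = standardize_apart(
    ("implies", ("Likes", "John", "x"), ("GreetsCheerfully", "John", "x"))
)
print(fact)
print(rule)
```

After renaming, the fact and the premise of the rule no longer share the variable x, so Likes(x, Mary) and Likes(John, x) stop blocking each other during unification.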


Forward Chaining

Forward chaining is an inference method that starts with

sentences in the knowledge base, and generates new conclusions

using GMP. These conclusions in turn can allow more inferences to

be made.

The following algorithm triggers forward chaining after the

addition of a new fact p in KB:

procedure Forward-Chain(KB, p)

  if there is a sentence in KB that is a renaming of p then return
  Add p to KB
  for each (p1 ∧ . . . ∧ pn =⇒ q) in KB such that
      for some i, Unify(pi, p) = θ succeeds do
    Find-And-Infer(KB, [p1, . . . , pi−1, pi+1, . . . , pn], q, θ)
  end


Forward Chaining (cont’d)

procedure Find-And-Infer(KB, premises, conclusion, θ)
    if premises = [ ] then
        Forward-Chain(KB, SUBST(θ, conclusion))
    else for each p′ in KB such that
            Unify(p′, SUBST(θ, First(premises))) = θ2 do
        Find-And-Infer(KB, Rest(premises), conclusion, COMPOSE(θ, θ2))
    end
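The two procedures can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not a reference implementation: atoms are tuples, strings starting with a lowercase letter are variables, facts are assumed to be ground, the "renaming of p" test is simplified to an exact-match test, and the names `KB`, `rename_rule`, `forward_chain` and `find_and_infer` are hypothetical. Passing the current substitution into `unify` plays the role of COMPOSE.

```python
import itertools

_fresh = itertools.count(1)


def is_var(t):
    """Assumed convention: lowercase-initial strings are variables."""
    return isinstance(t, str) and t[:1].islower()


def subst(theta, t):
    if is_var(t):
        return subst(theta, theta[t]) if t in theta else t
    if isinstance(t, tuple):
        return (t[0],) + tuple(subst(theta, a) for a in t[1:])
    return t


def unify(a, b, theta):
    """Extend theta to an mgu of a and b, or return None (occurs check omitted)."""
    a, b = subst(theta, a), subst(theta, b)
    if a == b:
        return theta
    if is_var(a):
        return {**theta, a: b}
    if is_var(b):
        return {**theta, b: a}
    if isinstance(a, tuple) and isinstance(b, tuple) and a[0] == b[0] and len(a) == len(b):
        for x, y in zip(a[1:], b[1:]):
            theta = unify(x, y, theta)
            if theta is None:
                return None
        return theta
    return None


def rename_rule(premises, conclusion):
    """Standardize a rule apart by giving its variables a fresh suffix."""
    n = next(_fresh)

    def go(t):
        if is_var(t):
            return f"{t}{n}"
        if isinstance(t, tuple):
            return (t[0],) + tuple(go(a) for a in t[1:])
        return t

    return [go(p) for p in premises], go(conclusion)


class KB:
    def __init__(self, facts, rules):
        self.facts = list(facts)  # ground atomic sentences
        self.rules = list(rules)  # (premises, conclusion) pairs


def forward_chain(kb, p):
    if p in kb.facts:  # simplification of "a renaming of p is in KB"
        return
    kb.facts.append(p)
    for premises, conclusion in kb.rules:
        premises, conclusion = rename_rule(premises, conclusion)
        for i, p_i in enumerate(premises):
            theta = unify(p_i, p, {})
            if theta is not None:
                find_and_infer(kb, premises[:i] + premises[i + 1:], conclusion, theta)


def find_and_infer(kb, premises, conclusion, theta):
    if not premises:
        forward_chain(kb, subst(theta, conclusion))
        return
    for fact in list(kb.facts):  # snapshot: recursive calls may add facts
        theta2 = unify(fact, premises[0], theta)
        if theta2 is not None:
            find_and_infer(kb, premises[1:], conclusion, theta2)


kb = KB(
    facts=[("Missile", "M1")],
    rules=[([("Owns", "Nono", "x"), ("Missile", "x")], ("Sells", "West", "Nono", "x"))],
)
forward_chain(kb, ("Owns", "Nono", "M1"))
print(kb.facts)
```

Like the pseudocode, this sketch does not terminate on knowledge bases such as Q(x) =⇒ Q(F(x)), where each new fact triggers another inference.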


Forward Chaining

Comments:

• Forward-chaining is a data-driven or data-directed

procedure.

• Forward chaining needs a policy for standardization of

variables like the one we discussed for GMP. This policy is

implicit in the previous algorithm.


Forward Chaining: Example 1

Let KB be

Missile(M1)

Owns(Nono, x) ∧ Missile(x) =⇒ Sells(West,Nono, x)

and p be Owns(Nono,M1).

Forward-Chain(KB, p) will work as follows:

• Forward-Chain adds Owns(Nono,M1) to KB.

• Owns(Nono,M1) unifies with the first premise of the rule

Owns(Nono, x1) ∧ Missile(x1) =⇒ Sells(West,Nono, x1)

with mgu θ = {x1/M1} (note the standardization of variables!).

Find-And-Infer(KB, [Missile(M1)], Sells(West,Nono, x1), {x1/M1}) is

called now.


Example 1 (cont’d)

• First([Missile(M1)])=Missile(M1) matches with

Missile(M1) with mgu θ2 = {} so

Find-And-Infer(KB, [ ], Sells(West,Nono, x1), {x1/M1}) is

called.

• In this call of Find-And-Infer we have premises = [ ] thus

Forward-Chain(KB,Sells(West,Nono,M1)) is called.

• Forward-Chain adds Sells(West,Nono,M1) to the KB. No

unifications are possible now, thus the procedure returns.


Example 2

Let KB be

Q(x) =⇒ Q(F (x))

and p be Q(A).

Forward-Chain(KB, p) will work as follows:

• Forward-Chain adds Q(A) to KB.

• Q(A) unifies with the premise of the rule

Q(x1) =⇒ Q(F (x1))

with mgu θ = {x1/A} (note the standardization of variables!).

Find-And-Infer(KB, [ ], Q(F (x1)), {x1/A}) is called now.

• In this call of Find-And-Infer we have premises = [ ] thus

Forward-Chain(KB,Q(F (A))) is called.

• Forward-Chain adds Q(F (A)) to the KB. Q(F (A)) unifies

with the premise of the rule

Q(x2) =⇒ Q(F (x2))

with mgu θ = {x2/F (A)} (note the standardization of variables!).

At this point the algorithm goes into an infinite loop! It will keep

on adding to KB the following facts:

Q(F (F (A))), Q(F (F (F (A)))), . . .


Backward Chaining

Backward chaining is an inference method which starts with

something that we want to prove, finds implications that would

allow us to conclude it, and then attempts to establish their

premises in turn.

The following algorithm Back-Chain takes as input a knowledge

base KB and an atomic formula q (with free variables), and

returns a set of substitutions θ such that SUBST (θ, q) can be

inferred, using GMP, from the formulas in KB.

function Back-Chain(KB, q) returns a set of substitutions

Back-Chain-List(KB, [q], {})


Backward Chaining (cont’d)

function Back-Chain-List(KB, qlist, θ) returns a set of substitutions
    inputs: KB, a knowledge base,
            qlist, a list of conjuncts forming a query (θ already applied)
            θ, the current substitution
    local: answers, a set of substitutions, initially empty

    if qlist is empty then return {θ}
    q ← First(qlist)
    for each q′i in KB such that θi ← Unify(q, q′i) succeeds do
        Add COMPOSE(θ, θi) to answers
    end
    for each sentence (p1 ∧ . . . ∧ pn =⇒ q′i) in KB
            such that θi ← Unify(q, q′i) succeeds do
        answers ← answers ∪
            Back-Chain-List(KB, SUBST(θi, [p1, . . . , pn]), COMPOSE(θ, θi))
    end
    return the union of Back-Chain-List(KB, SUBST(θ, Rest(qlist)), θ)
        for each θ ∈ answers
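A compact Python rendering of Back-Chain-List is sketched below. It is an illustration under assumptions, not the definitive algorithm: atoms are tuples, lowercase-initial strings are variables, facts are assumed ground, and the function names are hypothetical. Substitutions are kept in unresolved ("triangular") form and applied lazily inside `unify`, which stands in for the explicit SUBST and COMPOSE calls of the pseudocode.

```python
import itertools

_fresh = itertools.count(1)


def is_var(t):
    """Assumed convention: lowercase-initial strings are variables."""
    return isinstance(t, str) and t[:1].islower()


def subst(theta, t):
    if is_var(t):
        return subst(theta, theta[t]) if t in theta else t
    if isinstance(t, tuple):
        return (t[0],) + tuple(subst(theta, a) for a in t[1:])
    return t


def unify(a, b, theta):
    """Extend theta to an mgu of a and b, or return None (occurs check omitted)."""
    a, b = subst(theta, a), subst(theta, b)
    if a == b:
        return theta
    if is_var(a):
        return {**theta, a: b}
    if is_var(b):
        return {**theta, b: a}
    if isinstance(a, tuple) and isinstance(b, tuple) and a[0] == b[0] and len(a) == len(b):
        for x, y in zip(a[1:], b[1:]):
            theta = unify(x, y, theta)
            if theta is None:
                return None
        return theta
    return None


def rename_rule(premises, conclusion):
    """Standardize a rule apart by giving its variables a fresh suffix."""
    n = next(_fresh)

    def go(t):
        if is_var(t):
            return f"{t}{n}"
        if isinstance(t, tuple):
            return (t[0],) + tuple(go(a) for a in t[1:])
        return t

    return [go(p) for p in premises], go(conclusion)


def variables(t):
    if is_var(t):
        return {t}
    if isinstance(t, tuple):
        return set().union(*[variables(a) for a in t[1:]])
    return set()


def back_chain_list(facts, rules, qlist, theta):
    if not qlist:
        return [theta]
    q, answers = qlist[0], []
    for fact in facts:  # first loop of the pseudocode: match atomic sentences
        theta_i = unify(q, fact, theta)
        if theta_i is not None:
            answers.append(theta_i)
    for premises, conclusion in rules:  # second loop: match rule conclusions
        premises, conclusion = rename_rule(premises, conclusion)
        theta_i = unify(q, conclusion, theta)
        if theta_i is not None:
            answers.extend(back_chain_list(facts, rules, premises, theta_i))
    results = []
    for ans in answers:  # continue with the remaining conjuncts
        results.extend(back_chain_list(facts, rules, qlist[1:], ans))
    return results


def back_chain(facts, rules, q):
    """Answers to query q, keeping only the bindings of q's own variables."""
    return [{v: subst(th, v) for v in variables(q)}
            for th in back_chain_list(facts, rules, [q], {})]


facts = [("Missile", "M1"), ("Owns", "Nono", "M1")]
rules = [([("Owns", "Nono", "x"), ("Missile", "x")], ("Sells", "West", "Nono", "x"))]
print(back_chain(facts, rules, ("Sells", "West", "Nono", "y")))
```

On a rule such as p(3) =⇒ p(3) this sketch recurses without bound, just like the pseudocode, and Python eventually raises a RecursionError.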



Comments:

• Backward chaining needs a policy for standardization of

variables like the one we discussed for GMP. This policy is

implicit in the previous algorithm.

• The call Back-Chain(KB, q) can be used to find all answers to

a query q posed to any knowledge base KB.

The answer to a query q is a set of substitutions which is

formed by considering all substitutions returned by

Back-Chain, and keeping only the bindings for the variables

of q.

All logic programming languages (e.g., Prolog) and all

deductive database systems are based on this observation.


Backward Chaining: Example 1

Let KB be

Q(A), Q(B)

and let the query q be Q(x).

Back-Chain(KB, q) will work as follows:

• Back-Chain calls Back-Chain-List with arguments

KB, [Q(x)] and {}.

• Q(x) unifies with Q(A) with mgu {x/A}, and Q(B) with mgu

{x/B}. Thus variable answers is assigned the set

{{x/A}, {x/B}}.



• The last line of Back-Chain-List generates the calls

Back-Chain-List(KB, [ ], {x/A}) and

Back-Chain-List(KB, [ ], {x/B}).

The first of the above calls returns {{x/A}} and the second

returns {{x/B}}. Thus finally the function returns the result

{{x/A}, {x/B}}.

This list of substitutions is the answer to query Q(x).


Example 2

Let KB be

Q(A), Q(B), R(C), R(D)

and qlist be [Q(x), R(y)].

Then Back-Chain-List(KB, qlist, {}) will work as follows:

• Q(x) unifies with Q(A) with mgu {x/A}, and Q(B) with mgu

{x/B}. Thus variable answers is assigned the set

{{x/A}, {x/B}}.

• The last line of Back-Chain-List generates the calls

Back-Chain-List(KB, [R(y)], {x/A}) and

Back-Chain-List(KB, [R(y)], {x/B}).

The first of the above calls returns

{{x/A, y/C}, {x/A, y/D}}

and the second returns

{{x/B, y/C}, {x/B, y/D}}.

Thus finally the function returns the result

{{x/A, y/C}, {x/A, y/D}, {x/B, y/C}, {x/B, y/D}}.



Let us now see why Back-Chain-List(KB, [R(y)], {x/A}) returns

{{x/A, y/C}, {x/A, y/D}}.

We can deal similarly with the call

Back-Chain-List(KB, [R(y)], {x/B})

• R(y) unifies with R(C) with mgu {y/C}, and R(D) with

mgu {y/D}. Thus variable answers is assigned the set

{{x/A, y/C}, {x/A, y/D}}.

• The last line of Back-Chain-List generates the calls

Back-Chain-List(KB, [ ], {x/A, y/C}) and

Back-Chain-List(KB, [ ], {x/A, y/D}).

The first of the above calls returns {{x/A, y/C}} and the

second returns {{x/A, y/D}}. Thus finally the function returns

the result

{{x/A, y/C}, {x/A, y/D}}.


Example 3

Let KB be

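Missile(M1), Owns(Nono,M1),

Owns(Nono, x) ∧ Missile(x) =⇒ Sells(West,Nono, x)

and the query q be Sells(West,Nono, y).

Back-Chain(KB, q) will work as follows:

• Back-Chain calls Back-Chain-List with arguments

KB, [Sells(West,Nono, y)] and {}.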

• Sells(West,Nono, y) does not unify with any atomic formula in

KB. However it unifies with the conclusion of the rule

Owns(Nono, x1) ∧ Missile(x1) =⇒ Sells(West,Nono, x1)

with substitution θi = {y/x1} (note the standardization of

variables!).



• Variable answers is assigned its old value (empty set) union

the value returned by

Back-Chain-List(KB, [Owns(Nono, x1),Missile(x1)], {y/x1}).

This call returns {{y/M1, x1/M1}}.

• The last line of Back-Chain-List calls Back-Chain-List

again with arguments KB, [ ] and {y/M1, x1/M1}. This call

returns {{y/M1, x1/M1}} therefore the final result is

{{y/M1, x1/M1}}.

Thus the answer to the query Sells(West,Nono, y) is

{{y/M1}}



Let us now see why

Back-Chain-List(KB, [Owns(Nono, x1),Missile(x1)], {y/x1})

returns {{y/M1, x1/M1}}.

• Owns(Nono, x1) unifies with Owns(Nono,M1) with

substitution θi = {x1/M1}. Thus

COMPOSE({y/x1}, {x1/M1}) = {y/M1, x1/M1} is added to

answers (currently empty).

• The last line of Back-Chain-List calls Back-Chain-List

again with arguments KB, [Missile(M1)] and {y/M1, x1/M1}.

This call returns {{y/M1, x1/M1}} as we can see easily. Thus

this is also the result of

Back-Chain-List(KB, [Owns(Nono, x1),Missile(x1)], {y/x1}).


Example 4

Let KB be

p(3) =⇒ p(3)

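and the query q be p(x).

Back-Chain(KB, q) will work as follows:

• Back-Chain calls Back-Chain-List with arguments

KB, [p(x)] and {}.

• p(x) unifies with the conclusion of the rule

p(3) =⇒ p(3)

with mgu θi = {x/3}. Then Back-Chain-List is called again

with arguments KB, [p(3)] and {x/3}.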



• p(3) unifies with the conclusion of the rule

p(3) =⇒ p(3)

with mgu θi = {} (both atoms are ground). Then Back-Chain-List is called again

with arguments KB, [p(3)] and {x/3}, exactly as in the previous step. At this point the

algorithm goes into an infinite loop!

This example shows that “bad” knowledge bases can make

algorithm Back-Chain go into an infinite loop.


Proof Trees for Backward Chaining

Example: (slightly modified this time!)


• “... it is a crime for an American to sell weapons to hostile nations”:

American(x) ∧ Weapon(y) ∧

Sells(x, y, z) ∧ Hostile(z) =⇒ Criminal(x)


Missile(x) =⇒ Weapon(x)




Proof Trees (cont’d)


Missile(x) ∧ Owns(Nono, x) =⇒ Sells(West,Nono, x)


Enemy(x,America) =⇒ Hostile(x)



Enemy(Nono,America)


Proof Trees (cont’d)

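[Figure: proof tree constructed by backward chaining for the query Criminal(West). Each node is a goal; the substitution found when the goal is matched is shown on the right.]

Criminal(West)
    American(West)                { }
    Weapon(y)
        Missile(y)                {y/M1}
    Sells(West,M1,z)              {z/Nono}
        Missile(M1)               { }
        Owns(Nono,M1)             { }
    Hostile(Nono)
        Enemy(Nono,America)       { }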


Soundness and Completeness of GMP

An inference rule i is called sound if it derives only sentences that

are entailed. In other words, if KB ⊢i α then KB |= α.

An inference mechanism is called complete if it derives all the

sentences that are entailed. In other words, if KB |= α then

KB ⊢i α.

Is GMP a sound and complete inference rule for FOL?


Soundness and Completeness of GMP (cont’d)

Theorem. GMP is a sound inference rule. Proof?

Example:

(∀x) P (x) =⇒ Q(x)

(∀x) P (x) ∨ R(x)

(∀x) Q(x) =⇒ S(x)

(∀x) R(x) =⇒ S(x)

The above KB entails S(A) but GMP will not be able to infer it.

Thus GMP is an incomplete inference rule for FOL.


Readings

• AIMA, Chapter 9.


Sound and Complete Inference Rules in FOL

An inference procedure i is called sound if

KB |= α whenever KB ⊢i α

An inference procedure i is called complete if

KB ⊢i α whenever KB |= α

Generalised Modus-Ponens (equivalently, forward or backward

chaining) is sound and complete for Horn KBs but incomplete

for general first-order logic.


Example

Let us consider the following formulas:

PhD(x) =⇒ HighlyQualified(x)

¬PhD(x) =⇒ EarlyEarnings(x)

HighlyQualified(x) =⇒ Rich(x)

EarlyEarnings(x) =⇒ Rich(x)

From the above we should be able to infer Rich(Me), but GMP

won’t do it!

Is there a complete inference procedure for FOL?


The Resolution Inference Rule

Basic propositional version:

    α ∨ β,   ¬β ∨ γ
    -----------------
         α ∨ γ

or equivalently

    ¬α =⇒ β,   β =⇒ γ
    -------------------
         ¬α =⇒ γ


The Resolution Inference Rule - FOL version

    p1 ∨ . . . ∨ pj ∨ . . . ∨ pm,    q1 ∨ . . . ∨ qk ∨ . . . ∨ qn
    ------------------------------------------------------------------------------------
    SUBST(σ, (p1 ∨ . . . ∨ pj−1 ∨ pj+1 ∨ . . . ∨ pm ∨ q1 ∨ . . . ∨ qk−1 ∨ qk+1 ∨ . . . ∨ qn))

where UNIFY(pj, ¬qk) = σ.

Note: σ is the most general unifier (MGU) of pj and ¬qk. The literals pj

and qk are called complementary literals because each one unifies with

the negation of the other. The resulting disjunction is called a resolvent.


Examples

    ¬Rich(x) ∨ Unhappy(x),   Rich(Me)
    ----------------------------------
               Unhappy(Me)

with MGU σ = {x/Me}


Examples (cont’d)

PhD(x) =⇒ HighlyQualified(x)

¬PhD(x) =⇒ EarlyEarnings(x)

HighlyQualified(x) =⇒ Rich(x)

EarlyEarnings(x) =⇒ Rich(x)

Let us try resolution to infer Rich(Me)!

The standard way of showing that KB ⊢ φ by resolution is to add

¬φ to the KB and show that we can reach the empty clause by

repeated application of the resolution rule.

In our case, we add ¬Rich(Me).


Examples (cont’d)

Let us first write all our formulas as disjunctions:

¬PhD(x) ∨ HighlyQualified(x)

PhD(x) ∨ EarlyEarnings(x)

¬HighlyQualified(x) ∨ Rich(x)

¬EarlyEarnings(x) ∨ Rich(x)

¬Rich(Me)

Now we can apply resolution repeatedly.


Examples (cont’d)

From

¬Rich(Me)

and

¬HighlyQualified(z) ∨ Rich(z)

with MGU σ = {z/Me}, we infer

¬HighlyQualified(Me).


Examples (cont’d)

From

¬Rich(Me)

and

¬EarlyEarnings(w) ∨ Rich(w)

using MGU σ = {w/Me}, we infer

¬EarlyEarnings(Me).


Examples (cont’d)

From

¬PhD(x) ∨ HighlyQualified(x)

and

PhD(y) ∨ EarlyEarnings(y)

with MGU σ = {x/y}, we infer

HighlyQualified(y) ∨ EarlyEarnings(y).


Examples (cont’d)

From

HighlyQualified(v) ∨ EarlyEarnings(v)

and

¬EarlyEarnings(Me)

using MGU σ = {v/Me}, we infer

HighlyQualified(Me).


Examples (cont’d)

From

HighlyQualified(Me)

and

¬HighlyQualified(Me)

using MGU σ = {}, we infer the empty clause. Thus we have

reached a contradiction!
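The refutation above can be checked mechanically on the ground instance of the clauses obtained by replacing every variable with Me. The sketch below is a simplified, propositional illustration and not a first-order prover: unification is not needed because the clauses are already ground, each string stands for a ground atom such as Rich(Me), and the names `negate`, `resolvents` and `refutes` are hypothetical.

```python
from itertools import combinations


def negate(lit):
    """Negation is marked with a leading '~'."""
    return lit[1:] if lit.startswith("~") else "~" + lit


def resolvents(c1, c2):
    """All clauses obtained by resolving c1 and c2 on one complementary pair."""
    return [(c1 - {l}) | (c2 - {negate(l)}) for l in c1 if negate(l) in c2]


def refutes(clauses):
    """Saturate under resolution; True if the empty clause is derived."""
    clauses = set(map(frozenset, clauses))
    while True:
        new = set()
        for c1, c2 in combinations(clauses, 2):
            for r in resolvents(c1, c2):
                if not r:  # empty clause: contradiction reached
                    return True
                new.add(frozenset(r))
        if new <= clauses:  # nothing new can be derived
            return False
        clauses |= new


# Ground instances (x replaced by Me) of the four formulas in clause form.
kb = [
    {"~PhD", "HighlyQualified"},
    {"PhD", "EarlyEarnings"},
    {"~HighlyQualified", "Rich"},
    {"~EarlyEarnings", "Rich"},
]
print(refutes(kb + [{"~Rich"}]))  # KB plus the negated goal
print(refutes(kb))                # KB alone
```

Adding the negated goal makes the clause set unsatisfiable, so the empty clause is derived; the KB on its own is satisfiable, so saturation ends without a contradiction.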


Conjunctive Normal Form

To be able to do resolution, the given formulas have to be in

conjunctive normal form.

Definition. A literal is an atomic formula or the negation of an

atomic formula. An atomic formula is also called a positive

literal, and the negation of an atomic formula is called a negative

literal. A clause is a disjunction of literals. There is a special

clause called empty which is equivalent to false.

Definition. A FOL formula is in conjunctive normal form

(CNF) if it is a conjunction of disjunctions of literals (equivalently

if it is a set of clauses).

Proposition. Every FOL formula is equivalent to a formula in

CNF.


Conversion to CNF

1. Eliminate equivalences and implications using the laws:

(φ ⇐⇒ ψ) ≡ ((φ =⇒ ψ) ∧ (ψ =⇒ φ))

φ =⇒ ψ ≡ ¬φ ∨ ψ

2. Move ¬ inwards using the equivalences

¬(φ ∨ ψ) ≡ ¬φ ∧ ¬ψ

¬(φ ∧ ψ) ≡ ¬φ ∨ ¬ψ

¬(∀x)φ ≡ (∃x)¬φ

¬(∃x)φ ≡ (∀x)¬φ

¬¬φ ≡ φ


Conversion to CNF (cont’d)

3. Rename variables so that each quantifier has a unique

variable.

4. Eliminate existential quantifiers.

If an existential quantifier does not occur in the scope of a

universal quantifier, we simply drop the quantifier and replace

all occurrences of the quantifier variable by a new constant

called a Skolem constant.

If an existential quantifier ∃x is within the scope of universal

quantifiers ∀y1, . . . , ∀yn, we drop the quantifier and replace all

occurrences of the quantifier variable x by the term f(y1, . . . , yn)

where f is a new function symbol called a Skolem function.


Conversion to CNF (cont’d)

5. Drop all universal quantifiers.

6. Distribute ∨ over ∧ using the equivalence

(φ ∧ ψ) ∨ θ ≡ (φ ∨ θ) ∧ (ψ ∨ θ)

7. Flatten nested conjunctions or disjunctions. Then write each

disjunction on a separate line and standardize variables apart

(i.e., make sure disjunctions use different variables).


Example

Let us convert to CNF the following sentence:

(∀x)((∀y)P (x, y) =⇒ ¬(∀y)(Q(x, y) =⇒ R(x, y)))

1. Eliminate implications:

(∀x)(¬(∀y)P (x, y) ∨ ¬(∀y)(¬Q(x, y) ∨ R(x, y)))

2. Move ¬ inwards:

(∀x)((∃y)¬P (x, y) ∨ (∃y)(Q(x, y) ∧ ¬R(x, y)))


Example (cont’d)

3. Rename variables:

(∀x)((∃y)¬P (x, y) ∨ (∃z)(Q(x, z) ∧ ¬R(x, z)))

4. Skolemize:

(∀x)(¬P (x, F1(x)) ∨ (Q(x, F2(x)) ∧ ¬R(x, F2(x))))

5. Drop universal quantifiers:

¬P (x, F1(x)) ∨ (Q(x, F2(x)) ∧ ¬R(x, F2(x)))

6. Distribute ∨ over ∧:

(¬P (x, F1(x)) ∨ Q(x, F2(x))) ∧ (¬P (x, F1(x)) ∨ ¬R(x, F2(x)))

7. Final form:

¬P (x, F1(x)) ∨ Q(x, F2(x))

¬P (x, F1(x)) ∨ ¬R(x, F2(x))
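The quantifier-free steps of the conversion (steps 1, 2 and 6) can be sketched in Python. This is a hedged illustration for propositional formulas only: renaming, Skolemization and dropping of quantifiers are not handled, formulas are encoded as nested tuples with the operators `'not'`, `'and'`, `'or'`, `'=>'` and `'<=>'`, and all function names are hypothetical. The usage example is the propositional skeleton P =⇒ ¬(Q =⇒ R) of the sentence converted above.

```python
def elim_imp(f):
    """Step 1: eliminate equivalences and implications."""
    if isinstance(f, str):
        return f
    op, *args = f
    args = [elim_imp(a) for a in args]
    if op == "=>":
        return ("or", ("not", args[0]), args[1])
    if op == "<=>":
        return ("and",
                ("or", ("not", args[0]), args[1]),
                ("or", ("not", args[1]), args[0]))
    return (op, *args)


def nnf(f):
    """Step 2: move negation inwards (expects the output of elim_imp)."""
    if isinstance(f, str):
        return f
    op, *args = f
    if op == "not":
        g = args[0]
        if isinstance(g, str):
            return f  # negative literal
        gop, *gargs = g
        if gop == "not":
            return nnf(gargs[0])
        if gop == "and":
            return ("or", nnf(("not", gargs[0])), nnf(("not", gargs[1])))
        if gop == "or":
            return ("and", nnf(("not", gargs[0])), nnf(("not", gargs[1])))
    return (op, *(nnf(a) for a in args))


def distribute(f):
    """Step 6: distribute 'or' over 'and' (expects negation normal form)."""
    if isinstance(f, str) or f[0] == "not":
        return f
    op, a, b = f[0], distribute(f[1]), distribute(f[2])
    if op == "or":
        if not isinstance(a, str) and a[0] == "and":
            return ("and", distribute(("or", a[1], b)), distribute(("or", a[2], b)))
        if not isinstance(b, str) and b[0] == "and":
            return ("and", distribute(("or", a, b[1])), distribute(("or", a, b[2])))
    return (op, a, b)


def to_cnf(f):
    return distribute(nnf(elim_imp(f)))


print(to_cnf(("=>", "P", ("not", ("=>", "Q", "R")))))
```

The result is the conjunction of ¬P ∨ Q and ¬P ∨ ¬R, which mirrors the two clauses of the final form with the Skolem terms left out.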


Resolution: Soundness and Refutation-Completeness

Theorem. (Soundness)

Let KB be a knowledge base. If φ can be proved from KB using

resolution then KB |= φ.

Theorem. (Refutation-completeness)

If a set ∆ of clauses is unsatisfiable then resolution will derive the

empty clause from ∆.

Note: The above theorem holds only if ∆ does not involve equality.

Methodology: If we are asked to prove KB |= α then we negate α

and show that KB ∧ ¬α is unsatisfiable using resolution.


Example 1

The crime example we saw in a previous lecture:

The law says that it is a crime for an American to sell

weapons to hostile nations. The country Nono, an enemy

of America, has some missiles, and all of its missiles were

sold to it by Colonel West, who is an American.

Use resolution to conclude that West is a criminal.


Example 1: Formalization in FOL

• “... it is a crime for an American to sell weapons to hostile nations”:

(∀x, y, z) (American(x) ∧ Weapon(y) ∧ Nation(z) ∧

Hostile(z) ∧ Sells(x, z, y) =⇒ Criminal(x))


(∃x) (Owns(Nono, x) ∧ Missile(x))


(∀x) (Owns(Nono, x) ∧ Missile(x) =⇒ Sells(West,Nono, x))


Example 1: Formalization in FOL (cont’d)


(∀x) (Missile(x) =⇒ Weapon(x))


(∀x) (Enemy(x,America) =⇒ Hostile(x))






Example 1: CNF form


• “... it is a crime for an American to sell weapons to hostile nations”:

¬American(x) ∨ ¬Weapon(y) ∨ ¬Sells(x, y, z)∨

¬Hostile(z) ∨ Criminal(x)




¬Missile(x) ∨ ¬Owns(Nono, x) ∨ Sells(West, x,Nono)


Example 1: CNF form (cont’d)


¬Missile(x) ∨ Weapon(x)


¬Enemy(x,America) ∨Hostile(x)

• “West, who is an American”:

American(West)

• “The country Nono ...”:

Nation(Nono)



Example 1: Proof

[Figure: resolution refutation that starts from the negated goal ¬Criminal(West). Each step resolves the current clause with one clause of the KB.]

1. ¬American(x) ∨ ¬Weapon(y) ∨ ¬Sells(x,y,z) ∨ ¬Hostile(z) ∨ Criminal(x) and ¬Criminal(West) give
   ¬American(West) ∨ ¬Weapon(y) ∨ ¬Sells(West,y,z) ∨ ¬Hostile(z)
2. With American(West):
   ¬Weapon(y) ∨ ¬Sells(West,y,z) ∨ ¬Hostile(z)
3. With ¬Missile(x) ∨ Weapon(x):
   ¬Missile(y) ∨ ¬Sells(West,y,z) ∨ ¬Hostile(z)
4. With Missile(M1):
   ¬Sells(West,M1,z) ∨ ¬Hostile(z)
5. With ¬Missile(x) ∨ ¬Owns(Nono,x) ∨ Sells(West,x,Nono):
   ¬Missile(M1) ∨ ¬Owns(Nono,M1) ∨ ¬Hostile(Nono)
6. With Missile(M1):
   ¬Owns(Nono,M1) ∨ ¬Hostile(Nono)
7. With Owns(Nono,M1):
   ¬Hostile(Nono)
8. With ¬Enemy(x,America) ∨ Hostile(x):
   ¬Enemy(Nono,America)
9. With Enemy(Nono,America):
   the empty clause


Example 2

Let us assume that we know the following:

Everyone who loves animals is loved by someone.

Anyone who kills an animal is loved by no one.

Jack loves all animals.

Either Jack or Curiosity killed the cat, who is named Tuna.

From the above facts, can we prove that Curiosity killed Tuna?



• Everyone who loves animals is loved by someone.

(∀x)((∀y)(Animal(y) =⇒ Loves(x, y)) =⇒ (∃y)Loves(y, x) )

• Anyone who kills an animal is loved by no one.

(∀x)((∃y)(Animal(y) ∧ Kills(x, y)) =⇒ (∀z)¬Loves(z, x))

• Jack loves all animals.

(∀x)(Animal(x) =⇒ Loves(Jack, x))



• Either Jack or Curiosity killed the cat ...

Kills(Jack, Tuna) ∨ Kills(Curiosity, Tuna)

• ... who is named Tuna.

Cat(Tuna)

We will also need the formula

(∀x)(Cat(x) =⇒ Animal(x))

which is background knowledge.

The negation of the formula to be proved is:

¬Kills(Curiosity, Tuna)


Example 2: Proof

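[Figure: resolution refutation showing that Curiosity killed Tuna. In clause form the proof runs as follows.]

1. Cat(Tuna) and ¬Cat(x) ∨ Animal(x) give Animal(Tuna).
2. Kills(Jack,Tuna) ∨ Kills(Curiosity,Tuna) and ¬Kills(Curiosity,Tuna) give Kills(Jack,Tuna).
3. Animal(Tuna) and ¬Loves(y,x) ∨ ¬Animal(z) ∨ ¬Kills(x,z) give ¬Loves(y,x) ∨ ¬Kills(x,Tuna).
4. Kills(Jack,Tuna) and ¬Loves(y,x) ∨ ¬Kills(x,Tuna) give ¬Loves(y,Jack).
5. ¬Loves(x,F(x)) ∨ Loves(G(x),x) and ¬Animal(x) ∨ Loves(Jack,x) give ¬Animal(F(Jack)) ∨ Loves(G(Jack),Jack).
6. Animal(F(x)) ∨ Loves(G(x),x) and ¬Animal(F(Jack)) ∨ Loves(G(Jack),Jack) give Loves(G(Jack),Jack).
7. Loves(G(Jack),Jack) and ¬Loves(y,Jack) give the empty clause.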


Resolution, Validity and Unsatisfiability

Questions:

• How do we use resolution to prove that the following formula is

valid?

Happy(John) ∨ ¬Happy(John)

• How do we use resolution to prove that the following formula is

unsatisfiable?

Happy(John) ∧ ¬Happy(John)


Fill-in-the-Blank Questions

So far we have used resolution to see that something follows from a

KB. We can also use resolution to answer questions about facts

that follow from a KB. In the previous example, we can use

resolution to find the answer to the question: Who killed Tuna?

This can be expressed using a free variable and writing the

fill-in-the-blank query Kills(x, Tuna).

Definition. An answer literal for a fill-in-the-blank query φ is an

atomic formula of the form Ans(v1, . . . , vn) where the variables

v1, . . . , vn are the free variables in φ.


Fill-in-the-Blank Questions (cont’d)

To answer the fill-in-the-blank query φ we form the disjunction

Ans(v1, . . . , vn) ∨ ¬φ

and convert it to CNF.

Then we use resolution and terminate our search when we reach a

clause containing only answer literals (instead of terminating when

we reach the empty clause).


Fill-in-the-Blank Questions (cont’d)

For fill-in-the-blank questions, we can have:

• Termination with a clause which is a single answer literal

Ans(c1, . . . , cn). In this case, the constants c1, . . . , cn give us

an answer to the query. There might be more answers

depending on whether there are more resolution refutations of

Ans(v1, . . . , vn) ∨ ¬φ. We can go on looking for more answers

but we can never be sure that we have found them all.

• Termination with a clause which is a disjunction of more than

one answer literal. In this case, one of the answer literals

contains the answer but we cannot say which one for sure.


Dealing with Equality

If we want to use equality in our resolution proofs, we can do it in

two ways:

• Add appropriate formulas that axiomatize equality in our

KB. What are these formulas?

• Use special versions of resolution that take equality into

account.

The same is true for other special predicates such as arithmetic

ones <, ≤ etc.


Computational Complexity and Resolution

Resolution proofs can in general be exponentially long as the

following theorem demonstrates.

Theorem (Haken, 1985). There is a sequence of PL formulas

p1, p2, p3, . . ., each a tautology, such that the number of symbols of

¬pn when converted to CNF is O(n^3), but the shortest resolution

refutation of it contains at least c^n symbols (for a fixed c > 1).

There are various strategies that can be applied to make resolution

more efficient (unit preference, set of support, input resolution,

subsumption).


Other Normal Forms: DNF

Definition. A FOL formula is in disjunctive normal form

(DNF) if it is a disjunction of conjunctions of literals.


Proposition. Every FOL formula is equivalent to a formula in DNF.


Other Normal Forms: PNF

Definition. A FOL formula is in prenex normal form (PNF) if

all its quantifiers appear at the front of the formula.


Proposition. Every FOL formula is equivalent to a formula in PNF.


Conversion to Prenex Normal Form

• Steps 1 and 2 of conversion to CNF.

• Move quantifiers to the front of the formula using the

equivalences

(∀x)(φ ∧ ψ) ≡ (∀x)φ ∧ ψ

(∀x)(φ ∨ ψ) ≡ (∀x)φ ∨ ψ

(∃x)(φ ∧ ψ) ≡ (∃x)φ ∧ ψ

(∃x)(φ ∨ ψ) ≡ (∃x)φ ∨ ψ

The above equivalences hold only if x does not appear free in ψ.

Steps 1 and 2 are not necessary if we introduce equivalences for the

rest of the connectives.


A Brief History of Reasoning

450 B.C.   Stoics          propositional logic, inference (maybe)
322 B.C.   Aristotle       “syllogisms” (inference rules), quantifiers
1847       Boole           propositional logic (again)
1879       Frege           first-order logic
1922       Wittgenstein    proof by truth tables
1930       Gödel           ∃ complete algorithm for proofs in FOL
1930       Herbrand        complete algorithm for proofs in FOL (reduce to propositional)
1931       Gödel           ¬∃ complete algorithm for arithmetic proofs
1960       Davis/Putnam    “practical” algorithm for propositional logic
1965       Robinson        “practical” algorithm for FOL: resolution


Soundness and Completeness of FOL Inference

Theorem. (Gödel, 1930)

KB |= φ iff KB ⊢ φ.

Theorem. Checking entailment (equivalently: validity or

unsatisfiability or provability) of a FOL formula is a recursively

enumerable problem.


Informal Definitions

A yes/no problem P is called recursive or decidable if there is an

algorithm that, given input x, outputs “yes” and terminates

whenever x ∈ P , and “no” and terminates when x ∉ P .

A yes/no problem P is called recursively enumerable or

semi-decidable if there is an algorithm that, given input x,

outputs “yes” and terminates whenever x ∈ P but computes for

ever when x ∉ P .

The above algorithm is not very useful because, if it has not

terminated, we cannot know for sure whether we have waited long

enough to get an answer.


Gödel’s Incompleteness Theorem

Theorem. (Gödel, 1931)

For any set A of true sentences of number theory, and, in particular,

any set of basic axioms, there are other true sentences of

arithmetic that cannot be proved from A.

Sad conclusion: We can never prove all the theorems of

mathematics within any given system of axioms.


Soundness and Completeness (cont’d)

Theorem. (Herbrand, 1930)

If a finite set ∆ of clauses is unsatisfiable then the Herbrand base of

∆ is unsatisfiable.

Theorem. (Robinson, 1965)

Soundness of Resolution. If there is a resolution refutation of a

clause φ from a set of clauses KB then KB |= φ.

Theorem. (Robinson, 1965)

Completeness of Resolution. If a set of clauses KB is

unsatisfiable then there is a resolution refutation of the empty

clause from KB.


Soundness and Completeness (cont’d)

Question: How can we use a complete proof procedure to

determine whether a sentence φ is entailed by a set of sentences

KB?

Answer: We can negate φ, add it to KB and then use resolution.

But we will not know whether KB |= φ until resolution finds a

contradiction and returns.

While resolution has not returned, we do not know whether the

system has gone into a loop or the proof is about to pop out!!!


Some Good News

There are many interesting subsets of FOL that are decidable (e.g.,

monadic logic, Horn logic etc.).

Many practical problems can be encoded in these subsets!


Knowledge-Based Agents

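function KB-Agent(percept) returns an action
    static: KB, a knowledge base
            t, a counter, initially 0, indicating time

    Tell(KB, Make-Percept-Sentence(percept, t))
    action ← Ask(KB, Make-Action-Query(t))
    Tell(KB, Make-Action-Sentence(action, t))
    t ← t + 1
    return action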

Using the FOL machinery we presented, how can we implement

knowledge-based agents?


Logical Reasoning Systems

• Logic programming languages (most notably Prolog).

Prolog was developed in 1972 by Alain Colmerauer and it is based on the

idea of backward chaining. Prolog’s motto (after Kowalski) is:

Algorithm = Logic + Control

Logic programming and Prolog were the basis of much exciting research and

development in the 70’s and 80’s.

Logic programming and its extensions are still a very lively area of research

that has been applied in many areas (databases, natural language processing,

expert systems etc.). Of particular importance is constraint logic

programming (CLP) that integrates logic programming with CSPs. CLP has

been used with success recently in many combinatorial optimisation

applications (e.g., scheduling, planning, etc.)

See www.afm.sbu.ac.uk/logic-prog/ for various Prolog implementations.


Logical Reasoning Systems (cont’d)

• Production systems based on the idea of forward-chaining (where

the conclusion of an implication is interpreted as an action to be

executed).

Production systems were used a lot in early AI work (particularly in

rule-based expert systems).

There are various implemented production systems such as OPS-5 or

CLIPS (see http://www.cs.cmu.edu/afs/cs/project/ai-repository/ai/areas/expert/systems/clips/0.html).


Logical Reasoning Systems (cont’d)

• Theorem provers are more powerful tools than Prolog since they can

deal with full first-order logic.

Examples: OTTER, PTTP, etc.

Theorem provers have come up with novel mathematical results (lattice

theory, a formal proof of Gödel’s incompleteness theorem, Robbins

algebra).

They are also used in verification and synthesis of both hardware and

software because both domains can be given correct axiomatizations.


Readings

• AIMA, Chapter 9.

• M. Genesereth and N. Nilsson. “Logical Foundations of

Artificial Intelligence”, Chapter 4.


Advanced FOL for KR

• Power and limitations

• Non-monotonic reasoning

• FOL and relational databases


The Power and Limitations of FOL

The syntax, semantics and proof-theory of pure FOL offer us a

general, flexible and powerful framework for KR.

FOL has weaknesses too.

Because FOL is very general and it is based on very primitive concepts

(constants, variables, function symbols, predicates and quantifiers), it

offers no explicit help for defining higher-level abstractions:

• taxonomic information

• physical composition

• measurements

• events, actions, processes, plans, time, space, causality


The Power and Limitations of FOL (cont’d)

FOL does not allow:

• non-monotonicity

• belief revision

• uncertainty

This is a serious weakness of FOL and has been addressed by more

appropriate KR formalisms (in many cases extensions of FOL

itself).

Some formalisms for non-monotonic reasoning are presented later.


Taxonomic Information in FOL

[Figure: a taxonomy drawn as a semantic network. Nodes: Mammals, Persons, Female Persons, Male Persons, Mary, John. Link labels: SubsetOf, MemberOf, SisterOf, HasMother, and Legs (with values 2 and 1).]


Taxonomic Information

The concept of a category or class is an important abstraction in

knowledge representation and reasoning. Categories can be

organized into taxonomies.

Taxonomies have been used profitably for centuries in various

technical fields (biology, medicine, library science etc.).

Taxonomic information plays a central role in various database and

object-oriented models.

Taxonomies are very important in modern web applications:

knowledge management, information retrieval and dissemination,

information integration, e-commerce, e-science, etc.


Categories in FOL

FOL offers us two ways to talk about categories:

• Predicates. For example: Person(x) or Basketball(x)

• Constants (through reification). For example: Persons or

Basketballs.

In this case we also need predicates for membership and

subclass: MemberOf (or ∈) and SubsetOf (or ⊂).

Both of the above ways are needed! But we have to be careful

when defining the semantics of the resulting languages.

Other issues: inheritance, disjointness and partitioning


Examples

• An object is a member of a category.

MemberOf(BB12, BasketBalls)

• A category is a subclass of another category.

SubsetOf(BasketBalls, Balls)

• All members of a category have some properties.

(∀x)(MemberOf(x,BasketBalls)⇒ Round(x))


Examples (cont’d)

• All members of a category can be recognized by some

properties.

(∀x)(Orange(x) ∧ Round(x) ∧ Diameter(x) = 9.5′′

∧MemberOf(x,Balls)⇒MemberOf(x,BasketBalls))

• A category as a whole has some properties.

MemberOf(Dogs,DomesticatedSpecies)

In this case DomesticatedSpecies is a category of

categories.


Examples (cont’d)

Can we have categories of categories of categories? Are they

useful?

In various OO modeling frameworks (e.g., Telos) we have 4+ levels

of data modeling:

• Instances (e.g., John)

• Classes (e.g., Person)

• Meta-classes (e.g., the class of all classes with no instances).

• Meta-meta-classes (e.g., the class of all meta-classes we have

defined).


Examples (cont’d)

In the Information Resource Dictionary Standard (IRDS) we have

4 levels of data description:

• Level 1: Application data (e.g., code).

• Level 2: Data dictionary for application data.

• Level 3: Schema of the data dictionary.

• Level 4: Different types of IRDS schemas.


Other Relations Among Categories

Often we want to say that two categories are disjoint or that they

form an exhaustive decomposition of some other category or

that they form a partition of some other category.

Examples:

Disjoint({Animals, Vegetables})

ExhaustiveDecomposition({Americans, Canadians,Mexicans},

NorthAmericans)

Partition({Males, Females}, Animals)


Definitions

The three predicates used above can be defined as follows:

Disjoint(s) ≡

(∀c1, c2)(c1 ∈ s ∧ c2 ∈ s ∧ c1 ≠ c2 ⇒ Intersection(c1, c2) = {})

ExhaustiveDecomposition(s, c) ≡ (∀i)(i ∈ c⇒ (∃c2)(c2 ∈ s ∧ i ∈ c2))

Partition(s, c) ≡ Disjoint(s) ∧ ExhaustiveDecomposition(s, c)
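Over finite categories the three definitions can be checked directly with Python sets. The sketch below is only an illustration: categories are modelled as sets of individuals, s is a list of such sets, and the individuals (Rex, Ann, and so on) are made-up data that does not come from the lecture.

```python
from itertools import combinations


def disjoint(s):
    """Every two distinct categories in s have an empty intersection."""
    return all(not (c1 & c2) for c1, c2 in combinations(s, 2))


def exhaustive_decomposition(s, c):
    """Every member of c belongs to at least one category in s."""
    return all(any(i in c2 for c2 in s) for i in c)


def partition(s, c):
    return disjoint(s) and exhaustive_decomposition(s, c)


# Hypothetical individuals, for illustration only.
males = {"Rex", "Tom"}
females = {"Lassie"}
animals = males | females

americans = {"Ann", "Bob"}
canadians = {"Bob", "Carl"}  # Bob holds both citizenships
mexicans = {"Dora"}
north_americans = americans | canadians | mexicans

print(partition([males, females], animals))
print(exhaustive_decomposition([americans, canadians, mexicans], north_americans))
print(disjoint([americans, canadians, mexicans]))
```

The citizenship example shows why ExhaustiveDecomposition is weaker than Partition: the three categories cover NorthAmericans, but they are not disjoint because one individual belongs to two of them.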


Categories and Definitions

Some categories can be given “if and only if” definitions.

Example: An object is a triangle if and only if it is a polygon

with three sides.

Natural kind categories cannot be defined in this way.

Example: Try to define tomatoes with an “if and only if”

definition.

For natural kind categories, we can write down “if and only if”

definitions that hold for typical instances.


Physical Composition

The idea that one object is part of another is an important one in

many applications (e.g., engineering design or e-commerce

catalogs). We use the general predicate PartOf to represent such

information.

Example:

PartOf(Athens,Greece), PartOf(Greece,WesternEurope)

PartOf(WesternEurope, Europe), PartOf(Europe, Earth)

The relation PartOf is irreflexive and transitive:

(∀x)(¬PartOf(x, x))

(∀x, y, z)(PartOf(x, y) ∧ PartOf(y, z)⇒ PartOf(x, z))

Thus we can conclude: PartOf(Athens,Earth).
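
As an illustration, the same chain of reasoning can be written as a small Prolog program. This is a sketch under assumptions: the helper name in_part_of/2 is invented here, and the direct facts are kept in a separate predicate so that the transitive rule does not call itself first.

```prolog
% Direct part-of facts from the example (Prolog constants are lowercase).
part_of(athens, greece).
part_of(greece, western_europe).
part_of(western_europe, europe).
part_of(europe, earth).

% in_part_of(X, Z): transitive closure of part_of/2 (hypothetical helper).
in_part_of(X, Z) :- part_of(X, Z).
in_part_of(X, Z) :- part_of(X, Y), in_part_of(Y, Z).
```

The query ?- in_part_of(athens, earth). succeeds. Irreflexivity is not enforced by the program; it holds here only because the facts contain no cycle.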


Physical Composition (cont’d)

Categories of composite objects are often characterized by the

structure of those objects i.e., the parts and how the parts relate

to the whole.

Example: How can we define a biped?


Defining a biped

Biped(a) ≡

(∃l1, l2, b)(Leg(l1) ∧ Leg(l2) ∧ Body(b) ∧

PartOf(l1, a) ∧ PartOf(l2, a) ∧ PartOf(b, a) ∧

Attached(l1, b) ∧ Attached(l2, b) ∧

l1 ≠ l2 ∧ (∀l3)(Leg(l3) ∧ PartOf(l3, a) ⇒ (l3 = l1 ∨ l3 = l2)))

Description logics are a particular kind of logics for KR that

allow us to write definitions such as the above more easily.


The Power and Limitations of FOL (Revisited)

We mentioned that FOL provides no explicit support for the

definition of:

1. taxonomic information and categories

2. physical composition

3. measurements

4. events, actions, processes, plans, time, space and causality

We have now shown how FOL can be used to represent knowledge

for the first two of the above cases. The reader interested in Cases

3 and 4 should see Chapters 10-13 of the book AIMA2ed.


Monotonicity of FOL

Theorem. Let KB be a set of FOL formulas and α, β two

arbitrary FOL formulas. If KB |= α then KB ∪ {β} |= α.

The above theorem captures the monotonicity property of FOL.

The monotonicity of FOL becomes a very awkward feature in the

following cases:

• Closed world reasoning.

• When we want to represent defaults, exceptions or

qualifications.

• When we want to revise our beliefs in the presence of new

knowledge.


Closed World Reasoning

Example: Imagine the course schedule of a university department

available on the Web. How would you represent all relevant

information about who teaches what course in FOL?

You might have something like:

Teaches(Alex, CS100), Teaches(Bob, P100),

Teaches(Charlie, P200)

Now answer the following question:

• Who is teaching CS100?

The answer to this question is “Alex” as we can see from the above

KB.


Closed World Reasoning (cont’d)

Now answer the following questions:

• Is Bob teaching CS100?

• Is Alex teaching CS200?

Assuming that the schedule is complete, the answer to both of

these questions is “no”, but this is not explicit in the schedule KB!

Here we have a situation where in the absence of information

to the contrary we assume that Bob is not teaching CS100 and

Alex is not teaching CS200.

This is how we interpret answers to queries in relational databases

as well.

This kind of reasoning is called non-monotonic and cannot be

supported directly by FOL.


The Closed World Assumption

In traditional relational databases and many knowledge bases it is

natural to make the assumption that available information is

complete.

Let KB be a knowledge base and φ a ground atomic

sentence. If KB ⊭ φ then assume φ to be false.

The above assumption is usually called the closed world

assumption (CWA) originally proposed by Ray Reiter in 1978.

CWA is a non-monotonic KR feature.


The CWA More Formally

Let KB be a knowledge base (i.e., a set of FOL formulas).

Let Closure(KB) be the closure of KB under logical entailment:

Closure(KB) = {φ : KB |= φ}

Let

KBasm = {¬ψ : ψ is a ground atomic sentence and KB ⊭ ψ}

denote the set of assumptions. Then the completion of KB

under the CWA is defined as follows:

CWA(KB) = Closure(KB) ∪ KBasm

Exercise: Apply the CWA to the course KB of the previous

example.
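
One possible way to explore the set of assumptions for the course KB is the Prolog sketch below. It is an illustration only: the constant/1 facts are listed by hand, assumed_false/1 is an invented name, and the program relies on the KB consisting of ground facts, so that "not entailed" coincides with "not listed".

```prolog
% The course KB.
teaches(alex, cs100).
teaches(bob, p100).
teaches(charlie, p200).

% constant(C): the constants of the language (listed by hand).
constant(alex).  constant(bob).  constant(charlie).
constant(cs100). constant(p100). constant(p200).

% assumed_false(Atom): Atom is a ground Teaches-atom that the KB does not
% entail, so its negation belongs to the set of assumptions.
assumed_false(teaches(X, Y)) :-
    constant(X),
    constant(Y),
    \+ teaches(X, Y).
```

For example, ?- assumed_false(teaches(bob, cs100)). succeeds. With 6 constants there are 36 ground Teaches-atoms, 3 of which are in the KB, so the program enumerates 33 assumptions.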


Problems with the CWA

The CWA can result in inconsistencies (this depends critically on

syntactic features e.g., what formulas we have in the KB).

Example: Let KB be

Professor(John) ∨ Professor(Mary).

Then

KBasm = {¬Professor(John), ¬Professor(Mary)}

and CWA(KB) = Closure(KB) ∪ KBasm is inconsistent.

Theorem. If the CNF of KB consists only of Horn clauses and is

consistent, then CWA(KB) is consistent.


Revising our Beliefs

Now assume that we have just learned that Alex teaches the course

CS200 as well.

The KB should now become:

Teaches(Alex, CS100), Teaches(Bob, P100),

Teaches(Charlie, P200), Teaches(Alex, CS200)


• Is Alex teaching CS200?

The answer to this question is now “yes”, which is different from

the answer we got previously.

This kind of reasoning is non-monotonic.


Defaults

Example: By default, persons have two legs.

How do we represent this information in FOL?

The sentence

(∀x)(Person(x)⇒ Legs(x, 2))

is an approximation. It is not entirely appropriate because it

talks about all persons.


• How many legs does John have?

The answer to this question is “two” since this is what the above

KB gives us.


Revising our Beliefs

Now assume that we have just learned that John actually has one

leg only.

We should now be able to update the previous KB and in this way

revise our beliefs about John.

As a result, the answer to the previous question should become

“one”, which is different from the answer we got previously.

This kind of reasoning is non-monotonic.


Defaults, Exceptions or Qualifications

So how do we update the KB?

The sentence

(∀x)(Person(x)⇒ Legs(x, 2))

could be modified to become

(∀x)(Person(x) ∧ x ≠ John ⇒ Legs(x, 2)).

If we adopt this representation, we need to write down an

exception for every atypical person. This is usually called the

qualification problem.


Defaults, Exceptions or Qualifications (cont’d)

Problem: How do we represent default information and at the

same time deal with exceptions and belief revision in a graceful

way?

This problem has been studied in detail in the area of AI called

non-monotonic reasoning, and non-monotonic logics have

been invented.
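
One common way to write a default with exceptions in Prolog uses negation as failure. The sketch below is only an illustration of that idiom, not one of the non-monotonic logics mentioned above; the predicate known_legs/2 and the person mary are assumptions made for the example.

```prolog
person(john).
person(mary).

% known_legs(X, N): explicit, exceptional information about X.
known_legs(john, 1).

% legs(X, N): use explicit information if there is any,
% otherwise fall back to the default of two legs.
legs(X, N) :- known_legs(X, N).
legs(X, 2) :- person(X), \+ known_legs(X, _).
```

Here ?- legs(mary, N). gives N=2 by default and ?- legs(john, N). gives N=1. Adding or removing a known_legs/2 fact changes the answers without touching the default rule, which is the non-monotonic behaviour discussed above.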


FOL and Relational Databases - Example

TEACHER
NAME
-------
Alex
Bob
Charlie

COURSE
NUMBER
------
CS100
CS200
P100
P200

STUDENT
NAME
-------
John
Mary
Pam
Paul


Example (cont’d)

TEACHES
NAME      NUMBER
-------   ------
Alex      CS100
Alex      CS200
Bob       P100
Charlie   P200

ENROLLED
NAME      NUMBER
-------   ------
John      CS100
John      P100
Mary      CS100
Pam       P100
Paul      CS200
Paul      P200


FOL and Relational Databases

How can we use concepts of FOL to understand the theory of

relational databases?

Two perspectives have been developed in the literature:

model-theoretic and proof-theoretic.


The Model-Theoretic Perspective

For a given database DB, we can define a FO database language

LDB as follows:

• For each relation R in DB, we have a corresponding predicate

symbol PR of the same arity in LDB.

• For each attribute value v in a relation of DB, we have a

corresponding constant Cv in LDB.

• LDB has no function symbols.


The Model-Theoretic Perspective (cont’d)

The given database DB is considered to be an interpretation

IDB of LDB with the following properties:

• The universe of the interpretation is the set of all values in the

database.

• Each constant Cv is mapped to attribute value v.

• The interpretation of each predicate PR is given by the relation

R.


Queries and Integrity Constraints

The language LDB can be used to write queries and integrity

constraints.

Queries:

x : Teacher(x) ∧ Teaches(x,CS100)

: Teaches(Charlie, CS100)

Integrity Constraints:

(∀x)(Course(x)⇒ (∃y)(Teacher(y) ∧ Teaches(y, x)))

(∀x)(Course(x)⇒ (∃y)(Student(y) ∧ Enrolled(y, x)))


Queries and Integrity Constraints (cont’d)

Answering a query q is equivalent to determining whether the

interpretation IDB satisfies q.

Verifying that an integrity constraint C holds is equivalent

to determining whether the interpretation IDB satisfies C.
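
To make this concrete, the example database can be written as Prolog facts and the query and the two integrity constraints can be checked against it. This is a sketch under assumptions: the predicate names cs100_teacher/1, untaught/1 and unattended/1 are invented here, and each constraint is checked by searching the finite database for a counterexample.

```prolog
teacher(alex). teacher(bob). teacher(charlie).
course(cs100). course(cs200). course(p100). course(p200).
student(john). student(mary). student(pam). student(paul).

teaches(alex, cs100).   teaches(alex, cs200).
teaches(bob, p100).     teaches(charlie, p200).

enrolled(john, cs100).  enrolled(john, p100).
enrolled(mary, cs100).  enrolled(pam, p100).
enrolled(paul, cs200).  enrolled(paul, p200).

% The query  x : Teacher(x) and Teaches(x, CS100).
cs100_teacher(X) :- teacher(X), teaches(X, cs100).

% untaught(C): C violates the first constraint (a course with no teacher).
untaught(C) :- course(C), \+ ( teacher(T), teaches(T, C) ).

% unattended(C): C violates the second constraint (a course with no student).
unattended(C) :- course(C), \+ ( student(S), enrolled(S, C) ).
```

The query ?- cs100_teacher(X). returns X=alex, and both ?- untaught(C). and ?- unattended(C). fail, so the database satisfies the two constraints.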


The Proof-Theoretic Perspective

Let DB be a given database. As in the model-theoretic perspective we

can define a FO language LDB .

We can now write a FO theory (i.e., a set of FO sentences) TDB that

corresponds to DB.

Example:

Teacher(Alex), Teacher(Bob), Teacher(Charlie)

Course(CS100), Course(CS200), Course(P100), Course(P200)

Teaches(Alex, CS100), Teaches(Alex, CS200)

Teaches(Bob, P100), Teaches(Charlie, P200)

...


Queries and Integrity Constraints

In the proof-theoretic perspective, the language LDB can again be

used to write queries and integrity constraints.

Queries:

x : Teacher(x) ∧ Teaches(x,CS100)

: Teaches(Charlie, CS100)

Integrity Constraints:

(∀x)(Course(x)⇒ (∃y)(Teacher(y) ∧ Teaches(y, x)))

(∀x)(Course(x)⇒ (∃y)(Student(y) ∧ Enrolled(y, x)))

Now answering a query q could be done by determining whether

q logically follows (equivalently: can be proven) from TDB . Let

us try ...


Example

Database:





Queries:

: Teacher(Alex)

: (∃x)Course(x)

: (∃x, y)(Teacher(x) ∧ Course(y) ∧ Teaches(x, y))

We can use resolution to see that the answer to all of these queries

is “yes”.


Example (cont’d)

Database:





Queries:

: Teacher(CS100)

: ¬Teacher(CS100)

The answers to these queries are “no” and “yes” respectively. But

resolution will not help us in this case (try it!). What is the

problem?


Implicit Assumptions in Databases

In our database we have made two assumptions silently:

• The information in the database is complete.

• Different constants name different objects (i.e., object CS100 is

different from objects Alex, Bob and Charlie).

How can we solve the problem formally?

We can use predicate completion, the unique names

assumption and some axioms for equality to capture the

above assumptions.


Predicate Completion

Let us consider the following simple KB:

Teacher(Alex)

This KB can be written equivalently as:

(∀x)(x = Alex⇒ Teacher(x))

The above formula can be taken as the “if” part of the definition

for predicate Teacher. The assumption that there are no other

teachers can now be captured by writing the “only if” part of the

definition:

(∀x)(Teacher(x)⇒ x = Alex)


Predicate Completion (cont’d)

If our KB was

Teacher(Alex), Teacher(Bob)

then the “if” and “only if” forms can be combined as follows:

(∀x)(x = Alex ∨ x = Bob ⇔ Teacher(x))

For a knowledge base KB and predicate P, we will denote the

completion of KB with respect to P as COMP(KB; P).


The Unique Names Assumption

In many knowledge bases it is also natural to assume that distinct

names refer to distinct objects. This is usually called the

unique names assumption (UNA).

Example: Let KB be:

Teaches(Alex, CS100), Teaches(Bob, P100)

Then UNA(KB) is:

Alex ≠ Bob, CS100 ≠ P100,

CS100 ≠ Bob, CS100 ≠ Alex,

P100 ≠ Bob, P100 ≠ Alex


Example: Completion+UNA+Equality

Database:





Completion:

(∀x)(Teacher(x)⇔ (x = Alex ∨ x = Bob ∨ x = Charlie))

(∀x)(Course(x)⇔ (x = CS100 ∨ x = CS200 ∨ x = P100 ∨ x = P200))

UNA:

Alex ≠ Bob, Alex ≠ Charlie, . . . , P100 ≠ P200


Example (cont’d): Equality Axioms

Reflexivity:

(∀x)(x = x)

Symmetry:

(∀x, y)(x = y ⇒ y = x)

Transitivity:

(∀x, y, z)(x = y ∧ y = z ⇒ x = z)


Example (cont’d)

Database:





Query:

: (∀x)(Teacher(x) ∨ Course(x))

The answer to this query is yes but resolution will not give it to us.

Also, predicate completion, the UNA and equality axioms will not

help us. What is the problem now?


The Domain Closure Assumption

In many knowledge bases it is natural to assume that the only

objects in the domain are the ones that can be named

using the constants and function symbols of the language.

This is usually called the domain closure assumption (DCA).

Example: Let KB be:

Teaches(Alex, CS100), Teaches(Bob, P100)

Then DCA(KB) is:

(∀x)(x = Alex ∨ x = Bob ∨ x = CS100 ∨ x = P100)


Queries and Answers: Proof-Theoretic Perspective

Let DB be a database expressed as a FO theory TDB and q be a

query. To answer q we can decide whether q logically follows

(equivalently: can be proven) from:

• The completion of theory TDB .

• UNA

• DCA

• The equality axioms for reflexivity, symmetry and

transitivity.


Predicate Completion in General

Definition. Let KB be a set of clauses. We will say that KB is

solitary in P if each clause with a positive occurrence of P has at

most one occurrence of P.

Example:

Q(A) ∨ P(A) ∨ R(A),   Q(A) ∨ ¬P(B) ∨ P(A)

The first clause is solitary in P but not the second.


Predicate Completion in General (cont’d)

We will define predicate completion for P only for clauses solitary

in P . We can write each such solitary clause as

(∀y)(Q1 ∧ · · · ∧ Qm ⇒ P (t))

where t is an n-tuple (t1, . . . , tn) of terms.

There may be no Qi in which case the clause is just P (t). The Qi

and t may contain variables, let us say the tuple of variables y.



The above formula is equivalent to

(∀x)(∀y)(x = t ∧ Q1 ∧ · · · ∧ Qm ⇒ P (x))

where x is a tuple of variables not occurring in t and x = t is an

abbreviation for the conjunction

x1 = t1 ∧ · · · ∧ xn = tn.



Since the variables y now occur only in the antecedent of the

implication, the above is equivalent to:

(∀x)((∃y)(x = t ∧ Q1 ∧ · · · ∧ Qm) ⇒ P(x))



Let us suppose we have exactly k clauses solitary in P in our

knowledge base. Then we will transform these clauses as above to

arrive at:

(∀x)(E1 ⇒ P(x))

(∀x)(E2 ⇒ P(x))

...

(∀x)(Ek ⇒ P(x))

where Ei is the antecedent obtained from the i-th clause, i.e., a

formula of the form (∃y)(x = t ∧ Q1 ∧ · · · ∧ Qm). Equivalently:

(∀x)(E1 ∨ E2 ∨ . . . ∨ Ek ⇒ P(x))

This is the “if” part of the definition of P.

The “only if” completion of P then is:

(∀x)(P(x) ⇒ E1 ∨ E2 ∨ . . . ∨ Ek)

Definition. Let KB be a set of clauses all of them solitary in

predicate P. The completion of P in KB (denoted by

COMP(KB; P)) is defined as follows:

KB ∧ (∀x)(E1 ∨ E2 ∨ . . . ∨ Ek ⇔ P(x))


Example

(∀x)(Ostrich(x)⇒ Bird(x))

Bird(Tweety)

¬Ostrich(Sam)

The above knowledge base KB represents the following

information:

All ostriches are birds. Tweety is a bird. Sam is not an

ostrich.


Example (cont’d)

Then COMP(KB; Bird) allows us to assume that the only birds are

the ones that the KB tells us about.

Thus we can conclude ¬Bird(Sam) because

COMP(KB; Bird) + UNA + DCA + Equality Axioms |= ¬Bird(Sam).

This conclusion can later on be retracted if we discover that Sam

is actually a bird. Thus predicate completion allows us to do useful

non-monotonic reasoning even in situations where we have a

KB which is more complex than a relational DB.

Predicate completion provides the basis for the semantics of

negation-as-failure in logic programming, e.g., Prolog (Clark,

1978).
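
The bird example can be tried in Prolog, where the operator \+ implements negation as failure. The following is a sketch under assumptions: both predicates are declared dynamic so that ostrich/1 may have no clauses and so that new facts can be added later. The negative fact about Sam cannot be written as a Prolog clause; here it is simply reflected by the absence of ostrich(sam).

```prolog
:- dynamic bird/1, ostrich/1.   % dynamic: facts may be added later

bird(X) :- ostrich(X).          % all ostriches are birds
bird(tweety).                   % Tweety is a bird
```

The query ?- \+ bird(sam). succeeds, matching the conclusion ¬Bird(Sam). After ?- assertz(bird(sam)). the same query fails, so the earlier conclusion has been retracted.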


Readings

• Stuart Russell and Peter Norvig. Artificial Intelligence: A

Modern Approach, Prentice Hall, 2nd edition (2002).

www.cs.berkeley.edu/~russell/aima.html.

Chapter 10.

• Michael R. Genesereth and Nils J. Nilsson. Logical

Foundations of Artificial Intelligence, Morgan Kaufmann, 1987.

Chapter 6.

• Ray Reiter. Towards a Logical Reconstruction of Relational

Database Theory. In M. L. Brodie, J. Mylopoulos and J. W.

Schmidt (eds.) On Conceptual Modelling: Perspectives from

Artificial Intelligence, Databases and Programming Languages.

Springer-Verlag, 1984.


An Introduction to Prolog

• The programming language Prolog

• Examples of programs in Prolog

• Prolog and FOL machinery: entailment and inference


Prolog and Logic Programming

• Prolog stands for “programming in logic”. Prolog is the first

and the most widely used logic programming language.

• Logic programming is the programming language paradigm

that is based on the following view:

A problem should be formalised in logic (i.e., in a

“declarative” way as opposed to the procedural way we

see in languages such as C). Inference processes can be

run to solve the problem.


Prolog and Logic Programming (cont’d)

Logic programming took off in the 70’s based on pioneering work

by Robert Kowalski. Prolog itself was invented by Alain

Colmerauer in 1972.

This idea is summed up in the famous slogan:

Algorithm = Logic + Control
Logic programming was a very influential field of research in the

80’s and 90’s fuelled particularly by Japan’s 5th generation project.


Prolog

• Prolog is a programming language centered around a small set

of basic mechanisms, including unification, tree-based data

structures and backtracking.

• It is a great programming language for symbolic,

non-numeric computation.

• It is well suited for problems that involve objects and relations

between them.


What is a Prolog program?

A Prolog program is simply a set of Horn formulas (or Horn

clauses or simply clauses in the Prolog terminology).

A Horn clause is a FOL formula in any of the following forms:

• An atomic formula (also called fact in the Prolog terminology).

• A formula of the form

q:- p1, p2, ..., pn.

where p1, p2,..., pn, q are atomic formulas.

Such formulas are called rules in the Prolog terminology.


Example: Defining family relations

parent(pam, bob).
parent(tom, bob).
parent(tom, liz).
parent(bob, ann).
parent(bob, pat).
parent(pat, jim).

This is the “hello world” program in Prolog.


Prolog Programs: Facts

• The fact that Tom is parent of Bob can be written in Prolog as:

parent(tom, bob).

parent is a predicate; tom and bob are constants.

The fact parent(tom,bob) represents symbolically an instance

of the “parenthood” relation in our world.

• Prolog syntax: The names of predicates, constants and functions in

Prolog begin with a lowercase letter.


Prolog Programs: Queries

The previous Prolog program is essentially a relational database

defining a relation PARENT. This is the reason we often speak of

Prolog databases.

We can use Prolog to pose queries about “parenthood” (in

database terminology: to query the relation PARENT).


Examples of Queries

• Is Bob a parent of Pat?

?- parent(bob, pat).

Answer: yes

• Is Liz a parent of Pat?

?- parent(liz, pat).

Answer: no

Prolog answers a query without variables with either yes, or no.


Examples of Queries (cont’d)

We can also have queries with variables.

• Who are Liz’s parents?

?- parent(X, liz).

Answer: X=tom

• Who are Bob’s children?

?- parent(bob, X).

Answer: X=ann; X=pat


Prolog Queries

A query is an expression of the form

?-p1, p2, ..., pn.

where p1, p2, ..., pn are atomic formulas (possibly with free

variables).

Comments:

• p1, p2, ..., pn are also called goals.

• When the query has no free variables, its answer is yes or no.

• When we have variables in a query, Prolog will return all the

values of these variables such that the query logically follows

from the program.

• Prolog syntax: Variables in Prolog begin with an upper-case letter or an underscore.


More Examples

We can also ask more complicated queries.

• Who are the grandparents of Pat?

?- parent(Y, pat), parent(X, Y).

Answer: Y=bob, X=pam; Y=bob, X=tom

• Who are Jim’s great grandparents?

?- parent(Y, jim), parent(X, Y), parent(Z, X).

Answer: Y=pat, X=bob, Z=pam; Y=pat, X=bob, Z=tom


The Example Revisited

Let us add information on people’s sex:

female(pam). male(tom). male(bob). female(liz). female(pat).

female(ann). male(jim).


The Example Revisited (cont’d)

An alternative representation would be:

sex(pam, feminine). sex(tom, masculine). sex(bob, masculine). ...

The relation sex is binary.



Let us introduce the predicate offspring as the inverse of the

predicate parent.

We could provide the list of simple facts about the offspring

relation. For example: offspring(liz, tom).

Alternative: why not utilize the information available in the

predicate parent?

Rule: For all X and Y, Y is an offspring of X if X is a parent of Y.



The corresponding Prolog rule is:

offspring(Y, X) :- parent(X, Y).

Prolog rules have:

• a condition part or body (the right-hand side of the rule) and

• a conclusion part or head (the left-hand side of the rule).

The meaning of a rule is: If the body holds, the head holds as well.



• The predicate mother can be defined by:

mother(X, Y) :- parent(X, Y), female(X).

The predicate grandparent can be defined by:

grandparent(X, Z) :- parent(X, Y), parent(Y, Z).

The predicate sister can be defined by:

sister(X, Y) :-

parent(Z, X),

parent(Z, Y),

female(X),

different(X, Y).
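
The rule for sister relies on a predicate different/2 that is not defined on these slides. A minimal sketch that makes the rule runnable is given below; the definition of different/2 through the built-in \== is an assumption, and it is adequate here because both arguments are already bound when it is called.

```prolog
parent(pam, bob).
parent(tom, bob).
parent(tom, liz).
parent(bob, ann).
parent(bob, pat).
parent(pat, jim).

female(pam). female(liz). female(ann). female(pat).

% different(X, Y): X and Y are not the same term (assumed definition).
different(X, Y) :- X \== Y.

sister(X, Y) :-
    parent(Z, X),
    parent(Z, Y),
    female(X),
    different(X, Y).
```

With this program ?- sister(ann, pat). and ?- sister(liz, bob). succeed, while ?- sister(ann, ann). fails because of the last goal.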


Recursive Rules

Prolog rules can be recursive. A rule is recursive if the predicate

in its head also appears in its body.

Example: The following rules define the predicate predecessor.

predecessor(X, Z) :-

parent(X, Z).

predecessor(X, Z) :-

parent(X, Y),

predecessor(Y, Z).

The second rule is recursive.


The Final Program

parent(pam, bob).    % Pam is a parent of Bob
parent(tom, bob).
parent(tom, liz).
parent(bob, ann).
parent(bob, pat).
parent(pat, jim).

female(pam).         % Pam is female
male(tom).           % Tom is male
male(bob).
female(liz).
female(ann).
female(pat).
male(jim).


The Final Program (cont’d)

offspring(Y, X) :- % Y is an offspring of X if

parent(X, Y). % X is a parent of Y

mother(X, Y) :- % X is the mother of Y if

parent(X, Y), % X is a parent of Y and

female(X). % X is female

grandparent(X, Z) :- % X is a grand parent of Z if

parent(X, Y), % X is a parent of Y and

parent(Y, Z). % Y is a parent of Z


The Final Program (cont’d)

sister(X, Y) :-      % X is a sister of Y if
  parent(Z, X),
  parent(Z, Y),      % X and Y have a parent in common and
  female(X),         % X is female and
  different(X, Y).   % X and Y are different

predecessor(X, Z) :- % Rule 1

parent(X, Z).

predecessor(X, Z) :- % Rule 2

parent(X, Y),

predecessor(Y, Z).


Rules vs. Views

Rules can be understood to define views over the database

relations defined by other rules or facts.

Example: The predicate offspring defines a view over predicate

parent.

Prolog syntax:

offspring(Y, X) :-

parent(X, Y).

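SQL syntax (assuming the relation PARENT has columns PARENT_NAME and CHILD_NAME):

CREATE VIEW OFFSPRING (CHILD_NAME, PARENT_NAME) AS

SELECT CHILD_NAME, PARENT_NAME FROM PARENT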


Entailment and Inference in Prolog

• Prolog clauses are a proper subset of FOL (Horn formulas).

• What happens with entailment and inference in this subset?


Entailment and Prolog queries

Proposition. Let q be a query with no free variables posed over a

Prolog database DB. The answer to q is yes iff DB |= q.

Proposition. Let q be a query with no free variables posed over a

Prolog database DB. The answer to q is yes iff Generalized Modus

Ponens (forward chaining!) will infer q after it is applied to DB a

finite number of times.


Entailment and Prolog queries with Free Variables

Proposition. Let q be a query over a Prolog database DB. Let θ be a

substitution over the variables of q. The answer to q contains θ iff

DB |= SUBST (θ, q).

Proposition. Let q be a query over a Prolog database DB. Let θ be a

substitution over the variables of q. The answer to q contains θ iff

Generalized Modus Ponens (forward chaining!) will infer SUBST (θ, q)

after it is applied to DB a finite number of times.
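
The forward-chaining reading of these propositions can be illustrated with a naive fixpoint computation written in Prolog itself. This is a sketch under assumptions, not how Prolog evaluates queries: the encoding of the program through fact/1 and rule/2 and the names all_facts/1 and forward/0 are invented here, and the sketch uses assertz/1 and the cut, which have not been introduced yet.

```prolog
:- dynamic fact/1.

% Ground facts of the program, wrapped in fact/1 (hypothetical encoding).
fact(parent(pam, bob)).
fact(parent(tom, bob)).
fact(parent(tom, liz)).
fact(parent(bob, ann)).
fact(parent(bob, pat)).
fact(parent(pat, jim)).

% rule(Head, BodyList): the two predecessor rules.
rule(predecessor(X, Z), [parent(X, Z)]).
rule(predecessor(X, Z), [parent(X, Y), predecessor(Y, Z)]).

% all_facts(Goals): every goal in the list matches a known fact.
all_facts([]).
all_facts([G|Gs]) :- fact(G), all_facts(Gs).

% forward: add one new consequence at a time until nothing new is derivable.
forward :-
    rule(Head, Body),
    all_facts(Body),
    \+ fact(Head),
    assertz(fact(Head)),
    !,
    forward.
forward.
```

After ?- forward. the query ?- fact(predecessor(pam, jim)). succeeds, in agreement with the fact that predecessor(pam, jim) is entailed by the program.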


Prolog, Backward Chaining and Resolution

Prolog uses backward chaining to compute answers to queries. More

precisely, it uses a certain form of resolution called SLD resolution.

Example:

predecessor(X, Z) :- %pr1

parent(X, Z).

predecessor(X, Z) :- %pr2

parent(X, Y),

predecessor(Y, Z).

parent(pam, bob). parent(bob, ann). parent(tom, bob).

parent(bob, pat). parent(tom, liz). parent(pat, jim).


Proof Tree for predecessor(pam,bob)

predecessor(pam, bob)
   |  by rule pr1, MGU {X/pam, Z/bob}
parent(pam, bob)
   |
  yes


Proof Tree for predecessor(pam,ann)

predecessor(pam, ann)
|
+-- by rule pr1, MGU {X/pam, Z/ann}
|     parent(pam, ann)
|       no
|
+-- by rule pr2, MGU {X/pam, Z/ann}
      parent(pam, Y), predecessor(Y, ann)


Proof Tree for predecessor(pam,ann) (cont’d)

predecessor(pam, ann)
|
+-- by rule pr1, MGU {X/pam, Z/ann}
|     parent(pam, ann)
|       no
|
+-- by rule pr2, MGU {X/pam, Z/ann}
      parent(pam, Y), predecessor(Y, ann)
        |  by fact parent(pam, bob), MGU {Y/bob}
      predecessor(bob, ann)
        |  by rule pr1, MGU {X/bob, Z/ann}
      parent(bob, ann)
        yes


Declarative vs. Procedural Programming

In Prolog we can understand the meaning of a program in two ways:

• Declarative meaning: This determines what the output of

the program will be and can be defined in terms of entailment

in FOL.

• Procedural meaning: This determines how the output of

the program is obtained and can be defined in terms of

backward chaining and proof trees.


Execution of Prolog Programs

[Figure: the procedure execute takes a program and a goal list as input and produces a success/failure indicator and an instantiation of variables.]


Execution of Prolog Programs: the Algorithm

procedure execute (Program, GoalList, Success);
begin
  if empty(GoalList) then Success := true
  else
    begin
      Goal := head(GoalList);
      OtherGoals := tail(GoalList);
      Satisfied := false;
      while not Satisfied and "more clauses in program" do
        begin
          Let next clause in Program be H :- B1, ..., Bn.
          Construct a variant of this clause H' :- B1', ..., Bn'.
          unify(Goal, H', UnificationOK, Instant);
          if UnificationOK then
            begin
              NewGoals := append([B1',...,Bn'], OtherGoals);
              NewGoals := substitute(Instant, NewGoals);
              execute(Program, NewGoals, Satisfied)
            end
        end;
      Success := Satisfied
    end
end;
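
The same execution strategy can be sketched in Prolog itself as a small meta-interpreter. This is an illustration under assumptions rather than the algorithm above: solve/1 is a name chosen here, the program predicates are declared dynamic so that the built-in clause/2 may inspect them, and the while loop over clauses is provided implicitly by backtracking into clause/2.

```prolog
:- dynamic parent/2, predecessor/2.   % lets clause/2 inspect these predicates

parent(pam, bob).
parent(tom, bob).
parent(tom, liz).
parent(bob, ann).
parent(bob, pat).
parent(pat, jim).

predecessor(X, Z) :- parent(X, Z).
predecessor(X, Z) :- parent(X, Y), predecessor(Y, Z).

% solve(Goal): Goal is provable from the program clauses.
solve(true).                          % empty goal list: success
solve((A, B)) :-                      % goals are processed left to right
    solve(A),
    solve(B).
solve(H) :-
    H \= true,
    H \= (_, _),
    clause(H, Body),                  % a renamed clause whose head unifies with H
    solve(Body).
```

For example, ?- solve(predecessor(pam, ann)). succeeds and ?- solve(predecessor(ann, pam)). fails, as with the ordinary queries.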


Goal and Clause Order Matters!

• Goals are processed left to right.

• Clauses are selected from top to bottom.


Infinite Loops

Example program:

p :- p.

Example query:

?- p.


Infinite Loops (cont’d)

Consider our earlier predecessor example.

Version 1:

pred1(X,Z):-

parent(X,Z).

pred1(X,Z):-

parent(X,Y), pred1(Y,Z).



Version 2: Swap clauses

pred2(X,Z):-

parent(X,Y), pred2(Y,Z).

pred2(X,Z):-

parent(X,Z).

What happens if we pose the query ?-pred2(tom,pat) ?
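
The question can be explored by running Version 2 against the family facts, as in the sketch below (only the facts are added to the two clauses above).

```prolog
parent(pam, bob).
parent(tom, bob).
parent(tom, liz).
parent(bob, ann).
parent(bob, pat).
parent(pat, jim).

% Version 2: the recursive clause comes first.
pred2(X, Z) :-
    parent(X, Y), pred2(Y, Z).
pred2(X, Z) :-
    parent(X, Z).
```

The query ?- pred2(tom, pat). still succeeds. Prolog first follows the recursive clause down the family tree and only then tries the direct clause, so it may do more work than Version 1, but it terminates: the goal parent(X, Y) binds Y before the recursive call and the parent facts contain no cycle.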



Version 3: Swap goals in the second clause

pred3(X,Z):-

parent(X,Z).

pred3(X,Z):-

pred3(Y,Z), parent(X,Y).




Version 4: Swap clauses and goals in the second clause

pred4(X,Z):-

pred4(Y,Z), parent(X,Y).

pred4(X,Z):-

parent(X,Z).



Control in Prolog

In the slogan

Algorithm = Logic + Control

both logic and control are important!

Prolog offers two facilities for control:

• Ordering of clauses and goals.

• The cut operator ! to control backtracking. The cut operator

will be introduced later.


Prolog Systems

There are various nice Prolog systems available for many platforms.

For your project we propose that you choose:

• SICStus Prolog (available from http://www.sics.se/sicstus/)


Readings

• Leon Sterling and Ehud Shapiro. The Art of Prolog. MIT

Press.

• Ivan Bratko, Prolog Programming for Artificial Intelligence,

2nd edition.

Chapters 1 and 2.

• SICStus Prolog manual.


More Features of Prolog

• Data objects

• Lists

• Operators and arithmetic

• The cut operator

• Negation as failure


Data Objects in Prolog

The data objects in Prolog are called terms. Terms in Prolog are

like terms in FOL.

A term is a constant, a variable or a compound term.

A constant is an atom, an integer or a float.


Atoms

Atoms in Prolog can be constructed in three ways:

• Strings of letters, digits and underscores starting with a

lowercase letter.

Examples: anna, nil, x25, x_25

• Strings of special characters (depending on the

implementation).

• Strings of characters enclosed in quotes.

Example: 'Oliver Twist'


Numbers

Integers and floats.

Details can vary depending on the implementation.

Note that Prolog is not a language aimed at arithmetic

calculations.


Variables

Variables are strings of letters, digits and underscores. Variables always

start with an upper case letter or an underscore.

Examples:

hasachild(X):- parent(X,Y).

hasachild(_p):- parent(_p,_c).

hasachild(X):- parent(X,_).

?- parent(X,_).

Variables consisting of a single underscore are called anonymous

variables. Anonymous variables are useful when they appear only in

the body of a clause or in a query as shown above.

The lexical scope of a variable is one clause.


Structured Objects

Structured objects in Prolog are represented by compound terms.

Example:

location(bridge, segment(point(1,1),point(2,3))).

location(factory, triangle(point(4,2),point(6,4),point(7,1))).

The above facts represent geographic knowledge about the location of a

bridge and a factory.
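
Compound terms are taken apart by unification, so rules can refer directly to the components of a structured object. The sketch below illustrates this on the two facts above; the predicates vertical/1 and shape_of/3 are invented for the example, and functor/3 is the standard built-in that returns the name and arity of a term's principal functor.

```prolog
location(bridge,  segment(point(1,1), point(2,3))).
location(factory, triangle(point(4,2), point(6,4), point(7,1))).

% vertical(Seg): both end points of the segment have the same x-coordinate.
vertical(segment(point(X, _), point(X, _))).

% shape_of(Object, Name, Arity): name and arity of the functor describing
% the location of Object (hypothetical helper).
shape_of(Object, Name, Arity) :-
    location(Object, Shape),
    functor(Shape, Name, Arity).
```

For instance, ?- shape_of(factory, N, A). gives N=triangle, A=3, and ?- location(bridge, S), vertical(S). fails because the two end points of the bridge have different x-coordinates.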


Compound Terms

A compound term consists of a functor (called the principal

functor of a term) and a sequence of one or more terms called

arguments.

A functor is characterized by its name, which is an atom, and its

arity (i.e., the number of its arguments).

Example: point, segment and triangle are called functors.

Compound terms can be pictorially represented as trees.


Lists

Because Prolog is a symbolic computation language, the list data

structure is very important.

A list is an ordered sequence of any number of items.

Example: [ann, tennis, tom, skiing]

Lists in Prolog are just another type of structured object and are

defined formally as follows. A list is

• either an empty list which has no elements and is represented

by [], or

• a structure that has two components

– the first element, called the head.

– the remaining elements (also a list), called the tail.


Lists (cont’d)

Lists are structures built using the functor . (dot) with arguments

the head and tail of the list. Using this notation the list

[ann, tennis, tom, skiing]

can be represented as the term

.(ann, .(tennis, .(tom, .(skiing, []))))


Lists (cont’d)

Lists in Prolog can be represented as follows:

• The convenient notation using brackets.

Example: [ann, tom]

• The cumbersome notation using the functor dot.

Example: .(ann, .(tom, []))

• The useful notation using the vertical bar.

Examples:

[ann | [tom]], [ann, tom | []]

[ann, tennis | [tom, skiing]]


Programs for Lists: Membership

member(X, [X|Tail]).
member(X, [Head|Tail]):-
  member(X,Tail).

Declarative semantics: member(X,L) is true if element X occurs

in list L.

The predicate member can also be used for listing the members of a

list!
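
Both uses of the membership program can be tried with the sketch below. The clauses are the same as above, but they are renamed my_member/2 (a name chosen here) because most Prolog systems already provide member/2, and unused variables are written as anonymous variables.

```prolog
% my_member(X, L): X occurs in list L.
my_member(X, [X|_]).
my_member(X, [_|Tail]) :-
    my_member(X, Tail).
```

The query ?- my_member(b, [a,b,c]). answers yes, and ?- my_member(X, [a,b,c]). enumerates X=a; X=b; X=c on backtracking.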


Programs for Lists: Concatenation

conc([], L, L).
conc([X|L1], L2, [X|L3]):-
  conc(L1, L2, L3).

Declarative semantics: conc(L1,L2,L3) is true if list L3 is the

concatenation of lists L1 and L2.


Programs for Lists: Concatenation (cont’d)

The above program can be used for concatenating two given lists:

?- conc([a,b,c], [1,2,3], X).

X=[a,b,c,1,2,3]


Programs for Lists: Concatenation (cont’d)

The same program can be used for decomposing a given list into two lists:

?- conc(L1,L2,[a,b,c]).

L1=[], L2=[a,b,c];

L1=[a], L2=[b,c];

L1=[a,b], L2=[c];

L1=[a,b,c], L2=[];

no
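Decomposition with a partly specified pattern makes conc useful for searching inside a list. The sketch below illustrates this with two helpers, last_element/2 and neighbours/4; both names are hypothetical and do not appear in the lecture.

```prolog
conc([], L, L).
conc([X|L1], L2, [X|L3]) :-
    conc(L1, L2, L3).

% last_element(L, X): X is the last element of L, i.e. L is some
% prefix followed by the one-element list [X].
last_element(L, X) :-
    conc(_, [X], L).

% neighbours(X, L, Before, After): Before and After are the elements
% immediately preceding and following an occurrence of X in L.
neighbours(X, L, Before, After) :-
    conc(_, [Before, X, After|_], L).

% ?- last_element([a,b,c], X).
% X = c
% ?- neighbours(c, [a,b,c,d], B, A).
% B = b, A = d
```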

Membership via Concatenation

The following is another program for membership:

member1(X,L):-

conc(L1,[X|L2],L).

or equivalently

member1(X,L):-

conc(_,[X|_],L).

Adding an Element

To add an element to a list L, it is easiest to put it in front of the

list so that it becomes its new head: [X|L].

If you need a program for this, it is the following:

add(X, L, [X|L]).

Deleting an Element

The following Prolog program deletes an element from a list:

delete(X, [X|Tail], Tail).
delete(X, [Y|Tail], [Y|Tail1]):-
    delete(X, Tail, Tail1).

The program fails if the given element is not in the list.

Deleting an Element (cont’d)

The previous program can be used as follows:

• Non-deterministically to delete any occurrence of the given

element in the list by backtracking.

?- delete(a, [a,b,a,a], L).

L=[b,a,a];

L=[a,b,a];

L=[a,b,a];

no

Deleting an Element (cont’d)

The previous program can also be used as follows:

• delete can be used in the inverse direction to add an element

anywhere in a list.

?- delete(a, L, [1,2,3]).

L=[a,1,2,3];

L=[1,a,2,3];

L=[1,2,a,3];

L=[1,2,3,a];

no
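The inverse use can be packaged as a predicate of its own. The following is a sketch under two assumptions: the deletion program is renamed del/3 to avoid clashing with a library delete/3 that may have different semantics, and insert/3 is a hypothetical name.

```prolog
% del(X, L, L1): L1 is L with one occurrence of X removed
% (the deletion program above, under a different name).
del(X, [X|Tail], Tail).
del(X, [Y|Tail], [Y|Tail1]) :-
    del(X, Tail, Tail1).

% insert(X, List, BiggerList): BiggerList is List with X inserted at
% some position, i.e. deleting X from BiggerList gives back List.
insert(X, List, BiggerList) :-
    del(X, BiggerList, List).

% ?- insert(a, [1,2], L).
% L = [a,1,2] ;
% L = [1,a,2] ;
% L = [1,2,a] ;
% no
```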

Sublists

The following program sublist(S,L) checks whether list S occurs

within list L as its sublist.

sublist(S,L):-

conc(L1,L2,L),

conc(S,L3,L2).

The program can also be used to find all sublists of a given list.
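A small sketch of both uses follows. It also shows that the empty sublist is reported once for every split point of L, since each split yields S = [] as a solution.

```prolog
conc([], L, L).
conc([X|L1], L2, [X|L3]) :-
    conc(L1, L2, L3).

% sublist(S, L): L can be split as L1 ++ S ++ L3.
sublist(S, L) :-
    conc(_, L2, L),
    conc(S, _, L2).

% Checking:
% ?- sublist([b,c], [a,b,c,d]).
% yes
% Enumerating (note the repeated empty list):
% ?- sublist(S, [a,b]).
% S = [] ;  S = [a] ;  S = [a,b] ;  S = [] ;  S = [b] ;  S = [] ;  no
```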

Permutations

The following program permutation(L,P) generates by backtracking all

permutations P of a given list L.

permutation([], []).
permutation(L, [X|P]):-
    delete(X, L, L1),
    permutation(L1, P).
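The order in which the permutations appear follows from the order in which the deletion program picks elements. The sketch below uses the hypothetical names perm/2 and del/3 so that it does not clash with library predicates of the same name as the originals.

```prolog
% del(X, L, L1): L1 is L with one occurrence of X removed.
del(X, [X|Tail], Tail).
del(X, [Y|Tail], [Y|Tail1]) :-
    del(X, Tail, Tail1).

% perm(L, P): P is a permutation of L. Pick any element X of L as the
% first element of P, then permute what is left.
perm([], []).
perm(L, [X|P]) :-
    del(X, L, L1),
    perm(L1, P).

% ?- perm([a,b,c], P).
% P = [a,b,c] ;  P = [a,c,b] ;  P = [b,a,c] ;
% P = [b,c,a] ;  P = [c,a,b] ;  P = [c,b,a] ;  no
```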

Operators in Prolog

Operators in Prolog can be prefix, infix or postfix.

Each operator has a precedence and an associativity.

Operators in Prolog are merely notational convenience.

Internally Prolog will represent expressions involving operators as

terms (e.g., 2*x+3*y will be represented as the term +(*(2,x),

*(3,y))).
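That operator expressions are ordinary terms can be checked directly. In the sketch below, top_functor/3 is a hypothetical helper built on the standard =.. ("univ") predicate, which converts between a term and a list consisting of its functor followed by its arguments.

```prolog
% top_functor(Term, Name, Args): Name is the principal functor of Term
% and Args is the list of its arguments.
top_functor(Term, Name, Args) :-
    Term =.. [Name|Args].

% ?- top_functor(2*x+3*y, Name, Args).
% Name = +, Args = [2*x, 3*y]
% ?- 2*x+3*y == +(*(2,x), *(3,y)).
% yes
```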

How to Define Operators

Prolog allows the definition of operators using a special type of

clause called a directive.

The syntax for operator directives is

:- op(Precedence, Associativity, Name).

where

• Name is the name of the operator (e.g., ==>).

• Precedence is a number (in SICStus Prolog it is between 1 and

1200) giving the precedence of the operator.

• Associativity is a specification of the associativity of the

operator.
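As an illustration, the sketch below declares a hypothetical infix operator likes and uses it to write facts. The operator name, the precedence 700 and the type xfx (infix, non-associative) are assumptions chosen for this example only.

```prolog
% Declare likes as an infix, non-associative operator.
:- op(700, xfx, likes).

% Facts written with the new operator. Each one is just the ordinary
% binary term likes(Person, Thing) in a more readable notation.
ann likes tennis.
tom likes skiing.

% ?- ann likes What.
% What = tennis
% ?- X = (tom likes skiing), X = likes(A, B).
% A = tom, B = skiing
```

The directive changes only how terms are read and written; it attaches no meaning to likes.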

Example

The addition/subtraction operators +/- could have been defined by

the directive

:- op(500, yfx, [+, -]).

where yfx specifies that they are left-associative.

You can find the details of precedence/associativity of all built-in

Prolog operators in any Prolog book or in the SICStus Prolog

manual.
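The effect of the yfx type on parsing can be observed by taking an expression apart. In this sketch, split_minus/3 is a hypothetical helper that exposes the two arguments of the outermost - in a term.

```prolog
% split_minus(Term, Left, Right): Term is the term Left - Right.
split_minus(Left - Right, Left, Right).

% With yfx, the left argument may itself be a - expression, so
% 8-3-2 is read as (8-3)-2:
% ?- split_minus(8-3-2, L, R).
% L = 8-3, R = 2
% ?- X is 8-3-2.
% X = 3
```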

Arithmetic

Arithmetic in Prolog is performed with special built-in arithmetic

predicates.

An arithmetic expression is a term involving numbers (integers and

floats), variables, and functors representing arithmetic functions.

Arithmetic expressions are simply data structures. Evaluation of

arithmetic expressions is performed using appropriate built-in

predicates.

Some Built-in Arithmetic Predicates

The built-in predicate is is used when we want to evaluate an expression

and unify the result with a variable. Notice the difference with the built-in

predicate = which unifies two terms.

?- X is 1+2.
X = 3; no

?- X = 1+2.
X = 1+2; no

?- X is 1+2, Y=X.
X = 3, Y = 3; no

Some Built-in Arithmetic Predicates (cont’d)

The built-in predicates

X =:= Y X =\= Y X < Y X > Y X =< Y X >= Y

are also used when we want their arguments to be evaluated.

Example:

?- X is 8/4, X =:= (3+2+1)/3.
X = 2.0 ; no

Examples

The following predicate gcd(X,Y,D) is true if and only if D is the

greatest common divisor of the positive integers X and Y.

gcd(X, X, X).
gcd(X, Y, D):-
    X < Y,
    Y1 is Y-X,
    gcd(X, Y1, D).
gcd(X, Y, D):-
    Y < X,
    gcd(Y, X, D).
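A sample run is sketched below. The program works by repeated subtraction, so it assumes that both arguments are positive integers.

```prolog
gcd(X, X, X).
gcd(X, Y, D) :-
    X < Y,
    Y1 is Y - X,        % replace the larger number by the difference
    gcd(X, Y1, D).
gcd(X, Y, D) :-
    Y < X,
    gcd(Y, X, D).       % swap so that the smaller number comes first

% ?- gcd(12, 18, D).
% D = 6
% Computation: gcd(12,18) -> gcd(12,6) -> gcd(6,12) -> gcd(6,6).
% Caveat: a call such as gcd(0, 5, D) does not terminate, because
% subtracting 0 never makes the second argument smaller.
```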


Examples (cont’d)

The following predicate length(L, N) is true if and only if N is the

length of list L.

length([], 0).
length([_|Tail], N):-
    length(Tail, N1),
    N is N1 + 1.
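The order of the two goals in the recursive clause matters, as the sketch below points out. The predicate is renamed list_length/2 (a hypothetical name) because many Prolog systems already provide length/2 as a built-in predicate that cannot be redefined.

```prolog
% list_length(L, N): N is the number of elements of the list L.
list_length([], 0).
list_length([_|Tail], N) :-
    list_length(Tail, N1),   % N1 must be bound to a number first ...
    N is N1 + 1.             % ... because is evaluates its right-hand side

% ?- list_length([a,b,c], N).
% N = 3
```

If the two goals were swapped, N is N1 + 1 would be called with N1 still unbound, and the evaluation would raise an error.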


The Cut Operator (!)

The cut operator can be used to reduce the search space of Prolog

computations by dynamically pruning the search tree. The cut can

be used to prevent Prolog from following fruitless computation

paths that the programmer knows could not produce solutions.

The use of cut is controversial. Many of its uses can only be

interpreted procedurally, in contrast to the declarative

programming style we advocate. Used sparingly, however, it can

improve the efficiency of programs without compromising their

clarity.

Green Cuts: expressing determinism

Consider the following program for computing the maximum of two

numbers. Predicate max(X,Y,Z) is true if and only if Z is the

maximum of X and Y.

max(X, Y, X):- X >= Y.
max(X, Y, Y):- X < Y.

Finding the maximum of two numbers is a deterministic operation.

Only one of the two max clauses applies in a given computation

because the tests X >= Y and X < Y are mutually exclusive.

Green Cuts: expressing determinism (cont’d)

We can use the cut operator to express the mutually exclusive

nature of the tests in the max predicate:

max(X, Y, X):- X >= Y, !.
max(X, Y, Y):- X < Y.

or

max(X, Y, X):- X >= Y, !.
max(X, Y, Y):- X < Y, !.

Operationally the cut is handled as follows. The goal succeeds and

commits Prolog to all the choices made since the parent goal was

unified with the head of the clause the cut occurs in.

Green Cuts: expressing determinism (cont’d)

Let us consider the following clause:

A :- B1, B2, ..., Bk, !, C1, C2, ..., Cm

If the current goal G unifies with the head A of this clause and B1, ..., Bk

further succeed, the cut has the following effects:

• The program is committed to the choice of the above clause for

reducing G; any alternative clauses for A that might unify with

G are ignored.

• Further, should any of the C1, C2, ..., Cm fail, backtracking

goes back only as far as the cut. Other choices remaining in the

computation of B1, ..., Bk are pruned from the search tree.

If backtracking actually reaches the cut, the cut fails, and the

search proceeds from the last choice made before the above

clause was chosen for goal G.

Red Cuts: omitting explicit conditions

We can go one step further in the use of cut. If we take into

account the execution model of Prolog, we can rewrite the program

for max in the following form:

max(X, Y, X):- X >= Y, !.
max(X, Y, Y).

The above program still works as expected but now we have

modified the declarative semantics of the original program.

For example, the fact max(5,1,1) follows from the second clause.

So this is a false logic program, yet it behaves correctly when max is

called with an unbound third argument!

Cuts whose presence in a program changes the meaning of the

program are called red cuts. Using red cuts should be avoided if

possible since it is error-prone.
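The difference between the two kinds of cut can be observed by running both versions side by side. In the sketch below the names max_green/3, max_red/3 and max_fixed/3 are hypothetical; max_fixed/3 is one commonly used remedy, not something taken from the lecture, which delays the unification of the result until after the cut.

```prolog
% Green cut: both tests are stated explicitly.
max_green(X, Y, X) :- X >= Y, !.
max_green(X, Y, Y) :- X < Y.

% Red cut: the second test is omitted and relies on the cut.
max_red(X, Y, X) :- X >= Y, !.
max_red(_, Y, Y).

% Possible remedy: test first, commit, then bind the result.
max_fixed(X, Y, Z) :- X >= Y, !, Z = X.
max_fixed(_, Y, Y).

% ?- max_red(5, 1, Z).      Z = 5   (correct when Z is unbound)
% ?- max_red(5, 1, 1).      yes     (wrong: 1 is not the maximum)
% ?- max_green(5, 1, 1).    no
% ?- max_fixed(5, 1, 1).    no
```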

Red Cuts: omitting explicit conditions (cont’d)

Consider the program for member:

member(X, [X|L]).
member(X, [Y|L]):-
    member(X,L).

Now consider a new version of member where we have used the cut

operator to obtain efficiency.

member(X, [X|L]):- !.
member(X, [Y|L]):-
    member(X,L).

Is this a good Prolog program?

Red Cuts: omitting explicit conditions (cont’d)

If the semantics of member are:

member(X,L) is true if and only if X is a member of list L

then the above program is not correct (red cut!) because the goal

member(X,[1,2]) has only the solution X=1. Perhaps the above

program should be called memberCheck with appropriate semantics.
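The loss of solutions is easy to reproduce. The sketch below defines the cut version under the suggested name memberCheck/2 and collects all of its answers with the standard findall/3 predicate.

```prolog
% memberCheck(X, L): succeeds at most once, for the first element of L
% that unifies with X; the cut discards the remaining alternatives.
memberCheck(X, [X|_]) :- !.
memberCheck(X, [_|L]) :-
    memberCheck(X, L).

% As a test it behaves as expected:
% ?- memberCheck(2, [1,2]).
% yes
% As a generator it stops after the first element:
% ?- findall(X, memberCheck(X, [1,2]), Xs).
% Xs = [1]
```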

Negation as Failure

Prolog allows a limited form of negation called negation as

failure.

Negation as failure can be implemented by a built-in predicate not

which can be defined by the following Prolog program.

not(G):- G, !, fail.
not(G).

In other words, the goal not(G) succeeds if and only if the goal G

fails. fail is a built-in predicate that simply fails. The cut used

above is a red one because the meaning of the program is different

when the cut is removed.

Negation as Failure (cont’d)

Negation as failure comes in very handy in writing various Prolog

rules or queries.

The following program defines a predicate disjoint(L,T) which is

true if lists L and T have no common elements.

disjoint(L,T) :- not(commonMembers(L,T)).

commonMembers(L,T):-

member(X,L),

member(X,T).

Negation as Failure (cont’d)

In some Prolog systems different notation is used to express

negation as failure.

In SICStus Prolog the appropriate operator is \+. Thus the above

rules should be written as:

disjoint(L,T) :- \+(commonMembers(L,T)).

commonMembers(L,T):-

member(X,L),

member(X,T).
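A complete, runnable version of this program might look as follows. It is a sketch that assumes membership is defined locally under the hypothetical name member_of/2, so that no library needs to be loaded.

```prolog
member_of(X, [X|_]).
member_of(X, [_|Tail]) :-
    member_of(X, Tail).

% commonMembers(L, T): some element X occurs in both L and T.
commonMembers(L, T) :-
    member_of(X, L),
    member_of(X, T).

% disjoint(L, T): it cannot be shown that L and T share an element.
disjoint(L, T) :-
    \+ commonMembers(L, T).

% ?- disjoint([a,b], [c,d]).
% yes
% ?- disjoint([a,b], [b,c]).
% no
```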

Readings

• Leon Sterling and Ehud Shapiro. The Art of Prolog. MIT

Press.

• Ivan Bratko. Prolog Programming for Artificial Intelligence.

2nd edition. Addison-Wesley.

Chapters 3 and 5.

• SICStus Prolog manual.
