Page 1: Artificilal 4101 Notes

SCS4101: Artificial Intelligence

Lecturer: Mr. M. Ndlovu
Office: AG16 – Computer Science Department
Lecture Schedule: Tues 6:30pm – FD54
                  Friday 4:30pm – FD54

Reference material:

1. Artificial Intelligence: A Modern Approach, by Stuart Russell & P. Norvig
2. Expert Systems: Principles & Programming, by Giarratano & Riley
3. A Comprehensive Guide to AI & Expert Systems, by Robert Levine et al.
4. http://www.cs.ubc.ca/~kevinlb/teaching/cs322/index.html
5. http://www.dennisgorelik.com/ai/EffectConcept.htm

Course Objectives

Students should be able to:
- Explain the concepts of rational reasoning, human-like reasoning, rational behavior and human-like behavior.
- Develop descriptions of agents and determine which agent type is applicable to a given problem.
- Solve problems in a logic programming language (PROLOG).
- Formulate an efficient problem space for a problem expressed in English by expressing that problem space in terms of states, operators, an initial state, and a description of a goal state.
- Select an appropriate search algorithm for a problem, implement it, and characterize its time and space complexities.
- Select an appropriate heuristic search algorithm for a problem and implement it by designing the necessary heuristic evaluation function.
- Describe under what conditions heuristic algorithms guarantee an optimal solution.
- Explain the differences among the three main styles of learning: supervised, reinforcement, and unsupervised.
- Determine which of the three learning styles is appropriate to a particular problem.
- Employ some techniques for reasoning under uncertain conditions.
- Outline the stages one goes through in developing artificial intelligence systems.

Course outline

1. Introduction
   - Artificial intelligence
   - History of AI – Developments in AI
   - Application areas of AI

2. Knowledge representation
   - Logic
   - Semantic networks
   - Rules
   - Frames

3. Search techniques
   - Problem representation
   - Search methods
     - Uninformed Searches
     - Informed Searches

4. Natural language processing
   - Grammars
   - Parsing
   - Text generation

5. Learning: General models for learning
   - Structure of a learning agent
   - Types of learning: supervised/unsupervised, reinforcement
   - Inductive learning

6. Logic programming
   - Introduction to Prolog
   - Prolog versus conventional programming
   - Logic specifications
   - Structured data
   - Arithmetic facilities
   - Procedural programming

7. Reasoning with Incomplete Information

8. Developing an Artificial Intelligence System
   - Defining Goals
   - Obtaining Facts
   - Obtaining Data
   - Pruning


AI Course Notes

Lecture 1

Introduction

Intelligence is the ability to comprehend, to understand and profit from an experience.

OR

Intelligence is a general mental capability that involves the ability to reason, plan, solve problems, think abstractly, comprehend ideas and language, and learn.

Therefore intelligence can be considered as the ability to exhibit “appropriate actions for each situation”.

Aspects/Elements of Human Intelligence (Goals, Facts & Rules, Pruning & Inference)

To understand human intelligence, one needs to understand how human beings think. Human thought processes are guided by the following:

*Goals (Direct our thinking)

Thinking helps us to accomplish something. Our thought processes are set in motion by goals to be achieved. Whether one is carrying out the simplest or the most difficult task, the mind is sharply focused on a goal.

Examples of goals:
- Learning to drive a car
- Deciding on what to have for supper
- Finding the shortest route from your home to town

A human does not do things because s/he thinks; s/he thinks because there are things to be done.

*Facts & Rules

The human mind stores a database of facts and rules. Human intelligence is a collection of facts and the means of utilizing these facts to reach goals (rules), e.g.

Fact 1: During rush hour, streets are busy.
Rule: If I try to cross a street on foot during rush hour, then I might be hit by a car.

Human beings have the capacity to relate very complex sets of rules and facts in the attempt to reach some very complicated goals.

*Pruning

The act of selecting a proper response to a particular situation is called pruning. The mind has to extract the right set of rules (from a pool of facts and rules) quickly to fit into a particular situation. Pruning eliminates pathways of thought that are not relevant to the immediate objective of reaching a particular goal.


Pruning provides a vital order to our thoughts, without which life would be impossible. It helps us to carry out a quick efficient search for only those rules that pertain to the immediate goal.

*Inference Mechanism

Inference helps us to provide an answer to a question, based on a rule previously learnt. In the process of arriving at the present goal, a new fact is derived. The inference mechanism in us is central to our ability to learn from experience, as it enables us to generate new facts from existing ones by applying already acquired knowledge to new situations. It also helps in the detection of errors in our thinking and allows us to modify and improve the rules we use in arriving at the goals.

Artificial intelligence (AI)

It is a field of study that encompasses computational techniques for performing tasks that apparently require intelligence when performed by humans e.g. problem diagnosis. Fundamental issues of AI include:

(i) knowledge representation, (ii) searching and (iii) inference.

Much of AI is concerned with the design and understanding of knowledge representation schemes. The element that the fields of AI have in common is the creation of machines that can "think".

[Figure: a goal is passed through pruning, which selects the rule set that applies ("If condn. 1 Then Rules for 1", "If condn. 2 Then Rules for 2"); each rule set is made up of "If ..., Then ..." rules.]

Inference (reasoning) is the process of creating explicit (stated clearly and precisely) representations of knowledge from implicit (implied though not directly expressed) ones. It exists in two forms:

Deductive inference - proceeds from a set of assumptions called axioms to new statements that are logically implied by the axioms. Here the conclusion is necessitated by, or reached from, previously known facts, the premises: if the premises are true, the conclusion must be true. This is distinguished from inductive reasoning, where the premises may predict a high probability of the conclusion, but do not ensure that the conclusion is true.

Inductive inference starts with a set of facts, features or observations and produces generalizations, descriptions and laws which account for the given information and which may have the power to predict new facts, features and observations. Briefly, inductive inference is the process of reaching a general conclusion from specific examples.

When is a machine said to possess artificial intelligence?

The traditional answer is that artificial intelligence is manifested in a machine when the machine’s performance cannot be distinguished from that of a human performing the same task.

The Turing Test

A human is put in one room, a machine in another, and a human interrogator in a third. The interrogator may put questions to either the human or the machine, passing messages through an intermediary, e.g. email. As they respond, the two parties compete to convince the interrogator that each is the human. If the machine can win, on average, as often as the human, then it passes the Turing test and, by this particular criterion, can think.

Issues of competence in AI are the quality and quantity of knowledge in a system, the kinds of inference it can make with this knowledge, and how well directed its search procedure is.


Lecture 2

The History of AI (Timeline of Major AI events)

The electronic computer was invented in 1941, with development taking place in both the US and Germany. The first computers required large, separate air-conditioned rooms and were a programmer's nightmare, involving the separate configuration of thousands of wires just to get a program running.

In late 1955, Newell and Simon developed The Logic Theorist, considered by many to be the first AI program. The program, representing each problem as a tree model, would attempt to solve it by selecting the branch that would most likely result in the correct conclusion. The impact that the logic theorist made on both the public and the field of AI has made it a crucial stepping-stone in developing the AI field.

In 1956 John McCarthy, regarded as the father of AI, organized a conference (in Dartmouth) to draw on the talent and expertise of others interested in machine intelligence for a month of brainstorming. From that point on, because of McCarthy, the field would be known as artificial intelligence. Although not a huge success (explain), the Dartmouth conference brought together the founders of AI and served to lay the groundwork for the future of AI research.

In 1957, the first version of a new program, the General Problem Solver (GPS), was tested. The program was developed by Newell and Simon, who had developed the Logic Theorist. The GPS was capable of solving a greater range of common-sense problems. A couple of years after the GPS, IBM contracted a team to research artificial intelligence. Herbert Gelernter spent 3 years working on a program for solving geometry theorems.

While more programs were being produced, McCarthy was busy developing a major breakthrough in AI history. In 1958 McCarthy announced his new development, the LISP (LISt Processing) language, which is still used today.

In 1963 MIT received a 2.2 million dollar grant from the United States government to be used in researching Machine-Aided Cognition (artificial intelligence). The grant, by the Department of Defense's Advanced Research Projects Agency (ARPA), was to ensure that the US would stay ahead of the Soviet Union in technological advancements. The project served to increase the pace of development in AI research by drawing computer scientists from around the world, and the funding continues.

SHRDLU was part of the microworlds project, which consisted of research and programming in small worlds (such as one with a limited number of geometric shapes). The MIT researchers, headed by Marvin Minsky, demonstrated that when confined to a small subject matter, computer programs could solve spatial problems and logic problems. Other programs which appeared during the late 1960s were STUDENT, which could solve algebra story problems, and SIR, which could understand simple English sentences. The result of these programs was a refinement in language comprehension and logic.

In the 1970s the expert system was introduced. Expert systems predict the probability of a solution under set conditions.

During the 1980s AI was moving at a faster pace, and further into the corporate sector. In 1986, US sales of AI-related hardware and software soared to $425 million. Expert systems were in particular demand because of their efficiency. Companies such as Digital Equipment Corporation were using XCON, an expert system designed to configure the large VAX computers. DuPont, General Motors, and Boeing relied heavily on expert systems. To keep up with the demand for computer experts, companies such as Teknowledge and Intellicorp, specializing in creating software to aid in producing expert systems, were formed. Other expert systems were designed to find and correct flaws in existing expert systems.

Beyond The 1980’s (Narrate what is happening)-Find out about the smart truck

Application Areas of AI

1. Expert Systems (Knowledge Based Systems)

Expert systems are programs that use human like reasoning processes rather than computational techniques to solve problems in specific problem domains. These reasoning processes in turn rely on experiential human knowledge, or expertise, stored in a structure called a knowledge base.


Expert systems cannot solve problems that human beings do not know how to solve. Instead they contain the existing knowledge of experts and use that knowledge to reason like a human being. Expert systems rely not only on factual knowledge, as conventional programs do, but also on uncertain knowledge and observations based on experience and intuition (collectively known as heuristics). The facts and heuristics are extracted from experts in a specialized subject area. They are then coupled with methods of analyzing, manipulating and applying the encoded knowledge so that the program can make inferences and explain its actions. Expert systems differ from conventional computer programs because their reasoning is not straightforward. Their tasks have no practical algorithmic solutions, and they must often draw conclusions based on incomplete, uncertain or fuzzy information.

Knowledge bases represent knowledge of facts and general information, as well as heuristics. E.S also have control mechanisms, which organize and control strategies taken to apply the inference process. In contrast to conventional programs, knowledge systems modularize the knowledge base, inference mechanism, and control mechanism. These 3 mechanisms are separate and independent of each other. This separation allows new knowledge to be added fairly easily to knowledge systems. As a result the knowledge base can grow. The inference and control mechanisms are however essentially static.

Not all fields of knowledge are currently suitable for expert system application. In general, E.S are suitable for tasks at which people become more knowledgeable, and perform a lot better, after years of experience. Usually the number of such people is relatively small; as a result, their knowledge is scarce and poorly distributed in an organization. This results in a knowledge bottleneck for the organization.

2. Natural Language Systems (focus on)
- Natural Language Understanding – computer listens to a human/reads text.
- Speech understanding – translation of voice to text.
- Language generation –
- Machine Translation – useful for large international organizations.

3. Perception Systems

Perception systems interpret perceptions of vision, speech and touch and make inferences about them.


4. Logistics Planning

Planning attempts to order actions to achieve given goals. US forces used DART in the Persian Gulf to automate logistics planning and scheduling for transportation, which cut down the time spent on planning.

5. Machine Learning – Tutoring of students.

6. Theorem Proving – Building of a general theorem proving solution.

7. Game playing – IBM's Deep Blue defeated the world champion in a chess match in 1997.

Lecture 3

Knowledge Representation

Knowledge – Specialized information about a situation.

Wisdom – at the pinnacle, meta-knowledge of getting the best goals
Meta-knowledge – knowledge about how to apply knowledge
Knowledge – specialized information about a situation
Data – items of potential interest
Noise – data items of no use

It is necessary to represent the computer's knowledge of the world by some kind of data structures the machine can understand.

[Figure: hierarchy of knowledge with the levels Knowledge, Information, Data and Noise, from top to bottom.]

Classes of Knowledge

Procedural – knowledge of how to do things
Declarative – knowing that something is true/false
Tacit – unconscious knowledge that can’t be expressed by language, e.g. knowing how to blink

OR

Factual Data: Known facts about the world.
General Principles: "Every dog is a mammal."
Hypothetical Data: The computer must consider hypotheticals in order to reason about the effects of actions that are being contemplated.

1. Rules
- Means of representing knowledge in chunks or modules.
- Most common form of knowledge representation; a rule is a conditional statement that specifies an action that is supposed to take place under certain conditions.
- Organized in a loose arrangement with links to related chunks of knowledge.

Rules take the form:
If attribute A1 has value V1 and
attribute A2 has value V2
Then attribute A3 has value V3

The IF part is the antecedent (conditional part, pattern, LHS); the THEN part is the consequent (RHS).

Example rules
1. If the car does not run and the fuel gauge is empty, then fill the gas tank.
2. If there is a flame, then there is a fire.

Rule based languages do not require programmers to specify a sequence of steps for a program. Instead, the programmer describes only the conditions that indicate a situation and the action that should take place when that situation holds, and the whole set of unstructured rules is used by the program.

Rule-based systems are also known as production systems (P.S.). These systems exhibit modularity, as individual productions in the rule base can be added, deleted or changed independently. The rules communicate only by means of the context; they do not call each other directly. P.S also have the uniformity feature, in that the rule base consists of elements of similar format: conditions and actions. This makes them easily understood. They also follow the natural way of solving problems: experts would normally describe solutions by describing what to do in certain circumstances, and P.S try to emulate this procedure.

Production System Cycle (Rule Firing)


Matching – requires lots of computational resources, this has led to making rule bases and contexts more complicated data structures. For example the principle of indexing can be incorporated, by partitioning the rules according to conditions that will make them fire so as to reduce searching overhead.

Conflict resolution – often more than one rule can fire in each cycle of the operation of a P.S. The system is required to choose one rule from among this set (the conflict set). The techniques for choosing a single rule (resolving the conflict) include:
- First rule: the first rule that satisfies the condition, in the order of their appearance in the rule base
- Highest priority: priority rankings are defined by the programmer
- Most specific rule: the one with the most detailed condition
- Most recent: the rule that refers to the element most recently added to the context
- New rule: the rule binding that has not occurred previously
- Arbitrary: choosing a rule at random

Firing – once the conflict is resolved the chosen rule is executed/fired resulting in a new system status and thus the cycle continues as matching is then needed to select an appropriate rule for this new context.
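The match, conflict resolution and firing steps described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the way any particular production system is built: the rule names, the priorities and the "sound the alarm" rule are assumptions made up for the example, and the highest-priority strategy is just one of the conflict resolution techniques listed above.

```python
# Minimal sketch of a production-system cycle: match, resolve conflict, fire.
# Rule names, priorities and the "alarm" rule are hypothetical illustrations.

RULES = [
    # (name, priority, conditions that must all be in the context, fact to add)
    ("refuel", 2, {"car does not run", "fuel gauge is empty"}, "fill the gas tank"),
    ("fire", 1, {"there is a flame"}, "there is a fire"),
    ("alarm", 3, {"there is a fire"}, "sound the alarm"),
]


def match(context, rules):
    """Matching: collect every rule whose conditions all hold in the context
    and whose conclusion is not already there (this is the conflict set)."""
    return [r for r in rules if r[2] <= context and r[3] not in context]


def resolve(conflict_set):
    """Conflict resolution: choose the rule with the highest priority."""
    return max(conflict_set, key=lambda r: r[1])


def run(initial_facts, rules):
    """Repeat match -> resolve -> fire until no rule can fire."""
    context = set(initial_facts)
    fired = []
    while True:
        conflict_set = match(context, rules)
        if not conflict_set:
            return context, fired
        name, _, _, conclusion = resolve(conflict_set)
        context.add(conclusion)  # firing changes the context
        fired.append(name)


facts = {"car does not run", "fuel gauge is empty", "there is a flame"}
final_context, order = run(facts, RULES)
print(order)
```

Note that the rules never call each other: the "alarm" rule becomes eligible only because the "fire" rule added a new fact to the context.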

Advantages of Rules
- Modular, easy to encapsulate knowledge & expand system.
- Explanation facilities can be easily built from rule antecedents.
- Similar to the human cognitive process; they are a natural way of modeling how humans solve problems.

2. Logic

Propositional logic (calculus) is concerned with declarative sentences which can be classified as either true or false. A sentence whose truth value can be determined is called a statement (proposition). E.g. the sentence ‘A triangle has three sides’ is a statement, but the sentence ‘This statement is false’ is not, i.e. it can’t be classified. A compound statement is formed by using logical connectives such as:

∧  AND; conjunction
∨  OR; disjunction
¬  NOT; negation
→  if … then; conditional
↔  if and only if; biconditional

e.g. "if it is raining then carry an umbrella" is equivalent to p → q

where p = it is raining and q = carry an umbrella.

p ↔ q is equivalent to (p → q) ∧ (q → p) and is true iff p and q have the same truth value.


This has the following meanings: p iff q; q iff p; if p then q, and if q then p.

A compound statement that is always true, whether its individual statements are true or false, is a tautology, e.g. p ∨ ¬p is a tautology.

A conditional tautology is called an implication, represented as p ⇒ q.

A bi-conditional tautology is called an equivalence, represented as p ≡ q or p ⇔ q.

A contradiction is a compound statement which is always false, e.g. p ∧ ¬p.

Truth Table of the Binary Logical Connectives

P   Q   P ∧ Q   P ∨ Q   P → Q   P ↔ Q
T   T     T       T       T       T
T   F     F       T       F       F
F   T     F       T       T       F
F   F     F       F       T       T
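The truth table can be generated mechanically, which also gives a way to check the tautology, contradiction and biconditional claims made earlier. The sketch below uses plain Python booleans; the helper names (`implies`, `iff`, `is_tautology`, `is_contradiction`) are chosen for this illustration only.

```python
from itertools import product


def implies(p, q):
    """Conditional p -> q: false only when p is true and q is false."""
    return (not p) or q


def iff(p, q):
    """Biconditional p <-> q: true when p and q have the same truth value."""
    return p == q


def truth_table():
    """Rows of (P, Q, P and Q, P or Q, P -> Q, P <-> Q) in the order TT, TF, FT, FF."""
    return [
        (p, q, p and q, p or q, implies(p, q), iff(p, q))
        for p, q in product([True, False], repeat=2)
    ]


def is_tautology(formula, arity):
    """A formula is a tautology if it is true under every truth assignment."""
    return all(formula(*values) for values in product([True, False], repeat=arity))


def is_contradiction(formula, arity):
    """A formula is a contradiction if it is false under every truth assignment."""
    return not any(formula(*values) for values in product([True, False], repeat=arity))


for row in truth_table():
    print(" ".join("T" if value else "F" for value in row))

print(is_tautology(lambda p: p or not p, 1))        # p or not p
print(is_contradiction(lambda p: p and not p, 1))   # p and not p
# (p <-> q) has the same truth value as (p -> q) and (q -> p) in every row
print(is_tautology(lambda p, q: iff(p, q) == (implies(p, q) and implies(q, p)), 2))
```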

For the purposes of AI, propositional logic is not very useful as it cannot examine the internal structure of a statement. In order to capture our knowledge of the world adequately in a formal way, we need not only to be able to express true or false propositions, but also to be able to speak of objects, and to postulate relationships over classes of objects.

We turn to PREDICATE LOGIC to accomplish these objectives. Predicate logic is an extension of the notions of propositional logic. Statements about individuals, both by themselves and in relation to other individuals are called predicates. A predicate is applied to a specific number of arguments and has a value of either TRUE or FALSE when individuals are used as the arguments. Predicate logic deals with quantifiers such as all, some & no.

Predicate Logic Quantifiers

1. ∀ – universal quantifier, "for all" (a conjunction of predicates about instances).

(∀x) (Human(x) → Mortal(x)) means all humans are mortal, i.e. it is read as: for all x, if x is human, then x is mortal.

e.g. 2: (∀x) (triangle(x) → polygon(x)); triangle(pqr) is an instance.

2. ∃ – existential quantifier; the statement is true for at least one element of the set.

(∃x) (elephant(x) ∧ name(Clyde)).


Example       Meaning
(∀x) (P)      All elephants are mammals
(∃x) (¬P)     Some elephants are not mammals
(∃x) (P)      Some elephants are mammals
(∀x) (¬P)     No elephants are mammals

Where P represents “elephants are mammals”. The first two sentences are negations of each other, and so are the last two.
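Over a finite set of individuals, the two quantifiers can be sketched with Python's built-in `all` and `any`, which makes the negation pairs in the table easy to check. The small domain of animals below is invented for the illustration, and the helper names `for_all` and `exists` are assumptions, not standard library functions.

```python
# Hypothetical finite domain; the animals and their properties are made up.
ANIMALS = [
    {"name": "Clyde", "elephant": True, "mammal": True},
    {"name": "Dumbo", "elephant": True, "mammal": True},
    {"name": "Tweety", "elephant": False, "mammal": False},
]


def for_all(domain, predicate):
    """Universal quantifier: a conjunction of the predicate over all instances."""
    return all(predicate(x) for x in domain)


def exists(domain, predicate):
    """Existential quantifier: true if the predicate holds for at least one instance."""
    return any(predicate(x) for x in domain)


def is_mammal(animal):
    return animal["mammal"]


def is_not_mammal(animal):
    return not animal["mammal"]


# Restrict the domain to elephants, so P(x) reads "elephant x is a mammal".
elephants = [a for a in ANIMALS if a["elephant"]]

all_are = for_all(elephants, is_mammal)          # All elephants are mammals
some_are_not = exists(elephants, is_not_mammal)  # Some elephants are not mammals
some_are = exists(elephants, is_mammal)          # Some elephants are mammals
none_are = for_all(elephants, is_not_mammal)     # No elephants are mammals

print(all_are, some_are_not, some_are, none_are)
```

Whatever domain is used, `all_are` and `some_are_not` always take opposite truth values, and so do `some_are` and `none_are`, which is the negation relationship stated above.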

Limitations of Predicate Logic
- The quantifier "most" cannot be expressed in terms of universal & existential quantifiers.
- Things that are sometimes, but not always, true cannot be expressed.

Lecture 4

Knowledge Representation cont’d

3. SEMANTIC NETWORKS (ASSOCIATIVE NETS)

A classic technique used for propositional information, also known as a propositional net. It is a form of labeled directed graph. A net consists of nodes and arcs connecting nodes. Nodes represent objects, and arcs represent relations between them. Two types of arcs (links) exist: IS-A and A-KIND-OF (AKO).

Suppose we want to represent a simple fact, all robins are birds, in a semantic network:

[Figure: Robin –IS-A→ Bird]

Let’s add another fact: Clyde is a particular individual that is a robin.

[Figure: Clyde –IS-A→ Robin –IS-A→ Bird]

Also, even though we have stated only 2 facts, we can deduce a third one: Clyde is a bird. This is because there is an inheritance hierarchy created by the IS-A inheritance link. This is the main advantage of semantic networks in domains with complicated taxonomy.

Besides classification we want to represent properties of objects, e.g. birds have wings:

[Figure: clyde –IS-A→ robin –IS-A→ bird –Has-part→ wings]

Suppose we wish to represent the fact that Clyde owns a nest:

[Figure: clyde –IS-A→ robin –IS-A→ bird –Has-part→ wings; clyde –owns→ nest1 –IS-A→ nest]

This representation has shortcomings. Suppose we wanted to encode the information that Clyde owned nest1 for a season. This is impossible to represent, as owns is a binary relation. What is needed is an equivalent of a predicate which can take arguments. It would then be possible to note the start and end of the ownership. Nodes are allowed to represent situations in order to get around this problem. Each situation node can have a set of outgoing arcs called case frames, which specify the arguments to the situation predicate.

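[Figure: the network extended with a situation node: clyde –IS-A→ robin –IS-A→ bird –Has-part→ wings; ownership –IS-A→ situation; ownership –owner→ clyde; ownership –ownee→ nest1 –IS-A→ nest; ownership –start time→ spring and ownership –end time→ summer, with spring and summer each –IS-A→ time.]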

The reasoning mechanism used by most semantic network systems is based on matching network structures. A network fragment is created, representing a query, and then it is matched against the network database to see if such an object exists. Variable nodes in the fragment are bound in the matching process to the values they must have to make the match perfect.

E.g. suppose we wish to answer the question: what does Clyde own? We must construct the fragment:

[Figure: the query fragment: an ownership node (isa situation) with an owner arc to clyde and an ownee arc to a variable node "?".]

This represents an instance of ownership in which Clyde is the owner. This fragment is then matched against the network database, looking for an ownership node that has an owner link to clyde. When it is found, the node that the ownee link points to is bound in the partial match and is the answer to the question. Had no match been found, the answer would have been: Clyde doesn’t own anything.
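The inheritance and query-matching behaviour described above can be sketched by storing the network as a list of (source, link, target) triples. This is a simplified illustration rather than how any real semantic network system works: the instance node name `own1` and the link spellings are assumptions introduced for the example.

```python
# The network as (source, link, target) triples. Node names follow the Clyde
# example; the instance node "own1" is a hypothetical name for the situation.
NETWORK = [
    ("clyde", "is-a", "robin"),
    ("robin", "is-a", "bird"),
    ("bird", "has-part", "wings"),
    ("nest1", "is-a", "nest"),
    ("own1", "is-a", "ownership"),
    ("ownership", "is-a", "situation"),
    ("own1", "owner", "clyde"),
    ("own1", "ownee", "nest1"),
    ("own1", "start-time", "spring"),
    ("own1", "end-time", "summer"),
]


def targets(node, link):
    """All nodes reached from `node` by following one arc labeled `link`."""
    return [t for s, l, t in NETWORK if s == node and l == link]


def is_a(node, category):
    """Follow is-a arcs upward to test class membership (inheritance)."""
    if node == category:
        return True
    return any(is_a(parent, category) for parent in targets(node, "is-a"))


def inherited(node, link):
    """Values of `link` found on the node itself or on any of its is-a ancestors."""
    values = list(targets(node, link))
    for parent in targets(node, "is-a"):
        values.extend(inherited(parent, link))
    return values


def what_does_own(owner):
    """Match the query fragment: find ownership nodes with an owner arc to
    `owner`; the node at the end of the ownee arc is bound as the answer."""
    answers = []
    for source, link, target in NETWORK:
        if link == "owner" and target == owner and is_a(source, "ownership"):
            answers.extend(targets(source, "ownee"))
    return answers


print(is_a("clyde", "bird"))           # deduced, never stated directly
print(inherited("clyde", "has-part"))  # wings are inherited from bird
print(what_does_own("clyde"))
print(what_does_own("robin"))          # empty list: no match was found
```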

From another point of view AKO (A Kind Of) relates generic nodes to generic nodes while the IS-A relates an individual(instance) to a generic class. An AKO link points from a subclass to a class. E.g. hot air causes air balloon to rise.

[Figure: an example semantic net of aircraft with AKO, IS-A and Has-shape links. Nodes: Aircraft, balloon, round, propeller driven, jet, blimp, special, DC3, DC-9, GoodYear Blimp, Spirit of St. Louis, Airforce 1.]

The net above is an example with IS-A and AKO links.

Limitations of Semantic Nets
- Lack of standards for link names, causing ambiguity.
- Lack of standards for node names.
- Combinatorial explosion of searching the nodes, especially if the response is negative; may have to search all nodes to produce negative results.
- Logically inadequate, as they cannot define knowledge logically.
- Cannot define heuristics for efficiently arriving at a solution.

4. FRAMES

Frames are data structures that contain variable sized memory areas called slots. They provide a convenient structure for representing objects that are typical of a given situation, such as stereotypes. They work well in common sense knowledge representation.
- Represent related knowledge about a narrow subject which has much default knowledge, e.g. a mechanical device such as a car.

A Car Frame

Slots           Fillers
Manufacturer    General Motors
Model           Ford Bantum
Year            1980
Transmission    Automatic
Engine          Gasoline
Tyres           4
Color           Red

Frames are a kind of template for holding clusters of related knowledge about a particular subject. Since related knowledge is grouped together, frames, and the frame based systems that contain them, structure information in a much more organized and manageable manner than rule based systems, which are unstructured. Knowledge is not only clustered in individual frames; many frames themselves tend to cluster because they are also related. Frames can be used to show inheritance. Frames and inheritance can be used in the filler slots to build very powerful knowledge representation systems.

Frames can be either Generic or Specific. Generic frames model the majority of an object’s attributes; Specific frames model specific knowledge for specific cases.

A Generic Frame for Property

Slots               Fillers
Name                Property
Specialization_of   A_kind_of object
Types               (car, boat, house)
                    If-added: Procedure ADD_PROPERTY
Owner               Default: government
                    If-needed: Procedure FIND_OWNER
Location            (work, home, mobile)
Status              (missing, poor, good)
Under_warranty      (yes, no)

Car Frame – A Generic Subframe of Property

Slots               Fillers
Name                Car
Specialization_of   a-kind-of property
Types               (sedan, sports)
Manufacturer        (GM, Ford)
Location            Mobile
Wheels              4
Transmission        (manual, automatic)
Engine              (gasoline, diesel)

An Instance of a Car Frame

Slots               Fillers
Name                John’s car
Specialization_of   Is_a car
Manufacturer        GM
Owner               John Smith
Transmission        automatic
Engine              Gasoline
Status              Good
Under_warranty      yes

Types of fillers
1. Value, e.g. property
2. Range of values, e.g. the types slot
3. Procedural attachments, of 3 types:
   - If-needed – executed when a filler value is needed but none is initially present or the default value is not suitable.
   - If-added – executed when a value is to be added to a slot.
   - If-removal – executed when a value is to be removed from a slot.
4. Relations, e.g. a-kind-of and is-a
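The Property, Car and John's car frames above can be sketched as a small Python class with slot inheritance and two of the procedural attachments. This is a minimal sketch under stated assumptions: the `Frame` class and its method names are invented for the illustration, and the bodies of `find_owner` and `add_property` are stand-ins, since the notes name the procedures FIND_OWNER and ADD_PROPERTY but do not define them.

```python
class Frame:
    """A frame: named slots with fillers, a parent link (a-kind-of / is-a),
    and optional if-needed and if-added procedures attached to slots."""

    def __init__(self, name, parent=None, slots=None, if_needed=None, if_added=None):
        self.name = name
        self.parent = parent
        self.slots = dict(slots or {})
        self.if_needed = dict(if_needed or {})  # slot -> procedure that supplies a value
        self.if_added = dict(if_added or {})    # slot -> procedure run when a value is added

    def ancestry(self):
        """This frame followed by its parents, most specific first."""
        frame = self
        while frame is not None:
            yield frame
            frame = frame.parent

    def get(self, slot):
        """Use a stored filler if one exists along the chain; otherwise run an
        if-needed procedure; return None when nothing supplies a value."""
        for frame in self.ancestry():
            if slot in frame.slots:
                return frame.slots[slot]
            if slot in frame.if_needed:
                return frame.if_needed[slot](self)
        return None

    def put(self, slot, value):
        """Store a filler, then run the nearest if-added procedure, if any."""
        self.slots[slot] = value
        for frame in self.ancestry():
            if slot in frame.if_added:
                frame.if_added[slot](self, value)
                break


log = []


def find_owner(frame):
    # Stand-in for FIND_OWNER: simply returns the default owner from the notes.
    return "government"


def add_property(frame, value):
    # Stand-in for ADD_PROPERTY: records which frame received which type.
    log.append((frame.name, value))


property_frame = Frame("Property",
                       if_needed={"owner": find_owner},
                       if_added={"types": add_property})
car = Frame("Car", parent=property_frame,
            slots={"location": "mobile", "wheels": 4})
johns_car = Frame("John's car", parent=car,
                  slots={"manufacturer": "GM", "owner": "John Smith"})

print(johns_car.get("owner"))   # the specific frame overrides the default
print(johns_car.get("wheels"))  # inherited from the generic Car frame
print(car.get("owner"))         # supplied by the if-needed procedure
car.put("types", "sedan")       # triggers the if-added procedure
print(log)
```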

Limitations of frames ????

Lecture 5

Problem Solving through Search

Problem Solving – analysis of how computers can be made to find solutions in well- circumscribed domains.

Given a problem representation, search is the process that builds some or all of the search tree for the representation in order to do one of the following:

– If the search tree has one or more goals, identify one (or all of them) and the sequence of operators that produces each.
– If the search tree has one or more goals, identify a least costly one and the sequence of operators that produces it.
– If the search tree is finite and does not contain a goal, recognize this.

An Example: The Vacuum world Problem

• Problem formulation as search:
– states: one of the 8 shown below.
– operators: move left, move right, or suck.
– goal test: no dirt left in any square.
– path cost: each action costs 1; path cost = path length.

1. The vacuum world (with a sensor-less agent) Search space

The specification of a state space problem consists of:
1) a set of operators O
2) a set of one or more initial states S
3) a predicate defining a set of goal states G. A problem can have one or more goals.


Therefore a state space problem is then represented as (S,O,G).

A solution to the problem is a finite sequence of applications of operators that changes an initial state to its goal state without passing through failure states.
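As an illustration of the (S, O, G) formulation, the vacuum world can be written down directly. The encoding below is one possible choice, not the only one: a state is assumed to be a triple (agent location, dirt in left square, dirt in right square), and the operator and function names are made up for this sketch.

```python
from itertools import product

# A state is (agent location, dirt in left square, dirt in right square).
STATES = list(product("LR", [True, False], [True, False]))


def move_left(state):
    _, left, right = state
    return ("L", left, right)


def move_right(state):
    _, left, right = state
    return ("R", left, right)


def suck(state):
    """Remove the dirt (if any) from the square the agent is in."""
    location, left, right = state
    if location == "L":
        return (location, False, right)
    return (location, left, False)


# O: the set of operators
OPERATORS = {"left": move_left, "right": move_right, "suck": suck}


def is_goal(state):
    """G: the goal predicate, no dirt left in any square."""
    _, left, right = state
    return not left and not right


def apply_plan(state, plan):
    """Apply a sequence of operator names. Each action costs 1,
    so the path cost equals the path length."""
    for name in plan:
        state = OPERATORS[name](state)
    return state, len(plan)


initial = ("L", True, True)  # S: agent on the left, both squares dirty
final, cost = apply_plan(initial, ["suck", "right", "suck"])
print(len(STATES), final, cost, is_goal(final))
```

The sequence suck, right, suck is a solution for this initial state because it is a finite sequence of operator applications that ends in a state satisfying the goal predicate.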

Failure or Impossible States

Any state that flouts the rules of the problem is a failure state.

A common variation of state space problems requires finding not just any path but one of minimum cost between an initial node and a goal node. In this case, each arc is labeled with a cost.

2. Farmer, Wolf, Goat Puzzle

A farmer is on the east bank of a river with a wolf, goat and cabbage in his care. Without his presence the wolf would eat the goat or the goat would eat the cabbage. The wolf is not interested in the cabbage. The farmer wishes to bring his three belongings across the river. However the boat available to him can only carry one of the wolf, goat and cabbage besides himself.

Problem: Find the sequence of river crossings by which the farmer can transfer himself and the other three, all intact, to the west bank.
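The puzzle can be turned into a state space by recording which bank each of the four is on. The sketch below shows one possible encoding, with failure states filtered out when successors are generated; the tuple layout and the function names are assumptions made for the illustration.

```python
# A state records the bank ("E" or "W") of the farmer, wolf, goat and
# cabbage, in that order.
INITIAL = ("E", "E", "E", "E")
GOAL = ("W", "W", "W", "W")


def is_failure(state):
    """A failure state leaves the wolf with the goat, or the goat with the
    cabbage, on a bank where the farmer is not present."""
    farmer, wolf, goat, cabbage = state
    wolf_eats_goat = wolf == goat and goat != farmer
    goat_eats_cabbage = goat == cabbage and goat != farmer
    return wolf_eats_goat or goat_eats_cabbage


def other(bank):
    return "W" if bank == "E" else "E"


def successors(state):
    """Legal states reachable in one crossing: the farmer crosses alone
    (index 0) or with one passenger that is on the same bank as he is."""
    farmer = state[0]
    result = []
    for passenger in range(4):
        if state[passenger] != farmer:
            continue  # the boat can only take something from the farmer's bank
        new_state = list(state)
        new_state[0] = other(farmer)
        new_state[passenger] = other(farmer)
        new_state = tuple(new_state)
        if not is_failure(new_state):
            result.append(new_state)
    return result


def is_valid_path(path):
    """Check that every step in a sequence of states is a legal crossing."""
    return all(b in successors(a) for a, b in zip(path, path[1:]))


# One seven-crossing solution: goat over, back alone, wolf over, goat back,
# cabbage over, back alone, goat over.
SOLUTION = [
    INITIAL,
    ("W", "E", "W", "E"),
    ("E", "E", "W", "E"),
    ("W", "W", "W", "E"),
    ("E", "W", "E", "E"),
    ("W", "W", "E", "W"),
    ("E", "W", "E", "W"),
    GOAL,
]

print(successors(INITIAL))  # only taking the goat across is safe at first
print(is_valid_path(SOLUTION))
```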

Search Trees

A search tree is a visualization and implementation aid, with tree nodes corresponding to states.

NB: A search tree is different from a state space (e.g. search trees can be arbitrarily big, while the state space could be finite).

Queue: a valuable data structure for implementing various search strategies.

Farmer Wolf Goat and Cabbage State Space (including illegal states) – Search Tree


State Space Search

In solving a problem, one starts from some initial state and tries to reach a goal state (e.g. all on west bank) by passing through a series of intermediate states. For example, in the farmer, wolf, goat and cabbage (FWGC) problem, the farmer crossing the river in the boat, either on his own or with one of the others, changes the state of the system. There is usually a set of rules which tell one how to go from one state to another. Each state may be succeeded by one of a number of others.
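As a rough illustration of states, operators and failure states, the FWGC puzzle can be sketched in Python. The tuple encoding, the bank labels "E" and "W", and the names `is_failure` and `successors` are assumptions made for this sketch, not part of the notes.

```python
from typing import Iterator, Tuple

# Assumed encoding: (farmer, wolf, goat, cabbage), each on bank "E" or "W".
State = Tuple[str, str, str, str]

INITIAL: State = ("E", "E", "E", "E")  # everyone starts on the east bank
GOAL: State = ("W", "W", "W", "W")     # everyone ends on the west bank


def is_failure(state: State) -> bool:
    """A failure state leaves the goat with the wolf or the cabbage, unsupervised."""
    farmer, wolf, goat, cabbage = state
    return goat != farmer and (wolf == goat or cabbage == goat)


def other(bank: str) -> str:
    return "W" if bank == "E" else "E"


def successors(state: State) -> Iterator[Tuple[str, State]]:
    """Yield (operator, next_state) for every crossing that avoids failure states."""
    farmer = state[0]
    names = ("farmer", "wolf", "goat", "cabbage")
    for i, name in enumerate(names):
        # The farmer crosses alone (i == 0) or with one item from his own bank.
        if state[i] != farmer:
            continue
        nxt = list(state)
        nxt[0] = other(farmer)
        if i != 0:
            nxt[i] = other(farmer)
        nxt_state = tuple(nxt)
        if not is_failure(nxt_state):
            yield ("cross alone" if i == 0 else f"take {name}"), nxt_state


# From the initial state the only safe operator is to take the goat across.
print(list(successors(INITIAL)))
```

Each state here is succeeded by a small number of others, which is what makes the puzzle searchable by the strategies that follow.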

Tree Search Strategies

Offline (i.e. compute a complete solution before acting), simulated exploration of state space by generating successors of already-explored states.

Uninformed Versus Informed Search Algorithms

Uninformed Search Algorithms: In making a decision, these look only at the structure of the search tree and not at the states inside the nodes.
– They are a form of blind search algorithm.

Informed Search Strategies


- At times we have additional knowledge (heuristics) about the problem that we can use to inform the agent doing the search.
- Heuristics (informed guesses) are employed to direct the search.
- Heuristics can estimate the goodness of a particular node (or state) n, i.e. how close n is to a goal node.

Search Strategies are evaluated based on:
– Completeness (Does it always find a solution?)
– Time complexity (How long does it take to find a solution?)
– Space complexity (How much memory is needed to get to the solution?)
– Optimality (Does it always find the highest quality solution?)

– Parameters used:
b – maximum or average branching factor, how many alternative paths?
d – depth of the least cost solution (root is depth 0)
m – maximum depth of state space, might be infinity.

Example Blind Search Strategies

1. Breadth First Search
The fringe is a FIFO queue, i.e., new successors go at the end.

Breadth first search (BFS) is an approach where one examines all the successor states of a given state before proceeding to a new level. BFS uses a queue to store states for examination. On examining a state, its successor states are added to the back of the queue. The next state for examining is taken from the front of the queue.
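The queue discipline described above can be sketched in Python as follows. The small graph `GRAPH` and its node names are hypothetical, chosen only for illustration, and the `visited` set is an added assumption to avoid re-examining repeated states.

```python
from collections import deque


def breadth_first_search(start, is_goal, successors):
    """Return a path with the fewest steps from start to a goal, or None."""
    fringe = deque([[start]])  # FIFO queue of paths; new successors go at the end
    visited = {start}          # states already placed on the queue
    while fringe:
        path = fringe.popleft()  # next state is taken from the front of the queue
        state = path[-1]
        if is_goal(state):
            return path
        for nxt in successors(state):
            if nxt not in visited:
                visited.add(nxt)
                fringe.append(path + [nxt])
    return None


# Hypothetical state space: S is the initial state, G the goal.
GRAPH = {"S": ["A", "B"], "A": ["C"], "B": ["C", "G"], "C": ["G"], "G": []}

path = breadth_first_search("S", lambda s: s == "G", lambda s: GRAPH[s])
print(path)  # ['S', 'B', 'G']
```

Storing whole paths on the queue keeps the sketch short; it also makes the memory cost of BFS easy to see, since every path on the fringe is kept at once.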


Complete? Yes (if b is finite)
Time? $O(b^{d+1})$, i.e. exponential in d
Space? $O(b^{d+1})$ (keeps every node in memory)
Optimal? Yes (if cost = 1 per step); not optimal in general
Space is the big problem; a search can easily generate nodes at 100MB/sec, so 24hrs = 8640GB.

2. Uniform Cost Search (students to read about)

3. Depth First Search

In depth first search, the program examines a state, proceeds from it to one of its successor states, and then repeats this process with the successor state, going a level deeper each time in the state space.

Depth First Search of FWGC problem, ignoring the illegal states.


•Properties of DFS.

DFS is not guaranteed to find a path to a goal state, because it can follow an infinite path and never come back. It is therefore incomplete.
It is memory efficient even if the state space is large.
If the typical branching factor is b, and the maximum depth of the tree is m (possibly ∞), the space complexity is $O(bm)$ and the time complexity is $O(b^m)$.

but
•Can find a solution faster than Breadth First (e.g. when many goal states appear)
•Easily implemented with recursion.
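A recursive sketch of depth first search in Python is given below. The graph `GRAPH` is hypothetical, and checking `nxt not in path` is an added assumption that avoids cycles along the current path (it does not make DFS complete in infinite spaces).

```python
def depth_first_search(state, is_goal, successors, path=None):
    """Return the first path to a goal found by going deeper first, or None."""
    if path is None:
        path = [state]
    if is_goal(state):
        return path
    for nxt in successors(state):
        if nxt not in path:  # do not revisit a state already on this path
            result = depth_first_search(nxt, is_goal, successors, path + [nxt])
            if result is not None:
                return result
    return None  # dead end: the caller backs up and tries its next successor


# Hypothetical state space: S is the initial state, G the goal.
GRAPH = {"S": ["A", "B"], "A": ["C"], "B": ["C", "G"], "C": ["G"], "G": []}

path = depth_first_search("S", lambda s: s == "G", lambda s: GRAPH[s])
print(path)  # ['S', 'A', 'C', 'G']
```

On this graph the path found has three steps although a two-step path (S, B, G) exists, which illustrates why DFS is not optimal.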

4. Depth-limited search: depth first search, but with a bound on the depth explored.
DLS is a variation of DFS. If we put a limit l on how deep a depth first search can go, we can guarantee that the search will terminate (either in success or failure).

If there is at least one goal state at a depth less than l, this algorithm is guaranteed to find a goal state, but it is not guaranteed to find an optimal path. The space complexity is $O(bl)$, and the time complexity is $O(b^l)$. For most problems we will not know what a good limit l is until we have solved the problem!
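The effect of the limit l can be sketched in Python by adding one parameter to a recursive depth first search. The graph `GRAPH` is hypothetical, and in this sketch the limit counts the number of operator applications still allowed.

```python
def depth_limited_search(state, is_goal, successors, limit, path=None):
    """Depth first search that never applies more than `limit` operators."""
    if path is None:
        path = [state]
    if is_goal(state):
        return path
    if limit == 0:
        return None  # bound reached: give up on this branch
    for nxt in successors(state):
        if nxt not in path:
            result = depth_limited_search(
                nxt, is_goal, successors, limit - 1, path + [nxt]
            )
            if result is not None:
                return result
    return None


# Hypothetical state space: S is the initial state, G the goal.
GRAPH = {"S": ["A", "B"], "A": ["C"], "B": ["C", "G"], "C": ["G"], "G": []}


def goal(s):
    return s == "G"


def succ(s):
    return GRAPH[s]


print(depth_limited_search("S", goal, succ, 1))  # None: the limit is too small
print(depth_limited_search("S", goal, succ, 2))  # ['S', 'B', 'G']
print(depth_limited_search("S", goal, succ, 3))  # ['S', 'A', 'C', 'G']
```

The last call shows that a generous limit can return a longer path than necessary, so the result depends on choosing l well.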


5. Iterative deepening: depth-limited search, but with the bound successively increased until a goal is found

Depth First Iterative Deepening Search is a variation of Depth Limited Search (DLS). If the lowest depth of a goal state is not known, we can always find the best limit l for DLS by trying all possible depths l = 0, 1, 2, 3, … in turn, and stopping once we have reached a goal state.

But it is wasteful, because all the DLS runs for l less than the goal level are useless, and many states are expanded many times. However, in practice, most of the time is spent at the deepest part of the search tree, so the algorithm actually combines the benefits of DFS and BFS.

Properties
Complete (when b is finite) and optimal (when every step has the same cost), since all the nodes at each level are expanded before any deeper node is reached.
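A minimal Python sketch of iterative deepening follows: it simply calls a depth-limited search with l = 0, 1, 2, … The helper `_dls`, the `max_depth` safety cap and the graph `GRAPH` are assumptions made for this sketch.

```python
def _dls(path, is_goal, successors, limit):
    """Depth-limited search from the last state on `path` (helper for this sketch)."""
    state = path[-1]
    if is_goal(state):
        return path
    if limit == 0:
        return None
    for nxt in successors(state):
        if nxt not in path:
            result = _dls(path + [nxt], is_goal, successors, limit - 1)
            if result is not None:
                return result
    return None


def iterative_deepening_search(start, is_goal, successors, max_depth=50):
    """Try limits 0, 1, 2, ... and return (path, limit) for the first success."""
    for limit in range(max_depth + 1):
        result = _dls([start], is_goal, successors, limit)
        if result is not None:
            return result, limit
    return None, None  # no goal within max_depth


# Hypothetical state space: S is the initial state, G the goal.
GRAPH = {"S": ["A", "B"], "A": ["C"], "B": ["C", "G"], "C": ["G"], "G": []}

path, limit = iterative_deepening_search("S", lambda s: s == "G", lambda s: GRAPH[s])
print(path, limit)  # ['S', 'B', 'G'] 2
```

The searches with limits 0 and 1 are repeated work, but the result matches what breadth first search would find while each individual run uses only depth first memory.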

6. Bi-directional Search (BDS)

Search forward from the start state and backwards from a goal state at the same time, and stop when the two BFS searches meet in the middle.

The algorithm is complete and optimal, and since the two search depths are ~d/2, it has space complexity $O(b^{d/2})$ and time complexity $O(b^{d/2})$. However, if there is more than one possible goal state, this must be factored into the complexity.

Some requirements:
– The goal state must be identifiable (what if there are too many?)
– Predecessor states can be generated (easily)
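The idea can be sketched in Python for the simple case of a single known goal and an undirected graph, where the predecessors of a state are just its neighbours. Both of those are assumptions of this sketch, as are the names `bidirectional_search` and `LINE`.

```python
from collections import deque


def bidirectional_search(start, goal, neighbours):
    """Run two breadth first searches, one from each end, until they meet."""
    if start == goal:
        return [start]
    parents_f = {start: None}  # forward search tree: state -> parent
    parents_b = {goal: None}   # backward search tree: state -> parent
    frontier_f, frontier_b = deque([start]), deque([goal])

    def expand(frontier, parents, other_parents):
        """Expand one whole level; return a meeting state if one is generated."""
        for _ in range(len(frontier)):
            state = frontier.popleft()
            for nxt in neighbours(state):
                if nxt not in parents:
                    parents[nxt] = state
                    if nxt in other_parents:
                        return nxt
                    frontier.append(nxt)
        return None

    while frontier_f and frontier_b:
        meet = expand(frontier_f, parents_f, parents_b)
        if meet is None:
            meet = expand(frontier_b, parents_b, parents_f)
        if meet is not None:
            # Stitch the two half-paths together at the meeting state.
            path, s = [], meet
            while s is not None:
                path.append(s)
                s = parents_f[s]
            path.reverse()
            s = parents_b[meet]
            while s is not None:
                path.append(s)
                s = parents_b[s]
            return path
    return None


# Hypothetical undirected graph: a line of five states A - B - C - D - E.
LINE = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "E"], "E": ["D"]}

print(bidirectional_search("A", "E", lambda s: LINE[s]))  # ['A', 'B', 'C', 'D', 'E']
```

Each search only has to reach depth 2 here instead of depth 4, which is where the saving in the exponent comes from.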


Comparing Blind Search Strategies

Recall:
b – maximum or average branching factor, how many alternative paths?
d – depth of the least cost solution (root is depth 0)
m – maximum depth of state space, might be infinity.

The Big O (Order of) Notation
A mathematical notation used to describe the asymptotic behavior of functions. Its purpose is to characterize a function's behavior for very large (or very small) inputs in a simple but rigorous way that enables comparison to other functions. It is mainly used in the analysis of the complexity of algorithms, i.e. it lets us ignore details that do not matter once the input size is big enough.

Lecture 6
Informed Search Algorithms

- Use some kind of evaluation function (heuristic) to tell us how far each expanded state is from a goal state, and also to help us decide which state is likely to be the best one to expand next.

-It is hard to come up with good evaluation and/or heuristic functions. Often there is a natural evaluation function, such as distance in miles or number of objects in the wrong position. Sometimes we can learn heuristic functions by analyzing what has worked well in similar previous searches.

1/ Best-first search

– General approach of informed search, i.e., order the nodes in the fringe (L) in decreasing order of desirability.
– A node is selected for expansion based on an evaluation function f(n).
– Usually the evaluation function measures distance to the goal.
– Choose the node which appears to be the best.

Variations of Best First Search

i) Greedy Best First Search

-Expands the node closest to the goal, on the grounds that this is likely to lead to a solution quickly.


-Uses a heuristic function: f(n) = h(n), where h(n) is the estimated cost of moving from the current node to the goal.

Assume that the diagram below represents cities and distances between these cities in kilometers.

Suppose the straight-line distances between these cities and city I are given as:

A to I – 40
B to I – 76
C to I – 38
D to I – 80
E to I – 25
F to I – 55
G to I – 58
H to I – 60
I to I – 0

Our aim is to move from A to I, the goal state.

Using greedy best first search, we take our heuristic function to be the straight-line distance between each city and the goal city.

Thus f(n) = h(n) (Build the search tree)

The algorithm finds a path, though not an optimal path: it always chooses what looks locally best, rather than worrying about whether or not it will be best in the long run.

Greedy search is susceptible to false starts. NB, if we are not careful to detect repeated states, the solution will never be found - the search will oscillate between cities giving a false start.

Greedy search resembles depth first search (dfs) in the way that it prefers to follow a single path to the goal and back up only when a dead-end is encountered. It suffers from the same defects as dfs - it is not optimal and it is incomplete because it can start down an infinite path and never try other possibilities.

[Figure: graph of cities A to I joined by roads labeled with distances in kilometers; the edge labels recovered from the figure are 12, 30, 36, 17, 45, 55, 25, 65, 50, 49 and 10.]

The worst-case time complexity for greedy search is $O(b^m)$, where m is the maximum depth of the search. Its space complexity is the same as its time complexity, but the worst case can be substantially reduced with a good heuristic function.
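Greedy best first search can be sketched in Python with a priority queue ordered by h(n) alone. The road map `GRAPH` and heuristic table `H` below are hypothetical (they are not the city diagram from these notes), and `path_cost` is a helper added for the sketch.

```python
import heapq

# Hypothetical road map (edge costs) and heuristic estimates to the goal G.
GRAPH = {"S": {"A": 1, "B": 4}, "A": {"G": 10}, "B": {"G": 4}, "G": {}}
H = {"S": 4, "A": 3, "B": 4, "G": 0}


def greedy_best_first(start, goal, graph, h):
    """Always expand the fringe node with the smallest h(n); f(n) = h(n)."""
    fringe = [(h[start], start, [start])]
    visited = set()  # repeated-state check, so the search cannot oscillate
    while fringe:
        _, state, path = heapq.heappop(fringe)
        if state == goal:
            return path
        if state in visited:
            continue
        visited.add(state)
        for nxt in graph[state]:
            if nxt not in visited:
                heapq.heappush(fringe, (h[nxt], nxt, path + [nxt]))
    return None


def path_cost(path, graph):
    """Sum of the edge costs along a path."""
    return sum(graph[a][b] for a, b in zip(path, path[1:]))


route = greedy_best_first("S", "G", GRAPH, H)
print(route, path_cost(route, GRAPH))  # ['S', 'A', 'G'] 11
```

The search goes through A because A looks closer to the goal, yet the route through B costs only 8, so the locally best choice is not best in the long run.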

ii) A* search

– Combines the use of g(n), the cost of the path so far, and h(n), the estimated cost of moving from the node n to the goal, simply by summing them: f(n) = g(n) + h(n).
– f(n) is the estimated cost of the cheapest solution through node n.
– Minimizes the total estimated cost, and is optimal if h(n) is an admissible heuristic (i.e. it never overestimates the cost of reaching the goal).
– Avoids expanding paths that are already expensive (consider the lowest g(n)).

Using A* to move from A to I

For the A* search above: along any path from the root, the f-cost never decreases. This fact holds true for almost all admissible heuristics. A heuristic with this property is said to exhibit monotonicity.

If f never decreases along any path out of the root, we can conceptually draw contours in the state space.

Because A* expands the leaf node of lowest f, an A* search fans out from the start node, adding nodes in concentric bands of increasing f-cost.

In A*, the first solution found must be the optimal one because nodes in all subsequent contours will have higher f-cost and hence higher g-cost (since h(n) = 0 at every goal state).

A* search is also complete because as we add bands of increasing f, we must eventually reach a band where f is equal to the cost of the path to a goal state. A* is optimally efficient for any given admissible heuristic function.

Complexity of A*
The catch with A* is that even though it is complete, optimal and optimally efficient, it still can't always be used, because for most problems the number of nodes within the goal contour search space is still exponential in the length of the solution.

Similarly to breadth-first search, however, the major difficulty with A* is the amount of space that it uses.
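A* can be sketched in Python by ordering the fringe on f(n) = g(n) + h(n). The road map `GRAPH` and heuristic `H` are hypothetical values chosen so that h is admissible; the `best_g` table is an implementation assumption used to skip paths that are already more expensive.

```python
import heapq

# Hypothetical road map (edge costs) and an admissible heuristic to the goal G.
GRAPH = {"S": {"A": 1, "B": 4}, "A": {"G": 10}, "B": {"G": 4}, "G": {}}
H = {"S": 4, "A": 3, "B": 4, "G": 0}


def a_star(start, goal, graph, h):
    """Expand the fringe node of lowest f = g + h; return (path, cost)."""
    fringe = [(h[start], 0, start, [start])]  # entries are (f, g, state, path)
    best_g = {start: 0}                       # cheapest known cost to each state
    while fringe:
        _, g, state, path = heapq.heappop(fringe)
        if state == goal:
            return path, g
        if g > best_g.get(state, float("inf")):
            continue  # a cheaper path to this state was found after this entry
        for nxt, step in graph[state].items():
            new_g = g + step
            if new_g < best_g.get(nxt, float("inf")):
                best_g[nxt] = new_g
                heapq.heappush(fringe, (new_g + h[nxt], new_g, nxt, path + [nxt]))
    return None, float("inf")


route, cost = a_star("S", "G", GRAPH, H)
print(route, cost)  # ['S', 'B', 'G'] 8
```

On this map A is expanded first (f = 4), but the goal reached through A has f = 11, so B (f = 8) is expanded before that goal node is selected and the cheaper route is returned.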

Local Search Algorithms (Iterative improvement search)

– Only interested in the solution, not the path.
– Employed in optimization problems, e.g. finding the minimum (or maximum) of an objective function.
Local search begins from an arbitrary state in the search space and looks for an improvement in the neighborhood of that state, until no improvement can be found. (Iterative improvement)

Local search algorithms: keep a single "current" state, try to improve it according to an objective function.

Local Search Algorithms employ special heuristics.
• A heuristic is a simplification or educated guess that limits the search for solutions in domains that are difficult or poorly understood.
• It cannot be computed from the problem definition itself.
• A heuristic h(n) is admissible if for every node n, h(n) ≤ h*(n), where h*(n) is the true cost to reach the goal state from n.
• An admissible heuristic never overestimates the cost to reach the goal, i.e., it is optimistic.
• Example: hSLD(n), the straight-line distance, never overestimates the actual road distance.

Why Local Search?
• Uses little memory
• Finds reasonable solutions in large or infinite spaces

Examples of Local Search Algorithms
1. Hill Climbing Search Algorithm (First Choice Hill Climbing)

function Hill-Climbing(problem) returns a state that is a local maximum
  inputs: problem, a problem
  local variables: current, a node
                   neighbor, a node
  current ← Make-Node(Initial-State[problem])
  loop do
    neighbor ← a highest-valued successor of current
    if Value[current] < Value[neighbor] then current ← neighbor
    else return State[current]
  end

– This is a simplification of Best-First Search.
– Just keep track of the one state you are considering, and the path that got you there from the initial state.
– At every state choose the state that leads you closer to the goal (according to the heuristic estimate), and continue from there.

Hill climbing search works well (and is fast, and takes little memory) if an accurate heuristic measure is available in the domain, and if there are no local maxima. In general the idea is to:
Start from the initial state, loop over the operators, and generate a state for the current operator. If the newly generated state is better than the current state, go to that state; repeat until a goal is found.

Disadvantages: susceptible to local maxima, plateaux and ridges.
Remember, from any given node, we go to the next node that is better than the current node. If there is no node better than the present node, then hill climbing halts and returns the present node as its solution, even if it is only a local maximum.

Hill-climbing is a greedy strategy. The upside is it may make rapid progress towards the best state. The downside is that it can halt at a local maximum. A local maximum is a state that is better than all its successors but worse than the global maximum.

One workaround is to allow sideways moves, i.e. to allow the algorithm to choose a successor that has the same value as the current state. This can help a search get off a shoulder (why?), but the search becomes infinite on a plateau (why?). So a limit must be placed on the number of consecutive sideways moves.

The most sophisticated versions of this idea go under the name tabu search. Tabu search is hill-climbing search, often with complete-state formulations, in which sideways moves are allowed. However, a fixed size memory is maintained of the most recently visited states. This is called the tabu list. The current state may not be replaced by any state that is currently on the tabu list.

Allowing sideways moves still doesn’t solve the problem of other kinds of local maxima.
One possible solution is random re-start hill-climbing. In random re-start hill-climbing, a series of hill-climbing searches are executed from randomly-chosen initial states. In the limit (if enough random re-starts are done), this should find an optimal solution.
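Hill climbing and random re-start hill climbing can be sketched in Python on a toy one-dimensional problem. The list `LANDSCAPE` (objective values indexed by state), the neighbour rule and the fixed random seed are all assumptions made for this illustration.

```python
import random

# Hypothetical objective values; index 2 is a local maximum, index 6 the global one.
LANDSCAPE = [1, 3, 5, 4, 2, 6, 9, 7]


def value(state):
    return LANDSCAPE[state]


def neighbours(state):
    """Successors of a state: the positions immediately left and right of it."""
    return [s for s in (state - 1, state + 1) if 0 <= s < len(LANDSCAPE)]


def hill_climb(start):
    """Move to the highest-valued successor until no successor is better."""
    current = start
    while True:
        best = max(neighbours(current), key=value)
        if value(best) <= value(current):
            return current  # no better neighbour: a (possibly local) maximum
        current = best


def random_restart_hill_climb(restarts, rng):
    """Run hill climbing from random initial states and keep the best result."""
    best = None
    for _ in range(restarts):
        result = hill_climb(rng.randrange(len(LANDSCAPE)))
        if best is None or value(result) > value(best):
            best = result
    return best


print(hill_climb(0))  # 2: halts at the local maximum (value 5)
print(hill_climb(4))  # 6: reaches the global maximum (value 9)
print(random_restart_hill_climb(20, random.Random(0)))
```

A single run from state 0 gets stuck, while repeated runs from random states become increasingly likely to start somewhere in the basin of the global maximum.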


Lecture 7

Natural Language Processing
Natural languages - the languages that people speak, e.g. English, Japanese, Swahili, etc., as opposed to artificial languages like programming languages or logic.

Prolog Programming
Prolog - Short for Programming in logic; based on the idea of using logic as a programming language.
• Often thought of as an AI programming language, useful for solving problems involving objects and relations between them.
• A declarative language, i.e. used to describe problems; it describes what instead of how.
• Works at a higher level than procedural languages like C.
• Not suitable for problems involving a lot of numeric calculation.

References

SWI Prolog Users Manual available at http://www.swi-prolog.org/dl-doc.html
SWI Prolog web site http://www.swi-prolog.org/

A Prolog program is made up of facts and rules.
Facts – state properties that are true of the system being described, and consist of one predicate, e.g. male(george).

Rules – give us ways of deducing new facts from existing ones.
A rule consists of a predicate, followed by the :- symbol, followed by a list of predicates separated by commas, e.g. likes(ann,X):- toy(X), play(ann,X).
Here likes(ann,X) is the head of the rule and toy(X), play(ann,X) is the body.

NB: Any predicate used in the body must either be:
• defined somewhere else in the program, or
• one of Prolog’s built-in predicates.

A Prolog program is often called a knowledge base as it gives information about a system & each line in the program is called a clause

Suppose the following information, representing a family tree:


To represent the above family tree in Prolog we use facts.
parent(pete,ian). % Pete is the parent of Ian
parent(ian,peter).
parent(ian,lucy).
parent(cathy,ian).
etc.

Notice the full stop at the end of every line, and that the relation name starts with a lower case letter as do its arguments (constants or atoms). Prolog is case sensitive.

We could also store information about the gender of people in the family tree as follows.

female(cathy).
female(lucy).
male(ian).
etc.

giving The Family Tree Program
/* Family Tree */
parent(pete,ian). % Pete is a parent of Ian
parent(ian,peter).
parent(ian,lucy).
parent(lou,pete).
parent(lou,pauline).
parent(cathy,ian).

female(cathy). % Cathy is female.
female(lucy).
female(pauline).
female(lou).

male(ian). % Ian is male.
male(pete).
male(peter).

Comments
In Prolog there are two ways to write comments.
% Ian is male.
% starts a comment that runs to the end of the line only.
/*
Family Tree
Describes family relationships
*/
Here /* denotes the beginning of the comment and */ denotes the end of the comment. This form is used for comments spanning more than one line.

-Type this into a text editor & save it as a Prolog file

Querying the Family Tree Program
Assume the above facts have been stored as a Prolog program and loaded into a Prolog interpreter. To query this program we can do the following.

Is Ian a parent of Lucy?
?- parent(ian,lucy).
yes

Is Ian a parent of Pauline?
?- parent(ian,pauline).
no

Is Cathy a parent of Peggy? Note Peggy does not appear at all in our program.
?- parent(cathy,peggy).
no

Who is Lucy's parent?
?- parent(X,lucy).
X = ian ?
yes
Note: variables begin with upper case letters (here X).


Who are Lou's children?
?- parent(lou,X).
X = pete ? ;
X = pauline ? ;
no

A second or further solution can be requested by typing a semi-colon. When no further solutions exist Prolog answers `no'.

Find the people who are parents of both Lucy and Peter.
?- parent(X,lucy),parent(X,peter).
X = ian ? ;
no

Querying a program: summary
• Arguments to relations can be concrete objects, known as atoms, e.g. ian or cathy (starting with a lower case letter), or variables, e.g. X or Y (starting with an upper case letter).
• Questions to the system consist of one or more goals.
• If a positive answer to the question is obtained we say the goal was satisfiable and the goal succeeded. If we obtain a negative answer the goal was unsatisfiable and it failed.

Evaluating Rules
Suppose we add the following rule to our family tree program:
mother(X,Y):- parent(X,Y),female(X).

interpreted as:
For all X and Y, X is the mother of Y if X is the parent of Y and X is female.

Then a query like: Is Cathy the mother of Ian?
?- mother(cathy,ian).
yes


Since there are no facts about mothers in the program, we must use the rule about mothers.
mother(X,Y):- parent(X,Y),female(X).

Applying this rule to the query mother(cathy,ian), Prolog instantiates X=cathy and Y=ian. In the body X and Y are replaced by cathy and ian respectively.

mother(cathy,ian):- parent(cathy,ian),female(cathy).

The new subgoal is now parent(cathy,ian),female(cathy).
Both parent(cathy,ian) and female(cathy) appear in the program as facts so Prolog will answer yes.

e.g. 2. Is Pete the mother of Ian?
?- mother(pete,ian).
no

Now, Prolog instantiates X=pete and Y=ian.
mother(pete,ian):- parent(pete,ian),female(pete).

The new subgoal is now parent(pete,ian),female(pete).
parent(pete,ian) appears in the program as a fact, so this subgoal is successful, but female(pete) fails, so Prolog will answer no.

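Other Examples of Rules using grand parents

Grandparents are parents of parents.
grandparent(X,Z):- parent(X,Y),parent(Y,Z).

Siblings are people with the same parent.
sibling(X,Y):- parent(Z,X),parent(Z,Y).
Note that, as written, this rule also makes every person with a parent their own sibling; adding the goal X \= Y to the body excludes that case.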


Exercise
Using the family tree program, expand it to include the following relations:
Sister
Aunt
Uncle
Cousin

Summary
• A Prolog program is constructed from facts and rules.
• Facts describe things that are true without conditions - like data in a database.
• Rules describe things that hold depending on certain conditions.
• Prolog programs can be queried using questions.
• Prolog clauses are facts, rules and questions.
• Queries are answered by instantiating variables, creating new subgoals from rules and matching with facts.

Recursion in Prolog using the Predecessor Relation
Consider the family tree from the previous lecture. We'd like to be able to define predecessor: a parent, or a parent of a parent, or a parent of a parent of a parent, and so on. We could write one clause per generation:

predecessor(X,Z):- parent(X,Z).
predecessor(X,Z):- parent(X,Y),parent(Y,Z).
predecessor(X,Z):- parent(X,Y1),parent(Y1,Y2),parent(Y2,Z).

This works, but it is very lengthy.

Ideally, from the
Offspring's point of view:
You are my predecessor if either you are my parent, or you are a predecessor of my parent.
Predecessor's point of view:
I am your predecessor if I am your parent, or I am a parent of your predecessor.

Use of Recursion to Represent Predecessor
The predecessor's point of view can be expressed as
predecessor(X,Z):- parent(X,Z).
predecessor(X,Z):- parent(X,Y),predecessor(Y,Z).
This type of definition is called a recursive definition. Recursive definitions are very important in Prolog. A set of clauses referring to the same relation is known as a procedure.

Recursive Querying
“Who are the predecessors of Lucy?” is posed as follows.
| ?- predecessor(X,lucy).
X=ian ? ;
X=pete ? ;
X=lou ? ;
X=cathy ? ;
no

Natural Language Processing (NLP)

Defn. 1 A subfield of AI & linguistics that studies the problems of automated generation & understanding of natural human language.

Defn. 2 Use of computers to interpret & manipulate words as part of a language.

Applications of NLP:

1. User interfaces -- better than obscure command languages. It would be nice if you could just tell the computer what you want it to do. Of course we are talking about a textual interface -- not speech.

2. Knowledge-Acquisition – use of programs to read books, manuals or the newspaper, so as to avoid having to explicitly encode all of the knowledge needed to solve problems or do whatever needs to be done.

3. Information Retrieval -- find articles about a given topic. Program has to be able to somehow determine whether the articles match a given query.

4. Translation -- it sure would be nice if machines could automatically translate from one language to another. This was one of the first tasks they tried applying computers to. It is very hard.

Linguistic levels of Language Analysis

The lowest level of language analysis deals with the actual properties of the signal stream. The levels of analysis include:

phonology - speech sounds and how we make them
morphology - the structure of words
syntax - how the sequences are structured
semantics - meanings of the strings

There are important interfaces among these levels. For example sometimes the meaning of sentences can determine how individual words are pronounced.

Language can be more efficient by not having to say the same thing twice, so we have pronouns and other ways of making use of what has already been said:

e.g. A bear went into the woods. It found a tree.

Also, since language is most often used among people who are in the same situation, it can make use of features of the situation ("pragmatics"), i.e.:

this/that
you/me/they
here/there
now/then

Basic Types of Sentences
statements
imperatives
questions

Issues in Language Syntax

Words in a sentence can be more or less naturally grouped into what are called "phrases", and those phrases can often be treated as a unit. e.g. In the sentence "The dog chased the bear," the sequence "the dog" forms a natural unit. The sequence "chased the bear" is a natural unit, as is "the bear". For example "the dog" could be replaced by:

Snoopy (a name)
It (a pronoun)
My brother's favorite pet (a more complex description)

and "chased the bear" can be replaced by:
died (a single word)
was hit by a truck (a more complex event)

This basic structure, in English, is sometimes called the "subject-predicate" structure. The subject is a nominal, something that can refer to an object or thing, the predicate is a "verb phrase", which describes an action or event.


Noun Phrase –can have a determiner, zero or more adjectives, and a noun, maybe followed by another phrase, like:

the big dog that ate my homework

Verb phrases can have complicated "verb groups" like will not be eaten

Adjuncts – add more information about what is going on, e.g.
it died yesterday (gives time)
it died in the garage (gives location)
it died because nobody fed it (gives reason)

Syntactic theories - try to predict and explain what patterns are used in a language. The general idea, in English, is that a sentence consists of a subject and a predicate. A predicate is a verb, optionally followed by a noun phrase and one or more prepositional phrases.

Characterization of Syntactic Structure (Using Context Free Grammars)

Context Free Grammars - the rewrite operation that they describe doesn't depend on any context in which the left-hand symbol occurs. One approach to characterizing syntactic structure involves giving rules to describe how phrases can be generated. e.g.

S -> NP VP
NP -> Det {Adj} Noun
VP -> Verb {NP} {PP}
PP -> Prep NP

Braces {} mean that the enclosed constituent is optional.

Assuming that we have a "lexicon" of words, with their categories represented, these rules could be used to generate some syntactic structures that sentences may exhibit.

Suppose we add this rule: NP -> Det {Adj} Noun {PP}

For example, "the man on the dock". This gives rise to the possibility that two sentences with the same sequence of words could be grouped differently.

I saw the man with a telescope.

These different configurations can be associated with different meanings. This is called "syntactic ambiguity." Ambiguity is when a word or sentence can be taken as having more than one distinct meaning. For example some words have more than one meaning:

I went to the bank.

Different meanings of words can cause sentences to be understood in very different ways:

I saw her duck.
Flying planes can be dangerous.

Context-free rules alone also fail to rule out ungrammatical sentences (marked with *) such as:

Agreement: *She saw himself.
Complements: *He put the block.
Case: *They saw she.

To solve this, rules need to specify more than just what tree configurations can occur, but must somehow indicate constraints that hold among the elements in the tree.

Parsing

A parsing program is a search through the space of possible structural characterizations of the sentence, constrained by the fact that the structural characterization must be compatible with the given sequence of words.

The general idea of parsing with a set of context-free rules (can be extended to deal with non context free) is to start generating possible tree structures, until a rule generates a lexical category. This is then checked against the next word in the sentence. If it is of the appropriate category, the parse continues. If not, the parser must explore another node in the search space. Suppose we consider the general idea that there is a general theory of phrase structures where:

X -- lexical category (noun, preposition, verb)
X' -- "modified" lexical category (with complements)
X'' -- "specified" lexical category.

Constraints can be specified among phrases built up this way. And restrictions on movement can be stated.

e.g. 1. Consider the Grammar

S -> NP VP
NP -> Det Noun
VP -> Verb {NP} {PP}
PP -> Prep NP

Suppose we are parsing: The dog barked in the yard.

We assume we have a sentence S, so we start with the tree:
S
We expand it using the rule S -> NP VP:
NP VP
Working from left to right, we expand the NP node:
Det Noun


Now "Det" is a lexical category, so we look at the first word of the sentence, it is indeed a determiner, so we continue. The next category "Noun" is also a lexical category, so we check, and succeed.

Now we come to a non-lexical category, VP, so we find a rule for that. This rule has optional constituents, so we treat each optional possibility as a separate node. Our first node assumes that both optional constituents are absent:

VP -> Verb

And we create a node for each of the other possibilities:

VP -> Verb NP
VP -> Verb PP
VP -> Verb NP PP

The first node predicts a verb and one is there so we continue. However that rule says we should be done, and we aren't yet, so it fails, and we go back to the next node. This one also predicts a verb, so we continue. We expand an NP node, which predicts a determiner, but there is none there, so that one fails. The next node predicts a verb, and we expand the PP node to predict a preposition, which is what is there, and we continue on until the sentence is finished.
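The walk-through above can be sketched as a small backtracking top-down parser in Python. The `LEXICON` entries, the nested-tuple tree format and the function names are assumptions made for this sketch; the optional constituents of the VP rule are written out as four separate alternatives, as in the text.

```python
# Hypothetical lexicon giving the lexical category of each word.
LEXICON = {"the": "Det", "dog": "Noun", "yard": "Noun", "barked": "Verb", "in": "Prep"}

# Each non-terminal maps to its alternative right-hand sides, tried in order.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "Noun"]],
    "VP": [["Verb"], ["Verb", "NP"], ["Verb", "PP"], ["Verb", "NP", "PP"]],
    "PP": [["Prep", "NP"]],
}


def parse(symbol, words, pos):
    """Yield (tree, next_pos) for every way `symbol` can match words[pos:]."""
    if symbol not in GRAMMAR:
        # Lexical category: check it against the next word in the sentence.
        if pos < len(words) and LEXICON.get(words[pos]) == symbol:
            yield (symbol, words[pos]), pos + 1
        return
    for rhs in GRAMMAR[symbol]:
        yield from parse_sequence(rhs, words, pos, (symbol,))


def parse_sequence(symbols, words, pos, tree):
    """Match the symbols left to right, backtracking over the alternatives."""
    if not symbols:
        yield tree, pos
        return
    for subtree, nxt in parse(symbols[0], words, pos):
        yield from parse_sequence(symbols[1:], words, nxt, tree + (subtree,))


def parse_sentence(sentence):
    """Return the first parse tree that consumes every word, or None."""
    words = sentence.lower().rstrip(".").split()
    for tree, end in parse("S", words, 0):
        if end == len(words):
            return tree
    return None


print(parse_sentence("The dog barked in the yard."))
```

The alternative VP -> Verb is tried first and rejected because words remain, VP -> Verb NP fails on "in", and VP -> Verb PP succeeds, mirroring the order of events in the description.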

e.g. 2. Consider the Context Free Grammar:
S -> AA
A -> AAA | bA | Ab | a

A parse tree of the string bbaaaab would be:

The only rule for formation of parse trees is: Every non-terminal sprouts branches leading to every symbol in the right side of the production that replaces it.
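One way to check that a string such as bbaaaab really is generated by the grammar S -> AA, A -> AAA | bA | Ab | a is a small recognizer, sketched here in Python. The function names `derives_a` and `derives_s` are assumptions of this sketch, which only answers yes or no and does not build the parse tree.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def derives_a(s: str) -> bool:
    """True if the non-terminal A can derive the string s."""
    if s == "a":                              # A -> a
        return True
    if len(s) < 2:
        return False
    if s[0] == "b" and derives_a(s[1:]):      # A -> bA
        return True
    if s[-1] == "b" and derives_a(s[:-1]):    # A -> Ab
        return True
    n = len(s)
    for i in range(1, n - 1):                 # A -> AAA: try every 3-way split
        if derives_a(s[:i]):
            for j in range(i + 1, n):
                if derives_a(s[i:j]) and derives_a(s[j:]):
                    return True
    return False


def derives_s(s: str) -> bool:
    """True if S -> AA can derive s, i.e. s splits into two strings derived from A."""
    return any(derives_a(s[:i]) and derives_a(s[i:]) for i in range(1, len(s)))


print(derives_s("bbaaaab"))  # True, e.g. as "bba" followed by "aaab"
print(derives_s("aaa"))      # False
```

Every rule makes the remaining substrings strictly shorter, so the recursion terminates; one successful split is bba (via A -> bA twice, then A -> a) followed by aaab (via A -> Ab, then A -> AAA).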

The general idea in parsing is what is called "top down" parsing, which is a depth-first search down the left side of the tree until a lexical category is predicted. This is compared with the next word in the sentence. If it matches the category a replacement is made & parsing continues.

Context Sensitive Grammars (Read About**)


To handle non-context-free phenomena, a context-free parser is sometimes augmented with some additional tests or operations to perform after the parser succeeds on the context-free operation to possibly eliminate some sentences. For example we might have:

S -> NP VP (= (number NP) (number VP))

Where 'number' returns whether its argument is singular or plural. Of course we will have to augment our representation of the syntactic structure somehow to record this and other potentially relevant syntactic properties. We will see a specific example of this next time, when we examine a parser that uses the machinery we developed for proving theorems.

Issues in Semantics

The reason people are interested in syntax is that the structure of a sentence is presumably related somehow to the meaning that it conveys.

To some degree, the meanings of nouns and noun phrases can be understood with the sorts of knowledge representation ideas we have already looked at. The idea of hierarchies of objects can be extended to the idea of hierarchies of actions and events. In the theory of "conceptual dependency" the claim is that the relations among complex events can be obtained by composing them out of more simple events.

A key idea in representing events is that certain kinds of events have specific "participants". For example a "buy" event has a buyer and a seller and a thing bought. A "move" event has the thing that moves, possibly an initial and a final location, and maybe a path along which the motion happens.

Issues in Pragmatics

Pragmatics refers to how contextual resources are used to work out the specific meanings of sentences. Sometimes the contextual resources are linguistic, for example referring expressions, and sometimes they are part of the speech situation, for example the speaker and hearer, and the time and place of the utterance.

So for example we have in English the difference between "definite" and "indefinite" reference. An "indefinite" expression gives a description and is often used to indicate that an object satisfying that description is to be newly introduced into the discourse. A "definite" referring expression is used to refer back to a previously mentioned entity. So in:

A bear came to our campsite last night.
The bear was eating our garbage.
It scared my brother.

The first expression "a bear" is indefinite; it introduces the entity to the store. "The bear" is definite; it refers to the previously introduced bear. So does "it". All of this requires some notion of a "structure" or "context" in which referring expressions are introduced.

The discourse situation must be represented also, for many references to be understood. For example we need to represent the speaker and hearer, and perhaps onlookers, if we are to work out the intended referents of "me" and "you" and "us" and "them". Also the times of "now" and "yesterday", and the locations of "here" and "there".

Issues in Discourse

The next level of analysis is called "discourse theory". This is about the higher level relations that hold among sequences of sentences in a discourse or a narrative. It merges sometimes with literary theory, but also with pragmatics.

Information Retrieval

An important application of natural language processing is in the area of "information retrieval". In this field we are less interested in working out the linguistic details of texts than in finding information about some topic. The general model is this:

- a huge database of articles (like an encyclopedia)
- a user "query", like "find me articles about 'Natural Language Processing'"

The goal is that the computer be able to figure out which articles in the database are relevant to your query.

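There are two ways commonly used to assess the success of an information retrieval system:

- Relevance: all relevant articles are found (usually called "recall" in the information retrieval literature)
- Accuracy: all articles found are relevant (usually called "precision")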

Note that it is easy to make a system that is very high in the "relevance" index -- simply return all articles in the database. Clearly this will return all relevant ones, but of course this isn't very useful. Accuracy is important, but much more difficult.
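The two measures can be written as simple set ratios. The sketch below assumes articles are identified by strings and that the set of truly relevant articles is known. The function names are illustrative, and the information retrieval literature usually calls these measures recall and precision.

```python
def relevance_score(found, relevant):
    """Fraction of the relevant articles that were found (recall)."""
    return len(found & relevant) / len(relevant) if relevant else 1.0


def accuracy_score(found, relevant):
    """Fraction of the found articles that are relevant (precision)."""
    return len(found & relevant) / len(found) if found else 1.0


# Hypothetical database of eight articles, two of which are relevant.
database = {"a1", "a2", "a3", "a4", "a5", "a6", "a7", "a8"}
relevant = {"a1", "a2"}

# Returning the whole database finds every relevant article ...
print(relevance_score(database, relevant))  # 1.0
# ... but only a quarter of what is returned is relevant.
print(accuracy_score(database, relevant))   # 0.25
```

Returning everything maximises the first measure while the second collapses, which is why the two are normally reported together.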

One approach to information retrieval is to use the NLP ideas described above to make a program that parses the text of all of the articles, and represents their meanings, and then does the same thing for a query. The problem with this approach is that few of the linguistic issues needed to really do this have been solved. Parsing alone is hard, and that is one of the best understood things.

Keyword Information Retrieval


In this approach, we forget the linguistic structure, and just work with keywords. The idea is to take the query and to find articles that contain as many of the words in the query as possible.

Usually such systems use a "stemmer" to convert word forms into base words, and use the base words to match both the query and the articles. So we wouldn't search for "activities" but for "activity". We also remove common words like "the" and "of".
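A minimal sketch of keyword retrieval with stop-word removal and stemming is given below. The stemmer is a toy written for this example (real systems use a proper algorithm such as Porter's), and the stop-word list, function names and sample articles are all assumptions.

```python
STOP_WORDS = {"the", "of", "a", "an", "and", "on", "in", "for",
              "at", "you", "can", "get"}


def stem(word):
    """Toy stemmer (an assumption, not a real stemming algorithm)."""
    if word.endswith("ies"):
        return word[:-3] + "y"      # "activities" -> "activity"
    if word.endswith("s") and not word.endswith("ss"):
        return word[:-1]            # "markets" -> "market"
    return word


def keywords(text):
    """Lower-case, strip punctuation, drop stop words, stem the rest."""
    words = (w.strip(".,;:!?\"'()") for w in text.lower().split())
    return {stem(w) for w in words if w and w not in STOP_WORDS}


def rank(query, articles):
    """Return titles ordered by how many query keywords each article shares."""
    q = keywords(query)
    scored = [(len(q & keywords(body)), title)
              for title, body in articles.items()]
    ordered = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [title for score, title in ordered if score > 0]


# Hypothetical three-article database.
articles = {
    "economics": "Free markets and the price of goods",
    "shopping": "Free stuff you can get at supermarkets",
    "hobbies": "Outdoor activities for the weekend",
}
print(rank("free markets", articles))  # ['economics', 'shopping']
```

The "shopping" article is returned because it shares the single keyword "free" with the query, even though it has nothing to do with free markets.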

This simple keyword approach often works, but it will sometimes do poorly on the accuracy measure. For example, suppose I am looking for articles on "free markets". The problem is that lots of articles will contain the words "free" and "market" without saying anything about free markets; for example, there will be articles about free stuff you can get at supermarkets. So some Information Retrieval systems allow you to specify ordering constraints, or boolean combinations of index k

Summary

Natural language is a very rich and powerful communication medium. If we are to build systems which can utilize this medium, we must analyze language at several levels, i.e.:

- syntax: what is the structure of a sentence?
- semantics: what is the meaning of a sentence (in isolation)?
- pragmatics: contextual influence of meaning
- discourse: how can a sentence be interpreted in context?
- dialog: how is language used to exchange information?

Machine Learning

Learning

Defn1: "Changes in a system that enable a system to do the same task more efficiently the next time." --Herbert Simon

Defn2: "Learning is constructing or modifying representations of what is being experienced." --Ryszard Michalski

Defn3: "Learning is making useful changes in our minds." --Marvin Minsky

Machine learning is the programming of computers to optimize a performance criterion using data or past experience.

Appropriate when:

- Human expertise does not exist or cannot be used (navigating on Mars)
- Humans are unable to explain their expertise (speech recognition)
- Solution changes in time (routing on a computer network)
- Solution needs to be adapted to particular cases (user biometrics)
- There is need to expand the domain or expertise and lessen the "brittleness" of the system
- Environments change over time, and machines need to adapt to changes so as to reduce the need for constant redesign.


Example Applications of Machine Learning

1. Oil Industry (Separating oil from gases)

Crude oil is often mixed with natural gas when it is extracted from the ground, and the two must be separated prior to refining. Finding the ideal settings to control the separation process is a complex task. British Petroleum used ML to create a set of rules for setting the control parameters. This enabled the task to be performed in just 10 minutes, whereas it had previously taken human experts more than a day.

2. Chemical Process Control (Manufacture of nuclear fuel pellets)

Westinghouse's process for manufacturing nuclear fuel pellets is controlled by numerous control parameters that interact in complex ways. If they are set incorrectly, the process's throughput and yield will be low. ML was used to create a set of rules for controlling the manufacturing process. Its use in 1984 benefited Westinghouse by more than ten million dollars per year.

3. Predicting Drug Activity

In order to design new drugs with a certain desired biological activity, or to understand the mechanisms underlying the activity (or non-activity, e.g., non-toxicity) of known drugs, it is necessary to discover the relationships between chemical structure and the activity of interest. Relationships discovered from experimental data are called SARs (Structure Activity Relationships). Because of the complex 3-D shapes involved, manual SAR analysis is infeasible except in the simplest cases. A particular type of ML, inductive logic programming (ILP), has proven particularly useful in discovering SARs because it directly reasons about the 2-D or 3-D structure of the drugs in addition to their physico-chemical properties.

4. Loan Application Screening

In the 1980s American Express (UK) used statistical methods to divide loan applications into three categories: those that should definitely be accepted, those that should definitely be rejected, and those which required a human expert to judge. The human experts could correctly predict if an applicant would, or would not, default on the loan in only about 50% of the cases. ML produced rules that were much more accurate, correctly predicting default in 70% of the cases, and were immediately put into use.

Components of a Learning System

[Figure: components of a learning system. Boxes: Environment, Sensors, Critic, Learning Element, Performance Element, Effectors, Problem Generator. Arrow labels: feedback, learning goals, experiments.]

- Learning Element makes changes to the system based on how it's doing
- Performance Element is the agent itself that acts in the world
- Critic tells the Learning Element how it is doing (e.g., success or failure) by comparing with a fixed standard of performance
- Problem Generator suggests "problems" or actions that will generate new examples or experiences that will aid in training the system further

In designing a learning system, there are four major issues to consider:

- components: which parts of the performance element are to be improved
- representation of those components
- feedback available to the system
- prior information available to the system

Evaluating Performance of a Learning Agent

There are several possible criteria for evaluating a learning algorithm:

- Predictive accuracy of classifier
- Speed of learner
- Speed of classifier
- Space requirements

**Most common criterion is predictive accuracy

Major Paradigms of Machine Learning

1. Rote Learning: One-to-one mapping from inputs to stored representation. "Learning by memorization." Association-based storage and retrieval.

2. Induction: Use of specific examples to reach general conclusions.

3. Clustering: Unsupervised, inductive learning in which "natural classes" are found for data instances, as well as ways of classifying them.

4. Analogy: Determine correspondence between two different representations. Inductive learning in which a system transfers knowledge from one database into that of a different domain.

5. Discovery: Unsupervised, where a specific goal is not given.


6. Reinforcement: Feedback (positive or negative reward) is given only at the end of a sequence of steps. Requires assigning reward to steps by solving the credit assignment problem: which steps should receive credit or blame for a final result?

Supervised versus Unsupervised learning

We want to learn an unknown function f(x) = y, where x is an input example and y is the desired output. Supervised learning assumes we are given a set of (x, y) pairs by a "teacher." Unsupervised learning means we are only given the xs. In either case, the goal is to estimate f.

A/ Supervised learning is a machine learning technique for creating a function from training data. The training data consist of pairs of input objects (typically vectors), and desired outputs. The output of the function can be a continuous value (called regression), or can predict a class label of the input object (called classification). The task of the supervised learner is to predict the value of the function for any valid input object after having seen a number of training examples (i.e. pairs of input and target output). To achieve this, the learner has to generalize from the presented data to unseen situations in a "reasonable" way.

Steps in solving a Supervised learning problem

1. Determine the type of training examples. Decide what kind of data is to be used as an example. For instance, this might be a single handwritten character, an entire handwritten word, or an entire line of handwriting.

2. Gather a training set. The training set needs to be characteristic of the real-world use of the function. A set of input objects is gathered and corresponding outputs are also gathered, either from human experts or from measurements.

3. Determine the input feature representation of the learned function. The accuracy of the learned function depends strongly on how the input object is represented. Typically, the input object is transformed into a feature vector, which contains a number of features that are descriptive of the object. The number of features should not be too large, but should be large enough to accurately predict the output.

4. Determine the structure of the learned function and corresponding learning algorithm. For example, use artificial neural networks or decision trees.

5. Complete the design. The learning algorithm is run on the gathered training set. Parameters of the learning algorithm may be adjusted by optimizing performance on a subset (called a validation set) of the training set, or via cross-validation. After parameter adjustment and learning, the performance of the algorithm may be measured on a test set that is separate from the training set.
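The steps above can be sketched end to end with a very simple learner. The example below uses a 1-nearest-neighbour classifier, chosen only because it is short (the notes name neural networks and decision trees as typical choices). The data, labels and function names are hypothetical.

```python
import math


def nearest_neighbour(train, x):
    """Predict the label of x as the label of its closest training example."""
    features, label = min(train, key=lambda pair: math.dist(pair[0], x))
    return label


def accuracy(train, test):
    """Fraction of test examples whose predicted label is correct."""
    hits = sum(nearest_neighbour(train, x) == y for x, y in test)
    return hits / len(test)


# Hypothetical labelled data: (feature vector, class label) pairs.
data = [
    ((1.0, 1.0), "small"), ((1.2, 0.8), "small"), ((0.9, 1.1), "small"),
    ((5.0, 5.0), "large"), ((5.2, 4.9), "large"), ((4.8, 5.1), "large"),
]

# Hold out one example per class as a test set and train on the rest.
test_set = [data[2], data[5]]
train_set = data[:2] + data[3:5]

print(accuracy(train_set, test_set))  # 1.0
```

Keeping the test set separate from the training set is what makes the measured accuracy an estimate of performance on unseen inputs.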


B/ Unsupervised learning: The data have no target attribute. We want to explore the data to find some intrinsic structure in them. Clustering is a technique for finding similarity groups in data, called clusters: it groups data instances that are similar to (near) each other into one cluster and data instances that are very different (far away) from each other into different clusters. For historical reasons, clustering is often considered synonymous with unsupervised learning.

A clustering algorithm needs:

- A distance (similarity, or dissimilarity) function
- Inter-cluster distance: maximized
- Intra-cluster distance: minimized

K Means Clustering Algorithm

K-means is a partitional clustering algorithm

Let the set of data points (or instances) D be {x1, x2, …, xn}, where xi = (xi1, xi2, …, xir) is a vector in a real-valued space and r is the number of attributes (dimensions) in the data.

The k-means algorithm partitions the given data into k clusters. Each cluster has a cluster center, called the centroid. k is specified by the user.

The k-means algorithm works as follows:

1) Randomly choose k data points (seeds) to be the initial centroids, or cluster centers.
2) Assign each data point to the closest centroid.
3) Re-compute the centroids using the current cluster memberships.
4) If a convergence criterion is not met, go to 2).


Stopping Criteria

1. no (or minimum) re-assignments of data points to different clusters,
2. no (or minimum) change of centroids, or
3. minimum decrease in the sum of squared error (SSE).
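The four steps and the first stopping criterion can be sketched in plain Python as follows. This is a minimal illustration, not a tuned implementation: the parameters initial_centroids, max_iterations and seed, the handling of empty clusters, and the sample points are assumptions made for this example.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def k_means(data, k, initial_centroids=None, max_iterations=100, seed=0):
    """Return (centroids, assignments) after running k-means on data."""
    # Step 1: choose k data points (seeds) as the initial centroids.
    if initial_centroids is None:
        initial_centroids = random.Random(seed).sample(data, k)
    centroids = list(initial_centroids)
    assignments = None
    for _ in range(max_iterations):
        # Step 2: assign each data point to the closest centroid.
        new_assignments = [
            min(range(k), key=lambda j: squared_distance(x, centroids[j]))
            for x in data
        ]
        # Step 4 (stopping criterion 1): no point changed cluster.
        if new_assignments == assignments:
            break
        assignments = new_assignments
        # Step 3: re-compute each centroid from its current members.
        for j in range(k):
            members = [x for x, a in zip(data, assignments) if a == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = mean(members)
    return centroids, assignments


def sse(data, centroids, assignments):
    """Sum of squared error: squared distance of each point to its centroid."""
    return sum(squared_distance(x, centroids[a])
               for x, a in zip(data, assignments))


# Hypothetical 2-D data with two well separated groups.
points = [(0.0, 0.0), (0.0, 2.0), (2.0, 0.0),
          (10.0, 10.0), (10.0, 12.0), (12.0, 10.0)]
centroids, assignments = k_means(
    points, k=2, initial_centroids=[points[0], points[3]])
print(assignments)  # [0, 0, 0, 1, 1, 1]
```

Passing initial_centroids makes the run repeatable. Leaving it out falls back to random seeds, in which case the result can depend on which points are drawn.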

Strengths:

- Simple: easy to understand and to implement.
- Efficient: time complexity is O(tkn), where n is the number of data points, k is the number of clusters, and t is the number of iterations. Since both k and t are usually small, k-means is considered a linear algorithm.
- K-means is the most popular clustering algorithm.

Developing an Artificial Intelligence System

Software engineering methods have been used to address software engineering problems, but these seem to fall short in addressing AI problems.

A comparison between these two forms of problems may help us appreciate some of the reasons why there is need for appropriate tools for use with AI problems.

Software Engineering Problems | AI Problems
Static in nature | Dynamic in nature
Context free | Context sensitive
Solutions to problems can either be classified as correct or incorrect | Solutions can be classified as adequate or inadequate
System completely specifiable | Problem not completely specifiable
Available tools suitable for problems with reliable & static data | Techniques deal with problems

From the comparison above we point out specific issues to be addressed during the process of building AI systems.

During project conception, it is important to consider the following aspects:

Problem Selection: Can the problem be solved? Do we have enough information in that area?

Human Factors: It is important to identify the beneficiaries of the system.


Performance Objectives: Objectives may not be possible to state initially, but some requirements may have to be stated in terms of the performance expected from the system.

Prototyping: A throw-away prototype can help to assess the feasibility & likely costs of a full-scale version of the system.

The following stages can be followed in coming up with an AI system.

1) Read about technologies which help you to implement the type of AI system.
2) Define goals for your system.
3) Select prototype. (Prototyping is one of the best ways of developing an AI system.)
4) Write detailed technical specification of the system.
5) Accomplish database logical and physical design.
6) Code the prototype.

1) Read general documentation on AI (Initial Survey)

You need to understand the basic concepts of artificial intelligence development before you start the development process.

2) Define the Goals of your System

Reaching goals is the aim of any AI system; therefore it is necessary to define the goals of the system before you can go about creating a program to implement it. Define what the useful outcome of your system will be, e.g.

1. System will be able to answer questions, or
2. Translate text from one language to another, or
3. Determine if a child has learning problems in arithmetic …

To fully understand the system, the goal can be further split into sub-goals, e.g. for Goal 3:

SUB-GOALS
- Determine if child has learning problems in ADDITION
- Determine if child has learning problems in SUBTRACTION
- Determine if child has learning problems in MULTIPLICATION
- Determine if child has learning problems in DIVISION

3) Select prototype (Working model of your system)

A prototype allows us to quickly implement and test a new idea. There are two major reasons for implementing a prototype:

1) A prototype achieves useful results in a relatively short time.
2) A prototype helps to find mistakes in the model and therefore to improve the model.

Prototyping is an important step in development. You need to restrict the amount of functionality you are going to implement. It is hard to implement even restricted AIS functionality, and practically impossible to implement full, true AIS functionality in one step. Carefully select a small amount of functionality which:


- Requires a restricted amount of new development.
- Gives visible result(s).

4) Write detailed technical specification of the prototype (system)

Write a technical specification for the selected prototype, i.e.:

- Describe prototype functionality and how it will be achieved.
- Describe all modules which should be implemented in the prototype.
- Create a list of all tasks which should be accomplished.

5) Accomplish Database logical and physical design

Define the exact structure of the database (main memory). The database is made up of the concepts (facts) stored in a concept table and a cause-effect relation table.

A Concept – is a general idea derived or inferred from outer world, e.g. earth, sun, solar system etc. A concept can either be a cause concept or an effect concept. A concept becomes meaningful only if it has relations with other concepts. Every concept in concept table is bound with other concepts by a Cause-Effect Relation table.

Cause-Effect relation

A cause-effect relation is a relation between a cause concept and an effect concept. Cause-effect relations are represented in the database by a cause-effect relation table.

Example:

"Sun" is a cause for "heat".
"Fire" is a cause for "heat".
"Sun" is a cause for "sunburn".

So, there are 3 cause-effect relations in this example:

{Sun->heat}
{Fire->heat}
{Sun->sunburn}

Why are cause-effect relations so important?

Cause-effect relations are so important because:

1) Cause-effect relations help to understand what would happen as a result of the current situation. They help to predict the future of the current context.

[Figure: Cause-effect relation example. Reason concepts (Sun, Fire) point to consequence concepts (Heat, Sunburn): Sun -> Heat, Fire -> Heat, Sun -> Sunburn.]

2) Cause-effect relations help to understand what an AIS can do in order to achieve specified goals. In order to figure out what to do, an AIS should just find cause concepts for the specified goal-concepts

Example (based on diagram above):

1) Let's imagine that the AIS wants to find out what would be the result of the sun. In order to figure that out, the AIS should follow cause-effect relations and find that the probable results are "Heat" and "Sunburn".

2) Let's imagine that the current goal of the system is "Heat". In order to achieve this goal, the system should follow cause-effect relations in the reverse direction and find that the "Fire" and "Sun" concepts could help to achieve the current goal, "Heat".
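Both directions of lookup can be sketched over the three relations of the example. Storing the relations as a list of (cause, effect) pairs and the function names effects_of and causes_of are assumptions made for illustration.

```python
# The three relations from the example, as (cause, effect) pairs.
RELATIONS = [("Sun", "Heat"), ("Fire", "Heat"), ("Sun", "Sunburn")]


def effects_of(cause, relations=RELATIONS):
    """Follow relations forwards: what may result from this concept?"""
    return sorted(e for c, e in relations if c == cause)


def causes_of(goal, relations=RELATIONS):
    """Follow relations backwards: which concepts could achieve this goal?"""
    return sorted(c for c, e in relations if e == goal)


print(effects_of("Sun"))  # ['Heat', 'Sunburn']
print(causes_of("Heat"))  # ['Fire', 'Sun']
```

The forward lookup corresponds to predicting consequences, and the backward lookup to finding ways of reaching a goal.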

Concept table

Field | Description | Type
ConceptId | Key field | Unsigned Int (4 bytes)
Strength | Indicates how persistent concept is | Float (8 bytes)
Desirability | Indicates how desirable is the concept | Float (8 bytes)
Type | Indicates type of concept: simple concept, word, phrase, peripheral device | Byte
CreationDate | Date & time when this concept was created | DateTime (8 bytes)
ModificationDate | The latest date of modification | DateTime (8 bytes)

 Cause-Effect Relation table

Column | Description | Type
CauseConceptID | Reference to a cause concept | Unsigned Int (4 bytes)
EffectConceptID | Reference to an effect concept | Unsigned Int (4 bytes)
Coherence | Indicates coherence level between the cause concept and the effect concept | Int (4 bytes)
CreationDate | Date when this concept was created | DateTime (8 bytes)
ModificationDate | Date when this concept was updated last time | DateTime (8 bytes)
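One possible physical design for the two tables is sketched below using Python's built-in sqlite3 module. The mapping of the listed types onto SQLite types (INTEGER, REAL, TEXT), the table names, and the extra Name column are assumptions made so that the example is readable. They are not part of the design described in these notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throw-away in-memory database
conn.executescript("""
CREATE TABLE Concept (
    ConceptId        INTEGER PRIMARY KEY,
    Name             TEXT,     -- hypothetical column, for readability only
    Strength         REAL,
    Desirability     REAL,
    Type             INTEGER,
    CreationDate     TEXT,
    ModificationDate TEXT
);
CREATE TABLE CauseEffectRelation (
    CauseConceptID   INTEGER REFERENCES Concept(ConceptId),
    EffectConceptID  INTEGER REFERENCES Concept(ConceptId),
    Coherence        INTEGER,
    CreationDate     TEXT,
    ModificationDate TEXT
);
""")

# Populate the tables with the Sun / Fire / Heat / Sunburn example.
conn.executemany(
    "INSERT INTO Concept (ConceptId, Name) VALUES (?, ?)",
    [(1, "Sun"), (2, "Fire"), (3, "Heat"), (4, "Sunburn")])
conn.executemany(
    "INSERT INTO CauseEffectRelation (CauseConceptID, EffectConceptID) "
    "VALUES (?, ?)",
    [(1, 3), (2, 3), (1, 4)])

# Which concepts are causes of "Heat"?
rows = conn.execute("""
    SELECT c.Name FROM CauseEffectRelation r
    JOIN Concept c ON c.ConceptId = r.CauseConceptID
    JOIN Concept e ON e.ConceptId = r.EffectConceptID
    WHERE e.Name = ? ORDER BY c.Name""", ("Heat",)).fetchall()
causes = [name for (name,) in rows]
print(causes)  # ['Fire', 'Sun']
```

The relation table holds only references to concepts, so the same concept can act as a cause in one row and as an effect in another.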

Define all additional tables such as word dictionary and phrase dictionary.

The word dictionary is a dictionary of single words.

The phrase dictionary is a dictionary of phrases. A phrase is a short sequence of words (2-3), e.g. "binomial theorem", "adding two numbers".


6) Code the prototype

Code the prototype using an appropriate language, test it and fix mistakes. The following tests should be done:

Validation: Does the system do what its users actually need?
Verification: Does the system meet specifications in that domain?
Evaluation: Is the system contributing to the organization's success?
