Principles and Concepts of Machine Intelligence

COVENANT UNIVERSITY, COLLEGE OF SCIENCE AND TECHNOLOGY

    ARTIFICIAL INTELLIGENCE LECTURE NOTE

    DR. J.O. DARAMOLA

    Department of Computer and Information Sciences

    COPYRIGHT 2009


    Introduction

AI is one of the newest sciences. Work started in earnest soon after World War II, and the name itself was coined in 1956. Along with molecular biology, AI is regularly cited as the "field I would most like to be in" by scientists in other disciplines. A student in physics might reasonably feel that all the good ideas have already been taken by Galileo, Newton, Einstein, and the rest. AI, on the other hand, still has openings for several groundbreaking contributions.

AI currently encompasses a huge variety of subfields, ranging from general-purpose areas, such as learning and perception, to specific tasks such as playing chess, proving mathematical theorems, writing poetry, and diagnosing diseases. AI systematizes and automates intellectual tasks and is therefore potentially relevant to any sphere of human intellectual activity. In this sense, it is truly a universal field.

What is A.I.?
Some definitions of artificial intelligence have been organized into four categories in Figure 1.

    Systems that think like humans"The exciting new effort to makecomputers think . . . machines with minds,in the full and literal sense." (Haugeland,1985) "[The automation of] activities that weassociate with human thinking, activitiessuch as decision-making, problemsolving, learning . . ." (Bellman, 1978)

    Systems that think rationally"The study of mental faculties through theuse of computational models." (Chamiak and McDermott, 1985) "The study of the computations that makeit possible to perceive, reason, and act."(Winston, 1992)

    Systems that act like humans"The art of creating machines that performfunctions that require intelligence when

    performed by people." (Kurzweil, 1990) "The study of how to make computers dothings at which, at the moment, people are

    better." (Rich and Knight, 1991)

    Systems that act rationally"Computational Intelligence is the study of the design of intelligent agents." (Poole et al., 1998)"A1 . . .is concerned with intelligent

    behavior in artifacts." (Nilsson, 1998)

    Figure 1: Some Definition of A.I

A.I. Problem Domains
Early work focused on formal tasks like:

o Game playing, e.g. checkers-playing programs, where experience gained through playing against opponents was used to improve performance.

o Theorem proving, e.g. the Logic Theorist, an early attempt to prove mathematical theorems, i.e. theorems from the first and second chapters of


Principia Mathematica (a book on mathematics), and Gelernter's theorem prover (Geometry).

o It appeared that computers could perform well at these formal tasks simply by being fast at exploring a large number of solution paths and then selecting the best one; but this assumption turned out to be false, since no computer is fast enough to overcome the combinatorial explosion generated by most problems.

o General Problem Solver (GPS) (Newell et al., 1963), an attempt to model commonsense reasoning and symbolic manipulation of logical expressions.

The initial drawback of these early attempts was that the programs were not designed to handle large amounts of knowledge. But as AI research progressed and techniques for handling larger amounts of knowledge were developed, new tasks could reasonably be attempted. These include:

Some Task Domains in A.I.
Mundane Tasks:
o Perception (vision, speech)
o Natural language processing (understanding, generation, translation)
o Commonsense reasoning
o Robot control

Formal Tasks:
o Games (Checkers, Chess, Ayo, etc.)
o Mathematics (Geometry, Logic, Calculus, proving properties of programs)

Expert Tasks:
o Engineering (design, fault finding, manufacturing planning)
o Scientific analysis
o Medical analysis
o Financial analysis

    Developments in AI

1. Problem Solving and Planning: This deals with systematic refinement of goal hierarchies, plan revision mechanisms and a focused search of important goals.
2. Expert Systems: This deals with knowledge processing and complex decision-making problems.
3. Natural Language Processing: Areas such as automatic text generation, text processing, machine translation, speech synthesis and analysis, grammar and style analysis of text, etc. come under this category.
4. Robotics: This deals with the controlling of robots to manipulate or grasp objects, and using information from sensors to guide actions, etc.
5. Computer Vision: This topic deals with intelligent visualisation, scene analysis, image understanding and processing, and motion derivation.
6. Learning: This topic deals with research and development in different forms of machine learning.


7. Genetic Algorithms: These are adaptive algorithms which have an inherent learning capability. They are used in search, machine learning and optimisation.
8. Neural Networks: This topic deals with simulation of learning in the human brain by combining pattern recognition tasks, deductive reasoning and numerical computations.

Application Areas of AI
Autonomous Planning and Scheduling, Game Playing, Autonomous Control, Diagnosis, Logistics Planning, Robotics, Language Understanding and Problem Solving.

Some Pertinent Questions in A.I.
o What are our underlying assumptions about intelligence?
o What kinds of techniques will be useful for solving AI problems?
o At what level of detail, if at all, are we trying to model human intelligence?
o How will we know when we have succeeded in building an intelligent program?

Underlying Assumption

The physical symbol system hypothesis [Newell and Simon, 1976]: a physical symbol system has the necessary and sufficient means for general intelligent action.

A physical symbol system consists of:
o Symbols, which are physical patterns that occur as components of another entity called an expression or symbol structure.
o A collection of processes that operate on expressions to produce other expressions. This includes processes of creation, modification, reproduction and destruction.

Therefore, a physical symbol system is a machine that produces through time an evolving collection of symbol structures.

The hypothesis can only be tested empirically; the computer provides a medium for this experimentation.

The importance of the physical symbol system hypothesis is twofold:
o It is a significant theory of the nature of human intelligence and so is of great interest to psychologists.
o It also forms the basis of the belief that it is possible to build programs that can perform intelligent tasks now performed by people.

A.I. Techniques

An AI technique is a method that exploits knowledge in solving a problem.

Nature of Knowledge
o It is voluminous
o It is hard to characterize accurately
o It is constantly changing
o It differs from data by being organized in a way that corresponds to the way it will be used.

Questions


o Are there techniques that are appropriate for the solution of a variety of AI problems?
o Can these techniques be useful in solving other problems?

An AI technique must exploit knowledge in such a way that:
o The knowledge captures generalizations
o It can be understood by the people who must provide it
o It can easily be modified to correct errors and reflect changes in the world and in our world view
o It can be used in a great many situations even if it is not totally accurate or complete
o It can be used to help overcome its own sheer bulk by helping to narrow the range of possibilities that must be considered.

Note: it is possible to solve AI problems using non-AI techniques (although the solutions are not likely to be very good). It is also possible to apply AI techniques to the solution of non-AI problems.

Characteristics of AI Techniques
o Search: provides a way of solving important problems for which no more direct approach is available, as well as a framework into which any direct techniques that are available can be embedded.
o Use of Knowledge: provides a way of solving complex problems by exploiting the structures of the objects that are involved.
o Abstraction: provides a way of separating important features and variations from the many unimportant ones that would otherwise overwhelm any process.

Why develop AI programs?

Why do we attempt to model human performance?
o To test psychological theories of human performance
o To enable computers to understand human reasoning
o To enable people to understand computer reasoning
o To exploit what knowledge we can glean from people

How do we measure a system's intelligence?
Turing Test (Alan Turing, 1950): a method to determine whether a system can think.

Problem Spaces
Solving AI problems requires four things:
o Define the problem precisely: this must include the specification of the initial situation(s) and the goal of the problem-solving process.
o Analyze the problem: the emergent features after analysis can help determine the appropriateness of various possible techniques for solving the problem.
o Isolate and represent the task knowledge that is necessary to solve the problem.
o Choose the best problem-solving technique(s) and apply it (them) to the particular problem.

    State Space Definition of Problems


Although such a specification must somehow be provided before we can design a program to solve the problem, producing such a specification is itself a very hard problem. Generally, a problem can be solved by using the rules of the state space, in combination with an appropriate control strategy, to move through the problem space until a goal state is found. Therefore the process of search is fundamental to the problem-solving process.

Production Systems
A production system consists of:
o A set of rules, each consisting of a left side (a pattern) that determines the applicability of the rule and a right side that describes the operation to be performed.
o One or more knowledge databases (knowledge bases) that contain relevant and appropriate information for the task.
o A control strategy that specifies the order in which the rules will be compared to the database, and a way of resolving the conflicts that arise when several rules match at once.
o A rule applier.

Examples: OPS5 [Brownston et al., 1985], ACT* [Anderson, 1983]
- Expert system shells
- General problem-solving architectures like SOAR [Laird et al., 1989]
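As an illustration of these components, here is a minimal Python sketch of a production system loop. The rule encoding as (pattern, action) pairs, the naive "first match wins" conflict resolution, and the toy counter state are assumptions made for this sketch, not features of OPS5 or SOAR.

# A minimal production system sketch.
# Each rule is a (pattern, action) pair: the pattern tests the working memory
# (here a single state value) and the action produces a new state.

def run_production_system(state, rules, is_goal, max_steps=100):
    """Repeatedly match rules against the state and fire one of them.
    Conflict resolution here is deliberately naive: the first applicable rule fires."""
    for _ in range(max_steps):
        if is_goal(state):
            return state
        applicable = [action for pattern, action in rules if pattern(state)]
        if not applicable:
            return None                    # no rule matches: halt without a goal
        state = applicable[0](state)       # fire the first applicable rule
    return None

# Example: two toy rules over a counter state.
rules = [
    (lambda s: s < 10, lambda s: s + 3),   # "add 3" is applicable while s < 10
    (lambda s: s > 10, lambda s: s - 1),   # "subtract 1" brings overshoot back down
]
print(run_production_system(0, rules, is_goal=lambda s: s == 11))   # -> 11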

Control Strategies
Requirements:
1. Causes motion: it should be able to lead to a solution.
2. Systematic: it should be based on a structure, e.g. a tree or graph, that can facilitate an effective search process.

Searching Techniques

Algorithm: Breadth-First Search
1. Create a variable called NODE-LIST and set it to the initial state.
2. Until a goal state is found or NODE-LIST is empty do:
   a) Remove the first element from NODE-LIST and call it E. If NODE-LIST was empty, quit.
   b) For each way that each rule can match the state described in E do:
      i) Apply the rule to generate a new state.
      ii) If the new state is a goal state, quit and return this state.
      iii) Otherwise, add the new state to the end of NODE-LIST.

Description
Construct a tree with the initial state as its root. Generate all offspring of the root by applying each of the applicable rules to the initial state. Thereafter, for each leaf node, generate all its successors by applying the rules that are appropriate. Continue this process until some rule produces a goal state.
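A minimal Python sketch of this breadth-first procedure, applied to the water-jug problem used in the exercise and figures below (a 4-gallon and a 3-gallon jug; the goal here is 2 gallons in the 4-gallon jug). The state encoding and rule set are illustrative assumptions.

from collections import deque

# State: (x, y) = gallons in the 4-gallon and 3-gallon jugs.
def successors(state):
    x, y = state
    return {
        (4, y), (x, 3),                        # fill either jug
        (0, y), (x, 0),                        # empty either jug
        (min(4, x + y), max(0, y - (4 - x))),  # pour 3-gal jug into 4-gal jug
        (max(0, x - (3 - y)), min(3, x + y)),  # pour 4-gal jug into 3-gal jug
    }

def breadth_first_search(start, is_goal):
    node_list = deque([start])          # the NODE-LIST of the algorithm above
    visited = {start}
    while node_list:
        state = node_list.popleft()     # remove the first element and call it E
        if is_goal(state):
            return state
        for new_state in successors(state):
            if new_state not in visited:    # avoid re-expanding old states
                visited.add(new_state)
                node_list.append(new_state)
    return None

print(breadth_first_search((0, 0), lambda s: s[0] == 2))   # -> (2, 0) or (2, 3)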


Exercise: Apply the breadth-first search to the water jug problem.

    Advantages of Breadth-First Search

o The algorithm will not get trapped after exploring a wrong path.
o If there is a solution, breadth-first search will find it, though it may take time. Also, if there are multiple solutions, the minimal solution will be found. This is because the longer paths are never examined until all the shorter ones have been examined.

Depth-First Search
o Pursues a single branch of the tree until it yields a solution or until a decision to terminate the path is made (i.e. when it reaches a dead end, or when the length of the path exceeds the futility limit). Thereafter backtracking occurs: it backtracks to the most recently created state from which alternative moves are available.
o This is called chronological backtracking because the order in which steps are undone depends only on the temporal sequence in which the steps were originally made. The most recent step is always the first to be undone.


    Algorithm: Depth-First Search

1. If the initial state is a goal state, quit and return success.
2. Otherwise, do the following until success or failure is signaled:
   (a) Generate a successor, E, of the initial state. If there are no more successors, signal failure.
   (b) Call Depth-First Search with E as the initial state.
   (c) If success is returned, signal success; otherwise continue this loop.
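A minimal recursive Python sketch of the depth-first procedure above. The depth (futility) limit, the loop check, and the successor function are illustrative assumptions.

def depth_first_search(state, is_goal, successors, limit=20, visited=None):
    """Return a path to a goal state (or None), following one branch at a time."""
    visited = visited if visited is not None else set()
    if is_goal(state):
        return [state]
    if limit == 0 or state in visited:       # futility limit / loop check
        return None
    visited.add(state)
    for child in successors(state):          # generate one successor at a time
        result = depth_first_search(child, is_goal, successors, limit - 1, visited)
        if result is not None:               # success propagates back up
            return [state] + result
    return None                              # dead end: backtracking happens here

# e.g. depth_first_search((0, 0), lambda s: s[0] == 2, successors)
# reusing the water-jug successors from the breadth-first sketch above.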

Advantages of Depth-First Search
o Requires less memory, since only the nodes on the current path are stored. This contrasts with breadth-first search, where all of the tree generated so far must be stored.
o By chance (if care is taken in ordering the alternative successor states), it may find a solution without much searching.
o In breadth-first search all nodes at level n must be examined before any node on level n+1 can be examined. This is particularly significant if many acceptable solutions exist; depth-first search can stop when one of them is found.

[Figures: one level and two levels of a breadth-first search tree, and a depth-first search tree, for the water-jug problem, showing states such as (0,0), (4,0), (0,3), (1,3), (4,3) and (3,0).]

Disadvantages of Depth-First Search


o A wrong path may be followed, and it can get trapped if there are loops on that path.
o It may find a solution along a long path of the tree, not necessarily the minimal path.

Note: the Best-First Search combines the strengths of these two algorithms to achieve a better implementation.

Exercise
Discuss the traveling salesman problem.
A salesman has a list of cities, each of which he must visit exactly once. There are direct roads between each pair of cities on the list. Find the route the salesman should follow for the shortest possible trip that both starts and finishes at any one of the cities.
- Combinatorial explosion
- Branch and bound strategy
- NP problem

Number of paths among N cities = (N-1)!
Total time to perform the search = N!

Branch and Bound Technique: Begin generating complete paths, keeping track of the shortest path found so far. Give up exploring any path as soon as its partial length becomes greater than the shortest path found so far.
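A minimal Python sketch of the branch-and-bound idea just described: grow partial tours, and abandon any partial tour whose length already equals or exceeds the best complete tour found so far. The distance matrix is a made-up illustrative assumption.

import math

def tsp_branch_and_bound(dist):
    """dist[i][j] = road length between cities i and j; tour starts/ends at city 0."""
    n = len(dist)
    best_length = math.inf
    best_tour = None

    def extend(tour, length):
        nonlocal best_length, best_tour
        if length >= best_length:          # bound: give up on this partial path
            return
        if len(tour) == n:                 # complete tour: close the loop
            total = length + dist[tour[-1]][0]
            if total < best_length:
                best_length, best_tour = total, tour + [0]
            return
        for city in range(n):              # branch: try each unvisited city next
            if city not in tour:
                extend(tour + [city], length + dist[tour[-1]][city])

    extend([0], 0)
    return best_tour, best_length

# Example with four cities (symmetric, made-up distances).
dist = [[0, 2, 9, 10],
        [2, 0, 6, 4],
        [9, 6, 0, 3],
        [10, 4, 3, 0]]
print(tsp_branch_and_bound(dist))   # -> ([0, 1, 3, 2, 0], 18)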

Heuristic Search
o A heuristic is a technique that improves the efficiency of a search process, possibly by sacrificing claims of completeness.
o It is a control structure that is not guaranteed to find the best answer but will always find a good answer.
o Heuristics are rules of thumb that, unlike algorithms, do not guarantee correctness.
o Heuristics help to find good though non-optimal solutions to NP problems.
o There are general-purpose heuristics and also domain-specific heuristics, e.g. the nearest neighbour heuristic, which works by selecting the locally superior alternative at each step.
o Error bounds (applicable to general-purpose heuristics).
o Heuristics address the problem of combinatorial explosion.

    Arguments in favour of Heuristics



o Rarely do we actually need the optimum solution in the real world; a good approximation will usually serve very well.
o Although the approximations produced by heuristics may not be very good in the worst case, worst cases rarely arise in the real world.
o Understanding why a heuristic works, or why it doesn't work, often leads to a deeper understanding of the problem.

Domain-specific heuristic knowledge can be incorporated into a rule-based search procedure by:
o Putting it in the rules.
o As a heuristic function that evaluates individual problem states and determines how desirable they are.

A Heuristic Function is a function that maps from a problem state description to a measure of desirability, usually represented as a number. The value of the heuristic function at any given node is meant to be as good an estimate as possible of whether that node is on the desired path to a solution. It specifies the level of importance of that node to the path of solution. It is designed to efficiently guide a search process toward a solution. The purpose of the heuristic function is to guide the search process in the most profitable direction by suggesting which path to follow first when more than one is available.

Characteristics of Problems
o Decomposable problems: can the problem be decomposed into smaller or easier components?
o Ignorable problems: solution steps can be ignored when considered not necessary (e.g. theorem proving).
o Recoverable: solution steps can be undone.
o Irrecoverable: solution steps cannot be undone.
o Certain-outcome: steps lead to a definite outcome.
o Uncertain-outcome: steps produce only a probability of leading to a solution.
  (The hardest problems to solve are those that are irrecoverable with uncertain outcome, e.g. advising a lawyer who is defending a client standing trial for murder.)
o Problems that require an absolutely good solution versus those that require a relatively good solution, e.g. the traveling salesman (any-path/best-path) problem.
o Problems that require a solution as a state or as a path.

Characteristics of Production Systems

    Production systems are a good way to describe the operations that are performed in a search for a solution to a problem.

Questions
o Can production systems be described by characteristics that shed some light on how they can easily be implemented?
o If so, what relationships are there between problem types and the types of production systems suited to solving those problems?

Types of Production Systems


o Monotonic production systems: the application of a rule never prevents the later application of another rule that could have been applied at the time the first rule was selected.
o Non-monotonic production systems: lack the above attribute of monotonic systems.
o Partially commutative production systems: if the application of a particular sequence of rules transforms state x into state y, then any allowable permutation of those rules also transforms state x into state y (i.e. the order of selection of the set of rules is not important).
o Commutative production systems: are both monotonic and partially commutative.

o Partially commutative, monotonic systems are useful for solving ignorable problems. Problems that involve creating new things rather than changing old ones are generally ignorable, e.g. theorem proving and making deductions from known facts.
o Non-monotonic, partially commutative systems are good for problems where changes occur but can be reversed and in which the order of operations is not critical, e.g. physical manipulation systems like robot navigation on a flat plane.
o Non-partially-commutative systems are useful for problems in which irreversible changes occur, e.g. describing the process of chemical compound production (chemical synthesis).

Issues in the Design of Search Programs
o Every search process can be viewed as the traversal of a tree structure.
o Each node represents a problem state.
o Each arc represents a relationship between the state nodes it connects.
o In most programs the tree is not represented explicitly; rather, it is represented implicitly in the rules, and the program generates explicitly only those paths that it decides to explore.
  (Therefore, keep in mind the distinction between the implicit search tree and the explicit partial search tree that is actually constructed by the search program.)
o The direction of search (forward/backward reasoning).
o Selection of applicable rules (matching).
o How to represent each node of the search process (the knowledge representation problem and the frame problem).

Search Tree/Graph
The concept of a search graph eliminates the repeated traversal of previously expanded and processed nodes that may reappear. This makes the search space an arbitrary directed graph rather than a tree. The graph differs from a tree in that several paths may come together at a node. (See Figure 1.3 below.)


Heuristic Search Techniques
Weak methods are varieties of heuristic search techniques whose efficacy depends on the way they exploit domain-specific knowledge, since in themselves they are unable to overcome the problem of combinatorial explosion to which search processes are so vulnerable.

Hill Climbing

Algorithm:
1. Evaluate the initial state. If it is also a goal state, then return it and quit. Otherwise continue with the initial state as the current state.
2. Loop until a solution is found or until there are no new operators left to be applied in the current state:
   (a) Select an operator that has not yet been applied to the current state and apply it to produce a new state.
   (b) Evaluate the new state.
      i. If it is a goal state, then return it and quit.
      ii. If it is not a goal state but it is better than the current state, then make it the current state.
      iii. If it is not better than the current state, then continue in the loop.

o Hill climbing is a variant of generate-and-test search (another search technique based on depth-first search) in which feedback from the test procedure is used to help the generator decide which direction to move in the search space.
o It is used mostly when a good heuristic function is available for evaluating states but no other useful knowledge is available.
o The key difference between this algorithm and generate-and-test is the use of an evaluation function as a way to inject task-specific knowledge into the control process.
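A minimal Python sketch of the simple hill-climbing loop above. The neighbour and evaluation functions are placeholders supplied by the caller; maximising a one-dimensional function is used only as an illustration.

def hill_climbing(start, neighbours, evaluate, is_goal=lambda s: False):
    """Simple hill climbing: move to the first neighbour better than the current state."""
    current = start
    while True:
        if is_goal(current):
            return current
        improved = False
        for candidate in neighbours(current):        # operators not yet applied
            if is_goal(candidate):
                return candidate
            if evaluate(candidate) > evaluate(current):
                current = candidate                  # better state becomes current
                improved = True
                break
        if not improved:                             # no better neighbour: stop
            return current                           # (possibly a local maximum)

# Illustration: climb x*(10 - x) over integer steps; the peak is at x = 5.
print(hill_climbing(0,
                    neighbours=lambda x: [x - 1, x + 1],
                    evaluate=lambda x: x * (10 - x)))   # -> 5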

[Figure 1.3: Search graph for the water-jug problem, showing states (0,0), (4,0), (0,3), (1,3), (4,3) and (3,0), with several paths converging on the same nodes.]


Simulated Annealing (Kirkpatrick et al., 1983)

Algorithm:
1. Evaluate the initial state. If it is a goal state, then return it and quit. Otherwise continue with the initial state as the current state.
2. Initialize BEST-SO-FAR to the current state.
3. Initialize T according to the annealing schedule.
4. Loop until a solution is found or until there are no new operators left to be applied in the current state:
   a) Select an operator that has not yet been applied to the current state and apply it to produce a new state.
   b) Evaluate the new state. Compute ΔE = (value of current state) - (value of new state).
      - If the new state is a goal state, then return it and quit.
      - If it is not a goal state but is better than the current state, then make it the current state. Also set BEST-SO-FAR to this new state.
      - If it is not better than the current state, then make it the current state with probability P, where P = e^(-ΔE/T). This step is usually implemented by invoking a random number generator to produce a number in the range [0,1]. If that number is less than P, then the move is accepted; otherwise do nothing.
   c) Revise T as necessary according to the annealing schedule.
5. Return BEST-SO-FAR as the answer.

o Simulated annealing is a variation of hill climbing in which, at the beginning of the process, some downhill moves may be made. This is to make the final solution relatively insensitive to the starting state.
o This reduces the chances of getting stuck at a local maximum, a plateau or a ridge.
  o A local maximum is a state that is better than all its neighbors but is not better than some other states further away. At a local maximum, all moves appear to make things worse. Local maxima are particularly frustrating because they often occur almost within sight of a solution.
  o A plateau is a flat area of the search space in which a whole set of neighboring states has the same value. On a plateau, it is not possible to determine the best direction in which to move by making local comparisons.
  o A ridge is a special kind of local maximum. It is an area of the search space that is higher than the surrounding areas and that itself has a slope (which one would like to climb). But the orientation of the high region, compared to the set of available moves and the directions in which they move, makes it impossible to traverse a ridge by single moves.
o Simulated annealing uses an objective function as the heuristic function, and the objective function is minimized.
o It uses a probability P = e^(-ΔE/T), where ΔE is generalized to represent the change in the value of the objective function and T is the temperature.
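A minimal Python sketch of this minimising simulated-annealing loop. The geometric cooling schedule, the random neighbour function and the bumpy 1-D objective are illustrative assumptions.

import math
import random

def simulated_annealing(start, neighbour, energy, t0=10.0, cooling=0.95, steps=1000):
    """Minimise `energy`: always accept downhill moves, accept uphill moves
    with probability exp(-dE / T), and cool T gradually."""
    current = start
    best = start
    t = t0
    for _ in range(steps):
        candidate = neighbour(current)
        d_e = energy(candidate) - energy(current)      # positive means "worse"
        if d_e <= 0 or random.random() < math.exp(-d_e / t):
            current = candidate                        # accept the move
            if energy(current) < energy(best):
                best = current                         # BEST-SO-FAR
        t = max(t * cooling, 1e-9)                     # annealing schedule
    return best

# Illustration: find the minimum of a bumpy 1-D function starting far from it.
f = lambda x: x * x + 3 * math.sin(5 * x)
print(simulated_annealing(4.0,
                          neighbour=lambda x: x + random.uniform(-0.5, 0.5),
                          energy=f))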


Best-First Search
o OPEN - nodes that have been generated and have had the heuristic function applied to them, but which have not yet been examined (i.e. had their successors generated). OPEN is actually a priority queue in which the elements with the highest priority are those with the most promising value of the heuristic function. (Revise: standard techniques for manipulating queues.)
o CLOSED - nodes that have already been examined. We need to keep these nodes in memory if we want to search a graph rather than a tree, since whenever a new node is generated, we need to check whether it has been generated before.

Algorithm
1. Start with OPEN containing just the initial state.
2. Until a goal is found or there are no nodes left on OPEN do:
   (a) Pick the best node on OPEN.
   (b) Generate its successors.
   (c) For each successor do:
      i. If it has not been generated before, evaluate it, add it to OPEN, and record its parent.
      ii. If it has been generated before, change the parent if this new path is better than the previous one. In that case, update the cost of getting to this node and to any successors that this node may already have.
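A minimal Python sketch of this OPEN/CLOSED best-first search, using a priority queue for OPEN. It is simplified: the re-parenting in step (c) ii is skipped, nodes are assumed hashable, and ties in the heuristic are assumed resolvable by comparing nodes.

import heapq

def best_first_search(start, is_goal, successors, h):
    """Best-first search: OPEN is a priority queue ordered by the heuristic h;
    CLOSED records nodes already examined, so a graph is searched rather than a tree."""
    open_list = [(h(start), start)]           # OPEN: (heuristic value, node)
    parents = {start: None}                   # best known parent of each generated node
    closed = set()                            # CLOSED: nodes already expanded
    while open_list:
        _, node = heapq.heappop(open_list)    # pick the best node on OPEN
        if node in closed:
            continue
        if is_goal(node):
            path = []                         # reconstruct the path via parents
            while node is not None:
                path.append(node)
                node = parents[node]
            return list(reversed(path))
        closed.add(node)
        for child in successors(node):        # generate its successors
            if child not in parents:          # not generated before
                parents[child] = node
                heapq.heappush(open_list, (h(child), child))
    return None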

General Problem Solving Approaches
There exist quite a large number of problem-solving techniques in AI that rely on search. The simplest among them is the generate and test method. The algorithm for the generate and test method can be formally stated as follows:

Procedure Generate-and-Test
Begin
  Repeat
    Generate a new state and call it current-state;
  Until current-state = Goal;
End.

It is clear from the above algorithm that the algorithm continues exploring a new state in each iteration of the repeat-until loop and exits only when the current state is equal to the goal. The most important part of the algorithm is to generate a new state. This is not an easy task. If generation of new states is not feasible, the algorithm should be terminated. In our simple algorithm, however, we did not include this, intentionally, to keep it simple. But how does one generate the states of a problem? To formalize this, we define a four-tuple, called a state space, denoted by

{nodes, arc, goal, current}, where

    nodes represent the set of existing states in the search space;


    an arc denotes an operator applied to an existing state to cause transition to another state;

goal denotes the desired state to be identified among the nodes; and
current represents the state now generated for matching with the goal.

The state space for most of the search problems we will cover in this chapter takes the form of a tree or graph.

Breadth First Search
The breadth first search algorithm visits the nodes of the tree along its breadth, starting from the level with depth 0 down to the maximum depth. It can be easily realized with a queue. For instance, consider the tree given in Fig. 1. Here, the nodes in the tree are traversed following their ascending ordered labels.

The algorithm for traversal of a tree in a breadth first manner can be presented with a queue as follows:

Procedure Breadth-first-search
Begin
  i) Place the starting node in a queue;
  ii) Repeat
        Delete from the queue to get the front element;
        If the front element of the queue = goal,
          return success and stop;
        Else do
        Begin
          insert the children of the front element,
          if they exist, in any order at the rear end of the queue;
        End
      Until the queue is empty;
End.

    Fig. 1: The order of traversal in a tree of depth 3 by breadth first manner.


The breadth first search algorithm, presented above, rests on a simple principle: if the current node is not the goal, add the offspring of the current node in any order to the rear end of the queue and redefine the front element of the queue as the current node. The algorithm terminates when the goal is found.

    Fig. 2: First few steps of breadth first search on the tree of fig. 1.

Time Complexity
For the sake of analysis, we consider a tree with an equal branching factor b from each node and largest depth d. Since the goal is not located within depth (d-1), the number of nodes visited in false searches [1], [2] is given by

1 + b + b^2 + b^3 + ... + b^(d-1) = (b^d - 1) / (b - 1), for b >> 1.

Further, the first state within the fringe nodes could be the goal. On the other hand, the goal could be the last visited node in the tree. Thus, on average, the number of fringe nodes visited is given by (1 + b^d) / 2. Consequently, the total number of nodes visited in an average case becomes

(b^d - 1) / (b - 1) + (1 + b^d) / 2 ≈ b^d (b + 1) / 2(b - 1).

Since the time complexity is proportional to the number of nodes visited, the above expression gives a measure of the time complexity.

Space Complexity
The maximum number of nodes will be placed in the queue when the leftmost node at depth d is inspected for comparison with the goal. The queue length in this case becomes b^d. The space complexity of the algorithm, which depends on the queue length, is thus in the worst case of the order of b^d. In order to reduce the space requirement, the generate and test algorithm is realized in an alternative manner, as presented below.



Depth First Search
The depth first search generates nodes and compares them with the goal along the largest depth of the tree, and moves up to the parent of the last visited node only when no further node can be generated below the last visited node. After moving up to the parent, the algorithm attempts to generate a new offspring of the parent node. The above principle is applied recursively to each node of a tree in a depth first search. One simple way to realize the recursion in the depth first search algorithm is to employ a stack. A stack-based realization of the depth first search algorithm is presented below.

Procedure Depth-first-search
Begin
  1. Push the starting node onto the stack,
     pointed to by the stack-top;
  2. While stack is not empty do
     Begin
       Pop stack to get the stack-top element;
       If stack-top element = goal, return
         success and stop
       Else push the children of the stack-top
         element in any order onto the stack;
     End while;
End.

Fig. 3: Depth first search on a tree, where the node numbers denote the order of visiting that node.

In the above algorithm, a starting node is placed in the stack, the top of which is pointed to by the stack-top. For examination, a node is popped out from the stack. If it is the goal, the algorithm terminates; else its children are pushed into the stack in any order. The process is continued until the stack is empty. The ascending order of node numbers in Fig. 3 represents the traversal of the tree in a depth first manner. The contents of the stack at the first few iterations are illustrated in Fig. 4. The arrowhead in the figure denotes the position of the stack-top.

    Fig. 4: A snapshot of the stack at the first few iterations.

Space Complexity
Maximum memory in depth first search is required when we reach the largest depth for the first time. Assuming that each node has a branching factor b, when a node at depth d is examined, the nodes saved in memory are all the unexpanded nodes up to depth d plus the node being examined. Since at each level there are (b - 1) unexpanded nodes, the total amount of memory required = d(b - 1) + 1. Thus the space complexity of depth first search is a linear function of b, unlike breadth first search, where it is an exponential function of b. This, in fact, is the most useful aspect of the depth first search.

Time Complexity
If we find the goal at the leftmost position at depth d, then the number of nodes examined = (d + 1). On the other hand, if we find the goal at the extreme right at depth d, then the number of nodes examined includes all the nodes in the tree, which is

1 + b + b^2 + b^3 + ... + b^d = (b^(d+1) - 1) / (b - 1).

So, the total number of nodes examined in an average case

= (d + 1)/2 + (b^(d+1) - 1) / 2(b - 1) ≈ b(b^d + d) / 2(b - 1).

This is the average-case time complexity of the depth first search algorithm. Since for large depth d the depth first search requires quite a large runtime, an alternative way to solve the problem is by controlling the depth of the search tree. Such an algorithm, where the user mentions the initial depth cut-off at each iteration, is called an Iterative Deepening Depth First Search or simply an Iterative Deepening Search.


Iterative Deepening Search
When the initial depth cut-off is one, it generates only the root node and examines it. If the root node is not the goal, then the depth cut-off is set to two and the tree up to depth 2 is generated using typical depth first search. Similarly, when the depth cut-off is set to m, the tree is constructed up to depth m by depth first search. One may thus wonder that in an iterative deepening search one has to regenerate all the nodes, excluding the fringe nodes, at the current depth cut-off. Since the number of nodes generated by depth first search up to depth h is

(b^(h+1) - 1) / (b - 1),

the total number of nodes expanded in failing searches by an iterative deepening search will be

Σ (from h = 0 to d-1) {1 / (b - 1)} (b^(h+1) - 1) ≈ b(b^d - d) / (b - 1)^2.

The last pass in the algorithm results in a successful search at depth d, the average time complexity of which by typical depth first search is given by

b(b^d + d) / 2(b - 1).

Thus the total average time complexity is given by

b(b^d - d) / (b - 1)^2 + b(b^d + d) / 2(b - 1) ≈ (b + 1) b^(d+1) / 2(b - 1)^2.

Consequently, the ratio of the average time complexity of the iterative deepening search to that of the depth first search is given by

{(b + 1) b^(d+1) / 2(b - 1)^2} : {b^(d+1) / 2(b - 1)} = (b + 1) : (b - 1).

The iterative deepening search thus does not take much extra time when compared to the typical depth first search. The unnecessary expansion of the entire tree by depth first search can thus be avoided by iterative deepening. A formal algorithm of iterative deepening is presented below.

Procedure Iterative-deepening
Begin
  1. Set current depth cut-off = 1;
  2. Put the initial node into a stack, pointed to by stack-top;
  3. While the stack is not empty and the depth is within the
     given depth cut-off do
     Begin
       Pop stack to get the stack-top element;
       if stack-top element = goal, return it and stop
       else push the children of the stack-top in any order
         into the stack;
     End While;


  4. Increment the depth cut-off by 1 and repeat
     through step 2;

    End.
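The same idea can be written recursively. The following Python sketch mirrors the procedure above: a depth-limited search is run with cut-off 1, 2, 3, ... until the goal is found. The maximum depth and the successor interface are illustrative assumptions.

def depth_limited(node, is_goal, successors, limit):
    """Depth-first search that refuses to go below the given depth cut-off."""
    if is_goal(node):
        return [node]
    if limit == 0:
        return None
    for child in successors(node):
        result = depth_limited(child, is_goal, successors, limit - 1)
        if result is not None:
            return [node] + result
    return None

def iterative_deepening(start, is_goal, successors, max_depth=50):
    """Run depth-limited search with cut-off 1, 2, 3, ... until the goal is found."""
    for cutoff in range(1, max_depth + 1):    # increment the depth cut-off by 1
        result = depth_limited(start, is_goal, successors, cutoff)
        if result is not None:
            return result
    return None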

    The breadth first, depth first and the iterative deepening search can be equally used for

Generate and Test type algorithms. However, while the breadth first search requires an exponential amount of memory, the depth first search calls for memory proportional to the largest depth of the tree. Iterative deepening, on the other hand, has the advantage of searching in a depth first manner in an environment of controlled depth of the tree.

Procedure Hill-Climbing
The generate and test type of search algorithms presented above only expand the search space and examine the existence of the goal in that space. An alternative approach to solving search problems is to employ a function f(x) that gives an estimate of the measure of the distance of the goal from node x. After f(x) is evaluated at the possible initial nodes x, the nodes are sorted in ascending order of their functional values and pushed into a stack in the ascending order of their f values, so that the stack-top element has the least f value. It is now popped out and compared with the goal. If the stack-top element is not the goal, then it is expanded and f is measured for each of its children. They are then sorted in ascending order of their functional values and pushed into the stack. If the stack-top element is the goal, the algorithm exits; otherwise the process is continued until the stack becomes empty. Pushing the sorted nodes into the stack adds a depth first flavor to the present algorithm. The hill climbing algorithm is formally presented below.

Begin
  1. Identify possible starting states and measure the distance (f) of their
     closeness with the goal node; push them into a stack according to the
     ascending order of their f;
  2. Repeat
       Pop stack to get the stack-top element;
       If the stack-top element is the goal, announce it and exit
       Else push its children into the stack in the ascending order of their f values;
     Until the stack is empty;
End.

The hill climbing algorithm too is not free from shortcomings. One common problem is trapping at a local maximum at a foothill. When trapped at a local maximum, the measure of f at all possible next legal states yields less promising values than the current state. A second drawback of hill climbing is reaching a plateau [2]. Once a state on a plateau is reached, all legal next states will also lie on this surface, making the search ineffective. A new algorithm, called simulated annealing, discussed below, could easily solve the first two problems. Besides the above, another problem that also gives us trouble is traversal along a ridge. A ridge (vide fig. 4.5) on many occasions leads to a local maximum. However, moving along a ridge is not possible by a single step due to the non-availability of appropriate operators. Multiple steps of movement are required to solve this problem.

Simulated Annealing
Annealing is a process of metal casting, where the metal is first melted at a high temperature beyond its melting point and then allowed to cool down until it returns to the solid form. Thus in the physical process of annealing, the hot material gradually loses energy and finally, at one point in time, reaches a state of minimum energy. A common observation reveals that most physical processes have transitions from higher to lower energy states, but there still remains a small probability that the process may cross the valley of energy states [2] and move up to an energy state higher than the energy state of the valley. The concept can be verified with a rolling ball. For instance, consider a rolling ball that falls from a higher (potential) energy state to a valley and then moves up to a slightly higher energy state (vide fig. 4.6). The probability of such a transition to a higher energy state, however, is very small and is given by

p = exp(-ΔE / KT)

where p is the probability of transition from a lower to a higher energy state, ΔE denotes a positive change in energy, K is the Boltzmann constant and T is the temperature at the current thermal state. For small ΔE, p is higher than the value of p for large ΔE. This follows intuitively since, with respect to the example of ball movement, the probability of transition to a slightly higher state is more than the probability of transition to a very high state.

An obvious question naturally arises: how do we realize annealing in search? Readers, at this stage, will remember that the need for simulated annealing is to identify the direction of search when the function f yields no better next states than the current state. Under this circumstance, ΔE is computed for all possible legal next states and p is also evaluated for each such next state by the following formula:

p = exp(-ΔE / T)

A random number in the closed interval [0,1] is then computed and p is compared with the value of the random number. If p is greater, then that state is selected for the next transition. The parameter T, also called temperature, is gradually decreased in the search program. The logic behind this is that as T decreases, p too decreases, thereby allowing the algorithm to terminate at a stable state. The algorithm for simulated annealing is formally presented below.

Procedure Simulated-Annealing
Begin
  1. Identify possible starting states and measure the distance (f) of their closeness
     with the goal; push them into a stack according to the ascending order of their f;
  2. Repeat
       Pop stack to get the stack-top element;
       If the stack-top element is the goal,
         announce it and exit;
       Else do
       Begin
         a) generate the children of the stack-top element N and
            compute f for each of them;
         b) If the measure of f for at least one child of N is improving
            Then push those children into the stack in ascending order of their f;
         c) If none of the children of N is better in f
            Then do
            Begin
              a) select any one of them randomly, compute its p and test whether p exceeds a
                 randomly generated number in the interval [0,1]; If yes, select that state as the
                 next state; If no, generate another alternative legal next state and test in this way
                 until one move can be selected; Replace the stack-top element by the selected
                 move (state);
              b) Reduce T slightly; If the reduced value is negative, set it to zero;
            End;
       End;
     Until the stack is empty;
End.

The algorithm is similar to hill climbing if there always exists at least one next state better than the state pointed to by the stack-top. If that fails, then the last begin-end bracketed part of the algorithm is invoked. This part corresponds to simulated annealing. It examines each legal next state one by one, checking whether the probability of occurrence of the state is higher than the random value in [0,1]. If the answer is yes, the state is selected; else the next possible state is examined. Hopefully, at least one state will be found whose probability of occurrence is larger than the randomly generated probability.

Another important point that we did not include in the algorithm is the process of computation of ΔE. It is computed by taking the difference of the value of f of the next state and that of the current (stack-top) state. The third point to note is that T should be decreased once a new state with a less promising value is selected. T is always kept non-negative. When T becomes zero, p will be zero and thus the probability of transition to any other state will be zero.

4.3 Heuristic Search
This section is devoted to solving the search problem by a new technique, called heuristic search. The term heuristics stands for rules of thumb, i.e., rules which work successfully in many cases but whose success is not guaranteed.

In fact, we expand nodes by judiciously selecting the more promising nodes, where these nodes are identified by measuring their strength compared to their competitive counterparts with the help of specialized intuitive functions, called heuristic functions.


Heuristic search is generally employed for two distinct types of problems: i) forward reasoning and ii) backward reasoning. We have already discussed that in a forward reasoning problem we move towards the goal state from a pre-defined starting state, while in a backward reasoning problem we move towards the starting state from the given goal. The former class of search algorithms, when realized with heuristic functions, is generally called heuristic search for OR-graphs or the Best First Search algorithms. It may be noted that best first search is a class of algorithms, and depending on the variation of the performance measuring function it is named differently. One typical member of this class is the algorithm A*. On the other hand, the heuristic backward reasoning algorithms are generally called AND-OR graph search algorithms, and one ideal member of this class of algorithms is the AO* algorithm. We will start this section with the best first search algorithm.

4.3.1 Heuristic Search for OR Graphs
Most of the forward reasoning problems can be represented by an OR-graph, where a node in the graph denotes a problem state and an arc represents an application of a rule to a current state to cause a transition of states. When a number of rules are applicable to a current state, we could select a better state among the children as the next state. We remember that in hill climbing we ordered the promising initial states in a sequence and examined the state occupying the beginning of the list. If it was a goal, the algorithm was terminated. But if it was not the goal, it was replaced by its offspring, in any order, at the beginning of the list. The hill climbing algorithm thus is not free from a depth first flavor. In the best first search algorithm, to be devised shortly, we start with a promising state and generate all its offspring. The performance (fitness) of each of the nodes is then examined and the most promising node, based on its fitness, is selected for expansion. The most promising node is then expanded and the fitness of all the newborn children is measured. Now, instead of selecting only from the generated children, all the nodes having no children are examined and the most promising of these fringe nodes is selected for expansion. Thus, unlike hill climbing, the best first search provides scope for corrections in case a wrong step has been selected earlier. This is the prime advantage of the best first search algorithm over hill climbing. The best first search algorithm is formally presented below.

Procedure Best-First-Search
Begin
  1. Identify possible starting states and measure the distance (f) of their closeness with the goal; Put them in a list L;
  2. While L is not empty do
     Begin
       a) Identify the node n from L that has the minimum f; If there
          exists more than one node with minimum f, select any one of them
          (say, n) arbitrarily;
       b) If n is the goal
          Then return n along with the path from the starting node,


          and exit;
          Else remove n from L and add all the children of n to the list L,
          with their labeled paths from the starting node;
     End While;
End.

As already pointed out, the best first search algorithm is a generic algorithm and requires many more extra features for its efficient realization. For instance, how we can define f is not explicitly mentioned in the algorithm. Further, what happens if an offspring of the current node is not a fringe node? The A* algorithm, to be discussed shortly, is a complete realization of the best first algorithm that takes these issues into account in detail. The following definitions, however, are required for presenting the A* algorithm.

Definition 4.1: A node is called open if the node has been generated and h(x) has been applied to it but it has not been expanded yet.

Definition 4.2: A node is called closed if it has been expanded for generating offspring.

In order to measure the goodness of a node in the A* algorithm, we require two cost functions: i) heuristic cost and ii) generation cost. The heuristic cost measures the distance of the current node x with respect to the goal and is denoted by h(x). The cost of generating a node x, denoted by g(x), on the other hand, measures the distance of node x with respect to the starting node in the graph. The total cost function at node x, denoted by f(x), is the sum g(x) plus h(x), i.e. f(x) = g(x) + h(x).
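A minimal Python sketch of an A*-style search ordering OPEN by f(x) = g(x) + h(x), as just defined. The graph interface (successors yielding (child, step_cost) pairs), the tie-breaking counter, and the assumption that nodes are hashable are all choices made for this sketch.

import heapq
import itertools

def a_star(start, is_goal, successors, h):
    """A* search: OPEN is ordered by f(x) = g(x) + h(x).
    `successors(node)` is assumed to yield (child, step_cost) pairs."""
    tie = itertools.count()                       # breaks ties so nodes are never compared
    open_list = [(h(start), next(tie), 0, start, None)]
    g_best = {start: 0}                           # best known generation cost g(x)
    parents = {}                                  # closed nodes and their parents
    while open_list:
        _, _, g, node, parent = heapq.heappop(open_list)
        if node in parents:                       # node already closed
            continue
        parents[node] = parent
        if is_goal(node):
            path = [node]                         # walk back through the parents
            while parents[path[-1]] is not None:
                path.append(parents[path[-1]])
            return list(reversed(path)), g
        for child, cost in successors(node):
            new_g = g + cost
            if new_g < g_best.get(child, float("inf")):
                g_best[child] = new_g
                heapq.heappush(open_list, (new_g + h(child), next(tie), new_g, child, node))
    return None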


    Knowledge & Reasoning

    Issues in Knowledge Representation

Knowledge consists of facts that can be exploited by AI programs in order to generate good results.

The main entities are:
o Facts: truths in some relevant world.
o Representation: the formalism used to describe facts in a way that they can be manipulated.

These two entities should be structured at two levels:
o The knowledge level, at which facts are described.
o The symbol level, at which representations of objects at the knowledge level are defined in terms of symbols that can be manipulated by programs.

    Mapping between facts and Representation

    Approaches to Knowledge Representation

Properties of a good knowledge representation system

o Representational Adequacy - the ability to represent all the kinds of knowledge that are needed in that domain.
o Inferential Adequacy - the ability to manipulate the representational structures in such a way as to derive new structures corresponding to new knowledge inferred from old.
o Inferential Efficiency - the ability to incorporate into the knowledge structures additional information that can be used to focus the attention of the inference mechanisms in the most promising directions.

[Figure: mapping between Facts and Internal Representations, operated on by a Reasoning Program, with English Representation and English generation shown alongside.]


o Acquisitional Efficiency - the ability to acquire new information easily. It should be possible to make direct insertions into the database and to add new knowledge.

Types of Knowledge Representation
o Simple relational knowledge - using database tables.
o Inheritable knowledge - provides for property inheritance, in which elements of specific classes inherit attributes and values from the more general classes in which they are included. Objects are organized into classes and classes are arranged in a generalization hierarchy. [See Fig. 4.5, page 111: Artificial Intelligence, E. Rich and K. Knight]. Examples include slot-and-filler structures like semantic networks and frames.
o Inferential knowledge - the power of property inheritance and traditional logic (predicate, propositional) are combined to generate the inferences (deductions) that are needed.
o Procedural knowledge - facts are not just static or declarative; procedural knowledge specifies what to do and when to do it. This is another kind of knowledge that must be represented. Mostly, production rules are used to represent procedural knowledge.
o Semantic knowledge - ontologies, vocabularies, thesauri, episodic knowledge.

    Issues in Knowledge Representation

    The following issues cut across the various kinds of real world knowledge.

o Are any attributes of objects so basic that they occur in almost every problem domain? If there are, we need to make sure that they are handled appropriately in each of the mechanisms we propose. If such attributes exist, what are they?
o Are there any important relationships that exist among attributes of objects?


o At what level should knowledge be represented? Is there a good set of primitives into which knowledge can be broken down? Is it helpful to use such primitives?
o How should sets of objects be represented?
o Given a large amount of knowledge stored in a database, how can relevant parts be accessed when they are needed?

(Reference: E. Rich and K. Knight, 1999, Artificial Intelligence)

    Answers to Questions

1. (Yes) instance and isa, i.e. class membership and class inclusion, are common and support inheritance.

    2. (Yes)

a. Inverses: if we can define a relationship from the perspective of objects A and B, it is also possible to define an inverse relationship from the perspective of B to A.

b. Existence in an isa hierarchy: just as there are classes of objects and specializations of them, there are classes of attributes and specializations of attributes.

There exist techniques for reasoning about values:
- information about the type of the value
- constraints on the value
- rules for computing the value when it is needed
- rules that describe the action that should be taken if a value ever becomes known.

Single-valued attributes:
- explicit notation to track duplication
- replacement of an old value

Representation should be done at a variety of granularities, i.e. both low-level primitives and high-level forms should be used; the particular domain will determine which should be employed more.

    Exercise

    Investigate the frame problem


    Knowledge based Agents

    The knowledge base is the central component of a knowledge-based agent.

o A knowledge base is a set of representations of facts about the world that enable the knowledge-based agent to adapt to its environment.

o Each individual representation in a knowledge base is called a sentence. The sentences are expressed in a knowledge representation language.

o A knowledge base must have a Tell (assert) and an Ask (query) mode. In essence, a knowledge-based agent takes a percept as input and returns an action.

o The expectation from a knowledge-based agent is that when it is asked a question, the response should follow from what it has been told, which is the proof of its reasoning.

o The knowledge base of a knowledge-based agent is referred to as background knowledge. Every time an agent program is called, two things happen:

  o 1) it tells the knowledge base what it perceives, and

  o 2) it asks the knowledge base what action it should perform.

    In doing this, the concept of logical reasoning is used to determine which action

    is better than others.
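A minimal Python sketch of this TELL/ASK structure. The knowledge base here is a trivially simple set of ground facts rather than a real logical language, and the action names are illustrative placeholders.

class KnowledgeBase:
    """A toy knowledge base: TELL asserts sentences, ASK queries them.
    Here a 'sentence' is just a hashable fact; a real KB would store logical
    sentences and answer ASK by inference, not by simple membership."""
    def __init__(self):
        self.sentences = set()

    def tell(self, sentence):
        self.sentences.add(sentence)

    def ask(self, query):
        return query in self.sentences


class KnowledgeBasedAgent:
    def __init__(self, kb):
        self.kb = kb
        self.t = 0                                   # internal time counter

    def agent_program(self, percept):
        self.kb.tell(("percept", percept, self.t))   # 1) tell the KB what it perceives
        action = self.choose_action()                # 2) ask the KB what to do
        self.kb.tell(("action", action, self.t))
        self.t += 1
        return action

    def choose_action(self):
        # Placeholder decision rule: a real agent would ASK the KB which action
        # follows from its background knowledge and the percept history.
        return "noop"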

The objective of knowledge representation is to express knowledge in a computer-tractable form, such that it can be used to help agents perform well.

    A knowledge representation is defined by two aspects:

o The syntax of the language, which describes the possible configurations that can constitute sentences.

o The semantics, which determines the facts in the world to which the sentences refer.

Provided the syntax and semantics are precisely defined, we can call the language a logic. Also, from the syntax and semantics we can derive an inference mechanism for an agent using the language.

    An Inference procedure can do one of two things:

o 1. Given a knowledge base KB, it can generate new sentences that purport to be entailed by KB; or


o 2. Given a knowledge base KB and another sentence α, it can report whether or not α is entailed by KB. An inference procedure that generates only entailed sentences is called sound or truth-preserving.

Entailment: In mathematical notation, the entailment relation between a knowledge base KB and a sentence α is pronounced "KB entails α" and written as:

KB ⊨ α.

An inference procedure i can be described by the sentences it can derive. If i can derive α from KB, this is written as KB ⊢_i α, i.e. alpha is derived from KB by i, or i derives alpha from KB.

    The record of operation of a sound inference procedure is called a proof.

    An inference procedure is complete if it can find a proof for any sentence that is

    entailed.

    In summary:

Logic consists of:

1. A formal system for describing states of affairs, consisting of:

   a) the syntax of the language, which describes how to make sentences, and

   b) the semantics of the language, which states the systematic constraints on how sentences relate to states of affairs.

2. The proof theory - a set of rules for deducing the entailments of a set of sentences.

    Therefore, by examining the syntax and semantics of a logical language, we can extract its proof theory, i.e. an inference mechanism for an agent that uses the language.

    Types of Agent Programs

    Simple Reflex Agent

    Model-based Agent

    Goal-based Agent

    Utility-based Agent

    Learning Agent


    Knowledge Representation with First-Order Logic

    Propositional logic can also be used, but it lacks sufficient primitives for representing knowledge in a structured way. It has a limited ontology, making only the commitment that the world consists of facts; this commitment is grossly inadequate for complex domains.

    Predicate logic (first-order logic) makes a stronger set of ontological commitments. It has primitives to represent objects, relations, properties and functions. The main reason for this is that the world consists of objects, that is, things with individual identities and properties that distinguish them from other objects. Among these objects various relations hold. Some of these relations are functions: relations in which there is exactly one value for a given input. Examples of objects, properties, relations and functions include:

    Objects: people, houses, numbers, ball etc.

    Relations: brotherof, biggerthan, inside, part of, hascolor etc.

    Properties: red, round, yellow, prime

    Functions: fatherof, bestfriend etc.

    First-order logic has sentences, but it also has terms, which represent objects. Constant symbols, variables and function symbols are used to build terms, and quantifiers and predicate symbols are used to build sentences.

    Syntax of First-Order Logic (with equality, in Backus-Naur Form)

    Sentence → AtomicSentence

    | Sentence Connective Sentence

    | Quantifier Variable, … Sentence

    | ¬ Sentence | (Sentence)

    AtomicSentence → Predicate(Term, …) | Term = Term

    Term → Function(Term, …) | Constant | Variable

    Connective → ⇒ | ∧ | ∨ | ⇔

    Quantifier → ∀ | ∃

    Constant → A | X1 | John | …

    Variable → a | x | s | …

    Predicate → Before | HasColor | Raining | …


    Function → Mother | LeftLegOf | …
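    The grammar above maps directly onto a small set of data types. The following sketch (an illustrative assumption, not part of the note) shows one way terms and sentences might be represented in Python; the class names are hypothetical.

    from dataclasses import dataclass
    from typing import Tuple

    @dataclass(frozen=True)
    class Constant:            # e.g. John
        name: str

    @dataclass(frozen=True)
    class Variable:            # e.g. x
        name: str

    @dataclass(frozen=True)
    class FunctionTerm:        # e.g. FatherOf(Richard)
        name: str
        args: Tuple

    @dataclass(frozen=True)
    class Atom:                # e.g. Brother(Richard, John)
        predicate: str
        args: Tuple

    @dataclass(frozen=True)
    class Compound:            # Sentence Connective Sentence, or a negation
        op: str                # one of "not", "and", "or", "=>", "<=>"
        parts: Tuple

    @dataclass(frozen=True)
    class Quantified:          # Quantifier Variable Sentence
        quantifier: str        # "forall" or "exists"
        var: Variable
        body: object

    # Married(FatherOf(Richard), MotherOf(John))
    s = Atom("Married", (FunctionTerm("FatherOf", (Constant("Richard"),)),
                         FunctionTerm("MotherOf", (Constant("John"),))))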

    A term is a logical expression that refers to an object. The formal semantics of terms is that an interpretation must specify which object in the world a term refers to; for a functional term, this is the object obtained by applying the function symbol to the objects referred to by its argument terms.

    Constant symbols: an interpretation specifies which object is referred to by each constant symbol, e.g. A, B, ADE.

    Predicate symbols: each refers to a particular relation in the model.

    Function symbols: these denote functional relations, which relate any given object to exactly one other object. E.g. any angle has only one number that is its cosine, and any person has only one person as his or her father.

    Atomic sentences: an atomic sentence is formed from a predicate symbol followed by a parenthesized list of terms. For example:

    Brother(Richard, John)

    This sentence states that Richard is the brother of John.

    Atomic sentences can have arguments that are complex terms, e.g.

    Married(FatherOf(Richard), MotherOf(John))

    This states that the father of Richard married the mother of John. An atomic sentence is true if the relation referred to by the predicate symbol holds between the objects referred to by the arguments.

    Complex sentences: a complex sentence is a combination of two or more atomic sentences using logical connectives, e.g.

    Brother(Richard, John) ∧ Boss(Richard, John) is true when Richard is the brother of John and Richard is the boss of John.

    Older(John, 30) ⇒ ¬Younger(John, 30): if John is older than 30, then he is not younger than 30.

    Quantifiers: quantifiers help to express properties of entire collections of objects, rather than having to enumerate the objects by name. The two standard quantifiers in first-order logic are the universal and existential quantifiers.

    ∀x Cat(x) ⇒ Mammal(x). This is pronounced "for all x, if x is a cat then x is a mammal".


    Thus the universal quantifier makes statements about every object in the

    universe. The existential quantifier allows us to make statements about some

    object in the universe without naming it.

    ∃x Sister(x, Spot) ∧ Cat(x). This is pronounced "there exists an x such that x is a sister of Spot and x is a cat".
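    To make the semantics of the two quantifiers concrete, the following small sketch (an addition, not part of the note) evaluates a universally and an existentially quantified sentence over a finite domain using Python's built-in all() and any(); the domain and relations are made up for illustration.

    domain = ["Tom", "Felix", "Spot", "Rex"]
    cat    = {"Tom", "Felix", "Spot"}             # Cat(x) holds for these objects
    mammal = {"Tom", "Felix", "Spot", "Rex"}      # Mammal(x) holds for these objects
    sister = {("Felix", "Spot")}                  # Sister(x, Spot) holds only for Felix

    # Forall x: Cat(x) => Mammal(x)
    print(all((x not in cat) or (x in mammal) for x in domain))      # True

    # Exists x: Sister(x, Spot) and Cat(x)
    print(any((x, "Spot") in sister and x in cat for x in domain))   # True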

    Exercise

    Represent the following sentences in first-order logic using a consistent vocabulary you

    must define.

    a) Not all students take both History and Biology

    b) Only one student failed History

    c) Only one student failed both History and Biology

    d) The best score in History was better than the best score in Biology

    e) Every person who dislikes all vegetarians is smart

    f) No person likes smart vegetarians

    g) There is a woman who likes all men who are not vegetarians

    h) There is a barber who shaves all men in town who do not shave themselves

    i) No person likes a Professor unless the professor is smart

    j) Politicians can fool some of the people all of the time, and they can fool all of the people some of the time, but they can't fool all of the people all of the time.

    Sample Solution

    a) ¬∀x (Student(x) ⇒ (Takes(x, History) ∧ Takes(x, Biology)))

    b) ∃x (Student(x) ∧ Failed(x, History) ∧ ∀y ((Student(y) ∧ Failed(y, History)) ⇒ y = x))

    c) ∃x (Student(x) ∧ Failed(x, History) ∧ Failed(x, Biology) ∧ ∀y ((Student(y) ∧ Failed(y, History) ∧ Failed(y, Biology)) ⇒ y = x))

    d) Better(BestScore(History), BestScore(Biology))

    e) ∀x ((Person(x) ∧ ∀y (Vegetarian(y) ⇒ Dislikes(x, y))) ⇒ Smart(x))


    Semantic Networks

    A semantic network or net is a graphic notation for representing knowledge in patterns of interconnected nodes and arcs. Computer implementations of semantic networks were first developed for artificial intelligence and machine translation, but earlier versions have long been used in philosophy, psychology, and linguistics.

    What is common to all semantic networks is a declarative graphic representation that can be used either to represent knowledge or to support automated systems for reasoning about knowledge. Some versions are highly informal, but other versions are formally defined systems of logic. Following are six of the most common kinds of semantic networks:

    1. Definitional networks emphasize the subtype or is-a relation between a concept type and a newly defined subtype. The resulting network, also called a generalization or subsumption hierarchy, supports the rule of inheritance for copying properties defined for a supertype to all of its subtypes. Since definitions are true by definition, the information in these networks is often assumed to be necessarily true.

    2. Assertional networks are designed to assert propositions. Unlike definitional networks, the information in an assertional network is assumed to be contingently true, unless it is explicitly marked with a modal operator. Some assertional networks have been proposed as models of the conceptual structures underlying natural language semantics.

    3. Implicational networks use implication as the primary relation for connecting nodes. They may be used to represent patterns of beliefs, causality, or inferences.

    4. Executable networks include some mechanism, such as marker passing or attached procedures, which can perform inferences, pass messages, or search for patterns and associations.

    5. Learning networks build or extend their representations by acquiring knowledge from examples. The new knowledge may change the old network by adding and deleting nodes and arcs or by modifying numerical values, called weights, associated with the nodes and arcs.

    6. Hybrid networks combine two or more of the previous techniques, either in a single network or in separate, but closely interacting, networks.

    Some of the networks have been explicitly designed to implement hypotheses about

    human cognitive mechanisms, while others have been designed primarily for computer efficiency.

    Knowledge representation is an issue that arises in both cognitive science and artificial intelligence. In cognitive science it is concerned with how people store and


    process information. In artificial intelligence (AI) the primary aim is to store knowledge so that programs can process it and achieve the verisimilitude of human intelligence.

    In cognitive theory

    For some authors, knowledge is stored either in episodic or in semantic memory. The former is organized along spatio-temporal dimensions, the latter according to semantic, content-oriented principles, e.g. networks of concepts.

    Long-term memory, which is a large storage system, stores factual information, procedural rules of behavior and experiential knowledge, in fact everything we know. We have two types of long-term memory: episodic and semantic memory. Episodic memory represents our memory of events and experiences in a serial form. It is from this memory that we can reconstruct the actual events that took place at a given point in our lives. The second is semantic memory, which is a structured record of facts, concepts and skills that we have acquired. The information in semantic memory is derived from that in our episodic memory, such that we can learn new facts or concepts from our experiences. Semantic memory is structured in some way to allow access to information, representation of the relationships between pieces of information, and inference. One model for the way in which semantic memory is structured is as a network: items are associated with each other in classes and may inherit attributes from parent classes. This model is known as a semantic network.

    In computer science

    In computer science, a semantic network can be defined as a knowledge representation formalism which describes objects and their relationships in terms of a network consisting of labelled arcs and nodes.

    A semantic network is often used as a form of knowledge representation. It is a directed graph consisting of vertices, which represent concepts, and edges, which represent semantic relations between the concepts.

    A semantic network is a knowledge representation tool consisting of a framework of semantically related terms, with the purpose of allowing a definition of those words through their relationships.

    Most ontologies use a kind of semantic network for knowledge representation.

    The advantages of a knowledge representation structure like a semantic network over First-Order Logic include the fact that:

    1. It makes it easy to describe properties of relations.

    2. It is a form of object-oriented programming and has the advantages that such systems normally have, including modularity and ease of viewing by people.


    Here is an example of a semantic net:

    Figure 1. Animals-Birds-Tweety

    The major problem with semantic nets is that, although the name of this knowledge representation language is semantic nets, there is, ironically, no clear semantics for the various network representations. The example above, for instance, can be interpreted as the representation of a specific bird named Tweety, or as a representation of some relationship between Tweety, birds and animals.
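    As an illustrative sketch (not part of the original note), the Tweety example can be encoded as a Python dictionary of nodes with is-a links, where property lookup walks up the hierarchy; the specific node and property names are assumptions about what Figure 1 contains.

    network = {
        "Animal": {"isa": None,     "props": {"breathes": True}},
        "Bird":   {"isa": "Animal", "props": {"has_wings": True, "can_fly": True}},
        "Tweety": {"isa": "Bird",   "props": {"colour": "yellow"}},
    }

    def lookup(node, prop):
        """Return the value of prop for node, inheriting from supertypes along is-a links."""
        while node is not None:
            entry = network[node]
            if prop in entry["props"]:
                return entry["props"][prop]
            node = entry["isa"]               # follow the is-a arc upward
        return None

    print(lookup("Tweety", "can_fly"))        # True, inherited from Bird
    print(lookup("Tweety", "breathes"))       # True, inherited from Animal

    This also illustrates the ambiguity noted above: nothing in the structure itself says whether Tweety is an instance of Bird or a subtype of it.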

    Semantic networks can also include other explicit kinds of relationships among concepts and concept types that adequately represent the semantic relationships among entities. As an example, Figure 2 shows a KL-ONE network that defines the concepts Truck and TrailerTruck as subtypes of Vehicle.


    Figure 2 has nine ovals for concept nodes and nine arrows, which represent different kinds of links. The white ovals represent generic concepts for the types, as distinguished from the shaded oval, which is an individual concept for the instance 18. The oval marked with an asterisk * indicates that Integer is a built-in or primitive type. The concepts Truck and TrailerTruck are defined in Figure 2, but Vehicle, Trailer, WtMeasure, and VolMeasure would have to be defined by other KL-ONE diagrams.

    The double-line arrows represent subtype-supertype links from TrailerTruck to Truck and from Truck to Vehicle. The arrows with a circle in the middle represent roles. The Truck node has four roles labeled UnloadedWt, MaxGrossWt, CargoCapacity, and NumberOfWheels. The TrailerTruck node has two roles, one labeled HasPart and one that restricts the NumberOfWheels role of Truck to the value 18. The notation v/r at the target end of the role arrows indicates value restrictions or type constraints on the permissible values for those roles.

    Generally, formal graph notations annotated with specific labels can be used for semantic network representations.

    Exercises

    Show the representation of the following: i) Student ii) Teacher iii) Worker iv) Computer.

    Figure 2: Truck and TrailerTruck in KL-ONE


    Learning

    What is Learning? Learning is the process through which an entity acquires knowledge. Machine intelligence is not natural and automatic; an intelligent machine is one that exhibits intelligence after a process of learning.

    Approaches to Learning:

    Symbolic learning: describes systems that formulate and modify rules, facts, and relationships, explicitly represented in words or symbols. In other words, they create and modify their own knowledge base.

    Numerical learning: refers to systems that use numerical models, where certain techniques are used for optimizing the numerical parameters. Examples include neural networks, genetic algorithms and simulated annealing.

    Learning can be with a teacher, in which case it is said to be supervised. Unsupervised learning is learning without a teacher.

    Classification of learning

    Rote Learning: the system is given confirmation of correct decisions. When it produces an incorrect decision it is spoon-fed with the correct rule or relationship that it should have used.

    Learning from Advice: rather than being given a specific rule that should apply in a given circumstance, the system is given a piece of general advice, such as "gas is more likely to escape from a valve than from a pipe". The system must sort out for itself how to move from this high-level advice to an immediately usable rule.

    Learning by Induction: the system is presented with sets of example data and is told the correct conclusion that it should draw from each. The system continually refines its rules and relations so as to correctly handle each new example.

    Learning by Analogy: the system is told the correct response to a similar, but not identical, task. The system must adapt the previous response to generate a new rule applicable to the new circumstances.

    Explanation-Based Learning (EBL): the system analyzes a set of example solutions and their outcomes to determine why each one was successful or otherwise. Explanations are generated, which are used to guide future problem solving. An example of an EBL system is PRODIGY (a general-purpose problem-solver).

    Case-Based Reasoning (CBR): any case about which the system has reasoned is filed away, together with the outcome, whether it is successful or otherwise. Whenever a new case is encountered, the system adapts its stored behaviour to fit the new circumstances.

    Explorative or Unsupervised Learning: this is also called discovery learning. Rather than having an explicit goal, an explorative system continuously searches for patterns and relationships in the input data, perhaps marking some patterns as interesting and warranting further investigation. Examples of applications of unsupervised learning (see the small clustering sketch after the list below) can be found in:


    o data mining, where patterns are sought among large or complex data sets;

    o identifying clusters, possibly for compressing the data;

    o feature recognition.
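    As a small illustrative sketch (not from the note) of clustering as unsupervised learning, the following toy two-cluster k-means groups one-dimensional data without any labels or explicit goal; the data values are made up.

    data = [1.0, 1.2, 0.8, 9.0, 9.5, 10.1]
    centres = [data[0], data[-1]]                       # naive initialisation

    for _ in range(10):                                 # a few refinement passes
        clusters = [[], []]
        for x in data:                                  # assign each point to its nearest centre
            nearest = 0 if abs(x - centres[0]) <= abs(x - centres[1]) else 1
            clusters[nearest].append(x)
        centres = [sum(c) / len(c) if c else centres[i] for i, c in enumerate(clusters)]

    print(centres)   # roughly [1.0, 9.5]: two groupings discovered in the data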

    2. Expert Systems

    Expert systems are meant to solve real problems which normally would require a specialised human expert (such as a doctor or a mineralogist). Building an expert system entails extracting the relevant knowledge from the human expert and representing that heuristic knowledge in a knowledge base.

    A knowledge engineer has the job of extracting this knowledge and building the expert system. This is called knowledge elicitation and encoding.

    The most widely used knowledge representation scheme for expert systems is rules (sometimes in combination with frame systems).

    Typically, the rules won't have certain conclusions; there will just be some degree of certainty that the conclusion will hold if the conditions hold. Statistical techniques are used to determine these certainties. Rule-based systems, with or without certainties, are generally easily modifiable and make it easy to provide reasonably helpful traces of the system's reasoning. These traces can be used in providing explanations of what it is doing.

    Expert systems have been used to solve a wide range of problems in domains such as medicine, mathematics, engineering, geology, computer science, business, law and defence.

    Expert System Architecture

    Figure 1 shows the most important modules that make up a rule-based expert system. The user interacts with the system through a user interface (which may use menus, natural language or any other style of interaction). Then an inference engine is used to reason with both the expert knowledge (extracted from our friendly expert) and data specific to the particular problem being solved. The expert knowledge will typically be in the form of a set of IF-THEN rules. The case-specific data includes both data provided by the user and partial conclusions (along with certainty measures) based on this data. In a simple forward-chaining rule-based system the case-specific data will be the elements in working memory.
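    The forward-chaining behaviour described above can be sketched as follows (an illustrative assumption, not the architecture of any particular shell); the rules and facts about a car fault are made up. A rule fires when all of its IF conditions are present in working memory, and its THEN conclusion is added as a new fact until nothing more can be inferred.

    rules = [
        ({"engine_cranks", "no_spark"}, "ignition_fault"),      # IF conditions ... THEN conclusion
        ({"ignition_fault", "old_plugs"}, "replace_plugs"),
    ]
    working_memory = {"engine_cranks", "no_spark", "old_plugs"}  # case-specific data

    changed = True
    while changed:                        # keep cycling until no rule adds a new fact
        changed = False
        for conditions, conclusion in rules:
            if conditions <= working_memory and conclusion not in working_memory:
                working_memory.add(conclusion)
                changed = True

    print(working_memory)   # now also contains ignition_fault and replace_plugs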


    Almost all expert systems also have an explanation subsystem, which allows the program to explain its reasoning to the user. Some systems also have a knowledge base editor which helps the expert or knowledge engineer to easily update and check the knowledge base.

    One important feature of expert systems is the way they (usually) separate domain-specific knowledge from more general-purpose reasoning and representation techniques. The general-purpose bit (in the dotted box in the figure) is referred to as an expert system shell. As we see in the figure, the shell will provide the inference engine (and knowledge representation scheme), a user interface, an explanation system and sometimes a knowledge base editor. Given a new kind of problem to solve (say, car design), we can usually find a shell that provides the right sort of support for that problem, so all we need to do is provide the expert knowledge. There are numerous commercial expert system shells, each one appropriate for a slightly different range of problems. (Expert systems work in industry includes both writing expert system shells and writing expert systems using shells.) Using shells to write expert systems generally greatly reduces the cost and time of development (compared with writing the expert system from scratch).

    Examples: MYCIN, XCON (R1), PROSPECTOR