Origin
• Rational agent concept from economics.
• Utility theory: the theory of preferred outcomes.
• Decision theory: the dynamics of utility maximization in an unpredictable environment.
• Game theory: the dynamics of utility maximization when participants affect each other's utility in a predictable way.
Agent
• Agent:
  • Perceives the environment through sensors.
  • Acts on the environment through actuators.
  • The environment can be non-physical.
• Percept: the set of perceptions at some point in time.
• Percept sequence: the set of perception-time pairs.
• Agent function: percept sequence → action.
• Agent program: an implementation of an agent function.
• Agent architecture: the computing device, with its sensors and actuators, on which the agent program runs.
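The distinction between agent function and agent program can be made concrete. The function is an abstract mapping from a whole percept sequence to an action; the program is the code that receives one percept per step and must produce the same mapping. A minimal sketch, where the class `AgentProgram` and the example function `react_to_latest` are hypothetical names chosen for illustration:

```python
from typing import Callable, Hashable, List, Sequence

Percept = Hashable
Action = str

# Agent function: an abstract mapping from a whole percept sequence to an action.
AgentFunction = Callable[[Sequence[Percept]], Action]


class AgentProgram:
    """A concrete implementation of an agent function.

    The program receives one percept per step (from the sensors), appends it to
    the stored percept sequence, and returns an action (for the actuators).
    """

    def __init__(self, agent_function: AgentFunction) -> None:
        self.agent_function = agent_function
        self.percept_sequence: List[Percept] = []

    def __call__(self, percept: Percept) -> Action:
        self.percept_sequence.append(percept)
        return self.agent_function(tuple(self.percept_sequence))


def react_to_latest(percepts: Sequence[Percept]) -> Action:
    # Hypothetical agent function that only looks at the most recent percept.
    return f"respond-to-{percepts[-1]}"


program = AgentProgram(react_to_latest)
print(program("ping"))  # respond-to-ping
```

Storing the full history keeps the program faithful to the definition (the function sees the entire sequence); the agent designs below differ mainly in how much of that history they actually need.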
Rationality
• A rational being considers all the consequences of all possible actions, and makes these consequences part of the decision process for performing each of those actions.
• Given an environment and a percept sequence, what is the ‘best’ thing to do?
• Performance measure: an objective assessment of how successful an arbitrary sequence of environment states is.
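A performance measure scores the sequence of environment states, not the agent's own percepts or opinion of itself. A minimal sketch, assuming the two-square vacuum world from the example later in these notes and a hypothetical measure that awards one point for every clean square at every time step:

```python
from typing import Dict, Sequence

# An environment state maps each location to "clean" or "dirty".
EnvState = Dict[str, str]


def performance(history: Sequence[EnvState]) -> int:
    """Hypothetical measure: +1 for every clean square at every time step.

    It is computed from the environment states themselves, so it is external
    to the agent rather than based on what the agent perceives.
    """
    return sum(
        1
        for state in history
        for status in state.values()
        if status == "clean"
    )


history = [
    {"A": "dirty", "B": "dirty"},
    {"A": "clean", "B": "dirty"},
    {"A": "clean", "B": "clean"},
]
print(performance(history))  # 3
```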
Rational agent
What is rational depends on:
1. Prior knowledge of the agent.
2. Performance measure of the environment state sequence.
3. Possible actions the agent can perform.
4. Percept sequence of the agent.
• Information gathering: performing (3) in order to enrich (4) and thereby increase (1).
• Learning: increase (1) through (4).
• Autonomy: all of (1) relates back to (4).
Task environment
• Fully/partially observable
• Single/multiagent (competitive/cooperative)
• Deterministic/stochastic
• Episodic/sequential
• Static/dynamic/semidynamic
• Discrete/continuous
• Known/unknown
• Blocks world: fully observable, single agent, deterministic, episodic, static, known environment.
• 1990s: partially observable, multiagent, stochastic, sequential, dynamic, continuous, unknown environments.
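The task-environment dimensions can be recorded as one field each, which makes the contrast between the blocks world and 1990s environments explicit. A minimal sketch: the class `TaskEnvironment`, its field names, and the `difficulty` count (how many dimensions take the harder value) are hypothetical; the blocks world is assumed to be discrete, since the slide does not list that dimension for it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskEnvironment:
    """One field per task-environment dimension (names are illustrative)."""
    fully_observable: bool
    multiagent: bool
    deterministic: bool
    episodic: bool
    dynamics: str  # "static", "dynamic", or "semidynamic"
    discrete: bool
    known: bool

    def difficulty(self) -> int:
        # Hypothetical score: count the dimensions that take the harder value.
        harder = [
            not self.fully_observable,
            self.multiagent,
            not self.deterministic,
            not self.episodic,
            self.dynamics != "static",
            not self.discrete,
            not self.known,
        ]
        return sum(harder)


blocks_world = TaskEnvironment(
    fully_observable=True, multiagent=False, deterministic=True, episodic=True,
    dynamics="static", discrete=True,  # discrete: assumed, not on the slide
    known=True,
)
environment_1990s = TaskEnvironment(
    fully_observable=False, multiagent=True, deterministic=False, episodic=False,
    dynamics="dynamic", discrete=False, known=False,
)
print(blocks_world.difficulty(), environment_1990s.difficulty())  # 0 7
```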
Example
• Percepts: location (A, B), contents (dirty, clean).
• Actions: left, right, suck, idle.
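The example world can be simulated directly from the listed percepts and actions. A minimal sketch, assuming deterministic actions, square A on the left and B on the right, and that moving left from A (or right from B) leaves the agent where it is; the class name `VacuumWorld` is hypothetical:

```python
from typing import Dict, Tuple

Percept = Tuple[str, str]  # (location, contents), e.g. ("A", "dirty")


class VacuumWorld:
    """Two-square vacuum world: square A on the left, square B on the right."""

    def __init__(self, a: str = "dirty", b: str = "dirty", location: str = "A") -> None:
        self.squares: Dict[str, str] = {"A": a, "B": b}
        self.location = location

    def percept(self) -> Percept:
        # The agent senses only its own location and that square's contents.
        return (self.location, self.squares[self.location])

    def execute(self, action: str) -> None:
        if action == "left":
            self.location = "A"
        elif action == "right":
            self.location = "B"
        elif action == "suck":
            self.squares[self.location] = "clean"
        elif action == "idle":
            pass
        else:
            raise ValueError(f"unknown action: {action}")


world = VacuumWorld()
print(world.percept())  # ('A', 'dirty')
world.execute("suck")
world.execute("right")
print(world.percept())  # ('B', 'dirty')
```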
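Table-driven
• Low intelligence
• High complexity: one table entry for every possible percept sequence.
• The task of AI is to improve on this complexity metric: to produce the same rational behavior from a much smaller program.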
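A table-driven agent looks up its action using the entire percept sequence, so the table must hold an entry for every sequence the agent could ever see. A minimal sketch for the vacuum-world percepts, where `make_table_driven_agent` and `table_size` are hypothetical helpers, the example table covers only sequences of length 1, and missing entries fall back to "idle" (an assumption):

```python
from typing import Dict, List, Tuple

Percept = Tuple[str, str]
PERCEPTS: List[Percept] = [(loc, c) for loc in ("A", "B") for c in ("clean", "dirty")]


def table_size(num_percepts: int, lifetime: int) -> int:
    """Entries needed to cover every percept sequence of length 1..lifetime."""
    return sum(num_percepts ** t for t in range(1, lifetime + 1))


def make_table_driven_agent(table: Dict[Tuple[Percept, ...], str]):
    percepts: List[Percept] = []

    def agent(percept: Percept) -> str:
        percepts.append(percept)
        # Look up the action for the *entire* percept sequence so far.
        return table.get(tuple(percepts), "idle")

    return agent


# Hypothetical table covering only sequences of length 1.
table = {
    (p,): "suck" if p[1] == "dirty" else ("right" if p[0] == "A" else "left")
    for p in PERCEPTS
}
agent = make_table_driven_agent(table)
print(agent(("A", "dirty")))  # suck
print(table_size(len(PERCEPTS), 10))  # 1398100
```

With only four possible percepts and a lifetime of ten steps, the complete table already needs 1,398,100 entries, which is the complexity the slide refers to.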
Simple reflex agent
• No memory
• Low complexity: the number of percepts for which a reaction is defined.
• Condition-action rules
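A simple reflex agent replaces the table with a short list of condition-action rules evaluated against the current percept only. A minimal sketch for the vacuum world; the rule list `RULES` and the "idle" fallback are hypothetical choices:

```python
from typing import Callable, List, Tuple

Percept = Tuple[str, str]
# A condition-action rule: (condition on the current percept, action to take).
Rule = Tuple[Callable[[Percept], bool], str]

RULES: List[Rule] = [
    (lambda p: p[1] == "dirty", "suck"),
    (lambda p: p[0] == "A", "right"),
    (lambda p: p[0] == "B", "left"),
]


def simple_reflex_agent(percept: Percept, rules: List[Rule] = RULES) -> str:
    """Return the action of the first rule whose condition matches.

    Only the current percept is consulted; earlier percepts are not remembered.
    """
    for condition, action in rules:
        if condition(percept):
            return action
    return "idle"


print(simple_reflex_agent(("A", "dirty")))  # suck
print(simple_reflex_agent(("A", "clean")))  # right
```

Three rules here do the work of the length-1 table in the table-driven sketch, but without memory the agent keeps moving back and forth even after both squares are clean.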
Model‐basedagent
Inputs to deliberation:
• Current percepts
• State: model or internal representation.
• Condition-action rules.
• Recent actions.
• The state is updated based on the previous state, the most recent action, and the percept.
• The action is chosen based on the state and the rules.
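The two update steps above can be sketched for the vacuum world: first predict the effect of the most recent action on the previous state, then correct the prediction with the current percept, and finally apply rules to the updated state. A minimal sketch; the class `ModelBasedVacuumAgent` and its rules are hypothetical, and the model stores `None` for a square that has not been observed yet:

```python
from typing import Dict, Optional, Tuple

Percept = Tuple[str, str]


class ModelBasedVacuumAgent:
    """Keeps an internal model of both squares; unseen squares are None."""

    def __init__(self) -> None:
        self.model: Dict[str, Optional[str]] = {"A": None, "B": None}
        self.location: Optional[str] = None
        self.last_action: Optional[str] = None

    def update_state(self, percept: Percept) -> None:
        # Predict: apply the effect of the most recent action to the previous state.
        if self.last_action == "suck" and self.location is not None:
            self.model[self.location] = "clean"
        # Correct: overwrite the prediction with what the sensors report now.
        location, contents = percept
        self.location = location
        self.model[location] = contents

    def choose_action(self) -> str:
        # Condition-action rules applied to the state, not just the percept.
        if self.model[self.location] == "dirty":
            return "suck"
        if all(status == "clean" for status in self.model.values()):
            return "idle"
        return "right" if self.location == "A" else "left"

    def __call__(self, percept: Percept) -> str:
        self.update_state(percept)
        self.last_action = self.choose_action()
        return self.last_action


agent = ModelBasedVacuumAgent()
print(agent(("A", "dirty")))  # suck
print(agent(("A", "clean")))  # right
print(agent(("B", "clean")))  # idle
```

On the last percept a simple reflex agent would move left again; the remembered state lets this agent know that A is already clean and stop.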
Goal‐basedagent
Utility‐basedagent
• Utility function: internalization of the performance measure.
• The action is chosen based on state, goal, and cost.
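In a stochastic environment, a utility-based agent can weigh the value of reaching a goal state against the cost of acting by comparing the expected utility of each action. A minimal sketch for one vacuum-world square; the transition probabilities (`suck` succeeds 90% of the time), the costs, and the goal values are all hypothetical numbers chosen for illustration:

```python
from typing import Dict, List, Tuple

State = str
# Hypothetical transition model: (state, action) -> list of (probability, next state).
TRANSITIONS: Dict[Tuple[State, str], List[Tuple[float, State]]] = {
    ("dirty", "suck"): [(0.9, "clean"), (0.1, "dirty")],
    ("dirty", "idle"): [(1.0, "dirty")],
    ("clean", "suck"): [(1.0, "clean")],
    ("clean", "idle"): [(1.0, "clean")],
}
COST = {"suck": 1.0, "idle": 0.0}            # hypothetical cost of each action
GOAL_VALUE = {"clean": 10.0, "dirty": 0.0}   # hypothetical value of each outcome


def expected_utility(state: State, action: str) -> float:
    """Expected goal value of the possible outcomes minus the action's cost."""
    outcomes = TRANSITIONS[(state, action)]
    return sum(p * GOAL_VALUE[s] for p, s in outcomes) - COST[action]


def utility_based_agent(state: State, actions: List[str]) -> str:
    # Pick the action with the highest expected utility in the current state.
    return max(actions, key=lambda a: expected_utility(state, a))


print(utility_based_agent("dirty", ["suck", "idle"]))  # suck
print(utility_based_agent("clean", ["suck", "idle"]))  # idle
```

Unlike a plain goal test, the cost term makes the agent prefer "idle" on a clean square, where sucking cannot improve the outcome.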
Learningagent
Multiagent
• Cooperation
• Competition
• Swarm intelligence: performance measure applied to collective behavior.
• Decentralized representation
• Emergent behavior
  • Weak emergence: the qualities of the system are reducible to the system's constituent parts.
  • Strong emergence: the qualities of the system are not reducible to its constituent parts, e.g. qualia.
• The concepts of utility and rationality change!
Prisoner's dilemma (sentences in years)
                      Prisoner B silent    Prisoner B betrays
Prisoner A silent     A: 0.5, B: 0.5       A: 10, B: 0
Prisoner A betrays    A: 0, B: 10          A: 5, B: 5
Two suspects are arrested. If one testifies against the other (betrays) and the other remains silent, the betrayer goes free and the silent accomplice receives the full 10-year sentence. If both remain silent, both prisoners are sentenced to only six months on a minor charge. If each betrays the other, each receives a 5-year sentence. How should the prisoners act?
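• No matter what the other player does, a player always receives a shorter sentence by betraying (defecting).
• Since betraying is more beneficial than remaining silent in every situation, all rational players will betray, even though mutual silence (six months each) is better for both than mutual betrayal (five years each).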
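The dominance argument can be checked mechanically from the sentence table: compute each player's best response to every action of the other player. A minimal sketch using the years from the table (lower is better); the helper names are hypothetical, and the final check uses the standard game-theory notion of a Nash equilibrium, which the slides do not name:

```python
from itertools import product
from typing import Dict, Tuple

ACTIONS = ("silent", "betray")
# Years in prison, keyed by (A's action, B's action) -> (A's years, B's years).
YEARS: Dict[Tuple[str, str], Tuple[float, float]] = {
    ("silent", "silent"): (0.5, 0.5),
    ("silent", "betray"): (10, 0),
    ("betray", "silent"): (0, 10),
    ("betray", "betray"): (5, 5),
}


def best_response_a(b_action: str) -> str:
    """A's action with the shortest sentence for A, given B's action."""
    return min(ACTIONS, key=lambda a: YEARS[(a, b_action)][0])


def best_response_b(a_action: str) -> str:
    """B's action with the shortest sentence for B, given A's action."""
    return min(ACTIONS, key=lambda b: YEARS[(a_action, b)][1])


# Betraying is dominant for A if it is A's best response to every action of B.
dominant_a = {best_response_a(b) for b in ACTIONS}
print(dominant_a)  # {'betray'}

# Nash equilibria: action pairs in which each player is already best-responding.
equilibria = [
    (a, b) for a, b in product(ACTIONS, ACTIONS)
    if best_response_a(b) == a and best_response_b(a) == b
]
print(equilibria)  # [('betray', 'betray')]
```

The only stable outcome is mutual betrayal at 10 prison-years in total, while mutual silence would cost only one year in total: individually rational choices produce a collectively worse result.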