Enhanced Social Learning via Trust and Reputation Mechanisms in Multi-agent Systems


PhD Completion Seminar

Golriz Rezaei

Supervisors: Dr. Michael Kirley

Dr. Shanika Karunasekera

Dept. of Computer Science and Software Engineering, The University of Melbourne, Australia

20 April 2011

Outline

Overview
• Motivation
• Enhanced Social Learning
• Research goals / questions / contributions / publications

Background
• Trust and Reputation in Multi-agent Systems
• Trust and Reputation in Evolutionary Game Theory
• Evolutionary Games on Graphs

The research work
• First Model
• Second Model
• Third Model

Concluding Discussion

Acknowledgment and Questions?

Motivation

• Multi-agent Systems (MAS)?
  • Interacting autonomous agents
  • Different geographical locations
  • Varying cognitive / processing abilities
  • Limited information / partial knowledge

• Perform tasks → Receive utility
• Difficult tasks → Beyond individual agent capacity
• Maximise utility → Interact (collaboration / resource sharing)

Problem?
• Appropriate partners → Successful performance → Maximise utility
• Open dynamic MAS → Uncertainty + Partial knowledge

Establishing strategic connections is difficult!

Enhanced Social Learning

• Social Learning (biological background)?
  • Learning through observation of / interaction with others
  • Knowledge transmission without genetic material
  • Acquire knowledge from others without incurring the cost / time
• Major mechanism → Imitation (perceive and reproduce behaviour)
• Why good?
  • Keep track of beneficial interaction partners
  • Save time / energy / cost
  • Improve long-term performance (individual / system)

Problem? Error-prone / outdated / inappropriate information

• Solution? Be selective
• When → High individual trial-and-error cost, intermediate rate of environmental change
• How → Mixed with personal innovation
• From whom →
  • Agents are heterogeneous
  • Appropriate role models → Important for performance
  • Partner selection

Enhanced Social Learning cont.

1) Top-down
• Planned at design time
• Relies on the designer's ability to predict optimal connections in advance
• Fixed structure of relations (random / particular topology)
• Autonomy condition + Environmental condition → Not realistic

2) Automatic learning
• Built and sustained adaptively at run time
• Trust & Reputation → Formal definition?
• Evaluate before interaction → Partner selection / Decision making
• Relations evolve → Partner's reliability / trustworthiness

Survey in Ch2

Enhanced Social Learning cont.

Evolutionary game theory → Concrete application in MAS

Coevolutionary Endogenous Social Networks
• Dynamic relation formation
• Topology (social ties) and Behaviour (agents' strategies)

Proposed framework
• 1. Life-experiences + 2. Endogenous Evolving Social Networks → Trust & Reputation
• Trust & Reputation + Social Learning → Enhanced Social Learning

Evaluation
1) Social Dilemma Evolutionary Games
2) Advice-seeking in Distributed Service Provision Applications

Research goals and questions

Central hypothesis:

“Does incorporating concepts of trust and reputation within a social learning framework help to enhance the agents’ interactions in a MAS? And

consequently does it help to improve their long term performance?”

1. (Life-experiences / Aging) + (Coevolutionary endogenous social networks) → Trust / Reputation? Effective social learning approaches?
2. Encourage cooperation in social dilemmas? → Broader perspective of general MAS applications (Advice-Seeking for Resource Discovery in Distributed Service Provision)
3. Impacts of agents' heterogeneity (behaviour / attributes / preferences)
4. Structural characteristics of the underlying evolved relationship networks?
5. Interaction patterns → System's behaviour?

Publications

First Model: Life Experiences in Spatial 2-player Prisoners' Dilemma Game

1. G. Rezaei and M. Kirley (2008). Heterogeneous payoffs and social diversity in the spatial prisoner's dilemma game. In X. Li, M. Kirley, and M. Zhang, editors, Proceedings of 7th International Conference on Simulated Evolution and Learning (SEAL), volume 5361 of Lecture Notes in Computer Science, pages 585--594, Springer.

2. G. Rezaei and M. Kirley (2009). The effects of time varying rewards on the evolution of cooperation. Evolutionary Intelligence, 2(4):207--218.

Publications cont.

Second Model: N-player Prisoners' Dilemma Game on an Evolving Social Network

1. G. Rezaei, M. Kirley and J. Pfau (2009). Evolving cooperation in the N-player prisoner's dilemma: A social network model. In K. B. Korb, M. Randall, and T. Hendtlass, editors, Artificial Life: Borrowing from Biology (ACAL), volume 5865 of Lecture Notes in Computer Science, pages 32--42, Springer Verlag, Berlin.

2. An extended version is under preparation (2011).

Third Model: Distributed Advice-Seeking on an Evolving Social Network

3. G. Rezaei, J. Pfau and M. Kirley (2010). Distributed Advice-Seeking on an Evolving Social Network. In 2010 IEEE/WIC/ACM International Conference on Intelligent Agent Technology.







Background: Trust and Reputation in MAS

Trust [Gambetta 1988]: the subjective probability by which an individual, A, expects that another individual, B, performs a given action on which A's welfare depends.

Reputation [Ismail et al. 2007]: information about an agent's behavioural history.

Challenging / Confusing / Inconsistent

Background cont.: Typology (survey in Ch. 2)

Suitable mechanisms
1) Variety of sources of information
2) Individual / distributed evaluation
3) Robust against possible lying / fraud

Background cont.: Evolutionary Games

Game Theory (GT)?
• Abstract framework → many real-life scenarios
• Simple games + rich dynamics
• Appropriate mathematical tools
• Study complex strategic interactive scenarios

Evolutionary GT?

Social Dilemmas? "Cooperation" / "Tragedy of the commons" [Hardin 1968]
• Autonomous individuals either
  - act cooperatively → contribute to the social welfare, or
  - behave selfishly (not investing anything) → enjoy the free benefits shared among all the members (free-riding)
• Theory → individuals behave selfishly
• Nature → cooperation exists
• Biology, Economics, Sociology (IEEE Trans., Statistical Physics, Nature, CEC, GECCO …)
• Distributed systems (P2P) (DAI)
• Crucial for performance of MAS

Mechanisms? Still an open-ended question! (AAMAS)

Why?
• The most difficult settings for cooperation
• Robust and fundamental method of modelling
• Simplicity of statement and design → MAS

Background cont.: Prisoners' Dilemma (2-PD)

• 2 players / agents
• 2 choices (C or D)
• Payoff → determined by joint actions
• Actual values matter less than their order: if the order changes, the game changes

Conditions:
i) T > R > P > S
ii) 2R >= (T + S)

(D,D) → Nash Equilibrium
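The two ordering conditions and the (D,D) equilibrium can be checked mechanically. The following is a minimal sketch, not code from the thesis: the function names are hypothetical, and the payoff values are the "big" setting (T=5, R=3, P=1, S=0) used later in the experimental setup.

```python
from itertools import product

def is_prisoners_dilemma(T, R, P, S):
    """Check conditions i) T > R > P > S and ii) 2R >= T + S."""
    return T > R > P > S and 2 * R >= T + S

def payoff_table(T, R, P, S):
    """Map joint actions (row, column) to (row payoff, column payoff)."""
    return {
        ("C", "C"): (R, R),
        ("C", "D"): (S, T),
        ("D", "C"): (T, S),
        ("D", "D"): (P, P),
    }

def pure_nash_equilibria(table):
    """Joint actions where neither player gains by deviating alone."""
    actions = ("C", "D")
    equilibria = []
    for a, b in product(actions, repeat=2):
        row_ok = all(table[(a, b)][0] >= table[(x, b)][0] for x in actions)
        col_ok = all(table[(a, b)][1] >= table[(a, y)][1] for y in actions)
        if row_ok and col_ok:
            equilibria.append((a, b))
    return equilibria

big = payoff_table(T=5, R=3, P=1, S=0)
print(is_prisoners_dilemma(5, 3, 1, 0))  # True
print(pure_nash_equilibria(big))         # [('D', 'D')]
```

Mutual defection is the only pure equilibrium here even though mutual cooperation pays both players more, which is the dilemma the slides refer to.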

Trust and Reputation in Evolutionary Games

5 fundamental mechanisms → Evolution of "Cooperation" [Nowak 2006]

• Kin selection vs. Group selection
• Direct Reciprocity
  - Iterated encounters
  - Return of altruistic act / punishment
  - "You scratch my back, I'll scratch yours!"
• Indirect Reciprocity
  - Repeated interactions unlikely
  - Return from third parties
  - Image / Reputation score
  - "You scratch his back, I'll scratch yours!"
• Network Reciprocity
  - Social / spatial constraints → Non-uniform / local neighbourhood interactions
  - Clustering effect (community structure) → Enhances cooperation

Compare with Trust & Reputation

Background cont.: Basics of the Networks

Network → graph G(N, E)
• N: finite set of nodes (vertices)
• E: finite set of edges (links)
• G is represented by an N×N adjacency matrix:
  a_ij = 1 if there is an edge between nodes i and j
  a_ij = 0 otherwise

[Figures: a graph with 8 vertices and 10 edges; a network of computers]

Background cont.: Topological properties

• Degree, k_i, of a node: the number of edges attached to node i, $k_i = \sum_{j} a_{ij}$
• Path length, L: average separation between any two nodes
• Clustering coefficient, C_i, of a node: probability that two nearest neighbours of a node are also nearest neighbours of each other
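These three measures can be computed directly from an adjacency matrix. The sketch below is illustrative only: the function names and the 4-node example graph (a triangle with one pendant node) are assumptions, and the path length is averaged over connected pairs.

```python
from collections import deque

def degree(adj, i):
    """k_i: row sum of the adjacency matrix."""
    return sum(adj[i])

def clustering_coefficient(adj, i):
    """Fraction of pairs of neighbours of i that are themselves linked."""
    neighbours = [j for j, a in enumerate(adj[i]) if a]
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(adj[u][v]
                for idx, u in enumerate(neighbours)
                for v in neighbours[idx + 1:])
    return 2.0 * links / (k * (k - 1))

def average_path_length(adj):
    """Mean shortest-path distance over all connected ordered pairs (BFS)."""
    n = len(adj)
    total, pairs = 0, 0
    for source in range(n):
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in range(n):
                if adj[u][v] and v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0

# Triangle 0-1-2 with node 3 attached to node 2 (hypothetical example).
adj = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
print(degree(adj, 2))                  # 3
print(clustering_coefficient(adj, 0))  # 1.0
```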

Background cont.: Types of Networks

• Random
  - Links placed with uniform probability p
  - Mathematical objects, for comparison only (not good for real social networks)
• Regular
  - All nodes have the same degree (1-D circular / 2-D square grid (lattice))
  - Not good for real networks
• Small-World
  - Transition between a regular lattice (p = 0) and a random graph (p = 1)
  - One end of each link rewired with small probability p
  - Highly clustered + Short path length
• Scale-Free
  - Grows by preferential attachment
  - Power-law degree distribution
  - Most nodes have very few links, while a small number of nodes are highly connected
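The small-world construction (rewiring one end of each link of a regular lattice with probability p) can be sketched as follows. This is a simplified illustration in the spirit of the Watts-Strogatz procedure, not the thesis code; the function names and the parameters n and k are assumptions.

```python
import random

def ring_lattice(n, k):
    """Regular 1-D circular lattice: each node links to its k nearest
    neighbours on each side (assumes n > 2 * k)."""
    edges = set()
    for i in range(n):
        for step in range(1, k + 1):
            j = (i + step) % n
            edges.add((min(i, j), max(i, j)))
    return edges

def rewire(edges, n, p, rng):
    """Rewire one end of each link with probability p, avoiding
    self-loops and duplicate links. The number of links is preserved."""
    result = set(edges)
    for (i, j) in sorted(edges):
        if rng.random() < p:
            candidates = [t for t in range(n)
                          if t != i and (min(i, t), max(i, t)) not in result]
            if candidates:
                t = rng.choice(candidates)
                result.remove((i, j))
                result.add((min(i, t), max(i, t)))
    return result

rng = random.Random(1)
lattice = ring_lattice(20, 2)
regular = rewire(lattice, 20, 0.0, rng)      # p = 0: unchanged lattice
small_world = rewire(lattice, 20, 0.1, rng)  # a few long-range shortcuts
randomised = rewire(lattice, 20, 1.0, rng)   # p = 1: every link rewired
```

With p = 0 the lattice is returned unchanged; intermediate p adds a few shortcuts, which is what shortens path lengths while most local clustering is retained.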

Background cont.: Evolutionary Games on Graphs

• Local neighbourhood interaction
• Population structure → System dynamics
• Clusters of cooperators → Enhance cooperation

Developmental stages
1. Socio-biological: uniform interactions
2. Static networks: non-uniform interactions
3. Dynamic networks: non-uniform interactions
(from 2-D grids to realistic social networks)

- Scaffolding of interaction → different types of network topology
- Parameters (magnitude of rewards / punishments, population size, initial conditions, update rules)
- Mathematical analysis is difficult → Computational simulations







First Model: Life Experiences in Spatial 2-PD Game

Scope: decision making only (no partner selection) → Cooperative behaviour

Framework: Life-experiences & Age → Trust & Reputation; Trust & Reputation + Social Learning → Enhanced Social Learning

Fixed network (grid)
• Local neighbourhood interaction → Moore neighbourhood (the 8 surrounding cells)
• Agent accumulates received payoffs → Fitness
• End of each round → Imitate the most successful neighbour (MSN)
• Clusters of cooperators → outweigh losses against defectors
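One round of the standard spatial game described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the NetLogo model used in the thesis: the grid wraps around at the edges, all agents update synchronously, ties keep the agent's own strategy, and the function names are hypothetical.

```python
# Payoff to "me" for (my action, neighbour's action), with T=5, R=3, P=1, S=0.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def moore_neighbours(x, y, size):
    """The 8 surrounding cells on a wrap-around grid (assumes size >= 3)."""
    return [((x + dx) % size, (y + dy) % size)
            for dx in (-1, 0, 1) for dy in (-1, 0, 1)
            if (dx, dy) != (0, 0)]

def play_round(grid):
    """Play the 2-PD with every neighbour, then imitate the most
    successful neighbour (MSN). Returns (new grid, fitness)."""
    size = len(grid)
    fitness = [[0] * size for _ in range(size)]
    for x in range(size):
        for y in range(size):
            for nx, ny in moore_neighbours(x, y, size):
                fitness[x][y] += PAYOFF[(grid[x][y], grid[nx][ny])]
    new_grid = [row[:] for row in grid]
    for x in range(size):
        for y in range(size):
            # Own cell is listed first, so ties keep the current strategy.
            candidates = [(x, y)] + moore_neighbours(x, y, size)
            bx, by = max(candidates, key=lambda c: fitness[c[0]][c[1]])
            new_grid[x][y] = grid[bx][by]
    return new_grid, fitness

# A single defector in a 5x5 population of cooperators.
grid = [["C"] * 5 for _ in range(5)]
grid[2][2] = "D"
after, fitness = play_round(grid)
print(sum(row.count("D") for row in after))  # 9
```

With fixed universal payoffs the lone defector earns 40 against 21 for its neighbours, so it is imitated by all eight of them in one round.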

First Model cont.: The challenge

• Typically → "Universal fixed payoff matrix"
• Hypothesis → Introducing "social diversity" alters the trajectory of the population

Contributions
• Adaptive rewards (individual agent strategies + life-experiences), given a limited agent life span
• MSN (highest accumulated normalized utility + older) → Role model trustworthiness!

Each agent
• Age: α_i(t+1) = α_i(t) + 1
• Life-span λ_i drawn randomly from a uniform distribution [min, max]
• When α_i(t) == λ_i, the agent dies and is replaced by a new random agent
• Personal version of the payoff matrix, updated at each time step based on experience level (update rule)
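The ageing and replacement rule for each agent can be sketched as below. This is a minimal illustration, not the thesis implementation: the `Agent` class, the function names and the choice to give a replacement a uniformly random strategy are assumptions, and the personal payoff-matrix update is deliberately left out because its equations are not given here.

```python
import random
from dataclasses import dataclass

@dataclass
class Agent:
    strategy: str   # "C" or "D"
    age: int        # alpha_i(t)
    life_span: int  # lambda_i

def new_random_agent(rng, min_span, max_span):
    """Newborn agent: random strategy (an assumption) and a life-span
    drawn uniformly from [min_span, max_span]."""
    return Agent(strategy=rng.choice("CD"), age=0,
                 life_span=rng.randint(min_span, max_span))

def age_population(agents, rng, min_span, max_span):
    """Advance one time step: agents that reached their life-span die and
    are replaced; all others age by one."""
    next_population = []
    for agent in agents:
        if agent.age >= agent.life_span:
            next_population.append(new_random_agent(rng, min_span, max_span))
        else:
            next_population.append(
                Agent(agent.strategy, agent.age + 1, agent.life_span))
    return next_population

rng = random.Random(0)
population = [Agent("C", 3, 3), Agent("D", 1, 5)]
population = age_population(population, rng, 50, 100)
```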

First Model cont.: Adaptive rewards

Update rules 1) and 2) [equations shown on slide]

Terms used in the update rules:
• the payoff values for agent i at time t
• the default payoff matrix values T, R, P, S
• the magnitude of the rescaled values
• the age of agent i at time t
• the expected life time of agent i
• a limiting factor that characterises the uncertainty related to the environment

First Model cont.: Scenarios

1. Standard PD (S) → Universal fixed payoffs + Age
2. Homogeneous model (HOM) → Universal fixed payoffs + Age
3. Heterogeneous model (Het 1 / Het 2 / Het 3) → Individual adaptive payoffs + Age (3 versions: update 4 elements / update 1 element / update 1 element capped)

What is the equilibrium state?

Coevolution → Altruistic behaviour + Non-stationary dynamic rewards

First Model cont.: Experimental setup

• 2-D grid (32×32), implemented in NetLogo 4.0 [Wilensky 2002]
• Population initialization: (20% C – 80% D) / (50% C – 50% D)
• Payoff: (small: T=1, R=1, P=0, S=0) / (big: T=5, R=3, P=1, S=0)
• Life-span distributions (λ_i): [0,50] / [0,100] / [50,100]
• Environmental constraint K: [0.1 : 0.025 : 0.2]
• Each trial: 10000 iterations; all configurations repeated 30 times
• Statistical results are reported

First Model cont.: Sensitivity to the base payoff values

Payoff: (small: T=1, R=1, P=0, S=0) / (big: T=5, R=3, P=1, S=0)

[Figure panels: Standard (S), Homogeneous (HOM)]

First Model cont.: Heterogeneous vs. Homogeneous

Payoff: (big: T=5, R=3, P=1, S=0)

[Figure panels by population initialization: (20% C – 80% D), (50% C – 50% D)]

First Model cont.: Snapshots

Payoff: (big: T=5, R=3, P=1, S=0) / Population initialization (20% C – 80% D)

[Figure panels: (Het 1), (Het 2), (Het 3), (HOM)]

Varying-size clusters of cooperators (black)

Other extra results for different parameters (K, life-span, replacement, …)







Second Model: N-PD on an Evolving Social Network

Scope: Decision making + Partner selection → Coevolution (Interaction network + Individuals' strategy)

Framework: Endogenous Evolving Social Networks → Trust & Reputation; Trust & Reputation + Social Learning → Enhanced Social Learning

• 2-PD → N-PD: cooperative behaviour in larger groups is more difficult! (N > 2)
• Natural extension of 2-PD → Real-world social communities
• Fixed underlying network → Relaxed → Relations evolve over time
• Link weights → Trust & Reputation

Second Model cont.: N-player Prisoners' Dilemma

Utility [Boyd and Richerson 1988]

Conditions
• $c > b/N$ → defection is preferred for individuals
• $b > c > 0$ → contribution to social welfare is beneficial for the group

Conventional EG → (D, D, …, all D)
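The utility function itself is not reproduced above. The sketch below assumes the linear public-goods form usually attributed to Boyd and Richerson (1988), in which each of the i cooperators in a group of N pays cost c and every member receives b·i/N; treat that form, and the function names, as assumptions rather than the thesis definition.

```python
def npd_payoffs(actions, b, c):
    """Payoff of each player in one N-player game.
    Assumed form: cooperator gets b*i/N - c, defector gets b*i/N,
    where i is the number of cooperators and N the group size."""
    n = len(actions)
    cooperators = actions.count("C")
    share = b * cooperators / n
    return [share - c if a == "C" else share for a in actions]

def is_social_dilemma(n, b, c):
    """c > b/N makes defection individually preferable;
    b > c > 0 makes contribution beneficial for the group."""
    return c > b / n and b > c > 0

# b = 5 and c = 3 are the values listed in the experimental setup.
print(npd_payoffs(["C", "C", "C", "C"], b=5, c=3))  # [2.0, 2.0, 2.0, 2.0]
print(npd_payoffs(["C", "C", "C", "D"], b=5, c=3))  # [0.75, 0.75, 0.75, 3.75]
print(is_social_dilemma(4, 5, 3))                   # True
```

Under this assumed form the single defector out-earns the cooperators (3.75 vs 0.75), yet everyone is better off under full cooperation (2.0) than under full defection (0.0).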

Second Model cont.: Evolving Relations

• Agents play cooperatively → form social links (reinforced)
• One agent defects → breaks his links with the opponents
• Link dynamics: slow positive / fast negative

Second Model cont.: Contribution - Hypothesis

• Incorporating a "social network" into the N-player PD
• The network evolves through cooperative behaviour
• Introducing "cognitive" agents → Decision making based on some function of the opponents

• Encourage high levels of cooperation
• Persist for longer
• Analyse the state of the underlying network

Second Model cont.: Schematic Algorithm

Algorithm: Social network based N-PD model
Require: Population of agents P, iterations imax, number of players N ≥ 2

for i = 0 to imax do
  G = ∅
  while g = NextGame(P, G, N) do
    G = G ∪ {g}
    PlayGame(g)
    AdaptLinks(g)
  end while
  a, b = RandomSample(P)
  CompareUtilityAndSelect(a, b)
end for
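The schematic algorithm can be made concrete as in the sketch below, for the pure-strategy scenario only. Everything beyond the loop structure is an assumption made for illustration: the helper bodies, the link increment `step`, resetting utilities every iteration, the payoff form b·i/N (minus c for cooperators), and the rule that the lower-utility agent of the sampled pair copies the other.

```python
import random

B, C = 5.0, 3.0  # benefit and cost, as in the experimental setup

def next_game(free, links, n, eps, rng):
    """First agent at random; each partner from its social contacts with
    probability eps (weighted by link weight), otherwise at random."""
    if len(free) < n:
        return None
    pool = sorted(free)
    first = rng.choice(pool)
    group = [first]
    while len(group) < n:
        rest = [a for a in pool if a not in group]
        contacts = [a for a in rest if frozenset((first, a)) in links]
        if contacts and rng.random() < eps:
            weights = [links[frozenset((first, a))] for a in contacts]
            group.append(rng.choices(contacts, weights=weights)[0])
        else:
            group.append(rng.choice(rest))
    return group

def play_game(group, strategy, utility):
    """Add each member's N-PD payoff to its accumulated utility."""
    cooperators = sum(strategy[a] == "C" for a in group)
    share = B * cooperators / len(group)
    for a in group:
        utility[a] += share - C if strategy[a] == "C" else share

def adapt_links(group, strategy, links, step=0.1):
    """Slow positive / fast negative: reinforce links between cooperators,
    drop every link that involves a defector."""
    for i, a in enumerate(group):
        for b in group[i + 1:]:
            key = frozenset((a, b))
            if strategy[a] == "D" or strategy[b] == "D":
                links.pop(key, None)
            else:
                links[key] = min(1.0, links.get(key, 0.0) + step)

def run(num_agents, n, iterations, eps=0.9, seed=0):
    rng = random.Random(seed)
    strategy = {a: rng.choice("CD") for a in range(num_agents)}
    links = {}
    for _ in range(iterations):
        utility = {a: 0.0 for a in strategy}
        free = set(strategy)  # agents not yet in a game this iteration
        while True:
            group = next_game(free, links, n, eps, rng)
            if group is None:
                break
            free -= set(group)
            play_game(group, strategy, utility)
            adapt_links(group, strategy, links)
        a, b = rng.sample(sorted(strategy), 2)
        if utility[b] > utility[a]:
            strategy[a] = strategy[b]
        elif utility[a] > utility[b]:
            strategy[b] = strategy[a]
    return strategy, links

strategy, links = run(num_agents=30, n=3, iterations=50)
```

The set `free` plays the role of G in the pseudocode: it records which agents have already been placed in a game during the current iteration.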

Second Model cont.: Game Formation (Partner selection)

• First agent → Randomly from remaining population
• (N-1) partners → Two scenarios:
  - Randomly from remaining population
  - From the first agent's remaining social contacts, probabilistically

Second Model cont.: Game Execution (Decision making)

Two scenarios (cognitive abilities)
• Pure strategy (always cooperate / defect)
• Mixed strategy (play probabilistically)
  - Discriminators → function of the average link weight (generosity, gradient)

Agents receive the corresponding payoff based on outcomes (Boyd and Richerson function)

Second Model cont.: Snapshots

|P| = 25, N = 3; agent types: Defector, Cooperator, Discriminator

• Agents self-organize social ties based on their self-interest
• Strategy update → cultural evolution

Second Model cont.: Scenarios

Step 1: Partner selection (Random matching) + Decision making (Pure strategy)
Step 2: Partner selection (Social Network game formation) + Decision making (Pure strategy)
Step 3: Partner selection (Random matching) + Decision making (Pure strategy + Discriminators)
Step 4: Partner selection (Social Network game formation) + Decision making (Pure strategy + Discriminators)

Second Model cont.: Experimental Setup

• Population size = 1000
• Group sizes = (2, 4, 5, 10, 15, 20)
• ε = 0.9 (game formation probability)
• b = 5 and c = 3 (payoff values: benefit & cost)
• Pure strategy scenario (50% pure C – 50% pure D)
• Mixed strategy scenario (33.3% each)
• α = 1.5 and β = 0.1 (decision function)
• Averaged over 20 independent trials, up to 40000 iterations

What is the equilibrium state and network topology?

Second Model cont.: Group size vs. Strategy

[Figure panels: Step 1, Step 2, Step 3, Step 4]

Second Model cont.: Emergent Social Networks

[Figure: clustering coefficient for Step 2, Step 3, Step 4]

Second Model cont.: Final Degree Distribution

[Figure panels: Step 4 with N=2, Step 4 with N=5]

Cooperation higher → degree distribution higher; size & shape depend on N







Third Model: Distributed Advice-Seeking on an Evolving Social Network

Scope: Decision making + Partner selection → Coevolution (Interaction network + System's behaviour)

Framework: Life-experiences + Endogenous Evolving Social Networks → Trust & Reputation; Trust & Reputation + Social Learning → Enhanced Social Learning

• Games → Advice-Seeking in Distributed Service Provision
• Relations evolve over time (Link weights → Trust & Reputation)

Third Model cont.: Distributed Infrastructure Technology

Characteristics
1) Unknown large environment
2) Varieties of selection options
3) Users are heterogeneous
4) Exact characteristics not available until accessed, if they are made explicit at all

Ex./ Specialized protein search engines, Netflix

Approaches
1) Individual trial & error
2) Central registration directory (Brokers, Web Services [Facciorusso et al. 2003])
3) Advice seeking → Direct exchange of "selection advice" is beneficial! Ex./ Learning [Nunes and Oliveira 2003], Distributed Recommender Systems

Question? Social Networks!

Third Model cont.: Advice-Seeking

Question: Heterogeneous individual requirements → Whom?

Challenge: Identifying other suitable users is difficult!
- There is a large number of them
- Preferences are not publicly available
- Users are not in a position to make their own preferences explicit

Social contacts serve as valuable resources → Manage them to improve long-term payoff gains

Third Model cont.: Abstract Framework

Agent-based simulation (resources + agents)
• Repeatedly: agent selects a resource → Subjective utility (Match?)
• Goal = Maximize long-term utility with limited selections
• Challenge = Identify appropriate resources

Evolving Social Network
- Connect with similar-minded agents → Autonomously, based on local information only
- Receive advice → improve resource selection
- Learn their own subjective utility → advice accuracy → decide to retain / drop the contact
- Form new connections → Seek referrals

Third Model cont.: What we study?

• This capability: Connection network / Advice exchange
• Agents' interactions / Social relationships
• The evolving social network / Utility gain

Questions: Affect the match? How co-evolve? Improve? Change?

Third Model cont.: Schematic Algorithm

Algorithm: Evolving Social Network Advice Seeking
Require: population of agents, set of resources, number of rounds, evolutionary rate, maximum out-degree, recommendation threshold t, default edge weight

Weighted graph = InitializeGraph(…)
for r = 1 to number of rounds do
  for each agent a, in random order, do
    …
    if Random() > … then
      AccessResource(a, …)
    else
      Query(a, …, t)
    end if
    if Random() < … then
      AdaptLinks(a, …, Random() < …, …)
    end if

Stages: 1-Initialization, 2-Exploitation/Exploration, 3-Advice selection, 4-Assessment *, 5-Network Adaptation *

Third Model cont.: 1-Initialization

• Heterogeneous pool of resources → n-dimensional binary feature vector f_r, initialized randomly
• Heterogeneous agent population → n-dimensional binary preference vector p_a, initialized randomly
• InitializeGraph(…), 2 scenarios:
  - random agents → no structural restriction
  - social agents → outgoing edges with default weight (= 0.5)

Third Model cont.: 2-Exploitation/Exploration

Selection based on personal knowledge / Query others!
• Probabilistic → Quality of the agent's acquired knowledge
• Exploit → Access the largest-utility resource it knows so far
• Explore → Seek advice (resource, utility)
  - Random agents → other random agents
  - Social agents → outgoing edges, social contacts

Third Model cont.: 3-Advice selection

A suggestion is selected probabilistically:
1. Advisor → Link's weight
2. One of his suggestions → Reported utility

Subjective utility of an accessed resource
• Similarity between p_a & f_r
• Normalized Hamming distance mapped to [-1, 1]
• Positive values → better than an average random selection
• Negative values → a random selection would have done better
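The subjective utility can be illustrated with a small function. This is a sketch under an assumption: the mapping from normalized Hamming distance d to [-1, 1] is taken to be the linear map 1 - 2d, so identical vectors score +1 and fully opposite vectors score -1. The function name is hypothetical.

```python
def subjective_utility(preference, features):
    """Similarity of an agent's binary preference vector and a resource's
    binary feature vector, mapped linearly to [-1, 1] (assumed mapping)."""
    if len(preference) != len(features):
        raise ValueError("vectors must have the same dimension")
    n = len(preference)
    distance = sum(p != f for p, f in zip(preference, features)) / n
    return 1.0 - 2.0 * distance

print(subjective_utility([1, 0, 1], [1, 0, 1]))        # 1.0
print(subjective_utility([1, 0, 1], [0, 1, 0]))        # -1.0
print(subjective_utility([1, 0, 1, 0], [1, 0, 0, 1]))  # 0.0
```

A uniformly random resource differs from the preference vector in half of the positions on average, which gives utility 0 under this mapping and matches the reading of positive values as "better than an average random selection".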

Third Model cont.: 4-Assessment *

Social agents learn from their interactions → adjust the weight of links

After following a particular suggestion, the experience is
- Positive if $|u_a(r) - u_{rep}(r)| < thr_{dis}$
- Negative otherwise

Adjust the link weight with multiple advisors
- Increase / decrease the link weight
- $w(a,b) < thr_{tolerance}$ → remove the edge, free slot!
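The assessment step can be sketched as a single function. The comparison against the two thresholds follows the slide; the numeric threshold values, the fixed additive `step` and the clamping to [0, 1] are purely illustrative assumptions, since the actual update rule is not given here.

```python
def assess_advice(weight, own_utility, reported_utility,
                  thr_dis=0.2, thr_tolerance=0.1, step=0.1):
    """Return the new link weight after following an advisor's suggestion,
    or None if the link falls below the tolerance and should be removed.
    All default values are hypothetical."""
    positive = abs(own_utility - reported_utility) < thr_dis
    new_weight = weight + step if positive else weight - step
    new_weight = max(0.0, min(1.0, new_weight))
    if new_weight < thr_tolerance:
        return None  # edge removed, slot freed for a new contact
    return new_weight

accurate = assess_advice(0.5, own_utility=0.6, reported_utility=0.5)
misleading = assess_advice(0.5, own_utility=1.0, reported_utility=-1.0)
dropped = assess_advice(0.15, own_utility=1.0, reported_utility=-1.0)
```

Advice whose reported utility is close to what the agent actually experienced strengthens the link; repeated misleading advice eventually pushes the weight below the tolerance and the contact is dropped.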

Third Model cont.: 5-Network Adaptation *

Social agents → opportunity to change their links, probabilistically!
• Link to a random agent with default weight
• Ask for referrals → Trust propagation [Massa and Avesani 2007, Vidal 2005]

Third Model cont.: Snapshots

• Steps 4 & 5 → agents eventually link with others of similar preferences
• Similar-minded community → spots beneficial resources faster

Third Model cont.: Experimental Setup

• Monte-Carlo simulations, various parameter settings
• Scenarios (social agents only and random agents only)
• Population sizes (small = 100, large = 300 agents)
• Environmental complexity |R| = (1000, 5000, 10000, 50000)
• Heterogeneity |p_a| & |f_r| = (2, 3, 4, and 5)
• First 1000 iterations, averaged over 30 independent trials (Note! exhaustive exploration will find eventually)

Third Model cont.: Basic Model behaviour

Do social agents gain higher utilities? (|A| = 100, |p_a| & |f_r| = 3, |R| = 5000)

Third Model cont.: Environmental Complexity

Efficiency of social and random scenarios when facing more complex environments?
|A| = 100, |p_a| & |f_r| = 3, |R| = (1000, 5000, 10000, 50000)

Third Model cont.: Analysis of the underlying Network

Modularity score: |A| = (100, 300) / |R| = 5000 / |p_a| & |f_r| = (2, 3, 4, 5)

[Figure panels: Small population, Large population]







Summary: Thesis contributions

Efficacy of Enhanced Social Learning (ESL) approaches → Agents' interactions → Individuals' and system's long-term (utility) performance

Life-experiences + Endogenous Evolving Social Networks → Trust and Reputation → ESL

First Model (2-PD on Fixed Grid Structure):
• Adaptive rewards → Life-experiences / Age
• Innovative notion of role model trustworthiness / Heterogeneous social diversity → Cooperation

Second Model (N-PD on an Evolving Social Network):
• Endogenous network formation → Partner selection + Decision making (Cooperation)
• Emergent social networks → High average clustering + Broad-scale heterogeneity

Third Model (Distributed Advice-Seeking for Resource Discovery):
• Life-experiences + Endogenous network formation → Similar-minded agents (appropriate role models)
• Strongly connected communities with similar preferences → Higher utility

Limitations

Generality of Adaptive rewards on Fixed interaction networks
• 2-PD on simple grid → Other classes of games (Hawk-Dove / Stag-Hunt / …)
• Age attribute → Heterogeneity → Other concepts? How do they encourage cooperation?
• Simple grid → Other fixed topologies? Effect of different neighbourhood structures

Generality of Adaptive rewards on Evolving Social Networks
• Dynamic payoffs → N-PD framework → Not satisfying! (limited parameter settings)
• Extensive analysis → Determine why it was not helpful / if it is helpful at all / how? (Ex./ bigger ranges of life-span / different time scales for update rules + evolution of the interaction network)

Realistic approaches for Advice-Seeking framework
• Generic model → Inspired by several distributed service provision systems
• Synthetic data → Set up a specific, controlled platform → Represent semi-realistic MAS → Evaluate performance of the ESL → Not a solution for a particular application!
• Exploit such techniques → real technological systems / real data sets / real users' preference profiles (binary preferences → Not realistic!)
• Dynamic environment → Dynamic relations / Users / Preferences / Resources?

Future work

• N-PD → fixed group sizes, similar for all agents. Dynamic group formation + heterogeneous sizes → different communities in the real world
• Advice-Seeking model → similarities with Recommender Systems. Different purpose here, BUT interesting to modify and apply in such a context → Comparison with other models
• Enhanced Social Learning → Imitation (basic cultural learning). Extend to other methods of MAS learning, ex./ Reinforcement Learning
• Evolutionary Game Theory + Advice-Seeking → Investigation domains. Potential domains (MAS): P2P / Mobile Ad-hoc Networks / Grid Computing
• Robustness of the proposed mechanisms → Different scales of dynamicity in real-world environments

Acknowledgment

1. Michael, Shanika, Adrian

2. Jens

3. Les, Ed, Leon, Liz, …

4. Agent lab members, Rebecca, …

5. Dept. Computer Sci / Uni Melb

6. Rahil, Leila, Parvin, Toktam, …

7. Lab colleagues (Saeed/Raymond/…)

8. …

Questions?

Thank you

References

1) D. Gambetta. Can We Trust Trust? In D. Gambetta, editor, Trust: Making and Breaking Cooperative Relations, pages 213--237. Basil Blackwell, 1988.
2) R. Ismail, A. Jøsang, and C. Boyd. A survey of trust and reputation systems for online service provision. Decision Support Systems, 43:618--644, 2007.
3) M. A. Nowak. Five rules for the evolution of cooperation. Science, 314:1560--1563, 2006.
4) R. Boyd and P. Richerson. The evolution of reciprocity in sizeable groups. Journal of Theoretical Biology, 132:337--356, 1988.
5) C. Facciorusso, S. Field, R. Hauser, Y. Hoffner, R. Humbel, R. Pawlitzek, W. Rjaibi, and C. Siminitz. A Web Services Matchmaking Engine for Web Services. In E-Commerce and Web Technologies, Lecture Notes in Computer Science, pages 37--49, 2003.
6) L. Nunes and E. Oliveira. Advice-exchange in heterogeneous groups of learning agents. In Proceedings of the second international joint conference on Autonomous agents and multiagent systems, pages 1084--1085, 2003.
7) P. Massa and P. Avesani. Trust-aware recommender systems. In Proceedings of the 2007 ACM conference on Recommender systems, pages 17--24, 2007.
8) J. M. Vidal. A Protocol for a Distributed Recommender System. In J. Sabater, R. Falcone, S. Barber and M. Singh, editors, Trusting Agents for Trusting Electronic Societies. Springer, 2005.
9) G. Hardin. The Tragedy of the Commons. Science, 162:1243--1248, 1968.
10) U. Wilensky. Modelling Nature's Emergent Patterns with Multi-agent Languages. In Proceedings of EuroLogo, 2002. NetLogo is a cross-platform multi-agent programmable modelling environment. See http://ccl.northwestern.edu/netlogo/.

Backup Slides

First Model cont.: Sensitivity to the magnitude of K

[Figure: (Het 1), update rules 1) and 2)]

First Model cont.: Sensitivity to the Life-span (λ_i)

[Figure: (Het 1)]

First Model cont.: Sensitivity to the replacement strategy

[Figure: (Het 1)]

Third Model cont.: Metrics

• Average utility
• Average error rate
• Efficiency

Third Model cont.: The influence of Heterogeneity

Finding similar-minded agents plays an important role. How does heterogeneity in |p_a| & |f_r| affect the performance of social agents?

|A| = (100, 300), |R| = 5000, |p_a| & |f_r| = (2, 3, 4, 5), T = 1000

[Figure: averaged accumulated utility]
