Deviating from Traditional Game Theory: Complex Variations in Behavior and Setting
Jun Park
A literature review submitted for Advanced Topics in Cognitive Science taught by Professor
Mary Rigdon assisted by Robert Beddor
School of Arts and Sciences
Rutgers University – New Brunswick
May 4, 2015
1: INTRODUCTION
1.1 Objective
Game theory has been a popular area of research due to its wide application in fields concerning social interactions between interdependent agents, and consequently it has been studied by countless economists, computer scientists, cognitive scientists, and military strategists. The purpose of this paper is to gather and explore recent research on advanced models of game theory, with emphasis on complex and incomplete-information scenarios in games, methods for predicting strategies, reinforcement learning, and the cost of computation and its effect on behavior.
1.2 Basic Background of Game Theory
Game theory is the analysis of interdependent relationships between competitive agents and the evaluation of strategies for dealing with these situations. Games between agents depend heavily on how each agent acts and reacts, and thus a decision-making algorithm that considers the potential decisions the opponent can make is essential to solving a game theory problem. An important aspect of game theory to note is that these agents are considered rational agents, meaning that they have clear preferences and always aim to choose the decision that most benefits themselves. Rational agents can be anything or anyone that can make decisions based upon the scenario and information given to them, whether automata (machines) or humans.
A classic example of a game theory problem is the prisoner's dilemma, first developed by Merrill Flood and Melvin Dresher in 1950. In the scenario, two suspected criminals, referred to as A and B, were caught for a minor offense such as vandalism and face a small amount of jail time. However, the investigator suspects that the criminals committed other
crimes previously and wants to convict them of a major offense such as robbery (assuming they did commit both crimes). The investigator tells each criminal his possible options: if both confess, both get three years; if one confesses while the other denies, the one who confesses gets one year and the other gets seven years; if both stay silent, both receive two years. Here, the optimal joint outcome is two years each, achieved if both stay silent. However, knowing the possible choices (and assuming these criminals are rational agents), each criminal will act according to his own self-interest. Given B's possible options (confess or deny), A will confess every time, since it works in his favor regardless of what B chooses to do: if B confesses, A can deny (receive seven years) or confess (receive three years); if B denies, A can deny (receive two years) or confess (receive one year). Thus, regardless of what the other criminal chooses to do, each criminal will always confess, and will not deviate from this action in any subsequent case, since it does not benefit him to do so. This outcome is called the Nash equilibrium, at which players have nothing to gain from deviating from their current strategy after considering the opponents' actions.
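The best-response reasoning above can be checked mechanically. A minimal sketch, with jail years treated as costs (lower is better); the function and variable names are ours, not standard notation:

```python
# Payoff matrix: years of jail for (A, B) given their actions.
PAYOFFS = {
    ("confess", "confess"): (3, 3),
    ("confess", "deny"):    (1, 7),
    ("deny",    "confess"): (7, 1),
    ("deny",    "deny"):    (2, 2),
}

def best_response_for_a(b_action):
    """A's best (lowest-cost) action, given B's action."""
    return min(["confess", "deny"],
               key=lambda a_action: PAYOFFS[(a_action, b_action)][0])

# Confessing is A's best response no matter what B does, so by symmetry
# (confess, confess) is the Nash equilibrium:
assert best_response_for_a("confess") == "confess"
assert best_response_for_a("deny") == "confess"
```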
1.3 Types of Game Theory and Common Terminologies
Different types of game theory can alter the amount of information shared, the extent to which one agent can influence another, and the possible rewards or consequences of each agent's actions. Game theory is typically split into two branches, cooperative and non-cooperative. In cooperative games the agents can communicate with one another, while in non-cooperative games they cannot. The prisoner's dilemma, as explained above, is a non-cooperative game, since the prisoners cannot communicate with each other. Other examples of games with a game-theoretic framework include Rock-Paper-Scissors, Ultimatum Games, Dictator Games, Peace War Games, Dollar Auction Games, and Trust Games, to name a few.
Another important game type to note is the negotiation game. Negotiation is a game in which each player tries to arrive at a mutually beneficial solution while still acting as a rational agent (and thus attempting to exploit the other agents). Several negotiation terms are used frequently in the sources reviewed in this paper and are briefly defined below.
An agent's preference, in economic terms, is how the agent values each alternative or outcome, based on the amount of utility the outcome provides or how much the agent feels rewarded with what it wants (e.g., happiness, satisfaction). Utility is how useful the outcome of the negotiation will be to the agent; in economics, it is comparable to how much people would pay for a company's product.
A test domain consists of two parts, as defined by Chen and Weiss (2015, p. 2295): competitiveness and domain size. Competitiveness describes the distance between the agents' positions on the issues: the greater the competitiveness, the harder it is to reach an agreement. Domain size refers to the number of possible agreements; as it grows, efficient negotiation tactics become crucial if the agent wants to resolve multiple issues.
The discounting factor is the rate at which the value of the potential profit in a negotiation decreases as time passes. The discounting factor starts at 1 and decreases toward 0, and can only reduce the maximum utility that can be gained from the negotiation. A typical illustration of discounting is that "a dollar today is worth more than a dollar tomorrow." The discounting factor becomes irrelevant for issues that do not concern the future (the factor stays at 1 for the whole negotiation).
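A minimal sketch of time discounting, assuming the common convention of normalized negotiation time t ∈ [0, 1] and discounted utility u · δ^t (the exact functional form varies by platform):

```python
def discounted_utility(utility: float, t: float, delta: float) -> float:
    """Utility of an agreement after time discounting.

    t     -- normalized elapsed time in [0, 1]
    delta -- discounting factor in (0, 1]; delta == 1 means no discounting
    """
    return utility * delta ** t

# With delta = 1 the passage of time never reduces utility:
assert discounted_utility(0.8, 0.5, 1.0) == 0.8
# With delta < 1 the same agreement is worth less the later it is reached:
assert abs(discounted_utility(0.8, 1.0, 0.5) - 0.4) < 1e-12
```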
Concession (giving ground toward an agreement) in complex negotiations depends strictly on the type of agreement and the amount of time that has elapsed since the negotiation began. As explained by Williams, Robu, Gerding, and Jennings (2011, p. 432), "there exists a trade-off between conceding quickly, thereby reaching an agreement early with a low utility, versus a more patient approach, that concedes slowly, therefore reaching an agreement later, but potentially with a higher utility." As such, discounting factors play a significant role in influencing the agents' behavior, depending on whether they intend a higher- or lower-utility outcome.
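This trade-off is often illustrated with a time-dependent concession tactic. The polynomial form below is a standard textbook family (in the style of Faratin-type tactics), not one of the specific strategies reviewed in this paper:

```python
def target_utility(t: float, u_min: float, u_max: float, e: float) -> float:
    """Time-dependent concession target.

    t -- normalized time in [0, 1]
    e -- concession exponent: e < 1 concedes slowly (a patient, "Boulware"
         style), e > 1 concedes quickly (a "Conceder" style)
    """
    return u_min + (u_max - u_min) * (1 - t ** (1 / e))

# At t = 0 the agent demands u_max; at the deadline it has conceded to u_min.
# A patient agent (e = 0.2) still demands far more at mid-negotiation than a
# quickly conceding one (e = 5.0):
patient = target_utility(0.5, 0.2, 1.0, 0.2)
hasty = target_utility(0.5, 0.2, 1.0, 5.0)
assert patient > hasty
```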
1.4 Evaluation of Efficiency of Proposed Strategies, Mechanisms, and Algorithms
Since the introduction of the Generic Environment for Negotiation with Intelligent multi-purpose Usage Simulation (GENIUS) by Hindriks et al. (2009), almost all new negotiation studies have evaluated their agents' effectiveness through that program. With GENIUS, experimenters can configure negotiation domains, preference profiles, agents (both human and automated), and scenarios, and receive analytic feedback such as Nash equilibria, visualizations of data, and optimal solutions. The purpose of GENIUS is to facilitate research on game theory applications in complex negotiations and practical environments.
2: RISING RESEARCH TOPICS
2.1 Model Prediction with Incomplete Information
Previous research on game theory applications consisted mostly of simple models or scenarios with predetermined information and preferences of opponent agents in restricted environments. Since applying game theory to dynamic, interacting games can be very difficult, researchers often over-simplified the games' conditions and environments to gather data on the effectiveness of game theory. However, these environments are seldom encountered in real-world situations, and "real-world decision-makers mistrust these over-simplifications and thus lack confidence in any presented results" (Collins and Thomas, 2012). Thus, recent studies have moved on to experiments with highly complex scenarios, in order to mimic behavior and gather experimental data on applied game theory in realistic environments. As defined by Chen and Weiss (2015, p. 2287), a complex multi-agent system makes three assumptions about the setting. First, agents have no prior knowledge or information about their opponents, such as their preferences or strategies, and thus cannot use external sources to configure or adjust their strategies. Second, the negotiation is time-restricted and the agents do not know when it will end; thus the agents must consider their remaining chances to exchange offers, and must also recognize that the profit or reward from reaching an agreement decreases as time passes, by a discounting factor. Third, each agent has its own private reservation value, a threshold below which an offer will not be accepted, and the agent can secure at least that amount by ending the negotiation before the time-discounting effect diminishes the profit to the reservation value. These assumptions create complicated interaction models between the agents, but also serve to create more accurate datasets for realistic situations.
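Under these assumptions, an agent's accept/reject decision might be sketched as follows. All names and the particular acceptance rule here are illustrative, not taken from Chen and Weiss (2015):

```python
def should_accept(offer_utility: float, t: float, delta: float,
                  reservation: float, target: float) -> bool:
    """Illustrative acceptance rule for a complex negotiation.

    Accept when the time-discounted value of the offer is at least the
    agent's private reservation value and meets its current concession
    target (all inputs are utilities in [0, 1]; t is normalized time).
    """
    discounted = offer_utility * delta ** t
    return discounted >= reservation and discounted >= target

# A strong early offer clears both thresholds; a weak late offer,
# eroded by discounting, falls below the reservation value:
assert should_accept(0.9, 0.5, 0.8, 0.3, 0.6)
assert not should_accept(0.4, 0.9, 0.5, 0.3, 0.2)
```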
The study of complex, multi-agent negotiations has been a growing area of research in game theory since the inception of negotiation tactics (i.e., Raiffa, 1982). A difficult problem that frequently arises in complex scenarios is how to compensate for the lack of prior knowledge about the opponent agent when choosing one's strategy. Without knowing the opponents' strategies, preferences, or reservation values, an agent cannot rely on a dominant strategy to gain an advantage and instead must behave adaptively throughout the game.
Consequently, recent research has focused on finding methods to model the opponent's behavior within the negotiation itself, along with its preferences and reservation
values. For instance, Lin, Kraus, Wilkenfeld, and Barry (2008) proposed an automated negotiating agent that can achieve significantly better results, in its agreements and in its utility, than a human agent playing the same role. Lin et al. (2008) based the agent's model on Bayesian learning algorithms that refine a reasoning model through a belief- and decision-updating mechanism. For the purposes of this paper, the algorithm will be explained from a greatly simplified perspective. The main algorithm for the agent is split into two major components: "the decision making valuation component" and the "Bayesian updating rule component" (Lin et al., 2008, p. 827). The former takes as input the agent's utility function, i.e., a function to calculate utility or potential, as well as a conjecture about the type of opponent the agent is facing. Both inputs are used to generate or accept an offer, based on how they compare to a set of probabilities describing whether the rival will accept them, and on how the potential reward and loss relate to the reservation value. The latter component updates the agent's beliefs about the opponent, as well as about itself, based on Bayes' theorem of conditional probabilities. The two combine into a working mechanism that shows consistently successful results. Lin et al. (2008) tested their mechanism in three experiments in which they matched their automated agent
against humans, against a different automated agent following an equilibrium strategy, and against itself. The results demonstrate the agent's superior performance against humans, in terms of utility values, in every scenario tested, and thus the researchers claimed to have succeeded in developing an agent that can negotiate efficiently with humans under incomplete information.
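The Bayesian updating rule component can be sketched at a high level. The sketch below assumes a finite set of candidate opponent profiles with known offer likelihoods, which is a simplification of Lin et al.'s actual mechanism:

```python
def update_beliefs(beliefs, likelihoods):
    """One step of Bayes' rule over a finite set of opponent profiles.

    beliefs     -- {profile: prior probability}
    likelihoods -- {profile: P(observed offer | profile)}, assumed to be
                   supplied by each candidate profile's offer model
    """
    posterior = {p: beliefs[p] * likelihoods[p] for p in beliefs}
    total = sum(posterior.values())
    return {p: v / total for p, v in posterior.items()}

beliefs = {"tough": 0.5, "soft": 0.5}
# A very generous offer is far more likely to come from a "soft" opponent,
# so the belief mass shifts toward that profile:
beliefs = update_beliefs(beliefs, {"tough": 0.1, "soft": 0.6})
assert beliefs["soft"] > beliefs["tough"]
```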
A similar line of research has been pursued by others to develop agent mechanisms that can learn the opponent's strategy in complex negotiations. Williams, Robu, Gerding, and Jennings (2011) aimed to develop a profitable concession strategy using Gaussian processes to find the optimal expected utility in a complex (as defined above) and realistic setting. The algorithm for the strategy is separated into three stages: 1) predicting the opponent's concession, 2) setting the concession rate for optimal expected utility, and 3) selecting an offer to generate or choose (Williams et al., 2011, p. 434). The estimation of the opponent's concession, or expected utility, is handled by a Gaussian process with covariance and mean functions, which provides predictions and a measure of confidence in each prediction within real-time constraints. The process is evaluated over time windows of fixed duration to reduce both the effect of noise and the amount of input data (Williams et al., 2011, p. 434). After the prediction is generated, the strategy uses the Gaussian process to set the concession rate by estimating the probability that an offer of a given utility will be accepted by the opponent. Once a target utility is generated from the concession rate, the strategy creates a range around the target utility, for instance [µ − 0.025, µ + 0.025], and chooses a random value within the range as an offer. If no offer is available within the range, the range is widened until an offer is found (Williams et al., 2011, p. 435). The results, compared with the performance of the negotiating agents from the 2010 Automated Negotiating Agents Competition, indicate that the new strategy is superior in terms of accuracy and efficiency. In concrete numbers, an analysis
of self-play scores shows that the new strategy achieved 91.9 ± 0.7% of the maximum possible score, while the next-best distinct score was achieved by IAMhaggler2010 (the previous leading strategy, developed by Williams et al. in 2010) with 71.1 ± 3.4% of the maximum score (Williams et al., 2011, p. 437).
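The offer-selection stage described above lends itself to a short sketch. The widening increment `step` is an assumed detail, not taken from the paper:

```python
import random

def select_offer(target, available_utilities, width=0.025, step=0.005):
    """Pick an offer near the target utility (sketch of the range-based
    selection described by Williams et al., 2011).

    Chooses uniformly among available offers whose utility lies within
    [target - width, target + width]; if none qualify, the window is
    widened until one does. Assumes available_utilities is non-empty.
    """
    while True:
        in_range = [u for u in available_utilities
                    if target - width <= u <= target + width]
        if in_range:
            return random.choice(in_range)
        width += step  # widen the window until some offer qualifies
```

For example, `select_offer(0.7, [0.5, 0.71, 0.9])` must return 0.71, the only available offer inside the initial ±0.025 window.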
However, in these previous attempts to solve complex negotiation scenarios, there are cases where unacceptable assumptions were made, which may have overestimated the agents' true effectiveness in real-life negotiations. In their most recent work, Chen and Weiss (2015) pointed out two issues that were not fully addressed in previous studies. One is that the way agents learn their opponents' strategies is flawed; the other is that no effective decision-making algorithm exists for deciding when to concede to the opponent's offers in complex negotiations. Previous researchers underplayed or overlooked the first issue because their modeling methods were computationally cost-ineffective (which will be discussed in another section) or because they made impractical assumptions about the opponent before the negotiation began. For example, in the research performed by Lin et al. (2008), the proposed Bayesian learning method started from out-of-scope assumptions, as Chen and Weiss (2015, p. 2288) pointed out that the experiment "assumed that the set of possible opponent profiles is known as a priori." This convenient assumption allowed comparative analysis within the negotiation to determine which profile the opponent most resembles, but it goes against the conditions of a complex scenario, in which no prior knowledge about the opponent is brought into the negotiation. Furthermore, the method fails to handle profiles not contained within the premade set, and thus only produces results based on whichever set profile the opponent most resembles. Still, for realistic applications, the model proposed by Lin et al. (2008) can be deemed appropriate, since a human agent could have prior experience
with other agents and have learned their possible profiles. Hao et al. (2014, p. 46) also pointed out that Saha et al. (2005) built their strategy on the assumption that agents negotiate over a single item, rather than over multiple items as in practical negotiation settings, which diminished the complexity of the setting.
In addition, regarding the experiment by Williams et al. (2011), Chen and Weiss (2015) noted that while optimizing one's own expected utility based on the opponent's strategy produces strong results, the strategy suffers from a problem of "irrational concession." Most likely, the time-window factor creates opportunities for error; as Williams et al. (2011, p. 434) stated, a "small change in utility for the opponent can be observed as a large change by our agent." Thus, the agent can misinterpret the opponent's intentions (especially against sophisticated agents), which can lead to high-risk behavior that potentially backfires on the agent by conceding too quickly.
Therefore, Chen and Weiss (2015) argued that current methods of learning opponent models are too inflexible or inefficient, and proposed a new adaptive model, OMAC* (opponent modeling and adaptive concession), that carefully accounts for all of the aforementioned errors. The OMAC* model contains an improved version of the Gaussian process algorithm, with the addition of discrete wavelet transform (DWT) analysis, in its opponent-modeling component. Discrete wavelet transform analysis allows, among other things, noise filtering and function approximation (Ruch and Van Fleet, 2009) by repeatedly decomposing a signal into its frequency components. In the OMAC*, DWT allows the agent to extract the "previous main trends of an ongoing negotiation," with smoother curves and more accurate mean approximations as time elapses (Chen and Weiss, 2015, p. 2290). Combined with Gaussian processes, DWT analysis has shown some of the strongest capability to predict an opponent's behavior in real time.
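To illustrate the kind of smoothing DWT provides, here is a one-level Haar wavelet shrinkage sketch (illustrative only; the OMAC* implementation is more sophisticated and multi-level):

```python
def haar_denoise(signal, threshold):
    """One-level Haar wavelet shrinkage.

    Split the signal into pairwise averages (the trend) and pairwise
    differences (the detail), discard small details as noise, then
    reconstruct. Assumes len(signal) is even.
    """
    trend = [(a + b) / 2 for a, b in zip(signal[::2], signal[1::2])]
    detail = [(a - b) / 2 for a, b in zip(signal[::2], signal[1::2])]
    detail = [d if abs(d) > threshold else 0.0 for d in detail]
    out = []
    for t, d in zip(trend, detail):
        out += [t + d, t - d]  # inverse transform of each pair
    return out

# Small pairwise wiggles below the threshold are flattened into the trend;
# larger structure (the jump from ~1.5 to ~5.5) survives:
assert haar_denoise([1.0, 2.0, 5.0, 6.0], 0.6) == [1.5, 1.5, 5.5, 5.5]
```

With the threshold at zero, the transform reconstructs the input exactly, which is the sense in which the trend/detail split loses no information.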
As for the OMAC*'s decision-making component, the researchers implemented an adaptive concession-making mechanism that relies on the learned opponent model. The component governs how the agent behaves in response to counter-offers and also decides when the agent should withdraw from the negotiation. The results (Chen and Weiss, 2015, p. 2298), obtained through experimental and empirical game-theoretic analysis, suggest that the OMAC* outperforms all previous agents, including IAMhaggler2011 (the newer agent by Williams et al.), by a large margin.