Knowledge and Strategy-based Computer Player for Texas Hold'em Poker
Ankur Chopra
Master of Science
School of Informatics
University of Edinburgh
2006
I would like to thank Dr. Jessica Chen-Burger for her overwhelming support and help
throughout the life-cycle of this project, and for the late nights she spent playing my Poker Players. I would also like to thank Mr. Richard Carter for his insight into the workings of
some of the Poker players, and all the authors of the research quoted in my bibliography,
especially the creators of Gala, Loki, Poki and PsOpti.
I would also like to thank my parents, who have always been there for me, and who inspire me
every step of the way. And finally, I would like to acknowledge the calming contribution of
my lab-fellows, without whom completing this dissertation couldn't have been nearly this
much fun.
4.2.3 - General Strategic Behavior
4.3 - Anki – V1
4.3.1 - Strategy based player
4.3.2 - Overview of functioning and method
4.3.3 - Probability realisation of all possible starting hands
4.3.4 - Grouping form of strategy
4.3.5 - Similarities and differences from human beings
Perfect Information Games have a well-defined optimal move, i.e. there is always a move which is
at least as good as every other move. In addition to this, if the opponent gains knowledge of this
move, it would make no difference to the current play, as it is, by definition, the optimal move.
This property is exploited by search algorithms to find that optimal move in the game tree.
2.1 - Imperfect Information games
“Although game-tree search works well in perfect-information games, there are problems in trying to use it for imperfect information games such as bridge (and poker). The lack of knowledge about the opponents' possible moves gives the game tree a very large branching factor, making the game tree search infeasible.” [4]
Imperfect Information Games are played with a constant knowledge-gap between players, with
each having only partial knowledge of the game state. Unlike Perfect Information Games, a
deterministic strategy cannot be utilised in these games, as that would allow the opponent to infer the hidden part of the game state and exploit it. For this reason, optimal strategies
in Imperfect Information Games can be described as a random combination of strategies with
optimal evaluation of the partially known game state.
“Kuhn ([19]) has shown for a simplified poker game that the optimal strategy does, indeed, use randomization.” [1]
2.2 - Poker history
It is evident from the previous citation that people have been interested in computer poker players
for a very long time. Apart from Kuhn, illustrious mathematicians and economists have also
worked on theoretical and practical solutions for poker. This research field dates back to John von
Neumann and John Nash, who used simplified poker to illustrate fundamental principles
[16][17][18].
With such a long history of focus on Imperfect Information Games, especially Poker, it would be
expected that the current computer players would at least be of commendable standard. However, this is not the case. The most prolific players have all been created under the same umbrella of Darse
Billings' work. Also, the best full-game Poker players are intermediate quality at best. The
This statement of Koller and Pfeffer has been repeated many times in the various papers by Darse
Billings, e.g. [10].
2.4 - Loki
Loki was one of the first ever full-game computer poker players. It was developed by Darse Billings et al., and is documented in [11]. It was created in two stages. Both versions have an
expert knowledge and rule-based engine hard-coded into them, i.e. the experience of Darse
Billings as a master poker player facilitated the formation of a form of game tree. This game tree
consisted of scenarios and strategies suggested by the expert, and thus the game engine could
brute-search through these strategies to obtain a near-optimal or random solution.
The first version of the program relied solely on this expert knowledge to play against other
players. It can be argued whether this was truly an AI player or just the code representation of a
particular master player. Strategies were given priority rankings and were selected
randomly using weights, which allowed play to be relatively random.
The first version, created in 1998, was shown to perform well solely against beginners, as even
with the expert system, certain situations resulted in near deterministic working. As discussed
earlier, any form of fixed strategy by an opponent can be used by the player to their advantage,
and this occurred with Loki when it played against more experienced players. Figure 2 shows the
basic Loki Architecture, and its functions.
The second version, updated in 1999, added more computing concepts into the player. It,
however, still kept the core expert engine:
“Loki uses expert knowledge in four places:
1. Pre-computed tables of expected income rates are used to evaluate its hand before the pre-flop, and to assign initial weight probabilities for the various possible opponent hands.
2. The Opponent Modeler applies re-weighting rules to modify the opponent hand weights based on the previous weights, new cards on the board, opponent betting actions, and other contextual information.
3. The Hand Evaluator uses enumeration techniques to compute hand strength and hand potential for any hand based on the game state and the opponent model.
4. The Bettor uses a set of expert-defined rules and a hand assessment provided by the Hand Evaluator for each betting decision: fold, call/check or raise/bet.” [13]
In the above statement, there is a mention of 'hand strength' and 'hand potential'. Hand Strength is
the probability that the cards held by a player are the best possible cards in the game. On the other
Poki, the step up from Loki, introduced the strategy of Simulation and Enumeration to the
computer poker players. It was created in 1999 – 2000, once again by Darse Billings et al. [10].
Poki is also currently the best full-game Texas Hold'em Poker player, the variant of poker under
consideration in this project. Once again, more information is available in Chapter 3 regarding
Texas Hold'em.
“Poki supports a simulation-based betting strategy. It consists of playing out many likely scenarios, keeping track of how much each decision will win or lose. Every time it faces a decision, Poki invokes the Simulator to get an estimate of the expected value (EV) of each betting action. ... A single trial consists of playing out the hand from the current state of the game through to the end. Many trials produce a full information simulation.” [10]
Enumeration, on the other hand, refers to the updating of the current belief or partial information
state with data received through interaction with the outside world. In the case of any card game, this can be implemented by increasing the estimated chance that the
opponent holds good cards if the opponent continues to bet heavily during the game. This requires some
form of an opponent modeler, which tries to guess an opponent's hand or strategy from the
opponent's actions.
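The belief update described here can be sketched as a simple re-weighting step. This is only a minimal illustration and not the actual rule set of Loki or Poki: the hand classes, the function name `reweight` and every number below are assumptions chosen for the example.

```python
def reweight(weights, likelihood):
    """Scale each prior weight by how likely the observed action is for that
    hand class, then renormalise so the weights sum to 1 again."""
    updated = {hand: w * likelihood[hand] for hand, w in weights.items()}
    total = sum(updated.values())
    return {hand: w / total for hand, w in updated.items()}


# Hypothetical prior: nothing is known about the opponent's hand yet.
prior = {"strong": 1 / 3, "medium": 1 / 3, "weak": 1 / 3}

# Hypothetical chance of seeing a raise from each hand class.
raise_likelihood = {"strong": 0.6, "medium": 0.3, "weak": 0.1}

# After one observed raise, belief shifts towards the strong hands.
posterior = reweight(prior, raise_likelihood)
print(posterior)
```

Repeating the update after every observed action lets heavy betting accumulate into a strong belief that the opponent holds good cards.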
Figure 4 shows the architecture of Poki. It can be seen that it is very similar to that of Loki – 2 in
Figure 3. The most significant upgrade, and the one that resulted in a different name, concerned the opponent
modeling.
“A game-theoretic solution produces a sophisticated strategy that is well-balanced in all respects. It is also safe and "robust", because it is guaranteed to obtain the theoretical (optimal) value of the game against *any* other strategy.” [1]
This value allows the player to play a pseudo-optimal game throughout its tournament. As the
value is guaranteed against any strategy, this also holds true for randomised, aggressive and bluffing game-play,
all of which exist to a very high extent in master level poker play. Figure 5 shows PsOpti's
performance against a master level player, 'the count'. The unit of measure used here is 'small bets won',
which in this case corresponds to units of 10 dollars.
Figure 5. “the count's” performance against PsOpti
However, this player is severely restricted in its game play.
“... abstraction techniques are combined to represent the game of 2-player Hold’em, having size O(10^18), using closely related models each having size O(10^7).” [2]
Some of these abstraction techniques remove the possibility of PsOpti being utilised in the full-game
field in its current state. For example, one of the abstractions removes one of the betting
rounds from the game, and another restricts the play to only a two-player game. A two-player
game can be viewed as a simplified version of a full game, wherein up to ten people can be
playing at the same time.
This can still be seen as a great beginning, as all the abstractions are mostly computational
reductions, and such players can easily be scaled up once they have proven their might.
“The drawback is that this type of strategy is fixed -- it can't adapt to the style of a particular opponent. Although it will break even against any opponent, it might only
win at a slow rate against a weak or predictable player.” [1]
Another manner of interpreting this statement is that the game-theoretic player evaluates its
current state and the state of the game, to try and obtain a near-optimal strategy. It has no
opponent modeling and thus cannot take advantage of human inexperience or mistakes. Opponent
Modeling cannot be expressed in the same manner as a game-theoretic approach; thus the
creators of PsOpti are currently looking into a player which plays game-theoretically until it develops
enough knowledge about an opponent to switch to exploitative, nominally sub-optimal strategies. These
are strategies which do not necessarily give the best result against an optimal player, but are the
expertly determined best response to the manner in which the opponent is playing.
Many of the abstractions used in PsOpti's creation are also being used to create the program of
this project. This is because these abstractions offer a smaller set of constraints to satisfy, which is
valuable in a time constrained project such as this. Also, PsOpti's creation shows that these
abstractions are merely computational simplifications of the rules, and offer a faster cycle of design,
prototyping and results which are still relevant to the full-game analysis.
The next chapter introduces the game of poker, Texas Hold'em, and its strategies and complexity.
3.2 - Sequence of each game (specific to Texas Hold'em)
“ ... the game of Texas Hold’em, the poker variation used to determine the world champion in the annual World Series of Poker. Hold’em is generally considered to be the most strategically complex poker variant that is widely played in casinos and card clubs. It is also convenient because it has particularly simple rules and logistics.” [10]
The popularity of Poker has already been discussed in Chapter 1, so the choice of Texas Hold'em
should be obvious, as it is seen as the most popular form of the game.
Texas Hold'em starts with all the players receiving a two-card initial hand. These cards are hidden
from all the players bar the one to whom they are dealt. With deception or future potential of
cards in mind, a betting round is held. The exact semantics of a betting round are explained in the
next sub-section. At the end of the betting round, three community cards are dealt face-up. In computer play, this is represented by making the knowledge of the three cards available to both
players. These three cards are called the 'Flop'.
The Flop is followed by another round of betting, at the end of which another community card is
shown. This card is known as the 'Turn'. This is followed by another round of betting and then
another community card called the 'River' is revealed. The final betting round takes place
afterwards, and if more than one person remains till the end, the seven cards are checked for the
best hand of five. The winner takes all the money collected in the betting rounds of that game.
3.3 - Betting Rounds
The aim of the betting round is to collect equal money from all players before proceeding to the
next stage, be it the revealing of community cards or a showdown. The options available to a
player at any point in the round are to fold and to bet. Folding results in immediate forfeit of the money
in the pot. Bad as it may sound, it is usually better to fold a hand which you are quite certain
would not win, rather than lose money on it.
Betting puts a 'bet amount' into the pot. In this project this bet amount is fixed at 10 units of
money. There is another option available to the player, i.e. to check. Checking can be done when
there has been no money put down at the start of a betting round. So, at the start of any betting
round, the person with the first turn has the option to check, and after that the second person has
the option to check, bet or fold.
The final option available to a player is that of a raise. Raising puts 20 units of money into the pot
and is almost the opposite of checking, as it is allowed only when there has already been a bet on
the table. Also, raising is restricted to a maximum of three times per player.
Betting rounds can follow many patterns from the choices available above. For a two player game,
an example is both players checking, in which case neither player puts any money into the pot.
Another is that of both players betting, whereby both put in 10 units each. More complex
betting rounds can also occur, such as check, bet, raise, raise, bet. Here, the first player checks,
only to have the second player bet. The first player now has the option of betting, folding or
raising, and chooses to raise. The second player re-raises, and finally the first player bets to bring the contribution of both players at the end of the betting round to 30 units each.
A betting round terminates as soon as a person folds, in which case the game ends, or when a
person matches the opponent's contribution to the pot by checking or betting. Raising is usually
used to 'raise-the-stakes'.
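The arithmetic of the example betting rounds can be checked with a short tally. This is a minimal sketch that assumes two players acting alternately and the fixed amounts stated in this section (10 units for a bet, 20 for a raise, nothing for a check). The function name `contributions` is hypothetical, and the sketch does not validate that a sequence is legal.

```python
BET, RAISE = 10, 20
AMOUNT = {"check": 0, "bet": BET, "raise": RAISE}


def contributions(actions):
    """Return [player 1 total, player 2 total] for one betting round.

    Players alternate, starting with player 1. A fold ends the round
    immediately and adds nothing to the pot.
    """
    totals = [0, 0]
    for turn, action in enumerate(actions):
        if action == "fold":
            break
        totals[turn % 2] += AMOUNT[action]
    return totals


print(contributions(["check", "check"]))                         # [0, 0]
print(contributions(["bet", "bet"]))                             # [10, 10]
print(contributions(["check", "bet", "raise", "raise", "bet"]))  # [30, 30]
```

The last line reproduces the complex round from the text: player 1 puts in 0 + 20 + 10 and player 2 puts in 10 + 20, so both end the round at 30 units.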
3.4 - Winning combinations
The strength of a winning five-card hand is determined by the table shown in Figure 6. The top-
most 'Royal Flush' is considered the best hand in the game, whereas a 'High Card' is the worst.
When determining a winner at the showdown, the player with the pattern which is highest on the
table in Figure 6 wins.
Figure 6. Winning combinations in Texas Hold'em Poker
Certain circumstances require further evaluation to find the winner of a hand. In such a case, the
entire best-five hand is compared. For example, if by some strange luck the community cards are the
four Jacks and an 8, then the best five-card hand for each player is going to be the four Jacks, as
a royal or straight flush is not possible, followed by the next highest card. It is this card that
determines the winner of the game. If Player 1 has King-Four, and Player 2 has Queen-Ten, then
the winner is Player 1, as his 'Four Jacks with King kicker' beats Player 2's 'Four Jacks with Queen
kicker'. As with the kicker in Four of a Kind, the second or third kickers are sometimes compared to
determine the winner, but never beyond the best-five hand of a player.
Draws occur in poker usually when both players 'play-the-board', i.e. the best-five hand for both
players is actually the community cards. In this case, as the ranking of all the five cards is the
same for both players, the pot is split equally between the two players.
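The kicker and play-the-board cases can be reproduced with a small best-five evaluator. This is a sketch using the standard poker hand ranking, not the evaluator written for this project. The card encoding (rank character followed by suit character, e.g. 'JS') and the names `rank_five` and `best_hand` are assumptions. A Royal Flush is treated simply as the ace-high straight flush.

```python
from collections import Counter
from itertools import combinations

RANKS = "23456789TJQKA"  # position in this string is the card's strength


def rank_five(cards):
    """Return a comparable key for exactly five cards."""
    values = sorted((RANKS.index(c[0]) for c in cards), reverse=True)
    counts = Counter(values)
    # Most frequent value first, ties broken by strength: quads before kicker.
    ordered = tuple(sorted(counts, key=lambda v: (counts[v], v), reverse=True))
    shape = sorted(counts.values(), reverse=True)
    flush = len({c[1] for c in cards}) == 1
    straight = len(counts) == 5 and values[0] - values[4] == 4
    if values == [12, 3, 2, 1, 0]:  # A-2-3-4-5 plays as a five-high straight
        straight, ordered = True, (3, 2, 1, 0, -1)
    if straight and flush:
        category = 8  # straight flush (ace-high is the Royal Flush)
    elif shape == [4, 1]:
        category = 7  # four of a kind
    elif shape == [3, 2]:
        category = 6  # full house
    elif flush:
        category = 5
    elif straight:
        category = 4
    elif shape == [3, 1, 1]:
        category = 3  # three of a kind
    elif shape == [2, 2, 1]:
        category = 2  # two pair
    elif shape == [2, 1, 1, 1]:
        category = 1  # pair
    else:
        category = 0  # high card
    return (category, ordered)


def best_hand(hole, board):
    """Best five-card key out of the seven cards available to a player."""
    return max(rank_five(five) for five in combinations(hole + board, 5))


board = ["JS", "JH", "JD", "JC", "8S"]
player_1 = best_hand(["KS", "4D"], board)  # four Jacks, King kicker
player_2 = best_hand(["QH", "TC"], board)  # four Jacks, Queen kicker
print(player_1 > player_2)
```

Because the key only ever describes five cards, comparison never looks beyond a player's best-five hand, and two players who both play the board produce equal keys.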
3.5 - Basic player/strategy types
“There are several different ways to categorise the playing style of a particular player. When considering the ratio of raises to calls a player may be classified as aggressive, moderate or passive (conservative). Aggressive means that the player
frequently bets or raises rather than checking or calling (more than the norm), while passive means the opposite. Another simple set of categories is loose, moderate and tight. A tight player will play fewer hands than the norm, and tend to fold in marginal situations, while a loose player is the opposite. Players may be classified as differently for pre-flop and post-flop play” [3]
Pre-flop refers to the betting round held before the flop is revealed, whereas post-flop refers to the
rest of the game after the display of the flop. A differentiation is usually made between these
stages as the amount of information change is very great. Initially, each player has knowledge of 2
of the 4 cards that have been dealt, and after the flop, each player has knowledge of 5 of the 7
cards in play.
The different strategies mentioned above also result in a specific form of reaction from the
opponent. Games against aggressive or loose players would be worth a lot more, and at the same
time, raising by an aggressive player may not always mean that the player has a good hand. On
the contrary, raising by a tight or conservative player should be considered with greater concern.
Apart from these basic player types, there are also known to exist many complex strategies which
make the game of Poker both interesting and exceedingly hard.
3.6 - Advanced strategies and Poker complexity
“Poker is a complex game, with many different aspects, from mathematics and
hidden information to human psychology and motivation. To master the game, a player must handle all of them at least adequately, and excel in most.” [10]
This is the task a computer player must accomplish in order to excel at the game of Poker.
Quite clearly, this is a huge task, and it probably will not be accomplished in the next few
years. Mostly this is due to the inability of computer players to have a gut instinct, or to
rapidly change strategy to combat another's. This topic is discussed further in Future
Work in Chapter 6.
Computing players have to deal with a lot of advanced strategies like check-raising, whereby a
false impression is imparted on the opponent by checking on a good hand to raise when the turn
comes back to the player. This would usually cause the opponent to at least respond with a bet,
and thereby allow the current Player to extract more money from the opponent. In addition to this,
a check-raise may be made only in order to scare an opponent, as it is a well-known aggressive
strategy.
The best player to date, i.e. PsOpti, incorporates no opponent modeling.
“It is important to understand that a game-theoretic optimal player is, in principle, not designed to win. Its purpose is to not lose. An implicit assumption is that the opponent is also playing optimally, and nothing can be gained by observing the opponent for patterns or weaknesses.” [2]
It is in the face of these challenges that this project hopes to show a brave front and come up with
some important conclusions that may assist in furthering the field of AI poker players.
3.7 - Abstraction Techniques of 2-Person Bet-Limit Poker
The form of poker being considered for design and evaluation is that of two-player, bet-limit
Texas Hold'em Poker. These abstractions have been made as they allow a player to be built under
the required time constraints, and yet have conclusions which are applicable to full-scale poker
players.
Bet-limit poker is a form of poker whereby each bet is of a set limit, i.e. 10 money units in this
case. Texas Hold'em tournaments are usually held with no-limit poker, whereby a player can
commit all of his/her money on the first bet of the first game itself. Bet-limit poker provides both
a regulated game and a beginner level of complexity, which matches the scope of this
project.
scenario of a tournament is lost in two-player poker. For example, every player that is required to
put in a big blind also gets to be the final player to bet in that betting round. Thus the
pros and cons balance each other out, a system which would be redundant at a smaller
table of two-player poker. Also, the program designed for this project has very limited opponent
modeling, which makes changing the sequence of play unnecessary, as a player's strategy
does not affect the betting decision of the AI Player. For this reason, the blind system has been
replaced with an ante. An ante is a fixed amount of money contributed by each player at the start
of each game, for the same reason as blinds. It is worth one bet amount, i.e. 10 units of money in
this case.
Figure 7 shows a clear view of the playing sequence of each game, in the form of cards and
Betting Rounds. The Start represents the submission of the ante; this is followed by the individual
hands being allocated. The next step on the board is the revealing of the Flop; however, there is a
betting round, known as the Pre-Betting Round, that is played before the Flop is revealed. Similarly, the Post-Flop and Post-Turn Betting Rounds sit between the Flop and Turn, and the Turn and the
River respectively. The game continues on to the Post-River Betting Round, which is
sometimes also called the Final Betting Round. The final step, i.e. the Showdown, represents the
calculation of the winner(s) of the game and the distribution of the pot to the winning player(s).
Basic player strategies have been discussed in Section 3.5, and the keywords such as aggressive,
conservative, tight and loose will be used very frequently in this thesis. These keywords are used
to describe the playing strategies of a poker player, and the basic aim of such a player is to
randomise these strategies as much as possible. Sticking to just one of the strategies can allow the
opponent to recognise it and respond accordingly. The first step towards finding a near-optimal
poker strategy is to create poker players which utilise these basic strategies strictly or randomly,
and then to compare their performances. The comparison is given in Chapter 5, Testing and
Evaluation, and the detail of the players created is provided below.
Most of the previous research in the field also points towards this form of project cycle, whereby
known bad or random players are created, and newer versions of the actual AI players are played
against these players to obtain performance data. [2]
The single most important decision in Poker can be stated as “knowing when to play your
hand”. The keywords associated with this feature are tight and loose, where tightness signifies
fewer hand plays and loose signifies a more liberal approach. Based on these keywords, two
players were created.
The first type of player randomly chose between acting loose and tight when it was its turn to act.
This player was named 'Random - 1'. In addition to having a randomised betting strategy, it also
had a randomised strategy to determine its aggressiveness. So, in theory, Random - 1 was a completely random player, which could fold, bet, check or raise at any time, the final two options
being restricted to the relevant circumstances.
The second type of player chose a randomised aggression but had a strictly loose policy. It played
every hand that it was dealt, randomly choosing how much money it wanted to bet on it. This
player is called 'Random - 2'. As its policy forces it to be completely loose, it never
folds a hand, and bets and raises quite frequently. Thus, it is also seen as a more aggressive player
by humans.
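The two baseline players can be sketched in a few lines. This is an illustrative reading of the descriptions above rather than the project's code. It assumes that actions are drawn uniformly from the legal ones and that the legality rules are those of Section 3.3 (check only before any bet, raise only after a bet, at most three raises per player). The function names are hypothetical.

```python
import random


def legal_actions(bet_on_table, raises_made, max_raises=3):
    """Actions open to a player under the betting rules of Section 3.3."""
    actions = ["fold", "bet"]
    if not bet_on_table:
        actions.append("check")   # checking needs an empty table
    elif raises_made < max_raises:
        actions.append("raise")   # raising needs an existing bet
    return actions


def random_1(bet_on_table, raises_made, rng=random):
    """Random - 1: any legal action, including folding (uniform choice assumed)."""
    return rng.choice(legal_actions(bet_on_table, raises_made))


def random_2(bet_on_table, raises_made, rng=random):
    """Random - 2: strictly loose, so folding is removed from the options."""
    options = [a for a in legal_actions(bet_on_table, raises_made) if a != "fold"]
    return rng.choice(options)


rng = random.Random(0)
print([random_1(True, 0, rng) for _ in range(5)])
print([random_2(True, 0, rng) for _ in range(5)])
```

Removing the fold option is what makes Random - 2 appear more aggressive: every decision it makes either matches or increases the money in the pot.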
The evaluation before each betting round is done by looking at the 'type of hand' that the player
has. 'Type of Hand' here refers to both the current hand strength and the future potential to have a
strong hand. This is done by grouping together similar 'types of hands'. There are check, bet and
raise 'buckets' or groups, and by matching the current hand to the grouping, the final betting
strategy is decided.
4.3.2 - Overview of functioning and method
“The most important method of abstraction for the computation of our pseudo-optimal strategies is called bucketing. This is an extension of the natural and intuitive concept that has been applied many times in previous research [22][23][14]. The set of all possible hands is partitioned into equivalence classes (called buckets or bins). A many-to-one mapping function determines which hands will be grouped together. Ideally, the hands should be grouped according to strategic similarity, meaning that they can all be played in a similar manner without much loss in EV (Expected Value).” [2]
Before the working of the code and player is understood, there is a need to express the method by
which the above mentioned buckets have been created. The most difficult time to correctly judge
a betting strategy is that of the pre-flop betting round. This is the time when the player has the
least percentage of information available to him/her, only 50%, as only two of four cards are
visible, compared to the end, where 7 of the 9 cards are known to the player.
However, playing strategies for the initial hand can be determined as well. This can be done through
simulation and expert knowledge. [3] provides an extensive table for comparison of performance
of opening hands; however, it was designed specifically for Loki, and thus uses certain
unspecified changes and heuristics. A similar table specific to this project needed to be created,
with estimated performance values of opening hands. These performance values were used to
determine the bucket or group to which the hand would be assigned. More details regarding this
simulation are given in the next subsection, 4.3.3.
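The many-to-one mapping from a performance value to a bucket can be sketched as a threshold lookup. The check, bet and raise bucket names come from the text, but the cut-off values, the 0 to 100 performance scale and the function name `bucket_for` are assumptions made purely for illustration. They are not the values used in this project.

```python
from bisect import bisect_right

# Hypothetical cut-offs on an estimated performance value in the range 0-100.
THRESHOLDS = [40.0, 60.0]
BUCKETS = ["check", "bet", "raise"]


def bucket_for(performance):
    """Map an opening hand's estimated performance to its betting bucket."""
    return BUCKETS[bisect_right(THRESHOLDS, performance)]


for value in (35.0, 40.0, 59.9, 75.0):
    print(value, bucket_for(value))
```

Many different hands share one bucket, which is the point of the abstraction: the player stores a decision per bucket instead of a decision per hand.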
After having dealt with the pre-flop, the post-flop also needs some extensive strategies. Groups in
this case were decided on the basis of an optimistic hand potential. Optimistic here can be defined
as a fairly loose strategy that looks for patterns and hopes to complete them in the future. All the
possible winning patterns such as sequences, flushes or pairs are considered, along with their completeness,
to finally determine the group of a particular state of a hand.
It is important to note that unlike the pre-flop hand, the post-flop works on patterns rather than
on actual cards. For example, simulation determines the playing value of each possible initial
hand, i.e. all 2,652 of them, yet the post-flop hands are taken on the basis of pattern, instead of
individual values. A sequence of 3,4,5,6,7 would be treated in this system exactly the same as a
sequence of 4,5,6,7,8, as they both follow the same pattern. The sequence pattern is divided into
low and high number sequences, and is not judged on the exact value of the cards. This is because
of the information explosion of the game state. For example, leading up to the final betting round,
a player could have 6.74 * 10^11 combinations of cards available. This number is clearly too large to
allow individual analysis of each hand.
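Both figures quoted in this paragraph are consistent with counting ordered deals from a 52-card deck, which a short check confirms. The variable names are illustrative. The unordered count of starting hands is shown only for comparison.

```python
import math

# Ordered two-card starting hands: 52 choices for the first card, 51 for the second.
starting_hands = math.perm(52, 2)

# Ordered seven-card deals (two hole cards plus five community cards).
seven_card_deals = math.perm(52, 7)

# For comparison: starting hands when the order of the two cards is ignored.
unordered_starting_hands = math.comb(52, 2)

print(starting_hands)             # 2652
print(f"{seven_card_deals:.2e}")  # 6.74e+11
print(unordered_starting_hands)   # 1326
```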
Aggressive strategies are commonplace among expert human players. This is one of the well-known strategies of master play, i.e. to intelligently utilise aggressive play to increase the doubt in
an opponent's mind about a player's luck or bluff. Always staying aggressive is obviously not
considered wise, as it discloses the player's strategy, but general aggressive behaviour is accepted
at the master level. This reasoning has led to a tight aggressive showdown player.
The above categorisation requires more explanation. The tightness refers to its decision to only
play to the end if it feels it has at least some form of a pattern available to it, i.e. it does not play to
the end with a 'High Card'. The aggressive nature is apparent from Section 4.3.4 where the table
shows that anything above a pair is automatically put into the raising group, and is thus given the
maximum aggression.
Another important point to be mentioned is that the evaluation of the betting strategy for Anki –
V1 is done before each betting round, and the decision is maintained throughout that betting
round, irrespective of the strategy of the opponent.
It should also be clear from the above explanation that Anki – V1 does not bluff; it works with
pure evaluation strategies, evaluated before each betting round. Overall it can be termed a 'Tight
Aggressive Player'.
Simulation and Enumeration allow real-time strategy build up, and limited reaction to the
opponent's responses in a game. This new player is called Anki – V2. It primarily uses the
concept of Probability Triples.
“A probability triple is an ordered triple of values, PT = [f, c, r], such that f + c + r = 1.0, representing the probability distribution that the next betting action in a given context is a fold (check), call (bet), or raise, respectively.” [13]
The probability triples allow the program to create a controlled randomised strategy over a betting
round, as compared to Anki – V1, which worked with a strict strategy over the whole round. Each
of the numbers in the Triple represents the individual probability of a certain action. A significant
difference between the quotation given above and the implementation in this project lies in the
range of the numbers used. Poki and Loki used a range of 0.0 to 1.0 to express the Triples,
whereas this thesis utilises real numbers between 0.0 and 100.0. This makes no major change
to the expressive power of the program, but allows a percentage output for each of the Triple
values.
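One way to turn a triple on the 0 to 100 scale into an action is to compare a single random number against the cumulative values of the triple. The sketch below is a minimal illustration under that assumption. The function names and the example triple are hypothetical, and the formulas that produce the triple are not shown.

```python
import random


def choose_action(triple, roll):
    """Pick an action from a percentage triple (fold, call, raise).

    `roll` is a real number in [0, 100). The three values split that range
    into consecutive bands, one per action.
    """
    fold, call, raise_ = triple
    if abs(fold + call + raise_ - 100.0) > 1e-6:
        raise ValueError("triple must sum to 100")
    if roll < fold:
        return "fold"
    if roll < fold + call:
        return "call"
    return "raise"


def sample_action(triple, rng=random):
    """Draw the random number and pick the action in one step."""
    return choose_action(triple, rng.random() * 100.0)


example = (20.0, 50.0, 30.0)  # hypothetical triple: 20% fold, 50% call, 30% raise
print(choose_action(example, 10.0))   # fold
print(choose_action(example, 45.0))   # call
print(choose_action(example, 85.0))   # raise
print(sample_action(example, random.Random(0)))
```

Working in percentages rather than in the 0.0 to 1.0 range changes only the scale of the bands, not the distribution over actions.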
4.4.2 - Statistical Method vs. Random Generator
The exact working of Anki – V2 is explained in sequence below; this provides an overview of
the entire Figure 13 seen previously:
1. Like Anki – V1 for the initial hand, Anki – V2 also plays simulated games with
randomised values for the flop, turn, river and opponent hands. In the end, it receives a
winning percentage. The games played are called 'pseudo games' and the winning
percentage is called the Winning Potential (WP).
2. Using this WP, and some pre-defined formulas, the player creates a Probability Triple for
the next betting round.
3. During the course of the betting round, at every decision point, a random real number
between 0 and 100 is generated which is compared against the probabilities of the
Probability Triples, to decide on the betting action.
4. The randomised betting action is adjusted using the Tightness setting provided to prevent
'silly' decisions by the player.
5. At the end of the betting round, when the next set of community cards is shown, the
pseudo games are played again, including the information that is now available. For
example, if the flop and turn have been revealed, Steps 1 through 4 are repeated, but Step
1 only generates random rivers and opponent hands, and uses the flop and turn
information to get a more exact WP.
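Steps 1 and 5 above can be sketched as follows. This is a minimal Python illustration, not the thesis's Prolog code: the names `rank5`, `best7` and `winning_potential` are hypothetical, the hand evaluator is a generic one written for this sketch, and the Tightness adjustment of Step 4 is left out. Known community cards are passed in `board`, so the same function covers the pre-flop case (empty board) and the later rounds.

```python
import random
from collections import Counter
from itertools import combinations

RANKS = range(2, 15)            # 11 = J, 12 = Q, 13 = K, 14 = A
SUITS = "cdhs"
DECK = [(r, s) for r in RANKS for s in SUITS]


def rank5(cards):
    """Return a comparable score for exactly five cards (higher is better)."""
    counts = Counter(r for r, _ in cards)
    # Sort ranks by multiplicity first, then by rank, both descending.
    groups = sorted(counts.items(), key=lambda kv: (kv[1], kv[0]), reverse=True)
    shape = tuple(n for _, n in groups)
    ordered = tuple(r for r, _ in groups)
    flush = len({s for _, s in cards}) == 1
    straight = len(counts) == 5 and ordered[0] - ordered[4] == 4
    if ordered == (14, 5, 4, 3, 2):          # the ace-low straight
        straight, ordered = True, (5, 4, 3, 2, 1)
    if straight and flush:
        category = 8
    elif shape == (4, 1):
        category = 7
    elif shape == (3, 2):
        category = 6
    elif flush:
        category = 5
    elif straight:
        category = 4
    elif shape == (3, 1, 1):
        category = 3
    elif shape == (2, 2, 1):
        category = 2
    elif shape == (2, 1, 1, 1):
        category = 1
    else:
        category = 0
    return category, ordered


def best7(cards):
    """Best five-card score obtainable from seven cards."""
    return max(rank5(five) for five in combinations(cards, 5))


def winning_potential(hole, board, games=1000, rng=random):
    """Estimate WP (0-100) by dealing the unknown cards at random `games` times."""
    known = set(hole) | set(board)
    unseen = [c for c in DECK if c not in known]
    win = draw = lose = 0
    for _ in range(games):
        # Two opponent cards plus however many community cards are still hidden.
        drawn = rng.sample(unseen, 2 + (5 - len(board)))
        opponent, full_board = drawn[:2], list(board) + drawn[2:]
        mine = best7(list(hole) + full_board)
        theirs = best7(opponent + full_board)
        if mine > theirs:
            win += 1
        elif mine == theirs:
            draw += 1
        else:
            lose += 1
    return (win + draw / 2) / (win + draw + lose) * 100
```

Calling `winning_potential(hole, [])` corresponds to the pre-flop evaluation, while passing the flop and turn in `board` leaves only the river and the opponent's hand to be randomised, as described in Step 5.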
The exact formulas and values used to calculate all the data mentioned above are given in Section
4.4.3. However, it is important to understand the need for this simulation method as opposed to a more
mathematical statistical model.
One of the major drawbacks of simulation is that it is essentially a form of approximation. Experts
have documented and proven that luck, or a strange coincidence of events, can affect a hand's
performance to make it seem better or worse than its actual value [10]. This phenomenon can
show its effects for a couple of thousand hands at a time. For this reason, even though extensive
simulation can be considered a very good approximation, it is always exactly that, and can
unknowingly contain high levels of noise. The other option that can be considered is that of
statistics.
A statistical method to find the Winning Potential of a given hand would consist of finding all the
scenarios under which the current hand is stronger than an opponent's. This is clearly a more exact
method of hand evaluation. This method is also clearly feasible in the final betting
round, when the final state or pattern of a player is known. The amount of unknown information is
then quite small, i.e. only the opponent's two-card hand. Thus an exact statistical model of hands
better than the player's can be generated.
This method, however, has a near-exponential blow-up when the first betting round is considered.
The number of possibilities of future cards has been discussed earlier, i.e. 6.74 × 10^11 possible
hands. To group these hands in terms of those which are better, equivalent or worse is clearly a
mammoth task, the kind for which there is neither computational power nor time. This is further
supported by the fact that no computer Poker player created to date has ever tried to obtain the exact
statistical evaluation of a game state.
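The difference in scale between the two cases can be checked with a few lines of arithmetic, shown here in Python for illustration. One counting that reproduces a figure of roughly 6.74 × 10^11 is the number of ordered seven-card deals from a full 52-card deck; treating this as the basis of the figure quoted above is an assumption, since the earlier derivation is not repeated in this section. For the final betting round the count follows directly from the text: two hole cards and five community cards are known, so only the opponent's two cards are unknown.

```python
import math

# ASSUMPTION: ordered ways to deal 7 cards from a 52-card deck.
ordered_seven_card_deals = math.perm(52, 7)

# On the river, 7 cards are known, leaving 45 unseen cards from which the
# opponent's two-card hand (an unordered pair) is drawn.
river_opponent_hands = math.comb(45, 2)

print(f"{ordered_seven_card_deals:.3e}")
print(river_opponent_hands)
```

The first value is 674,274,182,400 and the second is 990, which makes the contrast concrete: exhaustive comparison is trivial on the river and impractical before the flop.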
Another major reason for the choice of simulation is the fact that it mimics human play. It offers
both optimistic and pessimistic viewpoints at times, quite like a human player. This allows a
more random strategy than what a strict statistical model would provide.
4.4.3 - Formulas and Evaluation
There are a variety of formulas and numbers that have been utilised in the formation of Anki –
V2. These features need further explanation, both for their function in the program and their
justification. The exact formation of the Probability Triple is also explained, and so is the working
of the betting action evaluator.
4.4.3.1 - Pseudo games and Winning Potential
Each pre-betting round evaluation begins with the simulation of 1000 pseudo games, at the end of
which the number of games that the player won, drew or lost is reported back. These numbers
are then used to determine a Winning Potential. The WP is then adjusted using enumeration,
which is explained in Section 4.4.3.3. Finally, the WP is converted into a Probability Triple.
Figure 14 shows a part of the program code which relates to this exact sequence of work, along
with a representation of the formula used to calculate WP from the data of pseudo games.
Figure 14. Sequence of evaluation of Winning Potential
The 'X' written in the final line of the Prolog code is the Probability Triple that is generated. The
method 'assign_str' is explained in Section 4.4.3.2.
The first justification is regarding the 1000 pseudo games being played. This is due to the time
constraints specified by Darse Billings that a player should not ponder over a decision for more
than two seconds. And with the provided computational power, 1000 pseudo games were found to
require 1 – 2 seconds of computation time. Higher computation power would allow more pseudo
games, and thus provide a better approximation, but under the current restrictions, this is the best
that the game can offer.
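The trade-off between the number of pseudo games and the quality of the approximation can be made concrete with a rough error estimate. The Python sketch below is an illustration only: it assumes independent pseudo games and uses the standard binomial formula, and the function name `wp_standard_error` is hypothetical. Counting a draw as half a win, as the WP formula does, can only lower the variance, so the figure is an upper estimate.

```python
import math


def wp_standard_error(wp_percent: float, games: int) -> float:
    """Approximate standard error, in percentage points, of a simulated WP.

    ASSUMPTION: games are independent and each is treated as a win/lose trial.
    """
    p = wp_percent / 100.0
    return 100.0 * math.sqrt(p * (1.0 - p) / games)


for n in (1000, 4000, 16000):
    print(n, round(wp_standard_error(50.0, n), 2))
```

Under these assumptions, 1000 pseudo games give an error of roughly 1.6 percentage points at a WP of 50, and quadrupling the number of games halves that error.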
play_pseudo_game(...),
WP is ((W + (D / 2)) / (W + D + L)) * 100,
WP1 is WP + N,
assign_str(WP1, X),
...

$$\text{Winning Potential} = \frac{\text{Win} + \frac{\text{Draw}}{2}}{\text{Win} + \text{Draw} + \text{Lose}} \times 100$$
10%, as this allows substantial bluffing power to the player even in the case of bad hands. As a
result, the probability triple for 0% WP becomes [80,10,10] in the form [Check, Bet, Raise].
Human players can be found to change their strategy rather dramatically with knowledge of the
current hand having more than 50% or 75% Winning Potential. This strategy change has been
dulled to a large extent in the program, to allow a more gradual change relating to the exact
Winning Potential.
The hand with an exact 50% WP can be seen to have the Triple [45, 45, 10], with a jump to [40,
40, 20] for a hand with a number just above 50 but tending to it. From this point on, the checking
power falls at a greater speed, while the betting power builds up.
The final change occurs at 75% WP, at which point the need for a greater raise probability is
introduced. A WP of exactly 75% has the Triple [15, 65, 20], and a WP just over 75, but tending to it, has the
Triple [15, 30, 55]. This may seem like a dramatic jump, but in practice it is quite mellow due to
the inability to raise in most situations. In these situations, the player simply bets, thereby
reducing the apparent jump in probability. This is discussed further in the next subsection.
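The anchor values quoted above can be joined into a simple illustrative mapping from WP to a Triple. The Python sketch below is not the thesis's formula: straight-line interpolation between the quoted anchors is an assumption made for illustration, the function names are hypothetical, and for WP above 75 only the single quoted value is known, so it is returned unchanged as a placeholder.

```python
def interpolate(wp, lo, hi, start, end):
    """Linear interpolation of a triple between two anchor WP values."""
    t = (wp - lo) / (hi - lo)
    return tuple(a + t * (b - a) for a, b in zip(start, end))


def triple_for(wp):
    """Illustrative WP -> [Check, Bet, Raise] mapping, in percentages.

    ASSUMPTION: straight lines between the anchor triples quoted in the text;
    the actual formulas used by Anki - V2 may differ.
    """
    if not 0.0 <= wp <= 100.0:
        raise ValueError("WP must lie between 0 and 100")
    if wp <= 50.0:
        return interpolate(wp, 0.0, 50.0, (80, 10, 10), (45, 45, 10))
    if wp <= 75.0:
        return interpolate(wp, 50.0, 75.0, (40, 40, 20), (15, 65, 20))
    # Above 75 only the value just past the threshold is quoted, so it is
    # returned as-is (placeholder, not a claim about the real behaviour).
    return (15.0, 30.0, 55.0)


print(triple_for(25.0))
print(triple_for(62.5))
```

Because each segment joins two triples that sum to 100, every interpolated triple also sums to 100. The sketch also reproduces the steeper fall in checking probability after 50% WP that the text describes.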
4.4.3.3 – Betting Strategies, Randomised Numbers and Enumeration
This section deals with what happens inside a betting round. The Anki – V2 Player has a
Probability Triple available to it, guiding its future moves; however, these moves still need to be
monitored and executed.
“The choice (of betting action) can be made by generating a random number, allowing
the program to vary its play, even in identical situations. This is analogous to a mixed
strategy in game theory, but the probability triple implicitly contains contextual
information resulting in better informed decisions which, on average, can outperform a
game theoretic approach.” [13]
Figure 16 provides a Prolog code excerpt from the actual program that details the working of
Anki – V2's betting turn. The random number generated is a real number between 0 and 100. That
allows decimal type Triples to be treated correctly, and not rounded off to the nearest integer.
The Strategy Selection shown in Figure 16 converts the random number into one of the betting
strategies. It is done in the following manner: if the random number is less than the probability
of a check, the result is a Check. Otherwise, if the number lies between the probability of a check and
the sum of the check and raise probabilities, the result is a Raise. In all other cases, the result is a Bet. For
example, given the Triple [40, 35, 25], random numbers between 0 and 40 would
lead to Checks, 40 – 65 would lead to Raises and 65 – 100 would lead to Bets.
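The selection rule just described can be sketched as below. This is a Python illustration rather than the thesis's Prolog `choose_rel_str` predicate, whose arguments are not shown in this section; `choose_action` and its `can_raise` parameter are hypothetical names. The fallback from Raise to Bet follows the earlier remark that the player simply bets when a raise is not possible.

```python
import random


def choose_action(triple, r, can_raise=True):
    """Map a random number r between 0 and 100 to 'check', 'raise' or 'bet'.

    `triple` is (check, bet, raise) in percentages.
    """
    check, _bet, raise_ = triple
    if r < check:
        return "check"
    if r < check + raise_:
        # When a raise is not permitted, the player bets instead.
        return "raise" if can_raise else "bet"
    return "bet"


rng = random.Random(2006)                  # seeded only for repeatability
triple = (40, 35, 25)
actions = [choose_action(triple, rng.uniform(0, 100)) for _ in range(10)]
print(actions)
```

With the Triple [40, 35, 25], a draw of 10 gives a Check, 50 gives a Raise and 80 gives a Bet, matching the ranges in the example above.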
The Enumeration step involves looking at the previous action of the opponent and slightly
changing the odds to represent the game state more accurately.
“... we have specific information that can be used to bias the selection of (opponent's)
cards. For example, a player who has been raising the stakes is more likely to have a strong
hand than a player who has just called every bet.” [1]
Thus, both the actions of checking and raising by an opponent offer information regarding the
opponent's hand. It can generally be assumed that a raise results from a good hand, and a check
from a much worse one. Using this data, the Winning Potential can receive small tweaks to better
represent this information. It has been decided that upon each of the opponent's checks, the WP
would be incremented by 2, thus implying that the player now has a 2% greater chance of
winning, as the opponent seems to have a bad hand. Similarly, an opponent's raise decrements the
value of the WP by 2.
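A minimal sketch of this adjustment is given below, in Python for illustration. The function name `adjust_wp` and the clamping of the result to the 0 to 100 range are assumptions made here; only the step of 2 per opponent check or raise comes from the text.

```python
STEP = 2.0  # percentage points per observed opponent action (from the text)


def adjust_wp(wp, opponent_actions):
    """Return WP after adding 2 per opponent check and subtracting 2 per raise."""
    for action in opponent_actions:
        if action == "check":
            wp += STEP
        elif action == "raise":
            wp -= STEP
        # Other actions (for example a plain bet) leave WP unchanged.
    # ASSUMPTION: the result is kept inside the valid 0-100 range.
    return max(0.0, min(100.0, wp))


print(adjust_wp(60.0, ["raise", "raise", "check"]))
```

In this example two raises and one check move a simulated WP of 60 down to 58.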
This enumerated value is usually transferred between the betting rounds, with the exception of the
pre-flop to post-flop change. In this case, the WP of hands are found to change so drastically
through the different flops possible that any notion of enumeration over this stage would be
redundant. Also, the value of 2 was chosen to allow a maximum of 10% change in the value
created by the Simulator. This is a value chosen by the author, and thus, further improvements can
random_float(...),     % Random Number Generation
choose_rel_str(...),   % Strategy selection
change_win(...),       % Enumeration Step
exact_str(...),        % Strategy Refinement
eval(...).             % Strategy Implemented
This chapter tests and evaluates both the system and all the computer players described in Chapter
4, Methodology. Firstly, there is a testing of the system design and architecture in the white box
method. This is followed by black box result checking of the system. Finally, all the players, i.e.
Random – 1, Random – 2, Anki – V1, and Anki – V2 are evaluated, with most emphasis on the
latter two. The results and evaluation are also compared to the previous research of Loki, Poki, etc.
discussed in Chapter 2, Literature Review.
5.1 - System Test – White and Black Box Testing
The first batch of experimentation and testing that needed to be performed on the program
concerned its completeness and soundness. Its stability needed to be proven, to justify any results
obtained from it later on. The architecture of the program was put through brute-force worst-case
scenario tests to try and prove its soundness. These tests were conducted on the Human vs. Human
specification, so that each step of the program could be monitored and observed.
The following tests were conducted and found to complete successfully :
1. The program started up without any errors and provided completely random opening
hands to both the players of the game. Also, absolutely no repetition or pattern in the
cards was found over a number of hand requests.
2. The program was found to provide the Human Player with all the necessary game state
information, including the cards he/she held, community cards and the financial state of
the game. All the information was found to be accurate. An example screen shot of the
program is provided in Figure 17.
3. The betting options were found to adhere to their respective constraints, along with
allowing the player to re-play the last move in case an erroneous choice was entered, e.g.
raising when it is not permitted.
4. The betting rounds were found to progress in the manner required, and ended upon equal
commitment of monies from both players, i.e. in the cases of two checks, two bets or a
raise followed by a bet.
strategy that an expert would recommend for master play. The static aspect of this player is not a
disadvantage to it either, as the opponent, Anki – V1, does not have opponent modeling.
In addition to testing the performance of Anki – V1 against the pre-defined computer player, it is
also imperative that Anki – V1 demonstrate an increase in performance as it develops. Anki – V1 is
created from four different betting strategy/evaluation components; pre-flop evaluation, post-flop
evaluation, post-turn evaluation and final evaluation (post-river). It needs to be shown that the
introduction of each one of these components adds value to the player as a whole.
For the purpose of these experiments, the Anki – V1 with only pre-flop evaluation was named
Start-Eval Anki. The next upgrade, with both pre-flop and post-flop evaluations, is called
Flop-Eval Anki. The addition of post-turn evaluation leads to Turn-Eval Anki and finally, all four
evaluations come together to be called Final-Eval Anki. Each experiment between the players
consisted of 10,000 tournaments. This was done in light of the fact that previous research has
shown that up to a couple of thousand games can be affected by the good or bad luck of a player
[10]. Thus, to make the statistical result more accurate, and assuming at least 100 hands per
tournament, 10,000 tournaments provide us with a million games. This gives a result
that is largely free from the luck factor. All the results were checked to confirm that at least a million
games had been played, and this was found to be true.
Figure 21 shows the performance of improving Anki – V1 against Random – 2. As can be seen,
each improvement is found to benefit the performance.
constraints of the project, the test base for the project was restricted to a close community, and
thus certain members of the community had additional information available to them, which
helped them develop a strategy against Anki – V1. For this reason, they have been considered in
the category above the absolute class.
Finally, advanced players are either people with frequent exposure to the game in tournament play
(with real money online or in the cash form), or intermediate players with knowledge of the
player's capabilities. Once again, due to the given constraints, the experiments were held to a
lower capacity than ideal. However, at least three individuals were gathered from each of the
prescribed categories and were asked to play till they either won or lost a tournament.
The final result data from all the tournaments was gathered, and sorted once again according to
the categories into which the human players had been divided. Figure 23 provides a brief outlook of
Anki – V1's performance against the human players. Each point on a performance
curve is the cumulative average of Anki – V1's money at that point in time, here measured in
number of games. Also, the important points in the game are labelled with their game number.
Figure 23. Anki – V1's performance against human players
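One reading of the plotted quantity is a running mean of the player's money over the games played so far. The Python sketch below illustrates that reading only; the function name `cumulative_average` and the sample bankroll values are hypothetical and are not data from the experiments.

```python
from itertools import accumulate


def cumulative_average(money_after_each_game):
    """Running mean of the player's money, one value per game played."""
    totals = accumulate(money_after_each_game)
    return [total / n for n, total in enumerate(totals, start=1)]


# Hypothetical bankroll history, not data from the experiments.
history = [1000, 1040, 980, 1100]
print(cumulative_average(history))
```

Averaging in this way smooths out single-game swings, which makes a longer-term trend easier to see on the curve.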
It can be seen from the figure above that Anki – V1 succeeds in its primary objective of beating
the beginner player, i.e. the tournament ends with Anki – V1 having all of the 2000 money on the
on the actual experiments and their results is given in the subsections below, with certain aspects
of the research expressed in Chapter 6, Conclusions and Future Work.
5.4.1 - Anki – V2 vs. Computer players
Tests for Anki – V2 start from the easiest, i.e. the Completely Random Player. Anki – V1 has set
the bar for most of the test results of Anki – V2, and thus Anki – V2 would be expected to win all
the tournaments against this player, like Anki – V1.
The amount of result data, however, is more restricted in this case. Anki – V1 could play 10,000
tournaments in an hour and was thus given the ideal figure of 10,000 tournaments. On the other
hand, Anki – V2 takes 1 – 2 seconds for each evaluation that it creates. This introduces a massive
time lag into the system, and faced with the time constraints of the project, it is unreasonable to
play 10,000 tournaments. Instead, the figure has been reduced to 100 tournaments. This figure
may seem very small, but it is the best compromise that can be considered. 100 tournaments
take about a day to finish evaluation, and provide an average of 25,000 games, which is more than
twenty times the recommended amount required to get rid of the luck factor [10].
Figure 27 shows a graphical representation of the Anki – V2 vs. Completely Random Player
tournaments. As it can be seen, Anki – V2 passes the first test and easily wins all its tournaments
against Random – 1.
Figure 27. Anki – V2's performance against Completely Random Player
The next test for Anki – V2 comes from Random – 2, or the Non-Folding Random Player. This
should be a true test for Anki – V2, as Anki – V2 is quite a moderate player in terms of
tightness, and tends to fluctuate quite highly between its perception of tightness and looseness.
This is mostly due to the incorporation of the Simulation engine and allowing some deliberate
randomness to creep into the system.
Figure 28 shows the result of the first test conducted between Anki – V2 and Random – 2. As can
be seen, it was a complete failure. This was mostly due to Anki – V2's inability to continue a
strategy. It is found to abandon reasonably good cards in which it has invested a lot of money just
because their chance of winning falls below the specified threshold. Unlike Anki – V1, Anki
– V2 saw a lot of hands folded in the latter stages of the game, after they had already been deemed
playworthy. Betting round memory was found to be inadequate and the need for game memory
was recognised.
[Plot: "Anki - V2's Performance". Series: % Tournaments Won, % Games Won. Vertical axis: % Victory, 0 to 100.]
Figure 31. Anki – V2's Performance playing against Human Players
Overall, Anki – V2 was found to perform satisfactorily. Its exact comparison to Anki – V1 is
given in the section below. There was a lot of positive feedback from the Human players,
especially from the Advanced Players who seemed to find the player quite good.
In comparison to the Beginner players, Anki – V2 was found to be quite superior, however, two
of the three beginners commented on the player's good luck at the time. Even if the true
performance of Anki – V2 is actually lower at the current setting, it can be expected that through
strategy modification and experimentation, a player can be created that is more suited to fight
beginners.
The player lost, as expected, to the Advanced Players, once again not putting up too much of a
fight. Due to time constraints, modified versions of the player which were more loose
and aggressive were not tried; these would have offered some insight into the playing capabilities of
the master level players, and could then have tried to imitate them.
The intermediate players took a long time to understand the working of Anki – V2 (up to 1686
games), but were once again able to exploit the Tightness in pre-flop and the inability to stick to a
strategy for the whole game. Anki – V2 was found to fold much of the time after the turn or the
river because the probability of winning dropped, and it chose a 'bad' random number. This flaw
encouraged the Humans to be even more aggressive and loose, allowing them to once again
imitate the Random – 2 Player.
5.5 - Anki – V1 vs. Anki – V2
The most important results also lie in the comparison between the two players Anki – V1 and
Anki – V2. This comparison is two-fold. Firstly, there is the direct play, in which the two Players
Anki – V1 and Anki – V2 play against each other. Secondly, the players are compared on their
performances against the Human and Random – 2 Players.
5.5.1 – Direct Anki Comparison
The two players faced each other in a direct competition similar to experiments performed on
them with Random – 2, i.e. 100 tournaments. Figure 32 shows the result of the first series of
experiments, playing Anki – V1 against the original Anki – V2.
Figure 32. Original Anki – V2's playing against Anki – V1
It is visible from the figure that Anki – V2 performed on par with Anki – V1, so there is no apparent
increase in performance. However, similar to Anki – V2's fight against Random – 2, the most
[Plot: "Anki - V2's Performance". Series: % Tournaments Won, % Games Won. Vertical axis: % Victory, 0 to 60.]
Figure 37. Comparison of Anki – V1 and Anki – V2 against Advanced Humans
It is in this figure that we see some disappointing results, as Anki – V2 loses faster than Anki –
V1. However, once again, further experiments with more aggressive and intelligently tight
versions of Anki – V2 should offer a better insight into the true power of Anki – V2, as such
versions are expected to win more and in a more efficient manner. Also, apart from the numerical results, all the advanced
players who had played Anki – V1 as well commented on the uncertainty of Anki – V2. One of
the Advanced Players had the following insight to offer:
“Anki – V2 has clearly introduced randomisation; this is forcing me to concentrate
harder and play a smarter game. I am less likely to blindly bet pre-flop, am forced to fold
later in the game and don't understand the player's hand till after the turn. Anki – V1
offered the same card information at every betting round (which Anki – V2 doesn't).” [24]
The advanced players were surprised to find that they had beaten Anki – V2 faster than Anki –
V1. They commented on how they found Anki – V2 a much harder player, and thought that the
only reason for a faster tournament was that each game became worth a lot more due to Anki –
V2's randomised betting strategy even in the face of bad hands. It is also for this reason that the
players treated Anki – V2 with more respect as it fought harder than Anki – V1.
5.6 - Anki and the Previous Research
Some of the results that were observed in the sections given above are consistent with many of the
'Luck plays a huge factor in poker' ; this is a statement that has been repeated many times in both
this thesis and in many previous research papers. This can be seen in Figure 38, at both the sharp
drops, at around 2500 and 5500 hands played. These are the sharpest drops in the graph, and have
been attributed to luck in [3]. Similar trends can also be seen in Figure 36, in the plays of both
Anki – V1 and Anki – V2 against Intermediate Human Players. These games against the
Intermediate Players are long enough for certain luck factors to show themselves, and they do. It
can also be noted that these luck factors introduce the steepest curves seen in any of the analysis.
Thus, the factor of luck cannot be denied in Poker play.
Another observation made in both PsOpti and the versions of Anki is that of the 'Blink Factor', or
the fact that humans get tired, but machines never do. The first fall of 'the count' at around 2500
hands was due to luck, but he continued to perform badly, and he stopped after a while saying that
he was tired and wanted to retire for the day. He came the next day and started to recover. Once
again, the Anki analysis that best shows this result is the long tournaments Anki – V2 played
against Intermediate Human Players. At least half of the Intermediate players complained of
fatigue at this point in the game, and commented on how they 'took a short break to gather their
senses' before continuing once again. This 'Blink factor' is something a future version of Anki
could be taught to exploit, as Human players' ability seems to take a serious fall during this time.
Lastly, an aspect of memory that needs to be developed that human beings already utilise in their
strategies is that of 'remembering the best strategy to compete against an opponent'. Neither PsOpti
nor the versions of Anki try to remember the strategies that helped them win against the
opponent, and this makes a difference to the output. After the 'Blink Factor' experienced by 'the
count', he bounces back after around 3800 hands. This is because, even after a day's break, he
remembers the strategies he had played earlier to good effect, and utilises these strategies once
again to increase his performance. This phenomenon can be seen in Figures 36 and 37 for Anki –
V2 plays against Intermediate and Advanced players. Anki – V1 was seen to perform well right at
the start, as was noted at the start of this discussion, but Anki – V2 starts off quite poorly. This is
simply because the Human Players attack Anki – V2 similar to the strategy they learned playing
against Anki – V1, and even though Anki – V2 is a better player, that strategy is found to work
well against it as well.
Four computer players, i.e. Random – 1, Random – 2, Anki – V1 and Anki – V2, along with initial
strict strategy players such as Always-Checks1, Always-Checks2, Always-Bets and Always-
Raises were created by the author of this thesis.
The project succeeded in its goal to make Knowledge and Strategy based Texas-Hold'em Poker
Players in Prolog. The players created, Anki – V1 and Anki – V2, were of good quality and
provided many positive results. They were created using different approaches, which were well
documented, and have been presented in this thesis. The documentation also verifies the stability
and soundness of the program, which allows the results to be considered reliable. Through these
results, the quality of Anki – V1 and Anki – V2 can be justified to be good, as they both appear to be sub-intermediate level, with the possibility of Anki – V2 excelling given the correct settings.
Both Anki – V1 and Anki – V2 were evaluated against Random – 1, a completely random player;
Random – 2, a non-folding random player; and three categories of Human Players, Beginner,
Intermediate and Advanced.
The project followed a life-cycle shown in Figure 39. The author's previous knowledge and
extensive reading provided a basis for Human Heuristics. These heuristics were formulated into
Computer/Machine Heuristics. Experimentation led to many lessons being learned and being
re-iterated to improve the Human Heuristics, which led to further formulation of Computer
Heuristics.
Figure 39. Life-cycle of the Project leading to a better formulation of Human Heuristics
Players if enough experiments are carried out with humans to tune its Aggressive and Tightness
Behaviours against those of the Intermediate Human Players.
Comparison of results obtained from PsOpti's play against the master level human player, 'the
count', and that of Anki – V2 against Intermediate Level Human Players revealed that Anki – V2
was at least just below Intermediate Level in its play. This coupled with the fact that Anki – V2
can further be tweaked in its strategies to play Intermediate Level Players even better, means that
it can possibly claim to be Intermediate Level with the right adjustments.
6.3 - Conclusions of Poker Game and Betting Strategies
The creation and evaluation of Poker Players Anki – V1 and Anki – V2 have revealed a lot of
interesting results and conclusions relating to human strategies. As discussed earlier, one category
of strategies is concerning the relative number of hands played, which is described by Tight,
Moderate or Loose. Anki – V1 was found to be very tight, as it only played the hands it had a
very good chance of winning, and Anki – V2 was found to range from tight to moderate
depending on its randomised betting strategy.
The second category of discussing player strategies is concerning the amount bet per game. Both
Anki – V1 and Anki – V2 were very aggressive. This was a good choice as both players showed
good winnings in the games that they were confident of winning, and thus the aggressive attitude
managed to take more money than usual away from the Human Players they played against.
The extremely tight strategy of Anki – V1 was defeated by loose strategies of Human Players.
Beginners lost their games due to bad plays, and never sticking to a strategy. They also folded
early in the game without trying to figure out the hand that Anki – V1 may have. Thus, a tight
strategy against a very tight intelligent player does not work. Both advanced and intermediate
human players eventually defeated Anki - V1 by adopting loose strategies and trusting the player.
Thus, a predictably tight player can be defeated by a reactive loose player. The human players
had to be reactive loose players, i.e. react to the strategy of Anki – V1 and fold when Anki – V1
raises multiple times. The need for the reactive aspect can be proven by looking at the play of Anki – V1 against Random – 2. Random – 2 was a very loose player and it lost against Anki – V1
to the extent of 93% of the tournaments. It can thus be concluded that a tight aggressive player
will beat a very loose player. The aggressiveness can be shown through the reasoning that a tight
player needs to make up for the money lost in folding numerous games through winning large
amounts in the games that it plays to the end.
Anki – V2 offered a much better test-bed for strategy manipulation as its aggressive and tightness
strategies could be tweaked to give the desired combinatorial strategy. Anki – V2's victory against
Beginners was for the same reason as that of Anki – V1, but its longer struggle against
Intermediate Human Players arose from its unpredictability and randomised betting strategy
which forced the human players to observe the player for longer to figure out a strategy to defeat
it. Thus, a controlled randomised strategy is a much better option against human players, as
it adds confusion to a human player's mind. Anki – V2's quick defeat against Advanced players
was also one of the project's biggest lessons. Even though it is known that Anki – V2 could be
tweaked to perform better against that category of human players, the conclusion received from
the experiment was that betting strategies need to be remembered over betting rounds. This is
true as most of the Advanced Players commented on Anki – V2's excellent unpredictability, but
poor choice to fold a bluffed hand after committing lots of money to the pot.
Anki – V2 also lost horribly against the Random – 2 player. This was because an unsure moderate
(tightness index) player will lose to a completely loose player. However, when the moderate
player is converted into an intelligent loose player, it can perform wonders, as the intelligent
loose player was found to beat the completely loose player in over 98% of the tournaments
played. 'Intelligently loose' here refers to the fact that Anki – V2 chose to fold the worst of its
hands.
Finally, the last conclusion was obtained through the play of Anki – V1 against Anki – V2. They
were both created equal, and they performed as such. However, Anki – V2's adaptive
capabilities allowed it to gain the upper hand through modifications. In the case of equal
tightness, the more aggressive player was found to win. This was shown in the experiments by
making Anki – V2 more aggressive.
The quality of Anki – V1 and Anki – V2 allows these players to be excellent tools for teaching
beginner and intermediate players simple strategies of tightness and aggressiveness. These players
have already been fully coded, with the Prolog code provided in Appendix A, and thus only need
small additions to form teaching support. Both players have also proven their stability through
both documentation and extensive play against computer and human players.
Anki – V1 plays really well against beginner players, and shows great promise as a tool for teaching
poker to new human players. This is because it plays just above their level, and is based
completely on rules and strict strategy. Each action of Anki – V1 is justified through the hand
strength or the hand potential, and it can use this information to suggest possible moves and
provide support to beginner-level human players. Human players at this level were found to have
most trouble understanding the concepts of poker, and getting to grips with the winning patterns
and simple loose and tight strategies. All of these aspects can be expressed through an Anki – V1
based help engine.
Anki – V2 offers a slightly higher-level teaching tool, as it encompasses not just tightness and
looseness but also aggressive and conservative strategies. On top of that, it is fully
customisable, which allows a person to set the playing strength of Anki – V2. Moreover,
complete control over Anki – V2's strategy allows people learning poker to play against
Aggressive, Conservative, Tight, Loose and Moderate players, all using the same program. This
way, users can experience play against particular strategies, and learn to play better. The strategy
maker itself can be randomised, to give the user an Intermediate-level playing platform. Once
again, as with Anki – V1, the coding of the actual methods has already been done. Anki – V2's learning
tool can also provide help to its users, by giving them a good estimate of the winning probability
of their current hand, thereby allowing them to learn the value of any kind of hand.
The previous section discussed one of the possible applications of Anki – V1 and Anki – V2.
There are also certain additions that can be made to the project, to both improve it and realise its
full potential. Most of these additions involve human testing with the various settings of Anki – V2.
Very little additional code is needed, but the input of experts to help provide better
enumeration is explored further in Section 6.4.3.
Anki – V1 has a static hand evaluation technique, and thus all of its groups and its scoring system
were put to a complete test when it was played against both pre-coded players and Human players.
The extension that can be considered for this player involves an increase in the scoring categories for
buckets and additional strategies. The current buckets are based on the patterns that were
observed in Anki – V1's hands. The scoring system only involves the numbers 0 to 4, in which 0
signifies the strategy of 'check else fold', 1 and 2 allocate betting, 3 allocates raising and 4 signifies
an excellent finished pattern which does not need to be re-evaluated. The scoring system can be
extended to provide support for additional strategies, some of which are present in Poki [10], like
'check else bet', 'bet if opponent checks, else check', etc.
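A minimal Prolog sketch of how such a score table could be laid out and extended is given below. Scores 0 to 4 follow the description above. The predicate names, the extra score 5 and the way each strategy is resolved into an action are illustrative assumptions, not the code of Appendix A. The text does not state a betting action for score 4, so the sketch leaves it unresolved.

```prolog
% score_strategy(+Score, -Strategy): scores 0-4 as described in the text.
score_strategy(0, check_else_fold).
score_strategy(1, bet).
score_strategy(2, bet).
score_strategy(3, raise).
score_strategy(4, finished_pattern).
% ASSUMPTION: score 5 is a hypothetical extra category for one of the
% Poki-style strategies named in the text.
score_strategy(5, check_else_bet).

% strategy_action(+Strategy, +MinToCall, -Action): hypothetical resolver.
% A check is only possible when nothing has to be paid to continue.
strategy_action(check_else_fold, M, check) :- M =:= 0.
strategy_action(check_else_fold, M, fold)  :- M > 0.
strategy_action(bet, _, bet).
strategy_action(raise, _, raise).
strategy_action(check_else_bet, M, check)  :- M =:= 0.
strategy_action(check_else_bet, M, bet)    :- M > 0.

% action_for_score(+Score, +MinToCall, -Action): look up, then resolve.
action_for_score(Score, MinToCall, Action) :-
    score_strategy(Score, Strategy),
    strategy_action(Strategy, MinToCall, Action).
```

Under these assumptions a new strategy needs one score_strategy/2 fact plus its strategy_action/3 clauses, so the existing scores are left untouched when the table grows.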
Anki – V2, being the second player that was created, did not get the extensive testing it
deserved to show its abilities against human players. It showed how, by changing its internal
strategies, it could perform much better against static strategy players. Using the same reasoning,
and an automated strategy modifier, it can force its opponents to frequently change their strategy
against Anki – V2 as well, thereby increasing its performance. Also, this would allow a much
more extensive set of conclusions similar to the kind that were documented in Section 6.3.
6.4.3 - Resource based extensions to project
One of the major aspects missing from this project was that of expert systems. All previous
players such as Loki, Poki and PsOpti had the expert input available from a master level player,
i.e. Darse Billings. The author of this project started as an Intermediate Human Player
and can now be considered Advanced at best. Darse Billings justifies a lot of his heuristics and
expert systems of Loki through past experiences [13], which are not described in the research in
the detail required for coding a similar Poker Player. Thus, the availability of
1. D. Koller, A. Pfeffer; Generating and Solving Imperfect Information Games. IJCAI 1995: 1185-1193.
2. D. Billings, N. Burch, A. Davidson, R. Holte, J. Schaeffer, T. Schauenberg, D. Szafron; Approximating Game-Theoretic Optimal Strategies for Full-scale Poker. IJCAI 2003: 661-668.
3. D. Papp; Dealing with imperfect information in poker. Master's thesis, Department of Computing Science, University of Alberta, 1998.
4. S. J. J. Smith and D. S. Nau; Strategic planning for imperfect-information games. In Working Notes of the AAAI Fall Symposium on Games: Planning and Learning, 1993.
5. D. Koller, A. Pfeffer; Representations and Solutions for Game-Theoretic Problems. Artif. Intell. 94(1-2): 167-215 (1997).
6. S. A. Gordon; A comparison between probabilistic search and weighted heuristics in a game with incomplete information, in: AAAI Fall 1993.
7. J. R. S. Blair, D. Mutchler and C. Liu; Games with imperfect information. In Proceedings of the AAAI Fall Symposium on Games: Planning and Learning, 59-67 (1993).
8. D. Koller, N. Megiddo, B. von Stengel; Fast algorithms for finding randomized strategies in game trees. STOC 1994: 750-759 (1994).
9. M. van Lent and D. Mutchler; A pruning algorithm for imperfect information games. In Proceedings of the AAAI Fall Symposium on Games: Planning and Learning (1993).
10. D. Billings, A. Davidson, J. Schaeffer, D. Szafron; The challenge of poker. Artif. Intell. 134(1-2): 201-240 (2002).
11. J. Schaeffer, D. Billings, L. Peña, D. Szafron; Learning to Play Strong Poker. In Proceedings of the Sixteenth International Conference on Machine Learning (ICML-99), J. Stefan Institute, Slovenia (Invited Paper), 1999.
12. J. Cassidy; The Last Round of Betting in Poker, The American Mathematical Monthly, Vol. 105, No. 9. (Nov., 1998), pp. 825-831.
13. D. Billings, L. P. Castillo, J. Schaeffer, D. Szafron; Using Probabilistic Knowledge and Simulation to Play Poker. AAAI/IAAI 1999: 697-703.
14. J. Shi, M. L. Littman; Abstraction Methods for Game Theoretic Poker. Computers and Games 2000: 333-345.
15. A. Junghanns, J. Schaeffer; Search Versus Knowledge in Game-Playing Programs Revisited. IJCAI (1) 1997: 692-697.
16. J. F. Nash; Non-cooperative games, Ann. Math. 54 (1951) 286–295.
17. J. F. Nash, L. S. Shapley; A simple three-person poker game, Contributions to the Theory of Games 1 (1950) 105–116.
18. J. von Neumann, O. Morgenstern; The Theory of Games and Economic Behavior, 2nd
19. H. W. Kuhn; A simplified two-person poker, Contributions to the Theory of Games 1 (1950) 97–103.
20. N. Findler; Studies in machine cognition using the game of poker. Communications of the ACM 20(4): 230-245 (1977).
21. C. Cheng; Recognizing poker hands with genetic programming and restricted iteration. Genetic Algorithms and Genetic Programming at Stanford, J. Koza (editor), Stanford, California (1997).
22. D. Sklansky and M. Malmuth; Texas Hold’em for the Advanced Player, Two Plus Two Publishing, 2nd edition, 1994.
23. K. Takusagawa; Nash equilibrium of Texas Hold’em poker, Undergraduate thesis, Computer Science, Stanford University, 2000.
24. Personal correspondence with Human Players
% This program is the one that was used to test Human Play against Anki – V2. It still contains all the
% different players such as Anki – V1, Random – 1, etc. within it.
:- use_module(library(system)).
% 1 - 13 for Ace, 2, 3, ... Jack, Queen and King
% 1 - 4 for Clubs, Diamonds, Hearts and Spades.
% Strategy 1 - Always bets.
% Strategy 2 - Always raises when possible, otherwise bets.
% Strategy 3 - Random choice including folding.
% Strategy 4 - Random choice without folding.
% Strategy 5 - Uses initial eval to decide fold(check) or bet.
% Strategy 6 - Uses initial eval to decide fold(check) or raise.
% Strategy 7 - Uses initial eval to decide fold(check) or bet or raise.
% Strategy 8 - Uses initial eval to decide Strategy 3 or 4.
%
:- dynamic seed/1.
seed(124353425).
:- nl, nl,
   write('************ Welcome to Texas Hold Them ***************'), nl,
   write('* Each card is represented in a tuple *'), nl,
   write('* (Card-number,Card-suit) as denoted below: *'), nl,
   write('* *'), nl,
   write('* 1 - 13 for Ace, 2, 3, ... Jack, Queen and King; *'), nl,
   write('* 1 - 4 for Clubs, Diamonds, Hearts and Spades. *'), nl,
   write('*******************************************************'), nl,
   write('Please give seed : '),
play_poker(X, Y, Z, 'n', N, N1, Stream) :-
    end_game_check(Y, Z, X, N, N1, Stream), !.
play_poker(X, Y, Z, 'y', N, N1, Stream) :-
    initialise_game(A),
    set_players(A, B, C, Y, Z),
    play_game(A, B, C, X, N, N1, Stream), !.
play_poker(X, Y, Z, _, N, N1, Stream) :-
    nl, write('Kindly choose from the given options of y or n!'),
    nl, write('Do you wish to play another game? '),
    read(A),
    play_poker(X, Y, Z, A, N, N1, Stream).
    write('The tournament has ended. Player 1 has '), write(B1),
    write(' money, and Player 2 has '), write(C1), write('.'), nl,
    nl(Stream), write(Stream, 'The game has ended. Player 1 has '), write(Stream, B1),
    write(Stream, ' money, and Player 2 has '), write(Stream, C1), write(Stream, '.'), nl(Stream),
    X = 0,
    (B1 > C1 -> write('Player 1 wins the tournament!') ; write('Player 2 wins the tournament!')),
    write('There were '), write(N), write(' games played.'),
    nl, write('Player 1 won '), write(N6),
    nl, write('Player 2 won '), write(N7),
    nl, write('Drawn games were '), write(N9),
    write(Stream, 'There were '), write(Stream, N), write(Stream, ' games played.'),
    nl(Stream), write(Stream, 'Player 1 won '), write(Stream, N6),
    nl(Stream), write(Stream, 'Player 2 won '), write(Stream, N7),
    nl(Stream), write(Stream, 'Drawn games were '), write(Stream, N9), !.
play_game(_, (_,_,0.0), (_,_,C1), X, N, (N1, N2, N3, T, N6, N7, N9), Stream) :-
    T1 is T + N, N4 is N3 + 1, N5 is N1 + 1,
    nl, write('*******************************************************'), nl,
    write('The tournament has ended. Player 1 has 0 money, and Player 2 has '),
    write(C1), write('.'),
    nl, write('Player 2 wins the tournament! '),
    write('There were '), write(N), write(' games played.'), !,
    (   N5 > 0
    ->  X = 0, write(N5), write(' tournaments played.'),
        nl, write('Total games played: '), write(T1),
        nl, write('Player 1 won '), write(N2), write(' and '), write(N6),
        nl, write('Player 2 won '), write(N4), write(' and '), write(N7),
        nl, write('Drawn games were '), write(N9),
        nl(Stream), write(Stream, 'Total games played: '), write(Stream, T1),
        nl(Stream), write(Stream, 'Player 1 won '), write(Stream, N6),
        nl(Stream), write(Stream, 'Player 2 won '), write(Stream, N7),
        nl(Stream), write(Stream, 'Drawn games were '), write(Stream, N9)
    ;   X = (N5, N2, N4, T1, N6, N7, N9)
    ).
play_game(_, (_,_,5.0), (_,_,C1), X, N, (N1, N2, N3, T, N6, N7, N9), Stream) :-
    T1 is T + N, N4 is N3 + 1, N5 is N1 + 1,
    nl, write('*******************************************************'), nl,
    write('The tournament has ended. Player 1 has 5 money, and Player 2 has '),
    write(C1), write('.'),
    nl, write('Player 2 wins the tournament! '),
    write('There were '), write(N), write(' games played.'), !,
    (   N5 > 0
    ->  X = 0, write(N5), write(' tournaments played.'),
        nl, write('Total games played: '), write(T1),
        nl, write('Player 1 won '), write(N2), write(' and '), write(N6),
        nl, write('Player 2 won '), write(N4), write(' and '), write(N7),
        nl, write('Drawn games were '), write(N9),
        nl(Stream), write(Stream, 'Total games played: '), write(Stream, T1),
        nl(Stream), write(Stream, 'Player 1 won '), write(Stream, N6),
        nl(Stream), write(Stream, 'Player 2 won '), write(Stream, N7),
        nl(Stream), write(Stream, 'Drawn games were '), write(Stream, N9)
    ;   X = (N5, N2, N4, T1, N6, N7, N9)
    ).
    T1 is T + N, N4 is N2 + 1, N5 is N1 + 1,
    nl, write('*******************************************************'), nl,
    write('The tournament has ended. Player 2 has 0 money, and Player 1 has '),
    write(B1), write('.'),
    nl, write('Player 1 wins the tournament! '),
    write('There were '), write(N), write(' games played.'), !,
    (   N5 > 0
    ->  X = 0, write(N5), write(' tournaments played.'),
        nl, write('Total games played: '), write(T1),
        nl, write('Player 1 won '), write(N4), write(' and '), write(N6),
        nl, write('Player 2 won '), write(N3), write(' and '), write(N7),
        nl, write('Drawn games were '), write(N9),
        nl(Stream), write(Stream, 'Total games played: '), write(Stream, T1),
        nl(Stream), write(Stream, 'Player 1 won '), write(Stream, N6),
        nl(Stream), write(Stream, 'Player 2 won '), write(Stream, N7),
        nl(Stream), write(Stream, 'Drawn games were '), write(Stream, N9)
    ;   X = (N5, N4, N3, T1, N6, N7, N9)
    ).
play_game(_, (_,_,B1), (_,_,5.0), X, N, (N1, N2, N3, T, N6, N7, N9), Stream) :-
    T1 is T + N, N4 is N2 + 1, N5 is N1 + 1,
    nl, write('*******************************************************'), nl,
    write('The tournament has ended. Player 2 has 5 money, and Player 1 has '),
    write(B1), write('.'),
    nl, write('Player 1 wins the tournament! '),
    write('There were '), write(N), write(' games played.'), !,
    (   N5 > 0
    ->  X = 0, write(N5), write(' tournaments played.'),
        nl, write('Total games played: '), write(T1),
        nl, write('Player 1 won '), write(N4), write(' and '), write(N6),
        nl, write('Player 2 won '), write(N3), write(' and '), write(N7),
        nl, write('Drawn games were '), write(N9),
        nl(Stream), write(Stream, 'Total games played: '), write(Stream, T1),
        nl(Stream), write(Stream, 'Player 1 won '), write(Stream, N6),
        nl(Stream), write(Stream, 'Player 2 won '), write(Stream, N7),
        nl(Stream), write(Stream, 'Drawn games were '), write(Stream, N9)
    ;   X = (N5, N4, N3, T1, N6, N7, N9)
    ).
play_game(A, (1,B,B0), (2,C,C0), X, N, N2, Stream) :-
    N1 is N + 1,
    B1 is B0 - 10,
    C1 is C0 - 10,
    nl, write('*******************************************************'), nl,
    write('Your hand (personal 2 cards) is : '), write(B), nl,
    write('*******************************************************'), nl,
    nl(Stream), write(Stream, 'Player 1 has cards : '), write(Stream, B),
    nl(Stream), write(Stream, 'Player 2 has cards : '), write(Stream, C),
    % First round evaluation and betting.
    eval_start_good(C, Good1),
    (   P4 = 3
    ->  write('It is a draw!'), write(Stream, 'It is a draw!')
    ;   write('Player '), write(P4), write(' has won!'),
        write(Stream, 'Player '), write(Stream, P4), write(Stream, ' has won!')
    ),
    nl, write('Player 1 now has '), write(B6), write(' money.'),
    nl, write('Player 2 now has '), write(C6), write(' money.'), nl,
    nl, write('The current state of the poker game is as follows :'),
    nl, write('Your cards are '), write(B), write('.'),
    nl, write('The cards on the table are '), write(D), write('.'),
    nl, write('Your money is '), write(B1), write('.'),
    nl, write('Your opponent has '), write(C1), write(' money.'),
    nl, write('The money currently in the pot is '), write(R1), write('.'), nl,
    (M > 0 -> write('You need to bet a minimum of '), write(M), write(' to continue.') ; true),
    nl, write('Please choose one of the following options for betting,'),
    nl, write('f - fold, b - bet'),
    ((M > 0, N < 6, B1 >= 20, C1 >= 10) -> write(', r - raise') ; write('')),
    (M = 0 -> write(', c - check : ') ; write(' : ')),
    read(X),
    eval(X, B1, C1, R1, B3, R3, M, M1, 1, N, X1, Stream), !,
    change_over(1, X1, A1),
    change_over(P, X1, P2),
    write('The river (final community card) is : '), write(D), nl,
    write('*******************************************************'), nl,
    nl(Stream), nl(Stream), write(Stream, 'The river is : '), write(Stream, D).
eval('c', B1, _, R1, B1, R1, M, 0, A, _, 'c', Stream) :-
    M = 0,
    nl, nl, write('Player '), write(A), write(' has checked.'),
    nl(Stream), write(Stream, 'Player '), write(Stream, A), write(Stream, ' has checked.').
eval('c', B1, _, R1, B1, R1, M, M, _, _, 's', _) :-
    nl, nl, write('Did I say that you could check? Did I? Huh? Huh? Go up and check...'),
    nl, write('I did not, did I? So, please answer correctly.').
eval('b', B1, _, R1, B2, R2, _, 10, A, _, 'b', Stream) :-
    B2 is B1 - 10,
    R2 is R1 + 10,
    nl, nl, write('Player '), write(A), write(' has bet.'),
    nl(Stream), write(Stream, 'Player '), write(Stream, A), write(Stream, ' has bet.').
eval('r', B1, C1, R1, B2, R2, M, 10, A, N, 'r', Stream) :-
    M > 0, N < 6,
    B1 >= 20,
    C1 >= 10,
    B2 is B1 - 20,
    R2 is R1 + 20,
    nl, nl, write('Player '), write(A), write(' has raised.'),
    nl(Stream), write(Stream, 'Player '), write(Stream, A), write(Stream, ' has raised.').
eval('r', B1, _, R1, B1, R1, M, M, _, _, 's', _) :-
    nl, nl, write('Did I say that you could raise? Did I? Huh? Huh? Go up and check...'),
    nl, write('I did not, did I? So, please answer correctly.').
eval('f', B1, _, R1, B1, R1, _, _, A, _, 'f', Stream) :-
    write('Player '), write(A), write(' has folded!'),
    write(Stream, 'Player '), write(Stream, A), write(Stream, ' has folded!').
eval(_, B1, _, R1, B1, R1, M, M, _, _, 's', _) :-
    nl, nl, write('Did I say that you could write that? Did I? Huh? Huh? Go up and check...'),
    nl, write('I did not, did I? So, please answer correctly.').
    write('Player 1 has the following cards : '), write(B),
    nl, write('Player 2 has the following cards : '), write(C),
    nl, write('The following cards are on the table : '), write(D),
    nl, nl,
    nl(Stream), nl(Stream), write(Stream, 'Player 1 has the following cards : '), write(Stream, B),
    nl(Stream), write(Stream, 'Player 2 has the following cards : '), write(Stream, C),
    nl(Stream), write(Stream, 'The following cards are on the table : '), write(Stream, D),
    nl(Stream),
    append(B, D, B3),
    ace_it(B3, B4),
    q_sort(B4, [], B5),
eval_start(X, A) :-
    suited(X, A1),
    sequenced(X, A1, A),
    A > 0, !.
eval_start(X, A) :-
    high(X, A, 8).

suited([(_,A),(_,A)], 1).
suited(_, 0).

paired([(A,_), (A,_)], X) :-
    ((A > 9 ; A = 1) -> X is 3 ; X is 2).

sequenced([(A,_), (B,_)], X, X1) :-
    (   B is A + 1
    ;   B is A - 1
    ;   A = 1, B = 13
    ;   B = 1, A = 13
    ),
    greater(A, B, _, B1),
    (B1 > 9 -> X1 is X + 2 ; X1 is X + 1).
sequenced(_, A, A).

high([(A, _), (B, _)], 1, N) :-
    (A = 1 ; A > N),
    (B = 1 ; B > N).
high(_, 0, _).
eval_flop(_, _, 4, 4).
eval_flop(_, X, _, E) :-
    sequen(X, 1, 2, 0, 1, A, 2),
    E1 is 0,
    (A > 9 -> A1 = A ; A1 is A - 1),
    (member((A1,_), X) -> E2 is E1 + 2 ; E2 = E1),
    A2 is A1 - 1,
    (member((A2,_), X) -> E is E2 + 2 ; E = E2).
eval_flop(B, X, _, E) :-
    eval_flusher(X, E1, 1, A),
    member((_,A), B),
    (E1 = 2 -> B = [(_,A),(_,A)], E is E1 ; E is E1).
append(B, [], [], [], B).
append(B, [M, N, O], [], [], [M, N, O|B]).
append(B, [M, N, O], [P], [], [M, N, O, P|B]).
append(B, [M, N, O], [P], [Q], [M, N, O, P, Q|B]).
assign_str(Win, (C, B, R, Win)) :-
    Win =< 50,
    C is (80 - (Win / 2)),
    B is ((Win / 2) + 10),
    R is 10.
assign_str(Win, (C, B, R, Win)) :-
    Win > 50,    % lower bound added so this clause cannot also match Win =< 50 on backtracking
    Win =< 75,
    C is (90 - Win),
    B is (Win - 10),
    R is 20.
assign_str(Win, (C, B, R, Win)) :-
    Win > 75,
    C is 15,
    B is 105 - Win,
    R is Win - 20.