Note: This series was recently published on GammonVillage in Walter Trice's Beginners Boot Camp section and is now being made available to readers who are interested in learning more about bots and Snowie 3 Professional Analysis, an exclusive GammonVillage service. For a free trial offer please contact GV Editor Michael Strato.Snowie Info - Article 1
I got the news that machines could "learn" around 1960. An article in
Scientific American described how one could build a "computer" out of matchboxes and beads that would learn to play a mean game of Tic-Tac-Toe. The procedure was so simple that a 12-year-old (such as myself) could understand it.
You'd need enough matchboxes to represent every possible tic-tac-toe position - around 100, as I recall. Each box would have a diagram for a partially completed game pasted on top. You would play out a bunch of games, and during each game you would open the boxes corresponding to the sequence of positions that arose. At the end of a game, if X won you would put a black bead into each open box. If O won each open box would get a white bead, and in the event of a draw they would get one white and one black. Then you would shut the boxes and play another game.
I have forgotten the details of the procedure for choosing moves. One way to do it would be to have the "player" open a box corresponding to a legal play and pull out a bead. If it's his colour (black for X, white for O) he makes that play, leaving the box open; otherwise he replaces the bead, closes the box, and tries another. You might start out each box with one white and one black bead, so that in the beginning any random move would have a 50% chance of being chosen. Eventually positions that were good for X would collect a lot of black beads, O winners would get a predominance of white ones, and drawn positions would get a balanced mix.
The whole concept was rather unsettling to my 12-year-old psyche. Could a collection of cardboard boxes and glass beads actually learn, make plans, and think? Would the mere shuffling of beads somehow result in the creation of a
mind? Or, on the other hand, was it possible that all my own thoughts, feelings, perceptions, and experiences really amounted to nothing more than the movement of tiny little beads among invisible boxes according to simple mechanical rules?
Philosophical qualms aside, the idea behind the matchbox tic-tac-toe computer is a useful one, and it is especially useful for analyzing backgammon strategy. Let's start by considering a subset of backgammon that is strategically
less difficult than tic-tac-toe: backgammon with only one checker for each side, played without the doubling cube. We'll need 576 matchboxes. The player on roll has 25 possible locations, including the bar. If he is on the bar, the other player has 24 available spots, and if he is on the board the other has 23 possibilities. The proportion of black beads to total beads in each box will represent the probability of winning from the corresponding position. Our bead-shifting rules will involve only one move at a time. As we play out the games, whenever someone wins, the box for the position just before he won will get a black bead. If a move doesn't win immediately, then we'll select a random bead out of the box for the resulting position, note its colour, put it back in its box, and then add a fresh bead of the
opposite colour (from one of our two giant bead-pots) to the box for the position before the move.
There actually is a decision that may arise in a game of one-checker cubeless backgammon. Sometimes you have to decide whether to hit your opponent's blot or to pass it by without hitting. To make this decision you could open the boxes for the two possible resultant positions, count the beads, and play for the position with the greater proportion of white beads (or just peek into the boxes and choose the one that looks whiter.)
A computer could do it all more efficiently. A program would use 576 memory locations instead of matchboxes. Each memory location would contain a number, representing an estimated probability of winning. These numbers could be initialized to whatever you please at the start of the run - it really doesn't matter. On each move 1 minus the number from the resultant position would be averaged in with the number from the pre-move position, with a weighting factor that could be considered a "learning rate." As with the matchbox computer, a winning move would get a "win" reward - in this case the number "1", rather than a black bead -- averaged into its estimate.
Question 1: why bother? Can't you just calculate the probabilities from first principles without having to go around in circles zillions of times? Diagram 1 illustrates the problem with direct calculation. What if Green rolls 5-5? Then the position repeats. In fact any contact position in one-checker backgammon can lead to any other, and can repeat an arbitrarily large number of times in a single game. A few weeks ago I wrote about how bear-off positions can be solved by averaging over their successors, which works just fine if you already know the answers for the successors. But if a position can crop up among its own successors, then you have a bit of a chicken-and-egg problem. The iterated learning approach circumvents this problem.
Diagram 1:
Question 2: does it work? The numbers are going to jump around while the computer plays out games using random dice. Will they settle down eventually, and will they settle down where they should? The answer to this question is not trivial, but mathematicians have proved that the process will eventually give the right answers if either (a) you cycle through the set of all possible games repeatedly, instead of playing randomly, or (b) you gradually decrease the learning rate according to a couple of constraints that ensure that it decreases neither too rapidly nor too slowly.
Solving one-checker backgammon this way, whether with matchboxes and beads or with an electronic computer, is an example of
temporal difference training. "Temporal difference" is often abbreviated to TD in the literature. The name reflects the fact that estimates are adjusted according to how they change one time step into the future.
In principle, one could compute an exact and perfect solution to the game of backgammon this way. The only problem is that you would need 18,528,584,051,601,162,496 matchboxes, a couple of trillion little beads per box, and eons of time to devote to the project. Even using a computer the number of positions is much too large. (Nonetheless, this is how Hugh Sconyers solved 3-checker backgammon, or "Hyper-Gammon." When you take on one of the hyperbots on GamesGrid you are playing against a perfect opponent whose strategy is represented by a table of about 32,000,000 numbers.)
If assigning a unique number to every possible position is not an option, what are the alternatives? All backgammon-playing computer programs have been based on the idea of giving every position a number representing its value. If the numbers cannot be stored then they must be computed from the position. But how? Early backgammon programmers invented scoring systems for positions based on their own judgment about what's important. Ingredients for an "evaluation function" might be such things as a lead in the pip count, how many points you own, how many opposing checkers you have trapped behind a prime, and so on. This was simply the best available technique. You'd make up an evaluation function, code it up and try it out, observe the results, fiddle with the numbers and the mix of ingredients, and try to make gradual improvements.
Handcrafted evaluation functions were adequate for programs that played other games, such as chess and checkers, because the real strength in those programs lay in their ability to look into the future. In those games the "tree of possibilities" is manageable by a computer. A typical chess position might have 30 legal moves, which means that you can look ahead two moves by considering 900 possibilities, three moves with 27,000, and so on. Clever "pruning" algorithms reduce the number of moves that need to be considered, making it possible to extend the analysis still further.
But in backgammon that is not so easy. At every turn each player has 21 possible rolls, and sometimes hundreds or even thousands of ways to play a roll. To look ahead a mere 3 rolls into the future you may have to evaluate billions of positions. Therefore a backgammon-playing program needs a strong evaluation function, encapsulating the equivalent of the
judgment abilities of a top human player.
The big breakthrough in backgammon programming came in the early 1990's in the form of TD-Gammon, developed by Dr. Gerry Tesauro, an IBM researcher. The guts of TD-Gammon was a "neural network" that had learned to play backgammon from scratch, by playing out games against itself and adjusting its evaluation function using a temporal difference method.
A neural network is a particular kind of mathematical function that mimics some of the features of organismic nerve structure. It's usefulness and potential power derived from two properties. First, a sufficiently large network can reproduce
any function; in a sense, it is universal. Second, there is a reasonably simple way to go about adjusting it so that it makes smaller errors. If you keep on reducing errors, eventually you get to a point where errors can't be made any smaller; and since networks are universal, "as small as possible" might just turn out to mean no error at all!
That a solution to backgammon could theoretically be embodied in a neural network might have turned out to be nothing more than one of those "interesting theorems" that have no practical utility. After all, matchboxes and beads would work just as well "in theory." Very often, though, the big stumbling block in neural network development is putting together a sufficient collection of examples for the network to train against. Tesauro's grand insight was to realize that if you combine temporal difference training with a standard error-reduction method called "back-propagation," then you can let the computer play against itself and produce an unlimited supply of training examples. The errors to be reduced by the training algorithm would be the inconsistencies in the program's evaluation function from one move to the next, and reducing these inconsistencies to zero would accomplish the same for error in the absolute sense.
So he tried it out, and it worked quite well! In the fall of 1991 Dr. Tesauro invited Bill Robertie, one of the best players in the world and twice winner of the Monte Carlo World Championship, to visit the IBM research center in White Plains and take on TD-Gammon. Robertie played a session with the new program and, reporting on his experience in the magazine Inside Backgammon early in 1992, estimated that his edge against TD-Gammon was about a quarter of a point per game in money play. This meant that TD-Gammon was substantially stronger than the two best commercially available programs, which Bill had benchmarked at 0.66 and 0.82 points per game.
TD-Gammon never became available to the public in a useful form. But it wasn't long before others commenced efforts to duplicate and extend Tesauro's accomplishment. Frederik Dahl's Jellyfish hit the market in 1994. For the first time serious players now had a mechanical playing partner strong enough to offer a legitimate critique of their own play. Jellyfish had a half-decent Windows interface and it incorporated some useful features, such as the ability to do rollouts. Jellyfish revolutionized backgammon, both for players and for analysts.
Jellyfish ruled until 1998, when Andre Nicoulin and Olivier Egger offered Snowie to the world. Snowie improved on Jellyfish in the strength of its evaluation function, in the number and usefulness of its features for play and analysis and, most strikingly, in its user interface. Jellyfish lives on, mainly because it sells for less and because it runs well on older computers. (Don't even
think about putting Snowie on anything less than a 500MHz Pentium.) But any serious backgammon player who tries to get by without Snowie is significantly handicapping himself.
The prospect of learning to use Snowie can be daunting, if only because it offers so many choices and options. I know players who have owned the program for years without ever having figured out how to
really use it. Next week we will delve into specific ways that you can use Snowie to learn about backgammon and improve your game, and explore the meanings of the wealth of information that it provides.
Snowie Info - Article 2