TD-Gammon

TD-Gammon: From neural network to OS/2 game

by Jeri Dube

When most people think of IBM Research, they tend to think of fractals, scanning tunneling microscopes or high-temperature superconductivity. Games do not usually come to mind. However, the TD-Gammon game included in the IBM Family FunPak for OS/2 Warp was developed by IBM Research.

Having been created at such an auspicious place, this version of backgammon ought to be quite special. Well, to be quite honest and not so humble, it is! TD-Gammon is the most advanced computer version of backgammon, and it can play at the highest levels. If the system were a human, it would be rated as a World Class Master.

TD-Gammon was developed by Gerry Tesauro, an IBM Research Staff Member. Gerry is not a game developer; rather, he is a theoretical physicist who has been working in the area of neural networks and artificial intelligence for several years. He did not initially intend to develop an OS/2 game for the Family FunPak. All he wanted was a basic research project to study learning algorithms that would enable a computer to teach itself a task.

Gerry chose backgammon as the task because it appeared to be a good domain in which a neural network might work well. Now that I've mentioned it twice, you may be wondering: what is a neural network? In short, it's a model of interconnected neurons (also known as nodes) that was inspired by the biological neurons in the human nervous system. Each connection between neurons has a particular weight value associated with it.

In the case of backgammon, the state of the backgammon board is fed into input neurons that have connections to hidden neurons (or units). These hidden neurons in turn connect to an output layer that holds the value of the state (that is, the chances of winning from that particular state). Each hidden neuron computes a weighted linear sum of the values of all the input neurons. The result of the summation is put through a thresholding function, which compresses the value to lie within a fixed range, such as the 0-to-1 range of a probability. (In case it ever comes up in conversation, this is known as a squashing function.) The squashing function is non-linear, and that non-linearity allows the system to learn more complex functions.
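For the curious, here is a minimal sketch in Python of the calculation just described. It is not the TD-Gammon code. The layer sizes, the made-up board encoding, the names `sigmoid` and `forward`, and the choice of the logistic function as the squashing function (applied to the output as well) are all assumptions made for illustration.

```python
import math
import random

def sigmoid(x):
    """Squashing function: compresses any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def forward(inputs, hidden_weights, output_weights):
    """Return (hidden activations, estimated chance of winning)."""
    # Each hidden unit takes a weighted sum of every input, then squashes it.
    hidden = [sigmoid(sum(w * x for w, x in zip(row, inputs)))
              for row in hidden_weights]
    # The output unit does the same with the hidden activations
    # (squashing the output too is an assumption of this sketch).
    value = sigmoid(sum(w * h for w, h in zip(output_weights, hidden)))
    return hidden, value

random.seed(0)
N_INPUTS, N_HIDDEN = 8, 4  # illustrative sizes only, not TD-Gammon's

# Weights start out random, one per connection.
hidden_weights = [[random.uniform(-0.5, 0.5) for _ in range(N_INPUTS)]
                  for _ in range(N_HIDDEN)]
output_weights = [random.uniform(-0.5, 0.5) for _ in range(N_HIDDEN)]

board = [1, 0, 0, 1, 0, 1, 1, 0]  # hypothetical encoding of a board state
hidden, value = forward(board, hidden_weights, output_weights)
print(f"estimated chance of winning: {value:.3f}")
```

With random weights the printed estimate is meaningless, of course. The point is only that the squashing step keeps every result strictly between 0 and 1, whatever the weights happen to be.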

To use this model to teach a system backgammon, all the initial weights between the neurons are set randomly. The neural network starts from the opening backgammon position and plays both sides until one of the sides wins. The outcome of the game is used as a reward signal for reinforcement learning. That is, the neural network takes the outcome of the game and adjusts the weights accordingly. The adjustments improve the network's ability to evaluate board states in subsequent games.
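The self-play idea can be sketched on a much smaller scale. The Python below is a toy, not Gerry's algorithm: it swaps backgammon for an invented counter-removal dice game, swaps the neural network for a plain table of estimates, and uses a simple one-step temporal-difference style update. The game rules, the function name `train` and all of its parameters are hypothetical.

```python
import random

def train(num_games=5000, start=10, alpha=0.1, seed=0):
    """Self-play on a toy dice game; returns the learned table of estimates.

    value[n] estimates the chance that the player about to roll wins
    when n counters remain. Whoever removes the last counter wins.
    """
    rng = random.Random(seed)
    # Random initial estimates, mirroring the randomly set initial weights.
    value = [rng.random() for _ in range(start + 1)]
    for _ in range(num_games):
        n = start
        while n > 0:  # the same table plays both sides, turn by turn
            roll = rng.randint(1, 3)
            # Legal moves: remove 1 counter or `roll` counters,
            # never more than remain.
            options = {min(1, n), min(roll, n)}
            # Choose the move that leaves the opponent the worst position;
            # taking the last counter leaves the opponent no chance at all.
            take = min(options,
                       key=lambda k: 0.0 if n - k == 0 else value[n - k])
            nxt = n - take
            # Target: 1 if this move wins the game, otherwise one minus
            # the opponent's estimated chance from the new position.
            target = 1.0 if nxt == 0 else 1.0 - value[nxt]
            # Nudge the estimate for the old position toward the target.
            value[n] += alpha * (target - value[n])
            n = nxt
    return value

values = train()
for n in range(1, 4):
    print(n, round(values[n], 2))
```

Notice that the only hard fact the table ever sees is who took the last counter. Every other estimate is pulled toward the estimates that follow it, so the final outcome gradually filters back to earlier positions. The dice also do some of the work, because they force the learner into positions it would not have chosen.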

This learning process is repeated hundreds of thousands of times. Using an RS/6000 computer, the learning took about two weeks. Gerry and his colleagues were amazed at how well the neural network learned to play backgammon. The system kept getting better and better until it reached world-class master status. In fact, the neural network could improve its play even more with further training and a larger network.

Although playing backgammon against a computer that plays as well as a world-class master may seem daunting, you can work up to it. The game comes with five skill settings, where each higher setting uses an increasingly larger and more complex neural network as its underlying engine. If you want to use TD-Gammon to improve your backgammon skills, it is quite good as a learning device. Not only do you get feedback from the results of your play, but the system is also quite supportive of you. It gives a modest "I win" message when you lose and a hearty "Congratulations, you win!" when the computer loses.

To turn this expert backgammon-playing neural network into an OS/2 game, IBM Research hired Keith Weiner, a professional PC game developer, to add a front end written for OS/2's Presentation Manager. TD-Gammon is fully 32-bit and takes full advantage of OS/2 Warp's multi-threading capabilities. Like all Presentation Manager programs, TD-Gammon comes with a settings notebook where you can set things such as the background color and the animation speed.

Given the success of the TD-Gammon game, I asked Gerry what his next neural network game would be. He told me that researchers have used other games such as chess, Othello, and Go with varying degrees of success to study neural network learning. None have been as successful as backgammon. Gerry theorizes that the stochastic element of backgammon (i.e., throwing the dice) is what makes backgammon so useful in modeling the self-learning process. With that in mind, Gerry's next venture into self-learning is with financial time series analysis. If that project is as successful at learning as the backgammon game, then I'm really looking forward to that program.

For more information on Gerry's work, you may want to read his article "Temporal Difference Learning and TD-Gammon" published in Communications of the ACM, volume 38, number 3, pp. 58-68 (March 1995).


The above article originally appeared on IBM's Web site at www.austin.ibm.com. It is reproduced here with their kind permission.


Stephen Turner
University of Cambridge Statistical Laboratory
E-mail: sret1@cam.ac.uk

Page last modified: 11-Nov-99
