by Douglas Zare
1 April 2009

Gnu Backgammon and Snowie are both extremely strong backgammon players. Some types of positions are better understood by one bot than the other. Which bot understands the opening better?
In the comments following my February 2009 column, I mentioned that Gnu Backgammon and Snowie disagree on how to play a 1-1 response to an opening 2-1 slot. Gnu hits 24/20*, while Snowie primes 8/7(2) 6/5(2). I decided to survey the disagreements between these bots in the responses to the opening roll. As a referee, I used Stick Rice's collection of rollouts in the opening. I was surprised to find that the bots disagree with each other, or with rollouts, in about 15% of positions. Most differences are too small to call errors, but a few are significant, and even a small mistake may be a sign that the evaluations are off by a larger amount.
Article text Copyright © 1999-2009 Douglas Zare and GammonVillage Inc.
I left out the amount after 4-3 down, 4-4, which both bots get wrong. 24/20(2) 13/9(2) is about 0.020 better than 24/16* 13/9(2).
Douglas Zare
There are many people, including me, who have less than 100 percent confidence in the answers provided by any program, particularly when you are talking about an early-game position where there are many, many moves to the end of the game. If Snowie or GNU make just one move along the way slightly wrong, particularly if that move is the next roll or two after the opening move, then all the results are suspect. My "gut" tells me that if the bots continue to improve, it is quite possible that in 10 years we will look back and say that at least some of the opening moves that you show to be right could turn out to be wrong.
My questions to you are: 1. To what degree do you think your conclusions will hold up 10 years from now? 2. If you agree that one should not place 100 percent confidence in early-game evaluations and rollouts, would you also agree that if a player prefers a play that the bots say is wrong by less than, say, .025, it's not such a bad idea for that player to go ahead with the preferred play? 3. What is your prediction for the future of bots? Are they likely to ever get to the point where we can have close to 100 percent confidence in them? If it's not 100 percent today, where, in your opinion, is it? 80%? 90%?
Phil
Phil, rollouts are significantly more reliable than evaluations, but rollouts are not perfect, and won't be until bots solve backgammon.
It is a mistake to ignore bot evaluations just because they are imperfect. It is a worse mistake to ignore rollouts just because there is a chance they are wrong. Almost as bad a conceptual error would be to focus on which play a bot says is right without paying attention to the size of the preference.
A lot of the time, rollouts in the opening will be off by 0.020. (Evaluations are often wrong by 0.040.) That means when a rollout says play A is slightly better than play B, it should not be a surprise if play B is slightly better than play A according to the next generation of bot rollouts. When I see that a long rollout favors play A by 0.010, my conclusion is that the plays are close, not that play A is clearly better. That the plays are close is unlikely to change as bots improve. Knowing that the plays are close is valuable, and lets you adapt to playing at different match scores, or in slightly different positions, or against weaker or stronger opponents.
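As a rough illustration of that reasoning, here is a minimal Python sketch that compares a reported margin against the typical error sizes mentioned above (0.020 for opening rollouts, 0.040 for evaluations). The function name `classify_preference` and the simple threshold rule are assumptions made for this example; they are not how any bot reports results, and real error sizes vary from position to position.

```python
# Typical error sizes taken from the paragraph above; treating them as
# hard thresholds is a simplifying assumption for illustration only.
ROLLOUT_TYPICAL_ERROR = 0.020
EVALUATION_TYPICAL_ERROR = 0.040


def classify_preference(margin, typical_error):
    """Label a preference 'close' if the margin is within the typical error.

    margin: equity difference between play A and play B (hypothetical input).
    typical_error: how far off this kind of result often is.
    """
    return "close" if abs(margin) <= typical_error else "clear"


# A long rollout favoring play A by 0.010 is within the usual rollout error.
print(classify_preference(0.010, ROLLOUT_TYPICAL_ERROR))  # prints "close"
```

A margin labeled "close" here is the kind of result where the next generation of rollouts might reverse the order of the two plays, while the fact that they are close is likely to stand.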
One thing which may improve in the future is that bots may state how confident they are in the results of an evaluation or a rollout. That will help us to save time when we try to determine which of our disagreements with the bots are worth studying.
Douglas Zare

