Letting stones go unturned

Wéiqí, Baduk, or simply Go. Over two thousand years old, this challenging game of strategy and tactics has taught warriors, monks and intellectuals to focus, concentrate and plan. But a program is beginning to change our views of what is possible. In computer competitions this multi-armed bandit has stolen first prize more often than any other. It’s MoGo, or Monte Carlo Go.

The game of Go may have simple rules, but to play successfully requires enormous skill. Few computer programs are able to navigate through the unimaginable number of possible moves and find the best strategy. The solution, as shown by MoGo, is not even to attempt to look at all the possibilities. Instead, a randomised exploration of the options based on multi-arm bandits makes this problem more manageable. The secret behind the early success of MoGo was Upper Confidence Trees.

The idea of using machines to take the role of our opponent in a game arose even before computers had been created. Chess has long attracted the attention of designers of these machines. While the very earliest chess automata were frauds, by the 1950s the subject was being considered seriously. According to Claude Shannon in 1950, we should think of the game as an imaginary tree branching out from the current state of the board. Each possible move is another branch from the current state; follow a branch and you reach a new board state from which new branches grow. Chess has a lot of pieces and a lot of possible moves for each piece, making the game tree vast. Even the most powerful supercomputers cannot search all of this tree. However, by the late 1990s and early 2000s, our best computers were fast enough to search significant parts of it. Computers such as Deep Blue and later Deep Fritz were able to beat human chess grandmasters using these brute-force approaches.
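A rough back-of-the-envelope calculation shows why brute force eventually breaks down. The sketch below uses illustrative branching factors (roughly 37 legal moves per turn for chess and around 200 for Go, the averages quoted later in this article), not exact counts:

```python
# Rough illustration: a full search must consider about b**d positions,
# where b is the branching factor (legal moves per turn) and d is the
# search depth. The figures 37 (chess) and 200 (Go) are approximate
# averages, used here purely for illustration.

def tree_size(branching_factor: int, depth: int) -> int:
    """Number of leaf positions in a uniform game tree."""
    return branching_factor ** depth

chess_leaves = tree_size(37, 10)   # looking ten moves ahead in chess
go_leaves = tree_size(200, 10)     # looking ten moves ahead in Go

print(f"chess, depth 10: ~{chess_leaves:.2e} positions")
print(f"go,    depth 10: ~{go_leaves:.2e} positions")
print(f"Go's tree is ~{go_leaves / chess_leaves:.0f} times larger")
```

Even at this modest depth, Go’s tree is tens of millions of times larger than chess’s, and the gap widens with every extra move of lookahead.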

But the game of Go was another matter entirely. Compared to chess, which averages around 37 legal moves per turn, Go rarely offers fewer than 50 possible moves, and typically over 200 each turn. This means that the game tree for Go is considerably larger than the one for chess. So even with the very fastest supercomputers, a brute-force search for good moves simply doesn’t work for the game of Go. Monte Carlo search is one useful way to make vast search spaces more manageable. Instead of trying to search the whole space exhaustively, a Monte Carlo method randomly samples points in the space and then uses the results of that sampling to approximate the solution.
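The classic illustration of this idea (nothing to do with Go itself) is estimating π: instead of measuring a quarter circle exactly, randomly sample points in a square and use the fraction that land inside the circle to approximate the answer:

```python
import random

def estimate_pi(samples: int) -> float:
    """Approximate pi by sampling random points in the unit square
    and counting how many fall inside the quarter circle of radius 1."""
    hits = 0
    for _ in range(samples):
        x, y = random.random(), random.random()
        if x * x + y * y <= 1.0:
            hits += 1
    # The quarter circle has area pi/4, so hit rate * 4 approximates pi.
    return 4.0 * hits / samples

print(estimate_pi(100_000))  # close to 3.14159, varying from run to run
```

The estimate is never exact, but it gets reliably closer as the number of samples grows — the same trade-off a Monte Carlo game player makes between accuracy and thinking time.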

When applying Monte Carlo search to a game tree, this equates to evaluating many randomly chosen branches of the tree to see which player might eventually win or lose if those moves were followed, and then deciding on the best branch to follow. Traditional Monte Carlo methods might iteratively search the game tree, deeper and deeper, but they sampled each branch with uniform probability. An even better approach, introduced by researchers Levente Kocsis and Csaba Szepesvári in 2006, is to bias the random sampling of branches. Using a method called Upper Confidence Trees (UCT), an extension of a previous method known as Upper Confidence Bounds, the different moves within the game tree are treated like different one-armed bandit machines. Each time a one-armed bandit is played it will return a random result, but on average some provide better results than others. In exactly the same way, each time a certain move is tried it might result in a different outcome (depending on all the other moves), but on average some moves are better than others. By finding an appropriate balance between exploring which bandits (moves) are best and exploiting the good ones already found, the Monte Carlo method can be biased towards more useful moves in the game tree. In this way UCT is able to search the game tree of Go selectively and randomly, discovering a good move to make each time. (In reality, “multi-armed bandits” are a more appropriate analogy for Go than one-armed bandits.)
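The bandit intuition can be made concrete with a small simulation. The sketch below implements the standard UCB1 rule (the Upper Confidence Bounds method that UCT extends) on three imaginary slot machines; their payout probabilities are made-up numbers for the demo, not anything from MoGo. Over time the rule funnels most plays towards the best machine while still occasionally exploring the others:

```python
import math
import random

def ucb1(payout_probs, rounds):
    """Play a set of bandit arms with the UCB1 rule: always pick the arm
    with the highest average payout plus an exploration bonus that
    shrinks the more often that arm has been tried."""
    n_arms = len(payout_probs)
    pulls = [0] * n_arms      # times each arm has been played
    rewards = [0.0] * n_arms  # total payout received from each arm
    for t in range(1, rounds + 1):
        if t <= n_arms:
            arm = t - 1  # play every arm once to initialise
        else:
            # exploitation term (average payout) + exploration term
            arm = max(range(n_arms), key=lambda a:
                      rewards[a] / pulls[a]
                      + math.sqrt(2 * math.log(t) / pulls[a]))
        pulls[arm] += 1
        rewards[arm] += 1.0 if random.random() < payout_probs[arm] else 0.0
    return pulls

# Three hypothetical machines; the last pays out most often on average.
counts = ucb1([0.3, 0.5, 0.7], rounds=10_000)
print(counts)  # the best arm (index 2) receives by far the most plays
```

In MoGo, each candidate move plays the role of an arm, and a “payout” is whether a random continuation of the game from that move ends in a win.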

A group of researchers took exactly this approach when they developed their computer Go playing program. Prof. Rémi Munos and his student Yizao Wang (a keen Go player) at École Polytechnique were inspired by the recent work of Rémi Coulom, who had created a hugely successful Monte Carlo Go player called Crazy Stone. They joined forces with another student, Pierre-Arnaud Coquelin, and with Olivier Teytaud and his PhD student Sylvain Gelly at University of Paris-Sud. This team applied the UCB algorithm to the game tree of Go and, without realising it, independently invented the UCT algorithm at almost the same time as Kocsis and Szepesvári. MoGo used UCT to bias its Monte Carlo search – a strategy that enabled the program to win the CGOS tournament, where games were played on a smaller 9x9 grid, in August 2006. MoGo was first introduced at the PASCAL On-line Trading of Exploration and Exploitation Workshop in 2006.

MoGo has had many successes, including winning the Gold Medal in the 2007 Computer Game Olympiad for the game of Go, and being the first program to defeat a professional player in 9x9 Go. In March 2008 MoGo won a 9x9 game against the winner of the “Tournoi de Paris”, the most important tournament in Europe, and Catalin Taranu (fifth dan pro) claimed that MoGo was close to dan level at 19x19. According to Olivier Teytaud, one of the current developers of MoGo, “MoGo does not contain a lot of Go expertise and is however quite strong; this shows that the approach is quite general... We are now applying UCT and related tools to other games and to some combinatorial optimisation problems.”

Omniture TouchClarity focuses on optimising the online experience for users – it decides what should and should not be shown within webpages, based on the interactions of past users. The technology is being used by most UK banks (and many other clients) to provide tailored promotions to visitors based on their clicks within the pages. The problem can be regarded as similar to a game in which the human players (users) make their moves by clicking on certain parts of web pages, and the computer responds with its move of changing the content, making it more likely that the user will choose options of interest to them (and profitable for the company). So the same kinds of technology used for playing Go can also be used to solve this problem. Chief Scientist Leonard Newnham explained, “PASCAL allowed us to revisit the problem from the ground up to refine our algorithms.”