PROGRAMMER’S WORLD WILF’S WORKSHOP
MASTERCLASS EXTRA
This Series: Creating a teachable program
Part One: Building an interface that meets the needs
Part Two: Basic analysis of the game
Part Three: Building code that can learn
Part Four: How the program can adapt to experience
Rules of the game
PROJECT TIME: 2 hrs 00 mins
SKILL LEVEL: 7 (on a scale of 0 to 10)
INTERMEDIATE: A strong knowledge of programming is useful
YOU’LL NEED THIS: Any programming language, but QBasic is used here for illustration purposes.
http://forum.pcplus.co.uk
There’s more to noughts and crosses than meets the eye, says Wilf Hey, particularly if you’re encouraging a program’s ‘intellect’
Our current project in the Workshop is to craft a program that learns a game and, because it’s relatively simple, I chose noughts and crosses. We must establish a few facts first: we can’t expect the program to learn how to play the game (that’s a much more complex task), but we can insist that it learns how to win at the game. Or rather, how to play well.
Even the famous chess-playing computer Deep Blue was programmed knowing the moves of each chess piece, and rather a lot of chess lore was built into it from the start. It was equipped to learn from its experiences, too, and that is what we’ll endeavour to do with the PC and noughts and crosses.
Let’s consider the fundamental rules of the game rather than tactics for the moment. Two players take alternating turns placing their symbol in any one of nine cells, so long as it’s empty. The object of the game is to be the first to make a line of three cells with your symbol: lines may be orthogonal (rows or columns) or diagonal. You probably know already that if the two players are skilled, there will be no winner and every game will end in a draw.
You may also know a strategy or two. The most commonly known is the idea of seizing the middle cell in the three-by-three matrix. It’s true that you can guarantee not losing if you play your first move in the centre (whether or not you go first), but it may surprise you to know that this is not a necessary strategy. If you go first, you can put your symbol in any cell and still be certain of not losing.
Last issue we looked at how to build the visible interface between man and computer, because ultimately we will be challenging the program to play against us, hoping that it will
Multiple representations
1 Perhaps it seems most natural to think of the noughts and crosses matrix as a three-by-three array. The content of each of the nine entries could define whether the related cell is empty (0), contains an ‘X’ (1) or an ‘O’ (2). You can run through the cells with a FOR/NEXT loop embedded inside another loop.
PCPlus 229 | May 2005
When it has become expert, your program may make the response in red on its second move, knowing that it can’t lose.
learn from its mistakes. This month, we’ll study the actual sequence of play and look at how we can encourage learning behaviour.
1 Having all the answers
Consider all the possible settings of the playing matrix (or board). We can, of course, eliminate many configurations as being impossible (where there are fewer ‘X’ symbols than ‘O’, or at least two more ‘X’ symbols than ‘O’), but there are still quite a large number. After the first turn of each player, there are 72 configurations. Even if you discount rotations and reflections, there are still 12 distinct configurations. In theory, we could build a program that has in-built answers for every configuration and it would be a perfect player. However, even in this simple game, the number of configurations is great – in chess, it’s astronomical. We can’t
There are several ways to record the status of noughts and crosses
2 Another way to represent any particular configuration is to convert it to a number. Again, treating blank as 0, ‘X’ as 1 and ‘O’ as 2, a configuration turns into a nine-digit number (base three) that can be converted into a decimal number. The advantage of this method is that a small integer encapsulates the full information.
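The base-three idea can be sketched in a few lines. This is an illustration in Python rather than QBasic, and the digit order (cell one as the most significant digit) and the names `encode` and `decode` are assumptions, not taken from NandX.

```python
# Cell codes follow the article: 0 = empty, 1 = 'X', 2 = 'O'.
# Assumption: cell one is the most significant base-three digit.

def encode(cells):
    """Pack nine cell codes (cell one first) into one base-three integer."""
    value = 0
    for code in cells:
        value = value * 3 + code
    return value

def decode(value):
    """Unpack an integer produced by encode() into a list of nine codes."""
    cells = [0] * 9
    for i in range(8, -1, -1):          # fill from the last cell backwards
        value, cells[i] = divmod(value, 3)
    return cells

board = [1, 0, 0,
         0, 2, 0,
         0, 0, 0]                        # 'X' in a corner, 'O' in the centre
number = encode(board)                   # 1 * 3**8 + 2 * 3**4 = 6723
```

The largest possible value is 3^9 - 1 = 19682, so every configuration fits comfortably in a 16-bit signed integer.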
3 Another way (especially useful when debugging routines) is to represent the status of the matrix as a string that’s nine characters long. In this example, we’ve shown a small symbol as a placeholder for space, signifying an empty cell. Many languages are rich with string processing commands, making this method attractive for our purposes.
So what is Wilf’s workshop?
The art and theory of modern programming, whichever languages you use. Learn techniques that will benefit any project, and gain a deeper understanding of advanced coding disciplines.
Wilf Hey
wilf.hey@futurenet.co.uk
Wilf Hey has been grappling with programming for several decades now, and is always up for a game of noughts and crosses
PASSING SHOT
1. The priority strategy for ‘X’ is to win. 2. It’s to block the threat. 3. The best strategy is to create a jeopardy.
The three areas of competence: in-built knowledge, the ability to analyse and the ability to base an action on experience.
work with so many configurations, so we must look for rules that govern our response, each rule applying to many possible configurations.
First, what are the rules that we should be building into the program? What basic facts will it need even before it learns strategy? Well, it must know to give its opponent a move, and then take its own move in turn. It must be aware of which cells are empty and available to it. These are important but purely mechanical considerations. However, let’s grant our program a bit of innate intelligence: it should be able to recognise a threat (so that it can block the line) and it should also be able to recognise an immediate opportunity to win.
In the game, there’s an effective strategy that deserves to be known from the start. On occasion, there will arise the opportunity to make a double threat: two different lines are nearly complete and your opponent can block only one of them in a single turn. If the program can make a double threat, it’s guaranteed a win on its next turn. Here are the steps encapsulating the knowledge that we’ll build directly into our learning program:
1 When the (human) opponent has completed a turn, the program must first of all analyse whether the opponent has won. If the opponent has won, the game has ended.
2 Otherwise, it must determine whether there’s a draw. If so, the game has ended.
3 Otherwise, the program must compile a list of empty cells, considering that each is a potential choice for its move.
4 It will then test whether any of those moves is an immediate win. If it finds at least one, it eliminates all non-winning choices and goes to step seven.
5 (If there’s no immediate win), it will then test whether any line bears a threat. If it finds at least one, it eliminates from its list of potential moves all non-blocking moves, and goes to step seven.
6 (If there’s no immediate win, nor threat), the program will test whether any of the potential moves makes a double threat. If it finds at least one such move, it eliminates others from its list of potential moves.
7 The program now randomly chooses from its list of potential moves (which may have been doctored by some of the above steps).
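The seven steps can be sketched as a single routine. This is a minimal Python illustration, not the NandX source: the nine-character string board, the `.` placeholder for an empty cell, cells numbered 0 to 8, and every function name here are assumptions made for the example.

```python
import random

EMPTY = '.'   # assumed placeholder for an empty cell
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals

def winner(board):
    """Return 'X' or 'O' if some line is complete, otherwise None."""
    for a, b, c in LINES:
        if board[a] != EMPTY and board[a] == board[b] == board[c]:
            return board[a]
    return None

def place(board, cell, mark):
    """Return a copy of the board with mark placed in cell."""
    return board[:cell] + mark + board[cell + 1:]

def winning_moves(board, mark):
    """List the empty cells that would complete a line for mark."""
    return [c for c in range(9)
            if board[c] == EMPTY and winner(place(board, c, mark)) == mark]

def choose_move(board, me, opponent, rng=random):
    """Return a cell 0-8 for 'me', or None if the game has ended."""
    if winner(board) == opponent:                        # step 1: opponent won
        return None
    moves = [c for c in range(9) if board[c] == EMPTY]   # step 3: empty cells
    if not moves:                                        # step 2: board full, draw
        return None
    wins = winning_moves(board, me)                      # step 4: immediate win
    blocks = winning_moves(board, opponent)              # step 5: threats to block
    if wins:
        moves = wins
    elif blocks:
        moves = blocks
    else:                                                # step 6: double threat
        doubles = [c for c in moves
                   if len(winning_moves(place(board, c, me), me)) >= 2]
        if doubles:
            moves = doubles
    return rng.choice(moves)                             # step 7: random pick
```

For example, `choose_move("OO..X....", 'X', 'O')` can only return 2, because blocking the top row is the one move that survives step five.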
4 In our version of the learning program, we find the single integer representing the current configuration, and then compare it to the integers associated with all its rotations and reflections. The configuration with the lowest integer then becomes the ‘template’ for all the rest, significantly reducing the number of configurations we have to store in memory.
5 Rotation and reflection of the matrix can be accomplished in several ways, but one of the easiest is by reordering the characters of a nine-character string using the MID$ and VAL functions (or their counterparts in other languages). Some languages have a ‘TRANSLATE’ function that reorders a string with a single instruction using the same idea.
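Reordering a nine-character string comes down to a lookup table that says which old cell feeds each new cell. The sketch below is in Python with cells numbered 0 to 8 row by row; the table names and the `.` placeholder are assumptions for illustration.

```python
# For each destination cell, the index of the source cell it is copied from.
ROTATE = (6, 3, 0, 7, 4, 1, 8, 5, 2)   # quarter-turn clockwise
MIRROR = (2, 1, 0, 5, 4, 3, 8, 7, 6)   # left-right reflection

def reorder(board, order):
    """Build a new nine-character board by picking characters in 'order'."""
    return ''.join(board[i] for i in order)

board = "XO......."                     # 'X' top-left, 'O' top-middle
turned = reorder(board, ROTATE)         # 'X' moves to top-right, 'O' to right-middle
flipped = reorder(board, MIRROR)        # 'X' moves to top-right, 'O' stays put
```

Four quarter-turns, or two reflections, bring any board back to where it started, which makes a handy sanity check for the tables.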
You’ll see, if you investigate our source code for noughts and crosses (on the SuperDisc) that we’ve refined the method of displaying the ‘X’ and ‘O’ characters. We designed them by use of LINE and PAINT, then looked carefully at the results. From this data, we constructed a method to draw the outline of each symbol, followed by a single PAINT. This is faster and removes the flicker that you may see with the original method. Check out our debug subroutines that enabled us to probe the original results, providing the necessary data to trace an outline.
6 While you’re at the planning stage, it can be useful to provide for more than one method of representation. For example, the integer method is best for efficient storage, but not so wonderful for rotation and reflection. It’s wise (and a good mind exercise) to construct little subroutines that convert between each of these three methods.
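As a rough idea of what such conversion routines might look like, here is a Python sketch that treats the string as the hub format. The function names, the `.` placeholder for an empty cell, and the choice of cell one as the leading base-three digit are all assumptions.

```python
SYMBOLS = '.XO'                          # index in this string = cell code 0, 1, 2
TO_DIGITS = str.maketrans(SYMBOLS, '012')

def string_to_grid(board):
    """Nine-character string -> three-by-three list of symbols."""
    return [list(board[r * 3:r * 3 + 3]) for r in range(3)]

def grid_to_string(grid):
    """Three-by-three list of symbols -> nine-character string."""
    return ''.join(''.join(row) for row in grid)

def string_to_number(board):
    """Nine-character string -> base-three integer (cell one first)."""
    return int(board.translate(TO_DIGITS), 3)

def number_to_string(value):
    """Base-three integer -> nine-character string."""
    chars = []
    for _ in range(9):                   # peel off the last cell first
        value, digit = divmod(value, 3)
        chars.append(SYMBOLS[digit])
    return ''.join(reversed(chars))
```

With these four, any representation can reach any other by passing through the string form, so rotation can be done on strings and storage on integers.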
Jeopardy – being sure of a win
If you can impose jeopardy you are assured of stunning victory
The word jeopardy has its root in gameplay. It comes from the French ‘jeu parti’ (meaning ‘split move’), describing what happens in chess when a knight threatens both the opposing king and queen. The knight (represented by a horse) has a distinctive attack in several directions at once, and cannot be blocked. If a knight is safe from being
GOING FURTHER Our noughts and crosses program so far plays the game competently, but has little intellect and no educational skill. We’ve built in a few minor configurations that will stop it falling into the most basic traps, but you should still be able to beat it quite often. Until next month, see how many of these special configurations you can find that don’t yield to the three basic rules. Remember that to be thorough, you should consider not only configurations for ‘X’ moves, but configurations for ‘O’ moves, so that the program can let the human go first.
NEXT MONTH We’ll encourage foresight: the program should become able to anticipate moves by its human opponent.
taken when it ‘makes a jeopardy’, the attacked king has no choice but to move out of the way. This leaves the knight a free move to take the queen. In noughts and crosses, you can sometimes make a move that threatens a row on the next turn in two different directions. No matter which threat is blocked, the other row is available for completion. This means that when you make a jeopardy, you’ve secured a guarantee of winning on the very next move. A jeopardy can’t occur before ‘O’ has taken two turns, but often ‘X’ can make a threat by its second turn, preventing ‘O’ taking precautions against the coming jeopardy.
With these steps, the program will be able to perform quite adequately, playing a competent game against most players. You can think of this kind of knowledge of the game, in-built as it is, as the counterpart of raw intelligence in a human. Talented though the program appears, it’s not learning anything.
Even at this stage, incorporating these rules will require an interesting skill within the program: the ability to analyse a configuration even though it’s not yet happened. For example, the program must consider in turn each potential move in order to know which potential moves to eliminate. So, we must build in the ability to analyse any ‘imaginary’ configuration of the game matrix, and check which of the rules applies. Suppose we want to analyse what to do in a particular configuration: we can establish all the possible moves and cut down the results using the rules. From the remaining choices, we can select one at random.
What if our rules don’t cover every situation? The program will have to deal with these odd configurations in a separate way. When the program runs into an undocumented configuration, it can record this and see if it can formulate a rule to cover it.
2 Computer intellect
We can extend the rules slightly by imagining, for each possible move the computer makes, all the possible replies by the opponent. If any of these is a win for the opponent, or gives the opponent a chance to make a double threat, the computer should eliminate this move, since making it would be effectively suicidal. This kind of knowledge – the ability to analyse future possibilities – corresponds to intellect in human terms.
Eventually, we’ll get the program to demonstrate three kinds of knowledge: innate or in-built skill (manifest in the seven rules), intellect (in its ability to analyse the opponent’s immediate response), and learning (based on its experience while winning and losing actual games).
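The look-ahead described under ‘Computer intellect’ might be sketched as follows, in Python rather than QBasic. One refinement here is an assumption of this sketch and not stated in the article: an opponent’s double threat is counted as fatal only when the program has no winning move of its own to play first. The string board, the `.` placeholder and all names are likewise hypothetical.

```python
EMPTY = '.'   # assumed placeholder for an empty cell
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def winner(board):
    """Return 'X' or 'O' if some line is complete, otherwise None."""
    for a, b, c in LINES:
        if board[a] != EMPTY and board[a] == board[b] == board[c]:
            return board[a]
    return None

def place(board, cell, mark):
    return board[:cell] + mark + board[cell + 1:]

def empty_cells(board):
    return [c for c in range(9) if board[c] == EMPTY]

def winning_moves(board, mark):
    return [c for c in empty_cells(board)
            if winner(place(board, c, mark)) == mark]

def is_suicidal(board, cell, me, opponent):
    """True if playing 'cell' lets some reply win or set up an unanswerable double threat."""
    after_me = place(board, cell, me)
    if winner(after_me) == me:
        return False                               # winning at once is never suicidal
    for reply in empty_cells(after_me):
        after_reply = place(after_me, reply, opponent)
        if winner(after_reply) == opponent:
            return True                            # the reply wins outright
        if (len(winning_moves(after_reply, opponent)) >= 2
                and not winning_moves(after_reply, me)):
            return True                            # double threat we cannot pre-empt
    return False

def safe_moves(board, me, opponent):
    """Empty cells that survive the one-reply look-ahead."""
    return [c for c in empty_cells(board)
            if not is_suicidal(board, c, me, opponent)]
```

In the well-known trap where ‘X’ holds two opposite corners and ‘O’ the centre, `safe_moves("X...O...X", 'O', 'X')` keeps only the four edge cells: either free corner lets ‘X’ answer with a double threat.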
The white knight at e6 on this chessboard has black in a jeopardy. Black’s king is in check and must move, leaving the knight free to take the queen.
3 Morsels of memory
In many games, it’s important to know the sequence of previous moves. In noughts and crosses, there’s no such need. We need only work out a good way to store configurations in memory, like snapshots of possible games. I’ve continually referred to the playing area as a matrix, which should give you a clue as to one form of storage: a three-by-three array, each element indicating whether there’s an ‘X’ or an ‘O’ in a cell, or that it’s empty. Say we use zero for empty, one for ‘X’ and two for ‘O’. The matrix will occupy at least 18 bytes (nine two-byte integers), and we can store each matrix configuration in a three-dimensional array: two dimensions for the matrix, and a third for the list of configurations. An easier way would be to keep the nine cell values in a one-dimensional array of nine integers (again, zero for blank, one for ‘X’, two for ‘O’).
4 Nine-character strings
We’ve recently looked at how the modulo function can help us convert back and forth between one- and two-dimensional matrices. For example, ‘Row two column three’ is cell six, since [cell = 3 x row + col – 3]. Storing these lists of nine values repeatedly in a matrix will be easier, because it need only have two dimensions. Yet another way of storing a configuration in memory is as a nine-character string. This is shorter than using numbers and can always be converted back to numbers by simple code in a loop – or by using QBasic’s INSTR function, which returns the position within a string at which a specific value can be found. Furthermore, a string can be stored and retrieved by one or two commands.
Try the program NandX on the SuperDisc. It’s well on the way to being a good player, with only innate skill and intellect. Soon, it will be ready for insertion of self-learning code. Be with us next time!
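As a footnote to the cell-numbering arithmetic in ‘Nine-character strings’, here is a short Python sketch of the conversions, with rows, columns and cells all counted from one as in the text. The function names and the `.` placeholder are assumptions; note that Python’s `str.index` counts from zero where QBasic’s INSTR counts from one.

```python
def cell_from(row, col):
    """Rows and columns 1-3 -> cell number 1-9."""
    return 3 * row + col - 3

def row_col_from(cell):
    """Cell number 1-9 -> (row, column), using integer division and modulo."""
    return (cell - 1) // 3 + 1, (cell - 1) % 3 + 1

def codes_from(board, symbols='.XO'):
    """Nine-character string -> list of cell codes (0 empty, 1 'X', 2 'O')."""
    return [symbols.index(ch) for ch in board]
```

Row two, column three gives `cell_from(2, 3) == 6`, and `row_col_from(6)` takes it back to `(2, 3)`.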
On reflection
Space can be saved when we introduce the notion of matrix orientation
These eight configurations are all effectively the same. ‘X’ should employ the same strategy in each case.
How many configurations may face the computer? There are nine possibilities for the first placement of ‘X’, and for each of these, there are eight potential first moves for ‘O’. If the computer goes first, it faces 72 different configurations after just one turn – and this is just for starters.
It’s not as bad as it first appears, because the order in which the play goes doesn’t matter. Two (or more) combinations of moves may produce the same configuration. You can make an economy by recognising that two configurations can be different, yet effectively the same, because they are mirror images of each other. Additionally, one configuration may be exactly the same as another but rotated by 90 degrees. In fact, each configuration is essentially the same as seven others: four are identical but rotated relative to each other, and then each of these has a mirror image. Sometimes, symmetry will reduce this number, but very often, one configuration can stand for eight.
The program can deal with this without too much of a problem. When it encounters any of these eight configurations, it can react in the same way. How can you identify one master configuration to stand for all eight effectively identical ones? In NandX, we’ve dealt with this in the following way. We generate the other seven configurations related to the current real one and calculate a distinctive number for each. The configuration with the lowest number will serve as the template for all eight related configurations. All the program has to keep in mind is how to twist back to the current configuration after taking its turn. This means remembering how many mirroring moves were done (0 or 1) and how many rotations (0 to 3) were needed to turn it into the template. This twisting and mirroring to match a configuration to its template is called a ‘transform’.
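The template idea can be sketched as below. This is a Python illustration of the approach just described, not the NandX code itself: the lookup tables, the `.` placeholder, the base-three numbering with cell one as the leading digit, and all function names are assumptions.

```python
ROTATE = (6, 3, 0, 7, 4, 1, 8, 5, 2)   # source cell for each cell, quarter-turn clockwise
MIRROR = (2, 1, 0, 5, 4, 3, 8, 7, 6)   # source cell for each cell, left-right reflection

def reorder(board, order):
    return ''.join(board[i] for i in order)

def number(board):
    """Base-three value of a board string ('.' = 0, 'X' = 1, 'O' = 2)."""
    value = 0
    for ch in board:
        value = value * 3 + '.XO'.index(ch)
    return value

def variants(board):
    """Yield (mirrors, rotations, variant) for all eight orientations."""
    for mirrors in (0, 1):
        current = reorder(board, MIRROR) if mirrors else board
        for rotations in range(4):
            yield mirrors, rotations, current
            current = reorder(current, ROTATE)

def template(board):
    """Return (template, mirrors, rotations) for the lowest-numbered variant."""
    mirrors, rotations, best = min(variants(board), key=lambda v: number(v[2]))
    return best, mirrors, rotations

def untwist_cell(cell, mirrors, rotations):
    """Map a cell chosen on the template back to the real board (the transform undone)."""
    for _ in range(rotations):
        cell = ROTATE[cell]
    if mirrors:
        cell = MIRROR[cell]
    return cell

# Every board with one 'X' and one 'O', reduced to its template.
openings = set()
for x in range(9):
    for o in range(9):
        if x != o:
            cells = ['.'] * 9
            cells[x], cells[o] = 'X', 'O'
            openings.add(template(''.join(cells))[0])
```

Here `len(openings)` comes to 12: the 72 boards that arise after one turn each collapse to a dozen templates once rotations and reflections are folded together.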