MASTERCLASS EXTRA This Series
Part One
Problem-solving techniques
Using lateral thinking to apply solutions
WILF’S WORKSHOP PROGRAMMER’S WORLD
Wilf Hey
Solution programming
Wilf Hey has been grappling with programming for several decades now, and is always up for a game of noughts and crosses.
wilf.hey@futurenet.co.uk
The key to successful programming is skill in using the appropriate solution for the unwrapped problem, says Wilf Hey
PROJECT TIME
An old saying tells us that there are many ways to skin a cat. It is rare to find a problem that yields to only one strategy. Often there are several different techniques that can be tried, and often each will bring new insights into the true nature of the problem you first encountered. Sometimes you have to take a mental step backward so that the problem is presented in a new light.
SKILL LEVEL
Dr Edward de Bono – the developer of lateral thinking – strongly urges that we use our brains more effectively by developing the habit of deferring decisions. Too often we think in ‘straight lines’, leading us to a single solution which may be valid, but is far from ideal.

A large stockbroking firm once needed a program that would calculate the balance of each of 10 particular stocks for every day in the last quarter. The company had the opening balance of each subject stock, and a few hundred reels of magnetic tape, each containing very short records indicating the type and amount of a stock bought or sold, along with the date. The systems analyst responsible for creating a solution decided that the way ahead was:
(1) Read all tapes, extracting onto output tapes all those buy/sell transactions for the 10 relevant stocks between the desired dates;
(2) Sort all these transactions into date order within stock name;
(3) Process these selected and sorted transactions as input to a program which calculates the balance for each day of the period, adding or subtracting each transaction against the opening balance. For each day, the running total (for each stock) is printed.
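The analyst's three steps can be sketched in a few lines. This is only an illustration, written in Python rather than QBasic; the record layout (date, stock name, signed amount) and the function name are assumptions made for the example, not details of the original system.

```python
from datetime import date, timedelta

def daily_balances_by_sorting(transactions, opening, start, end):
    """Extract, sort, then accumulate a running balance per stock per day.

    transactions: iterable of (date, stock, signed_amount) tuples (assumed layout)
    opening:      dict mapping each relevant stock to its opening balance
    """
    # Step 1: extract only the relevant stocks within the reporting period.
    selected = [t for t in transactions
                if t[1] in opening and start <= t[0] <= end]
    # Step 2: sort into date order within stock name.
    selected.sort(key=lambda t: (t[1], t[0]))
    # Step 3: walk the sorted records, recording the balance at the end of each day.
    report = {}
    i = 0
    for stock in sorted(opening):
        balance = opening[stock]
        day = start
        while day <= end:
            while (i < len(selected) and selected[i][1] == stock
                   and selected[i][0] == day):
                balance += selected[i][2]
                i += 1
            report[(stock, day)] = balance
            day += timedelta(days=1)
    return report
```

Even in this small form the cost is visible: every record is handled once to extract it, again to sort it, and a third time to total it.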
Project time: 4 hrs 00 mins
Skill level: 7 (on a scale of 0 to 10)
ADVANCED General knowledge of programming is useful
YOU’LL NEED THIS
Any programming language, but QBasic is used here for illustration purposes
FILES ON SUPERDISC This month’s program locates both the longest and the fattest worms in a field of 25 random numbers.
All files needed to complete this masterclass are on the SuperDisc, including MaxWorm.BAS
All very well, but this solution generates lots of tape handling while sorting, and completing the whole task can take a good long time. The data has to be scanned three times: twice for extraction and sorting, and once by the reporting program itself. The task requires operator attention (changing tapes frequently) and carries a high risk of things going very wrong.

1 A new perspective
Fortunately, the programmer assigned to work on a solution saw another approach to the problem, and quickly persuaded the analyst of a much better and more secure method. He considered the original transaction tapes, noting that their content should already be nearly in date/time sequence – but not quite, because they may have been recorded in small
http://forum.pcplus.co.uk
One problem, two solutions
You’ll notice that MaxWorm doubles up; it finds the longest and the fattest worm
As I wrote MaxWorm I realised that the original goal – to find the ‘best’ worm in a particular array – had two possible interpretations. Was it better for a worm to be longer (involving more cells), or to have a higher total value? Would it take longer to find these two possibly different creatures? Not really, because we are forced to look at every possible worm anyway to judge how good it is. Why not just have two categories, and then when we come to analyse a worm, put it up against two stored comparisons: the longest so far and the fattest (highest total value) so far?
Obviously, if we find a new worm that is the same length as the so-far longest, we judge the new one as better only if its total value is higher. Similarly, if we find a new worm with the same total value as the so-far fattest, we judge the new one as better only if it is longer as well. It’s simple logic. After many trials I found that in the great majority of instances, the longest worm to be found in a matrix also bears the crown for the fattest worm.
In one version of the program I did things a little differently. The program went through all possible worms and memorised the longest and the fattest, but did not display the actual path as it worked. At the end the program became interactive, encouraging the operator to match the solutions within a number of minutes. In my experience players became disillusioned by the intensity of the competition. When it comes down to it, there is no direct algorithm that will generate the best worms, but the search method I use here is very useful in that it creates each and every valid worm for consideration, without going down any blind alleys. ■
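The two comparisons, with their tie-breaks, can be written compactly. The following is a hedged Python sketch rather than the code in MaxWorm.BAS: a worm is assumed here to be a plain list of its cell values, and the function names are invented for the example.

```python
def longer_than(new_worm, best_worm):
    """True if new_worm beats best_worm in the 'longest' category.

    Length decides; on equal length the higher total value wins.
    """
    return (len(new_worm), sum(new_worm)) > (len(best_worm), sum(best_worm))

def fatter_than(new_worm, best_worm):
    """True if new_worm beats best_worm in the 'fattest' category.

    Total value decides; on equal totals the longer worm wins.
    """
    return (sum(new_worm), len(new_worm)) > (sum(best_worm), len(best_worm))
```

Comparing pairs of numbers in this way applies the main criterion first and falls back on the second only when the first is tied, which is the rule described above.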
The longest worm in this particular configuration occupies 16 cells – none of them a corner cell. There are so many duplicate values in this example that it is not too difficult to find the solution.
PCPlus 227 | March 2005
165
So what is Wilf’s workshop? The art and theory of modern programming, whichever languages you use. Learn techniques that will benefit any project, and gain a deeper understanding of advanced coding disciplines.
PASSING SHOT Debugging MaxWorm raised some interesting issues. Usually you know what a program is doing, and debugging tends to be a supervisory activity – checking that the program is doing electronically what a human would do manually. However, MaxWorm is not like that. We have asked it to do something we would never do: attempt to find every possible configuration that could be called a worm, and choose the best from among them. I found the biggest problem was to convince myself (by forcing various values into the matrix) that the method used would root out each and every valid worm. The method boils down to a simple recursive procedure for adding a new cell to an old worm – but recursive thinking does not come easily.
batches. Could they dispense with the sort? If the program were flexible enough, it could tolerate non-consecutive records in the same way that the Post Office handles batches of letters: by employing the age-old system of pigeonhole banks, in this case with one pigeonhole for each day in the accounting period.

A new solution presented itself: a program was to be written that would read the transaction file raw, ‘pigeonhole’ each transaction that belonged to one of the relevant stocks, and ignore all other transactions. No extraction needed, and no sort. No report would appear while input tapes were still being fed into the program, but when they had all been processed the program would gather up the pigeonholed changes to the balance of each stock, and print the summaries all at once. This solution was fast and required a minimum of operator action. The program was no harder to write, either.

2 Sizeable worms
Some years ago the mathematician Clifford Pickover devised a puzzle which can be represented ideally on a computer screen. It consists of a five by five matrix, each cell containing a random integer (whole number) between 1 and 26. It’s just possible that each cell will contain a unique number, but it’s much more likely that there will be some duplication. The puzzle is to find the longest chain of cells with no duplications. Each cell must be connected by one of its four sides to the next cell in the chain. Doctor Pickover calls each chain a ‘worm’. The question is: what is the longest worm you can find, given a particular matrix of random numbers?

Let’s consider this problem in a number of stages. For starters, it’s easy to construct the matrix on-screen. We’ve drawn such playing areas before, so it’s no great problem. Five cells down and five cells across will suffice, with all 25 cells large enough to be easily visible. Creating the random values for each cell is easy as well. To produce a random integer
Sequences using Modulo arithmetic
Code that controls a counting sequence so that it does not exceed a maximum is simple: test for the threshold and disallow going beyond it. Here the sequential transition of buttons is four, five, five … repeating. HighButton is set to the threshold number, and the code will not increment the current button number beyond it.
1
between the positive values x and y (inclusive) in QBasic, enter this line of code:
value = INT(RND * (y - x + 1)) + x
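The same scaling idea carries over to other languages. Here is a minimal Python sketch that fills a five by five matrix with values from 1 to 26; the function names and the optional rng parameter are assumptions made for the example.

```python
import random

def random_between(x, y, rng=random):
    """Return a random integer from x to y inclusive.

    rng.random() yields a fraction in [0, 1); scaling by the number of
    possible values and truncating gives 0 .. (y - x), so adding x
    shifts the result into the range x .. y.
    """
    return int(rng.random() * (y - x + 1)) + x

def make_matrix(size=5, low=1, high=26, rng=random):
    """Build a size-by-size puzzle matrix of random cell values."""
    return [[random_between(low, high, rng) for _ in range(size)]
            for _ in range(size)]
```

Passing a seeded generator, for example make_matrix(rng=random.Random(227)), reproduces the same puzzle each time, which is handy when testing.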
The maximum worm length is, of course, 25 cells, but as a few moments of thought will show, there is no guarantee of such a length because of the restriction against duplication. This restriction itself raises difficulties. How can it be enforced? Every cell to be added to the tail of a worm must have its value compared with the value of each cell already connected. If a worm has the values (3, 7, 18, 12, 5) you must somehow ensure that a new candidate cell for appending to the worm contains a value different from all of these. That seems to require an exhaustive (but thankfully short) search. Or is there another way?

What would be ideal is a single command or function [if worm contains this then…] where [this] signifies the value of the new candidate cell. If only [worm] and [this] were strings, because in QBasic there is an expression for telling where within a string you will find a substring. If A$ is the string ‘ABCDE’ and B$ is the substring ‘D’ then the expression (A$,contains,B$), which QBasic spells INSTR(A$, B$), has the value four. If the substring B$ is ‘X’, the expression has the value zero, indicating that the B$ substring is not to be found within the A$ string.

Now we have a new goal in mind: can the values in the worm and candidate cells be represented as individual bytes rather than quantities? The answer is clear: of course they can. If the values in the worm are (3, 7, 18, 12, 5) this can be represented by the string ‘CGRLE’. Suppose you have a candidate cell for appending and it contains the value 20: you need only convert this (to ‘T’) and perform the CONTAINS function. If the result is zero, you know the candidate is acceptable. A single instruction suffices, not a loop checking against each worm segment in sequence.

What about each segment? When you have a worm
The Modulo (MOD) function is useful as a
Sometimes it’s desirable that the sequence cycles itself. This can be done with a threshold test and reset to zero when it is reached, but a more economical way uses the Modulo function. The sequence here goes four, five, one … and so on. Using this method avoids introducing IF/THEN clauses that may become quite complex and confusing.
2
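The two button sequences described in these captions can be compared side by side. This is a small Python sketch, not the code shown in the screenshots; HIGH_BUTTON mirrors the HighButton threshold mentioned above, and the function names are invented for the example.

```python
HIGH_BUTTON = 5  # highest button number (assumed to be five, as in the captions)

def next_clamped(button):
    """Step to the next button but never go beyond HIGH_BUTTON."""
    if button < HIGH_BUTTON:
        return button + 1
    return button

def next_cycled(button):
    """Step to the next button, wrapping from HIGH_BUTTON back to 1.

    button % HIGH_BUTTON is 0 only when button equals HIGH_BUTTON,
    so adding 1 gives 1 in that case and button + 1 otherwise.
    """
    return button % HIGH_BUTTON + 1
```

Starting from button four, next_clamped gives five and then stays at five, while next_cycled gives five and then one, with no IF/THEN test needed.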
In the MaxWorm puzzle introduced this month (see main body text), the five by five matrix is represented by a 4x4 DIM: because QBasic subscripts start at 0 by default, this gives rows numbered 0 to 4 and columns numbered 0 to 4, which means 25 cells in all. This is the most compact form in which we can comfortably work with the cells, wasting no space at all.
3
The Sierpinski gasket
We follow on from last month’s look at fractals in programming
On this month’s SuperDisc there are two programs developed from the shell provided with last month’s subject. The coding starts by drawing a large equilateral triangle on the screen, then repeatedly divides it into smaller triangles. There are two working versions to compare with your own efforts. In the first, a small triangle is built in each corner of the original, and then smaller triangles in each corner of these, and so on as far as we have room, which isn’t more than a few times. The second program does the same but also takes into account the new triangle in the middle. The output display of the second is interesting, but the display of the first is phenomenal: a Sierpinski gasket, so called because it resembles a sponge full of holes. At level one there are three smaller triangles, and at level two, nine. At level n, what percentage of the gasket is taken up by its smallest triangles? ■
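One way to explore the closing question is to count triangles and areas level by level. The Python sketch below rests on an assumption the text does not state outright: that each level halves the sides of every triangle, so each smaller triangle has a quarter of its parent's area. The percentage is measured against the area of the original large triangle, and the function name is invented for the example.

```python
def smallest_triangle_share(level):
    """Return (number of smallest triangles, percentage of original area).

    Assumes three corner triangles are kept at each level and each has
    one quarter of the area of the triangle it was cut from.
    """
    count = 3 ** level
    area_each = 0.25 ** level  # fraction of the original triangle's area
    return count, count * area_each * 100
```

Under these assumptions the share works out to (3/4) to the power n: 75 per cent at level one, 56.25 per cent at level two, and shrinking steadily after that.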
and a candidate cell, how can you know that the candidate is not already part of the worm? (In other words, how can you ensure that the worm does not intersect itself?) With a little lateral thinking, you will see that this is no problem at all. If a candidate is already part of the worm, its content value will also be part of the worm. The one CONTAINS function nicely tests that the candidate is distinct and furthermore that it is not already used.

Perceiving the worm as having a head cell and a tail, the next candidate is a cell just next to the tail, either to the left, right, up or down. How do you address the cells? You can use any scheme you like, but I found it most natural to identify each cell by its column number (0 to 4) and its row number (0 to 4).

The nub of the problem is this: the problem can be solved by considering all possible worms. A recursive method works well. Take an existing worm and add a new cell to it to make a new valid worm. When no new cell can be added, back up by removing the previous cell and trying in another direction. If you consider trying to find some object in a network of roads, the solution is much the same: keep going in one direction as long as you can. When you come to a dead end, back up and try another way.

Say our program does this, looking for no particular goal, but remembering the best results during the process. Starting at the first cell, it will build a worm by adding cells in a certain direction. When it comes to a dead end, it backs up and tries again. Eventually it will have backed up to the first cell again. All worms that start with that first cell have been investigated, so we repeat the process starting with the second cell – and so on until all worms have been generated starting from each cell. The longest seen during this process will be, of course, the longest possible.

When you test a program you may want to create a map that translates between symbols you use within the solution.
3 Seeing double
Did you realise that you will in fact see every possible worm twice using this process? This is because we can think of a worm as having its ‘head’ in one cell and its ‘tail’ in another. Yet the recursive process outlined here will consider every possible cell as the head. It will find a worm with head in one cell and tail in another, yet it has no way of knowing whether it has already seen the exact reverse of this worm (swapping head for tail).

The complete program, MaxWorm.BAS, is on the SuperDisc this month. This QBasic program generates a random puzzle matrix, then solves it at lightning speed. You can run it repeatedly, and you can terminate it by pressing [ESC]. The source is full of comments so that you can see the algorithm working – but if you don’t have the QBasic interpreter handy you can run the EXE version. The program is built around my ‘Piecrust’ skeleton code, which contains many useful commands, subroutines and functions.
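As a rough illustration of the recursive search and the string test described in the main text, here is a minimal Python sketch. It is not MaxWorm.BAS itself: the names encode and longest_worm are made up for this example, it tracks only the longest worm, and Python's `in` test on strings stands in for the CONTAINS idea.

```python
def encode(value):
    """Map a cell value 1..26 to a single letter: 1 -> 'A', 26 -> 'Z'."""
    return chr(ord("A") + value - 1)

def longest_worm(matrix):
    """Return the cells (row, column) of the first longest worm found."""
    size = len(matrix)
    best = {"path": []}

    def extend(path, letters):
        # Remember the longest worm seen so far.
        if len(path) > len(best["path"]):
            best["path"] = list(path)
        row, col = path[-1]
        # Try the four neighbours of the current end of the worm.
        for d_row, d_col in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            r, c = row + d_row, col + d_col
            if 0 <= r < size and 0 <= c < size:
                letter = encode(matrix[r][c])
                # One string test rejects duplicate values, and with
                # them any cell that is already part of the worm.
                if letter not in letters:
                    path.append((r, c))
                    extend(path, letters + letter)
                    path.pop()  # dead end reached: back up, try another way

    # Every cell takes its turn as the starting cell.
    for r in range(size):
        for c in range(size):
            extend([(r, c)], encode(matrix[r][c]))
    return best["path"]
```

Because each worm is reached from both of its ends, this sketch does the same double work noted above; that costs time but does not affect the result.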
The fractal program Fractal1.BAS demonstrates that calculations based on a triangle’s three corners can produce a crystal-like structure which is called a ‘Sierpinski gasket’.
GOING FURTHER MaxWorm has been one of those very interesting projects because it was open-ended when started. I didn’t know whether it was going to be possible to wait while the program did an exhaustive search for all possible worms to find the best. Even when I had formulated the way that all worms could be generated, it was difficult to see how long it would take. Consider a fairly simple game and try to calculate for yourself how long an exhaustive search through possible configurations would take. Think of noughts and crosses: how many pairs of opening moves can there be? Nine for X and any one of the eight remaining for O. That’s 72 – or is it?
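One way to probe the ‘or is it?’ hint is to treat opening pairs that are rotations or mirror images of one another as the same position. The Python sketch below counts pairs under that assumption; the row-by-row cell numbering (0 to 8) and the function names are illustrative choices, not part of the original article.

```python
def rotate(cell):
    """Rotate a cell index (0..8, row by row) a quarter turn clockwise."""
    row, col = divmod(cell, 3)
    return col * 3 + (2 - row)

def reflect(cell):
    """Mirror a cell index left to right."""
    row, col = divmod(cell, 3)
    return row * 3 + (2 - col)

def symmetries(cell):
    """Return the images of a cell under all eight symmetries of the board."""
    images = []
    for start in (cell, reflect(cell)):
        current = start
        for _ in range(4):
            images.append(current)
            current = rotate(current)
    return images

def count_opening_pairs():
    """Return (raw count of X/O opening pairs, count after merging symmetric ones)."""
    raw = 0
    distinct = set()
    for x in range(9):
        for o in range(9):
            if o == x:
                continue
            raw += 1
            # The same symmetry must move both marks, so the images are
            # paired up position by position before picking a canonical one.
            distinct.add(min(zip(symmetries(x), symmetries(o))))
    return raw, len(distinct)
```

Called as count_opening_pairs(), this returns (72, 12): the 72 raw pairs collapse to 12 genuinely different openings once symmetry is taken into account.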
NEXT MONTH We’ll be looking at a program which learns how to play a game by both analysis and the experience of losing against a human player.
means of controlling the sequence of switches or measurements
An alternative strategy would have been to use a 6x6 DIM, with rows numbered 1 to 5 and columns also 1 to 5. Rows 0 and 6 and columns 0 and 6 would then exist as a border around the working matrix. Stepping too far along a row or a column will result in entering a ‘virtual’ cell: one of these border cells.
4
The criterion by which a cell is judged ‘correct’ is that it has a distinct value in it. If you arrange that every border cell contains the same value as its closest legitimate neighbour, the matrix will be self-limiting. Every border cell will fail the criterion and will be left unused – and all without time-consuming code to monitor boundaries.
5
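The border idea in these captions can be tried out in a few lines. This is a hedged Python sketch under stated assumptions: the working matrix is a list of lists, the worm is represented by the set of values it already holds, and both function names are invented for the example.

```python
def add_border(matrix):
    """Wrap a square matrix in a one-cell border of copied values.

    Each border cell receives the value of its nearest cell in the
    working matrix, so the working cells end up at indexes 1..n.
    """
    n = len(matrix)
    size = n + 2
    grid = [[0] * size for _ in range(size)]
    for r in range(size):
        for c in range(size):
            # Clamp the position into the working area to find the neighbour.
            nearest_row = min(max(r, 1), n) - 1
            nearest_col = min(max(c, 1), n) - 1
            grid[r][c] = matrix[nearest_row][nearest_col]
    return grid

def can_extend(grid, worm_values, row, col):
    """A candidate cell is acceptable only if its value is new to the worm."""
    return grid[row][col] not in worm_values
```

A worm can only step into a border cell from the working cell beside it, and that cell's value is already in the worm, so the ordinary duplicate test rejects the move without any explicit boundary check.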
Our fractal program calculates three new points, given three points of a triangle. Combinations of three points among these six yield several inner triangles, fractally related to the original. Modulo arithmetic shortens the way these combinations can be generated. Remember, point D is related to A and B, point E to B and C, and point F to C and A.
6
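The pairing of D with A and B, E with B and C, and F with C and A is a natural fit for modulo arithmetic. The Python sketch below assumes the three new points are the midpoints of the sides, which the caption does not say explicitly, and its function names are invented for the example.

```python
def midpoint(p, q):
    """Return the point halfway between p and q."""
    return ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)

def subdivide(corners):
    """Split a triangle (A, B, C) into three corner triangles and a centre one.

    mids[i] lies between corner i and corner (i + 1) % 3, so for
    corners A, B, C the list holds D, E, F in that order.
    """
    mids = [midpoint(corners[i], corners[(i + 1) % 3]) for i in range(3)]
    # Corner i joins the new point after it and the new point before it;
    # (i + 2) % 3 is 'one step back' without a negative index.
    corner_triangles = [(corners[i], mids[i], mids[(i + 2) % 3])
                        for i in range(3)]
    centre = tuple(mids)
    return corner_triangles, centre
```

Applying subdivide again to each corner triangle, and ignoring the centre one, gives the gasket pattern; including the centre triangle as well gives the second variant described in the sidebar.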