Evolving Worms

by Bj÷rn Andersson, Jannike Lundqvist, Lars Íhrngren


We have evolved control programs for the classical game "Worms", made popular again by the Nokia mobile phones.

1 Introduction

The game is very simple. Two worms are trapped on a board divided into a number of squares (e.g. 20x20). They move one square at a time (forward, right or left) and grow either spontanously or by eating some kind of bait (e.g. apples). A worm dies if it hits one of the walls or any of the worms' bodies. The surviving worm is the winner. In our implementation, there are no apples and the worms grow stochastically with an average rate of 0.1 per step.

In some incarnations of the game, the contestants are "snakes" rather than worms. However, this difference is not significant to the strategy.

1.1 Reading Guidance

Chapter 2, Method, describes the the way in which the problem was solved.

Chapter 3, Results, covers the results of the problem solving attempt. Fitness graphs are not applicable in this case, since the selection does not involve a fitness value.

Chapter 4, Discussion, contains thoughts on difficulties encountered and quality of the results.

Chapter 5, Future Work, handles possible future enhancements and possibilities of the system.

Chapter 6, Summary, contains a summary of this report.

Chapter 7, References, gives a listing of resources used during this work.

In the pictures of the game, the worms are blue and green (or dark and light in grayscale environments). The heads are marked by a darker shade of the worms colour. Worms always move in the direction of their head.

2 Method

We have used genetic programming with a register machine language. Each individual is a control program for a worm in the game.

We use homologous crossover (90%) and ordinary 2-point string crossover (10%). The mutation rate is 20%. The population size is 1000 individuals.

The input to the individual includes the coordinates of the opponent's head given relative to its own head. In some cases, the coordinates of the opponent's first curve was given as input, in order to give the individual a sense of what direction the opponent was moving in.

Fig 2.1: Relative coordinates moving upwards.

Fig 2.2: Relative coordinates moving right.

Two constants was used (typically 0 and 1). The number of internal registers was 3.

The output is one value which is interpreted in the following way:
= 0Move forward
< 0Turn left
> 0Turn right

The language consists of arithmetic operations, involving inputs, outputs and registers, an if statement, comparing two registers, and the state function. The following is a summary of the available statements:

The state function returns the state (occupied or vacant) of a given square. The square is defined by relative coordinates provided by two register values. The return value is stored in the first of the two registers.

In later experiments, we introduced some static "bricks" to give the worms a more dynamic training environment.

Fig 2.3: Brick pattern. Note, that it is designed to trap worms that can only turn in one direction.

We also helped the worms by not allowing them to run into things if there was a way to avoid it. The point is that it is not necessary for the worms to learn how to avoid obvious mistakes, which allows them to focus more on creating traps and other strategies. We call this addition the post-execution correction procedure.

2.1 Implementation Notes

All of the individual byte arrays are of the same length (256). The length of the program represented by an array is defined by the first byte of the array. This means that the extra bytes at the end of an individual with a length shorter than the maximum are introns.

While the system is running, competing worms are displayed. These worms are controlled by random individuals from the population. The result of the games does not, however, influence the evolutionary process, which is running at full speed in another thread. This visuals are there simply to make it easier to see if the system is making any progress.

3 Results

In our first experiments we saw that the worms quite quickly came to a unanimous decision on simplifying the problem by only turning in one direction. We regard this as a problem, since a human opponent could easilly take advantage of the fact that a worm cannot, for example, turn left. The problem was finally solved by introducing the post-execution correction procedure (which corrects the individual's output if it is trying to kill itself and reduce the problem of avoiding obvious obstacles).

In later experiments, we have seen some interesting results. Often there occurs a kind of "arms race" where a new way of trapping the opponent is met by counter traps, which are again countered. One example is the "inner track wins" - "U trap" - "U trap evasion" development. (Fig 3.1-3.5)

Fig 3.1: Inner track will win?

Fig 3.2: Outer track is trapped.

Fig 3.3: Almost same situation.

Fig 3.4: Inner track is trapped.

Fig 3.5: Inner track saw the trap. Hope that outer track knows how to turn left!

We have also seen even more complex behaviour, like being happy with slithering into a two squares wide corridor formed by the opponent and a wall, but turning around to escape when the opponent moves one step closer to the wall in order to create a trap. It seems this kind of behaviour requires a deeper "understanding" of the game and also quite a lot of computation and it is impressive that the individual has managed to fit it into its 256 quite simple instructions. (Fig 3.6 and 3.7)

Fig 3.6: The blue worm feels safe so far, but the green one has a plan...

Fig 3.7: The blue worm sees the trap and turns around.

It becomes even more impressive when we see that the same individual has the ability to move back and forth in order to maximize its chances of escape when trapped. (Fig 3.8 and 3.9)

Fig 3.8: Oh no, is the blue worm trapped now?

Fig 3.9: No, it can see this trap too.

3.1 Output From a Typical Run

A version of the genetic programming system is available as an applet: http://kolonisera.rymden.nu/gp/worms/GPapplet.html

This applet evolves from the beginning every time the applet is started (i.e. when the web page is displayed. Also, there is (probably) no way to get printouts from the system when it is run in an applet.

3.2 Successful Individual

A version of the game, where it is possible to compete with an individual, is available as an applet: http://kolonisera.rymden.nu/gp/worms/applet.html

The individual that is the opponent in this instance is just an example of a fairly successful individual. We cannot guarantee that this individual is the best one we have when it comes to competing against human players. It is not able to perform the tricks mentioned above in all instances, but it knows them and displays them sometimes.

4 Discussion

We feel that our choice of homologous crossover and mutation rates mimics the conditions in nature in an appropriately stylized way. Homologous crossover reduces the destructive effect of ordinary 2-point crossover. It does not, however, diversify the population very efficiently. Mutation is more destructive in this application than in nature, since the genome used here is smaller and less intron filled. We feel that 20% is a good compromise between destructiveness and diversification.

A small population (like our 1000 individuals) can produce results faster, since the "generations" progress faster than in a larger population. Hence, we feel that a smaller population is preferable as long as problems with "inbreeding" and lack of diversity can be avoided.

These decisions seems to be justified judging from the results we have achieved.

We tried several ways of getting the individuals to go from the first simple obstacle avoidance strategies to more complex behaviour. In the end it was not punishment that succeeded, but rather a bit of help with the most basic tasks enabled the individuals to concentrate on the more advanced ones.

When a human player is playing against a genetivally generated individual, it is obvious that most of the computer's shortcommings are due to a lack of overview of the board. It is diffucult for the individual computer player to see which areas are more cramped and therefore may be less advantageous to be in. A human player can use her superior pattern recognition abilities to make such decisions very efficiently.

5 Future Work

The worms could, in addition to the state function, be given access to a function finding out how many squares are freely connected to a given square. This would enable the worm to be better at seeing if it may be slithering into a trap.

6 Summary

We have been able to produce worms controlling programs with quite advanced strategies using a simple register machine programming language and a rather small program length.

We have seen quite complex behaviour and it is not an elementary task beating the best individuals at the game.

The way in which this has been achieved is freeing the genetic individuals from the most obvious and simple tasks. The conclusion is that, to teach individuals more complex behaviour, you should not force them to also learn how to deal with the simple things (that you can simply implement yourself).

7 References

JavaTM 2 Platform, Standard Edition, v 1.3.1 API Specification, http://java.sun.com/j2se/1.3/docs/api/index.html

Nokia Games, http://www.nokia.com/games/snake1.html