Connect four

We've previously said that most of the algorithms we are looking at on this post series are common to many other games. This post will explain how to make an engine that plays Connect Four and in the meantime prepare the road for the chess engine. We are going to finally see some code.

Connect Four move generator and board representation are trivial, so we can skip these steps and directly build our smart engine.

Last week my daily commute brought a new project on Github to share: c4. Let's see what it is about.

Some definitions

Algorithms like Minimax are usually referred as Search Algorithms, they actually search into a game tree. The game tree has board positions as nodes and game moves as edges. A node has a depth that is the distance from the root node. We can limit the search to a fixed depth avoiding to visit nodes farther than it. A node that is at the maximum established depth and it is not an ending node is part of the horizon. An ending node is a node that represent the end of a game.

As usually people refer to turn as a move made by both players, the word ply is used to identify the turn made by a single player, so one hop in the game tree.

Static evaluation

Connect Four is a simpler game compared to Chess, still it is not possible to enumerate all the move combinations for a Minimax: 8 move per ply, assuming we can find a winning path with a depth of 16 plies we have already million of billions of nodes to analyze.

We need to limit the search depth. What happen once we reach the horizon? In this case we have to predict the winner of the game and return a score. For example we can return the probability of the side to move to win or loose the game. The function that evaluates the score of a position is called static evaluation function. Static as it is opposed to the Dynamic evaluation given by the Search.

Many features of a connect four board can suggest us which is the winning player. c4 uses the following heuristics (this code is mostly copied from evaluate.py):

def evaluate(board, weights):
    scores = {PLAYER1: [0, 0, 0, 0, 0],
              PLAYER2: [0, 0, 0, 0, 0]}

    for s in Board.segments(board):
        z = (s == 0).sum()
        if z == 4:
            continue

        c1 = (s == PLAYER1).sum()
        c2 = 4 - (z+c1)
        if c2 == 0:
            scores[PLAYER1][c1] += 1
        elif c1 == 0:
            scores[PLAYER2][c2] += 1

    s1 = sum(x*y for x, y in zip(weights, scores[PLAYER1]))
    s2 = sum(x*y for x, y in zip(weights, scores[PLAYER2]))

    score = s1 - s2
    if board.stm == PLAYER1:
        return score
    else:
        return -score

segments method returns the list of all the 4-items line in the board, this includes horizontal, vertical and diagonal lines. We count and aggregate each segment dominated by a player (so there are no discs of the other one).

At the end we have a list of features like this:

number of segments of 4 discs of the red player
number of segments of 4 discs of the black player
number of segments of 3 discs of the red player
...

To compute the score from this vector of features we give a weight to each of them and then compute the sum.

Assume we set our weights to something like this:

Discs	Weight
1	0
2	1
3	4
4	infinite

And we have a board with the following features:

Discs	Red	Black
1	1	1
2	3	1
3	0	1
4	0	0

The red player score is 1 x 0 + 3 x 1 = 3. The black player score is 1 x 0 + 1 x 1 + 1 x 4 = 5.

The overall score relative to the red player is the difference of its score and the score of the black player, in this case 3 - 5 = -2.

The evaluate function returns a score relative to the side to move.

Search

Now that we know how to handle the horizon nodes properly we can finally write our search function. In the code of c4 you can see many search functions that we will cover. Let's start with the simplest, the greedy algorithm:

def greedy(board):
    moves = board.moves()
    m = moves[0]
    moves = moves[1:]

    bestmove = m
    bestscore = -evaluate(board.move(m))

    for m in moves:
        score = -evaluate(board.move(m))
        if score > bestscore:
            bestmove = m
            bestscore = score

    return bestmove

board.move(m) returns a new Board object. We didn't go the make/undo way as for the chess engine.

What does the greedy function do? It tries all the possible moves and check the static score for each of them. Eventually it selects the move with the highest score. Please note that as the evaluate function returns the score relative to the side to move, we need to invert it, as the next side to move is the opponent.

The greedy algorithm is very simple and can be overtaken easily by a novice player. It is a blind player and can even choose not to block a threat to make its own, ignoring the fact that the enemy is going to win first.

We can write ad-hoc code to avoid this particular issue, still there are several other defects of this strategy... also we already know something about Minimax:

def negamax(board, depth):
    if check_end(board) is not None:
        return endscore(board)

    if depth <= 0:
        return [], evaluate(board)

    bestmove = []
    bestscore = -INF
    for m in board.moves():
        nextmoves, score = negamax(board.move(m), depth-1)
        score = -score
        if not bestmove or score >= bestscore:
            bestscore = score
            bestmove = [m] + nextmoves

    return bestmove, bestscore

Negamax is a variant of Minimax. It is a more elegant way to avoid code duplication: negamax always maximize the score relative to the side to move. To make it works we just adopted a simple trick: invert the score of the recursive call.

We have limited Negamax recursion using the depth parameter.

The function check first the end of the game, then it returns the static score if the node is a leaf, otherwise visit all the children recursively.

Tournament

Now that we've got two engines we can start our first tournament by let them play against each other. Our engines are deterministic: same position, same choice. This means that actually we can just let them play 2 different games (one per side).

These are the results between the Greedy player and Negamax with max depth 1:

Rank	Name	Score	Win	WinX	WinO
1	Negamax(1)	3	1	0	1
2	Greedy	3	1	0	1

Negamax with depth 1 is exactly the Greedy player, the draw is actually the expected result.

Rank	Name	Score	Win	WinX	WinO
1	Negamax(2)	6	2	1	1
2	Greedy	0	0	0	0

Negamax with depth 2 always win against the Greedy player (2 over 2 games!)

Now we can make Negamax plays against itself with different depth.

Rank	Name	Score	Win	WinX	WinO
1	Negamax(3)	3	1	1	0
2	Negamax(2)	3	1	1	0

Surprise! An higher depth doesn't imply the player is smarter. Indeed this tournament is not statistically relevant, we need to make them play against other engines to have an overall result.

Another interesting aspect of this game is the time required by Negamax(3) to move compared to Negamax(2). Let's look at this table:

Depth	Time
1	0.03s
2	0.22s
3	1.82s
4	15.48s

Each level of depth requires about 8 times the time of the previous level. The number 8 is not abnormal, in Connect-Four we can do 8 move per turn in average and this is the time required for the first move (so we always have 8 moves available).

Next episodes

I hope the code on GitHub is clear. On the next posts we'll discuss about how to improve the engine search with standard algorithms before moving back to Chess.

Angry Bits

Words on bytes and bits