Training a Neural Network to Win The World’s Oldest Board Game

Arian Alavi
Mar 19, 2021 · 9 min read


The Game Board of The Royal Game Of Ur

Neural networks have been demolishing the competition in board games for years now. In 2016, Google’s AI defeated Lee Sedol, one of the best Go players in the world, four games to one. In 2018, Google unleashed its AI onto the world of chess; after just 9 hours of training, it could stand toe to toe with even grandmasters. These days it’s gotten bored of beating us humans and has moved on to destroying its fellow robots.

While we teach the AI how to play our games, the AI teaches us a lesson in humility. Yes, the grandmasters are humbled — but humbled are we as programmers. How many of us can even understand how to create such an AI, uprooting some of the most respected board games of our time? The fact is it takes great expertise and great infrastructure to create an AI capable of seeing the millions of branching possibilities in games like Chess and Go.

Yet the grandmasters continue developing their strategies and playing amongst each other, even if they can never beat these AIs. Let us then develop our AI in the same spirit as the grandmasters. Our algorithm likely won’t be dominating the most respected board games of our time, but maybe we can dominate the most respected board game of a simpler time…

Before We Start

Before we start, let’s talk about the rules of this ancient game and the platform we’ll develop this game on.

The Royal Game Of Ur

The Royal Game of Ur is one of the oldest board games ever discovered. Archeologists have found its boards dating as far back as around 2500 BC. The game is deceptively simple: 4 dice are rolled to determine how many tiles you can move a piece forward, and to win you have to get all your pieces across the board and off the far end. These aren’t your modern 6-sided dice, though, but tetrahedral dice, shaped like little pyramids. Each die has two marked points and two unmarked ones; every die that lands with a marked point facing up counts as one move, so a roll can be anywhere from 0 to 4.

As for the board, some tiles are safe while others are not. The flower tiles are always safe (and grant you an extra move as well), and every tile not in the middle row is safe. On an unsafe tile, your piece can be sent back home if an enemy piece lands on it, since only one piece can occupy a tile at a time.

It’s not a game of luck so much as a game of probability, and a lot of the strategy comes from minimizing your risks and maximizing your advantages. For example, because of the way the dice work, there is a 37.5% chance of rolling a 2 but only a 6.25% chance of rolling a 4. That makes landing 2 tiles away from an enemy piece a risky move, while sitting 4 tiles away from one is a much safer bet.
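If you want to double-check those odds yourself, here is a quick standalone C# sanity check (not code from the game) that computes the roll distribution for four fair two-sided dice:

```csharp
using System;

// Quick sanity check of the roll odds: four dice, each with a 50% chance of
// showing a marked point, give a binomial distribution C(4, k) / 2^4.
class UrDiceOdds
{
    static int Choose(int n, int k)
    {
        int result = 1;
        for (int i = 1; i <= k; i++)
            result = result * (n - i + 1) / i;
        return result;
    }

    static void Main()
    {
        for (int roll = 0; roll <= 4; roll++)
            Console.WriteLine($"P(roll = {roll}) = {Choose(4, roll) / 16.0:P2}");
        // Prints 6.25% for 0 and 4, 25% for 1 and 3, and 37.50% for 2.
    }
}
```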

If you’d like to see these rules in action, there is a great video linked below featuring Tom Scott that shows his introduction to The Royal Game of Ur.

An Easy Development Platform

It’s quite simple to make a console-based game in Python, but I wanted to create something more akin to a video game than a science experiment. Creating a 3D or even 2D game from scratch is a job that could take a year or more of work. Thanks to pre-made game engines like Unity, however, all that engine development time can be shaved off, leaving me free to focus on the game itself. I spent about a month developing The Royal Game of Ur in Unity and was able to play matches against myself without running into problems.

AI Infrastructure

The hardest part of the AI is often not training it, but properly integrating it into the application. Let’s talk about how it was done.

ML-Agents

When it comes to strategy games, the AI method of choice is Monte Carlo tree search. Yet in this case, we’ll be going with ML-Agents. ML-Agents was developed with more physical tasks in mind, like balancing a ball on your head.

From the ML-Agents team. An example of balancing said ball on head

So why choose machine learning for a task better suited to Monte Carlo tree search? Well, it’s easier. ML-Agents is a ready-made library for Unity, while a Monte Carlo tree search would have to be written and tuned entirely by hand.

Computer Vision

Neural networks take in a list of numbers as input, so what should those numbers be? If we used real computer vision, we would be converting an image of the board into a very long list of numbers, which would not only take a long time to train on but also be very complicated. Instead, let’s feed the network only what it needs to know.

When playing a board game, you need to know two things: where the pieces are and what the last roll was. It would also be nice to know where you can go and whether any of the tiles have special abilities. So the input list ended up looking like the following.

[Last roll, FlowerTileA, FlowerTileB, …, FlowerTileN, [PieceVisionAA, PieceVisionAB, …, PieceVisionAN], [PieceVisionBA, PieceVisionBB, …, PieceVisionBN], …, [PieceVisionNA, PieceVisionNB, …, PieceVisionNN], [ValidStartA, ValidEndA], [ValidStartB, ValidEndB], …, [ValidStartN, ValidEndN]]

Each tile on the board has an ID associated with it. The flower tiles, being special, have their IDs given directly to the agent in “FlowerTile”. Is this necessary? Maybe. Let’s include it anyway, because it should help the network generalize to different arrangements of The Game of Ur: if we add another flower tile or three to the board, the network may have an easier time adapting if it has learned to associate an extra move with any ID that appears in the flower tile list.

PieceVision is simple. Take every possible path from the start of the board to the end (in the classical Game of Ur, there is only one per player). Then, for each tile along the path, add +1 to its value for a friendly piece sitting on it and -1 for an opponent’s piece.

[ValidStart, ValidEnd] is a list of vectors (two-item lists). Each vector in the list represents a valid move: ValidStart is the starting location of the move and ValidEnd is where the piece would land if moved from there.
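To make the layout concrete, here is a minimal C# sketch of how an observation vector like this could be packed using ML-Agents’ VectorSensor. All of the field names (lastRoll, flowerTileIds, pieceVision, validMoves) are hypothetical placeholders for illustration, not the project’s actual code.

```csharp
using Unity.MLAgents;
using Unity.MLAgents.Sensors;

// Illustrative sketch of the observation layout described above.
// Field names are hypothetical placeholders, not the real project's code.
public class UrObservationSketch : Agent
{
    int lastRoll;                         // result of the latest dice roll (0-4)
    int[] flowerTileIds;                  // tile IDs of the flower (safe, extra-move) tiles
    int[] pieceVision;                    // per path tile: +1 friendly piece, -1 enemy piece, 0 empty
    (int start, int end)[] validMoves;    // one (ValidStart, ValidEnd) pair per legal move,
                                          // padded to a fixed length so the vector size never changes

    public override void CollectObservations(VectorSensor sensor)
    {
        sensor.AddObservation(lastRoll);

        foreach (int id in flowerTileIds)
            sensor.AddObservation(id);

        foreach (int tile in pieceVision)      // the "PieceVision" part of the vector
            sensor.AddObservation(tile);

        foreach (var (start, end) in validMoves)
        {
            sensor.AddObservation(start);
            sensor.AddObservation(end);
        }
    }
}
```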

The Algorithmic Approach

Each tile has a value associated with it based on how dangerous/important it currently is. The algorithm seeks to maximize the current tile value.

Before building a neural network, it’s always good to write a naive algorithm first. Why? Firstly, it functions as a sanity check: can an AI be built at all from the inputs I’ve specified? If I can code an algorithmic AI that can compete with a player, the answer is yes. Secondly, it helps iron out the bugs in the computer vision script; having two algorithmic bots fight each other in Ur for an hour will surface any lurking errors. And finally, it gives the budding neural network something to play against before it’s ready to fight itself.
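As a rough illustration (with made-up types and tile values rather than the game’s real data structures), such a naive bot might look like the following: score every legal move by the value of its destination tile and greedily take the best one.

```csharp
using System.Collections.Generic;
using System.Linq;

// A greedy placeholder bot in the spirit described above: score every legal move
// by the current value of its destination tile and pick the best. The Move type
// and the tileValues table are illustrative, not the game's actual code.
public static class NaiveUrBot
{
    public readonly struct Move
    {
        public readonly int Start;
        public readonly int End;
        public Move(int start, int end) { Start = start; End = end; }
    }

    public static Move? ChooseMove(IReadOnlyList<Move> validMoves,
                                   IReadOnlyDictionary<int, float> tileValues)
    {
        if (validMoves.Count == 0)
            return null;   // no legal move this turn: pass

        // Highest-valued destination wins; ties fall back to list order.
        return validMoves
            .OrderByDescending(m => tileValues.TryGetValue(m.End, out var v) ? v : 0f)
            .First();
    }
}
```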

Training The Network

Before learning to fight, it must learn to walk without knocking itself out. I first train the neural network on a simpler version of The Royal Game of Ur, with 5 pieces and 3 dice. After successfully training that one, I move on to the real game.

Baby Steps

The inputs of the network were covered earlier, but what should its output be? At the end of the input list is a list of vectors holding the valid start and end locations for each move available to the AI. To keep things simple, let’s have the network choose from that list: it picks a number from 1 to 7 (7 being the maximum possible number of moves), selecting the corresponding valid movement vector.

But what if the AI picks option 4 when there are only 2 valid options? We have to be harsh here: it loses immediately. Otherwise the AI would just keep guessing random numbers, which would be detrimental to its training.
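Here is a sketch of what that action handling could look like in ML-Agents, assuming a recent version of the library with ActionBuffers and a single discrete action branch of size 7. The class, field names, and reward values are illustrative, not the project’s actual code.

```csharp
using Unity.MLAgents;
using Unity.MLAgents.Actuators;

// Illustrative sketch of the action handling described above, assuming one
// discrete action branch of size 7 (one slot per possible move).
public class UrAgentActionSketch : Agent
{
    int validMoveCount;   // how many slots hold a real move this turn

    public override void OnActionReceived(ActionBuffers actions)
    {
        int choice = actions.DiscreteActions[0];   // the "1-7" choice, zero-indexed: 0..6

        if (choice >= validMoveCount)
        {
            // Choosing a slot with no valid move behind it loses the game on the
            // spot, pressuring the network into learning which indices are legal.
            SetReward(-1f);
            EndEpisode();
            return;
        }

        // ...otherwise apply validMoves[choice] to the board, then reward
        // progress and wins here...
    }
}
```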

So our brand-new neural network will run against a very watered-down version of the algorithm we made earlier, called HorribleAI, which picks the optimal choice only 20% of the time. Let’s call our budding hero RealAI.

The first thousand or so matches are RealAI losing immediately because it picks an invalid move.

You Know The Rules and So Does RealAI

The higher the value, the better the neural network is at accomplishing its goals.

After about 60k moves (~20k matches?) RealAI is no longer losing immediately. It now has a concept of which moves are valid and which are not. However, it has also hit stagnation: it has risen to the challenge we presented it. From now on, it’s RealAI vs RealAI.

RealAI vs RealAI

Ur Boards as far as the eye can see

The picture above shows 40 agents fighting on 20 boards; however, I quickly found this to be inefficient. While my PC could handle that many boards, the biggest bottleneck was waiting for decisions from the AI, which could only be made one at a time.

I then found 2 boards per instance to be the most efficient, since the AI always has a decision waiting for it (removing the gaps between moves).

ML-Agents has a neat feature that lets you run multiple instances of the game at once. My poor CPU could only handle 4 instances side by side, which works out to 16 agents playing 8 games at once.

From here it’s a matter of tweaking the training values (hyperparameters). It’s a complex process, and you can learn more about it from this YouTube video. Stop the training to tweak the knobs once in a while; afterward, it’s just a matter of running it until we’re satisfied with the results, with some redos if it really doesn’t turn out well.

1.3 Million Moves Later…

Leaving the PC on for a few nights, with adjustments made in the morning, eventually yielded an ELO graph like this. ELO is a measurement of the AI’s current skill: it fights past iterations of itself to see whether it has improved.

An Anecdote: Your AI Can Be Smarter Than You

After the first night of training, I was excited to play against my AI in the morning. I opened a match against it and started playing. I was losing at the start but was beginning to make significant gains when all of a sudden I saw “Match Ended in Draw.” What? I looked into the code and found a bug that ends the match in a draw if the AI chooses a certain option under certain conditions, a very specific bug. After fixing the bug and retraining for another 20 or so minutes, every indicator of the AI’s skill crashed. The AI hadn’t been learning to play the game well; it had been learning when to exploit a bug to force a draw whenever it was most likely going to lose.

It’s Done

After weeks of work, the AI has been trained. It was a grueling process, and ML-Agents was not very kind: the newest versions of the library (the good stuff) fall under the category of “Preview Packages,” meaning they are likely to have many bugs, and I got to experience plenty of them. Yet the final result was worth it. Two AIs were created, one stronger than the other. The first is for a simplified ruleset of the Royal Game of Ur with only 5 pieces and 3 dice. The second is for the classical ruleset with 7 pieces and 4 dice. The extra complexity was somewhat challenging for the AI, but it’s still a decent bot.

Release

Let’s play The Lost Game Of Ur

What I had made at this point was a video game with a strange focus. Rather than focusing on graphics or a long feature list (what features can you add to a 4,000-year-old board game anyway?), it focuses on giving you a difficult opponent to fight. If you’re interested in playing against the AI, you can purchase the game, The Lost Game of Ur, on Steam for about $3.

I’ve received some free giveaway keys from Steam, and as thanks for reading the whole article I will be giving out 5 free copies to interested readers. To receive a key, please leave a comment with your email address or your Steam name/profile link.
