Remember hide and seek, the game we all played as children?
Recently, AI researchers at OpenAI developed artificial intelligence agents (bots) and trained them by letting them play hide and seek with each other. The bots are divided into two teams: hiders (blue) and seekers (red).
Here are the rules of the game, predefined in the environment:
- The seekers count for a fixed time (remaining idle) while the hiders hide.
- Once the countdown ends, the seekers start searching for the hiders.
- Upon spotting a hider, a seeker must chase it down while the hider tries to flee.
- The game provides several blocks (rectangular boxes) and ramps (triangular boxes). Both hiders and seekers can move them around inside a confined arena, but once a hider or seeker has locked a box or ramp in place, it cannot be moved again.
- The setup uses reinforcement learning: the seekers are rewarded when they find the hiders, and the hiders are rewarded when they stay hidden.
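To make the reward rule concrete, here is a minimal sketch of a zero-sum per-step reward of the kind described above. This is an illustrative simplification, not OpenAI's actual code: the function name, the grace-period flag, and the +1/-1 values are assumptions for the sketch.

```python
def step_rewards(any_hider_seen: bool, grace_period_over: bool) -> dict:
    """Return per-team rewards for one timestep (illustrative, not OpenAI's API).

    During the initial countdown (grace period) no rewards are given,
    so the hiders can set up without being penalized.
    """
    if not grace_period_over:
        return {"hiders": 0.0, "seekers": 0.0}
    # Zero-sum: whatever the hiders gain, the seekers lose, and vice versa.
    hider_reward = -1.0 if any_hider_seen else 1.0
    return {"hiders": hider_reward, "seekers": -hider_reward}

# Example: after the countdown, a spotted hider means the hiders
# are penalized and the seekers rewarded.
print(step_rewards(any_hider_seen=True, grace_period_over=True))
```

Because the reward is purely win/lose, every strategy described below (shelters, ramp use, box surfing) had to be discovered by the agents themselves rather than being coded in.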
Stage 1: In the initial learning phase, the seekers learned to chase any hider they saw, and the hiders learned to run away when seen.
Stage 2: After playing millions of rounds, however, the hiders found a solution: they started building shelters out of the blocks (see picture below), which prevented the seekers from seeing or catching them.
Stage 3: After many failed attempts to break into the shelters, the seekers eventually learned to use the ramps. They pushed a ramp up against a shelter and climbed over its wall to get inside (see picture below).
After having their shelters breached millions of times, the hiders learned something new as well: they dragged all the ramps into the shelter with them, a strategy the researchers call ramp defense (see picture below).
Stage 4: The researchers believed this would be the final strategy the agents learned, but then something strange happened. After many more rounds, the seekers discovered that they could jump on top of a box and "surf" it over to the hiders' shelter, a strategy the researchers call box surfing (see picture below).
The hiders then had to learn something new, and after millions more rounds they did: to prevent box surfing, they learned to lock as many boxes as they could before building their shelter (see picture below).
Stage 5 (here comes the disturbing part): when it seemed there were no moves left, the agents learned to exploit the environment in other unexpected ways:
- One hider grabbed a box and literally ran out of the arena with it!
- Some hiders pushed the ramps out of play as soon as the game started.
- A few seekers learned that by running at a wall at just the right angle, they could launch themselves into the air, over the walls, to find the hiders.
The key point is that none of these behaviors was directly taught. The agents learned them on their own by playing the game millions of times, and they received a reward only when they won.
This study tells us a lot. It suggests that AI learning through competition can be a viable alternative to training under supervision. On the flip side, it also shows that when an AI's only goal is to beat its opponent, it can develop sophisticated techniques and behave in ways we don't expect, which gives us a glimpse of what AI can become!