Playing Pommerman (GIF taken from the Pommerman website)

I recently came across Pommerman as one of eight competitions in the NIPS 2018 Competition Track.

I had previously done some work on StarCraft II and thought it would be a good idea to brush up on my RL knowledge and also learn more about multi-agent RL with Pommerman. Also, the top two agents in Pommerman will win a Titan V GPU.

Read the rules of Pommerman here!


Installing Pommerman

Installation is straightforward: just follow the instructions in the documentation (or below) to install the pommerman library via pip:

$ git clone https://github.com/MultiAgentLearning/playground
$ cd playground
$ pip install -U .

Note the trailing period (.)! I missed it when I first tried to install and wasted a few minutes of my life.

On a side note, this also means that if the organizers update the library, we just have to pull the latest version of the repository and then run pip install -U . again.
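
In that case, updating boils down to something like the following (assuming the repository was cloned into playground as above):

$ cd playground
$ git pull
$ pip install -U .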

After the installation, run a quick test by going into the examples folder and running the simple_ffa_run.py script.

$ cd examples
$ python3 simple_ffa_run.py

The game should appear and run with four bots, just like the GIF at the beginning of this post.


Agents

Here we take a look at the default agents that the library comes with.

BaseAgent

First, note that all agents (i.e. any agent that we build) should inherit from this class, since it defines certain important methods that the game environment relies on.

def __init__(self, character=characters.Bomber):
	self._character = character

def __getattr__(self, attr):
	return getattr(self._character, attr)

The __init__ and __getattr__ methods indicate that the BaseAgent class acts like a wrapper around the characters.Bomber class.

This means that after init_agent is called in BaseAgent (by the environment), we will be able to access attributes of Bomber, such as self.agent_id, self.ammo, self.can_kick etc. These will be helpful for determining the state of the agent; I use self.ammo in the sketch further below. (These attributes are not provided by the observation dict returned from the environment. See Environment section below for more details.)

def act(self, obs, action_space):
	raise NotImplementedError()

Note to self: TIL about NotImplementedError

Your agent (which inherits from BaseAgent) has to implement this method!

This method is named act, but a better name would probably be get_action, because we do not actually perform the action here. Instead, the environment calls this method on every agent (four in total) to collect their actions before performing them all at once.

As such, this method takes as arguments obs and action_space and should return an action.

obs

This is a Python dict containing the state of the environment. The description in the Pommerman docs seems to be outdated, but I explain obs in detail here.

action_space

This is actually not used much, since there seem to be no illegal actions (moving into a wall is the same as not moving). For the curious, it is a gym.spaces.Discrete object. The most important thing to know is that there are 6 possible actions, all of which are legal at any time.

action

This just has to be an int in the range of 0 to 5:

  • 0 Stop
  • 1 Up
  • 2 Down
  • 3 Left
  • 4 Right
  • 5 Bomb

Note: 5 plants a bomb at the agent’s position. If you are rendering the environment, the bomb will not be shown until the agent moves off it. The Can Kick ability will NOT kick a bomb while the agent is standing on top of it.
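
To put these pieces together, here is a hypothetical (and tactically terrible) agent of my own, not part of the library: it plants a bomb whenever it has ammo, using the proxied self.ammo attribute from the BaseAgent section above, and otherwise picks a random move.

import random

from pommerman import agents


class BombHappyAgent(agents.BaseAgent):
    """Hypothetical sketch: plant a bomb whenever possible, otherwise move randomly."""

    def act(self, obs, action_space):
        if self.ammo > 0:  # proxied to the underlying Bomber via __getattr__
            return 5  # Bomb
        return random.randint(1, 4)  # Up, Down, Left or Right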

RandomAgent

This is a good example of a bare-minimum agent. We just have to inherit from BaseAgent and implement the act method to return an int from the action space (0 to 5).
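
In other words, the entire agent is only a few lines. A sketch of my own along the same lines, using the sample() method of the gym Discrete space mentioned earlier:

from pommerman import agents


class MyRandomAgent(agents.BaseAgent):
    """Bare-minimum agent: sample one of the 6 actions uniformly at random."""

    def act(self, obs, action_space):
        return action_space.sample()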

SimpleAgent

This is a baseline agent. After you can beat it, submit your agent to compete.

As suggested, this is a hand-engineered agent provided by the organizers as a baseline.

While it’s called SimpleAgent, I think quite a bit of thought was put into designing this agent’s strategy. That being said, it is not great and seems to blow itself up quite a bit.

As mentioned in the documentation:

The SimpleAgent is very useful as a barometer for your own efforts. Four SimpleAgents playing against each other have a win rate of ~18% each with the remaining ~28% of the time being a tie. Keep in mind that it can destroy itself. That can skew your own results if not properly understood.

This also means that an agent that does nothing but return 0 (Stop) has a positive probability of winning against three SimpleAgents. I am not going to go into detail here, but it is definitely worth taking a look at the script to see what sort of strategy the agent uses.
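
As a rough sketch of how one might test that, here is a loop modelled on simple_ffa_run.py that pits a do-nothing agent against three SimpleAgents (the environment id and the exact contents of info are assumptions on my part):

import pommerman
from pommerman import agents


class StopAgent(agents.BaseAgent):
    """Hypothetical baseline: always stand still."""

    def act(self, obs, action_space):
        return 0  # Stop


agent_list = [
    StopAgent(),
    agents.SimpleAgent(),
    agents.SimpleAgent(),
    agents.SimpleAgent(),
]
env = pommerman.make('PommeFFACompetition-v0', agent_list)

for episode in range(100):
    state = env.reset()
    done = False
    while not done:
        actions = env.act(state)  # collect an action from every agent
        state, reward, done, info = env.step(actions)
    print(episode, info)  # info reports how the episode ended
env.close()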

TensorForceAgent

This agent uses the TensorForce library. I haven’t actually tried it out yet, and the script mentions that it is a work in progress. It is also supposed to be used with the train_with_tensorforce.py script.

Note: the act method here returns None. This is because the action is predicted ‘externally’ in the train_with_tensorforce.py script. I will use a similar trick for implementing my agents.

PlayerAgent

This agent lets actual, live humans play Pommerman. Wow!

This can be done by replacing agent_list in the simple_ffa_run.py script with the following:

agent_list = [
    agents.SimpleAgent(),
    agents.PlayerAgent(agent_control="arrows"), # arrows to move, space to lay bomb
    agents.SimpleAgent(),
    agents.PlayerAgent(agent_control="wasd"), # W,A,S,D to move, E to lay bomb
]

There are two control schemes, so two humans can duke it out. I’ve tried it, though, and I must say, it’s not going to win Game of the Year anytime soon.

This might be an interesting way of evaluating a trained agent. Also, there is a fascinating bit of documentation at the beginning of the script that discusses issues with human control.


That’s all for now. I will follow up soon with another post on implementing a custom agent.