My First Imitation Learning Model

Behavioral Cloning of the Atari Breakout Game

I decided to write about this because, boy, it was difficult. I'd also like to improve on it soon, so writing it down will hopefully clear up the clutter in my head and give me an easy ride when I pick it back up.

I recently completed my first imitation learning model: a behavioral cloning model for the Atari Breakout game. It took a lot of research and reading.

First of all, here is the GitHub link.

Imitation learning is a technique in ML in which an agent learns from the recorded behavior of an expert (often a human) and tries to replicate it. Behavioral cloning is a type of imitation learning that treats this as a supervised learning problem: the observations are the inputs and the expert's actions are the labels.
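
To make the "supervised learning" part concrete, here is a minimal sketch of behavioral cloning on a toy problem. It is not the model from this project: the states, the actions and the `fit_cloned_policy` helper are all made up for illustration, and the "model" is just a majority vote per state instead of a neural network.

```python
from collections import Counter, defaultdict

def fit_cloned_policy(demonstrations):
    """Learn a policy from (state, action) pairs by majority vote per state."""
    votes = defaultdict(Counter)
    for state, action in demonstrations:
        votes[state][action] += 1
    # For every state seen in the demonstrations, copy the expert's most common action.
    return {state: counts.most_common(1)[0][0] for state, counts in votes.items()}

# Hypothetical demonstrations: the state is where the ball is relative to the paddle.
demos = [
    ("ball_left", "LEFT"),
    ("ball_left", "LEFT"),
    ("ball_right", "RIGHT"),
    ("ball_right", "RIGHT"),
    ("ball_right", "LEFT"),   # the expert is not perfectly consistent
    ("ball_above", "NOOP"),
]

policy = fit_cloned_policy(demos)
print(policy)
```

Notice that no reward appears anywhere: the only training signal is what the expert did in each state.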

There were 3 key parts to the implementation of this problem:

  1. the expert data
  2. the environment
  3. the model


This looks a bit like reinforcement learning. Maybe, but not really: there is no reward signal here, only the expert's recorded actions.

The data

The data for this project was obtained according to the process described in this article.
The data consists of images of the game being played by the expert, as well as the action taken by the expert for each image. We do not need the reward in our case because we are performing behavioral cloning, a supervised learning approach to imitation learning.

Further Preprocessing

The images are then pre-processed further: they are converted to grayscale and reshaped. The following code achieves this for us:


import os
import pickle

import cv2
import numpy as np

# OBS_IMG_DIR is the directory holding the recorded frames, named <number>.png

def process_obs():
    '''
    loads the recorded frames in numeric order and converts them to grayscale
    '''
    gray_obs = []

    # sort by the integer in the file name so that 10.png comes after 2.png
    frame_numbers = sorted(int(name.split('.')[0]) for name in os.listdir(OBS_IMG_DIR))
    for number in frame_numbers:
        path = os.path.join(OBS_IMG_DIR, str(number) + '.png')
        # read the image directly as a single-channel grayscale array
        img_array_gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        gray_obs.append(img_array_gray)
    return gray_obs

# reshape to (num_frames, 84, 84, 1); this assumes the saved frames are already 84x84
X_gray = process_obs()
X_gray = np.array(X_gray).reshape(-1, 84, 84, 1)
with open("X_gray_full.pickle", "wb") as pickle_out:
    pickle.dump(X_gray, pickle_out)
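
A note on why the loading code sorts the file names by their integer value: a plain alphabetical sort would put 10.png before 2.png, and the frames would no longer line up with the recorded actions. Here is a small sketch of the difference. The file names and the `sort_frames_numerically` helper are made up for illustration.

```python
def sort_frames_numerically(file_names):
    """Order names like '10.png' by their integer stem, not alphabetically."""
    return sorted(file_names, key=lambda name: int(name.split('.')[0]))

# Hypothetical directory listing, in the arbitrary order os.listdir might return.
names = ['10.png', '2.png', '1.png', '100.png']

alphabetical = sorted(names)
numerical = sort_frames_numerically(names)

print(alphabetical)  # ['1.png', '10.png', '100.png', '2.png']
print(numerical)     # ['1.png', '2.png', '10.png', '100.png']
```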

The environment

The environment here refers to the OpenAI Gym environment, specifically its Atari Breakout game. The expert data was obtained by playing the game and recording each action along with the corresponding image of the game screen.
Once the model has been trained, the agent carries out what it has learnt in this same environment.
You can run the gym environment locally or on Colab. Installing gym locally is fairly straightforward on most operating systems.
On Google Colab, the gym environment can be set up and accessed the following way:

  1. The installation and import
!pip install gym
!apt-get install python-opengl -y
!apt install xvfb -y
!pip install gym[atari]
!pip install pyvirtualdisplay
!pip install pyglet

#sets up the virtual display for the game
from pyvirtualdisplay import Display
display = Display(visible=0, size=(1400, 900))
display.start()

#we need all these other modules because we're running it on colab
import gym
from gym import logger as gymlogger
from gym.wrappers import Monitor
gymlogger.set_level(40) # error only
import tensorflow as tf
import numpy as np
import random
import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline
import math
import glob
import io
import base64
from IPython.display import HTML

from IPython import display as ipythondisplay

  2. The recording of the display screen
"""
Utility functions to enable video recording of gym environment and displaying it
To enable video, just do "env = wrap_env(env)"
"""

def show_video():
  mp4list = glob.glob('video/*.mp4')
  if len(mp4list) > 0:
    mp4 = mp4list[0]
    video = io.open(mp4, 'r+b').read()
    encoded = base64.b64encode(video)
    ipythondisplay.display(HTML(data='''<video alt="test" autoplay 
                loop controls style="height: 400px;">
                <source src="data:video/mp4;base64,{0}" type="video/mp4" />
             </video>'''.format(encoded.decode('ascii'))))
  else: 
    print("Could not find video")


def wrap_env(env):
  env = Monitor(env, './video', force=True)
  return env

The two sections above are standard boilerplate for using gym on Colab. A simple search for how to use gym on Colab will bring up a notebook containing both of them (with the code provided).

The actual environment of the game

This is the only section required when working with gym locally. This is also the 3rd section required when working with it on colab. This is where the environment is actually called and used:

"""
where the model is used
"""
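import gym
import numpy as np
import matplotlib.pyplot as plt

env = gym.make("Breakout-v0")
env = wrap_env(env)

# reset() returns the first frame, so `observation` is defined before the loop
observation = env.reset()

while True:
  env.render()

  # your agent goes here
  # np.argmax over model.predict gives the class index that
  # model.predict_classes returned in older Keras versions
  probabilities = model.predict(prepare(observation))
  action = int(np.argmax(probabilities, axis=-1)[0])
  observation, reward, done, info = env.step(action)

  if done:
    break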

env.close()
show_video()

For the Breakout game, there are four possible actions: start (fire the ball), left, right, and do nothing. These actions are encoded with the numbers 0-3. Our model predicts the action to take for a given game screen, and the agent executes the predicted action.
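
As a rough sketch of how a prediction becomes an action: the model outputs one probability per action, and the agent takes the index of the largest one. The ordering below is an assumption on my part (it is what gym's Breakout reports through `env.unwrapped.get_action_meanings()`), the probabilities are made up, and `pick_action` is a hypothetical helper.

```python
# Assumed ordering, as reported by gym's Breakout environment.
ACTION_MEANINGS = ['NOOP', 'FIRE', 'RIGHT', 'LEFT']

def pick_action(probabilities):
    """Return the index of the most probable action (an argmax)."""
    return max(range(len(probabilities)), key=lambda i: probabilities[i])

# Hypothetical softmax output for a single frame.
probs = [0.10, 0.05, 0.25, 0.60]
action = pick_action(probs)
print(action, ACTION_MEANINGS[action])
```

If the expert data was recorded with a different encoding, the list would need to be changed to match it.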

Helper functions to convert the game screen to gray-scale when the model is being used

#helper function for grayscale conversion
def grayConversion(image):
    # weighted sum of the channels; gym frames are RGB, so index 0 is red, 1 is green, 2 is blue
    grayValue = 0.07 * image[:,:,2] + 0.72 * image[:,:,1] + 0.21 * image[:,:,0]
    gray_img = grayValue.astype(np.float64)
    return gray_img

#prepares the current game screen to match the images the model was trained on
def prepare(obs):
    IMG_SIZE = 84
    # scale pixel values from [0, 255] to [0, 1]
    obs = obs/255
    gray_obs =  grayConversion(obs)
    print("shape of gray_obs{}".format(gray_obs.shape))
    new_obs = cv2.resize(gray_obs, (IMG_SIZE, IMG_SIZE))
    print("shape of new_obs{}".format(new_obs.shape))
    return new_obs.reshape(-1, IMG_SIZE, IMG_SIZE, 1)
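
To see what the weights in grayConversion do, here is the same weighted sum applied to single pixels in plain Python. This is only an illustration: `WEIGHTS` and `luminance` are hypothetical names, and the pixels are assumed to be in RGB order and already scaled to [0, 1]. That matches what gym returns, but not cv2.imread, which loads colour images as BGR.

```python
# Same weights as grayConversion: index 0 = red, 1 = green, 2 = blue.
WEIGHTS = (0.21, 0.72, 0.07)

def luminance(pixel):
    """Weighted sum of one (r, g, b) pixel whose channels are in [0, 1]."""
    return sum(w * c for w, c in zip(WEIGHTS, pixel))

white = luminance((1.0, 1.0, 1.0))
pure_green = luminance((0.0, 1.0, 0.0))
pure_blue = luminance((0.0, 0.0, 1.0))

print(white, pure_green, pure_blue)
```

The weights add up to 1, so a white pixel stays at full brightness, and green contributes far more to the result than blue does.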

The model

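I based the model on the architecture described in DeepMind's Nature paper, which has 3 convolutional layers, 1 fully connected layer and 1 output layer. My version is not an exact copy: it uses 3x3 kernels with max pooling after each convolution, and it has 2 fully connected layers before the output.
The following code shows exactly what the architecture looks like: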

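#----------------------------------------#
# create model#
#----------------------------------------#
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Activation, Flatten, Dense, Dropout

# X holds the grayscale frames with shape (num_frames, 84, 84, 1)
# y holds the expert's action (0-3) for each frame
model = Sequential()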
# conv 1
model.add(Conv2D(64, (3, 3), input_shape=X[0].shape))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
# conv 2
model.add(Conv2D(64, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
# conv 3
model.add(Conv2D(64, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
# fc 1
model.add(Flatten())
model.add(Dense(64))
model.add(Dropout(0.25))
model.add(Activation('relu'))
# fc 2
model.add(Dense(64))
model.add(Activation('relu'))
#out: softmax probabilities over the 4 actions
model.add(Dense(4))
model.add(Activation('softmax'))


model.compile(loss='sparse_categorical_crossentropy',
              optimizer='adam', metrics=['accuracy'])

history = model.fit(x=X, y=y, batch_size=128, epochs=25,
          validation_split=0.2)
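
To get a feel for the size of this network, the arithmetic below traces an 84x84 frame through the three conv + pool stages. It assumes the Keras defaults used in the code above (no padding, stride 1 for the convolutions, non-overlapping 2x2 pooling); `conv_out` and `pool_out` are hypothetical helper names.

```python
def conv_out(size, kernel=3):
    """Output width of an unpadded ('valid') convolution with stride 1."""
    return size - kernel + 1

def pool_out(size, pool=2):
    """Output width of non-overlapping max pooling (rounds down)."""
    return size // pool

size = 84
sizes = []
for _ in range(3):                  # three conv + pool stages
    size = pool_out(conv_out(size))
    sizes.append(size)

flattened = size * size * 64        # 64 filters in the last conv layer
dense_params = flattened * 64 + 64  # weights + biases of the first Dense(64)

print(sizes, flattened, dense_params)
```

So the frame shrinks from 84 to 41, then 19, then 8 pixels wide, the Flatten layer produces 4096 values, and the first dense layer alone has 262,208 parameters.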

Notes to self:

  1. The entire dataset of over 500,000 examples could not be used due to the limitations (i.e. RAM) of Colab, so I restricted the examples used to just 100,000.
  2. The model achieved an accuracy of about 65-70% after about 10 epochs.
  3. It's unclear whether the architecture of the model is too dense for the data, because not all of the data was used.
  4. The model has not really learned. It tends to overfit beyond epochs 5-8.
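
On the RAM limitation in note 1, a back-of-the-envelope estimate helps. The sketch below only does arithmetic on the frame shape used in this post (84x84x1); `dataset_bytes` is a hypothetical helper, and the real footprint will be higher once copies made during training are counted.

```python
def dataset_bytes(n_frames, bytes_per_pixel, height=84, width=84, channels=1):
    """Raw size in bytes of a dense array of frames."""
    return n_frames * height * width * channels * bytes_per_pixel

GIB = 1024 ** 3
for n in (100_000, 500_000):
    for dtype, nbytes in (("uint8", 1), ("float32", 4), ("float64", 8)):
        size_gib = dataset_bytes(n, nbytes) / GIB
        print(f"{n:>7} frames as {dtype:<7}: {size_gib:.2f} GiB")
```

As uint8, the full 500,000 frames come to about 3.3 GiB, but as float64 (which is what NumPy produces when a uint8 array is divided by 255) they need about 26 GiB. Keeping the frames as uint8 and converting one batch at a time is one possible way around this, though I have not tried it here.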