Playing Street Fighter with DQN

Overview


State, Reward and Action are the core elements in reinforcement learning. So when considering playing Street Fighter with DQN, the first question is how to read the game state and how to control the player. If we could access the game's internal variables (the players' blood, current action, dead or alive, etc.), things would be really clean and friendly to use. But to get the idea working as soon as possible, I chose a raw method (screen grabbing plus keyboard/mouse events) to save time.

Here is the game.

How to get game state?


The following code aims to get a quick grab of the game screen.

import numpy as np
from PIL import Image, ImageGrab

g_game_box = (142, 124, 911, 487)  # (x1, y1, w, h)

# Get current game image: grab only the game region of the screen
def get_screen():
    im = np.array(ImageGrab.grab((g_game_box[0], g_game_box[1],
                                  g_game_box[0] + g_game_box[2] - 1,
                                  g_game_box[1] + g_game_box[3] - 1)))
    return im

Here g_game_box is the meaningful game region. When called without arguments, ImageGrab.grab returns the WHOLE screen. Every time you open a new game it appears at the same coordinates, so I fixed the box to (142,124,911,487). You will need to adjust it on your own computer, since the position very likely differs.
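If you are not sure where the box sits on your screen, a quick way to check it is to save one grab and look at it (a tiny sketch using the Image class imported above; the file name is arbitrary):

# Save one grab to disk so you can visually verify g_game_box
Image.fromarray(get_screen()).save('game_box_check.png')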

The im we get is the state at step t, or more precisely, the observation at step t. Besides the screen image, we also need to know the players' blood.

# Check player blood: count bright pixels inside the blood bar box
def check_player_blood(img, box):
    # Crop the blood bar region (box is (x, y, w, h)) and keep the green channel
    img = img[box[1]:(box[1] + box[3]), box[0]:(box[0] + box[2]), 1]
    height, width = img.shape
    blood_points = 0
    for y in range(height):
        for x in range(width):
            if img[y, x] > 128:  # bright pixel -> blood point
                blood_points += 1
    return blood_points * 100.0 / (width * height)

This time the box is a player's blood bar region (i.e., the blue box in the following image):

check_player_blood receives the game image and a certain player's (yours or the enemy's) blood bar box, and returns the blood left as a value between 0 and 100.
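For example, with the two blood bar boxes defined (the names g_red_blood_box and g_blue_blood_box are used later in new_game; the coordinate values below are placeholders that you must measure on your own screen, relative to the image returned by get_screen), both players' blood can be read from a single grab:

# Placeholder blood bar boxes in (x, y, w, h) form, relative to get_screen()'s image;
# measure the real values on your own screen
g_red_blood_box = (40, 20, 300, 10)
g_blue_blood_box = (430, 20, 300, 10)

image = get_screen()
red_blood = check_player_blood(image, g_red_blood_box)
blue_blood = check_player_blood(image, g_blue_blood_box)
print('red: %.1f%%, blue: %.1f%%' % (red_blood, blue_blood))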

Dependency


Lasagne and Theano

CUDA

To install Lasagne on Windows, you can refer to another blog of mine, "Lasagne On Win7 With GPU Environment & VizDoom Compilation" (written in Chinese).

How to control the player?


For the host player, you can use W/D/A/S to control directions and J/K/U/I for fist and leg attacks. Python can simulate keyboard and mouse operations with win32api and win32con.

Keyboard simulation
Below are the virtual key values of the keys used:

g_available_keys = (65,  # A
                    87,  # W
                    83,  # S
                    68,  # D
                    74,  # J
                    75,  # K
                    85,  # U
                    73)  # I

We can send commands to the player with the following code:

import itertools as it
import win32api, win32con
from time import sleep

g_single_frame_time = 0.02
frame_repeat = 4

# Action = which buttons are pressed (one 0/1 flag per key in g_available_keys)
n = len(g_available_keys)
actions = [list(a) for a in it.product([0, 1], repeat=n)]

# Key push: press the given keys, hold them for frame_repeat frames, then release
def key_do(keys, frame_repeat):
    for key in keys:
        win32api.keybd_event(key, 0, 0, 0)
    sleep(frame_repeat * g_single_frame_time)
    for key in keys:
        win32api.keybd_event(key, 0, win32con.KEYEVENTF_KEYUP, 0)

# Make actions: translate a 0/1 action vector into the corresponding key presses
def make_action(action, frame_repeat):
    keys = []
    for i in range(len(action)):
        if action[i]:
            keys = keys + [g_available_keys[i], ]
    key_do(keys, frame_repeat)

Then when we call make_action((1,0,0,0,0,0,0,0), 4), the player receives the "keep back" command, while make_action((0,1,0,0,0,0,0,0), 4) means "jump", and make_action((1,1,0,0,0,0,0,0), 4) means "jump back".
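During debugging it helps to give a few of these action vectors readable names. The helper below is purely illustrative (not part of the project); the flag order follows g_available_keys, i.e. A, W, S, D, J, K, U, I:

# Hypothetical named actions for debugging; flag order is A, W, S, D, J, K, U, I
named_actions = {
    'back':      (1, 0, 0, 0, 0, 0, 0, 0),  # A
    'jump':      (0, 1, 0, 0, 0, 0, 0, 0),  # W
    'jump_back': (1, 1, 0, 0, 0, 0, 0, 0),  # A + W
    'attack_J':  (0, 0, 0, 0, 1, 0, 0, 0),  # J (one of the fist/leg buttons)
}

make_action(named_actions['jump_back'], frame_repeat)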

Mouse simulation
Although we do not use the mouse to control the player, sometimes after a player dies and the game restarts, the blood bars are drawn wrong: the bar region gets longer or shorter. If you watch the game run for a while, this will show up. I believe it is a bug in the game, so when it happens it is better to close the game and reopen it. That is why mouse simulation is needed: to reopen the game in Chrome.

from time import sleep, time

def mousePos(cord):
    win32api.SetCursorPos(cord)

def leftClick():
    win32api.mouse_event(win32con.MOUSEEVENTF_LEFTDOWN, 0, 0)
    sleep(.1)
    win32api.mouse_event(win32con.MOUSEEVENTF_LEFTUP, 0, 0)

def doubleClick():
    leftClick()
    sleep(.1)
    leftClick()

# Close and open a new game
def close_game():
    mousePos((1151, 9))
    leftClick()

def open_game():
    mousePos((212, 205))
    doubleClick()

# Start a new game environment: wait until both blood bars are full again;
# if it takes too long (the blood bar bug), close and reopen the game
def new_game():
    time_looping = time()
    while True:
        image = get_screen()
        red_blood = check_player_blood(image, g_red_blood_box)
        blue_blood = check_player_blood(image, g_blue_blood_box)
        if red_blood >= 100 and blue_blood >= 100:
            break
        time_wait = time() - time_looping
        if time_wait > 10:
            close_game()
            open_game()
            mousePos((30, 60))
            leftClick()

(1151,9) is the position of the close button used by close_game, while (212,205) is the position of index.html on the screen. The leftClick on (30,60) is to give Chrome the focus.

How to set reward?


I give the player living_reward = -1, plus -self_blood_lose, 3*enemy_blood_lose, -200 for its own death and 200 for the enemy's death. The factor 3*enemy_blood_lose is meant to encourage attacking.
The reward mechanism is REALLY IMPORTANT and more experiments need to be done on this topic.
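Put together, one way to compute the step reward from two consecutive blood readings looks like this (a minimal sketch of the scheme above; the function name and the death check are mine, not taken from the project code):

g_living_reward = -1

# Sketch: reward for one step, given previous and current blood readings (0~100)
def compute_reward(prev_self, prev_enemy, cur_self, cur_enemy):
    self_blood_lose = prev_self - cur_self
    enemy_blood_lose = prev_enemy - cur_enemy
    reward = g_living_reward - self_blood_lose + 3 * enemy_blood_lose
    if cur_self <= 0:   # our player died
        reward -= 200
    if cur_enemy <= 0:  # the enemy died
        reward += 200
    return reward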

Training setting


Network architecture
Input 228x122x3

Conv 12x12x8 @ s=3, sigmoid

MaxPool 2x2 @ s=1

Conv 7x7x16 @ s=2, sigmoid

MaxPool 2x2 @ s=1

33x13x16 = 6864 elements by estimation

Dense 128 @ sigmoid

Dense 256 (output)
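In Lasagne the stack above can be written roughly as follows (a sketch only: it assumes Theano's NCHW layout with the grabbed screen resized to 228x122 RGB, and a linear output layer giving one Q-value per action; build_net itself and its defaults are illustrative, not the project's exact code):

import lasagne
from lasagne.layers import InputLayer, Conv2DLayer, MaxPool2DLayer, DenseLayer
from lasagne.nonlinearities import sigmoid, linear

# Illustrative sketch of the architecture listed above
def build_net(input_var, n_actions=256):
    net = InputLayer(shape=(None, 3, 122, 228), input_var=input_var)
    net = Conv2DLayer(net, num_filters=8, filter_size=(12, 12), stride=3,
                      nonlinearity=sigmoid)
    net = MaxPool2DLayer(net, pool_size=(2, 2), stride=1)
    net = Conv2DLayer(net, num_filters=16, filter_size=(7, 7), stride=2,
                      nonlinearity=sigmoid)
    net = MaxPool2DLayer(net, pool_size=(2, 2), stride=1)
    net = DenseLayer(net, num_units=128, nonlinearity=sigmoid)
    net = DenseLayer(net, num_units=n_actions, nonlinearity=linear)  # one Q-value per action
    return net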

Parameters

g_learning_rate = 0.00025
g_discount_factor = 0.95
g_epochs = 100
g_learning_steps_per_epoch = 50000
g_replay_memory_size = 10000
g_batch_size = 64
g_test_episodes_per_epoch = 10
g_frame_repeat = 4
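g_discount_factor enters training through the usual DQN target: the Q-value of the taken action is regressed toward r + g_discount_factor * max_a' Q(s', a'). A small numpy sketch of the target computation (the function and argument names are illustrative, not from the project):

import numpy as np

# Sketch: DQN targets for one minibatch
# rewards: (batch,) rewards, q_next: (batch, n_actions) Q-values of s',
# terminal: (batch,) 1.0 where the episode ended at s', else 0.0
def q_targets(rewards, q_next, terminal, gamma=g_discount_factor):
    return rewards + gamma * (1.0 - terminal) * np.max(q_next, axis=1)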

Others


To accelerate the training process, I commented out the test phase in every epoch and only focused on the training scores.
Also, for training simplicity, the game background is changed to white. If necessary you can do this yourself by modifying StreetFighter/images/g/behind.gif and front.gif.

Current Result


At the beginning I set epochs = 20 and learning_steps_per_epoch = 2000, and made the following attempts:

  1. change gray input to color input — no convergence
  2. change frame_repeat to 2,4,12 — no convergence
  3. change reward:
     add death penalty — no convergence
     add living reward — no convergence
     give more reward on attack — no convergence
  4. change relu to sigmoid — no convergence
  5. add maxpooling layer — no convergence
  6. change the input size — no convergence
  7. change the keys to available actions
  8. try dueling dqn

After ranging over these parameters, I think the problem is that the total number of training steps (minibatch weight updates) is not enough. As the paper "Playing Atari with Deep Reinforcement Learning" points out, it takes 100 epochs x 50000 minibatch updates, i.e. 5,000,000 updates, while my original setting gives only 20 x 2000 = 40,000. Compared to that number my setting is too naive, so now I am running experiments with the new setting.

REALLY HOPING FOR GOOD PERFORMANCE, OR AT LEAST CONVERGENCE!!!!

The Rainbow comes after the storm.

Reference


  1. Project
  2. Playing Atari with Deep Reinforcement Learning
  3. How to build a python bot that can play web games
  4. Street Fighter
  5. VizDoom
  6. Virtual key values
