Exploring AWS DeepRacer: I Spent the Money So You Didn't Have To - Part 1

Posted on June 4, 2023 • 4 minutes • 711 words

Table of contents

Introduction
Experiments
Conclusion

Introduction

AWS DeepRacer is an intriguing concept. It’s a miniature autonomous vehicle that offers an introduction to the world of reinforcement learning, a branch of machine learning. You begin by training your own model in a virtual sandbox, tinkering with reward functions and hyperparameters. The real excitement comes with the DeepRacer League - an international competition where your model is tested. A blend of competition and learning, the DeepRacer serves as a unique, hands-on path into AI.

The issue with DeepRacer is the cost. Training involves a lot of trial and error, and naturally nobody wants to share too much specific information, as that could make the competition harder for them!

So I thought I would run some experiments, training on EC2 instances, which train faster and at a lower cost than the console. Luckily I have credits to use, so it comes at no actual cost to me.

Experiments

All of the experiments below were run on the A to Z Speedway track (reInvent2019_wide_cw) in a clockwise direction. A world-record pace for this track is around 7-8 seconds.

Experiment 1 - Pursuit Function and High Top Speed

Reward Function

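def reward_function(params):
    # Reward average progress per step (scaled by 100) plus the square of the
    # current speed, but only while all wheels are on the track.
    if params["all_wheels_on_track"] and params["steps"] > 0:
        reward = ((params["progress"] / params["steps"]) * 100) + (params["speed"]**2)
    else:
        # Near-zero reward when off track (or before the first step).
        reward = 0.01

    return float(reward)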
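As a quick sanity check, the function can be called directly with a hand-built `params` dictionary. This is only a sketch: the values below are made up for illustration, and the real simulator passes many more keys than the four used here.

```python
def reward_function(params):
    if params["all_wheels_on_track"] and params["steps"] > 0:
        reward = ((params["progress"] / params["steps"]) * 100) + (params["speed"] ** 2)
    else:
        reward = 0.01
    return float(reward)


# Hypothetical snapshot: 50% of the lap done after 100 steps, travelling at 4 m/s.
on_track = {"all_wheels_on_track": True, "steps": 100, "progress": 50.0, "speed": 4.0}
# Same snapshot, but with a wheel off the track.
off_track = dict(on_track, all_wheels_on_track=False)

print(reward_function(on_track))   # 66.0 (50 from progress per step, 16 from speed squared)
print(reward_function(off_track))  # 0.01
```

Going off track drops the reward from 66.0 to 0.01 in this example, which is the only penalty the function applies.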

Hyperparameters

Hyperparameter	Value
Entropy	0.01
Gradient descent batch size	128
Learning rate	0.0003
Discount factor	0.995
Loss type	huber
Number of experience episodes between each policy-updating iteration	25
Number of epochs	10

Action Space


Type	Continuous
Speed (m/s)	1.1 : 4
Steering angle (degrees)	-30 : 30
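When training outside the console, an action space like this usually has to be written into a model metadata file. The sketch below builds an equivalent structure in Python and prints it as JSON. The key names are an assumption based on the commonly used `model_metadata.json` layout and are not confirmed by this post, so check them against your own training setup.

```python
import json

# Assumed layout for a continuous action space; the key names are an
# assumption and may differ in your training setup.
action_space = {
    "action_space_type": "continuous",
    "action_space": {
        "speed": {"low": 1.1, "high": 4.0},              # metres per second
        "steering_angle": {"low": -30.0, "high": 30.0},  # degrees
    },
}

print(json.dumps(action_space, indent=2))
```

Only the `high` value for `speed` changes between the experiments in this post.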

Training Time

Ran for 3 hours, but on a large server, so the training time is not equivalent to using the DeepRacer console.

Results


Final Evaluation Fastest Lap	10.597
Final Evaluation Fastest Lap Off-track Number	1
Final Evaluation Laps	10.597, 14.401, 16.068
Final Evaluation Total Off-track	3

Experiment 2 - Pursuit Function and Medium Top Speed

A brand new model with the same configuration as above, except that the action space has a lower top speed of 3, to see whether that makes the car more stable and quicker to learn, with less chance of coming off track.

Action Space


Type	Continuous
Speed (m/s)	1.1 : 3
Steering angle (degrees)	-30 : 30

Training Time

Ran for 3 hours again.

Results


Final Evaluation Fastest Lap	10.000
Final Evaluation Fastest Lap Off-track Number	0
Final Evaluation Laps	10.170, 10.000, 11.398
Final Evaluation Total Off-track	0

Experiment 3 - Pushing the top speed

A clone of Experiment 2, meaning it is trained on top of that model rather than from scratch. The configuration was the same as above, except that the action space has a slightly higher top speed of 3.5, to see whether that makes the car quicker while hopefully staying stable.

Action Space


Type	Continuous
Speed (m/s)	1.1 : 3.5
Steering angle (degrees)	-30 : 30

Training Time

Ran for 1 hour.

Results


Final Evaluation Fastest Lap	09.257
Final Evaluation Fastest Lap Off-track Number	0
Final Evaluation Laps	09.257, 09.730, 10.730
Final Evaluation Total Off-track	0
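To compare the three runs side by side, here is a small sketch that puts the evaluation lap times into one structure and works out the fastest and mean lap for each. The numbers are copied from the results tables in this post; the dictionary layout and labels are just for illustration.

```python
from statistics import mean

# Lap times (seconds) and total off-tracks copied from the results tables.
results = {
    "Experiment 1 (top speed 4)": {"laps": [10.597, 14.401, 16.068], "off_track": 3},
    "Experiment 2 (top speed 3)": {"laps": [10.170, 10.000, 11.398], "off_track": 0},
    "Experiment 3 (top speed 3.5)": {"laps": [9.257, 9.730, 10.730], "off_track": 0},
}

# Fastest lap, mean lap (rounded to 3 decimals) and off-track count per experiment.
summary = {
    name: {
        "fastest": min(r["laps"]),
        "mean": round(mean(r["laps"]), 3),
        "off_track": r["off_track"],
    }
    for name, r in results.items()
}

for name, s in summary.items():
    print(f"{name}: fastest {s['fastest']:.3f}s, mean {s['mean']:.3f}s, off-track {s['off_track']}")
```

The mean is worth looking at alongside the fastest lap, because the highest top speed produced a wide spread of lap times.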

Conclusion

Training with a maximum speed of 3 m/s was a much healthier training session: it was still learning right until the end, evaluating at 100% completion, and started to level off at around 8k reward. The attempt with a maximum speed of 4 m/s struggled to get more than 5k reward and wasn’t managing to complete a lap without leaving the track during training or evaluation.

Overall this isn’t too surprising: the reward function rewards going as fast as possible, so the car will always try to go at its top speed, and if that speed is too high it will spin out a lot. The worry is that training with a low top speed might produce a consistent model, but can it then be trained to go quicker later on, so that it finishes with a strong, fast result? The numbers baked into the neural network might be too low to ever be useful. It has potentially learned bad behaviours!
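To put rough numbers on this, the sketch below compares the `speed**2` term of the reward at each of the three top speeds used in the experiments. It only looks at that one term at full speed and ignores the progress term, so treat it as an illustration of the incentive rather than a prediction of the training reward.

```python
# Top speeds (m/s) from the three experiments.
top_speeds = [3.0, 3.5, 4.0]

# The speed term of the reward function is simply speed squared.
speed_bonus = {v: v ** 2 for v in top_speeds}

for v, bonus in speed_bonus.items():
    # Show each bonus relative to the 3 m/s action space.
    ratio = bonus / speed_bonus[3.0]
    print(f"top speed {v} m/s -> speed term {bonus:.2f} ({ratio:.2f}x)")
```

At 4 m/s the speed term is 16, against 9 at 3 m/s, so every step spent at full speed pays noticeably more, whether or not the car can actually hold the track at that speed.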

The third experiment showed this not to be the case, though. After an hour of further training on the Experiment 2 model with a slightly higher top speed, it trained in a healthy way and decreased the lap time without coming off the track during evaluation. When racing on a community circuit, though, it would leave the track (only just) once per 3-lap race around 2/3 of the time.