
Exploring AWS DeepRacer: I Spent the Money So You Didn't Have To - Part 1

Posted on June 4, 2023  •  4 minutes  • 711 words

Introduction

AWS DeepRacer is an intriguing concept. It’s a miniature autonomous vehicle that offers an introduction to the world of reinforcement learning, a branch of machine learning. You begin by training your own model in a virtual sandbox, tinkering with reward functions and hyperparameters. The real excitement comes with the DeepRacer League - an international competition where your model is tested. A blend of competition and learning, the DeepRacer serves as a unique, hands-on path into AI.

The issue with DeepRacer is the cost: it involves a lot of trial and error, and naturally nobody wants to share too much specific information, as that could make the competition more difficult for them!

Therefore I thought I would try some experiments, training on EC2 instances, which train faster and at a lower cost than the console. Luckily I have credits to use, so it comes at no actual cost.

Experiments

All the experiments below were run on the A to Z Speedway track (reInvent2019_wide_cw) in a clockwise direction. The world-record pace for this track is around 7-8 seconds.

Experiment 1 - Pursuit Function and High Top Speed

Reward Function

def reward_function(params):
    # Reward the progress made per step (a "pursuit" style reward), plus a
    # bonus that grows with the square of the current speed.
    if params["all_wheels_on_track"] and params["steps"] > 0:
        reward = ((params["progress"] / params["steps"]) * 100) + (params["speed"] ** 2)
    else:
        # Off the track (or before the first step): minimal reward
        reward = 0.01

    return float(reward)
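
Since DeepRacer calls this function with a dictionary of simulation parameters, it can be sanity-checked locally. Below is a minimal sketch reusing the reward_function above, with hand-picked values for the params keys it reads; the numbers are illustrative, not real simulator output.

# Minimal local sanity check, reusing the reward_function defined above.
# These params values are illustrative examples, not real simulator output.
sample_params = {
    "all_wheels_on_track": True,  # car fully on the track
    "steps": 120,                 # simulation steps taken so far
    "progress": 60.0,             # percentage of the track completed
    "speed": 3.0,                 # current speed in m/s
}

# (60 / 120) * 100 = 50 for progress-per-step, plus 3.0**2 = 9 for speed
print(reward_function(sample_params))  # -> 59.0

# Off the track the reward collapses to the small constant
sample_params["all_wheels_on_track"] = False
print(reward_function(sample_params))  # -> 0.01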

Hyperparameters

Entropy: 0.01
Gradient descent batch size: 128
Learning rate: 0.0003
Discount factor: 0.995
Loss type: Huber
Number of experience episodes between each policy-updating iteration: 25
Number of epochs: 10
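
If you train outside the console (for example with the community deepracer-for-cloud setup on EC2), these values are typically supplied via a hyperparameters.json file. The sketch below shows roughly how the table above might map onto that format; the key names follow deepracer-for-cloud's defaults and are an assumption to verify against your own setup.

import json

# A hedged sketch of how the table above might map onto the hyperparameters.json
# consumed by community EC2 training setups such as deepracer-for-cloud.
# Key names follow that project's defaults - verify against your own setup.
hyperparameters = {
    "batch_size": 128,                    # gradient descent batch size
    "beta_entropy": 0.01,                 # entropy
    "discount_factor": 0.995,
    "loss_type": "huber",
    "lr": 0.0003,                         # learning rate
    "num_episodes_between_training": 25,  # episodes between each policy update
    "num_epochs": 10,
}

with open("hyperparameters.json", "w") as f:
    json.dump(hyperparameters, f, indent=2)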

Action Space

Type: Continuous
Speed: 1.1 to 4 m/s
Steering angle: -30 to 30 degrees

Training Time

Ran for 3 hours, but on a large server, so this is not equivalent to 3 hours in the DeepRacer console.

Results

Final Evaluation Fastest Lap: 10.597 s
Final Evaluation Fastest Lap Off-track Count: 1
Final Evaluation Laps: 10.597 s, 14.401 s, 16.068 s
Final Evaluation Total Off-track: 3

Experiment 2 - Pursuit Function and Medium Top Speed

A brand new model, configured the same as above except that the action space has a smaller top speed of 3 m/s, to see if that makes the car more stable and quicker to learn, with less chance of coming off-track.

Action Space

Type: Continuous
Speed: 1.1 to 3 m/s
Steering angle: -30 to 30 degrees

Training Time

Ran for 3 hours again.

Results

Final Evaluation Fastest Lap: 10.000 s
Final Evaluation Fastest Lap Off-track Count: 0
Final Evaluation Laps: 10.170 s, 10.000 s, 11.398 s
Final Evaluation Total Off-track: 0

Experiment 3 - Pushing the top speed

A clone of Experiment 2, meaning it is built on top of that model rather than trained from scratch. The configuration was the same as above, except the action space has a slightly faster top speed of 3.5 m/s, to see if that makes the car quicker while hopefully staying stable.

Action Space

Type: Continuous
Speed: 1.1 to 3.5 m/s
Steering angle: -30 to 30 degrees

Training Time

Ran for 1 hour.

Results

Final Evaluation Fastest Lap: 9.257 s
Final Evaluation Fastest Lap Off-track Count: 0
Final Evaluation Laps: 9.257 s, 9.730 s, 10.730 s
Final Evaluation Total Off-track: 0

Conclusion

Training with a maximum speed of 3 m/s was a much healthier training session - the model was still learning right up to the end, evaluating at 100% completion, with reward starting to level off around 8k, whereas the attempt with a maximum speed of 4 m/s struggled to get above 5k reward and wasn't managing to finish a lap during training or evaluation.

Overall this isn't too surprising, because the reward function rewards going as fast as possible, so the car will always try to drive at its top speed, and if that speed is too high it will spin out a lot. The question raised by training slower is that the car might be consistent, but can it then be trained to go faster later and finish with a strong, fast result? The numbers baked into the neural network might be too low to ever be useful - it has potentially learned bad behaviours!
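
To put rough numbers on that: both terms of the reward grow with speed, so the policy is pushed toward the top of the action space regardless. A quick sketch, using an assumed track length of around 23 m for the A to Z Speedway and the simulator's roughly 15 steps per second:

# Per-step reward terms for a clean lap sustained at a given speed.
track_length_m = 23.0  # rough A to Z Speedway length - an assumption, not measured
steps_per_sec = 15     # the simulator runs at roughly 15 steps per second

for speed in (3.0, 4.0):
    lap_steps = (track_length_m / speed) * steps_per_sec  # steps for a clean lap
    progress_term = (100.0 / lap_steps) * 100             # (progress / steps) * 100
    speed_term = speed ** 2                               # speed bonus per step
    print(f"{speed:.1f} m/s -> progress term {progress_term:.1f}, speed term {speed_term:.1f}")

At the higher top speed both the per-step progress term and the speed-squared term are larger, so the only thing discouraging flat-out driving is the 0.01 reward when the car leaves the track.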

The third experiment showed this not to be the case, though: after an hour of further training on top of Experiment 2 with a slightly faster top speed, it trained in a healthy way and reduced the lap time without coming off the track during evaluation. When racing on a community circuit, though, it would leave the track (only just) once per 3-lap race around two-thirds of the time.
