Tackling Reinforcement Learning with the Aurora OPU

This is the result of a short run using a latest-generation Aurora OPU on the LightOn Cloud.

Relationship with convolutions

A typical way of extracting features from images is to use a convolutional neural network (CNN). Suppose we use only the “convolutional” part of the CNN, so as not to spend time training the classification layers of such a network. A simple architecture yields results similar to the random projection. However, processing the observations with a CNN consumes far more energy per input image than the OPU [6].
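For illustration, here is a minimal sketch of such a frozen convolutional feature extractor, assuming PyTorch and a pretrained VGG16 from torchvision; the architecture and input size are placeholder choices, not necessarily the ones used in our experiments.

```python
import torch
import torchvision.models as models
import torchvision.transforms as T

# Keep only the convolutional part of a pretrained CNN; the classification
# layers are discarded and nothing is trained.
cnn = models.vgg16(pretrained=True).features.eval()

preprocess = T.Compose([
    T.ToPILImage(),
    T.Resize((224, 224)),
    T.ToTensor(),
])

def cnn_features(frame):
    """Map an (H, W, 3) uint8 game frame to a flat feature vector."""
    x = preprocess(frame).unsqueeze(0)   # (1, 3, 224, 224)
    with torch.no_grad():
        maps = cnn(x)                    # convolutional feature maps
    return maps.flatten().numpy()        # state embedding for the episodic controller
```

The resulting vector plays the same role as the random features in the episodic-control buffer, but each frame now costs a full forward pass through the network.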

We can improve the algorithm by combining both approaches: a coarse reshaping of the input images to smooth out irrelevant details, followed by a random projection, yields the best results with little to no overhead (see the sketch below). Conversely, we can make case-specific enhancements to the preprocessing, building a finer feature distiller on top of (actually, before) the generic framework that is the random projection.
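A minimal sketch of that combined preprocessing, with a NumPy random matrix standing in for the optical random projection (on the Aurora OPU the multiplication is performed optically); the downsampled frame size and output dimension are arbitrary values chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Downsampled frame size and projection dimension (illustrative values).
SMALL = (42, 42)
PROJ_DIM = 1024

# Random projection matrix; on the OPU this multiplication is done in light
# rather than in memory.
R = rng.standard_normal((PROJ_DIM, SMALL[0] * SMALL[1]))

def coarse_reshape(frame):
    """Grayscale + block-averaging to even out irrelevant details."""
    gray = frame.mean(axis=2)                       # (210, 160, 3) -> (210, 160)
    h, w = gray.shape
    fh, fw = h // SMALL[0], w // SMALL[1]
    crop = gray[: fh * SMALL[0], : fw * SMALL[1]]   # crop to a multiple of the block size
    return crop.reshape(SMALL[0], fh, SMALL[1], fw).mean(axis=(1, 3)).ravel()

def embed(frame):
    """Coarse reshaping followed by a random projection of the frame."""
    return R @ coarse_reshape(frame)
```

The case-specific variant mentioned above would replace coarse_reshape with a more tailored feature distiller, while the projection step stays unchanged.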

Conclusion

Model-free episodic control is not a panacea for RL. It is actually a rather modest algorithm, but a good starting point for understanding the success of neuro-inspired episodic control methods [7]. These have been shown to outperform deep RL algorithms at least in the first stage of learning (see Figure 3). We therefore have the opportunity to devise robust AIs (possibly with the aid of imitation learning, to combine episodic control with another agent), leveraging the properties of RP and light-based computing to address one of deep learning’s major flaws: using a sample-efficient algorithm early on, such as episodic control, reduces the data hunger often associated with pure RL techniques, which in turn brings down the electricity bill.

Figure 3. Learning curves on Ms. Pac-Man for different RL algorithms (taken from [7]).

Have a look at the GitHub repository to see the implementation details and reproduce our results. LightOn supports research through the LightOn Cloud for Research program, which provides free credits to speed up your computations. Apply here!

About Us

LightOn is a hardware company that develops new optical processors that considerably speed up Machine Learning computation. LightOn’s processors open new horizons in computing and engineering fields that are facing computational limits. Interested in speeding your computations up? Try out our solution on LightOn Cloud! 🌈

Follow us on Twitter at @LightOnIO, subscribe to our newsletter and/or register to our workshop series. We live stream, so you can join from anywhere. 🌍

The author

Martin Graive, Intern in the Machine Learning Team at LightOn AI Research from July to December 2019.

Acknowledgements

Thanks to Victoire Louis and Iacopo Poli for reviewing this blog post.

References

[1] Strubell, Emma, Ananya Ganesh, and Andrew McCallum. “Energy and policy considerations for deep learning in NLP.” arXiv preprint arXiv:1906.02243 (2019).

[2] Risi, Sebastian, and Mike Preuss. “From Chess and Atari to StarCraft and Beyond: How Game AI is Driving the World of AI.” KI-Künstliche Intelligenz 34.1 (2020): 7–17.

[3] Schwartz, R., J. Dodge, and N. A. Smith. “Green AI.” arXiv preprint arXiv:1907.10597 (2019).

[4] Blundell, Charles, et al. “Model-free episodic control.” arXiv preprint arXiv:1606.04460 (2016).

[5] Johnson, William B., and Joram Lindenstrauss. “Extensions of Lipschitz mappings into a Hilbert space.” Contemporary Mathematics 26 (1984): 189–206.

[6] Lacoste, Alexandre, et al. “Quantifying the Carbon Emissions of Machine Learning.” arXiv preprint arXiv:1910.09700 (2019).

[7] Pritzel, Alexander, et al. “Neural episodic control.” Proceedings of the 34th International Conference on Machine Learning, Volume 70. JMLR.org, 2017.