The Future with Reinforcement Learning, Part 3: Practical Reinforcement Learning
If you haven’t yet read Part 1 or Part 2 of our reinforcement learning series, you can check them out here and here. Part 1 covers the key concepts you need to understand reinforcement learning, and Part 2 walks through comparisons and specific considerations for reinforcement learning algorithms.
In this article, we are going to celebrate what we’ve learned about reinforcement learning! We will take a look into some of the cool things people have done with reinforcement learning, some of the major obstacles that remain open challenges in reinforcement learning, and discuss some resources you could get started with if you wanted to start working with reinforcement learning yourself!
Cool RL Accomplishments
With all our reinforcement learning knowledge in hand, we now have a good basis for how RL works and some of the factors that developers must consider when deciding how to build their RL application. Let’s go through what kinds of cool things RL has achieved.
RL Beats Humans at Dota 2
OpenAI has developed a team of five neural networks, OpenAI Five, that learned to coordinate with one another and defeat humans at the MOBA Dota 2. On August 5th, 2018, the team of five neural networks competed against real, competitive human players and won two out of three games. A huge accomplishment in game AI!
RL for Hyperparameter Tuning
Google has developed an approach to hyperparameter and architecture tuning using reinforcement learning that it calls AutoML. They set up a search problem and evolve new neural networks: potential network mutations are the actions, and feedback arrives in the form of the new network’s performance.
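To make the idea concrete, here is a heavily simplified sketch (not Google’s actual AutoML, which uses a learned controller over architectures): hyperparameter choice framed as a multi-armed bandit, where each candidate value is an action and the trained model’s score is the reward. The `evaluate` function is a made-up stand-in for actually training a network.

```python
import random

# Stand-in for training a model and returning validation accuracy.
# Purely illustrative: pretend accuracy peaks at lr = 0.01.
def evaluate(lr):
    return 1.0 - abs(lr - 0.01) * 10

def tune(candidates, rounds=200, epsilon=0.2, seed=0):
    """Epsilon-greedy bandit over candidate learning rates."""
    rng = random.Random(seed)
    counts = {lr: 0 for lr in candidates}
    totals = {lr: 0.0 for lr in candidates}
    for _ in range(rounds):
        if rng.random() < epsilon or not any(counts.values()):
            lr = rng.choice(candidates)  # explore a random candidate
        else:
            # exploit the best average score seen so far
            lr = max(candidates, key=lambda c: totals[c] / max(counts[c], 1))
        reward = evaluate(lr)  # feedback = new model's performance
        counts[lr] += 1
        totals[lr] += reward
    return max(candidates, key=lambda c: totals[c] / max(counts[c], 1))

print(tune([0.1, 0.05, 0.01, 0.001]))  # converges on the best candidate, 0.01
```

A real system replaces the lookup table with a learned controller and the toy `evaluate` with a full training run, but the action-and-feedback loop is the same.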
Bonsai for Industrial Applications
Bonsai, acquired by Microsoft in 2018, provides a deep reinforcement learning platform aimed at industrial applications, helping enterprises build intelligent control systems for areas like robotics, manufacturing, and energy.
Pit.ai to Understand Trading Strategies
Pit.ai is a cool group that is leveraging RL to better reason about and understand trading algorithms. They have a lofty mission: using RL to replace humans in investment management and cut down costs.
DeepMind Reduces Cooling Costs
Using RL, Google’s DeepMind helped reduce the energy used to cool Google’s data centers by 40%.
Obstacles in RL
There’s no denying that reinforcement learning can do a lot of cool things. It provides a new way of thinking about machine learning; it’s a different way to approach a machine learning problem.
That doesn’t mean it is the best way to approach every problem. Reinforcement learning can sometimes be the hardest way to solve a problem. We can best understand this by looking at some of the obstacles that deter applications from being built around RL.
Data
Data is critical for machine learning. Full stop. RL requires enormous amounts of data to be functional. Think of our agent playing through Mario. It must play the game over and over again to learn how to do even the most basic tasks. Without all that gameplay data, our agent would never learn to play the game, let alone play it well. This is an issue, particularly when data is hard to obtain.
Data is a big issue for all of machine learning, to be sure. But where supervised tasks sometimes need nothing more than input-label pairs, RL tasks often require much more complex data to teach systems to do what we wish.
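To see how data-hungry even trivial problems are, here is a minimal tabular Q-learning sketch on a toy six-state "level" (my own invented example, not Mario) with a reward only at the end. Even this tiny task takes many hundreds of played-through episodes before the greedy policy reliably clears it.

```python
import random

def train(episodes, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning on a six-state corridor; reward only at the end."""
    rng = random.Random(seed)
    n_states, actions = 6, [-1, +1]  # step left or right
    q = {(s, a): 0.0 for s in range(n_states) for a in actions}

    def greedy(s):
        best = max(q[(s, a)] for a in actions)
        return rng.choice([a for a in actions if q[(s, a)] == best])

    for _ in range(episodes):  # each episode = one full playthrough
        s = 0
        while s != n_states - 1:
            a = rng.choice(actions) if rng.random() < epsilon else greedy(s)
            s2 = min(max(s + a, 0), n_states - 1)
            r = 1.0 if s2 == n_states - 1 else 0.0  # sparse reward at the goal
            q[(s, a)] += alpha * (r + gamma * max(q[(s2, b)] for b in actions) - q[(s, a)])
            s = s2

    # Can the purely greedy policy now reach the goal quickly?
    s, steps = 0, 0
    while s != n_states - 1 and steps < 2 * n_states:
        s = min(max(s + greedy(s), 0), n_states - 1)
        steps += 1
    return s == n_states - 1

print(train(episodes=1000))  # True once enough gameplay data has been collected
```

Scale the state space up from six corridor cells to every pixel configuration of a Mario level and the required play time grows enormously, which is exactly the data problem described above.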
What is the goal?
RL algorithms need goals. Since they are task-driven, they must always strive toward that goal, whether that’s earning the most money trading or beating a level as fast as possible. In complex tasks, the question of “what is the goal?” quickly becomes harder and harder to answer. If the objective is not properly thought out, an agent may gravitate toward behavior you never intended.
Think of a hypothetical algorithm placed in a robot that is tasked with keeping a human safe. Let’s say it runs a simulation and concludes that the best way to keep the human safe, is to eradicate all other human life and to sedate the human in question. That’s not at all what we wanted to begin with, but that’s what the algorithm calculated would best keep that person safe for as long as possible based on the way its goal, policy, and value function were defined. Therefore, goal definition is critical.
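The point can be made concrete with a toy sketch. Both reward functions below are hypothetical (the state keys are invented for illustration), and both sound like "keep the human safe"; only the second rules out the degenerate solution.

```python
# Two hypothetical reward functions for a "keep the human safe" robot.
# Both sound reasonable; they induce very different optimal behavior.

def reward_naive(state):
    # Maximized by eliminating ALL risk, e.g. sedating the human forever.
    return 1.0 if not state["human_harmed"] else 0.0

def reward_better(state):
    # Still penalizes harm heavily, but also rewards the human's autonomy,
    # so "lock the human away" is no longer the optimal policy.
    reward = -100.0 if state["human_harmed"] else 0.0
    reward += 1.0 if state["human_autonomous"] else -1.0
    return reward

print(reward_naive({"human_harmed": False}))  # 1.0, even if the human is sedated
print(reward_better({"human_harmed": False, "human_autonomous": False}))  # -1.0
```

The agent optimizes exactly what the reward function says, not what its designer meant, which is why goal definition deserves so much care.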
Complex tasks in sparse environments
This issue inherits from the worst-case scenarios of the last two. How do we take an agent that needs to learn to do something very complex in an environment where it rarely receives a reward signal? There are many approaches to solving this issue, such as creating a complex policy to handle complex tasks, or breaking complex tasks into smaller, more tractable ones (see OpenAI’s Dota 2 work, where they formulate small rewards agents can receive that naturally lead to the large reward that is desired). This is still a huge area of research.
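A minimal sketch of one such approach, reward shaping: the true reward is sparse (it only fires when the level is beaten), so a small auxiliary reward for measurable progress is added on top. All the numbers and names here are invented for illustration.

```python
GOAL = 100  # hypothetical level length

def sparse_reward(position):
    # The true objective: the agent only hears about it at the very end.
    return 1.0 if position >= GOAL else 0.0

def shaped_reward(position, last_position):
    # Add a small dense signal for forward progress so the agent gets
    # feedback on every single step, not just at the goal.
    progress = (position - last_position) / GOAL
    return sparse_reward(position) + 0.1 * progress

print(sparse_reward(50))      # 0.0: halfway through the level, zero feedback
print(shaped_reward(50, 49))  # a small positive number: feedback on every step
```

The art is in choosing shaping terms that point at the real goal; a badly chosen progress signal reintroduces the goal-definition problem from the previous section.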
Large number of states and actions
In Mario, the number of actions an agent can take is limited. In the real world, the number of actions an agent can take is infinite. So is the number of states of the environment that can be observed. How does an agent handle this? How can an algorithm mathematically represent this? These are huge questions, big areas of research, and critical things that need to be better understood to make complex agents that can interact in the real world.
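One family of answers is function approximation: instead of storing a value for every state-action pair (impossible when states are continuous), represent the Q-function with a parameterized model. The sketch below uses a hand-rolled linear model over invented features; deep RL swaps this for a neural network.

```python
# State is continuous (position, velocity), so a Q-table is impossible:
# there are infinitely many states. Approximate Q with a linear model instead.

def features(state, action):
    x, velocity = state
    # Hand-picked features, including action interactions (illustrative only).
    return [1.0, x, velocity, x * action, velocity * action]

def q_value(weights, state, action):
    # Q(s, a) ~ w . phi(s, a): one small weight vector covers every possible
    # state, including states the agent has never seen during training.
    return sum(w * f for w, f in zip(weights, features(state, action)))

w = [0.0, 0.1, -0.2, 0.5, 0.3]  # illustrative weights, not actually trained
print(q_value(w, (2.0, 1.0), +1))
print(q_value(w, (2.0, 1.0), -1))
```

The weights generalize across states, which is exactly what a table cannot do; the open research questions are about how to learn such approximations stably for very complex environments.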
By now, you may be thinking “Wow, there’s so many cool things that reinforcement learning can do and so many cool problems still left to solve. How can I get started?”
With that in mind, I’ve pulled together some resources that I think are a great place to start learning RL:
- Reinforcement Learning: An Introduction — If you are up for some heavy reading, this is a good book to dive into to really break down the theoretical components behind reinforcement learning. It’s written by Richard Sutton and Andrew Barto (who have done a good deal of work in RL) and is really nice (I’m currently working through it myself).
- University College London’s Reinforcement Learning Course — This is a course (largely based on the previous book) that is good to work through. It features slides and video lectures too!
- UC Berkeley — CS 294 — These are the course videos from UC Berkeley’s course on reinforcement learning.
- Udacity’s Deep Reinforcement Learning Course — Feeling like you want to get more hands-on? Do you learn better by doing? Then maybe trying out Udacity’s Deep Reinforcement Learning Course might be more your speed!
- Reinforcement Learning GitHub Repo — This repo has a collection of reinforcement learning algorithms implemented in Python. But more than that, it takes the book by Sutton and Barto as well as the UCL videos and combines them into a bit of a learning plan with some exercises to guide how you might approach using the two resources. If that sounds more like your speed, you should check it out!
Conclusion
It’s my belief that reinforcement learning is going to be the technique that brings forth a new revolution in machine learning, creating truly intelligent applications that use techniques from supervised and unsupervised learning to observe the environment that the agent is acting in. If reinforcement learning is in the future, it is going to be a bright future!