The Future with Reinforcement Learning, Part 2: Comparisons and Applications

Intro

If you haven’t yet read the reinforcement learning primer, go back and check it out here first. That article covers the key concepts of reinforcement learning and will prepare you to fully compare the different types of machine learning.

Comparing Reinforcement Learning to Other Machine Learning Types

You may have heard about other types of machine learning, e.g. supervised learning and unsupervised learning. Understanding how reinforcement learning (RL) differs from them is a good way to grasp the machine learning landscape.

Supervised Learning

The easiest type of ML to grasp is supervised learning: learning from human-provided labels. Image classification is a classic example. Given a training set of images labeled as cats or dogs, the algorithm learns from those examples and can then correctly infer the subject of an image it has never seen.
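The idea can be sketched with a toy classifier. This is not a real image classifier, just a 1-nearest-neighbour sketch on made-up feature vectors to show how labeled examples drive prediction:

```python
# Toy supervised learning: a 1-nearest-neighbour classifier.
# Each training example is a (feature_vector, label) pair -- the "human labels".

def distance(a, b):
    # Squared Euclidean distance between two feature vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def predict(training_set, features):
    # Classify an unseen example by copying the label of its closest neighbour.
    nearest = min(training_set, key=lambda example: distance(example[0], features))
    return nearest[1]

# Hypothetical labeled data: (ear_length, snout_length) -> species
training_set = [
    ((1.0, 2.0), "cat"),
    ((1.2, 1.8), "cat"),
    ((3.0, 5.0), "dog"),
    ((2.8, 4.6), "dog"),
]

print(predict(training_set, (1.1, 2.1)))  # -> "cat"
```

The features and labels here are invented for illustration; real image classification would use far richer features and models, but the learn-from-labels structure is the same.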

Unsupervised Learning

On the flip side, we have unsupervised learning: learning without labels. A good example is taking user purchase data and grouping your customers into categories with similar buying patterns. The algorithm does the grouping, and you can then suggest products to people within a given category. We never tell the algorithm what a label or a category name is; we simply hand it a pile of data, and it creates groups based on patterns it finds. Unsupervised learning is also used extensively to visualize large amounts of complex data, making it easier for a human to take in all the information in one image.
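A minimal sketch of this kind of grouping is k-means clustering. The customer data below is invented for illustration; the point is that no labels are given, yet the algorithm separates low spenders from high spenders on its own:

```python
# Toy unsupervised learning: k-means grouping of customers by (visits, spend).
# No labels are given; the algorithm invents the groups itself.

def kmeans(points, k, iterations=10):
    centroids = points[:k]  # naive initialisation: first k points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest centroid.
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            clusters[i].append(p)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [
            tuple(sum(vals) / len(vals) for vals in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical customers as (monthly_visits, monthly_spend):
customers = [(1, 10), (2, 12), (1, 11), (9, 90), (10, 95), (8, 88)]
centroids, clusters = kmeans(customers, k=2)
# The two clusters separate the occasional buyers from the frequent big spenders.
```

Production systems would use a library implementation with smarter initialisation, but the loop above is the whole idea: assign points to the nearest group, recompute the groups, repeat.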

Reinforcement Learning

Reinforcement learning is frequently described as falling somewhere between supervised and unsupervised learning. There are time-delayed labels (rewards) given to the algorithm as it learns to interact with an environment. What the algorithm learns depends on how the learning problem is phrased. This is exactly what makes reinforcement learning excel at things like real-time decision making, video game AI, robot navigation, and other complex tasks: the system is given the ability to understand which decisions are good and which are bad, based on the current state of the environment.

Applying These Concepts

In the previous article, we covered the basic concepts of reinforcement learning. Here is a summary of what we have covered so far, in the form of a concrete example:

Goal: The mouse has a goal of maximizing the amount of cheese it obtains

Actions: The mouse can move in any of the four cardinal directions

Senses: The mouse can observe the state of the environment it is in (start, nothing, small cheese, two small cheese, big cheese, and death). For our simple example, this basic sense of the state of the environment is more than enough.

The policy: In any given state, which of the four actions will our mouse take?

The reward signal: Positive (a cheese was obtained; but how big of a cheese?), neutral (the nothing state was reached), or negative (the death state has ended our game).

The value function: This is something that our mouse will construct and maintain on the fly. It may be adjusted through the course of an iteration or over many runs through the maze.

The model: If we allow our mouse to be aware of the size of its environment, it can store a model of it in its memory. We can represent the world as a 2D grid (array), allowing the mouse to fill in whether there is positive, negative, or no reward in a given grid square as it runs through and observes the actual environment.

A basic policy our mouse might employ is a greedy one: in every state, take the action that leads to the neighboring square with the highest known reward.
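Here is a minimal sketch of such a greedy policy over the 2D-grid model described above. The maze layout and reward values are invented for illustration:

```python
# A tiny maze as a 2D grid of the rewards the mouse has observed so far.
# None marks a wall; negative is the death square, positive is cheese.
grid = [
    [0,   0,    1],   # 1  = small cheese
    [0,   None, 2],   # 2  = two small cheese
    [-10, 0,    5],   # -10 = death, 5 = big cheese
]

ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

def greedy_action(state):
    """Pick the action whose neighbouring square has the highest known reward."""
    row, col = state
    best_action, best_reward = None, float("-inf")
    for name, (dr, dc) in ACTIONS.items():
        r, c = row + dr, col + dc
        # Skip moves that leave the grid or run into a wall.
        if 0 <= r < len(grid) and 0 <= c < len(grid[0]) and grid[r][c] is not None:
            if grid[r][c] > best_reward:
                best_action, best_reward = name, grid[r][c]
    return best_action

print(greedy_action((2, 1)))  # -> "right" (toward the big cheese, away from death)
```

Note how short-sighted this is: the policy only looks one square ahead, which is exactly why the value function (a long-term estimate) matters.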

Phrasing Reinforcement Learning with Tasks

One of the major things to look at in a reinforcement learning application is how the task is structured. Tasks are typically broken down into two categories: episodic or continuous.

1. Episodic Tasks

Episodic tasks have distinct start and end states. We can save these “episodes” and train on them “off-line.” A prime example would be the Mario levels from our previous article.

2. Continuous Tasks

Continuous tasks have no end. An example is a decision-making algorithm that predicts when someone should buy or sell stocks: the market is always evolving and changing, with many environmental factors at play. There are no clear starting and stopping states that would let us cleanly section off an episode to train on, and trying to carve one out risks fitting our algorithm too closely to a small segment of time.

When to Learn

Timing is critical to how an agent will perform on a task. Perhaps an agent should learn at every frame of gameplay, or perhaps it should learn in episodes. We could employ a Monte Carlo strategy: run through an entire episode, then learn from it, getting better and smarter with each iteration. These options have different tradeoffs and may or may not be feasible depending on the type of task our agent is trying to complete (a continuous task can never use the Monte Carlo strategy, since it requires a complete episode to train on, something that doesn’t even exist for a continuous task!).
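The Monte Carlo idea can be sketched in a few lines: wait until the episode finishes, then walk backwards through the recorded rewards to compute the discounted return for every step. The rewards and discount factor below are illustrative:

```python
# Monte Carlo learning: after an episode ends, work backwards through
# the recorded rewards to compute the return G_t for every step.

def discounted_returns(rewards, gamma=0.9):
    returns = []
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g  # G_t = r_t + gamma * G_{t+1}
        returns.append(g)
    returns.reverse()
    return returns

# One recorded episode for our mouse: nothing, nothing, small cheese.
rewards = [0.0, 0.0, 1.0]
print(discounted_returns(rewards))  # approximately [0.81, 0.9, 1.0]
```

These per-step returns are what a Monte Carlo agent uses as training targets for its value estimates, which is exactly why a complete episode is required.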

Exploration vs. Exploitation

The exploration-versus-exploitation tradeoff is something an agent encounters quickly as it explores an environment. If an agent finds out early on that doing something simple earns a small reward, it will likely continue doing that simple thing over and over, accumulating small rewards over time. But if it explores the unknown and seeks out new situations, it may gain an even larger reward.
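A common way to balance the two is an epsilon-greedy rule: exploit the best-known action most of the time, but explore a random one with a small probability. A sketch, with illustrative action-value estimates:

```python
import random

def epsilon_greedy(q_values, epsilon=0.1):
    """Exploit the best-known action, but explore randomly with probability epsilon."""
    if random.random() < epsilon:
        return random.choice(list(q_values))   # explore: any action at random
    return max(q_values, key=q_values.get)     # exploit: best-known action

# Hypothetical value estimates for one state:
q_values = {"up": 0.1, "down": 0.5, "left": -0.2, "right": 0.0}

# Mostly picks "down", but occasionally wanders -- which is how it can
# discover that another action is actually better.
actions = [epsilon_greedy(q_values) for _ in range(1000)]
```

Tuning epsilon (and often decaying it over time) is how practitioners trade off early exploration against late exploitation.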

Approaches

This brings us to another significant factor in making a reinforcement learning application. Is it value-based or policy-based?

Policy Based Approach

We’ve mentioned before that an agent’s policy is how it decides which actions to take based on the current state of the environment. An RL agent with a policy-based approach to learning will try to learn a complex policy whose decision structure lets it take the optimal action in any given situation.
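One simple way to represent such a learnable policy is a table of action preferences per state, turned into probabilities with a softmax. This is an illustrative sketch of the representation only, not a full policy-gradient method; the state name and preference numbers are invented:

```python
import math

# Learnable preferences for each (state, action) pair; higher = more likely.
# A policy-based learner would adjust these numbers from experience.
preferences = {
    "start": {"up": 0.0, "down": 1.5, "left": 0.0, "right": 0.2},
}

def policy(state):
    """Turn raw preferences into action probabilities with a softmax."""
    prefs = preferences[state]
    exps = {a: math.exp(p) for a, p in prefs.items()}
    total = sum(exps.values())
    return {a: e / total for a, e in exps.items()}

probs = policy("start")
# "down" has the highest preference, so it gets the largest probability,
# but every action keeps a nonzero chance of being sampled.
```

The point of the softmax form is that the policy stays stochastic, so learning can shift probability mass smoothly toward better actions.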

Value Based Approach

On the other end of the spectrum, we have our value-based RL applications. The value function is the current estimate of the long-term reward that our RL algorithm will accumulate. A value-based agent focuses on optimizing that function: learning better and better estimates of the long-term reward, and taking greedy actions to maximize the function at any given time. In a lot of ways, we can think of this as an agent learning an implicit greedy policy for taking actions.
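A classic instance of value-based learning is the tabular Q-learning update: keep a value estimate Q(s, a) for every state-action pair, and nudge it toward the observed reward plus the discounted best value of the next state. The state names and numbers below are illustrative:

```python
from collections import defaultdict

Q = defaultdict(float)   # value estimate for each (state, action) pair
alpha, gamma = 0.5, 0.9  # learning rate and discount factor

def q_update(state, action, reward, next_state,
             actions=("up", "down", "left", "right")):
    # Greedy estimate of the long-term reward obtainable from the next state.
    best_next = max(Q[(next_state, a)] for a in actions)
    # Nudge Q(s, a) toward the observed reward plus discounted future value.
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])

# One observed transition: moving right from the start found a small cheese.
q_update("start", "right", 1.0, "cheese")
# Q[("start", "right")] is now 0.5 -- halfway toward the new evidence.
```

The greedy policy is implicit here: acting is just picking the action with the largest Q-value in the current state.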

Actor-Critic Approach

The decision between a value-based and a policy-based algorithm significantly shapes what a reinforcement learning algorithm will look like. The cross-section of these two lines of thinking is called the actor-critic approach. It keeps track of estimated future reward earnings (our value function) while also learning new, more complex policies that earn the agent larger rewards over longer time scales. It quickly becomes a much harder problem, since the algorithm now optimizes two functions at once.
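The two-function structure can be sketched in tabular form: the critic maintains state-value estimates and computes a TD error (how much better or worse things went than expected), and the actor shifts its action preferences by that same error. All names and numbers here are illustrative:

```python
from collections import defaultdict

V = defaultdict(float)           # critic: value estimate per state
prefs = defaultdict(float)       # actor: preference per (state, action) pair
alpha, beta, gamma = 0.1, 0.1, 0.9

def actor_critic_update(state, action, reward, next_state):
    # Critic: how much better or worse did this step go than expected?
    td_error = reward + gamma * V[next_state] - V[state]
    V[state] += alpha * td_error               # critic refines its value estimate
    prefs[(state, action)] += beta * td_error  # actor reinforces (or discourages) the action

# One observed transition: moving right from the start found a small cheese.
actor_critic_update("start", "right", 1.0, "cheese")
```

A positive TD error simultaneously raises the value of the state and the preference for the action taken there, which is exactly the "two functions at once" the paragraph above describes.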

Conclusion

Over the last two articles, we have covered the basic terminology as well as some of the more complicated concepts around a reinforcement learning problem. Hopefully, with these two components, you feel that you have a good grasp on what reinforcement learning is and some of the considerations that go into writing an algorithm using it.

This blog post was written by Hunter Heidenreich. Thanks!

Visit Recast.AI, our collaborative Bot Platform & join us on Twitter, Facebook and LinkedIn :)
