Can Machines Learn on Their Own?


When you were a baby, you had to learn a lot of things: how to talk, how to walk, how to use the washroom, and how to not throw a tantrum when you don’t get what you want at the store. All things we have to learn (except some never learn the last one, looking at you, Karen).

We also spend a large chunk of our lives just learning things in school. Learning is an integral part of our lives: it helps us understand the world around us and know what to do in different situations.

Now when we think about machines or robots, we think of them as things we have to program every move into. If and else statements everywhere! But what if machines could learn, just like we do?

A Brief Explanation of Reinforcement Learning

Reinforcement learning is basically how humans learn. There’s an agent (the machine); the agent takes an action (the output) in the environment, and the environment returns a state (the inputs) and a reward (a positive or negative signal that either reinforces or punishes the agent).
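To make that loop concrete, here’s a minimal sketch in Python. The toy environment and the reward numbers are invented purely for illustration (they’re not from any real RL library), but the structure — agent acts, environment returns a state and a reward — is exactly what was just described.

```python
import random

# A toy "walk to position 5" environment, invented just to show the loop.
class ToyEnvironment:
    def __init__(self):
        self.position = 0  # the state

    def step(self, action):
        # action: -1 (move left) or +1 (move right)
        self.position += action
        reward = 1 if self.position == 5 else 0   # positive reward reinforces reaching the goal
        done = self.position == 5
        return self.position, reward, done

env = ToyEnvironment()
state, done = 0, False
for _ in range(10_000):                      # cap the episode length
    action = random.choice([-1, 1])          # the agent's action (the output)
    state, reward, done = env.step(action)   # the environment returns a state and a reward
    if done:
        break
```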

This is similar to training a pet. If the pet does something you want, you reinforce that behaviour by giving it a treat. This causes it to keep doing the thing you want it to do.

Using reinforcement learning is extremely powerful, since we do not necessarily need to program exactly what the agent should do; we only need to define what it should learn (the rewards). This lets it surpass humans at many tasks and find novel ways of doing things that we never knew about!

In this article specifically, I’ll be diving into how DQNs work!

DQNs

DQN stands for Deep Q-Network, and it is a type of RL algorithm. It combines a classic RL method, Q-Learning, with deep neural networks.

Diagrams of each algorithm, one being Q-Learning and another being Deep Q Learning. As you can tell, Deep Q Learning is much more complex and has more layers, making it “deep”.

Q-functions, Bellman equation, and Q-Learning

To explain Q-Learning, I first have to explain the Q-function. The Q-function measures the expected outcome (the total reward the agent can expect) of performing a given action from the current state. The optimal Q-function returns the maximal outcome (highest number) possible from that state.

The Q-function acts like the decision maker in RL. You can think of it like the decision-making process in your own brain! Given the current state (deciding whether or not to continue reading this article), should you continue reading (+1 for knowledge) or stop reading (-1 for knowledge)? The obvious choice is to continue, and an RL agent will likewise always choose the action that delivers the highest expected outcome.

Richard Bellman, the creator of the Bellman equation

The optimal Q-function follows the Bellman equation:
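In standard notation it looks roughly like this (using the symbols explained just below):

```latex
Q^{*}(s, a) \;=\; \mathbb{E}\!\left[\, r + \gamma \max_{a'} Q^{*}(s', a') \,\right]
```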

Let’s decipher what this actually means: s is the current state the agent is in, a is an action it can take, and r is the immediate reward it receives. γ is the discount factor, which weights the maximal Q-value achievable from the next state s' over the next possible actions a' (that’s why the ' primes are there). The higher the Q-value, the better.

What does this have to do with Q-Learning? Q-Learning uses this Bellman equation iteratively like this:
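A standard way to write that iterative update is the following, where α is a learning rate controlling how much each new experience changes the current estimate:

```latex
Q(s, a) \;\leftarrow\; Q(s, a) + \alpha \left[\, r + \gamma \max_{a'} Q(s', a') - Q(s, a) \,\right]
```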

This will be used to find the best Q-function and take actions that lead to the best results for the agent!

Deep Q-Learning

The way Q-Learning works, the model needs to build a table with one entry for every combination of state and potential action, where each entry stores the expected result (the Q-value). You can see why this becomes extremely inefficient: the agent has to calculate and store all of these possibilities, which takes a lot of memory and computational power if you want results quickly.

This is like having to list out every single possible action and state, along with their results, every time you wanted to make a decision, and only then deciding.
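To make the table idea concrete, here is a minimal sketch of tabular Q-Learning in Python. The action set and the hyperparameter values are made-up illustrations; the point is that q_table needs an entry for every (state, action) pair, which is exactly what becomes infeasible for large problems.

```python
from collections import defaultdict
import random

alpha, gamma, epsilon = 0.1, 0.99, 0.1   # learning rate, discount, exploration rate (illustrative values)
actions = [0, 1, 2, 3]                   # hypothetical action set

# The "table": one Q-value per (state, action) pair.
# This is exactly what blows up when the state space gets large.
q_table = defaultdict(float)

def choose_action(state):
    # Epsilon-greedy: mostly pick the best known action, sometimes explore.
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: q_table[(state, a)])

def update(state, action, reward, next_state):
    # The Bellman-based Q-Learning update, applied iteratively after each step.
    best_next = max(q_table[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    q_table[(state, action)] += alpha * (td_target - q_table[(state, action)])
```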

Instead, with Deep Q-Learning we train an approximator, a neural network, to estimate the Q-values. The network’s weights, θ, are the parameters we adjust during training until its estimates are accurate enough.
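Concretely, in the original DQN formulation the network with weights θ is trained to make its prediction Q(s, a; θ) match the Bellman target, by minimizing a squared error roughly of this form (θ⁻ is a periodically updated copy of the weights used to keep the target stable, a detail not covered above):

```latex
L(\theta) \;=\; \mathbb{E}\!\left[\, \left( r + \gamma \max_{a'} Q(s', a'; \theta^{-}) - Q(s, a; \theta) \right)^{2} \,\right]
```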

Architecture of DQNs

This is an example of a DQN model architecture. You’d first have the convolutional layers, which take in the inputs. The convolutional layers extract features and pass on a summary of what is happening to the next layers; they essentially act as the eyes of the network. Next you have the dense layers, which decide what the agent should do. In a DQN, Deep Q-Learning is used to determine the maximal Q-value, and the agent then performs the most advantageous action.
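Here’s a rough sketch of that kind of architecture in PyTorch. The input shape (four stacked 84×84 grayscale frames) and the layer sizes follow the Atari DQN setup, but treat the exact numbers as illustrative rather than a requirement.

```python
import torch
import torch.nn as nn

class DQN(nn.Module):
    """Convolutional 'eyes' followed by dense 'decision' layers, one Q-value per action."""
    def __init__(self, num_actions):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=8, stride=4),   # extract coarse features from the frames
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2),
            nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1),
            nn.ReLU(),
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 7 * 7, 512),                  # dense layers make the decision
            nn.ReLU(),
            nn.Linear(512, num_actions),                 # one Q-value per possible action
        )

    def forward(self, x):
        return self.head(self.conv(x))

# The agent acts greedily by picking the action with the highest predicted Q-value:
frames = torch.zeros(1, 4, 84, 84)               # a batch of one stacked observation
best_action = DQN(num_actions=6)(frames).argmax(dim=1)
```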

What Can We Do With RL?

Self Driving Cars

By using reinforcement learning we can train cars to learn how to drive on their own. The safest way to do this is in a simulation (where the car can’t hit or damage anything), which lets it learn through trial and error without a human supervising every step. However, simulations can only be so accurate, so achieving autonomous cars through reinforcement learning would likely still require a human in the driver’s seat to correct the car.

The way this would work is that the car is the agent and the environment is everything around it. The car receives information through cameras and sensors, then uses reinforcement learning to determine the best action to take: for every good action it is reinforced to keep doing it (stopping at red lights), and for every bad action it is disincentivized (drifting into other lanes).
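As a purely hypothetical sketch, the reward signal for that example might look something like this (the condition names are invented here, and real autonomous-driving rewards are far more elaborate):

```python
# Hypothetical reward shaping for the driving example; not from any real system.
def driving_reward(stopped_at_red_light: bool, drifted_into_other_lane: bool) -> float:
    reward = 0.0
    if stopped_at_red_light:
        reward += 1.0   # reinforce the good behaviour
    if drifted_into_other_lane:
        reward -= 1.0   # disincentivize the bad behaviour
    return reward
```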

Photo by Charlie Deets on Unsplash

Playing Games

One of the original goals for game-playing AI was to beat humans at games like chess and Go. A famous early milestone was IBM’s Deep Blue defeating world chess champion Garry Kasparov in May 1997, though Deep Blue relied on brute-force search rather than RL. More recent, RL-based examples include AlphaGo, AlphaZero, and OpenAI Five.

Games and RL fit very nicely since all games have players (the agent) and a clear environment to work in. There are also clear reward mechanisms such as points or capturing/losing pieces.

Deep Blue playing against Garry Kasparov in 1997.

Robotics

Reinforcement learning is extremely useful for robotics because, if you think about it, it would be extremely difficult to program every single small action a robot can make. Instead, by using RL, the robot can learn on its own how to do things like picking up an object.

This isn’t that complicated to implement either: you just need to give the robot agent a clear model of what’s around it and a reward mechanism for how well it’s doing.

Robot arms learning how to manipulate and pick up objects of varying sizes.

Key Takeaways

  • Reinforcement learning, or RL for short, is extremely similar to how we learn
  • RL has five main components: an agent, an environment, states, rewards, and actions
  • Q-functions are used to figure out the maximal Q-value an agent can obtain given its current state and possible actions
  • The Bellman equation is an integral part of RL and is used to find the best Q-function
  • Q-Learning just uses the Bellman equation iteratively to continually find the best action for the agent
  • Q-Learning is not very efficient, since it requires a lot of time and computational power to calculate all the possible results
  • Instead we can use Deep Q-Learning, which uses a neural network to approximate the Q-values
  • DQN architectures generally consist of convolutional layers and dense layers
  • RL has many real-world applications, including autonomous vehicles, playing games, and robotics
