Reinforcement Agent

Imalka Prasadini
Published in Analytics Vidhya · 2 min read · Dec 7, 2020


In my previous articles, I gave a brief overview of reinforcement learning and some of its algorithms.

Today, I am going to give a simple explanation of how a reinforcement agent works in an environment it has never interacted with before.

Photo by Daniel Cheung on Unsplash

When a reinforcement agent acts within an unknown environment, it learns the optimal behavior by building up a value function from the rewards it receives. The agent's behavior is controlled by its policy. First of all, the agent is linked to the input values. It then changes the representation of that data, which helps it handle difficult input representations and go beyond the initial input values.
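To make this concrete, here is a minimal sketch of reward-driven value learning with a policy controlling behavior. This is a standard Q-learning update, shown only as an illustration; the action names, constants, and table layout are my assumptions, not part of the article's method.

```python
import random

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2  # assumed learning constants
ACTIONS = ["attend", "ignore"]          # illustrative action names
q = {}  # the value function, learned from rewards

def policy(state):
    # The policy controls the agent's behavior:
    # mostly greedy on the learned values, sometimes exploratory.
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q.get((state, a), 0.0))

def update(state, action, reward, next_state):
    # Standard Q-learning update: move the stored value toward
    # the received reward plus the discounted best next value.
    best_next = max(q.get((next_state, a), 0.0) for a in ACTIONS)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + ALPHA * (reward + GAMMA * best_next - old)
```

Each interaction with the environment calls `update` with the reward received, and the policy gradually shifts toward the higher-value actions.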

Every agent has a working memory, which consists of memory cells and is divided into three buffers. The first is the perceptual buffer, which receives the original input vector. The second is the attend buffer, which is empty at the start. The third, called the ignore buffer, keeps a copy of the initial input.
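The three buffers described above can be sketched as a simple data structure. This is only an illustration of the layout; the class and field names are my own.

```python
class WorkingMemory:
    """Hypothetical sketch of the agent's three-buffer working memory."""
    def __init__(self, input_vector):
        self.perceptual = list(input_vector)  # receives the original input vector
        self.attend = []                      # empty at the start
        self.ignore = list(input_vector)      # keeps a copy of the initial input

mem = WorkingMemory([0.2, 0.7, 0.1])
```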

The agent handles these buffers in two ways. It can copy the initial input into buffer 2, so that it holds three copies of the input, or it can delete a buffer, leaving two clearly distinguished buffers. The reward function, which was discussed in my previous article, is not independent of the class label, and it is used to learn the value function. When classifying a new (unseen) data set, the agent first interacts with it and then decides whether the reward intake is positive or negative. By interacting with the data set, the agent learns a positive or negative value for each pattern, and that value can be used to classify new, unseen patterns.
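The classification step above can be sketched as a loop in which the agent interacts with a pattern, accumulates reward, and reads off the sign of the resulting value as the class. The `reward_fn` argument and the step count are assumptions made for illustration; the article does not specify them.

```python
def classify(pattern, reward_fn, steps=10):
    # Hypothetical sketch: interact with the pattern repeatedly,
    # accumulating reward; the sign of the learned value is the class.
    value = 0.0
    for _ in range(steps):
        value += reward_fn(pattern)  # reward intake, positive or negative
    return "positive" if value >= 0 else "negative"
```

With a reward function that pays +1 for one class and -1 for the other, the accumulated value separates the two classes by sign.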
