The Reinforcement Learning Library for Education and Research


class rlpy.Agents.Agent.Agent(policy, representation, discount_factor, seed=1, **kwargs)[source]

Learning Agent for obtaining good policies.

The Agent receives observations from the Domain and incorporates their new information into the representation, policy, etc. as needed.

In a typical Experiment, the Agent interacts with the Domain in discrete timesteps. At each Experiment timestep the Agent receives observations from the Domain, which it uses to update the value function Representation of the Domain (i.e., on each call to its learn() function). The Policy is then used to select an action to perform. This process (observe, update, act) repeats until a goal or failure state, determined by the Domain, is reached. At that point the Experiment determines whether the agent starts over or has its current policy tested (without any exploration).
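The observe, update, act cycle described above can be sketched as follows. This is a minimal, self-contained illustration and does not use RLPy itself: ToyDomain, ToyAgent, act() and run_episode() are hypothetical stand-ins, and only the argument order of learn() follows the signature documented on this page.

```python
import random


class ToyDomain:
    """Hypothetical 1-D corridor: start at position 0, goal at position 3."""
    actions = (-1, +1)

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, a):
        # Position 0 acts as a wall; reaching position 3 ends the episode.
        self.pos = max(0, self.pos + a)
        terminal = self.pos == 3
        reward = 1.0 if terminal else -0.1
        return reward, self.pos, terminal


class ToyAgent:
    """Hypothetical stand-in agent: counts transitions instead of learning."""

    def __init__(self, seed=1):
        self.rng = random.Random(seed)
        self.updates = 0
        self.episode_count = 0

    def act(self, s, p_actions):
        # Stand-in for a Policy: pick uniformly among the possible actions.
        return self.rng.choice(p_actions)

    def learn(self, s, p_actions, a, r, ns, np_actions, na, terminal):
        # A real agent would update its Representation here.
        self.updates += 1
        if terminal:
            self.episode_count += 1


def run_episode(domain, agent, max_steps=1000):
    """Run one episode and return the number of transitions observed."""
    s = domain.reset()
    a = agent.act(s, domain.actions)
    steps, terminal = 0, False
    while not terminal and steps < max_steps:
        r, ns, terminal = domain.step(a)                 # observe
        na = agent.act(ns, domain.actions)               # choose next action
        agent.learn(s, domain.actions, a, r,
                    ns, domain.actions, na, terminal)    # update
        s, a = ns, na
        steps += 1
    return steps
```

In RLPy the loop itself belongs to the Experiment rather than to the Agent; the sketch only shows which values flow into learn() at each timestep.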

Agent is a base class that provides the basic framework for all RL Agents. It provides the methods and attributes that allow child classes to interact with the Domain, Representation, Policy, and Experiment classes within the RLPy library.


All new agent implementations should inherit from this class.


  • representation – the Representation to use in learning the value function.
  • policy – the Policy to use when selecting actions.
  • discount_factor – the discount factor of the optimal policy to be learned.
  • initial_learn_rate – Initial learning rate to use (where applicable)


initial_learn_rate should be set to 1 for automatic learning rate; otherwise, initial_learn_rate will act as a permanent upper-bound on learn_rate.

  • learn_rate_decay_mode – The learning rate decay mode (where applicable)
  • boyan_N0 – Initial Boyan rate parameter (when learn_rate_decay_mode='boyan')
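To illustrate how initial_learn_rate can act as an upper bound under a decaying schedule, here is a sketch of one Boyan-style decay. The exact formula used by the library is not given on this page, so the expression below, including the 1.1 exponent and the function name boyan_learn_rate, is an assumption made for illustration only.

```python
def boyan_learn_rate(initial_learn_rate, boyan_N0, episode_count):
    """Assumed Boyan-style schedule (illustrative, not taken from this page).

    The rate equals initial_learn_rate at episode 0 and shrinks as
    episode_count grows, so it never exceeds its initial value.
    """
    return (initial_learn_rate * (boyan_N0 + 1.0)
            / (boyan_N0 + (episode_count + 1) ** 1.1))


# Learning rates for the first few episodes under this assumed schedule.
rates = [boyan_learn_rate(0.1, 100, n) for n in range(5)]
```

A larger boyan_N0 keeps the rate near its initial value for more episodes; a smaller value makes it drop sooner.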
discount_factor = None

discount factor determining the optimal policy

eligibility_trace = []

The eligibility trace, which marks states as eligible for a learning update. Used by the SARSA agent (Agents.SARSA.SARSA) when the parameter lambda is set. See: http://www.incompleteideas.net/sutton/book/7/node1.html
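As a rough illustration of the idea, the sketch below maintains an accumulating eligibility trace over a small table of states. It is a simplification that assumes one trace entry per state; the helper name update_trace and the tabular layout are hypothetical and not part of the library.

```python
def update_trace(trace, discount_factor, lambda_, state_index):
    """Return a new accumulating trace after visiting state_index.

    Every entry decays by discount_factor * lambda_, then the entry for
    the visited state is incremented by 1.
    """
    decayed = [discount_factor * lambda_ * e for e in trace]
    decayed[state_index] += 1.0
    return decayed


# Visit state 0, then state 1, in a three-state problem.
trace = [0.0, 0.0, 0.0]
trace = update_trace(trace, discount_factor=0.9, lambda_=0.5, state_index=0)
trace = update_trace(trace, discount_factor=0.9, lambda_=0.5, state_index=1)
```

A temporal-difference update would then scale its correction for each state by the corresponding trace entry, so recently visited states receive most of the credit.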


This function adjusts all necessary elements of the agent at the end of an episode.


Every agent must call this function at the end of its learning step if the transition led to a terminal state.

episode_count = 0

Number of episodes seen so far.


Any stochastic behavior in __init__() is broken out into this function so that if the random seed is later changed (e.g., by the Experiment), other member variables and functions are updated accordingly.
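The pattern described here, keeping all seed-dependent setup in one method that can be called again, might look like the sketch below. The class SeededSketch, the method name reseed() and the attribute initial_noise are hypothetical, and the standard library random module is used only to keep the example self-contained.

```python
import random


class SeededSketch:
    """Hypothetical object whose random state can be rebuilt on demand."""

    def __init__(self, seed=1):
        self.seed = seed
        # All stochastic setup lives in reseed(), not in __init__ itself.
        self.reseed()

    def reseed(self):
        # Rebuild the generator and everything derived from it.
        self.random_state = random.Random(self.seed)
        self.initial_noise = self.random_state.random()


agent = SeededSketch(seed=1)
first_noise = agent.initial_noise

# Changing the seed later only requires calling the same method again.
agent.seed = 2
agent.reseed()
```

Because every seed-dependent value is derived inside one method, two objects built with the same seed start in the same state, and a changed seed cannot leave stale values behind.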

learn(s, p_actions, a, r, ns, np_actions, na, terminal)[source]

This function receives observations of a single transition and learns from it.


Each inheriting class (Agent) must implement this method.

  • s – original state
  • p_actions – possible actions in the original state
  • a – action taken
  • r – obtained reward
  • ns – next state
  • np_actions – possible actions in the next state
  • na – action taken in the next state
  • terminal – boolean indicating whether next state (ns) is terminal
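For illustration, a child class might implement learn() along the following lines. This is a sketch of a tabular SARSA update and does not inherit from the real base class: TabularSarsaSketch, its q table, the fixed learn_rate and the hook name on_episode_end() are assumptions made to keep the example self-contained.

```python
from collections import defaultdict


class TabularSarsaSketch:
    """Hypothetical tabular SARSA learner using the learn() signature above."""

    def __init__(self, discount_factor, learn_rate=0.5):
        self.discount_factor = discount_factor
        self.learn_rate = learn_rate
        self.q = defaultdict(float)   # maps (state, action) to a value
        self.episode_count = 0

    def learn(self, s, p_actions, a, r, ns, np_actions, na, terminal):
        # A terminal next state contributes no future value.
        next_value = 0.0 if terminal else self.q[(ns, na)]
        td_error = r + self.discount_factor * next_value - self.q[(s, a)]
        self.q[(s, a)] += self.learn_rate * td_error
        if terminal:
            # End-of-episode bookkeeping (hypothetical hook name).
            self.on_episode_end()

    def on_episode_end(self):
        self.episode_count += 1


agent = TabularSarsaSketch(discount_factor=0.9, learn_rate=0.5)
# Terminal transition: state 0, action 1, reward 1.0.
agent.learn(0, [0, 1], 1, 1.0, 1, [0, 1], 0, True)
# Non-terminal transition into state 0, where action 1 is taken next.
agent.learn(2, [0, 1], 0, 0.0, 0, [0, 1], 1, False)
```

The p_actions and np_actions arguments are unused in this on-policy sketch; an off-policy learner such as Q-learning would use np_actions to take a maximum over the next state's actions.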
logger = None

A simple object that records printed output to a file.

policy = None

The policy to be used by the agent