The Policy determines the discrete action that an Agent will take given its Representation.
The Agent learns about the Domain as the two interact. At each step, the Agent passes information about its current state to the Policy; the Policy uses this to decide which discrete action the Agent should perform next (see pi()).
The Policy class is a base class that provides the basic framework for all policies. It provides the methods and attributes that allow child classes to interact with the Agent and Representation within the RLPy library.
Note
All new policy implementations should inherit from Policy.
Parameters: representation – the Representation to use in learning the value function.
Any stochastic behavior in __init__() is broken out into this function so that if the random seed is later changed (e.g., by the Experiment), other member variables and functions are updated accordingly.
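The idea of keeping seed-dependent setup out of __init__() can be sketched as follows. This is a minimal illustration, not RLPy code: the class SeededPolicySketch, the method name reseed(), and the attribute tie_break_noise are all hypothetical, and the standard library random module stands in for whatever random number generator the library actually uses.

```python
import random


class SeededPolicySketch:
    """Hypothetical stand-in for a policy; not RLPy's Policy class."""

    def __init__(self, seed=1):
        self.seed = seed
        # No random draws happen directly in __init__; they are delegated
        # to reseed() so they can be repeated after the seed changes.
        self.reseed()

    def reseed(self):
        # Hypothetical name for the randomization hook described above.
        self.rng = random.Random(self.seed)
        # Example of a member variable whose value depends on the seed.
        self.tie_break_noise = [self.rng.random() for _ in range(4)]


policy = SeededPolicySketch(seed=1)
before = list(policy.tie_break_noise)

policy.seed = 2   # e.g. changed later by an experiment runner
policy.reseed()   # seed-dependent members are regenerated
after = list(policy.tie_break_noise)
```

Because every random draw lives in one method, calling it again after a seed change is enough to bring the dependent members back in sync with the new seed.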
Abstract Method: Select an action given a state.
Parameters:
s – the current state.
terminal – boolean, whether or not s is a terminal state.
p_actions – a list / array of all possible actions in s.
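As an illustration of what an implementation of pi() might look like, here is a minimal epsilon-greedy sketch. It keeps the three arguments described above (s, terminal, p_actions), but everything else is assumed for the example: TableRepresentationSketch and its value() method are hypothetical stand-ins for a Representation, and EpsilonGreedySketch does not inherit from the real Policy class so that the snippet runs on its own.

```python
import random


class TableRepresentationSketch:
    """Hypothetical stand-in: maps (state, action) pairs to values."""

    def __init__(self, q_table):
        self.q_table = q_table

    def value(self, s, a):
        # Unknown (state, action) pairs default to a value of 0.0.
        return self.q_table.get((s, a), 0.0)


class EpsilonGreedySketch:
    """Illustrative policy; a real one would inherit from Policy."""

    def __init__(self, representation, epsilon=0.1, seed=0):
        self.representation = representation
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def pi(self, s, terminal, p_actions):
        # `terminal` is accepted to match the described signature but is
        # not used by this simple sketch.
        # With probability epsilon, explore uniformly over legal actions.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(p_actions))
        # Otherwise exploit: pick the legal action with the highest value.
        return max(p_actions, key=lambda a: self.representation.value(s, a))


rep = TableRepresentationSketch({("s0", 0): 0.2, ("s0", 1): 0.9})
policy = EpsilonGreedySketch(rep, epsilon=0.0)
action = policy.pi("s0", False, [0, 1])
```

With epsilon set to 0.0 the policy never explores, so the call returns the action whose stored value is highest among those in p_actions.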
Abstract Method: Turn off exploration (e.g., epsilon=0 in epsilon-greedy).
Abstract Method: If turnOffExploration() was called previously, reverse its effects (e.g., restore epsilon to its previous, possibly nonzero, value).
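One way to pair these two methods is to save epsilon when exploration is switched off and restore it when exploration is switched back on. The sketch below is an assumption-laden illustration rather than the library's implementation: the class ExplorationToggleSketch and the attribute _saved_epsilon are hypothetical, and the name turnOnExploration() is assumed here as the counterpart of turnOffExploration().

```python
class ExplorationToggleSketch:
    """Illustrative only; a real policy would inherit from Policy."""

    def __init__(self, epsilon=0.1):
        self.epsilon = epsilon
        # Holds the pre-shutoff epsilon while exploration is off.
        self._saved_epsilon = None

    def turnOffExploration(self):
        # Save epsilon only on the first call, so that calling this twice
        # in a row does not overwrite the saved value with 0.
        if self._saved_epsilon is None:
            self._saved_epsilon = self.epsilon
        self.epsilon = 0.0

    def turnOnExploration(self):
        # Assumed method name. Restores epsilon only if exploration was
        # previously turned off; otherwise it leaves epsilon unchanged.
        if self._saved_epsilon is not None:
            self.epsilon = self._saved_epsilon
            self._saved_epsilon = None


toggle = ExplorationToggleSketch(epsilon=0.3)
toggle.turnOffExploration()   # e.g. before a performance evaluation run
greedy_epsilon = toggle.epsilon
toggle.turnOnExploration()    # back to learning with exploration
restored_epsilon = toggle.epsilon
```

Guarding the save with a sentinel keeps the pair safe to call in any order, which matters if an outer loop switches exploration off before every evaluation without tracking the current state.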