Finite Track CartPole: Balance Task

class rlpy.Domains.FiniteTrackCartPole.FiniteCartPoleBalance[source]

Goal

A reward of 1 is received on each timestep spent within the goal region, and 0 elsewhere; leaving the goal region is also the terminal condition.

The bounds for failure match those used in the RL-Community implementation (see Reference):

theta: [-12, 12] degrees -> [-pi/15, pi/15] radians

x: [-2.4, 2.4] meters

Pendulum starts straight up, theta = 0, with the cart at x = 0.
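
A minimal usage sketch, assuming the standard rlpy Domain interface (s0() returning the initial state, terminal flag, and available actions; step() returning reward, next state, terminal flag, and available actions). Exact signatures may differ slightly across rlpy versions:

    import numpy as np
    from rlpy.Domains.FiniteTrackCartPole import FiniteCartPoleBalance

    # Instantiate the balance task; the pendulum starts upright (theta = 0) at x = 0.
    domain = FiniteCartPoleBalance()
    s, terminal, p_actions = domain.s0()

    total_reward = 0.0
    for t in range(domain.episodeCap):
        if terminal:
            break
        a = np.random.choice(p_actions)             # random policy, for illustration only
        r, s, terminal, p_actions = domain.step(a)
        total_reward += r                           # +1 per step spent inside the goal region

    print("Return of one random-policy episode:", total_reward)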

Reference

See RL-Library CartPole

Domain constants per RL Community / RL-Library CartPole implementation

discount_factor = 0.999

Discount factor

euler_int(df, x0, times)

Performs Euler integration, with an interface similar to the other integration methods.

Parameters:
  • df – TODO
  • x0 – initial state
  • times – times at which to estimate integration values

Warning

All but the final cell of the times argument are ignored.
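
For intuition, a self-contained sketch of forward Euler integration in the same spirit. Here df is assumed to be a state-derivative function df(x, t) (the docstring above leaves it as TODO), and the fixed step count is an arbitrary choice for illustration:

    import numpy as np

    def euler_sketch(df, x0, times, num_steps=100):
        """Advance x0 from times[0] to times[-1] with fixed-step forward Euler.

        Mirroring the warning above, only the first and last entries of
        `times` are used here.
        """
        t, t_end = float(times[0]), float(times[-1])
        dt = (t_end - t) / num_steps
        x = np.asarray(x0, dtype=float)
        for _ in range(num_steps):
            x = x + dt * np.asarray(df(x, t))   # x_{k+1} = x_k + dt * f(x_k, t_k)
            t += dt
        return x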

init_randomization()

Any stochastic behavior in __init__() is broken out into this function so that, if the random seed is later changed (e.g., by the Experiment), other member variables and functions are updated accordingly.

loadRandomState()

Loads the random state stored in self.random_state_backup.

possibleActions(s=None)

Returns an integer for each available action. Some child domains allow different numbers of actions.

sampleStep(a, num_samples)

Sample a set number of next states and rewards from the domain. This function is used when state transitions are stochastic; deterministic transitions will yield an identical result regardless of num_samples, since repeatedly sampling a (state,action) pair will always yield the same tuple (r,ns,terminal). See step().

Parameters:
  • a – The action to attempt
  • num_samples – The number of next states and rewards to be sampled.
Returns:

A tuple of arrays (S[], A[]), where S is an array of next states and A is an array of rewards for those states.
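
A hypothetical usage sketch (assuming domain is an already-instantiated stochastic domain, e.g. FiniteCartPoleBalanceModern below, and that the returned arrays follow the (S, A) layout described above):

    # Estimate the expected one-step reward of action 0 from the current state
    # by drawing repeated samples of the stochastic transition.
    next_states, rewards = domain.sampleStep(a=0, num_samples=100)
    print("Mean sampled reward:", rewards.mean())
    print("One sampled next state:", next_states[0])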

saveRandomState()

Stores the state of the random generator. Using loadRandomState(), this state can be restored later.
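
A brief sketch of the intended save/restore pattern (again assuming an instantiated domain as above):

    domain.saveRandomState()                   # snapshot the random generator state
    s, terminal, p_actions = domain.s0()
    r1, s1, _, _ = domain.step(0)

    domain.loadRandomState()                   # restore the snapshot
    s, terminal, p_actions = domain.s0()       # the same start state is redrawn
    r2, s2, _, _ = domain.step(0)

    # With the restored random state, the same noise is drawn both times,
    # so (r1, s1) and (r2, s2) should coincide for a stochastic domain.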

show(a=None, representation=None)

Shows a visualization of the current state of the domain and that of learning.

See showDomain() and showLearning(), both called by this method.

Note

Some domains override this function to allow an optional s parameter to be passed, which overrides the self.state internal to the domain; however, not all have this capability.

Parameters:
  • a – The action being performed
  • representation – The learned value function Representation.

showDomain(a=0)

Displays the 4-D state of the cartpole and an arrow indicating the current force action (not including noise).

showLearning(representation)

Plots the value function and policy (each a 2-D grid over theta and thetaDot) at the fixed slices xSlice and xDotSlice, the values of x and xDot used for the plots.

Finite Track CartPole: Balance Task (“Original”), per the Sutton & Barto Definition

class rlpy.Domains.FiniteTrackCartPole.FiniteCartPoleBalanceOriginal(good_reward=0.0)[source]

Reference

Sutton, Richard S., and Andrew G. Barto. Reinforcement Learning: An Introduction. Cambridge, MA: MIT Press, 1998.

See Domains.FiniteTrackCartPole.FiniteCartPoleBalance
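
A minimal instantiation sketch. The constructor exposes a single good_reward parameter (default 0.0); its exact effect on the reward is not spelled out above, so the comments below hedge accordingly:

    from rlpy.Domains.FiniteTrackCartPole import FiniteCartPoleBalanceOriginal

    # Default reward structure, as in the Sutton & Barto formulation.
    domain = FiniteCartPoleBalanceOriginal()

    # good_reward can also be passed explicitly; how it shapes the per-step
    # reward is an assumption to verify against the source before relying on it.
    shaped = FiniteCartPoleBalanceOriginal(good_reward=1.0)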

euler_int(df, x0, times)

Performs Euler integration, with an interface similar to the other integration methods.

Parameters:
  • df – TODO
  • x0 – initial state
  • times – times at which to estimate integration values

Warning

All but the final cell of the times argument are ignored.

init_randomization()

Any stochastic behavior in __init__() is broken out into this function so that, if the random seed is later changed (e.g., by the Experiment), other member variables and functions are updated accordingly.

loadRandomState()

Loads the random state stored in self.random_state_backup.

possibleActions(s=None)

Returns an integer for each available action. Some child domains allow different numbers of actions.

sampleStep(a, num_samples)

Sample a set number of next states and rewards from the domain. This function is used when state transitions are stochastic; deterministic transitions will yield an identical result regardless of num_samples, since repeatedly sampling a (state,action) pair will always yield the same tuple (r,ns,terminal). See step().

Parameters:
  • a – The action to attempt
  • num_samples – The number of next states and rewards to be sampled.
Returns:

A tuple of arrays (S[], A[]), where S is an array of next states and A is an array of rewards for those states.

saveRandomState()

Stores the state of the random generator. Using loadRandomState(), this state can be restored later.

show(a=None, representation=None)

Shows a visualization of the current state of the domain and that of learning.

See showDomain() and showLearning(), both called by this method.

Note

Some domains override this function to allow an optional s parameter to be passed, which overrides the self.state internal to the domain; however, not all have this capability.

Parameters:
  • a – The action being performed
  • representation – The learned value function Representation.

showDomain(a=0)

Displays the 4-D state of the cartpole and an arrow indicating the current force action (not including noise).

showLearning(representation)

Plots the value function and policy (each a 2-D grid over theta and thetaDot) at the fixed slices xSlice and xDotSlice, the values of x and xDot used for the plots.

Finite Track CartPole: Balance Task (“Modern”) with 3 (not 2) possible actions

class rlpy.Domains.FiniteTrackCartPole.FiniteCartPoleBalanceModern[source]

A more realistic version of the balance task, with 3 actions (left, right, none) instead of the default 2 (left, right), and with nonzero, uniformly distributed noise applied to the actions.

See Domains.FiniteTrackCartPole.FiniteCartPoleBalance.

Note that the start state has some noise.

AVAIL_FORCE = array([-10., 0., 10.])

Newtons, N - Force values available as actions (Note we add a 0-force action)

euler_int(df, x0, times)

Performs Euler integration, with an interface similar to the other integration methods.

Parameters:
  • df – TODO
  • x0 – initial state
  • times – times at which to estimate integration values

Warning

All but the final cell of the times argument are ignored.

force_noise_max = 1.0

Newtons, N - Maximum noise possible, uniformly distributed
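
An illustrative computation (not the library's actual code) of how a uniform perturbation of this magnitude could enter the applied force:

    import numpy as np

    AVAIL_FORCE = np.array([-10.0, 0.0, 10.0])    # N, as documented above
    force_noise_max = 1.0                          # N, as documented above

    rng = np.random.RandomState(0)
    a = 2                                          # index of the +10 N action
    # Assumed noise model: additive, uniform on [-force_noise_max, +force_noise_max].
    applied_force = AVAIL_FORCE[a] + rng.uniform(-force_noise_max, force_noise_max)
    print("Applied force (N):", applied_force)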

init_randomization()

Any stochastic behavior in __init__() is broken out into this function so that, if the random seed is later changed (e.g., by the Experiment), other member variables and functions are updated accordingly.

loadRandomState()

Loads the random state stored in self.random_state_backup.

possibleActions(s=None)

Returns an integer for each available action. Some child domains allow different numbers of actions.

sampleStep(a, num_samples)

Sample a set number of next states and rewards from the domain. This function is used when state transitions are stochastic; deterministic transitions will yield an identical result regardless of num_samples, since repeatedly sampling a (state,action) pair will always yield the same tuple (r,ns,terminal). See step().

Parameters:
  • a – The action to attempt
  • num_samples – The number of next states and rewards to be sampled.
Returns:

A tuple of arrays (S[], A[]), where S is an array of next states and A is an array of rewards for those states.

saveRandomState()

Stores the state of the random generator. Using loadRandomState(), this state can be restored later.

show(a=None, representation=None)

Shows a visualization of the current state of the domain and that of learning.

See showDomain() and showLearning(), both called by this method.

Note

Some domains override this function to allow an optional s parameter to be passed, which overrides the self.state internal to the domain; however, not all have this capability.

Parameters:
  • a – The action being performed
  • representation – The learned value function Representation.

showDomain(a=0)

Displays the 4-D state of the cartpole and an arrow indicating the current force action (not including noise).

showLearning(representation)

Plots the value function and policy (each a 2-D grid over theta and thetaDot) at the fixed slices xSlice and xDotSlice, the values of x and xDot used for the plots.

Finite Track CartPole: Swing-Up Task

class rlpy.Domains.FiniteTrackCartPole.FiniteCartPoleSwingUp[source]

Goal

Reward is 1 within the goal region, 0 elsewhere.

Pendulum starts straight down, theta = pi, with the cart at x = 0.

The objective is to reach and then keep the pendulum in the goal region for as long as possible, with +1 reward for each step in which this condition is met. The expected optimum is therefore to swing the pendulum upright and hold it there, which collapses the problem to InfCartPoleBalance but with much tighter bounds on the goal region.

See parent class Domains.FiniteTrackCartPole.FiniteTrackCartPole for more information.
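
A short sketch under the same assumed Domain interface as above; the state ordering and the goal-region half-width below are illustrative assumptions, not the class's actual constants:

    import numpy as np
    from rlpy.Domains.FiniteTrackCartPole import FiniteCartPoleSwingUp

    domain = FiniteCartPoleSwingUp()
    s, terminal, p_actions = domain.s0()
    theta = s[0]                                   # assumed ordering: [theta, thetaDot, x, xDot]
    print("Start angle (pendulum hanging down):", theta)

    # Illustrative goal test: +1 reward only while the pole is near upright.
    GOAL_HALF_WIDTH = np.pi / 6.0                  # hypothetical half-width, for illustration
    reward = 1.0 if abs(theta) < GOAL_HALF_WIDTH else 0.0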

ANGLE_LIMITS = [-3.141592653589793, 3.141592653589793]

Limit on pendulum angle (no termination, pendulum can make full cycle)

euler_int(df, x0, times)

Performs Euler integration, with an interface similar to the other integration methods.

Parameters:
  • df – TODO
  • x0 – initial state
  • times – times at which to estimate integration values

Warning

All but the final cell of the times argument are ignored.

init_randomization()

Any stochastic behavior in __init__() is broken out into this function so that, if the random seed is later changed (e.g., by the Experiment), other member variables and functions are updated accordingly.

loadRandomState()

Loads the random state stored in self.random_state_backup.

possibleActions(s=None)

Returns an integer for each available action. Some child domains allow different numbers of actions.

sampleStep(a, num_samples)

Sample a set number of next states and rewards from the domain. This function is used when state transitions are stochastic; deterministic transitions will yield an identical result regardless of num_samples, since repeatedly sampling a (state,action) pair will always yield the same tuple (r,ns,terminal). See step().

Parameters:
  • a – The action to attempt
  • num_samples – The number of next states and rewards to be sampled.
Returns:

A tuple of arrays (S[], A[]), where S is an array of next states and A is an array of rewards for those states.

saveRandomState()

Stores the state of the random generator. Using loadRandomState(), this state can be restored later.

show(a=None, representation=None)

Shows a visualization of the current state of the domain and that of learning.

See showDomain() and showLearning(), both called by this method.

Note

Some domains override this function to allow an optional s parameter to be passed, which overrides the self.state internal to the domain; however, not all have this capability.

Parameters:
  • a – The action being performed
  • representation – The learned value function Representation.

showDomain(a=0)

Displays the 4-D state of the cartpole and an arrow indicating the current force action (not including noise).

showLearning(representation)

Plots the value function and policy (each a 2-D grid over theta and thetaDot) at the fixed slices xSlice and xDotSlice, the values of x and xDot used for the plots.

Finite Track CartPole: Swing-Up Task with Friction

class rlpy.Domains.FiniteTrackCartPole.FiniteCartPoleSwingUpFriction[source]

Modifies CartPole dynamics to include friction.

This domain is a child of Domains.FiniteTrackCartPole.FiniteCartPoleSwingUp.

ANGLE_LIMITS = [-3.141592653589793, 3.141592653589793]

Limit on pendulum angle (no termination, pendulum can make full cycle)

ANGULAR_RATE_LIMITS = [-3.0, 3.0]

rad/s - Limits on pendulum angular rate

LENGTH = 0.6

meters, m - Physical length of the pendulum (note the moment arm lies at half this distance)

POSITION_LIMITS = [-2.4, 2.4]

m - Limits on cart position

VELOCITY_LIMITS = [-3.0, 3.0]

m/s - Limits on cart velocity

dt = 0.1

seconds, s - Time between steps

episodeCap = 400

Max number of steps per trajectory (reduced from default of 3000)
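
Together, dt and episodeCap bound the simulated duration of an episode; a quick check:

    dt = 0.1            # seconds per step, as documented above
    episodeCap = 400    # maximum steps per trajectory
    print("Maximum simulated episode duration: %.1f s" % (dt * episodeCap))   # 40.0 s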

euler_int(df, x0, times)

Performs Euler integration, with an interface similar to the other integration methods.

Parameters:
  • df – TODO
  • x0 – initial state
  • times – times at which to estimate integration values

Warning

All but the final cell of the times argument are ignored.

init_randomization()

Any stochastic behavior in __init__() is broken out into this function so that, if the random seed is later changed (e.g., by the Experiment), other member variables and functions are updated accordingly.

loadRandomState()

Loads the random state stored in self.random_state_backup.

possibleActions(s=None)

Returns an integer for each available action. Some child domains allow different numbers of actions.

sampleStep(a, num_samples)

Sample a set number of next states and rewards from the domain. This function is used when state transitions are stochastic; deterministic transitions will yield an identical result regardless of num_samples, since repeatedly sampling a (state,action) pair will always yield the same tuple (r,ns,terminal). See step().

Parameters:
  • a – The action to attempt
  • num_samples – The number of next states and rewards to be sampled.
Returns:

A tuple of arrays (S[], A[]), where S is an array of next states and A is an array of rewards for those states.

saveRandomState()

Stores the state of the random generator. Using loadRandomState(), this state can be restored later.

show(a=None, representation=None)

Shows a visualization of the current state of the domain and that of learning.

See showDomain() and showLearning(), both called by this method.

Note

Some domains override this function to allow an optional s parameter to be passed, which overrides the self.state internal to the domain; however, not all have this capability.

Parameters:
  • a – The action being performed
  • representation – The learned value function Representation.

showDomain(a=0)

Displays the 4-D state of the cartpole and an arrow indicating the current force action (not including noise).

showLearning(representation)

Plots the value function and policy (each a 2-D grid over theta and thetaDot) at the fixed slices xSlice and xDotSlice, the values of x and xDot used for the plots.