Infinite Track CartPole: Balance Task

class rlpy.Domains.InfiniteTrackCartPole.InfCartPoleBalance(episodeCap=3000)

Goal:

Balance the pendulum on the cart without letting it fall below horizontal.

Reward:

A penalty of FELL_REWARD is received when the pendulum falls below horizontal; the reward is zero otherwise.

Domain constants are per the 1Link implementation of Lagoudakis & Parr, 2003.

Warning

L+P's rate limits of [-2, 2] are actually unphysically slow, and the pendulum saturates them frequently when falling; it is more realistic to use [-2*pi, 2*pi].

ANGLE_LIMITS = [-1.5707963267948966, 1.5707963267948966]

Limit on theta (Note that this may affect your representation’s discretization)

ANGULAR_RATE_LIMITS = [-2.0, 2.0]

Limits on pendulum rate, per 1Link of Lagoudakis & Parr
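
Per the warning above, a wider saturation can be obtained by overriding this constant in a subclass. A minimal sketch (the subclass name is hypothetical, not part of the library):

    import numpy as np
    from rlpy.Domains.InfiniteTrackCartPole import InfCartPoleBalance

    class InfCartPoleBalanceFastRate(InfCartPoleBalance):
        # Hypothetical variant widening the angular-rate limits to the
        # more physical +/- 2*pi suggested in the warning above.
        ANGULAR_RATE_LIMITS = [-2 * np.pi, 2 * np.pi]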

FELL_REWARD = -1

Reward received when the pendulum falls below the horizontal
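
Putting the pieces together, a single episode against this domain might look like the sketch below. It assumes rlpy's usual domain interface, in which s0() returns (state, isTerminal, possibleActions) and step() returns (r, ns, isTerminal, possibleActions); the random policy is purely illustrative:

    import numpy as np
    from rlpy.Domains.InfiniteTrackCartPole import InfCartPoleBalance

    domain = InfCartPoleBalance(episodeCap=3000)
    s, terminal, p_actions = domain.s0()        # state is (theta, thetaDot)
    ret = 0.0
    while not terminal:
        a = np.random.choice(p_actions)         # random policy, for illustration
        r, s, terminal, p_actions = domain.step(a)
        ret += r                                # 0 per step, FELL_REWARD on a fall
    print("episode return:", ret)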

euler_int(df, x0, times)

Performs Euler integration with an interface similar to the other integration methods.

Parameters:
  • df – function returning the state derivative (the dynamics to integrate)
  • x0 – initial state
  • times – times at which to estimate integration values

Warning

All but the final entry of the times argument are ignored.
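
In other words, the documented behavior amounts to fixed-step Euler integration up to times[-1]. A self-contained sketch of that idea (not the library's code; the df(x, t) signature and the num_steps parameter are assumptions for illustration):

    import numpy as np

    def euler_int_sketch(df, x0, times, num_steps=100):
        # Integrate dx/dt = df(x, t) from t = 0 to t = times[-1] with
        # fixed-step Euler updates; per the warning above, every entry
        # of `times` except the last is ignored.
        tf = float(times[-1])
        dt = tf / num_steps
        x = np.asarray(x0, dtype=float)
        t = 0.0
        for _ in range(num_steps):
            x = x + dt * np.asarray(df(x, t))
            t += dt
        return x  # state estimate at times[-1]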

init_randomization()

Any stochastic behavior in __init__() is broken out into this function so that if the random seed is later changed (e.g., by the Experiment), other member variables and functions are updated accordingly.

loadRandomState()

Loads the random state stored in self.random_state_backup.

possibleActions(s=None)

Returns the available actions, one integer per action. Some child domains allow different numbers of actions.

sampleStep(a, num_samples)

Sample a set number of next states and rewards from the domain. This function is used when state transitions are stochastic; deterministic transitions will yield an identical result regardless of num_samples, since repeatedly sampling a (state,action) pair will always yield the same tuple (r,ns,terminal). See step().

Parameters:
  • a – The action to attempt
  • num_samples – The number of next states and rewards to be sampled.
Returns:

A tuple of arrays (S, A), where S is an array of next states and A is an array of the corresponding rewards.
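
For example, the sampled rewards can be averaged to estimate the expected immediate reward of an action from the current internal state (a sketch; domain and np as in the episode example above):

    next_states, rewards = domain.sampleStep(a=0, num_samples=50)
    print("mean sampled reward for action 0:", np.mean(rewards))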

saveRandomState()

Stores the state of the random generator; this state can later be restored with loadRandomState().
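
Together with loadRandomState(), this enables a snapshot/rewind pattern for the random generator. A sketch (domain as in the episode example above; note that only the RNG is restored, not the domain's state):

    domain.saveRandomState()          # snapshot the random generator
    r, ns, terminal, _ = domain.step(0)
    domain.loadRandomState()          # rewind the RNG; domain state is NOT rewound
    # Replaying step(0) from the same domain state would now reuse the
    # same random draws, making a stochastic transition reproducible.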

show(a=None, representation=None)

Shows a visualization of the current state of the domain and of the learning progress.

See showDomain() and showLearning(), both called by this method.

Note

Some domains override this function to allow an optional s parameter to be passed, which overrides the self.state internal to the domain; however, not all domains have this capability.

Parameters:
  • a – The action being performed
  • representation – The learned value function Representation.

showDomain(a=0)

Displays the 4-D state of the cartpole and an arrow indicating the current force action (not including noise). Note that for 2-D systems the cartpole is still displayed, but appears static; see Domains.InfiniteTrackCartPole.InfTrackCartPole.
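
A short visualization loop might look like the following sketch (requires a display backend for matplotlib; domain and np as in the episode example above):

    for _ in range(100):
        a = np.random.choice(domain.possibleActions())
        domain.step(a)
        domain.showDomain(a)   # redraw the cartpole and the force arrow for a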

step(a)

Appends an arbitrary lateral position and velocity of the cart, [0, 0], to the state and performs a step().

Infinite Track CartPole: Swing-Up Task

class rlpy.Domains.InfiniteTrackCartPole.InfCartPoleSwingUp

Goal:

Reward is 1 whenever theta is within GOAL_LIMITS, 0 elsewhere.

There is no terminal condition aside from episodeCap.

Pendulum starts straight down, theta = pi. The task is to swing it up, after which the problem reduces to Domains.InfiniteTrackCartPole.InfCartPoleBalance, though with (possibly) different domain constants defined below.

ANGLE_LIMITS = [-3.141592653589793, 3.141592653589793]

Limits on theta

ANGULAR_RATE_LIMITS = [-9.42477796076938, 9.42477796076938]

Limits on pendulum rate

GOAL_LIMITS = [-0.5235987755982988, 0.5235987755982988]

Goal region for reward

discount_factor = 0.9

Discount factor

episodeCap = 300

Max number of steps per trajectory
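
A sketch of instantiating the swing-up task and checking whether the pendulum currently sits inside the goal region (the state is assumed to be (theta, thetaDot), as in the balance task, and s0()'s return signature is as described above):

    from rlpy.Domains.InfiniteTrackCartPole import InfCartPoleSwingUp

    domain = InfCartPoleSwingUp()
    s, terminal, p_actions = domain.s0()   # starts straight down: theta = pi
    theta = s[0]
    in_goal = domain.GOAL_LIMITS[0] < theta < domain.GOAL_LIMITS[1]
    print("reward this step:", 1 if in_goal else 0)   # per the Goal section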

euler_int(df, x0, times)

Performs Euler integration with an interface similar to the other integration methods.

Parameters:
  • df – function returning the state derivative (the dynamics to integrate)
  • x0 – initial state
  • times – times at which to estimate integration values

Warning

All but the final entry of the times argument are ignored.

init_randomization()

Any stochastic behavior in __init__() is broken out into this function so that if the random seed is later changed (e.g., by the Experiment), other member variables and functions are updated accordingly.

loadRandomState()

Loads the random state stored in self.random_state_backup.

possibleActions(s=None)

Returns the available actions, one integer per action. Some child domains allow different numbers of actions.

s0()[source]

Returns the initial state: pendulum straight down (theta = pi) and unmoving.

sampleStep(a, num_samples)

Sample a set number of next states and rewards from the domain. This function is used when state transitions are stochastic; deterministic transitions will yield an identical result regardless of num_samples, since repeatedly sampling a (state,action) pair will always yield the same tuple (r,ns,terminal). See step().

Parameters:
  • a – The action to attempt
  • num_samples – The number of next states and rewards to be sampled.
Returns:

A tuple of arrays (S, A), where S is an array of next states and A is an array of the corresponding rewards.

saveRandomState()

Stores the state of the random generator; this state can later be restored with loadRandomState().

show(a=None, representation=None)

Shows a visualization of the current state of the domain and of the learning progress.

See showDomain() and showLearning(), both called by this method.

Note

Some domains override this function to allow an optional s parameter to be passed, which overrides the self.state internal to the domain; however, not all domains have this capability.

Parameters:
  • a – The action being performed
  • representation – The learned value function Representation.

showDomain(a=0)

Displays the 4-D state of the cartpole and an arrow indicating the current force action (not including noise). Note that for 2-D systems the cartpole is still displayed, but appears static; see Domains.InfiniteTrackCartPole.InfTrackCartPole.

step(a)

Appends an arbitrary lateral position and velocity of the cart, [0, 0], to the state and performs a step().