Acrobot with Euler Integration

class rlpy.Domains.Acrobot.AcrobotLegacy[source]

Acrobot is a 2-link pendulum with only the second joint actuated. Initially, both links point downwards. The goal is to swing the end-effector to a height at least the length of one link above the base.

Both links can swing freely and can pass by each other, i.e., they don’t collide when they have the same angle.

STATE: The state consists of the two rotational joint angles and their velocities [theta1 theta2 thetaDot1 thetaDot2]. An angle of 0 corresponds to the respective link pointing downwards (angles are in world coordinates).

ACTIONS: The action applies a torque of +1, 0, or -1 to the joint between the two pendulum links.
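
For orientation, a minimal interaction sketch; the s0()/step() return signatures shown are assumptions based on the common RLPy Domain interface and are not stated on this page:

    import numpy as np
    from rlpy.Domains.Acrobot import AcrobotLegacy

    domain = AcrobotLegacy()                     # default parameters assumed
    s, terminal, p_actions = domain.s0()         # assumed: (state, isTerminal, possibleActions)
    a = np.random.choice(p_actions)              # one of the three torque actions
    r, ns, terminal, p_actions = domain.step(a)  # assumed: (reward, next state, isTerminal, possibleActions)
    print(r, ns, terminal)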

Note

The dynamics equations were missing some terms in the NIPS paper which are present in the book. R. Sutton confirmed in personal correspondence that the experimental results shown in the paper and the book were generated with the equations shown in the book.

However, the domain can be run with the paper equations by setting book_or_nips = 'nips'.
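
For example, a minimal sketch of switching to the paper's equations; this assumes book_or_nips is an ordinary instance attribute that can be set after construction, as the attribute listing for the Runge-Kutta variant below suggests:

    from rlpy.Domains.Acrobot import AcrobotLegacy

    domain = AcrobotLegacy()
    domain.book_or_nips = 'nips'   # use the NIPS-paper dynamics instead of the (default) book equations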

REFERENCE:

See also

R. Sutton: Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding (NIPS 1996)

See also

R. Sutton and A. G. Barto: Reinforcement Learning: An Introduction. Cambridge: MIT Press, 1998.

init_randomization()

Any stochastic behavior in __init__() is broken out into this function so that if the random seed is later changed (e.g., by the Experiment), other member variables and functions are updated accordingly.
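
To illustrate the intent, a standalone sketch of the pattern; ToyDomain and start_offset are made up for illustration and this does not subclass the actual rlpy.Domains.Domain base class:

    import numpy as np

    class ToyDomain(object):
        """Standalone illustration of the pattern described above."""

        def __init__(self, seed=0):
            self.random_state = np.random.RandomState(seed)
            self.init_randomization()          # all stochastic setup is delegated here

        def init_randomization(self):
            # anything derived from the random generator is (re)computed here
            self.start_offset = self.random_state.uniform(-0.1, 0.1)

    d = ToyDomain(seed=42)
    d.random_state = np.random.RandomState(7)  # e.g. the Experiment changes the seed...
    d.init_randomization()                     # ...and dependent members are updated accordingly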

loadRandomState()

Loads the random state stored in self.random_state_backup.

possibleActions(s=None)

The default version returns an enumeration of all actions [0, 1, 2...]. We suggest overriding this method in your domain, especially if not all actions are available from all states.

Parameters: s – The state to query for possible actions (overrides self.state if s != None)
Returns: A numpy array containing every possible action in the domain.

Note

These actions must be integers; internally they may be handled using other datatypes. See vec2id() and id2vec() for converting between integers and multidimensional quantities.
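
As a sketch of both use and override; MyAcrobotVariant and its state condition are purely illustrative, and the default AcrobotLegacy constructor is assumed:

    import numpy as np
    from rlpy.Domains.Acrobot import AcrobotLegacy

    domain = AcrobotLegacy()
    domain.s0()
    print(domain.possibleActions())       # all three torque actions, e.g. [0 1 2]

    class MyAcrobotVariant(AcrobotLegacy):
        """Hypothetical variant in which one action is unavailable in some states."""

        def possibleActions(self, s=None):
            s = self.state if s is None else s
            if s[0] > 0:                  # arbitrary illustrative condition on theta1
                return np.array([0, 1])
            return np.array([0, 1, 2])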

sampleStep(a, num_samples)

Sample a set number of next states and rewards from the domain. This function is used when state transitions are stochastic; deterministic transitions will yield an identical result regardless of num_samples, since repeatedly sampling a (state,action) pair will always yield the same tuple (r,ns,terminal). See step().

Parameters:
  • a – The action to attempt
  • num_samples – The number of next states and rewards to be sampled.
Returns:

A tuple of arrays (S, A), where S is an array of next states and A is an array of the corresponding rewards.
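
A usage sketch; the (S, A) return layout is taken from the description above, and the default constructor is assumed:

    from rlpy.Domains.Acrobot import AcrobotLegacy

    domain = AcrobotLegacy()
    domain.s0()
    S, A = domain.sampleStep(a=0, num_samples=5)   # 5 sampled next states and their rewards
    # With deterministic dynamics, all 5 samples are identical, as noted above.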

saveRandomState()

Stores the state of the random generator. This state can later be restored using loadRandomState().
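
A save/restore sketch; note that only the random generator state is restored, not the domain state self.state, and the step() call is only there to consume randomness in a stochastic domain:

    from rlpy.Domains.Acrobot import AcrobotLegacy

    domain = AcrobotLegacy()
    domain.s0()
    domain.saveRandomState()     # snapshot into self.random_state_backup
    domain.step(0)               # may consume randomness in a stochastic domain
    domain.loadRandomState()     # restore the generator from the snapshot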

show(a=None, representation=None)

Shows a visualization of the current state of the domain and that of learning.

See showDomain() and showLearning(), both called by this method.

Note

Some domains override this function to allow an optional s parameter to be passed, which overrides the self.state internal to the domain; however, not all have this capability.

Parameters:
  • a – The action being performed
  • representation – The learned value function Representation.
showDomain(a=0)

Plot the 2 links + action arrows
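
A minimal rendering sketch; it assumes an interactive matplotlib backend and the step() return shape assumed earlier, and the constant action is purely illustrative:

    from rlpy.Domains.Acrobot import AcrobotLegacy

    domain = AcrobotLegacy()
    s, terminal, p_actions = domain.s0()
    for _ in range(50):
        a = 0                              # apply the same torque action every step
        r, s, terminal, p_actions = domain.step(a)
        domain.showDomain(a)               # draw the two links and the action arrow
        if terminal:
            break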

showLearning(representation)

Abstract Method:

Shows a visualization of the current learning, usually in the form of a gridded value function and policy. It is thus really only feasible for domains with one- or two-dimensional state spaces.

Parameters: representation – the learned value function Representation used to generate the value function / policy plots.

Acrobot with Runge-Kutta Integration

class rlpy.Domains.Acrobot.Acrobot[source]

Acrobot is a 2-link pendulum with only the second joint actuated. Initially, both links point downwards. The goal is to swing the end-effector to a height at least the length of one link above the base.

Both links can swing freely and can pass by each other, i.e., they don’t collide when they have the same angle.

STATE: The state consists of the two rotational joint angles and their velocities [theta1 theta2 thetaDot1 thetaDot2]. An angle of 0 corresponds to the respective link pointing downwards (angles are in world coordinates).

ACTIONS: The action applies a torque of +1, 0, or -1 to the joint between the two pendulum links.

Note

The dynamics equations were missing some terms in the NIPS paper which are present in the book. R. Sutton confirmed in personal correspondence that the experimental results shown in the paper and the book were generated with the equations shown in the book.

However, the domain can be run with the paper equations by setting book_or_nips = 'nips'.

REFERENCE:

See also

R. Sutton: Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding (NIPS 1996)

See also

R. Sutton and A. G. Barto: Reinforcement Learning: An Introduction. Cambridge: MIT Press, 1998.

Warning

This version of the domain uses the Runge-Kutta method for integrating the system dynamics and is more realistic, but also considerably harder, than the original version, which employs Euler integration; see the AcrobotLegacy class.
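
In code, the two integrators are separate classes in the same module (default constructors assumed):

    from rlpy.Domains.Acrobot import Acrobot, AcrobotLegacy

    rk4_domain = Acrobot()           # Runge-Kutta integration: more realistic, considerably harder
    euler_domain = AcrobotLegacy()   # original Euler-integrated dynamics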

Physical parameters of the system (class attributes):
  • [m] position of the center of mass of link 1
  • [m] position of the center of mass of link 2
  • [kg] mass of link 1
  • [kg] mass of link 2
  • moments of inertia for both links

book_or_nips = 'book'

Selects whether to use the dynamics equations from the NIPS paper ('nips') or from the book ('book').

showDomain(a=0)[source]

Plot the 2 links + action arrows