The Reinforcement Learning Library for Education and Research

Bicycle Balancing

class rlpy.Domains.Bicycle.BicycleBalancing

Simulation of balancing a bicycle.

STATE: The state consists of 7 variables, 5 of which are observable.

  • omega: angle from the vertical to the bicycle [rad]
  • omega dot: angular velocity for omega [rad / s]
  • theta: angle the handlebars are displaced from normal [rad]
  • theta dot: angular velocity for theta [rad / s]
  • psi: angle formed by bicycle frame and x-axis [rad]

  • x_b: x-coordinate where the back tire touches the ground [m]
  • y_b: y-coordinate where the back tire touches the ground [m]

The state variables x_b and y_b are not observable.
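The sketch below shows how the full state differs from the observable state. It stores the seven variables in a NumPy array and keeps only the first five as the observation. This is a minimal sketch under stated assumptions. The array ordering (the five observable variables first, then x_b and y_b) is assumed. The helper names `STATE_NAMES`, `full_state` and `observe` are hypothetical and are not taken from the RLPy source.

```python
import numpy as np

# Hypothetical ordering (assumption): the five observable variables first,
# followed by the two unobservable ground-contact coordinates.
STATE_NAMES = ("omega", "omega_dot", "theta", "theta_dot", "psi", "x_b", "y_b")
N_OBSERVABLE = 5


def full_state(omega=0.0, omega_dot=0.0, theta=0.0, theta_dot=0.0,
               psi=0.0, x_b=0.0, y_b=0.0):
    """Pack the 7 state variables into one float array (hypothetical helper)."""
    return np.array([omega, omega_dot, theta, theta_dot, psi, x_b, y_b],
                    dtype=float)


def observe(s):
    """Return only the observable part of the state: omega through psi."""
    return s[:N_OBSERVABLE]


s = full_state(omega=0.05, theta=-0.1, psi=1.2, x_b=3.0, y_b=4.0)
obs = observe(s)
print(dict(zip(STATE_NAMES[:N_OBSERVABLE], obs.tolist())))
```

With this layout, hiding x_b and y_b from an agent is a single slice. The simulator itself would still need the full array to advance the dynamics.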


ACTIONS: Each action pairs one handlebar torque T with one rider displacement d:

  • T in {-2, 0, 2}: the torque applied to the handlebar
  • d in {-0.02, 0, 0.02}: displacement of the rider

i.e., 3 × 3 = 9 actions in total.
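The nine actions are the Cartesian product of the three torques and the three displacements. Below is a minimal sketch of one way to enumerate them and decode a discrete action index into a (T, d) pair. The index order produced by `itertools.product` (T varies slowest) is an assumption and may not match the ordering RLPy uses internally. The names `ACTIONS` and `decode_action` are hypothetical.

```python
from itertools import product

TORQUES = (-2.0, 0.0, 2.0)          # T: torque applied to the handlebar
DISPLACEMENTS = (-0.02, 0.0, 0.02)  # d: displacement of the rider

# Hypothetical action table: index i -> (T, d), with T varying slowest.
ACTIONS = list(product(TORQUES, DISPLACEMENTS))


def decode_action(a):
    """Map a discrete action index in [0, 9) to its (T, d) pair."""
    if not 0 <= a < len(ACTIONS):
        raise ValueError(f"action index {a} out of range")
    return ACTIONS[a]


print(len(ACTIONS))       # 9
print(decode_action(4))   # (0.0, 0.0)
```

In this encoding, index 4 is the "do nothing" action: no torque and no rider displacement.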


See also

Ernst, D., Geurts, P. & Wehenkel, L. Tree-Based Batch Mode Reinforcement Learning. Journal of Machine Learning Research, Volume 6 (2005).


This domain has only been tested marginally; use it with care.

dt = 0.01

Simulation time step in seconds; the simulation frequency is 1 / dt = 100 Hz.

episodeCap = 50000

Maximum number of steps per episode; the longest possible episode lasts episodeCap * dt = 500 s of simulated time.
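The two attributes together fix the time scale of an episode. The short computation below works through the arithmetic for the default values. It uses only the two numbers documented above.

```python
dt = 0.01            # simulation time step [s]
episodeCap = 50000   # maximum number of steps per episode

frequency_hz = 1.0 / dt            # simulation steps per simulated second
max_duration_s = episodeCap * dt   # longest possible episode [s]

print(f"{frequency_hz:.0f} Hz, {max_duration_s:.0f} s")
```

So an agent that keeps the bicycle upright for the whole episode has balanced it for 500 simulated seconds, a bit over eight minutes.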

showDomain(a=0, s=None)

Shows a live graph of each observable state dimension.

show_domain_every = 20

showDomain only updates the graphs once every show_domain_every steps.
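`show_domain_every` trades plotting cost against responsiveness. Redrawing the live graphs on each of the 100 steps per simulated second can slow a run considerably. Below is a minimal sketch of the throttling idea using a simple call counter. The `PlotThrottle` class is hypothetical and not part of RLPy. Redrawing on the very first call is also an assumption about the behavior.

```python
class PlotThrottle:
    """Hypothetical helper: allow a redraw only once every `every` calls."""

    def __init__(self, every=20):
        self.every = every
        self.calls = 0

    def should_redraw(self):
        """Count this call and report whether the graphs should be updated."""
        redraw = self.calls % self.every == 0
        self.calls += 1
        return redraw


throttle = PlotThrottle(every=20)
redraw_steps = [step for step in range(100) if throttle.should_redraw()]
print(redraw_steps)  # [0, 20, 40, 60, 80]
```

Over 100 steps, only 5 redraws happen. The simulation itself still advances on every step.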