Simulation of balancing a bicycle.
STATE: The state consists of 7 variables, 5 of which are observable.
[x_b: x-coordinate where the back tire touches the ground [m]] [y_b: y-coordinate where the back tire touches the ground [m]]
The state variables x_b and y_b are not observable.
ACTIONS:
i.e., 9 actions in total.
REFERENCE:
See also
Ernst, D., Geurts, P. & Wehenkel, L. Tree-Based Batch Mode Reinforcement Learning. Journal of Machine Learning Research 6 (2005).
Warning
This domain has been tested only marginally; use with care.
Frequency is 1 / dt.
Total episode duration is episodeCap * dt sec.
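The two relations above can be checked with a short sketch. The helper names and the numeric values of dt and episode_cap below are illustrative assumptions, not values taken from this domain.

```python
def simulation_frequency(dt):
    """Return the simulation frequency in Hz for a time step dt given in seconds."""
    return 1.0 / dt


def episode_duration(episode_cap, dt):
    """Return the total episode duration in seconds for episode_cap steps of length dt."""
    return episode_cap * dt


# Hypothetical values, chosen only for illustration.
dt = 0.01            # assumed time step [s]
episode_cap = 50000  # assumed maximum number of steps per episode

frequency_hz = simulation_frequency(dt)
duration_s = episode_duration(episode_cap, dt)
print("frequency [Hz]:", frequency_hz)
print("duration [s]:", duration_s)
```

With these assumed values the simulation would run at about 100 Hz and an episode would last about 500 seconds.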
Only update the graphs in showDomain every x steps.
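One common way to throttle plotting like this is a modulo check on the step counter. The sketch below assumes a hypothetical helper should_update_graphs and a hypothetical parameter update_every standing in for x; neither name comes from this domain's code.

```python
def should_update_graphs(step, update_every):
    """Return True when the graphs should be redrawn at this step.

    step: zero-based index of the current simulation step.
    update_every: redraw interval in steps (the "x" in the description above).
    """
    return step % update_every == 0


# Over 10 steps with an interval of 3, the graphs are redrawn on these steps.
redraw_steps = [s for s in range(10) if should_update_graphs(s, 3)]
print(redraw_steps)  # [0, 3, 6, 9]
```

A larger interval reduces plotting overhead at the cost of a less smooth display.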