The Reinforcement Learning Library for Education and Research

Creating a New Domain

This tutorial describes the standard RLPy Domain interface, and illustrates a brief example of creating a new problem domain.

The Domain controls the environment in which the Agent resides as well as the reward function the Agent is subject to.

The Agent interacts with the Domain in discrete timesteps (see step()). At each step, the Agent tells the Domain which indexed action it wants to perform. The Domain then computes the effect of this action on the environment and updates its internal state accordingly. It returns the new state (ns) to the Agent, along with a reward or penalty (r) and a flag indicating whether the episode is over (terminal), in which case the Agent is reset to its initial state.

This process repeats until the Domain determines that the Agent has either completed its goal or failed. The Experiment controls this cycle.
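The cycle described above can be sketched in a few lines. This is only an illustration: CountdownDomain and run_episode are hypothetical names that do not exist in RLPy, and the real loop lives inside the Experiment class. The stand-in follows the conventions used later in this tutorial, where s0() returns (state, terminal, possible actions) and step() returns (r, ns, terminal, possible actions).

```python
import numpy as np


class CountdownDomain:
    """Hypothetical stand-in domain (not part of RLPy): counts down from 3 to 0."""

    def s0(self):
        self.state = np.array([3])
        return self.state.copy(), False, np.array([0])

    def step(self, a):
        # The single action 0 decrements the counter; every step costs -1.
        self.state = self.state - 1
        terminal = bool(self.state[0] == 0)
        return -1, self.state.copy(), terminal, np.array([0])


def run_episode(domain, choose_action, episode_cap=100):
    """Run one episode and return (total_reward, steps_taken)."""
    s, terminal, possible_actions = domain.s0()
    total_reward, steps = 0, 0
    # Stop when the domain reports a terminal state or the step cap is hit.
    while not terminal and steps < episode_cap:
        a = choose_action(s, possible_actions)
        r, s, terminal, possible_actions = domain.step(a)
        total_reward += r
        steps += 1
    return total_reward, steps


total_reward, steps = run_episode(CountdownDomain(), lambda s, pa: pa[0])
print(total_reward, steps)
```

The episode_cap argument plays the role of self.episodeCap: it ends an episode even if no terminal state was reached.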

Because Agents are designed to be agnostic to the Domain that they are acting within and the problem they are trying to solve, the Domain needs to completely describe everything related to the task. Therefore, the Domain must not only define the observations that the Agent receives, but also the states it can be in, the actions that it can perform, and the relationships between the three.


Each dimension of the state s is either continuous or discrete. Discrete dimensions are assumed to take nonnegative integer values (i.e., the index of the discrete state).


You may want to review the namespace / inheritance / scoping rules in Python.


  • Each Domain must be a subclass of Domain and call the __init__() function of the Domain superclass.
  • Any randomization that occurs at object construction MUST occur in the init_randomization() function, which can be called by __init__().
  • Any random calls should use self.random_state, not the random module or np.random directly, as this ensures consistent seeded results during experiments.
  • After your Domain is complete, you should define a unit test to ensure future revisions do not alter behavior. See rlpy/tests/test_domains for some examples.
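To illustrate why the randomization rules above matter, here is a minimal sketch that does not import RLPy. SeededThing is a hypothetical class, and it creates its own np.random.RandomState; in RLPy the random_state attribute is assumed to be supplied by the framework rather than created by hand like this.

```python
import numpy as np


class SeededThing:
    """Hypothetical class showing instance-level, seeded randomness."""

    def __init__(self, seed=0):
        # Assumption: stands in for the random_state the framework provides.
        self.random_state = np.random.RandomState(seed)
        self.init_randomization()

    def init_randomization(self):
        # All construction-time randomness lives in this one method.
        self.obstacle = self.random_state.randint(0, 10)

    def noisy_value(self):
        # Runtime randomness also goes through the instance generator.
        return self.random_state.rand()


a = SeededThing(seed=42)
b = SeededThing(seed=42)
same = (a.obstacle == b.obstacle) and (a.noisy_value() == b.noisy_value())
print(same)
```

Because both objects draw from their own generator with the same seed, they produce identical sequences regardless of what other code does with the global random state.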

REQUIRED Instance Variables

The new Domain MUST set these variables BEFORE calling the superclass __init__() function:

  1. self.statespace_limits - Bounds on each dimension of the state space. Each row corresponds to one dimension and has two elements [min, max]. Used for discretization of continuous dimensions.
  2. self.continuous_dims - array of integers; each element is the index (eg, row in statespace_limits above) of a continuous-valued dimension. This array is empty if all states are discrete.
  3. self.DimNames - array of strings, a name corresponding to each dimension (eg one for each row in statespace_limits above)
  4. self.episodeCap - integer, maximum number of steps before an episode is terminated (even if not in a terminal state).
  5. self.actions_num - integer, the total number of possible actions (i.e., the size of the action space). This number MUST be a finite integer - continuous action spaces are not currently supported.
  6. self.discount_factor - float, the discount factor (gamma in the literature) by which rewards are reduced.
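As an illustration of the variables listed above, the following sketch defines them for a hypothetical two-dimensional domain (one continuous position, one discrete fuel level) and checks their consistency. The values and the check_domain_spec helper are made up for this example and are not part of RLPy.

```python
import numpy as np


def check_domain_spec(limits, continuous_dims, dim_names,
                      episode_cap, actions_num, discount_factor):
    """Return a list of problems found in a domain specification."""
    problems = []
    limits = np.asarray(limits, dtype=float)
    if limits.ndim != 2 or limits.shape[1] != 2:
        return ["statespace_limits must have shape (num_dims, 2)"]
    if np.any(limits[:, 0] > limits[:, 1]):
        problems.append("each row must satisfy min <= max")
    if len(dim_names) != limits.shape[0]:
        problems.append("DimNames needs one name per dimension")
    if any(d < 0 or d >= limits.shape[0] for d in continuous_dims):
        problems.append("continuous_dims holds an invalid dimension index")
    if not (isinstance(episode_cap, int) and episode_cap > 0):
        problems.append("episodeCap must be a positive integer")
    if not (isinstance(actions_num, int) and actions_num > 0):
        problems.append("actions_num must be a positive integer")
    if not 0.0 <= discount_factor <= 1.0:
        problems.append("discount_factor must lie in [0, 1]")
    return problems


# Hypothetical domain: dimension 0 is continuous, dimension 1 is discrete.
statespace_limits = np.array([[-1.2, 0.6],   # position: [min, max]
                              [0, 4]])       # fuel level: indices 0..4
continuous_dims = [0]
DimNames = ['Position', 'Fuel']

problems = check_domain_spec(statespace_limits, continuous_dims, DimNames,
                             episode_cap=1000, actions_num=3,
                             discount_factor=0.95)
print(problems)
```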

REQUIRED Functions

  1. s0(), (see linked documentation), which returns a (possibly random) state in the domain, to be used at the start of an episode.
  2. step(), (see linked documentation), which returns the tuple (r, ns, terminal, pa) that results from taking action a from the current state (internal to the Domain).
    • r is the reward obtained during the transition
    • ns is the new state after the transition
    • terminal, a boolean, is True if the new state ns is terminal, which ends the episode
    • pa, an array of possible actions to take from the new state ns.

SPECIAL Functions

In many cases, the Domain will also override the functions:

  1. isTerminal() - returns a boolean indicating whether the current (internal) state is terminal. The default always returns False.
  2. possibleActions() - returns an array of possible action indices, which often depend on the current state. Default is to enumerate every possible action, regardless of current state.
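As a sketch of a state-dependent possibleActions(), consider a chain in which moving left is unavailable at the first state and moving right is unavailable at the last. This is a hypothetical variant written as a free function for illustration; the ChainMDP example later in this tutorial keeps the default, in which both actions are always available.

```python
import numpy as np


def possible_actions(state, chain_size):
    """Return the action indices (0 = left, 1 = right) valid in `state`."""
    s = int(state[0])
    actions = []
    if s > 0:
        actions.append(0)   # left is only valid away from the left end
    if s < chain_size - 1:
        actions.append(1)   # right is only valid away from the right end
    return np.array(actions, dtype=int)


print(possible_actions(np.array([0]), 5))
print(possible_actions(np.array([2]), 5))
```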

OPTIONAL Functions

Optionally, define / override the following functions, used for visualization:

  1. showDomain() - Visualization of domain based on current internal state and an action, a. Often the header will include an optional argument s to display instead of the current internal state. RLPy frequently uses matplotlib to accomplish this - see the example below.
  2. showLearning() - Visualization of the “learning” obtained so far on this domain, usually a value function plot and a policy plot. See the introductory tutorial for an example on GridWorld.


Additional Information

  • As always, the Domain can log messages using self.logger.info(<str>), see Python logger doc.
  • You should log values assigned to custom parameters when __init__() is called.
  • See Domain for functions provided by the superclass, especially before defining helper functions which might be redundant.

Example: Creating the ChainMDP Domain

In this example we will recreate the simple ChainMDP Domain, which consists of n states, each of which can only transition to its neighbors: s0 <-> s1 <-> ... <-> s(n-1). The goal is to reach state s(n-1) from s0, after which the episode terminates. The agent can select from two actions: left [0] and right [1] (there is no action for remaining in the same state). Transitions can be noisy, with the opposite of the desired action taken with some probability, although the step() implementation in this tutorial omits that noise. Either way, the optimal policy is to always go right.
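The noisy transition described above could be sketched as follows. This is an assumption-laden illustration rather than the ChainMDP source: the function name noisy_chain_step and the parameter p_slip (the probability of taking the opposite action) are hypothetical, and the random generator is passed in explicitly to keep the block self-contained.

```python
import numpy as np


def noisy_chain_step(s, a, chain_size, p_slip, rng):
    """Return the next state index after (possibly flipped) action `a`."""
    # With probability p_slip, the opposite action is executed instead.
    if rng.rand() < p_slip:
        a = 1 - a
    if a == 0:                          # left, clipped at the first state
        return max(0, s - 1)
    return min(chain_size - 1, s + 1)   # right, clipped at the last state


rng = np.random.RandomState(0)
print(noisy_chain_step(2, 1, chain_size=5, p_slip=0.0, rng=rng))
print(noisy_chain_step(2, 1, chain_size=5, p_slip=1.0, rng=rng))
```

With p_slip=0.0 the action is always executed as requested, and with p_slip=1.0 it is always flipped, which makes the two extremes easy to check by hand.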

  1. Create a new file in the Domains/ directory, ChainMDPTut.py. Add the header block at the top:

    __copyright__ = "Copyright 2013, RLPy http://www.acl.mit.edu/RLPy"
    __credits__ = ["Alborz Geramifard", "Robert H. Klein", "Christoph Dann",
                   "William Dabney", "Jonathan P. How"]
    __license__ = "BSD 3-Clause"
    __author__ = "Ray N. Forcement"
    from rlpy.Tools import plt, mpatches, fromAtoB
    from .Domain import Domain
    import numpy as np
  2. Declare the class, create the needed member variables (here several objects to be used for visualization and a few domain reward parameters), and write a docstring description:

    class ChainMDPTut(Domain):
        """
        Tutorial Domain - nearly identical to ChainMDP.py
        """
        #: Reward for each timestep spent in the goal region
        GOAL_REWARD = 0
        #: Reward for each timestep
        STEP_REWARD = -1
        # Used for graphical normalization
        MAX_RETURN  = 1
        # Used for graphical normalization
        MIN_RETURN  = 0
        # Used for graphical shifting of arrows
        SHIFT       = .3
        #: Used for graphical radius of states
        RADIUS      = .5
        # Stores the graphical patches for states so that we can later change their colors
        circles     = None
        #: Number of states in the chain
        chainSize   = 0
        # Y values used for drawing circles
        Y           = 1
  3. Copy the __init__ declaration from Domain.py, add needed parameters (here the number of states in the chain, chainSize), and log them. Assign self.statespace_limits, self.episodeCap, self.continuous_dims, self.DimNames, self.actions_num, and self.discount_factor. Then call the superclass constructor:

    def __init__(self, chainSize=2):
        """
        :param chainSize: Number of states 'n' in the chain.
        """
        self.chainSize          = chainSize
        self.start              = 0
        self.goal               = chainSize - 1
        self.statespace_limits  = np.array([[0, chainSize - 1]])
        self.episodeCap         = 2 * chainSize
        self.continuous_dims    = []
        self.DimNames           = ['State']
        self.actions_num        = 2
        self.discount_factor    = 0.9
        # Call the superclass constructor only after the variables above are set
        super(ChainMDPTut, self).__init__()
  4. Copy the step() function declaration and implement it to return the tuple (r, ns, terminal, possibleActions), and do likewise for s0(). We want the agent to always start at state [0], and to receive the goal reward and terminate only when s = [n-1]:

    def step(self, a):
        s = self.state[0]
        if a == 0:    # left
            ns = max(0, s - 1)
        elif a == 1:  # right
            ns = min(self.chainSize - 1, s + 1)
        self.state = np.array([ns])
        terminal = self.isTerminal()
        r = self.GOAL_REWARD if terminal else self.STEP_REWARD
        # Return the new state as an array, in the same format as self.state
        return r, self.state.copy(), terminal, self.possibleActions()

    def s0(self):
        self.state = np.array([0])
        return self.state.copy(), self.isTerminal(), self.possibleActions()
  5. In accordance with the above termination condition, override the isTerminal() function by copying its declaration from Domain.py:

    def isTerminal(self):
        s = self.state
        return (s[0] == self.chainSize - 1)
  6. For debugging convenience, demonstration, and entertainment, create a domain visualization by overriding the default (which is to do nothing). With matplotlib, generally this involves first performing a check to see if the figure object needs to be created (and adding objects accordingly), otherwise merely updating existing plot objects based on the current self.state and action a:

    def showDomain(self, a=0):
        # Draw the environment
        s = self.state[0]
        if self.circles is None:  # We need to draw the figure for the first time
            fig = plt.figure(1, (self.chainSize * 2, 2))
            ax = fig.add_axes([0, 0, 1, 1], frameon=False, aspect=1.)
            ax.set_xlim(0, self.chainSize * 2)
            ax.set_ylim(0, 2)
            # Draw a slightly larger circle behind the last state (double circle)
            ax.add_patch(mpatches.Circle((1 + 2 * (self.chainSize - 1), self.Y),
                                         self.RADIUS * 1.1, fc="w"))
            self.circles = [mpatches.Circle((1 + 2 * i, self.Y), self.RADIUS, fc="w")
                            for i in np.arange(self.chainSize)]
            for i in np.arange(self.chainSize):
                ax.add_patch(self.circles[i])  # circles must be added to the axes to appear
                if i != self.chainSize - 1:
                    fromAtoB(1 + 2 * i + self.SHIFT, self.Y + self.SHIFT,
                             1 + 2 * (i + 1) - self.SHIFT, self.Y + self.SHIFT)
                    if i != self.chainSize - 2:
                        fromAtoB(1 + 2 * (i + 1) - self.SHIFT, self.Y - self.SHIFT,
                                 1 + 2 * i + self.SHIFT, self.Y - self.SHIFT, 'r')
            plt.show()
        # Reset every state to white, then mark the current state in black
        for p in self.circles:
            p.set_facecolor('w')
        self.circles[s].set_facecolor('k')
        plt.draw()


When first creating a matplotlib figure, you must call plt.show(); when updating the figure on subsequent steps, use plt.draw().

That’s it! Now add your new Domain to Domains/__init__.py:

    from .ChainMDPTut import ChainMDPTut

Finally, create a unit test for your Domain as described in Creating a Unit Test.
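A unit test for the chain dynamics might look like the sketch below. To keep it runnable without RLPy, it tests StubChain, a hypothetical stand-in that mirrors the s0()/step() contract used in this tutorial; for a real test, make_domain() would return ChainMDPTut(chainSize=5) instead. The test names are illustrative and are not taken from rlpy/tests.

```python
import unittest
import numpy as np


class StubChain:
    """Hypothetical stand-in with the same s0()/step() contract as the tutorial."""

    def __init__(self, chainSize=2):
        self.chainSize = chainSize

    def s0(self):
        self.state = np.array([0])
        return self.state.copy(), False, np.array([0, 1])

    def step(self, a):
        s = int(self.state[0])
        ns = max(0, s - 1) if a == 0 else min(self.chainSize - 1, s + 1)
        self.state = np.array([ns])
        terminal = ns == self.chainSize - 1
        return (0 if terminal else -1), self.state.copy(), terminal, np.array([0, 1])


class ChainContractTest(unittest.TestCase):
    def make_domain(self):
        # Swap in ChainMDPTut(chainSize=5) when testing the real class.
        return StubChain(chainSize=5)

    def test_starts_at_zero(self):
        s, terminal, _ = self.make_domain().s0()
        self.assertEqual(int(s[0]), 0)
        self.assertFalse(terminal)

    def test_left_wall(self):
        d = self.make_domain()
        d.s0()
        r, ns, terminal, _ = d.step(0)   # moving left at state 0 stays at 0
        self.assertEqual((r, int(ns[0]), terminal), (-1, 0, False))

    def test_goal_reached(self):
        d = self.make_domain()
        d.s0()
        for _ in range(4):               # four steps right reach state 4
            r, ns, terminal, _ = d.step(1)
        self.assertEqual((r, int(ns[0]), terminal), (0, 4, True))


suite = unittest.defaultTestLoader.loadTestsFromTestCase(ChainContractTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Running the suite through a TextTestRunner, rather than unittest.main(), avoids exiting the interpreter, which is convenient when experimenting interactively.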

Now test it by creating a simple settings file that uses your new Domain. An example experiment is given below:

#!/usr/bin/env python
"""
Domain Tutorial for RLPy

Assumes you have created the ChainMDPTut.py domain according to the
tutorial and placed it in the Domains/ directory.
Tests the agent using SARSA with a tabular representation.
"""
__author__ = "Robert H. Klein"
from rlpy.Domains import ChainMDPTut
from rlpy.Agents import SARSA
from rlpy.Representations import Tabular
from rlpy.Policies import eGreedy
from rlpy.Experiments import Experiment
import os
import logging

def make_experiment(exp_id=1, path="./Results/Tutorial/ChainMDPTut-SARSA"):
    """
    Each file specifying an experimental setup should contain a
    make_experiment function which returns an instance of the Experiment
    class with everything set up.

    @param exp_id: number used to seed the random number generators
    @param path: output directory where logs and results are stored
    """
    opt = {}
    opt["exp_id"] = exp_id
    opt["path"] = path

    ## Domain:
    chainSize = 50
    domain = ChainMDPTut(chainSize=chainSize)
    opt["domain"] = domain

    ## Representation
    # discretization only needed for continuous state spaces, discarded otherwise
    representation  = Tabular(domain)

    ## Policy
    policy = eGreedy(representation, epsilon=0.2)

    ## Agent
    opt["agent"] = SARSA(representation=representation, policy=policy,
                         discount_factor=domain.discount_factor)
    opt["checks_per_policy"] = 100
    opt["max_steps"] = 2000
    opt["num_policy_checks"] = 10
    experiment = Experiment(**opt)
    return experiment

if __name__ == '__main__':
    experiment = make_experiment(1)
    experiment.run(visualize_steps=False,  # should each learning step be shown?
                   visualize_learning=True,  # show policy / value function?
                   visualize_performance=1)  # show performance runs?
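As a rough reference point when reading the performance output, the best achievable return can be worked out by hand. The sketch below assumes the deterministic step() from this tutorial and an agent that always moves right; always_right_return is a hypothetical helper, not an RLPy function. With chainSize states, the agent needs chainSize - 1 steps, each earning STEP_REWARD = -1 except the last, which earns GOAL_REWARD = 0.

```python
def always_right_return(chain_size, discount_factor=1.0,
                        step_reward=-1.0, goal_reward=0.0):
    """Return of the always-right policy on a deterministic chain (chain_size >= 2)."""
    # chain_size - 2 ordinary steps, then one final step into the goal.
    rewards = [step_reward] * (chain_size - 2) + [goal_reward]
    return sum(r * discount_factor ** t for t, r in enumerate(rewards))


undiscounted = always_right_return(50)
discounted = always_right_return(50, discount_factor=0.9)
print(undiscounted, discounted)
```

For chainSize = 50 the undiscounted return is -48, reached in 49 steps, which is well inside the episodeCap of 2 * chainSize = 100.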

What to do next?

In this Domain tutorial, we have seen how to

  • Write a Domain that inherits from the RLPy base Domain class
  • Override several base functions
  • Create a visualization
  • Add the Domain to RLPy and test it

Adding your component to RLPy

If you would like to add your component to RLPy, we recommend developing on the development version (see Development Version). Please use the following header at the top of each file:

__copyright__ = "Copyright 2013, RLPy http://www.acl.mit.edu/RLPy"
__credits__ = ["Alborz Geramifard", "Robert H. Klein", "Christoph Dann",
                "William Dabney", "Jonathan P. How"]
__license__ = "BSD 3-Clause"
__author__ = "Tim Beaver"

Fill in the appropriate __author__ name and __credits__ as needed. Note that RLPy requires the BSD 3-Clause license.

  • If you installed RLPy in a writable directory, the class name of the new domain can be added to the __init__.py file in the Domains/ directory. (This allows other files to import the new domain.)
  • If available, please include a link or reference to the publication associated with this implementation (and note differences, if any).

If you would like to add your new domain to the RLPy project, we recommend you branch the project and create a pull request to the RLPy repository.

You can also email the community list rlpy@mit.edu for comments or questions.