RLPy

The Reinforcement Learning Library for Education and Research

Experiment

class rlpy.Experiments.Experiment(agent, domain, exp_id=1, max_steps=0, config_logging=True, num_policy_checks=10, log_interval=1, path='Results/Temp', checks_per_policy=1, stat_bins_per_state_dim=0, **kwargs)

The Experiment controls the training, testing, and evaluation of the agent. Reinforcement learning centers on training an Agent to solve a task, and later testing its ability to do so based on what it has learned. This cycle forms a loop that the experiment defines and controls. First, the agent is repeatedly tasked with solving a problem determined by the Domain, restarting after some termination condition is reached. (The sequence of steps between terminations is known as an episode.)

Each time the Agent attempts to solve the task, it learns more about how to accomplish its goal. The experiment controls this loop of “training sessions”, iterating over each step in which the Agent and Domain interact. After a set number of training sessions defined by the experiment, the agent’s current policy is tested for its performance on the task. The experiment collects data on the agent’s performance and then puts the agent through more training sessions. After a set number of these loops (training sessions followed by an evaluation), the experiment is complete and the gathered data is printed and saved. For both training and evaluation, the experiment determines whether or not a visualization of each step should be generated.

The Experiment class is a base class that provides the basic framework for all RL experiments. It provides the methods and attributes that allow child classes to interact with the Agent and Domain classes within the RLPy library.

Note

All experiment implementations should inherit from this class.

Parameters:
  • agent – the Agent to use for learning the task.
  • domain – the problem Domain to learn
  • exp_id – ID of this experiment (main seed used for calls to np.rand)
  • max_steps – Total number of interactions (steps) before experiment termination.

Note

max_steps is distinct from episodeCap; episodeCap defines the largest number of interactions which can occur in a single episode / trajectory, while max_steps limits the sum of all interactions over all episodes which can occur in an experiment.

Parameters:
  • num_policy_checks – Number of performance checks uniformly distributed over the timesteps of the experiment
  • log_interval – Number of seconds between log prints to console
  • path – Path to the directory to be used for results storage (Results are stored in path/output_filename)
  • checks_per_policy – defines how many episodes should be run to estimate the performance of a single policy
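A minimal construction sketch, for orientation only. The GridWorld, Tabular, eGreedy, and Q_Learning classes are part of RLPy, but the exact constructor arguments shown below are assumptions and may differ between versions:

    from rlpy.Domains import GridWorld
    from rlpy.Agents import Q_Learning
    from rlpy.Representations import Tabular
    from rlpy.Policies import eGreedy
    from rlpy.Experiments import Experiment

    # Assemble domain, representation, policy, and agent
    # (constructor arguments here are illustrative assumptions).
    domain = GridWorld()
    representation = Tabular(domain)
    policy = eGreedy(representation, epsilon=0.2)
    agent = Q_Learning(policy, representation,
                       discount_factor=domain.discount_factor,
                       learn_rate=0.1)

    # The Experiment ties them together; parameters mirror the list above.
    experiment = Experiment(agent, domain,
                            exp_id=1,
                            max_steps=10000,
                            num_policy_checks=10,
                            checks_per_policy=1,
                            path="Results/Temp/{domain}/{agent}/{representation}")
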
compile_path(path)

An experiment path can be specified with placeholders. For example, Results/Temp/{domain}/{agent}/{representation}. This function replaces the placeholders with actual values.
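An illustrative sketch of the idea: the substitution behaves like ordinary string formatting, although the actual implementation may resolve the values from the experiment's own components.

    # Hypothetical values; compile_path fills placeholders with the
    # names of the experiment's actual domain, agent, and representation.
    path_template = "Results/Temp/{domain}/{agent}/{representation}"
    compiled = path_template.format(domain="GridWorld",
                                    agent="Q_Learning",
                                    representation="Tabular")
    # compiled == "Results/Temp/GridWorld/Q_Learning/Tabular"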

evaluate(total_steps, episode_number, visualize=0)

Evaluate the current agent within an experiment

Parameters:
  • total_steps – (int) number of steps used in learning so far
  • episode_number – (int) number of episodes used in learning so far
exp_id = 1

ID of the current experiment (main seed used for calls to np.rand)

load()

Loads the experimental results from the results.txt file. Returns None if the results could not be found, and the results array otherwise.
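A brief usage sketch of the return-value convention described above; the experiment variable is assumed to be a previously constructed Experiment:

    results = experiment.load()
    if results is None:
        print("No stored results found for this experiment.")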

mainSeed = 999999999

The Main Random Seed used to generate other random seeds (we use a different seed for each experiment id)

maxRuns = 1000

Maximum number of runs used for averaging, specified so that enough random seeds are generated

output_filename = ''

The name of the file used to store the data

performanceRun(total_steps, visualize=False)

Execute a single episode using the current policy to evaluate its performance. No exploration or learning is enabled.

Parameters:
  • total_steps – (int) maximum number of steps of the episode to perform
  • visualize – (boolean, optional) whether to show each step (if implemented by the domain)
plot(y='return', x='learning_steps', save=False)

Plots the performance of the experiment. This function has only limited capabilities. For more advanced plotting of results, consider Tools.Merger.Merger.

printAll()

Prints all information about the experiment.

result = None

A 2-d numpy array that stores all generated results. The purpose of a run is to fill this array. Size is stats_num x num_policy_checks.

run(visualize_performance=0, visualize_learning=False, visualize_steps=False, debug_on_sigurg=False)

Run the experiment and collect statistics / generate the results. A usage sketch follows the parameter list below.

Parameters:
  • visualize_performance – (int) determines whether a visualization of the steps taken in performance runs is shown. 0 means no visualization is shown. A value n > 0 means that only the first n performance runs for a specific policy are shown (i.e., for n < checks_per_policy, not all performance runs are shown)
  • visualize_learning – (boolean) show some visualization of the learning status before each performance evaluation (e.g. Value function)
  • visualize_steps – (boolean) visualize all steps taken during learning
  • debug_on_sigurg

    (boolean) if true, the ipdb debugger is opened when the Python process receives a SIGURG signal. This makes it possible to enter the debugger at any time, e.g. to inspect data interactively or to debug. The feature works only on Unix systems. The signal can be sent with the kill command:

    kill -URG pid

    where pid is the process id of the python interpreter running this function.
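A hedged usage sketch combining run with the plotting and saving methods documented below; it assumes experiment was constructed as in the earlier sketch:

    experiment.run(visualize_performance=0,   # do not render performance runs
                   visualize_learning=False,  # no learning-status visualization
                   visualize_steps=False)     # do not render every learning step
    experiment.plot(y="return", x="learning_steps", save=True)
    experiment.save()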

run_from_commandline()

Wrapper around the run method that automatically reads run parameters from command-line arguments.

save()

Saves the experimental results to the results.json file

seed_components()

Sets the initial seeds for all random number generators used during the experiment run, based on the currently set exp_id.
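For intuition only, a sketch of how a per-experiment seed could be derived from mainSeed and exp_id; the real seed_components implementation may differ:

    import numpy as np

    exp_id = 1  # the current experiment's ID (1-based)

    # Draw maxRuns candidate seeds deterministically from mainSeed,
    # then pick the one corresponding to this experiment's exp_id.
    main_rng = np.random.RandomState(Experiment.mainSeed)
    candidate_seeds = main_rng.randint(1, Experiment.mainSeed, Experiment.maxRuns)
    experiment_seed = candidate_seeds[exp_id - 1]
    np.random.seed(experiment_seed)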