RLPy

The Reinforcement Learning Library for Education and Research


Tools.run - Running Experiments in Batch

rlpy.Tools.run.get_finished_ids(path)[source]

returns all experiment ids for which the result file exists in the given directory
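A common use is to determine which seeds still need to run before resubmitting jobs. A minimal sketch, assuming a hypothetical results directory and ten planned seeds:

    from rlpy.Tools.run import get_finished_ids

    # "./Results/Tutorial/gridworld" is a hypothetical results directory.
    finished = get_finished_ids("./Results/Tutorial/gridworld")

    # Seeds 1..10 are assumed to be the planned ids; keep the unfinished ones.
    remaining = [i for i in range(1, 11) if i not in finished]
    print(remaining)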

rlpy.Tools.run.prepare_directory(setting, path, **hyperparam)[source]

Creates a directory in path with a file for executing the given setting. The function returns the filename of the executable Python script.

Parameters:
  • setting – filename which contains a make_experiment method that takes an id and hyperparameters and returns an instance of Experiment ready to run
  • path – specifies where to create the directory
  • **hyperparam – all hyperparameters passed to the setting’s make_experiment()
Returns:

filename of the file to execute in path

rlpy.Tools.run.read_setting_content(filename)[source]

reads the file content without the __main__ section

Parameters:
  • filename – filename where the settings are specified

rlpy.Tools.run.run(filename, location, ids, parallelization='sequential', force_rerun=False, block=True, n_jobs=-2, verbose=10, **hyperparam)[source]

runs a file containing an RLPy experiment description (a make_experiment function) in batch mode. Note that the __main__ section of this file is ignored

Parameters:
  • filename – file to run
  • location – directory (does not need to exist) where all outputs and a copy of the file to execute are stored
  • ids – list of ids / seeds which should be executed
  • parallelization – either sequential (running the experiment on one core for each seed in sequence), joblib (running on multiple cores in parallel, with no console output of the individual runs) or condor (submitting jobs to an HTCondor job scheduling system)
  • force_rerun – if False, seeds for which results already exist are not executed
  • block – if True, the function returns only when all jobs are done
  • n_jobs – if parallelized with joblib, this specifies the number of cores to use; -1 means all cores, -2 means all but one core
  • verbose – controls the amount of output
  • **hyperparam – hyperparameter values which are passed to make_experiment as keyword arguments.
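As an illustration, the sketch below runs five seeds of an experiment description file in parallel with joblib. The filename, output location, and the hyperparameter names lambda_ and boyan_N0 are placeholders, not part of the rlpy API:

    from rlpy.Tools.run import run

    # Hypothetical experiment description file and output directory; lambda_
    # and boyan_N0 are assumed to be optional arguments of its make_experiment.
    run("examples/gridworld_setting.py",       # file with make_experiment
        "./Results/Tutorial/gridworld",        # output location (created if missing)
        ids=list(range(1, 6)),                 # seeds 1..5
        parallelization="joblib",              # run the seeds on multiple cores
        lambda_=0.9, boyan_N0=100)             # forwarded to make_experiment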
rlpy.Tools.run.run_condor(fn, ids, force_rerun=False, block=False, verbose=10, poll_duration=30)[source]
rlpy.Tools.run.run_joblib(fn, ids, n_jobs=-2, verbose=10)[source]
rlpy.Tools.run.run_profiled(make_exp_fun, profile_location='Profiling', out='Test.pdf', **kwargs)[source]

runs an experiment (without storing its results) and profiles its execution. A gprof file is created, along with a PDF giving a graphical visualization of the most time-consuming functions in the experiment execution

Parameters:
  • make_exp_fun – function that returns an Experiment instance which is then executed. All remaining keyword parameters are passed to this function.
  • profile_location – directory used to store the profiling result files.
  • out – filename of the generated pdf file.
  • **kwargs – remaining parameters passed to the experiment generator function
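A minimal usage sketch; the module gridworld_setting providing make_experiment is hypothetical and stands in for your own experiment description file:

    from rlpy.Tools.run import run_profiled

    # Hypothetical import: replace with the module defining your experiment.
    from gridworld_setting import make_experiment

    # Profile one experiment run; profiling files go to ./Profiling and a
    # call-graph PDF is written to Profile.pdf. Extra keyword arguments would
    # be forwarded to make_experiment.
    run_profiled(make_experiment, profile_location="Profiling", out="Profile.pdf")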

Tools.results - Analyzing Results

Parsing, extracting statistics and plotting of experimental results.

class rlpy.Tools.results.MultiExperimentResults(paths)[source]

provides tools to analyze, compare, load and plot results of several different experiments, each stored in a separate path

loads the data in paths. paths is a dictionary which maps labels to directories; alternatively, paths can be a list, in which case each path itself is used as the label

plot_avg_sem(x, y, pad_x=False, pad_y=False, xbars=False, ybars=True, colors=None, markers=None, xerror_every=1, legend=True, **kwargs)[source]
plots quantity y over x (means and standard error of the mean).

The quantities are specified by their id strings, i.e. “return” or “learning steps”

pad_x, pad_y: if not enough observations are present for some results, should they be filled with the value of the last available observation?

xbars, ybars: show standard error of the mean for the respective quantity.

colors: dictionary which maps experiment keys to colors.

markers: dictionary which maps experiment keys to markers.

xerror_every: show horizontal error bars only at every xerror_every-th observation.

legend: (Boolean) show legend below plot.

Returns the figure handle of the created plot
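Putting this together, a typical comparison plot could look like the sketch below; the labels and result directories are hypothetical:

    from rlpy.Tools.results import MultiExperimentResults, save_figure

    # Map legend labels to (hypothetical) result directories.
    paths = {"Q-Learning": "./Results/Tutorial/qlearning",
             "SARSA": "./Results/Tutorial/sarsa"}

    merger = MultiExperimentResults(paths)

    # Plot mean return (with standard-error bars) over learning steps.
    fig = merger.plot_avg_sem("learning_steps", "return")
    save_figure(fig, "comparison.pdf")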

rlpy.Tools.results.add_first_close_entries(results, new_label='95_time', x='time', y='return', min_rel_proximity=0.05)[source]

adds an entry to each result for the time required to get within min_rel_proximity (by default 5%) of the final value of y. Returns nothing, as the entries are added to the results in place

rlpy.Tools.results.avg_quantity(results, quantity, pad=False)[source]

returns the average, standard deviation and number of observations of a certain quantity over all runs. If pad is True, missing entries for runs with fewer entries are filled with the last available value

rlpy.Tools.results.contains_results(path, min_num=1)[source]

determines whether a directory contains at least min_num results or not

rlpy.Tools.results.default_colors = ['b', 'g', 'r', 'c', 'm', 'y', 'k', 'purple']

default colors used for plotting

rlpy.Tools.results.default_labels = {'learning_steps': 'Learning Steps', 'return': 'Average Return', 'discounted_return': 'Discounted Return', 'learning_time': 'Computation Time'}

default labels for result quantities

rlpy.Tools.results.default_markers = ['o', 'v', '8', 's', 'p', '*', '<', 'h', '^', 'H', 'D', '>', 'd']

default markers used for plotting

rlpy.Tools.results.first_close_to_final(x, y, min_rel_proximity=0.05)[source]

returns the chronologically first value of x at which y is within min_rel_proximity * (y[-1] - y[0]) of the final value of y, i.e., y[-1].
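The assumed behavior can be illustrated with a small, hypothetical example (the expected output is an inference from the description above, not a verified result):

    import numpy as np
    from rlpy.Tools.results import first_close_to_final

    # y improves from 0 to 10, so with 5% proximity the threshold is
    # 0.05 * (10 - 0) = 0.5; the first y within 0.5 of the final value (10)
    # is 9.8, which occurs at x = 200.
    x = np.array([0, 100, 200, 300])
    y = np.array([0.0, 8.0, 9.8, 10.0])
    print(first_close_to_final(x, y, min_rel_proximity=0.05))  # expected: 200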

rlpy.Tools.results.get_all_result_paths(path, min_num=1)[source]

scans all subdirectories of path and checks whether they contain at least min_num results; the list of paths with results is returned

rlpy.Tools.results.load_results(path)[source]

returns a dictionary with the results of each run of an experiment stored in path. The keys are the seeds of the individual runs
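For example, the per-run results of a single experiment can be loaded and averaged as in the sketch below; the results directory is hypothetical, and the 3-tuple return value follows the avg_quantity description above:

    from rlpy.Tools.results import load_results, avg_quantity

    # Load every run (keyed by seed) from a hypothetical results directory.
    results = load_results("./Results/Tutorial/qlearning")

    # Average the return over all runs; pad=True fills shorter runs with
    # their last observed value so the series line up.
    mean, std, n = avg_quantity(results, "return", pad=True)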

rlpy.Tools.results.load_single(filename)[source]

loads and returns a single experiment run stored in filename; returns None if the file does not exist

rlpy.Tools.results.save_figure(figure, filename)[source]
rlpy.Tools.results.thousand_format_xaxis()[source]

set the xaxis labels to have a ...k format

Tools.hypersearch - Optimizing Hyperparameters

Functions to be used with hyperopt for doing hyperparameter optimization.

rlpy.Tools.hypersearch.find_hyperparameters(setting, path, space=None, max_evals=100, trials_per_point=30, parallelization='sequential', objective='max_reward', max_concurrent_jobs=100)[source]

This function performs hyperparameter optimization for RLPy experiments with the hyperopt library. At the end, an instance of the optimization trials is stored in path/trials.pck

Parameters:
  • setting – file specifying the experimental setup. It contains a make_experiment function and a dictionary named param_space if the argument space is not used. For each key of param_space there needs to be a corresponding optional argument in make_experiment
  • path – directory used to store all intermediate results.
  • space – (optional) an alternative specification of the hyperparameter space
  • max_evals – maximum number of evaluations of a single hyperparameter setting
  • trials_per_point – specifies the number of independent runs (with different seeds) of the experiment for evaluating a single hyperparameter setting.
  • parallelization – either sequential, joblib, condor, condor_all or condor_full. The condor options can be used on a computing cluster with an HTCondor scheduler. The joblib option parallelizes runs on one machine, and sequential runs every experiment in sequence.
  • objective – (optional) string specifying the objective to optimize; possible values are max_reward, min_steps, max_steps
  • max_concurrent_jobs – only relevant for condor_full parallelization; specifies the maximum number of jobs that should run at the same time.
Returns:

a tuple containing the best hyperparameter settings and the hyperopt trials instance of the optimization procedure
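A usage sketch with illustrative, non-default settings; the setting file and output directory are hypothetical:

    from rlpy.Tools.hypersearch import find_hyperparameters

    # Hypothetical setting file containing make_experiment and param_space.
    best, trials = find_hyperparameters(
        "examples/gridworld_setting.py",
        "./Results/Hypersearch/gridworld",
        max_evals=10,              # limit on hyperopt evaluations
        trials_per_point=5,        # 5 independent seeds per setting
        parallelization="joblib",
        objective="max_reward")

    print(best)  # best hyperparameter values found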