RLPy

The Reinforcement Learning Library for Education and Research

MDP Solvers

class rlpy.MDPSolvers.MDPSolver.MDPSolver(job_id, representation, domain, planning_time=inf, convergence_threshold=0.005, ns_samples=100, project_path='.', log_interval=5000, show=False)

MDPSolver is the base class for model-based reinforcement learning agents and planners.

Args:

job_id (int): Job ID number used for running multiple jobs on a cluster.

representation (Representation): Representation used for the value function.

domain (Domain): Domain (MDP) to solve.

planning_time (int): Maximum amount of time in seconds allowed for planning. Defaults to inf (unlimited).

convergence_threshold (float): Threshold for determining if the value function has converged.

ns_samples (int): How many samples of the successor states to take.

project_path (str): Output path for saving the results of running the MDPSolver on a domain.

log_interval (int): Minimum number of seconds between displaying logged information.

show (bool): Whether to enable visualization.

BellmanBackup(s, a, ns_samples, policy=None)

Apply a Bellman backup to the state-action pair (s, a), i.e. $Q(s,a) = \mathbb{E}[r + \gamma V(s')]$, where $\gamma$ is the discount_factor. If a policy $\pi$ is given, then $Q(s,a) = \mathbb{E}[r + \gamma Q(s', \pi(s'))]$.

Args:

s (ndarray): The current state.

a (int): The action taken in state s.

ns_samples (int): Number of next-state samples to use.

policy (Policy): Policy object to use for sampling actions.

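The sampled backup above can be illustrated with a minimal, self-contained Python sketch. It assumes a hypothetical toy model stored as a dictionary of (probability, next_state, reward, terminal) outcomes instead of an RLPy Domain, and dictionary-based V and Q tables; `bellman_backup` and `sample_outcome` are illustrative names, not RLPy functions. Giving terminal successors zero future value is also an assumption of the sketch.

```python
import random

# Hypothetical toy model (not RLPy's API): (state, action) -> list of
# (probability, next_state, reward, terminal) outcomes.
TRANSITIONS = {
    (0, 0): [(1.0, 0, 0.0, False)],
    (0, 1): [(0.8, 1, 1.0, False), (0.2, 0, 0.0, False)],
    (1, 0): [(1.0, 0, 0.0, False)],
    (1, 1): [(1.0, 1, 0.0, True)],
}


def sample_outcome(s, a, rng):
    """Draw one (next_state, reward, terminal) triple for the pair (s, a)."""
    outcomes = TRANSITIONS[(s, a)]
    u = rng.random()
    cumulative = 0.0
    for p, ns, r, t in outcomes:
        cumulative += p
        if u < cumulative:
            return ns, r, t
    return outcomes[-1][1:]  # guard against floating-point round-off


def bellman_backup(s, a, V, ns_samples, discount_factor, rng, Q=None, policy=None):
    """Monte Carlo estimate of Q(s,a) = E[r + discount_factor * V(s')].

    If a policy is given, bootstrap from Q(s', policy(s')) instead of V(s').
    Terminal successors contribute no future value (an assumption here).
    """
    total = 0.0
    for _ in range(ns_samples):
        ns, r, terminal = sample_outcome(s, a, rng)
        if terminal:
            future = 0.0
        elif policy is not None:
            future = Q[(ns, policy(ns))]
        else:
            future = V[ns]
        total += r + discount_factor * future
    return total / ns_samples
```

With `V = {0: 0.0, 1: 0.0}`, `bellman_backup(0, 1, V, 1000, 0.9, random.Random(0))` returns an estimate close to the expected reward of 0.8; increasing ns_samples reduces the variance of that estimate at the cost of more simulator calls.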
IsTabularRepresentation()

Check whether the representation is tabular, since Policy Iteration and Value Iteration only work with tabular representations.

collectSamples(samples)

Return the matrices S, A, NS, R, T, where each row of each 2-D NumPy array is one sample collected by following the current policy.

  • S: (#samples) x (# state space dimensions)
  • A: (#samples) x (1) int (action IDs)
  • NS: (#samples) x (# state space dimensions)
  • R: (#samples) x (1) float
  • T: (#samples) x (1) bool

See Q_MC() and MC_episode()
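The layout of these matrices can be sketched with NumPy. The sketch assumes a hypothetical simulator `step(s, a)` that returns `(r, ns, terminal)` and a policy given as a plain callable; `collect_samples` and `chain_step` are illustrative stand-ins, not RLPy's implementation, and restarting from `s0` after a terminal transition is an assumption.

```python
import numpy as np


def collect_samples(step, policy, s0, samples, state_dim):
    """Roll out `policy` for `samples` transitions and stack them row-wise.

    `step(s, a)` is a hypothetical simulator returning (r, ns, terminal);
    after a terminal transition the rollout restarts from `s0`.
    """
    S = np.zeros((samples, state_dim))
    A = np.zeros((samples, 1), dtype=int)   # action IDs
    NS = np.zeros((samples, state_dim))
    R = np.zeros((samples, 1))
    T = np.zeros((samples, 1), dtype=bool)
    s = np.asarray(s0, dtype=float)
    for i in range(samples):
        a = policy(s)
        r, ns, terminal = step(s, a)
        S[i], A[i], NS[i], R[i], T[i] = s, a, ns, r, terminal
        s = np.asarray(s0, dtype=float) if terminal else np.asarray(ns, dtype=float)
    return S, A, NS, R, T


def chain_step(s, a):
    """Toy 1-D chain on positions 0..3: action 1 moves right, 0 moves left."""
    ns = np.clip(s + (1.0 if a == 1 else -1.0), 0.0, 3.0)
    terminal = bool(ns[0] == 3.0)
    return (1.0 if terminal else 0.0), ns, terminal


# Five transitions from an "always move right" policy starting at position 0.
S, A, NS, R, T = collect_samples(chain_step, lambda s: 1, [0.0], samples=5, state_dim=1)
```

Here the third transition reaches position 3, so `T[:, 0]` is `[False, False, True, False, False]` and the fourth row of S starts again from position 0.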

hasTime()

Return a boolean indicating whether there is time left for planning.
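A minimal sketch of such a check, assuming the budget is planning_time seconds of wall-clock time measured from when planning starts. `PlanningClock`, `start_time`, and `has_time` are hypothetical names chosen for illustration, not RLPy's internals.

```python
import math
import time


class PlanningClock:
    """Hypothetical helper: tracks a planning budget in seconds.

    planning_time=math.inf means planning is never cut off by time.
    """

    def __init__(self, planning_time=math.inf):
        self.planning_time = planning_time
        self.start_time = time.monotonic()

    def has_time(self):
        """Return True while elapsed wall-clock time is below the budget."""
        return time.monotonic() - self.start_time < self.planning_time
```

`time.monotonic` is used rather than `time.time` so that adjustments to the system clock cannot stretch or cut short the planning budget.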

performanceRun()

Set exploration to zero and sample one episode from the domain.
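The following sketch shows what running one episode with exploration switched off means for an epsilon-greedy policy over a tabular Q. The toy chain, the dictionary Q, and the names `epsilon_greedy`, `performance_run`, and `chain_step` are hypothetical; the real method operates on the solver's own policy and domain, and its return values may differ.

```python
import random


def epsilon_greedy(Q, s, actions, epsilon, rng):
    """Pick a uniformly random action with probability epsilon, else a greedy one."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: Q[(s, a)])


def performance_run(Q, step, s0, actions, max_steps, rng):
    """Run one episode with exploration set to zero.

    Returns (episode_return, episode_length, reached_terminal).
    """
    s, episode_return = s0, 0.0
    for t in range(1, max_steps + 1):
        a = epsilon_greedy(Q, s, actions, epsilon=0.0, rng=rng)  # always greedy
        r, s, terminal = step(s, a)
        episode_return += r
        if terminal:
            return episode_return, t, True
    return episode_return, max_steps, False


def chain_step(s, a):
    """Toy chain on states 0..3: action 1 moves right, 0 moves left; 3 is terminal."""
    ns = min(s + 1, 3) if a == 1 else max(s - 1, 0)
    return (1.0 if ns == 3 else 0.0), ns, ns == 3


# Q-values that prefer moving right in every state.
Q = {(s, a): float(a) for s in range(4) for a in (0, 1)}
print(performance_run(Q, chain_step, 0, [0, 1], max_steps=10, rng=random.Random(0)))
# prints (1.0, 3, True)
```

Because epsilon is zero, the episode is a deterministic function of Q and the domain dynamics, which is what makes it useful for measuring the performance of the current value estimates.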

solve()

Solve the domain MDP.
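As one concrete instance of solving an MDP, the sketch below runs tabular value iteration on an explicit model, stopping either when the largest change in the value function falls below convergence_threshold or when planning_time runs out (in the spirit of hasTime()). The list-based model format and the name `value_iteration` are assumptions for illustration; each RLPy solver subclass implements its own solve(), and this is not any particular subclass's code.

```python
import math
import time


def value_iteration(P, R, discount_factor, convergence_threshold=0.005,
                    planning_time=math.inf):
    """Tabular value iteration on a hypothetical explicit model.

    P[s][a] is a list of (probability, next_state) pairs and R[s][a] is the
    expected immediate reward. Sweeps stop once the largest change in V is
    below convergence_threshold or the planning budget is exhausted.
    """
    n_states = len(P)
    V = [0.0] * n_states
    start = time.monotonic()
    while time.monotonic() - start < planning_time:
        delta = 0.0
        for s in range(n_states):
            best = max(
                R[s][a] + discount_factor * sum(p * V[ns] for p, ns in P[s][a])
                for a in range(len(P[s]))
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best  # in-place (Gauss-Seidel style) update
        if delta < convergence_threshold:
            break
    # Greedy policy with respect to the final value estimates.
    policy = [
        max(range(len(P[s])),
            key=lambda a: R[s][a] + discount_factor * sum(p * V[ns] for p, ns in P[s][a]))
        for s in range(n_states)
    ]
    return V, policy


# Two-state toy model: in state 0, action 1 moves to state 1 for reward 1;
# state 1 loops on itself with reward 2.
P = [[[(1.0, 0)], [(1.0, 1)]], [[(1.0, 1)]]]
R = [[0.0, 1.0], [2.0]]
V, policy = value_iteration(P, R, discount_factor=0.9)
```

For this model the exact values are V(1) = 2 / (1 - 0.9) = 20 and V(0) = 1 + 0.9 * 20 = 19, so the returned V is close to [19, 20] and the greedy policy takes action 1 in state 0.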