MDPSolver is the base class for model-based reinforcement learning agents and planners.
job_id (int): Job ID number used for running multiple jobs on a cluster.
representation (Representation): Representation used for the value function.
domain (Domain): Domain (MDP) to solve.
planning_time (int): Maximum amount of time in seconds allowed for planning. Defaults to inf (unlimited).
convergence_threshold (float): Threshold for determining if the value function has converged.
ns_samples (int): Number of successor-state samples to draw when estimating expectations.
project_path (str): Output path for saving the results of running the MDPSolver on a domain.
log_interval (int): Minimum number of seconds between displaying logged information.
show (bool): Whether to enable visualization.
Applies the Bellman backup to the state-action pair (s, a), i.e. Q(s,a) = E[r + discount_factor * V(s')]. If a policy is given, then Q(s,a) = E[r + discount_factor * Q(s', pi(s'))].
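A minimal sketch of the backup on a toy deterministic MDP. The `next_state` and `reward` tables are assumptions standing in for the domain's transition model (they are not part of the original class); with a deterministic domain the expectation collapses to a single successor.

```python
import numpy as np

# Toy deterministic MDP (assumed, for illustration only):
# next_state[s, a] and reward[s, a] stand in for the Domain's model.
n_states, n_actions = 3, 2
discount_factor = 0.9
next_state = np.array([[1, 2], [2, 0], [0, 1]])
reward = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])

V = np.zeros(n_states)  # current value-function estimate

def bellman_backup(s, a, V):
    """Q(s, a) = E[r + discount_factor * V(s')].

    The domain here is deterministic, so the expectation reduces to
    the single successor state next_state[s, a]."""
    ns = next_state[s, a]
    return reward[s, a] + discount_factor * V[ns]

Q_sa = bellman_backup(0, 0, V)  # with V = 0 this is just the reward
```

For a stochastic domain, the same backup would average `r + discount_factor * V(ns)` over `ns_samples` draws of the successor state.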
Checks that the representation is Tabular, since Policy Iteration and Value Iteration only work with a tabular representation.
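A sketch of such a check, using stand-in `Representation` and `Tabular` classes (assumptions; the real classes live in the library's representation module):

```python
class Representation:
    """Stand-in for the library's base representation class (assumption)."""
    pass

class Tabular(Representation):
    """Stand-in for the exact tabular representation (assumption)."""
    pass

def check_representation(representation):
    """Policy/Value Iteration store one value per discrete state,
    so any non-Tabular representation is rejected up front."""
    if not isinstance(representation, Tabular):
        raise TypeError(
            "Policy Iteration and Value Iteration require a Tabular representation"
        )

check_representation(Tabular())  # passes silently
```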
Returns matrices S, A, NS, R, T, where each row of each NumPy 2-D array is one sample gathered by following the current policy.
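A minimal sketch of gathering such sample matrices. The dictionary-based toy domain and the episode-reset behavior are assumptions for illustration; the real method would query the solver's domain and policy objects.

```python
import numpy as np

# Assumed 3-state toy domain (stand-in for the solver's Domain object).
next_state = {(0, 0): 1, (1, 0): 2, (2, 0): 0}
reward = {(0, 0): 0.0, (1, 0): 0.0, (2, 0): 1.0}
terminal = {0: False, 1: False, 2: False}

def collect_samples(policy, n_samples, start=0):
    """Return matrices S, A, NS, R, T; each row of each 2-D array
    is one (state, action, next-state, reward, terminal) sample
    gathered by following `policy` from `start`."""
    S, A, NS, R, T = [], [], [], [], []
    s = start
    for _ in range(n_samples):
        a = policy(s)
        ns = next_state[(s, a)]
        S.append([s]); A.append([a]); NS.append([ns])
        R.append([reward[(s, a)]]); T.append([terminal[ns]])
        # restart the episode on terminal states (assumed behavior)
        s = start if terminal[ns] else ns
    return (np.array(S), np.array(A), np.array(NS),
            np.array(R), np.array(T))

S, A, NS, R, T = collect_samples(lambda s: 0, 5)
```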
See Q_MC() and MC_episode().