Agent-Environment Interface

The three basic interfaces of MushroomRL are the Agent, the Environment, and the Core.

  • The Agent is the basic interface for any Reinforcement Learning algorithm.
  • The Environment is the basic interface for every problem/task that the agent should solve.
  • The Core is a class used to control the interaction between an agent and an environment.
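Putting the three interfaces together, a minimal training script looks roughly like the following sketch. It assumes the GridWorld environment, the QLearning agent, the EpsGreedy policy, and the Parameter class shipped with MushroomRL; hyperparameters and sizes are illustrative.

from mushroom_rl.algorithms.value import QLearning
from mushroom_rl.core import Core
from mushroom_rl.environments import GridWorld
from mushroom_rl.policy import EpsGreedy
from mushroom_rl.utils.parameters import Parameter

# Environment: a small discrete grid world
mdp = GridWorld(width=3, height=3, goal=(2, 2), start=(0, 0))

# Agent: tabular Q-Learning with an epsilon-greedy exploration policy
policy = EpsGreedy(epsilon=Parameter(value=0.1))
agent = QLearning(mdp.info, policy, learning_rate=Parameter(value=0.1))

# Core: drives the agent-environment interaction
core = Core(agent, mdp)
core.learn(n_steps=10000, n_steps_per_fit=1)
dataset = core.evaluate(n_episodes=10)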

Agent

MushroomRL provides implementations of several algorithms belonging to all categories of RL:

  • value-based;
  • policy-search;
  • actor-critic.

One can easily implement customized algorithms following the structure of the already available ones, by extending the following interface (a minimal custom agent is sketched after the class reference below):

class mushroom_rl.algorithms.agent.Agent(mdp_info, policy, features=None)[source]

Bases: object

This class implements the functions to manage the agent (e.g. move the agent following its policy).

__init__(mdp_info, policy, features=None)[source]

Constructor.

Parameters:
  • mdp_info (MDPInfo) – information about the MDP;
  • policy (Policy) – the policy followed by the agent;
  • features (object, None) – features to extract from the state.
fit(dataset)[source]

Fit step.

Parameters:dataset (list) – the dataset.
draw_action(state)[source]

Return the action to execute in the given state. It is the action returned by the policy or the action set by the algorithm (e.g. in the case of SARSA).

Parameters:state (np.ndarray) – the state where the agent is.
Returns:The action to be executed.
episode_start()[source]

Called by the agent when a new episode starts.

stop()[source]

Method used to stop an agent. Useful when dealing with real-world environments or simulators, or to clean up internal state after a Core learn/evaluate run to enforce consistency.

classmethod load(path)[source]

Load and deserialize the agent from the given location on disk.

Parameters:path (string) – relative or absolute path to the agent's save location.
Returns:The loaded agent.
save(path)[source]

Serialize and save the agent to the given path on disk.

Parameters:path (string) – relative or absolute path to the agent's save location.
copy()[source]
Returns:A deepcopy of the agent.
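A typical persistence round trip with save() and load() might look like this; the file path and name are illustrative:

# Serialize the trained agent to disk...
agent.save('/tmp/my_agent.msh')

# ...and restore it later, e.g. in a different process.
from mushroom_rl.algorithms.agent import Agent
restored_agent = Agent.load('/tmp/my_agent.msh')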
_add_save_attr(**attr_dict)[source]

Add attributes that should be saved for an agent.

Parameters:attr_dict (dict) – dictionary of attributes mapped to the method that should be used to save and load them.
_post_load()[source]

This method can be overwritten to implement logic that is executed after the loading of the agent.
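As an illustration of the interface above, here is a minimal sketch of a hypothetical custom agent. RandomAgent, its attribute names, and its uniform-random behaviour are purely illustrative; it assumes a Discrete action space exposing the number of actions as n.

import numpy as np

from mushroom_rl.algorithms.agent import Agent


class RandomAgent(Agent):
    # Hypothetical agent that ignores the collected dataset and acts
    # uniformly at random; it only shows which methods are overridden.
    def __init__(self, mdp_info, policy=None):
        super().__init__(mdp_info, policy)

        self._n_fit_calls = 0

        # Register attributes that save()/load() should serialize;
        # 'primitive' is one of the supported persistence methods.
        self._add_save_attr(_n_fit_calls='primitive')

    def fit(self, dataset):
        # A real algorithm would update its value function or policy
        # parameters from the collected samples here.
        self._n_fit_calls += 1

    def draw_action(self, state):
        # Pick a random action (assumes a Discrete action space).
        n_actions = self.mdp_info.action_space.n
        return np.array([np.random.randint(n_actions)])

    def episode_start(self):
        # Nothing to reset for this toy agent.
        pass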

Environment

MushroomRL provides several implementations of well-known benchmarks with both continuous and discrete action spaces.

To implement a new environment, it is mandatory to use the following interface (a minimal example is sketched after the class reference below):

class mushroom_rl.environments.environment.MDPInfo(observation_space, action_space, gamma, horizon)[source]

Bases: object

This class is used to store the information of the environment.

__init__(observation_space, action_space, gamma, horizon)[source]

Constructor.

Parameters:
  • observation_space ([Box, Discrete]) – the state space;
  • action_space ([Box, Discrete]) – the action space;
  • gamma (float) – the discount factor;
  • horizon (int) – the horizon.
size

Returns:The sum of the number of discrete states and discrete actions. Only works for discrete spaces.
shape

Returns:The concatenation of the shape tuple of the state and action spaces.
class mushroom_rl.environments.environment.Environment(mdp_info)[source]

Bases: object

Basic interface used by any mushroom environment.

__init__(mdp_info)[source]

Constructor.

Parameters:mdp_info (MDPInfo) – an object containing the info of the environment.
seed(seed)[source]

Set the seed of the environment.

Parameters:seed (float) – the value of the seed.
reset(state=None)[source]

Reset the current state.

Parameters:state (np.ndarray, None) – the state to set as the current state.
Returns:The current state.
step(action)[source]

Move the agent from its current state according to the action.

Parameters:action (np.ndarray) – the action to execute.
Returns:The state reached by the agent executing the action in its current state, the reward obtained in the transition, and a flag signaling whether the next state is absorbing. An additional (possibly empty) dictionary is also returned.
stop()[source]

Method used to stop an MDP. Useful when dealing with real-world environments or simulators, or when using OpenAI Gym rendering.

info

Returns:An object containing the info of the environment.
static _bound(x, min_value, max_value)[source]

Method used to bound state and action variables.

Parameters:
  • x – the variable to bound;
  • min_value – the minimum value;
  • max_value – the maximum value.
Returns:The bounded variable.
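Below is a minimal sketch of a hypothetical custom environment built on the interface above. Corridor, its dynamics, and the reward values are illustrative; the Box and Discrete spaces are assumed to come from mushroom_rl.utils.spaces.

import numpy as np

from mushroom_rl.environments.environment import Environment, MDPInfo
from mushroom_rl.utils.spaces import Box, Discrete


class Corridor(Environment):
    # Hypothetical 1-D corridor: the agent starts at position 0 and must
    # reach position 10 by moving left (action 0) or right (action 1).
    def __init__(self, horizon=100, gamma=0.99):
        observation_space = Box(low=np.array([0.]), high=np.array([10.]))
        action_space = Discrete(2)
        mdp_info = MDPInfo(observation_space, action_space, gamma, horizon)

        super().__init__(mdp_info)

    def reset(self, state=None):
        self._state = np.zeros(1) if state is None else state

        return self._state

    def step(self, action):
        # Move one unit left or right and clip to the corridor bounds.
        self._state = self._bound(self._state + 2. * action[0] - 1., 0., 10.)

        absorbing = bool(self._state[0] >= 10.)
        reward = 0. if absorbing else -1.

        return self._state, reward, absorbing, {}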

Core

class mushroom_rl.core.core.Core(agent, mdp, callbacks_episode=None, callback_step=None, preprocessors=None)[source]

Bases: object

Implements the functions to run a generic algorithm.

__init__(agent, mdp, callbacks_episode=None, callback_step=None, preprocessors=None)[source]

Constructor.

Parameters:
  • agent (Agent) – the agent moving according to a policy;
  • mdp (Environment) – the environment in which the agent moves;
  • callbacks_episode (list) – list of callbacks to execute at the end of each learn iteration;
  • callback_step (Callback) – callback to execute after each step;
  • preprocessors (list) – list of state preprocessors to be applied to state variables before feeding them to the agent.
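As a sketch of the preprocessors parameter, any callable mapping a state to a transformed state can be passed; the normalization below is purely illustrative:

# Hypothetical preprocessor: rescale states before they reach the agent.
def normalize_state(state):
    return state / 10.

core = Core(agent, mdp, preprocessors=[normalize_state])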
learn(n_steps=None, n_episodes=None, n_steps_per_fit=None, n_episodes_per_fit=None, render=False, quiet=False)[source]

This function moves the agent in the environment and fits the policy using the collected samples. The agent can be moved for a given number of steps or a given number of episodes and, independently from this choice, the policy can be fitted after a given number of steps or a given number of episodes. By default, the environment is reset.

Parameters:
  • n_steps (int, None) – number of steps to move the agent;
  • n_episodes (int, None) – number of episodes to move the agent;
  • n_steps_per_fit (int, None) – number of steps between each fit of the policy;
  • n_episodes_per_fit (int, None) – number of episodes between each fit of the policy;
  • render (bool, False) – whether to render the environment or not;
  • quiet (bool, False) – whether to show the progress bar or not.
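For instance, assuming a core built as in the sketch at the top of this page, the two fit granularities can be used as follows:

# Collect 1000 steps, fitting after every single step
# (typical for online TD methods such as Q-Learning or SARSA).
core.learn(n_steps=1000, n_steps_per_fit=1)

# Collect 50 episodes, fitting once per complete episode
# (typical for episodic or batch algorithms).
core.learn(n_episodes=50, n_episodes_per_fit=1)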
evaluate(initial_states=None, n_steps=None, n_episodes=None, render=False, quiet=False)[source]

This function moves the agent in the environment using its policy. The agent is moved for a provided number of steps, episodes, or from a set of initial states for the whole episode. By default, the environment is reset.

Parameters:
  • initial_states (np.ndarray, None) – the starting states of each episode;
  • n_steps (int, None) – number of steps to move the agent;
  • n_episodes (int, None) – number of episodes to move the agent;
  • render (bool, False) – whether to render the environment or not;
  • quiet (bool, False) – whether to show the progress bar or not.
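A common evaluation pattern, assuming the compute_J helper from mushroom_rl.utils.dataset, is to turn the collected dataset into per-episode discounted returns:

import numpy as np

from mushroom_rl.utils.dataset import compute_J

# Run the current policy for 10 episodes without fitting.
dataset = core.evaluate(n_episodes=10)

# Discounted return of each evaluation episode.
returns = compute_J(dataset, gamma=mdp.info.gamma)
print('Mean discounted return:', np.mean(returns))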
_step(render)[source]

Single step.

Parameters:render (bool) – whether to render or not.
Returns:A tuple containing the previous state, the action sampled by the agent, the reward obtained, the reached state, the absorbing flag of the reached state and the last step flag.
reset(initial_states=None)[source]

Reset the state of the agent.

_preprocess(state)[source]

Method to apply state preprocessors.

Parameters:state (np.ndarray) – the state to be preprocessed.
Returns:The preprocessed state.