Agent-Environment Interface

The three basic interfaces of MushroomRL are the Agent, the Environment, and the Core.

  • The Agent is the basic interface for any Reinforcement Learning algorithm.

  • The Environment is the basic interface for every problem/task that the agent should solve.

  • The Core is a class used to control the interaction between an agent and an environment.

To support serialization of MushroomRL data to disk (load/save functionality), we also provide the Serializable interface. Finally, the Logger class provides logging functionality.
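The division of roles can be sketched with a toy interaction loop (a simplified illustration of what the Core does; the stand-in agent and environment classes below are not part of MushroomRL):

```python
# Simplified sketch of the agent-environment loop managed by the Core.
# ToyEnv and ToyAgent are stand-ins, not MushroomRL classes.

class ToyEnv:
    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        self.state += action
        reward = -abs(self.state)          # reward for the transition
        absorbing = abs(self.state) >= 5   # episode ends when |state| >= 5
        return self.state, reward, absorbing

class ToyAgent:
    def draw_action(self, state):
        return 1 if state <= 0 else -1     # trivial deterministic policy

    def fit(self, dataset):
        pass                               # a real agent would learn here

def run(agent, env, n_steps):
    """Collect n_steps transitions, resetting on absorbing states."""
    dataset = []
    state = env.reset()
    for _ in range(n_steps):
        action = agent.draw_action(state)
        next_state, reward, absorbing = env.step(action)
        dataset.append((state, action, reward, next_state, absorbing))
        state = env.reset() if absorbing else next_state
    agent.fit(dataset)
    return dataset

dataset = run(ToyAgent(), ToyEnv(), n_steps=10)
```

The real Core additionally handles rendering, recording, callbacks, and the choice between step-based and episode-based scheduling of fit calls.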

Agent

MushroomRL provides implementations of several algorithms belonging to all categories of RL:

  • value-based;

  • policy-search;

  • actor-critic.

One can easily implement customized algorithms by following the structure of the already available ones and extending the following interface:

class AgentInfo(is_episodic, policy_state_shape, backend)[source]

Bases: Serializable

__init__(is_episodic, policy_state_shape, backend)[source]
_add_save_attr(**attr_dict)

Add attributes that should be saved for an agent. For every attribute, it is necessary to specify the method to be used to save and load it. Available methods are: numpy, mushroom, torch, json, pickle, primitive and none. The primitive method can be used to store primitive attributes, while the none method always skips the attribute but ensures that it is initialized to None after loading. The mushroom method can be used with classes that implement the Serializable interface. All other methods use the library of the same name. If a “!” character is appended to the method name, the field is saved only if full_save is set to True.

Parameters:

**attr_dict – dictionary of attributes mapped to the method that should be used to save and load them.
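The “!” suffix convention can be illustrated with a small stand-alone sketch (this toy parser only mirrors the documented convention; it is not the actual MushroomRL implementation):

```python
def parse_save_method(method):
    """Split an _add_save_attr method string into (method, full_save_only).

    A trailing '!' marks an attribute that is saved only when
    full_save=True; this mirrors the documented convention, not the
    actual MushroomRL implementation.
    """
    full_save_only = method.endswith('!')
    return method.rstrip('!'), full_save_only

# Example mapping, as it would be passed to _add_save_attr(**attr_dict)
attr_dict = {'_q_table': 'numpy', '_replay_memory': 'pickle!', '_logger': 'none'}
parsed = {name: parse_save_method(m) for name, m in attr_dict.items()}
```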

_post_load()

This method can be overwritten to implement logic that is executed after the loading of the agent.

copy()
Returns:

A deepcopy of the agent.

classmethod load(path)

Load and deserialize the agent from the given location on disk.

Parameters:

path (Path, string) – Relative or absolute path to the agent's save location.

Returns:

The loaded agent.

save(path, full_save=False)

Serialize and save the object to the given path on disk.

Parameters:
  • path (Path, str) – Relative or absolute path to the object save location;

  • full_save (bool) – Flag to specify the amount of data to save for MushroomRL data structures.

save_zip(zip_file, full_save, folder='')

Serialize and save the agent to the given path on disk.

Parameters:
  • zip_file (ZipFile) – ZipFile where the object needs to be saved;

  • full_save (bool) – flag to specify the amount of data to save for MushroomRL data structures;

  • folder (string, '') – subfolder to be used by the save method.

class Agent(mdp_info, policy, is_episodic=False, backend='numpy')[source]

Bases: Serializable

This class implements the functions to manage the agent (e.g. move the agent following its policy).

__init__(mdp_info, policy, is_episodic=False, backend='numpy')[source]

Constructor.

Parameters:
  • mdp_info (MDPInfo) – information about the MDP;

  • policy (Policy) – the policy followed by the agent;

  • is_episodic (bool, False) – whether the agent is learning in an episodic fashion or not;

  • backend (str, 'numpy') – array backend to be used by the algorithm.

fit(dataset)[source]

Fit step.

Parameters:

dataset (Dataset) – the dataset.

draw_action(state, policy_state=None)[source]

Return the action to execute in the given state. It is the action returned by the policy or the action set by the algorithm (e.g. in the case of SARSA).

Parameters:
  • state – the state where the agent is;

  • policy_state – the policy internal state.

Returns:

The action to be executed.

episode_start(initial_state, episode_info)[source]

Called by the Core when a new episode starts.

Parameters:
  • initial_state (Array) – vector representing the initial state of the environment.

  • episode_info (dict) – a dictionary containing the information at reset, such as context.

Returns:

A tuple containing the policy initial state and, optionally, the policy parameters

episode_start_vectorized(initial_states, episode_info, start_mask)[source]

Called by the VectorCore when a new episode starts.

Parameters:
  • initial_states (Array) – the initial states of the environment.

  • episode_info (dict) – a dictionary containing the information at reset, such as context;

  • start_mask (Array) – boolean mask to select the environments that are starting a new episode

Returns:

A tuple containing the policy initial states and, optionally, the policy parameters

stop()[source]

Method used to stop an agent. Useful when dealing with real-world environments or simulators, or to clean up environment internals after a core learn/evaluate run to enforce consistency.

set_logger(logger)[source]

Setter that can be used to pass a logger to the algorithm.

Parameters:

logger (Logger) – the logger to be used by the algorithm.

add_core_preprocessor(preprocessor)[source]

Add preprocessor to the core’s preprocessor list. The preprocessors are applied in order.

Parameters:

preprocessor (object) – state preprocessors to be applied to state variables before feeding them to the agent.

add_agent_preprocessor(preprocessor)[source]

Add preprocessor to the agent’s preprocessor list. The preprocessors are applied in order.

Parameters:

preprocessor (object) – state preprocessors to be applied to state variables before feeding them to the agent.

property core_preprocessors

Access to core’s state preprocessors stored in the agent.

_agent_preprocess(state)[source]

Applies all the agent’s preprocessors to the state.

Parameters:

state (Array) – the state where the agent is;

Returns:

The preprocessed state.

_update_agent_preprocessor(state)[source]

Updates the stats of all the agent’s preprocessors given the state.

Parameters:

state (Array) – the state where the agent is;

_add_save_attr(**attr_dict)

Add attributes that should be saved for an agent. For every attribute, it is necessary to specify the method to be used to save and load it. Available methods are: numpy, mushroom, torch, json, pickle, primitive and none. The primitive method can be used to store primitive attributes, while the none method always skips the attribute but ensures that it is initialized to None after loading. The mushroom method can be used with classes that implement the Serializable interface. All other methods use the library of the same name. If a “!” character is appended to the method name, the field is saved only if full_save is set to True.

Parameters:

**attr_dict – dictionary of attributes mapped to the method that should be used to save and load them.

_post_load()

This method can be overwritten to implement logic that is executed after the loading of the agent.

copy()
Returns:

A deepcopy of the agent.

classmethod load(path)

Load and deserialize the agent from the given location on disk.

Parameters:

path (Path, string) – Relative or absolute path to the agent's save location.

Returns:

The loaded agent.

save(path, full_save=False)

Serialize and save the object to the given path on disk.

Parameters:
  • path (Path, str) – Relative or absolute path to the object save location;

  • full_save (bool) – Flag to specify the amount of data to save for MushroomRL data structures.

save_zip(zip_file, full_save, folder='')

Serialize and save the agent to the given path on disk.

Parameters:
  • zip_file (ZipFile) – ZipFile where the object needs to be saved;

  • full_save (bool) – flag to specify the amount of data to save for MushroomRL data structures;

  • folder (string, '') – subfolder to be used by the save method.

Environment

MushroomRL provides several implementations of well-known benchmarks with both continuous and discrete action spaces.

To implement a new environment, it is mandatory to use the following interface:

class MDPInfo(observation_space, action_space, gamma, horizon, dt=0.1, backend='numpy')[source]

Bases: Serializable

This class is used to store the information of the environment.

__init__(observation_space, action_space, gamma, horizon, dt=0.1, backend='numpy')[source]

Constructor.

Parameters:
  • observation_space ([Box, Discrete]) – the state space;

  • action_space ([Box, Discrete]) – the action space;

  • gamma (float) – the discount factor;

  • horizon (int) – the horizon;

  • dt (float, 1e-1) – the control timestep of the environment;

  • backend (str, 'numpy') – the type of data library used to generate state and actions.

property size

Returns: The sum of the number of discrete states and discrete actions. Only works for discrete spaces.

property shape

Returns: The concatenation of the shape tuple of the state and action spaces.

_add_save_attr(**attr_dict)

Add attributes that should be saved for an agent. For every attribute, it is necessary to specify the method to be used to save and load it. Available methods are: numpy, mushroom, torch, json, pickle, primitive and none. The primitive method can be used to store primitive attributes, while the none method always skips the attribute but ensures that it is initialized to None after loading. The mushroom method can be used with classes that implement the Serializable interface. All other methods use the library of the same name. If a “!” character is appended to the method name, the field is saved only if full_save is set to True.

Parameters:

**attr_dict – dictionary of attributes mapped to the method that should be used to save and load them.

_post_load()

This method can be overwritten to implement logic that is executed after the loading of the agent.

copy()
Returns:

A deepcopy of the agent.

classmethod load(path)

Load and deserialize the agent from the given location on disk.

Parameters:

path (Path, string) – Relative or absolute path to the agent's save location.

Returns:

The loaded agent.

save(path, full_save=False)

Serialize and save the object to the given path on disk.

Parameters:
  • path (Path, str) – Relative or absolute path to the object save location;

  • full_save (bool) – Flag to specify the amount of data to save for MushroomRL data structures.

save_zip(zip_file, full_save, folder='')

Serialize and save the agent to the given path on disk.

Parameters:
  • zip_file (ZipFile) – ZipFile where the object needs to be saved;

  • full_save (bool) – flag to specify the amount of data to save for MushroomRL data structures;

  • folder (string, '') – subfolder to be used by the save method.

class Environment(mdp_info)[source]

Bases: object

Basic interface used by any MushroomRL environment.

classmethod register()[source]

Register an environment in the environment list.

static list_registered()[source]

List registered environments.

Returns:

The list of the registered environments.

static make(env_name, *args, **kwargs)[source]

Generate an environment given an environment name and parameters. The environment is created using the generate method, if available; otherwise, the constructor is used. The generate method has a simpler interface than the constructor, making it easier to build a standard version of the environment. If the environment name contains a ‘.’ separator, the string is split: the first element is used to select the environment and the remaining elements are passed as positional parameters.

Parameters:
  • env_name (str) – name of the environment;

  • *args – positional arguments to be provided to the environment generator;

  • **kwargs – keyword arguments to be provided to the environment generator.

Returns:

An instance of the constructed environment.

__init__(mdp_info)[source]

Constructor.

Parameters:

mdp_info (MDPInfo) – an object containing the info of the environment.

seed(seed)[source]

Set the seed of the environment.

Parameters:

seed (float) – the value of the seed.

reset(state=None)[source]

Reset the environment to the initial state.

Parameters:

state (np.ndarray, None) – the state to set to the current state.

Returns:

The initial state and a dictionary containing the info for the episode.

step(action)[source]

Move the agent from its current state according to the action.

Parameters:

action (np.ndarray) – the action to execute.

Returns:

The state reached by the agent executing action in its current state, the reward obtained in the transition and a flag to signal if the next state is absorbing. Also, an additional dictionary is returned (possibly empty).

render(record=False)[source]

Render the environment to screen.

Parameters:

record (bool, False) – whether the visualized image should be returned or not.

Returns:

The visualized image, or None if the record flag is set to false.

stop()[source]

Method used to stop an environment. Useful when dealing with real-world environments or simulators, or when using OpenAI Gym rendering.

property info

Returns: An object containing the info of the environment.

static _bound(x, min_value, max_value)[source]

Method used to bound state and action variables.

Parameters:
  • x – the variable to bound;

  • min_value – the minimum value;

  • max_value – the maximum value;

Returns:

The bounded variable.

Core

class Core(agent, env, callbacks_fit=None, callback_step=None, record_dictionary=None)[source]

Bases: object

Implements the functions to run a generic algorithm.

__init__(agent, env, callbacks_fit=None, callback_step=None, record_dictionary=None)[source]

Constructor.

Parameters:
  • agent (Agent) – the agent moving according to a policy;

  • env (Environment) – the environment in which the agent moves;

  • callbacks_fit (list) – list of callbacks to execute at the end of each fit;

  • callback_step (Callback) – callback to execute after each step;

  • record_dictionary (dict, None) – a dictionary of parameters for the recording; it must contain the recorder_class and fps entries, and optionally other keyword arguments to be passed when building the recorder class. By default, the VideoRecorder class is used, with the environment action frequency as frames per second.

learn(n_steps=None, n_episodes=None, n_steps_per_fit=None, n_episodes_per_fit=None, render=False, record=False, quiet=False)[source]

This function moves the agent in the environment and fits the policy using the collected samples. The agent can be moved for a given number of steps or a given number of episodes and, independently from this choice, the policy can be fitted after a given number of steps or a given number of episodes. The environment is reset at the beginning of the learning process.

Parameters:
  • n_steps (int, None) – number of steps to move the agent;

  • n_episodes (int, None) – number of episodes to move the agent;

  • n_steps_per_fit (int, None) – number of steps between each fit of the policy;

  • n_episodes_per_fit (int, None) – number of episodes between each fit of the policy;

  • render (bool, False) – whether to render the environment or not;

  • record (bool, False) – whether to record a video of the environment or not. If True, the render flag should also be set to True;

  • quiet (bool, False) – whether to show the progress bar or not.

evaluate(initial_states=None, n_steps=None, n_episodes=None, render=False, quiet=False, record=False)[source]

This function moves the agent in the environment using its policy. The agent is moved for a provided number of steps or episodes, or from a set of initial states for whole episodes. The environment is reset at the beginning of the evaluation.

Parameters:
  • initial_states (np.ndarray, None) – the starting states of each episode;

  • n_steps (int, None) – number of steps to move the agent;

  • n_episodes (int, None) – number of episodes to move the agent;

  • render (bool, False) – whether to render the environment or not;

  • quiet (bool, False) – whether to show the progress bar or not;

  • record (bool, False) – whether to record a video of the environment or not. If True, the render flag should also be set to True.

Returns:

The collected dataset.
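The interplay of n_steps and n_steps_per_fit can be made concrete with a toy counter (a simplified model of the loop, not the actual Core code):

```python
def count_fits(n_steps, n_steps_per_fit):
    """Count how many fit calls a steps-based learn run performs,
    assuming fit is triggered each time n_steps_per_fit samples
    have been collected (simplified model of Core.learn)."""
    fits = 0
    collected = 0
    for _ in range(n_steps):
        collected += 1
        if collected == n_steps_per_fit:
            fits += 1
            collected = 0
    return fits

# With n_steps=100 and n_steps_per_fit=4, fit is called 25 times.
assert count_fits(100, 4) == 25
assert count_fits(10, 3) == 3  # the last partial batch is not fitted
```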

_step(render, record)[source]

Single step.

Parameters:
  • render (bool) – whether to render or not;

  • record (bool) – whether to record a video of the step or not.

Returns:

A tuple containing the previous state, the action sampled by the agent, the reward obtained, the reached state, the absorbing flag of the reached state and the last step flag.

_reset(initial_states)[source]

Reset the state of the agent.

_preprocess(state)[source]

Method to apply state preprocessors.

Parameters:

state (np.ndarray) – the state to be preprocessed.

Returns:

The preprocessed state.

_build_recorder_class(recorder_class=None, fps=None, **kwargs)[source]

Method to create a video recorder class.

Parameters:
  • recorder_class (class) – the class used to record the video. By default, the VideoRecorder class from MushroomRL is used. The class must implement the __call__ and stop methods;

  • fps (int, None) – number of frames per second of the recorded video;

  • **kwargs – other keyword arguments used to build the recorder class.

Returns:

The recorder object.

Serialization

class Serializable[source]

Bases: object

Interface to implement serialization of a MushroomRL object. This provides load and save functionality to store the object in a zip file. It is possible to save the state of the agent with different levels of detail, controlled by the full_save flag.

save(path, full_save=False)[source]

Serialize and save the object to the given path on disk.

Parameters:
  • path (Path, str) – Relative or absolute path to the object save location;

  • full_save (bool) – Flag to specify the amount of data to save for MushroomRL data structures.

save_zip(zip_file, full_save, folder='')[source]

Serialize and save the agent to the given path on disk.

Parameters:
  • zip_file (ZipFile) – ZipFile where the object needs to be saved;

  • full_save (bool) – flag to specify the amount of data to save for MushroomRL data structures;

  • folder (string, '') – subfolder to be used by the save method.

classmethod load(path)[source]

Load and deserialize the agent from the given location on disk.

Parameters:

path (Path, string) – Relative or absolute path to the agent's save location.

Returns:

The loaded agent.

copy()[source]
Returns:

A deepcopy of the agent.

_add_save_attr(**attr_dict)[source]

Add attributes that should be saved for an agent. For every attribute, it is necessary to specify the method to be used to save and load it. Available methods are: numpy, mushroom, torch, json, pickle, primitive and none. The primitive method can be used to store primitive attributes, while the none method always skips the attribute but ensures that it is initialized to None after loading. The mushroom method can be used with classes that implement the Serializable interface. All other methods use the library of the same name. If a “!” character is appended to the method name, the field is saved only if full_save is set to True.

Parameters:

**attr_dict – dictionary of attributes mapped to the method that should be used to save and load them.

_post_load()[source]

This method can be overwritten to implement logic that is executed after the loading of the agent.

__init__()

Logger

class Logger(log_name='', results_dir='./logs', log_console=False, use_timestamp=False, append=False, seed=None, **kwargs)[source]

Bases: DataLogger, ConsoleLogger

This class implements the logging functionality. It can be used to automatically create a log directory, save numpy data arrays and the current agent.

__init__(log_name='', results_dir='./logs', log_console=False, use_timestamp=False, append=False, seed=None, **kwargs)[source]

Constructor.

Parameters:
  • log_name (string, '') – name of the current experiment directory; if not specified, the current timestamp is used;

  • results_dir (string, './logs') – name of the base logging directory. If set to None, no directory is created;

  • log_console (bool, False) – whether to log or not the console output;

  • use_timestamp (bool, False) – If true, adds the current timestamp to the folder name;

  • append (bool, False) – If true, the logger will append the new data logged to the one already existing in the directory;

  • seed (int, None) – seed for the current run. It can be optionally specified to add a seed suffix for each data file logged;

  • **kwargs – other parameters for ConsoleLogger class.

critical(msg)

Log a message with CRITICAL level

debug(msg)

Log a message with DEBUG level

epoch_info(epoch, **kwargs)

Log the epoch info with the format: Epoch <epoch number> | <label 1>: <data 1> <label 2>: <data 2> …

Parameters:
  • epoch (int) – epoch number;

  • **kwargs – the labels and the data to be displayed.

error(msg)

Log a message with ERROR level

exception(msg)

Log a message with ERROR level. To be called only from an exception handler

info(msg)

Log a message with INFO level

log_agent(agent, epoch=None, full_save=False)

Log agent into the log folder.

Parameters:
  • agent (Agent) – The agent to be saved;

  • epoch (int, None) – optional epoch number to be added to the agent file currently saved;

  • full_save (bool, False) – whether to save the full data from the agent or not.

log_best_agent(agent, J, full_save=False)

Log the best agent so far into the log folder. The agent is logged only if the current performance is better than the performance of the previously stored agent.

Parameters:
  • agent (Agent) – The agent to be saved;

  • J (float) – The performance metric of the current agent;

  • full_save (bool, False) – whether to save the full data from the agent or not.

log_numpy(**kwargs)

Log scalars into numpy arrays.

Parameters:

**kwargs – set of named scalar values to be saved. The argument name will be used to identify the given quantity and as base file name.

log_numpy_array(**kwargs)

Log numpy arrays.

Parameters:

**kwargs – set of named arrays to be saved. The argument name will be used to identify the given quantity and as base file name.

property path

Property to return the path to the current logging directory

strong_line()

Log a line of #

warning(msg)

Log a message with WARNING level

weak_line()

Log a line of -

class ConsoleLogger(log_name, log_dir=None, suffix='', log_file_name=None, console_log_level=10, file_log_level=10)[source]

Bases: object

This class implements the console logging functionality. It can be used to log text into the console and optionally save a log file.

__init__(log_name, log_dir=None, suffix='', log_file_name=None, console_log_level=10, file_log_level=10)[source]

Constructor.

Parameters:
  • log_name (str) – name of the current logger;

  • log_dir (Path, None) – path of the logging directory. If None, the console output is not logged to a file;

  • suffix (str, '') – optional string appended to the logger id and to the logged data file names;

  • log_file_name (str, None) – optional log file name; the logger id is used by default;

  • console_log_level (int, logging.DEBUG) – logging level for the console;

  • file_log_level (int, logging.DEBUG) – logging level for the log file.

debug(msg)[source]

Log a message with DEBUG level

info(msg)[source]

Log a message with INFO level

warning(msg)[source]

Log a message with WARNING level

error(msg)[source]

Log a message with ERROR level

critical(msg)[source]

Log a message with CRITICAL level

exception(msg)[source]

Log a message with ERROR level. To be called only from an exception handler

strong_line()[source]

Log a line of #

weak_line()[source]

Log a line of -

epoch_info(epoch, **kwargs)[source]

Log the epoch info with the format: Epoch <epoch number> | <label 1>: <data 1> <label 2>: <data 2> …

Parameters:
  • epoch (int) – epoch number;

  • **kwargs – the labels and the data to be displayed.

class DataLogger(results_dir, suffix='', append=False)[source]

Bases: object

This class implements the data logging functionality. It can be used to automatically create a log directory, save numpy data arrays and the current agent.

__init__(results_dir, suffix='', append=False)[source]

Constructor.

Parameters:
  • results_dir (Path) – path of the logging directory;

  • suffix (string) – optional string to add a suffix to each data file logged;

  • append (bool, False) – If true, the logger will append the new data logged to the one already existing in the directory.

log_numpy(**kwargs)[source]

Log scalars into numpy arrays.

Parameters:

**kwargs – set of named scalar values to be saved. The argument name will be used to identify the given quantity and as base file name.

log_numpy_array(**kwargs)[source]

Log numpy arrays.

Parameters:

**kwargs – set of named arrays to be saved. The argument name will be used to identify the given quantity and as base file name.

log_agent(agent, epoch=None, full_save=False)[source]

Log agent into the log folder.

Parameters:
  • agent (Agent) – The agent to be saved;

  • epoch (int, None) – optional epoch number to be added to the agent file currently saved;

  • full_save (bool, False) – whether to save the full data from the agent or not.

log_best_agent(agent, J, full_save=False)[source]

Log the best agent so far into the log folder. The agent is logged only if the current performance is better than the performance of the previously stored agent.

Parameters:
  • agent (Agent) – The agent to be saved;

  • J (float) – The performance metric of the current agent;

  • full_save (bool, False) – whether to save the full data from the agent or not.

property path

Property to return the path to the current logging directory