Value-Based

TD

class SARSA(mdp_info, policy, learning_rate)[source]

Bases: TD

SARSA algorithm.

__init__(mdp_info, policy, learning_rate)[source]

Constructor.

Parameters:

learning_rate (Parameter) – the learning rate.

_update(state, action, reward, next_state, absorbing)[source]

Update the Q-table.

Parameters:
  • state (np.ndarray) – state;

  • action (np.ndarray) – action;

  • reward (np.ndarray) – reward;

  • next_state (np.ndarray) – next state;

  • absorbing (np.ndarray) – absorbing flag.
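
For orientation, here is a minimal usage sketch (not part of the API reference): it builds a tabular SARSA agent with an epsilon-greedy policy on the bundled GridWorld environment and trains it step by step with a Core. The import path of Parameter differs between MushroomRL releases, as noted in the comments.

    import numpy as np

    from mushroom_rl.algorithms.value import SARSA
    from mushroom_rl.core import Core
    from mushroom_rl.environments import GridWorld
    from mushroom_rl.policy import EpsGreedy
    from mushroom_rl.rl_utils.parameters import Parameter  # older releases: mushroom_rl.utils.parameters

    np.random.seed(0)

    # Small grid world with the goal in the corner opposite the start state.
    mdp = GridWorld(width=3, height=3, goal=(2, 2), start=(0, 0))

    # Epsilon-greedy exploration and a constant learning rate.
    policy = EpsGreedy(epsilon=Parameter(value=0.1))
    agent = SARSA(mdp.info, policy, learning_rate=Parameter(value=0.2))

    # n_steps_per_fit=1: the Q-table is updated after every single transition.
    core = Core(agent, mdp)
    core.learn(n_steps=10000, n_steps_per_fit=1)

    # Evaluate the greedy policy learned so far.
    policy.set_epsilon(Parameter(value=0.0))
    dataset = core.evaluate(n_episodes=10)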

_add_save_attr(**attr_dict)

Add attributes that should be saved for an agent. For every attribute, it is necessary to specify the method to be used to save and load it. Available methods are: numpy, mushroom, torch, json, pickle, primitive and none. The primitive method can be used to store primitive attributes, while the none method always skips the attribute but ensures that it is initialized to None after loading. The mushroom method can be used with classes that implement the Serializable interface. All the other methods use the library of the same name. If a “!” character is appended to the method name, the field is saved only if full_save is set to True.

Parameters:

**attr_dict – dictionary of attributes mapped to the method that should be used to save and load them.
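
As an illustration (the class and its attribute names are invented for this sketch), a custom Serializable class might register its fields as follows; the same pattern is used inside agents:

    from mushroom_rl.core import Serializable

    class RunningStats(Serializable):
        """Hypothetical helper class used only to illustrate _add_save_attr."""
        def __init__(self):
            self._n_updates = 0       # plain Python int
            self._history = list()    # potentially large, only needed for full saves
            self._scratch = None      # transient working buffer

            self._add_save_attr(
                _n_updates='primitive',   # stored as a primitive value
                _history='pickle!',       # saved only when full_save=True
                _scratch='none'           # never saved; set to None after load
            )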

_agent_preprocess(state)

Applies all the agent’s preprocessors to the state.

Parameters:

state (Array) – the state where the agent is;

Returns:

The preprocessed state.

_post_load()

This method can be overwritten to implement logic that is executed after the loading of the agent.

_update_agent_preprocessor(state)

Updates the stats of all the agent’s preprocessors given the state.

Parameters:

state (Array) – the state where the agent is;

add_agent_preprocessor(preprocessor)

Add preprocessor to the agent’s preprocessor list. The preprocessors are applied in order.

Parameters:

preprocessor (object) – state preprocessors to be applied to state variables before feeding them to the agent.

add_core_preprocessor(preprocessor)

Add preprocessor to the core’s preprocessor list. The preprocessors are applied in order.

Parameters:

preprocessor (object) – state preprocessors to be applied to state variables before feeding them to the agent.

copy()
Returns:

A deepcopy of the agent.

property core_preprocessors

Access to core’s state preprocessors stored in the agent.

draw_action(state, policy_state=None)

Return the action to execute in the given state. It is the action returned by the policy or the action set by the algorithm (e.g. in the case of SARSA).

Parameters:
  • state – the state where the agent is;

  • policy_state – the policy internal state.

Returns:

The action to be executed.

episode_start(initial_state, episode_info)

Called by the Core when a new episode starts.

Parameters:
  • initial_state (Array) – vector representing the initial state of the environment.

  • episode_info (dict) – a dictionary containing the information at reset, such as context.

Returns:

A tuple containing the policy initial state and, optionally, the policy parameters

episode_start_vectorized(initial_states, episode_info, start_mask)

Called by the VectorCore when a new episode starts.

Parameters:
  • initial_states (Array) – the initial states of the environment.

  • episode_info (dict) – a dictionary containing the information at reset, such as context;

  • start_mask (Array) – boolean mask to select the environments that are starting a new episode

Returns:

A tuple containing the policy initial states and, optionally, the policy parameters

fit(dataset)

Fit step.

Parameters:

dataset (Dataset) – the dataset.

classmethod load(path)

Load and deserialize the agent from the given location on disk.

Parameters:

path (Path, string) – Relative or absolute path to the agent’s save location.

Returns:

The loaded agent.

save(path, full_save=False)

Serialize and save the object to the given path on disk.

Parameters:
  • path (Path, str) – Relative or absolute path to the object save location;

  • full_save (bool) – Flag to specify the amount of data to save for MushroomRL data structures.
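
A short round-trip sketch (the file name is arbitrary), assuming agent is a trained instance such as the SARSA agent built earlier:

    # Serialize the whole agent; full_save=True also stores the "!"-marked fields.
    agent.save('/tmp/sarsa_agent.msh', full_save=True)

    # Restore it later with the load classmethod of the agent class.
    from mushroom_rl.algorithms.value import SARSA
    restored = SARSA.load('/tmp/sarsa_agent.msh')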

save_zip(zip_file, full_save, folder='')

Serialize and save the agent to the given path on disk.

Parameters:
  • zip_file (ZipFile) – ZipFile where the object needs to be saved;

  • full_save (bool) – flag to specify the amount of data to save for MushroomRL data structures;

  • folder (string, '') – subfolder to be used by the save method.

set_logger(logger)

Setter that can be used to pass a logger to the algorithm

Parameters:

logger (Logger) – the logger to be used by the algorithm.

stop()

Method used to stop an agent. Useful when dealing with real-world environments or simulators, or to clean up environment internals after a Core learn/evaluate run in order to enforce consistency.

class SARSALambda(mdp_info, policy, learning_rate, lambda_coeff, trace='replacing')[source]

Bases: TD

The SARSA(lambda) algorithm for finite MDPs.

__init__(mdp_info, policy, learning_rate, lambda_coeff, trace='replacing')[source]

Constructor.

Parameters:
  • lambda_coeff ([float, Parameter]) – eligibility trace coefficient;

  • trace (str, 'replacing') – type of eligibility trace to use.

_update(state, action, reward, next_state, absorbing)[source]

Update the Q-table.

Parameters:
  • state (np.ndarray) – state;

  • action (np.ndarray) – action;

  • reward (np.ndarray) – reward;

  • next_state (np.ndarray) – next state;

  • absorbing (np.ndarray) – absorbing flag.

episode_start(initial_state, episode_info)[source]

Called by the Core when a new episode starts.

Parameters:
  • initial_state (Array) – vector representing the initial state of the environment.

  • episode_info (dict) – a dictionary containing the information at reset, such as context.

Returns:

A tuple containing the policy initial state and, optionally, the policy parameters
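
A schematic NumPy sketch of the tabular SARSA(lambda) update that lambda_coeff and trace configure (this is not the library code; e is the eligibility-trace table, reset by episode_start at the beginning of every episode):

    import numpy as np

    def sarsa_lambda_step(Q, e, s, a, r, s_next, a_next, absorbing,
                          alpha, gamma, lambda_coeff, trace='replacing'):
        """One tabular SARSA(lambda) update; Q and e are (n_states, n_actions) arrays."""
        # Bump the trace of the visited state-action pair.
        if trace == 'replacing':
            e[s, a] = 1.0
        else:  # accumulating traces
            e[s, a] += 1.0

        # On-policy TD error: the target uses the action actually chosen next.
        q_next = 0.0 if absorbing else Q[s_next, a_next]
        delta = r + gamma * q_next - Q[s, a]

        # Every pair is updated in proportion to its trace, then traces decay.
        Q += alpha * delta * e
        e *= gamma * lambda_coeff
        return Q, e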

_add_save_attr(**attr_dict)

Add attributes that should be saved for an agent. For every attribute, it is necessary to specify the method to be used to save and load it. Available methods are: numpy, mushroom, torch, json, pickle, primitive and none. The primitive method can be used to store primitive attributes, while the none method always skips the attribute but ensures that it is initialized to None after loading. The mushroom method can be used with classes that implement the Serializable interface. All the other methods use the library of the same name. If a “!” character is appended to the method name, the field is saved only if full_save is set to True.

Parameters:

**attr_dict – dictionary of attributes mapped to the method that should be used to save and load them.

_agent_preprocess(state)

Applies all the agent’s preprocessors to the state.

Parameters:

state (Array) – the state where the agent is;

Returns:

The preprocessed state.

_post_load()

This method can be overwritten to implement logic that is executed after the loading of the agent.

_update_agent_preprocessor(state)

Updates the stats of all the agent’s preprocessors given the state.

Parameters:

state (Array) – the state where the agent is;

add_agent_preprocessor(preprocessor)

Add preprocessor to the agent’s preprocessor list. The preprocessors are applied in order.

Parameters:

preprocessor (object) – state preprocessors to be applied to state variables before feeding them to the agent.

add_core_preprocessor(preprocessor)

Add preprocessor to the core’s preprocessor list. The preprocessors are applied in order.

Parameters:

preprocessor (object) – state preprocessors to be applied to state variables before feeding them to the agent.

copy()
Returns:

A deepcopy of the agent.

property core_preprocessors

Access to core’s state preprocessors stored in the agent.

draw_action(state, policy_state=None)

Return the action to execute in the given state. It is the action returned by the policy or the action set by the algorithm (e.g. in the case of SARSA).

Parameters:
  • state – the state where the agent is;

  • policy_state – the policy internal state.

Returns:

The action to be executed.

episode_start_vectorized(initial_states, episode_info, start_mask)

Called by the VectorCore when a new episode starts.

Parameters:
  • initial_states (Array) – the initial states of the environment.

  • episode_info (dict) – a dictionary containing the information at reset, such as context;

  • start_mask (Array) – boolean mask to select the environments that are starting a new episode

Returns:

A tuple containing the policy initial states and, optionally, the policy parameters

fit(dataset)

Fit step.

Parameters:

dataset (Dataset) – the dataset.

classmethod load(path)

Load and deserialize the agent from the given location on disk.

Parameters:

path (Path, string) – Relative or absolute path to the agent’s save location.

Returns:

The loaded agent.

save(path, full_save=False)

Serialize and save the object to the given path on disk.

Parameters:
  • path (Path, str) – Relative or absolute path to the object save location;

  • full_save (bool) – Flag to specify the amount of data to save for MushroomRL data structures.

save_zip(zip_file, full_save, folder='')

Serialize and save the agent to the given path on disk.

Parameters:
  • zip_file (ZipFile) – ZipFile where the object needs to be saved;

  • full_save (bool) – flag to specify the amount of data to save for MushroomRL data structures;

  • folder (string, '') – subfolder to be used by the save method.

set_logger(logger)

Setter that can be used to pass a logger to the algorithm

Parameters:

logger (Logger) – the logger to be used by the algorithm.

stop()

Method used to stop an agent. Useful when dealing with real-world environments or simulators, or to clean up environment internals after a Core learn/evaluate run in order to enforce consistency.

class ExpectedSARSA(mdp_info, policy, learning_rate)[source]

Bases: TD

Expected SARSA algorithm. “A theoretical and empirical analysis of Expected Sarsa”. Seijen H. V. et al., 2009.

__init__(mdp_info, policy, learning_rate)[source]

Constructor.

Parameters:

learning_rate (Parameter) – the learning rate.

_update(state, action, reward, next_state, absorbing)[source]

Update the Q-table.

Parameters:
  • state (np.ndarray) – state;

  • action (np.ndarray) – action;

  • reward (np.ndarray) – reward;

  • next_state (np.ndarray) – next state;

  • absorbing (np.ndarray) – absorbing flag.
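
A schematic sketch of the Expected SARSA update, assuming an epsilon-greedy behaviour policy: instead of the sampled next action, the target uses the expectation of Q over the policy's action distribution in the next state.

    import numpy as np

    def expected_sarsa_step(Q, s, a, r, s_next, absorbing, alpha, gamma, epsilon):
        """One tabular Expected SARSA update under an epsilon-greedy policy."""
        if absorbing:
            expected_q = 0.0
        else:
            n_actions = Q.shape[1]
            # pi(a'|s'): epsilon-greedy probabilities given the current Q-values.
            probs = np.full(n_actions, epsilon / n_actions)
            probs[np.argmax(Q[s_next])] += 1.0 - epsilon
            expected_q = probs @ Q[s_next]

        Q[s, a] += alpha * (r + gamma * expected_q - Q[s, a])
        return Q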

_add_save_attr(**attr_dict)

Add attributes that should be saved for an agent. For every attribute, it is necessary to specify the method to be used to save and load it. Available methods are: numpy, mushroom, torch, json, pickle, primitive and none. The primitive method can be used to store primitive attributes, while the none method always skips the attribute but ensures that it is initialized to None after loading. The mushroom method can be used with classes that implement the Serializable interface. All the other methods use the library of the same name. If a “!” character is appended to the method name, the field is saved only if full_save is set to True.

Parameters:

**attr_dict – dictionary of attributes mapped to the method that should be used to save and load them.

_agent_preprocess(state)

Applies all the agent’s preprocessors to the state.

Parameters:

state (Array) – the state where the agent is;

Returns:

The preprocessed state.

_post_load()

This method can be overwritten to implement logic that is executed after the loading of the agent.

_update_agent_preprocessor(state)

Updates the stats of all the agent’s preprocessors given the state.

Parameters:

state (Array) – the state where the agent is;

add_agent_preprocessor(preprocessor)

Add preprocessor to the agent’s preprocessor list. The preprocessors are applied in order.

Parameters:

preprocessor (object) – state preprocessors to be applied to state variables before feeding them to the agent.

add_core_preprocessor(preprocessor)

Add preprocessor to the core’s preprocessor list. The preprocessors are applied in order.

Parameters:

preprocessor (object) – state preprocessors to be applied to state variables before feeding them to the agent.

copy()
Returns:

A deepcopy of the agent.

property core_preprocessors

Access to core’s state preprocessors stored in the agent.

draw_action(state, policy_state=None)

Return the action to execute in the given state. It is the action returned by the policy or the action set by the algorithm (e.g. in the case of SARSA).

Parameters:
  • state – the state where the agent is;

  • policy_state – the policy internal state.

Returns:

The action to be executed.

episode_start(initial_state, episode_info)

Called by the Core when a new episode starts.

Parameters:
  • initial_state (Array) – vector representing the initial state of the environment.

  • episode_info (dict) – a dictionary containing the information at reset, such as context.

Returns:

A tuple containing the policy initial state and, optionally, the policy parameters

episode_start_vectorized(initial_states, episode_info, start_mask)

Called by the VectorCore when a new episode starts.

Parameters:
  • initial_states (Array) – the initial states of the environment.

  • episode_info (dict) – a dictionary containing the information at reset, such as context;

  • start_mask (Array) – boolean mask to select the environments that are starting a new episode

Returns:

A tuple containing the policy initial states and, optionally, the policy parameters

fit(dataset)

Fit step.

Parameters:

dataset (Dataset) – the dataset.

classmethod load(path)

Load and deserialize the agent from the given location on disk.

Parameters:

path (Path, string) – Relative or absolute path to the agent’s save location.

Returns:

The loaded agent.

save(path, full_save=False)

Serialize and save the object to the given path on disk.

Parameters:
  • path (Path, str) – Relative or absolute path to the object save location;

  • full_save (bool) – Flag to specify the amount of data to save for MushroomRL data structures.

save_zip(zip_file, full_save, folder='')

Serialize and save the agent to the given path on disk.

Parameters:
  • zip_file (ZipFile) – ZipFile where the object needs to be saved;

  • full_save (bool) – flag to specify the amount of data to save for MushroomRL data structures;

  • folder (string, '') – subfolder to be used by the save method.

set_logger(logger)

Setter that can be used to pass a logger to the algorithm

Parameters:

logger (Logger) – the logger to be used by the algorithm.

stop()

Method used to stop an agent. Useful when dealing with real-world environments or simulators, or to clean up environment internals after a Core learn/evaluate run in order to enforce consistency.

class QLearning(mdp_info, policy, learning_rate)[source]

Bases: TD

Q-Learning algorithm. “Learning from Delayed Rewards”. Watkins C.J.C.H., 1989.

__init__(mdp_info, policy, learning_rate)[source]

Constructor.

Parameters:

learning_rate (Parameter) – the learning rate.

_update(state, action, reward, next_state, absorbing)[source]

Update the Q-table.

Parameters:
  • state (np.ndarray) – state;

  • action (np.ndarray) – action;

  • reward (np.ndarray) – reward;

  • next_state (np.ndarray) – next state;

  • absorbing (np.ndarray) – absorbing flag.
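
A schematic sketch of the tabular Q-Learning update performed per transition: the target bootstraps on the greedy (max) value of the next state, regardless of the action the behaviour policy will actually take.

    import numpy as np

    def q_learning_step(Q, s, a, r, s_next, absorbing, alpha, gamma):
        """One tabular Q-Learning update (off-policy, greedy bootstrap)."""
        target = r if absorbing else r + gamma * np.max(Q[s_next])
        Q[s, a] += alpha * (target - Q[s, a])
        return Q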

_add_save_attr(**attr_dict)

Add attributes that should be saved for an agent. For every attribute, it is necessary to specify the method to be used to save and load it. Available methods are: numpy, mushroom, torch, json, pickle, primitive and none. The primitive method can be used to store primitive attributes, while the none method always skips the attribute but ensures that it is initialized to None after loading. The mushroom method can be used with classes that implement the Serializable interface. All the other methods use the library of the same name. If a “!” character is appended to the method name, the field is saved only if full_save is set to True.

Parameters:

**attr_dict – dictionary of attributes mapped to the method that should be used to save and load them.

_agent_preprocess(state)

Applies all the agent’s preprocessors to the state.

Parameters:

state (Array) – the state where the agent is;

Returns:

The preprocessed state.

_post_load()

This method can be overwritten to implement logic that is executed after the loading of the agent.

_update_agent_preprocessor(state)

Updates the stats of all the agent’s preprocessors given the state.

Parameters:

state (Array) – the state where the agent is;

add_agent_preprocessor(preprocessor)

Add preprocessor to the agent’s preprocessor list. The preprocessors are applied in order.

Parameters:

preprocessor (object) – state preprocessors to be applied to state variables before feeding them to the agent.

add_core_preprocessor(preprocessor)

Add preprocessor to the core’s preprocessor list. The preprocessors are applied in order.

Parameters:

preprocessor (object) – state preprocessors to be applied to state variables before feeding them to the agent.

copy()
Returns:

A deepcopy of the agent.

property core_preprocessors

Access to core’s state preprocessors stored in the agent.

draw_action(state, policy_state=None)

Return the action to execute in the given state. It is the action returned by the policy or the action set by the algorithm (e.g. in the case of SARSA).

Parameters:
  • state – the state where the agent is;

  • policy_state – the policy internal state.

Returns:

The action to be executed.

episode_start(initial_state, episode_info)

Called by the Core when a new episode starts.

Parameters:
  • initial_state (Array) – vector representing the initial state of the environment.

  • episode_info (dict) – a dictionary containing the information at reset, such as context.

Returns:

A tuple containing the policy initial state and, optionally, the policy parameters

episode_start_vectorized(initial_states, episode_info, start_mask)

Called by the VectorCore when a new episode starts.

Parameters:
  • initial_states (Array) – the initial states of the environment.

  • episode_info (dict) – a dictionary containing the information at reset, such as context;

  • start_mask (Array) – boolean mask to select the environments that are starting a new episode

Returns:

A tuple containing the policy initial states and, optionally, the policy parameters

fit(dataset)

Fit step.

Parameters:

dataset (Dataset) – the dataset.

classmethod load(path)

Load and deserialize the agent from the given location on disk.

Parameters:

path (Path, string) – Relative or absolute path to the agent’s save location.

Returns:

The loaded agent.

save(path, full_save=False)

Serialize and save the object to the given path on disk.

Parameters:
  • path (Path, str) – Relative or absolute path to the object save location;

  • full_save (bool) – Flag to specify the amount of data to save for MushroomRL data structures.

save_zip(zip_file, full_save, folder='')

Serialize and save the agent to the given path on disk.

Parameters:
  • zip_file (ZipFile) – ZipFile where the object needs to be saved;

  • full_save (bool) – flag to specify the amount of data to save for MushroomRL data structures;

  • folder (string, '') – subfolder to be used by the save method.

set_logger(logger)

Setter that can be used to pass a logger to the algorithm

Parameters:

logger (Logger) – the logger to be used by the algorithm.

stop()

Method used to stop an agent. Useful when dealing with real-world environments or simulators, or to clean up environment internals after a Core learn/evaluate run in order to enforce consistency.

class QLambda(mdp_info, policy, learning_rate, lambda_coeff, trace='replacing')[source]

Bases: TD

Q(Lambda) algorithm. “Learning from Delayed Rewards”. Watkins C.J.C.H., 1989.

__init__(mdp_info, policy, learning_rate, lambda_coeff, trace='replacing')[source]

Constructor.

Parameters:
  • lambda_coeff ([float, Parameter]) – eligibility trace coefficient;

  • trace (str, 'replacing') – type of eligibility trace to use.

_update(state, action, reward, next_state, absorbing)[source]

Update the Q-table.

Parameters:
  • state (np.ndarray) – state;

  • action (np.ndarray) – action;

  • reward (np.ndarray) – reward;

  • next_state (np.ndarray) – next state;

  • absorbing (np.ndarray) – absorbing flag.

episode_start(initial_state, episode_info)[source]

Called by the Core when a new episode starts.

Parameters:
  • initial_state (Array) – vector representing the initial state of the environment.

  • episode_info (dict) – a dictionary containing the information at reset, such as context.

Returns:

A tuple containing the policy initial state and, optionally, the policy parameters
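
A schematic sketch combining the two ingredients above: eligibility traces as in SARSA(lambda), but with the off-policy Q-Learning target. Whether exploratory actions should also cut the traces (Watkins' original refinement) is omitted here and left to the library implementation.

    import numpy as np

    def q_lambda_step(Q, e, s, a, r, s_next, absorbing,
                      alpha, gamma, lambda_coeff, trace='replacing'):
        """One tabular Q(lambda) update; Q and e are (n_states, n_actions) arrays."""
        if trace == 'replacing':
            e[s, a] = 1.0
        else:  # accumulating traces
            e[s, a] += 1.0

        # Off-policy target: greedy value of the next state.
        q_next = 0.0 if absorbing else np.max(Q[s_next])
        delta = r + gamma * q_next - Q[s, a]

        Q += alpha * delta * e
        e *= gamma * lambda_coeff
        return Q, e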

_add_save_attr(**attr_dict)

Add attributes that should be saved for an agent. For every attribute, it is necessary to specify the method to be used to save and load it. Available methods are: numpy, mushroom, torch, json, pickle, primitive and none. The primitive method can be used to store primitive attributes, while the none method always skips the attribute but ensures that it is initialized to None after loading. The mushroom method can be used with classes that implement the Serializable interface. All the other methods use the library of the same name. If a “!” character is appended to the method name, the field is saved only if full_save is set to True.

Parameters:

**attr_dict – dictionary of attributes mapped to the method that should be used to save and load them.

_agent_preprocess(state)

Applies all the agent’s preprocessors to the state.

Parameters:

state (Array) – the state where the agent is;

Returns:

The preprocessed state.

_post_load()

This method can be overwritten to implement logic that is executed after the loading of the agent.

_update_agent_preprocessor(state)

Updates the stats of all the agent’s preprocessors given the state.

Parameters:

state (Array) – the state where the agent is;

add_agent_preprocessor(preprocessor)

Add preprocessor to the agent’s preprocessor list. The preprocessors are applied in order.

Parameters:

preprocessor (object) – state preprocessors to be applied to state variables before feeding them to the agent.

add_core_preprocessor(preprocessor)

Add preprocessor to the core’s preprocessor list. The preprocessors are applied in order.

Parameters:

preprocessor (object) – state preprocessors to be applied to state variables before feeding them to the agent.

copy()
Returns:

A deepcopy of the agent.

property core_preprocessors

Access to core’s state preprocessors stored in the agent.

draw_action(state, policy_state=None)

Return the action to execute in the given state. It is the action returned by the policy or the action set by the algorithm (e.g. in the case of SARSA).

Parameters:
  • state – the state where the agent is;

  • policy_state – the policy internal state.

Returns:

The action to be executed.

episode_start_vectorized(initial_states, episode_info, start_mask)

Called by the VectorCore when a new episode starts.

Parameters:
  • initial_states (Array) – the initial states of the environment.

  • episode_info (dict) – a dictionary containing the information at reset, such as context;

  • start_mask (Array) – boolean mask to select the environments that are starting a new episode

Returns:

A tuple containing the policy initial states and, optionally, the policy parameters

fit(dataset)

Fit step.

Parameters:

dataset (Dataset) – the dataset.

classmethod load(path)

Load and deserialize the agent from the given location on disk.

Parameters:

path (Path, string) – Relative or absolute path to the agent’s save location.

Returns:

The loaded agent.

save(path, full_save=False)

Serialize and save the object to the given path on disk.

Parameters:
  • path (Path, str) – Relative or absolute path to the object save location;

  • full_save (bool) – Flag to specify the amount of data to save for MushroomRL data structures.

save_zip(zip_file, full_save, folder='')

Serialize and save the agent to the given path on disk.

Parameters:
  • zip_file (ZipFile) – ZipFile where the object needs to be saved;

  • full_save (bool) – flag to specify the amount of data to save for MushroomRL data structures;

  • folder (string, '') – subfolder to be used by the save method.

set_logger(logger)

Setter that can be used to pass a logger to the algorithm

Parameters:

logger (Logger) – the logger to be used by the algorithm.

stop()

Method used to stop an agent. Useful when dealing with real-world environments or simulators, or to clean up environment internals after a Core learn/evaluate run in order to enforce consistency.

class DoubleQLearning(mdp_info, policy, learning_rate)[source]

Bases: TD

Double Q-Learning algorithm. “Double Q-Learning”. Hasselt H. V., 2010.

__init__(mdp_info, policy, learning_rate)[source]

Constructor.

Parameters:

learning_rate (Parameter) – the learning rate.

_update(state, action, reward, next_state, absorbing)[source]

Update the Q-table.

Parameters:
  • state (np.ndarray) – state;

  • action (np.ndarray) – action;

  • reward (np.ndarray) – reward;

  • next_state (np.ndarray) – next state;

  • absorbing (np.ndarray) – absorbing flag.
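
A schematic sketch of the Double Q-Learning update: two estimators are kept, one is chosen at random for the update, and the other evaluates the greedy action, which removes the positive bias of the plain max operator.

    import numpy as np

    def double_q_learning_step(Qa, Qb, s, a, r, s_next, absorbing, alpha, gamma):
        """One tabular Double Q-Learning update over the two estimators Qa and Qb."""
        # Randomly pick the table to update; the other one evaluates the target.
        Q_upd, Q_eval = (Qa, Qb) if np.random.rand() < 0.5 else (Qb, Qa)

        if absorbing:
            target = r
        else:
            a_star = np.argmax(Q_upd[s_next])            # argmax from the updated table
            target = r + gamma * Q_eval[s_next, a_star]  # value from the other table

        Q_upd[s, a] += alpha * (target - Q_upd[s, a])
        return Qa, Qb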

_add_save_attr(**attr_dict)

Add attributes that should be saved for an agent. For every attribute, it is necessary to specify the method to be used to save and load it. Available methods are: numpy, mushroom, torch, json, pickle, primitive and none. The primitive method can be used to store primitive attributes, while the none method always skips the attribute but ensures that it is initialized to None after loading. The mushroom method can be used with classes that implement the Serializable interface. All the other methods use the library of the same name. If a “!” character is appended to the method name, the field is saved only if full_save is set to True.

Parameters:

**attr_dict – dictionary of attributes mapped to the method that should be used to save and load them.

_agent_preprocess(state)

Applies all the agent’s preprocessors to the state.

Parameters:

state (Array) – the state where the agent is;

Returns:

The preprocessed state.

_post_load()

This method can be overwritten to implement logic that is executed after the loading of the agent.

_update_agent_preprocessor(state)

Updates the stats of all the agent’s preprocessors given the state.

Parameters:

state (Array) – the state where the agent is;

add_agent_preprocessor(preprocessor)

Add preprocessor to the agent’s preprocessor list. The preprocessors are applied in order.

Parameters:

preprocessor (object) – state preprocessors to be applied to state variables before feeding them to the agent.

add_core_preprocessor(preprocessor)

Add preprocessor to the core’s preprocessor list. The preprocessors are applied in order.

Parameters:

preprocessor (object) – state preprocessors to be applied to state variables before feeding them to the agent.

copy()
Returns:

A deepcopy of the agent.

property core_preprocessors

Access to core’s state preprocessors stored in the agent.

draw_action(state, policy_state=None)

Return the action to execute in the given state. It is the action returned by the policy or the action set by the algorithm (e.g. in the case of SARSA).

Parameters:
  • state – the state where the agent is;

  • policy_state – the policy internal state.

Returns:

The action to be executed.

episode_start(initial_state, episode_info)

Called by the Core when a new episode starts.

Parameters:
  • initial_state (Array) – vector representing the initial state of the environment.

  • episode_info (dict) – a dictionary containing the information at reset, such as context.

Returns:

A tuple containing the policy initial state and, optionally, the policy parameters

episode_start_vectorized(initial_states, episode_info, start_mask)

Called by the VectorCore when a new episode starts.

Parameters:
  • initial_states (Array) – the initial states of the environment.

  • episode_info (dict) – a dictionary containing the information at reset, such as context;

  • start_mask (Array) – boolean mask to select the environments that are starting a new episode

Returns:

A tuple containing the policy initial states and, optionally, the policy parameters

fit(dataset)

Fit step.

Parameters:

dataset (Dataset) – the dataset.

classmethod load(path)

Load and deserialize the agent from the given location on disk.

Parameters:

path (Path, string) – Relative or absolute path to the agent’s save location.

Returns:

The loaded agent.

save(path, full_save=False)

Serialize and save the object to the given path on disk.

Parameters:
  • path (Path, str) – Relative or absolute path to the object save location;

  • full_save (bool) – Flag to specify the amount of data to save for MushroomRL data structures.

save_zip(zip_file, full_save, folder='')

Serialize and save the agent to the given path on disk.

Parameters:
  • zip_file (ZipFile) – ZipFile where the object needs to be saved;

  • full_save (bool) – flag to specify the amount of data to save for MushroomRL data structures;

  • folder (string, '') – subfolder to be used by the save method.

set_logger(logger)

Setter that can be used to pass a logger to the algorithm

Parameters:

logger (Logger) – the logger to be used by the algorithm.

stop()

Method used to stop an agent. Useful when dealing with real-world environments or simulators, or to clean up environment internals after a Core learn/evaluate run in order to enforce consistency.

class SpeedyQLearning(mdp_info, policy, learning_rate)[source]

Bases: TD

Speedy Q-Learning algorithm. “Speedy Q-Learning”. Ghavamzadeh et al., 2011.

__init__(mdp_info, policy, learning_rate)[source]

Constructor.

Parameters:

learning_rate (Parameter) – the learning rate.

_update(state, action, reward, next_state, absorbing)[source]

Update the Q-table.

Parameters:
  • state (np.ndarray) – state;

  • action (np.ndarray) – action;

  • reward (np.ndarray) – reward;

  • next_state (np.ndarray) – next state;

  • absorbing (np.ndarray) – absorbing flag.
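
A schematic sketch of the Speedy Q-Learning update as described in the cited paper (an interpretation, not the library code): the new estimate mixes the empirical Bellman operator applied to the current iterate Q and to the previous iterate Q_old, with the weight on the latter shrinking as the learning rate decreases.

    import numpy as np

    def speedy_q_learning_step(Q, Q_old, s, a, r, s_next, absorbing, alpha, gamma):
        """One tabular Speedy Q-Learning update; Q_old is the previous iterate."""
        if absorbing:
            tq = tq_old = r
        else:
            tq_old = r + gamma * np.max(Q_old[s_next])  # empirical Bellman op. on Q_old
            tq = r + gamma * np.max(Q[s_next])          # empirical Bellman op. on Q

        q_new = Q[s, a] + alpha * (tq_old - Q[s, a]) + (1.0 - alpha) * (tq - tq_old)

        Q_old[s, a] = Q[s, a]   # the current estimate becomes the "old" one
        Q[s, a] = q_new
        return Q, Q_old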

_add_save_attr(**attr_dict)

Add attributes that should be saved for an agent. For every attribute, it is necessary to specify the method to be used to save and load it. Available methods are: numpy, mushroom, torch, json, pickle, primitive and none. The primitive method can be used to store primitive attributes, while the none method always skips the attribute but ensures that it is initialized to None after loading. The mushroom method can be used with classes that implement the Serializable interface. All the other methods use the library of the same name. If a “!” character is appended to the method name, the field is saved only if full_save is set to True.

Parameters:

**attr_dict – dictionary of attributes mapped to the method that should be used to save and load them.

_agent_preprocess(state)

Applies all the agent’s preprocessors to the state.

Parameters:

state (Array) – the state where the agent is;

Returns:

The preprocessed state.

_post_load()

This method can be overwritten to implement logic that is executed after the loading of the agent.

_update_agent_preprocessor(state)

Updates the stats of all the agent’s preprocessors given the state.

Parameters:

state (Array) – the state where the agent is;

add_agent_preprocessor(preprocessor)

Add preprocessor to the agent’s preprocessor list. The preprocessors are applied in order.

Parameters:

preprocessor (object) – state preprocessors to be applied to state variables before feeding them to the agent.

add_core_preprocessor(preprocessor)

Add preprocessor to the core’s preprocessor list. The preprocessors are applied in order.

Parameters:

preprocessor (object) – state preprocessors to be applied to state variables before feeding them to the agent.

copy()
Returns:

A deepcopy of the agent.

property core_preprocessors

Access to core’s state preprocessors stored in the agent.

draw_action(state, policy_state=None)

Return the action to execute in the given state. It is the action returned by the policy or the action set by the algorithm (e.g. in the case of SARSA).

Parameters:
  • state – the state where the agent is;

  • policy_state – the policy internal state.

Returns:

The action to be executed.

episode_start(initial_state, episode_info)

Called by the Core when a new episode starts.

Parameters:
  • initial_state (Array) – vector representing the initial state of the environment.

  • episode_info (dict) – a dictionary containing the information at reset, such as context.

Returns:

A tuple containing the policy initial state and, optionally, the policy parameters

episode_start_vectorized(initial_states, episode_info, start_mask)

Called by the VectorCore when a new episode starts.

Parameters:
  • initial_states (Array) – the initial states of the environment.

  • episode_info (dict) – a dictionary containing the information at reset, such as context;

  • start_mask (Array) – boolean mask to select the environments that are starting a new episode

Returns:

A tuple containing the policy initial states and, optionally, the policy parameters

fit(dataset)

Fit step.

Parameters:

dataset (Dataset) – the dataset.

classmethod load(path)

Load and deserialize the agent from the given location on disk.

Parameters:

path (Path, string) – Relative or absolute path to the agent’s save location.

Returns:

The loaded agent.

save(path, full_save=False)

Serialize and save the object to the given path on disk.

Parameters:
  • path (Path, str) – Relative or absolute path to the object save location;

  • full_save (bool) – Flag to specify the amount of data to save for MushroomRL data structures.

save_zip(zip_file, full_save, folder='')

Serialize and save the agent to the given path on disk.

Parameters:
  • zip_file (ZipFile) – ZipFile where the object needs to be saved;

  • full_save (bool) – flag to specify the amount of data to save for MushroomRL data structures;

  • folder (string, '') – subfolder to be used by the save method.

set_logger(logger)

Setter that can be used to pass a logger to the algorithm

Parameters:

logger (Logger) – the logger to be used by the algorithm.

stop()

Method used to stop an agent. Useful when dealing with real-world environments or simulators, or to clean up environment internals after a Core learn/evaluate run in order to enforce consistency.

class RLearning(mdp_info, policy, learning_rate, beta)[source]

Bases: TD

R-Learning algorithm. “A Reinforcement Learning Method for Maximizing Undiscounted Rewards”. Schwartz A., 1993.

__init__(mdp_info, policy, learning_rate, beta)[source]

Constructor.

Parameters:

beta ([float, Parameter]) – beta coefficient.

_update(state, action, reward, next_state, absorbing)[source]

Update the Q-table.

Parameters:
  • state (np.ndarray) – state;

  • action (np.ndarray) – action;

  • reward (np.ndarray) – reward;

  • next_state (np.ndarray) – next state;

  • absorbing (np.ndarray) – absorbing flag.
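
A schematic sketch of the R-Learning update for the average-reward setting, which the beta coefficient controls (an interpretation of the cited paper, not the library code): Q stores relative action values and rho is the running estimate of the average reward, refined with step size beta whenever the updated value is the greedy one.

    import numpy as np

    def r_learning_step(Q, rho, s, a, r, s_next, absorbing, alpha, beta):
        """One tabular R-Learning update; returns the updated Q-table and rho."""
        q_next = 0.0 if absorbing else np.max(Q[s_next])

        # Relative value update (no discounting: rho is subtracted instead).
        Q[s, a] += alpha * (r - rho + q_next - Q[s, a])

        # Refine the average-reward estimate only when the updated value is greedy.
        if Q[s, a] == np.max(Q[s]):
            rho += beta * (r + q_next - np.max(Q[s]) - rho)
        return Q, rho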

_add_save_attr(**attr_dict)

Add attributes that should be saved for an agent. For every attribute, it is necessary to specify the method to be used to save and load it. Available methods are: numpy, mushroom, torch, json, pickle, primitive and none. The primitive method can be used to store primitive attributes, while the none method always skips the attribute but ensures that it is initialized to None after loading. The mushroom method can be used with classes that implement the Serializable interface. All the other methods use the library of the same name. If a “!” character is appended to the method name, the field is saved only if full_save is set to True.

Parameters:

**attr_dict – dictionary of attributes mapped to the method that should be used to save and load them.

_agent_preprocess(state)

Applies all the agent’s preprocessors to the state.

Parameters:

state (Array) – the state where the agent is;

Returns:

The preprocessed state.

_post_load()

This method can be overwritten to implement logic that is executed after the loading of the agent.

_update_agent_preprocessor(state)

Updates the stats of all the agent’s preprocessors given the state.

Parameters:

state (Array) – the state where the agent is;

add_agent_preprocessor(preprocessor)

Add preprocessor to the agent’s preprocessor list. The preprocessors are applied in order.

Parameters:

preprocessor (object) – state preprocessors to be applied to state variables before feeding them to the agent.

add_core_preprocessor(preprocessor)

Add preprocessor to the core’s preprocessor list. The preprocessors are applied in order.

Parameters:

preprocessor (object) – state preprocessors to be applied to state variables before feeding them to the agent.

copy()
Returns:

A deepcopy of the agent.

property core_preprocessors

Access to core’s state preprocessors stored in the agent.

draw_action(state, policy_state=None)

Return the action to execute in the given state. It is the action returned by the policy or the action set by the algorithm (e.g. in the case of SARSA).

Parameters:
  • state – the state where the agent is;

  • policy_state – the policy internal state.

Returns:

The action to be executed.

episode_start(initial_state, episode_info)

Called by the Core when a new episode starts.

Parameters:
  • initial_state (Array) – vector representing the initial state of the environment.

  • episode_info (dict) – a dictionary containing the information at reset, such as context.

Returns:

A tuple containing the policy initial state and, optionally, the policy parameters

episode_start_vectorized(initial_states, episode_info, start_mask)

Called by the VectorCore when a new episode starts.

Parameters:
  • initial_states (Array) – the initial states of the environment.

  • episode_info (dict) – a dictionary containing the information at reset, such as context;

  • start_mask (Array) – boolean mask to select the environments that are starting a new episode

Returns:

A tuple containing the policy initial states and, optionally, the policy parameters

fit(dataset)

Fit step.

Parameters:

dataset (Dataset) – the dataset.

classmethod load(path)

Load and deserialize the agent from the given location on disk.

Parameters:

path (Path, string) – Relative or absolute path to the agent’s save location.

Returns:

The loaded agent.

save(path, full_save=False)

Serialize and save the object to the given path on disk.

Parameters:
  • path (Path, str) – Relative or absolute path to the object save location;

  • full_save (bool) – Flag to specify the amount of data to save for MushroomRL data structures.

save_zip(zip_file, full_save, folder='')

Serialize and save the agent to the given path on disk.

Parameters:
  • zip_file (ZipFile) – ZipFile where the object needs to be saved;

  • full_save (bool) – flag to specify the amount of data to save for MushroomRL data structures;

  • folder (string, '') – subfolder to be used by the save method.

set_logger(logger)

Setter that can be used to pass a logger to the algorithm

Parameters:

logger (Logger) – the logger to be used by the algorithm.

stop()

Method used to stop an agent. Useful when dealing with real-world environments or simulators, or to clean up environment internals after a Core learn/evaluate run in order to enforce consistency.

class WeightedQLearning(mdp_info, policy, learning_rate, sampling=True, precision=1000)[source]

Bases: TD

Weighted Q-Learning algorithm. “Estimating the Maximum Expected Value through Gaussian Approximation”. D’Eramo C. et al., 2016.

__init__(mdp_info, policy, learning_rate, sampling=True, precision=1000)[source]

Constructor.

Parameters:
  • sampling (bool, True) – use the approximated version to speed up the computation;

  • precision (int, 1000) – number of samples to use in the approximated version.

_update(state, action, reward, next_state, absorbing)[source]

Update the Q-table.

Parameters:
  • state (np.ndarray) – state;

  • action (np.ndarray) – action;

  • reward (np.ndarray) – reward;

  • next_state (np.ndarray) – next state;

  • absorbing (np.ndarray) – absorbing flag.

_next_q(next_state)[source]
Parameters:

next_state (np.ndarray) – the state where next action has to be evaluated.

Returns:

The weighted estimator value in next_state.
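
A schematic sketch of the sampling-based weighted estimator that the sampling and precision parameters configure: the Q-estimate of each action in next_state is treated as a Gaussian (sample mean plus its standard deviation), the probability of each action being the maximiser is estimated with precision Monte Carlo draws, and the means are averaged with those probabilities as weights.

    import numpy as np

    def weighted_next_q(means, sigmas, precision=1000):
        """Monte Carlo weighted estimator of the maximum expected value.

        means[a] / sigmas[a]: sample mean and its standard deviation of Q(next_state, a).
        """
        n_actions = len(means)
        samples = np.random.normal(means, sigmas, size=(precision, n_actions))
        # w[a]: estimated probability that action a is the argmax.
        w = np.bincount(np.argmax(samples, axis=1), minlength=n_actions) / precision
        return w @ means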

_add_save_attr(**attr_dict)

Add attributes that should be saved for an agent. For every attribute, it is necessary to specify the method to be used to save and load it. Available methods are: numpy, mushroom, torch, json, pickle, primitive and none. The primitive method can be used to store primitive attributes, while the none method always skips the attribute but ensures that it is initialized to None after loading. The mushroom method can be used with classes that implement the Serializable interface. All the other methods use the library of the same name. If a “!” character is appended to the method name, the field is saved only if full_save is set to True.

Parameters:

**attr_dict – dictionary of attributes mapped to the method that should be used to save and load them.

_agent_preprocess(state)

Applies all the agent’s preprocessors to the state.

Parameters:

state (Array) – the state where the agent is;

Returns:

The preprocessed state.

_post_load()

This method can be overwritten to implement logic that is executed after the loading of the agent.

_update_agent_preprocessor(state)

Updates the stats of all the agent’s preprocessors given the state.

Parameters:

state (Array) – the state where the agent is;

add_agent_preprocessor(preprocessor)

Add preprocessor to the agent’s preprocessor list. The preprocessors are applied in order.

Parameters:

preprocessor (object) – state preprocessors to be applied to state variables before feeding them to the agent.

add_core_preprocessor(preprocessor)

Add preprocessor to the core’s preprocessor list. The preprocessors are applied in order.

Parameters:

preprocessor (object) – state preprocessors to be applied to state variables before feeding them to the agent.

copy()
Returns:

A deepcopy of the agent.

property core_preprocessors

Access to core’s state preprocessors stored in the agent.

draw_action(state, policy_state=None)

Return the action to execute in the given state. It is the action returned by the policy or the action set by the algorithm (e.g. in the case of SARSA).

Parameters:
  • state – the state where the agent is;

  • policy_state – the policy internal state.

Returns:

The action to be executed.

episode_start(initial_state, episode_info)

Called by the Core when a new episode starts.

Parameters:
  • initial_state (Array) – vector representing the initial state of the environment.

  • episode_info (dict) – a dictionary containing the information at reset, such as context.

Returns:

A tuple containing the policy initial state and, optionally, the policy parameters

episode_start_vectorized(initial_states, episode_info, start_mask)

Called by the VectorCore when a new episode starts.

Parameters:
  • initial_states (Array) – the initial states of the environment.

  • episode_info (dict) – a dictionary containing the information at reset, such as context;

  • start_mask (Array) – boolean mask to select the environments that are starting a new episode

Returns:

A tuple containing the policy initial states and, optionally, the policy parameters

fit(dataset)

Fit step.

Parameters:

dataset (Dataset) – the dataset.

classmethod load(path)

Load and deserialize the agent from the given location on disk.

Parameters:

path (Path, string) – Relative or absolute path to the agent’s save location.

Returns:

The loaded agent.

save(path, full_save=False)

Serialize and save the object to the given path on disk.

Parameters:
  • path (Path, str) – Relative or absolute path to the object save location;

  • full_save (bool) – Flag to specify the amount of data to save for MushroomRL data structures.

save_zip(zip_file, full_save, folder='')

Serialize and save the agent to the given path on disk.

Parameters:
  • zip_file (ZipFile) – ZipFile where the object needs to be saved;

  • full_save (bool) – flag to specify the amount of data to save for MushroomRL data structures;

  • folder (string, '') – subfolder to be used by the save method.

set_logger(logger)

Setter that can be used to pass a logger to the algorithm

Parameters:

logger (Logger) – the logger to be used by the algorithm.

stop()

Method used to stop an agent. Useful when dealing with real-world environments or simulators, or to clean up environment internals after a Core learn/evaluate run in order to enforce consistency.

class MaxminQLearning(mdp_info, policy, learning_rate, n_tables)[source]

Bases: TD

Maxmin Q-Learning algorithm without replay memory. “Maxmin Q-learning: Controlling the Estimation Bias of Q-learning”. Lan Q. et al., 2019.

__init__(mdp_info, policy, learning_rate, n_tables)[source]

Constructor.

Parameters:

n_tables (int) – number of tables in the ensemble.

_update(state, action, reward, next_state, absorbing)[source]

Update the Q-table.

Parameters:
  • state (np.ndarray) – state;

  • action (np.ndarray) – action;

  • reward (np.ndarray) – reward;

  • next_state (np.ndarray) – next state;

  • absorbing (np.ndarray) – absorbing flag.
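
A schematic sketch of the Maxmin Q-Learning update over an ensemble of n_tables Q-tables: the target takes the element-wise minimum across the ensemble before the max over actions, and only one randomly selected table is updated per transition.

    import numpy as np

    def maxmin_q_learning_step(Q_tables, s, a, r, s_next, absorbing, alpha, gamma):
        """One tabular Maxmin Q-Learning update; Q_tables is a list of Q arrays."""
        if absorbing:
            target = r
        else:
            # Pessimistic target: min across tables, then max over actions.
            q_min = np.min(np.stack([Q[s_next] for Q in Q_tables]), axis=0)
            target = r + gamma * np.max(q_min)

        k = np.random.randint(len(Q_tables))  # update a single, randomly chosen table
        Q_tables[k][s, a] += alpha * (target - Q_tables[k][s, a])
        return Q_tables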

_add_save_attr(**attr_dict)

Add attributes that should be saved for an agent. For every attribute, it is necessary to specify the method to be used to save and load it. Available methods are: numpy, mushroom, torch, json, pickle, primitive and none. The primitive method can be used to store primitive attributes, while the none method always skips the attribute but ensures that it is initialized to None after loading. The mushroom method can be used with classes that implement the Serializable interface. All the other methods use the library of the same name. If a “!” character is appended to the method name, the field is saved only if full_save is set to True.

Parameters:

**attr_dict – dictionary of attributes mapped to the method that should be used to save and load them.

_agent_preprocess(state)

Applies all the agent’s preprocessors to the state.

Parameters:

state (Array) – the state where the agent is;

Returns:

The preprocessed state.

_post_load()

This method can be overwritten to implement logic that is executed after the loading of the agent.

_update_agent_preprocessor(state)

Updates the stats of all the agent’s preprocessors given the state.

Parameters:

state (Array) – the state where the agent is;

add_agent_preprocessor(preprocessor)

Add preprocessor to the agent’s preprocessor list. The preprocessors are applied in order.

Parameters:

preprocessor (object) – state preprocessors to be applied to state variables before feeding them to the agent.

add_core_preprocessor(preprocessor)

Add preprocessor to the core’s preprocessor list. The preprocessors are applied in order.

Parameters:

preprocessor (object) – state preprocessors to be applied to state variables before feeding them to the agent.

copy()
Returns:

A deepcopy of the agent.

property core_preprocessors

Access to core’s state preprocessors stored in the agent.

draw_action(state, policy_state=None)

Return the action to execute in the given state. It is the action returned by the policy or the action set by the algorithm (e.g. in the case of SARSA).

Parameters:
  • state – the state where the agent is;

  • policy_state – the policy internal state.

Returns:

The action to be executed.

episode_start(initial_state, episode_info)

Called by the Core when a new episode starts.

Parameters:
  • initial_state (Array) – vector representing the initial state of the environment.

  • episode_info (dict) – a dictionary containing the information at reset, such as context.

Returns:

A tuple containing the policy initial state and, optionally, the policy parameters

episode_start_vectorized(initial_states, episode_info, start_mask)

Called by the VectorCore when a new episode starts.

Parameters:
  • initial_states (Array) – the initial states of the environment.

  • episode_info (dict) – a dictionary containing the information at reset, such as context;

  • start_mask (Array) – boolean mask to select the environments that are starting a new episode

Returns:

A tuple containing the policy initial states and, optionally, the policy parameters

fit(dataset)

Fit step.

Parameters:

dataset (Dataset) – the dataset.

classmethod load(path)

Load and deserialize the agent from the given location on disk.

Parameters:

path (Path, string) – Relative or absolute path to the agent’s save location.

Returns:

The loaded agent.

save(path, full_save=False)

Serialize and save the object to the given path on disk.

Parameters:
  • path (Path, str) – Relative or absolute path to the object save location;

  • full_save (bool) – Flag to specify the amount of data to save for MushroomRL data structures.

save_zip(zip_file, full_save, folder='')

Serialize and save the agent to the given path on disk.

Parameters:
  • zip_file (ZipFile) – ZipFile where the object needs to be saved;

  • full_save (bool) – flag to specify the amount of data to save for MushroomRL data structures;

  • folder (string, '') – subfolder to be used by the save method.

set_logger(logger)

Setter that can be used to pass a logger to the algorithm

Parameters:

logger (Logger) – the logger to be used by the algorithm.

stop()

Method used to stop an agent. Useful when dealing with real-world environments or simulators, or to clean up environment internals after a Core learn/evaluate run in order to enforce consistency.

class RQLearning(mdp_info, policy, learning_rate, off_policy=False, beta=None, delta=None)[source]

Bases: TD

RQ-Learning algorithm. “Exploiting Structure and Uncertainty of Bellman Updates in Markov Decision Processes”. Tateo D. et al., 2017.

__init__(mdp_info, policy, learning_rate, off_policy=False, beta=None, delta=None)[source]

Constructor.

Parameters:
  • off_policy (bool, False) – whether to use the off policy setting or the online one;

  • beta ([float, Parameter], None) – beta coefficient;

  • delta ([float, Parameter], None) – delta coefficient.
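
For orientation, a hypothetical instantiation sketch (the values are illustrative only; check the implementation for which combinations of beta and delta are accepted, and note that the import path of Parameter differs between releases):

    from mushroom_rl.algorithms.value import RQLearning
    from mushroom_rl.environments import GridWorld
    from mushroom_rl.policy import EpsGreedy
    from mushroom_rl.rl_utils.parameters import Parameter  # older releases: mushroom_rl.utils.parameters

    mdp = GridWorld(width=3, height=3, goal=(2, 2))
    policy = EpsGreedy(epsilon=Parameter(0.1))
    learning_rate = Parameter(0.2)

    # On-policy variant, driven by the delta coefficient...
    agent = RQLearning(mdp.info, policy, learning_rate, delta=Parameter(0.5))

    # ...or the off-policy variant, driven by the beta coefficient.
    agent_off = RQLearning(mdp.info, policy, learning_rate,
                           off_policy=True, beta=Parameter(0.5))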

_add_save_attr(**attr_dict)

Add attributes that should be saved for an agent. For every attribute, it is necessary to specify the method to be used to save and load it. Available methods are: numpy, mushroom, torch, json, pickle, primitive and none. The primitive method can be used to store primitive attributes, while the none method always skips the attribute but ensures that it is initialized to None after loading. The mushroom method can be used with classes that implement the Serializable interface. All the other methods use the library of the same name. If a “!” character is appended to the method name, the field is saved only if full_save is set to True.

Parameters:

**attr_dict – dictionary of attributes mapped to the method that should be used to save and load them.

_agent_preprocess(state)

Applies all the agent’s preprocessors to the state.

Parameters:

state (Array) – the state where the agent is;

Returns:

The preprocessed state.

_post_load()

This method can be overwritten to implement logic that is executed after the loading of the agent.

_update(state, action, reward, next_state, absorbing)[source]

Update the Q-table.

Parameters:
  • state (np.ndarray) – state;

  • action (np.ndarray) – action;

  • reward (np.ndarray) – reward;

  • next_state (np.ndarray) – next state;

  • absorbing (np.ndarray) – absorbing flag.

_update_agent_preprocessor(state)

Updates the stats of all the agent’s preprocessors given the state.

Parameters:

state (Array) – the state where the agent is;

add_agent_preprocessor(preprocessor)

Add preprocessor to the agent’s preprocessor list. The preprocessors are applied in order.

Parameters:

preprocessor (object) – state preprocessors to be applied to state variables before feeding them to the agent.

add_core_preprocessor(preprocessor)

Add preprocessor to the core’s preprocessor list. The preprocessors are applied in order.

Parameters:

preprocessor (object) – state preprocessors to be applied to state variables before feeding them to the agent.

copy()
Returns:

A deepcopy of the agent.

property core_preprocessors

Access to core’s state preprocessors stored in the agent.

draw_action(state, policy_state=None)

Return the action to execute in the given state. It is the action returned by the policy or the action set by the algorithm (e.g. in the case of SARSA).

Parameters:
  • state – the state where the agent is;

  • policy_state – the policy internal state.

Returns:

The action to be executed.

episode_start(initial_state, episode_info)

Called by the Core when a new episode starts.

Parameters:
  • initial_state (Array) – vector representing the initial state of the environment.

  • episode_info (dict) – a dictionary containing the information at reset, such as context.

Returns:

A tuple containing the policy initial state and, optionally, the policy parameters

episode_start_vectorized(initial_states, episode_info, start_mask)

Called by the VectorCore when a new episode starts.

Parameters:
  • initial_states (Array) – the initial states of the environment.

  • episode_info (dict) – a dictionary containing the information at reset, such as context;

  • start_mask (Array) – boolean mask to select the environments that are starting a new episode

Returns:

A tuple containing the policy initial states and, optionally, the policy parameters

fit(dataset)

Fit step.

Parameters:

dataset (Dataset) – the dataset.

classmethod load(path)

Load and deserialize the agent from the given location on disk.

Parameters:

path (Path, string) – Relative or absolute path to the agent’s save location.

Returns:

The loaded agent.

save(path, full_save=False)

Serialize and save the object to the given path on disk.

Parameters:
  • path (Path, str) – Relative or absolute path to the object save location;

  • full_save (bool) – Flag to specify the amount of data to save for MushroomRL data structures.

save_zip(zip_file, full_save, folder='')

Serialize and save the agent to the given path on disk.

Parameters:
  • zip_file (ZipFile) – ZipFile where the object needs to be saved;

  • full_save (bool) – flag to specify the amount of data to save for MushroomRL data structures;

  • folder (string, '') – subfolder to be used by the save method.

set_logger(logger)

Setter that can be used to pass a logger to the algorithm

Parameters:

logger (Logger) – the logger to be used by the algorithm.

stop()

Method used to stop an agent. Useful when dealing with real-world environments or simulators, or to clean up environment internals after a Core learn/evaluate run in order to enforce consistency.

_next_q(next_state)[source]
Parameters:

next_state (np.ndarray) – the state where next action has to be evaluated.

Returns:

The weighted estimator value in ‘next_state’.

class SARSALambdaContinuous(mdp_info, policy, approximator, learning_rate, lambda_coeff, approximator_params=None)[source]

Bases: TD

Continuous version of the SARSA(lambda) algorithm.

__init__(mdp_info, policy, approximator, learning_rate, lambda_coeff, approximator_params=None)[source]

Constructor.

Parameters:

lambda_coeff ([float, Parameter]) – eligibility trace coefficient.

_update(state, action, reward, next_state, absorbing)[source]

Update the Q-table.

Parameters:
  • state (np.ndarray) – state;

  • action (np.ndarray) – action;

  • reward (np.ndarray) – reward;

  • next_state (np.ndarray) – next state;

  • absorbing (np.ndarray) – absorbing flag.

episode_start(initial_state, episode_info)[source]

Called by the Core when a new episode starts.

Parameters:
  • initial_state (Array) – vector representing the initial state of the environment.

  • episode_info (dict) – a dictionary containing the information at reset, such as context.

Returns:

A tuple containing the policy initial state and, optionally, the policy parameters
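
A schematic sketch of a semi-gradient SARSA(lambda) step with a linear Q approximation, the kind of update this class performs in spirit (the actual code delegates to the chosen approximator): the eligibility trace now lives in weight space and accumulates the feature vector of each visited state-action pair.

    import numpy as np

    def sarsa_lambda_linear_step(w, e, phi_sa, phi_next_sa, r, absorbing,
                                 alpha, gamma, lambda_coeff):
        """One semi-gradient SARSA(lambda) step.

        w: weight vector, e: eligibility-trace vector,
        phi_sa / phi_next_sa: features of the current / next state-action pair.
        """
        q = w @ phi_sa
        q_next = 0.0 if absorbing else w @ phi_next_sa

        delta = r + gamma * q_next - q
        e = gamma * lambda_coeff * e + phi_sa   # accumulating trace on the gradient
        w = w + alpha * delta * e
        return w, e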

_add_save_attr(**attr_dict)

Add attributes that should be saved for an agent. For every attribute, it is necessary to specify the method to be used to save and load it. Available methods are: numpy, mushroom, torch, json, pickle, primitive and none. The primitive method can be used to store primitive attributes, while the none method always skips the attribute but ensures that it is initialized to None after loading. The mushroom method can be used with classes that implement the Serializable interface. All the other methods use the library of the same name. If a “!” character is appended to the method name, the field is saved only if full_save is set to True.

Parameters:

**attr_dict – dictionary of attributes mapped to the method that should be used to save and load them.

_agent_preprocess(state)

Applies all the agent’s preprocessors to the state.

Parameters:

state (Array) – the state where the agent is;

Returns:

The preprocessed state.

_post_load()

This method can be overwritten to implement logic that is executed after the loading of the agent.

_update_agent_preprocessor(state)

Updates the stats of all the agent’s preprocessors given the state.

Parameters:

state (Array) – the state where the agent is;

add_agent_preprocessor(preprocessor)

Add preprocessor to the agent’s preprocessor list. The preprocessors are applied in order.

Parameters:

preprocessor (object) – state preprocessors to be applied to state variables before feeding them to the agent.

add_core_preprocessor(preprocessor)

Add preprocessor to the core’s preprocessor list. The preprocessors are applied in order.

Parameters:

preprocessor (object) – state preprocessors to be applied to state variables before feeding them to the agent.

copy()
Returns:

A deepcopy of the agent.

property core_preprocessors

Access to core’s state preprocessors stored in the agent.

draw_action(state, policy_state=None)

Return the action to execute in the given state. It is the action returned by the policy or the action set by the algorithm (e.g. in the case of SARSA).

Parameters:
  • state – the state where the agent is;

  • policy_state – the policy internal state.

Returns:

The action to be executed.

episode_start_vectorized(initial_states, episode_info, start_mask)

Called by the VectorCore when a new episode starts.

Parameters:
  • initial_states (Array) – the initial states of the environment.

  • episode_info (dict) – a dictionary containing the information at reset, such as context;

  • start_mask (Array) – boolean mask to select the environments that are starting a new episode

Returns:

A tuple containing the policy initial states and, optionally, the policy parameters

fit(dataset)

Fit step.

Parameters:

dataset (Dataset) – the dataset.

classmethod load(path)

Load and deserialize the agent from the given location on disk.

Parameters:

path (Path, string) – Relative or absolute path to the agent's save location.

Returns:

The loaded agent.

save(path, full_save=False)

Serialize and save the object to the given path on disk.

Parameters:
  • path (Path, str) – Relative or absolute path to the object save location;

  • full_save (bool) – Flag to specify the amount of data to save for MushroomRL data structures.

save_zip(zip_file, full_save, folder='')

Serialize and save the agent to the given path on disk.

Parameters:
  • zip_file (ZipFile) – ZipFile where the object needs to be saved;

  • full_save (bool) – flag to specify the amount of data to save for MushroomRL data structures;

  • folder (string, '') – subfolder to be used by the save method.

set_logger(logger)

Setter that can be used to pass a logger to the algorithm.

Parameters:

logger (Logger) – the logger to be used by the algorithm.

stop()

Method used to stop an agent. Useful when dealing with real-world environments or simulators, or to clean up environment internals after a Core learn/evaluate run to enforce consistency.

class TrueOnlineSARSALambda(mdp_info, policy, learning_rate, lambda_coeff, approximator_params=None)[source]

Bases: TD

True Online SARSA(lambda) with linear function approximation. “True Online TD(lambda)”. Seijen H. V. et al., 2014.

__init__(mdp_info, policy, learning_rate, lambda_coeff, approximator_params=None)[source]

Constructor.

Parameters:

lambda_coeff ([float, Parameter]) – eligibility trace coefficient.
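
Example (illustrative sketch): a typical online training loop around this agent, assuming mdp, pi and alpha as in the sketch above; Core is assumed to live in mushroom_rl.core, which may vary across versions.

    from mushroom_rl.core import Core

    agent = TrueOnlineSARSALambda(mdp.info, pi, learning_rate=alpha, lambda_coeff=.9,
                                  approximator_params=dict(
                                      input_shape=mdp.info.observation_space.shape,
                                      output_shape=(mdp.info.action_space.n,),
                                      n_actions=mdp.info.action_space.n))

    core = Core(agent, mdp)
    core.learn(n_steps=20000, n_steps_per_fit=1)   # TD methods fit after every step
    dataset = core.evaluate(n_episodes=10)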

_update(state, action, reward, next_state, absorbing)[source]

Update the Q-function approximator.

Parameters:
  • state (np.ndarray) – state;

  • action (np.ndarray) – action;

  • reward (np.ndarray) – reward;

  • next_state (np.ndarray) – next state;

  • absorbing (np.ndarray) – absorbing flag.

episode_start(initial_state, episode_info)[source]

Called by the Core when a new episode starts.

Parameters:
  • initial_state (Array) – vector representing the initial state of the environment.

  • episode_info (dict) – a dictionary containing the information at reset, such as context.

Returns:

A tuple containing the policy initial state and, optionally, the policy parameters

_add_save_attr(**attr_dict)

Add attributes that should be saved for an agent. For every attribute, it is necessary to specify the method to be used to save and load it. Available methods are: numpy, mushroom, torch, json, pickle, primitive and none. The primitive method can be used to store primitive attributes, while the none method always skips the attribute, but ensures that it is initialized to None after the load. The mushroom method can be used with classes that implement the Serializable interface. All the other methods use the library of the same name. If a “!” character is added at the end of the method, the field will be saved only if full_save is set to True.

Parameters:

**attr_dict – dictionary of attributes mapped to the method that should be used to save and load them.

_agent_preprocess(state)

Applies all the agent’s preprocessors to the state.

Parameters:

state (Array) – the state where the agent is;

Returns:

The preprocessed state.

_post_load()

This method can be overwritten to implement logic that is executed after the loading of the agent.

_update_agent_preprocessor(state)

Updates the stats of all the agent’s preprocessors given the state.

Parameters:

state (Array) – the state where the agent is;

add_agent_preprocessor(preprocessor)

Add preprocessor to the agent’s preprocessor list. The preprocessors are applied in order.

Parameters:

preprocessor (object) – state preprocessors to be applied to state variables before feeding them to the agent.

add_core_preprocessor(preprocessor)

Add preprocessor to the core’s preprocessor list. The preprocessors are applied in order.

Parameters:

preprocessor (object) – state preprocessors to be applied to state variables before feeding them to the agent.

copy()
Returns:

A deepcopy of the agent.

property core_preprocessors

Access to core’s state preprocessors stored in the agent.

draw_action(state, policy_state=None)

Return the action to execute in the given state. It is the action returned by the policy or the action set by the algorithm (e.g. in the case of SARSA).

Parameters:
  • state – the state where the agent is;

  • policy_state – the policy internal state.

Returns:

The action to be executed.

episode_start_vectorized(initial_states, episode_info, start_mask)

Called by the VectorCore when a new episode starts.

Parameters:
  • initial_states (Array) – the initial states of the environment.

  • episode_info (dict) – a dictionary containing the information at reset, such as context;

  • start_mask (Array) – boolean mask to select the environments that are starting a new episode

Returns:

A tuple containing the policy initial states and, optionally, the policy parameters

fit(dataset)

Fit step.

Parameters:

dataset (Dataset) – the dataset.

classmethod load(path)

Load and deserialize the agent from the given location on disk.

Parameters:

path (Path, string) – Relative or absolute path to the agent's save location.

Returns:

The loaded agent.

save(path, full_save=False)

Serialize and save the object to the given path on disk.

Parameters:
  • path (Path, str) – Relative or absolute path to the object save location;

  • full_save (bool) – Flag to specify the amount of data to save for MushroomRL data structures.

save_zip(zip_file, full_save, folder='')

Serialize and save the agent to the given path on disk.

Parameters:
  • zip_file (ZipFile) – ZipFile where the object needs to be saved;

  • full_save (bool) – flag to specify the amount of data to save for MushroomRL data structures;

  • folder (string, '') – subfolder to be used by the save method.

set_logger(logger)

Setter that can be used to pass a logger to the algorithm.

Parameters:

logger (Logger) – the logger to be used by the algorithm.

stop()

Method used to stop an agent. Useful when dealing with real-world environments or simulators, or to clean up environment internals after a Core learn/evaluate run to enforce consistency.

Batch TD

class FQI(mdp_info, policy, approximator, n_iterations, approximator_params=None, fit_params=None, quiet=False)[source]

Bases: BatchTD

Fitted Q-Iteration algorithm. “Tree-Based Batch Mode Reinforcement Learning”. Ernst D. et al., 2005.

__init__(mdp_info, policy, approximator, n_iterations, approximator_params=None, fit_params=None, quiet=False)[source]

Constructor.

Parameters:
  • n_iterations ([int, Parameter]) – number of iterations to perform for training;

  • quiet (bool, False) – whether to suppress the progress bar or not.
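
Example (illustrative sketch): since FQI is a batch algorithm, a whole dataset is typically collected first and then fitted in a single call. The snippet assumes mdp and pi exist and that scikit-learn is available; it mirrors common usage with an extra-trees regressor, but is not taken from the library documentation.

    from sklearn.ensemble import ExtraTreesRegressor
    from mushroom_rl.core import Core

    approximator_params = dict(input_shape=mdp.info.observation_space.shape,
                               n_actions=mdp.info.action_space.n,
                               n_estimators=50, min_samples_split=5, min_samples_leaf=2)

    agent = FQI(mdp.info, pi, ExtraTreesRegressor, n_iterations=20,
                approximator_params=approximator_params)

    core = Core(agent, mdp)
    # Collect all episodes first, then run the n_iterations of fitted Q-iteration once.
    core.learn(n_episodes=1000, n_episodes_per_fit=1000)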

fit(dataset)[source]

Fit step.

Parameters:

dataset (Dataset) – the dataset.

_add_save_attr(**attr_dict)

Add attributes that should be saved for an agent. For every attribute, it is necessary to specify the method to be used to save and load it. Available methods are: numpy, mushroom, torch, json, pickle, primitive and none. The primitive method can be used to store primitive attributes, while the none method always skips the attribute, but ensures that it is initialized to None after the load. The mushroom method can be used with classes that implement the Serializable interface. All the other methods use the library of the same name. If a “!” character is added at the end of the method, the field will be saved only if full_save is set to True.

Parameters:

**attr_dict – dictionary of attributes mapped to the method that should be used to save and load them.

_agent_preprocess(state)

Applies all the agent’s preprocessors to the state.

Parameters:

state (Array) – the state where the agent is;

Returns:

The preprocessed state.

_post_load()

This method can be overwritten to implement logic that is executed after the loading of the agent.

_update_agent_preprocessor(state)

Updates the stats of all the agent’s preprocessors given the state.

Parameters:

state (Array) – the state where the agent is;

add_agent_preprocessor(preprocessor)

Add preprocessor to the agent’s preprocessor list. The preprocessors are applied in order.

Parameters:

preprocessor (object) – state preprocessors to be applied to state variables before feeding them to the agent.

add_core_preprocessor(preprocessor)

Add preprocessor to the core’s preprocessor list. The preprocessors are applied in order.

Parameters:

preprocessor (object) – state preprocessors to be applied to state variables before feeding them to the agent.

copy()
Returns:

A deepcopy of the agent.

property core_preprocessors

Access to core’s state preprocessors stored in the agent.

draw_action(state, policy_state=None)

Return the action to execute in the given state. It is the action returned by the policy or the action set by the algorithm (e.g. in the case of SARSA).

Parameters:
  • state – the state where the agent is;

  • policy_state – the policy internal state.

Returns:

The action to be executed.

episode_start(initial_state, episode_info)

Called by the Core when a new episode starts.

Parameters:
  • initial_state (Array) – vector representing the initial state of the environment.

  • episode_info (dict) – a dictionary containing the information at reset, such as context.

Returns:

A tuple containing the policy initial state and, optionally, the policy parameters

episode_start_vectorized(initial_states, episode_info, start_mask)

Called by the VectorCore when a new episode starts.

Parameters:
  • initial_states (Array) – the initial states of the environment.

  • episode_info (dict) – a dictionary containing the information at reset, such as context;

  • start_mask (Array) – boolean mask to select the environments that are starting a new episode

Returns:

A tuple containing the policy initial states and, optionally, the policy parameters

classmethod load(path)

Load and deserialize the agent from the given location on disk.

Parameters:

path (Path, string) – Relative or absolute path to the agent's save location.

Returns:

The loaded agent.

save(path, full_save=False)

Serialize and save the object to the given path on disk.

Parameters:
  • path (Path, str) – Relative or absolute path to the object save location;

  • full_save (bool) – Flag to specify the amount of data to save for MushroomRL data structures.

save_zip(zip_file, full_save, folder='')

Serialize and save the agent to the given path on disk.

Parameters:
  • zip_file (ZipFile) – ZipFile where the object needs to be saved;

  • full_save (bool) – flag to specify the amount of data to save for MushroomRL data structures;

  • folder (string, '') – subfolder to be used by the save method.

set_logger(logger)

Setter that can be used to pass a logger to the algorithm.

Parameters:

logger (Logger) – the logger to be used by the algorithm.

stop()

Method used to stop an agent. Useful when dealing with real-world environments or simulators, or to clean up environment internals after a Core learn/evaluate run to enforce consistency.

class DoubleFQI(mdp_info, policy, approximator, n_iterations, approximator_params=None, fit_params=None, quiet=False)[source]

Bases: FQI

Double Fitted Q-Iteration algorithm. “Estimating the Maximum Expected Value in Continuous Reinforcement Learning Problems”. D’Eramo C. et al., 2017.

__init__(mdp_info, policy, approximator, n_iterations, approximator_params=None, fit_params=None, quiet=False)[source]

Constructor.

Parameters:
  • n_iterations ([int, Parameter]) – number of iterations to perform for training;

  • quiet (bool, False) – whether to suppress the progress bar or not.

fit(dataset)[source]

Fit step.

Parameters:

dataset (Dataset) – the dataset.

_add_save_attr(**attr_dict)

Add attributes that should be saved for an agent. For every attribute, it is necessary to specify the method to be used to save and load it. Available methods are: numpy, mushroom, torch, json, pickle, primitive and none. The primitive method can be used to store primitive attributes, while the none method always skips the attribute, but ensures that it is initialized to None after the load. The mushroom method can be used with classes that implement the Serializable interface. All the other methods use the library of the same name. If a “!” character is added at the end of the method, the field will be saved only if full_save is set to True.

Parameters:

**attr_dict – dictionary of attributes mapped to the method that should be used to save and load them.

_agent_preprocess(state)

Applies all the agent’s preprocessors to the state.

Parameters:

state (Array) – the state where the agent is;

Returns:

The preprocessed state.

_post_load()

This method can be overwritten to implement logic that is executed after the loading of the agent.

_update_agent_preprocessor(state)

Updates the stats of all the agent’s preprocessors given the state.

Parameters:

state (Array) – the state where the agent is;

add_agent_preprocessor(preprocessor)

Add preprocessor to the agent’s preprocessor list. The preprocessors are applied in order.

Parameters:

preprocessor (object) – state preprocessors to be applied to state variables before feeding them to the agent.

add_core_preprocessor(preprocessor)

Add preprocessor to the core’s preprocessor list. The preprocessors are applied in order.

Parameters:

preprocessor (object) – state preprocessors to be applied to state variables before feeding them to the agent.

copy()
Returns:

A deepcopy of the agent.

property core_preprocessors

Access to core’s state preprocessors stored in the agent.

draw_action(state, policy_state=None)

Return the action to execute in the given state. It is the action returned by the policy or the action set by the algorithm (e.g. in the case of SARSA).

Parameters:
  • state – the state where the agent is;

  • policy_state – the policy internal state.

Returns:

The action to be executed.

episode_start(initial_state, episode_info)

Called by the Core when a new episode starts.

Parameters:
  • initial_state (Array) – vector representing the initial state of the environment.

  • episode_info (dict) – a dictionary containing the information at reset, such as context.

Returns:

A tuple containing the policy initial state and, optionally, the policy parameters

episode_start_vectorized(initial_states, episode_info, start_mask)

Called by the VectorCore when a new episode starts.

Parameters:
  • initial_states (Array) – the initial states of the environment.

  • episode_info (dict) – a dictionary containing the information at reset, such as context;

  • start_mask (Array) – boolean mask to select the environments that are starting a new episode

Returns:

A tuple containing the policy initial states and, optionally, the policy parameters

classmethod load(path)

Load and deserialize the agent from the given location on disk.

Parameters:

path (Path, string) – Relative or absolute path to the agent's save location.

Returns:

The loaded agent.

save(path, full_save=False)

Serialize and save the object to the given path on disk.

Parameters:
  • path (Path, str) – Relative or absolute path to the object save location;

  • full_save (bool) – Flag to specify the amount of data to save for MushroomRL data structures.

save_zip(zip_file, full_save, folder='')

Serialize and save the agent to the given path on disk.

Parameters:
  • zip_file (ZipFile) – ZipFile where the object needs to be saved;

  • full_save (bool) – flag to specify the amount of data to save for MushroomRL data structures;

  • folder (string, '') – subfolder to be used by the save method.

set_logger(logger)

Setter that can be used to pass a logger to the algorithm.

Parameters:

logger (Logger) – the logger to be used by the algorithm.

stop()

Method used to stop an agent. Useful when dealing with real-world environments or simulators, or to clean up environment internals after a Core learn/evaluate run to enforce consistency.

class BoostedFQI(mdp_info, policy, approximator, n_iterations, approximator_params=None, fit_params=None, quiet=False)[source]

Bases: FQI

Boosted Fitted Q-Iteration algorithm. “Boosted Fitted Q-Iteration”. Tosatto S. et al., 2017.

__init__(mdp_info, policy, approximator, n_iterations, approximator_params=None, fit_params=None, quiet=False)[source]

Constructor.

Parameters:
  • n_iterations ([int, Parameter]) – number of iterations to perform for training;

  • quiet (bool, False) – whether to suppress the progress bar or not.

fit(dataset)[source]

Fit step.

Parameters:

dataset (Dataset) – the dataset.

_add_save_attr(**attr_dict)

Add attributes that should be saved for an agent. For every attribute, it is necessary to specify the method to be used to save and load it. Available methods are: numpy, mushroom, torch, json, pickle, primitive and none. The primitive method can be used to store primitive attributes, while the none method always skips the attribute, but ensures that it is initialized to None after the load. The mushroom method can be used with classes that implement the Serializable interface. All the other methods use the library of the same name. If a “!” character is added at the end of the method, the field will be saved only if full_save is set to True.

Parameters:

**attr_dict – dictionary of attributes mapped to the method that should be used to save and load them.

_agent_preprocess(state)

Applies all the agent’s preprocessors to the state.

Parameters:

state (Array) – the state where the agent is;

Returns:

The preprocessed state.

_post_load()

This method can be overwritten to implement logic that is executed after the loading of the agent.

_update_agent_preprocessor(state)

Updates the stats of all the agent’s preprocessors given the state.

Parameters:

state (Array) – the state where the agent is;

add_agent_preprocessor(preprocessor)

Add preprocessor to the agent’s preprocessor list. The preprocessors are applied in order.

Parameters:

preprocessor (object) – state preprocessors to be applied to state variables before feeding them to the agent.

add_core_preprocessor(preprocessor)

Add preprocessor to the core’s preprocessor list. The preprocessors are applied in order.

Parameters:

preprocessor (object) – state preprocessors to be applied to state variables before feeding them to the agent.

copy()
Returns:

A deepcopy of the agent.

property core_preprocessors

Access to core’s state preprocessors stored in the agent.

draw_action(state, policy_state=None)

Return the action to execute in the given state. It is the action returned by the policy or the action set by the algorithm (e.g. in the case of SARSA).

Parameters:
  • state – the state where the agent is;

  • policy_state – the policy internal state.

Returns:

The action to be executed.

episode_start(initial_state, episode_info)

Called by the Core when a new episode starts.

Parameters:
  • initial_state (Array) – vector representing the initial state of the environment.

  • episode_info (dict) – a dictionary containing the information at reset, such as context.

Returns:

A tuple containing the policy initial state and, optionally, the policy parameters

episode_start_vectorized(initial_states, episode_info, start_mask)

Called by the VectorCore when a new episode starts.

Parameters:
  • initial_states (Array) – the initial states of the environment.

  • episode_info (dict) – a dictionary containing the information at reset, such as context;

  • start_mask (Array) – boolean mask to select the environments that are starting a new episode

Returns:

A tuple containing the policy initial states and, optionally, the policy parameters

classmethod load(path)

Load and deserialize the agent from the given location on disk.

Parameters:

path (Path, string) – Relative or absolute path to the agent's save location.

Returns:

The loaded agent.

save(path, full_save=False)

Serialize and save the object to the given path on disk.

Parameters:
  • path (Path, str) – Relative or absolute path to the object save location;

  • full_save (bool) – Flag to specify the amount of data to save for MushroomRL data structures.

save_zip(zip_file, full_save, folder='')

Serialize and save the agent to the given path on disk.

Parameters:
  • zip_file (ZipFile) – ZipFile where the object needs to be saved;

  • full_save (bool) – flag to specify the amount of data to save for MushroomRL data structures;

  • folder (string, '') – subfolder to be used by the save method.

set_logger(logger)

Setter that can be used to pass a logger to the algorithm.

Parameters:

logger (Logger) – the logger to be used by the algorithm.

stop()

Method used to stop an agent. Useful when dealing with real-world environments or simulators, or to clean up environment internals after a Core learn/evaluate run to enforce consistency.

class LSPI(mdp_info, policy, approximator_params=None, epsilon=0.01, fit_params=None)[source]

Bases: BatchTD

Least-Squares Policy Iteration algorithm. “Least-Squares Policy Iteration”. Lagoudakis M. G. and Parr R., 2003.

__init__(mdp_info, policy, approximator_params=None, epsilon=0.01, fit_params=None)[source]

Constructor.

Parameters:

epsilon ([float, Parameter], 1e-2) – termination coefficient.
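
Example (illustrative sketch): LSPI builds a linear Q-function internally, so only the approximator parameters and the termination coefficient need to be supplied. The parameter names below follow the generic approximator interface and are assumptions; the basis features appropriate for a given environment may require additional setup.

    approximator_params = dict(input_shape=mdp.info.observation_space.shape,
                               output_shape=(mdp.info.action_space.n,),
                               n_actions=mdp.info.action_space.n)

    agent = LSPI(mdp.info, pi, approximator_params=approximator_params, epsilon=1e-3)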

fit(dataset)[source]

Fit step.

Parameters:

dataset (Dataset) – the dataset.

_add_save_attr(**attr_dict)

Add attributes that should be saved for an agent. For every attribute, it is necessary to specify the method to be used to save and load it. Available methods are: numpy, mushroom, torch, json, pickle, primitive and none. The primitive method can be used to store primitive attributes, while the none method always skips the attribute, but ensures that it is initialized to None after the load. The mushroom method can be used with classes that implement the Serializable interface. All the other methods use the library of the same name. If a “!” character is added at the end of the method, the field will be saved only if full_save is set to True.

Parameters:

**attr_dict – dictionary of attributes mapped to the method that should be used to save and load them.

_agent_preprocess(state)

Applies all the agent’s preprocessors to the state.

Parameters:

state (Array) – the state where the agent is;

Returns:

The preprocessed state.

_post_load()

This method can be overwritten to implement logic that is executed after the loading of the agent.

_update_agent_preprocessor(state)

Updates the stats of all the agent’s preprocessors given the state.

Parameters:

state (Array) – the state where the agent is;

add_agent_preprocessor(preprocessor)

Add preprocessor to the agent’s preprocessor list. The preprocessors are applied in order.

Parameters:

preprocessor (object) – state preprocessors to be applied to state variables before feeding them to the agent.

add_core_preprocessor(preprocessor)

Add preprocessor to the core’s preprocessor list. The preprocessors are applied in order.

Parameters:

preprocessor (object) – state preprocessors to be applied to state variables before feeding them to the agent.

copy()
Returns:

A deepcopy of the agent.

property core_preprocessors

Access to core’s state preprocessors stored in the agent.

draw_action(state, policy_state=None)

Return the action to execute in the given state. It is the action returned by the policy or the action set by the algorithm (e.g. in the case of SARSA).

Parameters:
  • state – the state where the agent is;

  • policy_state – the policy internal state.

Returns:

The action to be executed.

episode_start(initial_state, episode_info)

Called by the Core when a new episode starts.

Parameters:
  • initial_state (Array) – vector representing the initial state of the environment.

  • episode_info (dict) – a dictionary containing the information at reset, such as context.

Returns:

A tuple containing the policy initial state and, optionally, the policy parameters

episode_start_vectorized(initial_states, episode_info, start_mask)

Called by the VectorCore when a new episode starts.

Parameters:
  • initial_states (Array) – the initial states of the environment.

  • episode_info (dict) – a dictionary containing the information at reset, such as context;

  • start_mask (Array) – boolean mask to select the environments that are starting a new episode

Returns:

A tuple containing the policy initial states and, optionally, the policy parameters

classmethod load(path)

Load and deserialize the agent from the given location on disk.

Parameters:

path (Path, string) – Relative or absolute path to the agent's save location.

Returns:

The loaded agent.

save(path, full_save=False)

Serialize and save the object to the given path on disk.

Parameters:
  • path (Path, str) – Relative or absolute path to the object save location;

  • full_save (bool) – Flag to specify the amount of data to save for MushroomRL data structures.

save_zip(zip_file, full_save, folder='')

Serialize and save the agent to the given path on disk.

Parameters:
  • zip_file (ZipFile) – ZipFile where the object needs to be saved;

  • full_save (bool) – flag to specify the amount of data to save for MushroomRL data structures;

  • folder (string, '') – subfolder to be used by the save method.

set_logger(logger)

Setter that can be used to pass a logger to the algorithm.

Parameters:

logger (Logger) – the logger to be used by the algorithm.

stop()

Method used to stop an agent. Useful when dealing with real-world environments or simulators, or to clean up environment internals after a Core learn/evaluate run to enforce consistency.

DQN

class AbstractDQN(mdp_info, policy, approximator, approximator_params, batch_size, target_update_frequency, replay_memory=None, initial_replay_size=500, max_replay_size=5000, fit_params=None, predict_params=None, clip_reward=False)[source]

Bases: Agent

__init__(mdp_info, policy, approximator, approximator_params, batch_size, target_update_frequency, replay_memory=None, initial_replay_size=500, max_replay_size=5000, fit_params=None, predict_params=None, clip_reward=False)[source]

Constructor.

Parameters:
  • approximator (object) – the approximator to use to fit the Q-function;

  • approximator_params (dict) – parameters of the approximator to build;

  • batch_size ([int, Parameter]) – the number of samples in a batch;

  • target_update_frequency (int) – the number of samples collected between each update of the target network;

  • replay_memory ([ReplayMemory, PrioritizedReplayMemory], None) – the object of the replay memory to use; if None, a default replay memory is created;

  • initial_replay_size (int) – the number of samples to collect before starting the learning;

  • max_replay_size (int) – the maximum number of samples in the replay memory;

  • fit_params (dict, None) – parameters of the fitting algorithm of the approximator;

  • predict_params (dict, None) – parameters for the prediction with the approximator;

  • clip_reward (bool, False) – whether to clip the reward or not.
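
Example (illustrative sketch): the snippet below shows how these constructor arguments typically fit together for the concrete DQN subclass documented later in this section: a Torch network wrapped in TorchApproximator, a replay memory warmed up for initial_replay_size samples, and a target network refreshed every target_update_frequency fits. The network contract (a constructor taking input_shape and output_shape, a forward taking state and an optional action) and the import paths reflect common MushroomRL usage and may differ across versions; mdp and pi are assumed to exist.

    import torch.nn as nn
    import torch.nn.functional as F
    import torch.optim as optim

    from mushroom_rl.approximators.parametric import TorchApproximator


    class Network(nn.Module):
        def __init__(self, input_shape, output_shape, n_features=128, **kwargs):
            super().__init__()
            self._h1 = nn.Linear(input_shape[-1], n_features)
            self._h2 = nn.Linear(n_features, output_shape[0])

        def forward(self, state, action=None):
            q = self._h2(F.relu(self._h1(state.float())))
            if action is None:
                return q                                   # Q-values for every action
            return q.gather(1, action.long()).squeeze(1)   # Q-value of the taken action


    approximator_params = dict(
        network=Network,
        input_shape=mdp.info.observation_space.shape,
        output_shape=(mdp.info.action_space.n,),
        n_actions=mdp.info.action_space.n,
        optimizer={'class': optim.Adam, 'params': {'lr': 1e-3}},
        loss=F.smooth_l1_loss)

    agent = DQN(mdp.info, pi, TorchApproximator,
                approximator_params=approximator_params,
                batch_size=32, target_update_frequency=250,
                initial_replay_size=500, max_replay_size=5000)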

fit(dataset)[source]

Fit step.

Parameters:

dataset (Dataset) – the dataset.

_update_target()[source]

Update the target network.

_add_save_attr(**attr_dict)

Add attributes that should be saved for an agent. For every attribute, it is necessary to specify the method to be used to save and load it. Available methods are: numpy, mushroom, torch, json, pickle, primitive and none. The primitive method can be used to store primitive attributes, while the none method always skips the attribute, but ensures that it is initialized to None after the load. The mushroom method can be used with classes that implement the Serializable interface. All the other methods use the library of the same name. If a “!” character is added at the end of the method, the field will be saved only if full_save is set to True.

Parameters:

**attr_dict – dictionary of attributes mapped to the method that should be used to save and load them.

_agent_preprocess(state)

Applies all the agent’s preprocessors to the state.

Parameters:

state (Array) – the state where the agent is;

Returns:

The preprocessed state.

_next_q(next_state, absorbing)[source]
Parameters:
  • next_state (np.ndarray) – the states where next action has to be evaluated;

  • absorbing (np.ndarray) – the absorbing flag for the states in next_state.

Returns:

Maximum action-value for each state in next_state.

_update_agent_preprocessor(state)

Updates the stats of all the agent’s preprocessors given the state.

Parameters:

state (Array) – the state where the agent is;

add_agent_preprocessor(preprocessor)

Add preprocessor to the agent’s preprocessor list. The preprocessors are applied in order.

Parameters:

preprocessor (object) – state preprocessors to be applied to state variables before feeding them to the agent.

add_core_preprocessor(preprocessor)

Add preprocessor to the core’s preprocessor list. The preprocessors are applied in order.

Parameters:

preprocessor (object) – state preprocessors to be applied to state variables before feeding them to the agent.

copy()
Returns:

A deepcopy of the agent.

property core_preprocessors

Access to core’s state preprocessors stored in the agent.

draw_action(state, policy_state=None)

Return the action to execute in the given state. It is the action returned by the policy or the action set by the algorithm (e.g. in the case of SARSA).

Parameters:
  • state – the state where the agent is;

  • policy_state – the policy internal state.

Returns:

The action to be executed.

episode_start(initial_state, episode_info)

Called by the Core when a new episode starts.

Parameters:
  • initial_state (Array) – vector representing the initial state of the environment.

  • episode_info (dict) – a dictionary containing the information at reset, such as context.

Returns:

A tuple containing the policy initial state and, optionally, the policy parameters

episode_start_vectorized(initial_states, episode_info, start_mask)

Called by the VectorCore when a new episode starts.

Parameters:
  • initial_states (Array) – the initial states of the environment.

  • episode_info (dict) – a dictionary containing the information at reset, such as context;

  • start_mask (Array) – boolean mask to select the environments that are starting a new episode

Returns:

A tuple containing the policy initial states and, optionally, the policy parameters

classmethod load(path)

Load and deserialize the agent from the given location on disk.

Parameters:

path (Path, string) – Relative or absolute path to the agent's save location.

Returns:

The loaded agent.

save(path, full_save=False)

Serialize and save the object to the given path on disk.

Parameters:
  • path (Path, str) – Relative or absolute path to the object save location;

  • full_save (bool) – Flag to specify the amount of data to save for MushroomRL data structures.

save_zip(zip_file, full_save, folder='')

Serialize and save the agent to the given path on disk.

Parameters:
  • zip_file (ZipFile) – ZipFile where the object needs to be saved;

  • full_save (bool) – flag to specify the amount of data to save for MushroomRL data structures;

  • folder (string, '') – subfolder to be used by the save method.

stop()

Method used to stop an agent. Useful when dealing with real-world environments or simulators, or to clean up environment internals after a Core learn/evaluate run to enforce consistency.

_post_load()[source]

This method can be overwritten to implement logic that is executed after the loading of the agent.

set_logger(logger, loss_filename='loss_Q')[source]

Setter that can be used to pass a logger to the algorithm.

Parameters:
  • logger (Logger) – the logger to be used by the algorithm;

  • loss_filename (str, 'loss_Q') – optional string to specify the loss filename.

class DQN(mdp_info, policy, approximator, approximator_params, batch_size, target_update_frequency, replay_memory=None, initial_replay_size=500, max_replay_size=5000, fit_params=None, predict_params=None, clip_reward=False)[source]

Bases: AbstractDQN

Deep Q-Network algorithm. “Human-Level Control Through Deep Reinforcement Learning”. Mnih V. et al., 2015.

_next_q(next_state, absorbing)[source]
Parameters:
  • next_state (np.ndarray) – the states where next action has to be evaluated;

  • absorbing (np.ndarray) – the absorbing flag for the states in next_state.

Returns:

Maximum action-value for each state in next_state.

__init__(mdp_info, policy, approximator, approximator_params, batch_size, target_update_frequency, replay_memory=None, initial_replay_size=500, max_replay_size=5000, fit_params=None, predict_params=None, clip_reward=False)

Constructor.

Parameters:
  • approximator (object) – the approximator to use to fit the Q-function;

  • approximator_params (dict) – parameters of the approximator to build;

  • batch_size ([int, Parameter]) – the number of samples in a batch;

  • target_update_frequency (int) – the number of samples collected between each update of the target network;

  • replay_memory ([ReplayMemory, PrioritizedReplayMemory], None) – the object of the replay memory to use; if None, a default replay memory is created;

  • initial_replay_size (int) – the number of samples to collect before starting the learning;

  • max_replay_size (int) – the maximum number of samples in the replay memory;

  • fit_params (dict, None) – parameters of the fitting algorithm of the approximator;

  • predict_params (dict, None) – parameters for the prediction with the approximator;

  • clip_reward (bool, False) – whether to clip the reward or not.

_add_save_attr(**attr_dict)

Add attributes that should be saved for an agent. For every attribute, it is necessary to specify the method to be used to save and load it. Available methods are: numpy, mushroom, torch, json, pickle, primitive and none. The primitive method can be used to store primitive attributes, while the none method always skips the attribute, but ensures that it is initialized to None after the load. The mushroom method can be used with classes that implement the Serializable interface. All the other methods use the library of the same name. If a “!” character is added at the end of the method, the field will be saved only if full_save is set to True.

Parameters:

**attr_dict – dictionary of attributes mapped to the method that should be used to save and load them.

_agent_preprocess(state)

Applies all the agent’s preprocessors to the state.

Parameters:

state (Array) – the state where the agent is;

Returns:

The preprocessed state.

_post_load()

This method can be overwritten to implement logic that is executed after the loading of the agent.

_update_agent_preprocessor(state)

Updates the stats of all the agent’s preprocessors given the state.

Parameters:

state (Array) – the state where the agent is;

_update_target()

Update the target network.

add_agent_preprocessor(preprocessor)

Add preprocessor to the agent’s preprocessor list. The preprocessors are applied in order.

Parameters:

preprocessor (object) – state preprocessors to be applied to state variables before feeding them to the agent.

add_core_preprocessor(preprocessor)

Add preprocessor to the core’s preprocessor list. The preprocessors are applied in order.

Parameters:

preprocessor (object) – state preprocessors to be applied to state variables before feeding them to the agent.

copy()
Returns:

A deepcopy of the agent.

property core_preprocessors

Access to core’s state preprocessors stored in the agent.

draw_action(state, policy_state=None)

Return the action to execute in the given state. It is the action returned by the policy or the action set by the algorithm (e.g. in the case of SARSA).

Parameters:
  • state – the state where the agent is;

  • policy_state – the policy internal state.

Returns:

The action to be executed.

episode_start(initial_state, episode_info)

Called by the Core when a new episode starts.

Parameters:
  • initial_state (Array) – vector representing the initial state of the environment.

  • episode_info (dict) – a dictionary containing the information at reset, such as context.

Returns:

A tuple containing the policy initial state and, optionally, the policy parameters

episode_start_vectorized(initial_states, episode_info, start_mask)

Called by the VectorCore when a new episode starts.

Parameters:
  • initial_states (Array) – the initial states of the environment.

  • episode_info (dict) – a dictionary containing the information at reset, such as context;

  • start_mask (Array) – boolean mask to select the environments that are starting a new episode

Returns:

A tuple containing the policy initial states and, optionally, the policy parameters

fit(dataset)

Fit step.

Parameters:

dataset (Dataset) – the dataset.

classmethod load(path)

Load and deserialize the agent from the given location on disk.

Parameters:

path (Path, string) – Relative or absolute path to the agent's save location.

Returns:

The loaded agent.

save(path, full_save=False)

Serialize and save the object to the given path on disk.

Parameters:
  • path (Path, str) – Relative or absolute path to the object save location;

  • full_save (bool) – Flag to specify the amount of data to save for MushroomRL data structures.

save_zip(zip_file, full_save, folder='')

Serialize and save the agent to the given path on disk.

Parameters:
  • zip_file (ZipFile) – ZipFile where the object needs to be saved;

  • full_save (bool) – flag to specify the amount of data to save for MushroomRL data structures;

  • folder (string, '') – subfolder to be used by the save method.

set_logger(logger, loss_filename='loss_Q')

Setter that can be used to pass a logger to the algorithm.

Parameters:
  • logger (Logger) – the logger to be used by the algorithm;

  • loss_filename (str, 'loss_Q') – optional string to specify the loss filename.

stop()

Method used to stop an agent. Useful when dealing with real-world environments or simulators, or to clean up environment internals after a Core learn/evaluate run to enforce consistency.

class DoubleDQN(mdp_info, policy, approximator, approximator_params, batch_size, target_update_frequency, replay_memory=None, initial_replay_size=500, max_replay_size=5000, fit_params=None, predict_params=None, clip_reward=False)[source]

Bases: DQN

Double DQN algorithm. “Deep Reinforcement Learning with Double Q-Learning”. Hasselt H. V. et al., 2016.

_next_q(next_state, absorbing)[source]
Parameters:
  • next_state (np.ndarray) – the states where next action has to be evaluated;

  • absorbing (np.ndarray) – the absorbing flag for the states in next_state.

Returns:

The action-value of the action selected by the online network, evaluated with the target network, for each state in next_state.
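
To make the selection/evaluation split concrete, here is a tiny standalone NumPy illustration (not MushroomRL code) of how the next-state value is formed:

    import numpy as np

    q_online = np.array([[1.0, 3.0], [2.0, 0.5]])   # online-network Q for two next states
    q_target = np.array([[0.8, 2.0], [0.7, 1.5]])   # target-network Q for the same states

    a_star = q_online.argmax(axis=1)                    # select actions with the online network
    next_q = q_target[np.arange(len(a_star)), a_star]   # evaluate them with the target network
    # next_q == [2.0, 0.7]; plain DQN would instead use q_target.max(axis=1) == [2.0, 1.5]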

__init__(mdp_info, policy, approximator, approximator_params, batch_size, target_update_frequency, replay_memory=None, initial_replay_size=500, max_replay_size=5000, fit_params=None, predict_params=None, clip_reward=False)

Constructor.

Parameters:
  • approximator (object) – the approximator to use to fit the Q-function;

  • approximator_params (dict) – parameters of the approximator to build;

  • batch_size ([int, Parameter]) – the number of samples in a batch;

  • target_update_frequency (int) – the number of samples collected between each update of the target network;

  • replay_memory ([ReplayMemory, PrioritizedReplayMemory], None) – the object of the replay memory to use; if None, a default replay memory is created;

  • initial_replay_size (int) – the number of samples to collect before starting the learning;

  • max_replay_size (int) – the maximum number of samples in the replay memory;

  • fit_params (dict, None) – parameters of the fitting algorithm of the approximator;

  • predict_params (dict, None) – parameters for the prediction with the approximator;

  • clip_reward (bool, False) – whether to clip the reward or not.

_add_save_attr(**attr_dict)

Add attributes that should be saved for an agent. For every attribute, it is necessary to specify the method to be used to save and load it. Available methods are: numpy, mushroom, torch, json, pickle, primitive and none. The primitive method can be used to store primitive attributes, while the none method always skips the attribute, but ensures that it is initialized to None after the load. The mushroom method can be used with classes that implement the Serializable interface. All the other methods use the library of the same name. If a “!” character is added at the end of the method, the field will be saved only if full_save is set to True.

Parameters:

**attr_dict – dictionary of attributes mapped to the method that should be used to save and load them.

_agent_preprocess(state)

Applies all the agent’s preprocessors to the state.

Parameters:

state (Array) – the state where the agent is;

Returns:

The preprocessed state.

_post_load()

This method can be overwritten to implement logic that is executed after the loading of the agent.

_update_agent_preprocessor(state)

Updates the stats of all the agent’s preprocessors given the state.

Parameters:

state (Array) – the state where the agent is;

_update_target()

Update the target network.

add_agent_preprocessor(preprocessor)

Add preprocessor to the agent’s preprocessor list. The preprocessors are applied in order.

Parameters:

preprocessor (object) – state preprocessors to be applied to state variables before feeding them to the agent.

add_core_preprocessor(preprocessor)

Add preprocessor to the core’s preprocessor list. The preprocessors are applied in order.

Parameters:

preprocessor (object) – state preprocessors to be applied to state variables before feeding them to the agent.

copy()
Returns:

A deepcopy of the agent.

property core_preprocessors

Access to core’s state preprocessors stored in the agent.

draw_action(state, policy_state=None)

Return the action to execute in the given state. It is the action returned by the policy or the action set by the algorithm (e.g. in the case of SARSA).

Parameters:
  • state – the state where the agent is;

  • policy_state – the policy internal state.

Returns:

The action to be executed.

episode_start(initial_state, episode_info)

Called by the Core when a new episode starts.

Parameters:
  • initial_state (Array) – vector representing the initial state of the environment.

  • episode_info (dict) – a dictionary containing the information at reset, such as context.

Returns:

A tuple containing the policy initial state and, optionally, the policy parameters

episode_start_vectorized(initial_states, episode_info, start_mask)

Called by the VectorCore when a new episode starts.

Parameters:
  • initial_states (Array) – the initial states of the environment.

  • episode_info (dict) – a dictionary containing the information at reset, such as context;

  • start_mask (Array) – boolean mask to select the environments that are starting a new episode

Returns:

A tuple containing the policy initial states and, optionally, the policy parameters

fit(dataset)

Fit step.

Parameters:

dataset (Dataset) – the dataset.

classmethod load(path)

Load and deserialize the agent from the given location on disk.

Parameters:

path (Path, string) – Relative or absolute path to the agent's save location.

Returns:

The loaded agent.

save(path, full_save=False)

Serialize and save the object to the given path on disk.

Parameters:
  • path (Path, str) – Relative or absolute path to the object save location;

  • full_save (bool) – Flag to specify the amount of data to save for MushroomRL data structures.

save_zip(zip_file, full_save, folder='')

Serialize and save the agent to the given path on disk.

Parameters:
  • zip_file (ZipFile) – ZipFile where the object needs to be saved;

  • full_save (bool) – flag to specify the amount of data to save for MushroomRL data structures;

  • folder (string, '') – subfolder to be used by the save method.

set_logger(logger, loss_filename='loss_Q')

Setter that can be used to pass a logger to the algorithm.

Parameters:
  • logger (Logger) – the logger to be used by the algorithm;

  • loss_filename (str, 'loss_Q') – optional string to specify the loss filename.

stop()

Method used to stop an agent. Useful when dealing with real-world environments or simulators, or to clean up environment internals after a Core learn/evaluate run to enforce consistency.

class AveragedDQN(mdp_info, policy, approximator, n_approximators, **params)[source]

Bases: AbstractDQN

Averaged-DQN algorithm. “Averaged-DQN: Variance Reduction and Stabilization for Deep Reinforcement Learning”. Anschel O. et al., 2017.

__init__(mdp_info, policy, approximator, n_approximators, **params)[source]

Constructor.

Parameters:

n_approximators (int) – the number of target approximators to store.
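
Conceptually, the next-state value averages the predictions of the stored target networks before maximizing over actions; a standalone illustration (not library code):

    import numpy as np

    # Q estimates from 3 stored target networks, for 2 next states and 2 actions.
    q_targets = np.random.rand(3, 2, 2)
    next_q = q_targets.mean(axis=0).max(axis=1)   # average over networks, then max over actions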

_update_target()[source]

Update the target network.

_next_q(next_state, absorbing)[source]
Parameters:
  • next_state (np.ndarray) – the states where next action has to be evaluated;

  • absorbing (np.ndarray) – the absorbing flag for the states in next_state.

Returns:

Maximum action-value for each state in next_state.

_add_save_attr(**attr_dict)

Add attributes that should be saved for an agent. For every attribute, it is necessary to specify the method to be used to save and load it. Available methods are: numpy, mushroom, torch, json, pickle, primitive and none. The primitive method can be used to store primitive attributes, while the none method always skips the attribute, but ensures that it is initialized to None after the load. The mushroom method can be used with classes that implement the Serializable interface. All the other methods use the library of the same name. If a “!” character is added at the end of the method, the field will be saved only if full_save is set to True.

Parameters:

**attr_dict – dictionary of attributes mapped to the method that should be used to save and load them.

_agent_preprocess(state)

Applies all the agent’s preprocessors to the state.

Parameters:

state (Array) – the state where the agent is;

Returns:

The preprocessed state.

_post_load()

This method can be overwritten to implement logic that is executed after the loading of the agent.

_update_agent_preprocessor(state)

Updates the stats of all the agent’s preprocessors given the state.

Parameters:

state (Array) – the state where the agent is;

add_agent_preprocessor(preprocessor)

Add preprocessor to the agent’s preprocessor list. The preprocessors are applied in order.

Parameters:

preprocessor (object) – state preprocessors to be applied to state variables before feeding them to the agent.

add_core_preprocessor(preprocessor)

Add preprocessor to the core’s preprocessor list. The preprocessors are applied in order.

Parameters:

preprocessor (object) – state preprocessors to be applied to state variables before feeding them to the agent.

copy()
Returns:

A deepcopy of the agent.

property core_preprocessors

Access to core’s state preprocessors stored in the agent.

draw_action(state, policy_state=None)

Return the action to execute in the given state. It is the action returned by the policy or the action set by the algorithm (e.g. in the case of SARSA).

Parameters:
  • state – the state where the agent is;

  • policy_state – the policy internal state.

Returns:

The action to be executed.

episode_start(initial_state, episode_info)

Called by the Core when a new episode starts.

Parameters:
  • initial_state (Array) – vector representing the initial state of the environment.

  • episode_info (dict) – a dictionary containing the information at reset, such as context.

Returns:

A tuple containing the policy initial state and, optionally, the policy parameters

episode_start_vectorized(initial_states, episode_info, start_mask)

Called by the VectorCore when a new episode starts.

Parameters:
  • initial_states (Array) – the initial states of the environment.

  • episode_info (dict) – a dictionary containing the information at reset, such as context;

  • start_mask (Array) – boolean mask to select the environments that are starting a new episode

Returns:

A tuple containing the policy initial states and, optionally, the policy parameters

fit(dataset)

Fit step.

Parameters:

dataset (Dataset) – the dataset.

classmethod load(path)

Load and deserialize the agent from the given location on disk.

Parameters:

path (Path, string) – Relative or absolute path to the agent's save location.

Returns:

The loaded agent.

save(path, full_save=False)

Serialize and save the object to the given path on disk.

Parameters:
  • path (Path, str) – Relative or absolute path to the object save location;

  • full_save (bool) – Flag to specify the amount of data to save for MushroomRL data structures.

save_zip(zip_file, full_save, folder='')

Serialize and save the agent to the given path on disk.

Parameters:
  • zip_file (ZipFile) – ZipFile where the object needs to be saved;

  • full_save (bool) – flag to specify the amount of data to save for MushroomRL data structures;

  • folder (string, '') – subfolder to be used by the save method.

set_logger(logger, loss_filename='loss_Q')

Setter that can be used to pass a logger to the algorithm.

Parameters:
  • logger (Logger) – the logger to be used by the algorithm;

  • loss_filename (str, 'loss_Q') – optional string to specify the loss filename.

stop()

Method used to stop an agent. Useful when dealing with real-world environments or simulators, or to clean up environment internals after a Core learn/evaluate run to enforce consistency.

class MaxminDQN(mdp_info, policy, approximator, n_approximators, **params)[source]

Bases: DQN

MaxminDQN algorithm. “Maxmin Q-learning: Controlling the Estimation Bias of Q-learning”. Lan Q. et al., 2020.

__init__(mdp_info, policy, approximator, n_approximators, **params)[source]

Constructor.

Parameters:

n_approximators (int) – the number of approximators in the ensemble.
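
Maxmin Q-learning instead takes an element-wise minimum over the ensemble before maximizing, which counteracts overestimation; a standalone illustration (not library code):

    import numpy as np

    q_ensemble = np.random.rand(4, 2, 3)          # 4 approximators, 2 next states, 3 actions
    next_q = q_ensemble.min(axis=0).max(axis=1)   # pessimistic value, then greedy action value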

fit(dataset)[source]

Fit step.

Parameters:

dataset (Dataset) – the dataset.

_update_target()[source]

Update the target network.

_add_save_attr(**attr_dict)

Add attributes that should be saved for an agent. For every attribute, it is necessary to specify the method to be used to save and load it. Available methods are: numpy, mushroom, torch, json, pickle, primitive and none. The primitive method can be used to store primitive attributes, while the none method always skips the attribute, but ensures that it is initialized to None after the load. The mushroom method can be used with classes that implement the Serializable interface. All the other methods use the library of the same name. If a “!” character is added at the end of the method, the field will be saved only if full_save is set to True.

Parameters:

**attr_dict – dictionary of attributes mapped to the method that should be used to save and load them.

_agent_preprocess(state)

Applies all the agent’s preprocessors to the state.

Parameters:

state (Array) – the state where the agent is;

Returns:

The preprocessed state.

_next_q(next_state, absorbing)
Parameters:
  • next_state (np.ndarray) – the states where next action has to be evaluated;

  • absorbing (np.ndarray) – the absorbing flag for the states in next_state.

Returns:

Maximum action-value for each state in next_state.

_post_load()

This method can be overwritten to implement logic that is executed after the loading of the agent.

_update_agent_preprocessor(state)

Updates the stats of all the agent’s preprocessors given the state.

Parameters:

state (Array) – the state where the agent is;

add_agent_preprocessor(preprocessor)

Add preprocessor to the agent’s preprocessor list. The preprocessors are applied in order.

Parameters:

preprocessor (object) – state preprocessors to be applied to state variables before feeding them to the agent.

add_core_preprocessor(preprocessor)

Add preprocessor to the core’s preprocessor list. The preprocessors are applied in order.

Parameters:

preprocessor (object) – state preprocessors to be applied to state variables before feeding them to the agent.

copy()
Returns:

A deepcopy of the agent.

property core_preprocessors

Access to core’s state preprocessors stored in the agent.

draw_action(state, policy_state=None)

Return the action to execute in the given state. It is the action returned by the policy or the action set by the algorithm (e.g. in the case of SARSA).

Parameters:
  • state – the state where the agent is;

  • policy_state – the policy internal state.

Returns:

The action to be executed.

episode_start(initial_state, episode_info)

Called by the Core when a new episode starts.

Parameters:
  • initial_state (Array) – vector representing the initial state of the environment.

  • episode_info (dict) – a dictionary containing the information at reset, such as context.

Returns:

A tuple containing the policy initial state and, optionally, the policy parameters

episode_start_vectorized(initial_states, episode_info, start_mask)

Called by the VectorCore when a new episode starts.

Parameters:
  • initial_states (Array) – the initial states of the environment.

  • episode_info (dict) – a dictionary containing the information at reset, such as context;

  • start_mask (Array) – boolean mask to select the environments that are starting a new episode

Returns:

A tuple containing the policy initial states and, optionally, the policy parameters

classmethod load(path)

Load and deserialize the agent from the given location on disk.

Parameters:

path (Path, string) – Relative or absolute path to the agent's save location.

Returns:

The loaded agent.

save(path, full_save=False)

Serialize and save the object to the given path on disk.

Parameters:
  • path (Path, str) – Relative or absolute path to the object save location;

  • full_save (bool) – Flag to specify the amount of data to save for MushroomRL data structures.

save_zip(zip_file, full_save, folder='')

Serialize and save the agent to the given path on disk.

Parameters:
  • zip_file (ZipFile) – ZipFile where the object needs to be saved;

  • full_save (bool) – flag to specify the amount of data to save for MushroomRL data structures;

  • folder (string, '') – subfolder to be used by the save method.

set_logger(logger, loss_filename='loss_Q')

Setter that can be used to pass a logger to the algorithm.

Parameters:
  • logger (Logger) – the logger to be used by the algorithm;

  • loss_filename (str, 'loss_Q') – optional string to specify the loss filename.

stop()

Method used to stop an agent. Useful when dealing with real-world environments or simulators, or to clean up environment internals after a Core learn/evaluate run to enforce consistency.

class DuelingDQN(mdp_info, policy, approximator_params, avg_advantage=True, **params)[source]

Bases: DQN

Dueling DQN algorithm. “Dueling Network Architectures for Deep Reinforcement Learning”. Wang Z. et al., 2016.

__init__(mdp_info, policy, approximator_params, avg_advantage=True, **params)[source]

Constructor.
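The avg_advantage flag selects how the value and advantage streams are combined into Q-values. A small numeric sketch of the two aggregations used by dueling architectures (values are illustrative):

    import numpy as np

    v = 1.5                              # state value V(s) from the value stream
    adv = np.array([0.2, -0.1, 0.4])     # advantages A(s, a) from the advantage stream
    q_avg = v + adv - adv.mean()         # aggregation when avg_advantage=True
    q_max = v + adv - adv.max()          # aggregation when avg_advantage=False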

_add_save_attr(**attr_dict)

Add attributes that should be saved for an agent. For every attribute, it is necessary to specify the method to be used to save and load it. Available methods are: numpy, mushroom, torch, json, pickle, primitive and none. The primitive method can be used to store primitive attributes, while the none method always skips the attribute, but ensures that it is initialized to None after the load. The mushroom method can be used with classes that implement the Serializable interface. All the other methods use the named library. If a “!” character is added at the end of the method, the field will be saved only if full_save is set to True.

Parameters:

**attr_dict – dictionary of attributes mapped to the method that should be used to save and load them.
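A sketch of this pattern for a custom Serializable object; the class, attribute names and shapes are illustrative.

    import numpy as np
    from mushroom_rl.core import Serializable

    class RunningStats(Serializable):
        def __init__(self):
            self._n = 0                  # primitive Python value
            self._mean = np.zeros(4)     # numpy array
            self._scratch = None         # not persisted, but guaranteed to be None after load

            self._add_save_attr(
                _n='primitive',
                _mean='numpy',
                _scratch='none'
            )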

_agent_preprocess(state)

Applies all the agent’s preprocessors to the state.

Parameters:

state (Array) – the state where the agent is;

Returns:

The preprocessed state.

_next_q(next_state, absorbing)
Parameters:
  • next_state (np.ndarray) – the states where the next action has to be evaluated;

  • absorbing (np.ndarray) – the absorbing flag for the states in next_state.

Returns:

Maximum action-value for each state in next_state.

_post_load()

This method can be overwritten to implement logic that is executed after the loading of the agent.

_update_agent_preprocessor(state)

Updates the stats of all the agent’s preprocessors given the state.

Parameters:

state (Array) – the state where the agent is;

_update_target()

Update the target network.

add_agent_preprocessor(preprocessor)

Add preprocessor to the agent’s preprocessor list. The preprocessors are applied in order.

Parameters:

preprocessor (object) – state preprocessors to be applied to state variables before feeding them to the agent.

add_core_preprocessor(preprocessor)

Add preprocessor to the core’s preprocessor list. The preprocessors are applied in order.

Parameters:

preprocessor (object) – state preprocessors to be applied to state variables before feeding them to the agent.

copy()
Returns:

A deepcopy of the agent.

property core_preprocessors

Access to core’s state preprocessors stored in the agent.

draw_action(state, policy_state=None)

Return the action to execute in the given state. It is the action returned by the policy or the action set by the algorithm (e.g. in the case of SARSA).

Parameters:
  • state – the state where the agent is;

  • policy_state – the policy internal state.

Returns:

The action to be executed.

episode_start(initial_state, episode_info)

Called by the Core when a new episode starts.

Parameters:
  • initial_state (Array) – vector representing the initial state of the environment.

  • episode_info (dict) – a dictionary containing the information at reset, such as context.

Returns:

A tuple containing the policy initial state and, optionally, the policy parameters

episode_start_vectorized(initial_states, episode_info, start_mask)

Called by the VectorCore when a new episode starts.

Parameters:
  • initial_states (Array) – the initial states of the environment.

  • episode_info (dict) – a dictionary containing the information at reset, such as context;

  • start_mask (Array) – boolean mask to select the environments that are starting a new episode

Returns:

A tuple containing the policy initial states and, optionally, the policy parameters

fit(dataset)

Fit step.

Parameters:

dataset (Dataset) – the dataset.
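fit is normally not called by hand: the Core collects transitions and invokes it with the latest dataset. A minimal sketch, assuming `agent` and `mdp` have been constructed elsewhere:

    from mushroom_rl.core import Core

    core = Core(agent, mdp)
    core.learn(n_steps=10000, n_steps_per_fit=1)   # agent.fit(dataset) runs after every step
    core.evaluate(n_episodes=10)                   # pure evaluation, no fitting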

classmethod load(path)

Load and deserialize the agent from the given location on disk.

Parameters:

path (Path, string) – Relative or absolute path to the agent’s save location.

Returns:

The loaded agent.

save(path, full_save=False)

Serialize and save the object to the given path on disk.

Parameters:
  • path (Path, str) – Relative or absolute path to the object save location;

  • full_save (bool) – Flag to specify the amount of data to save for MushroomRL data structures.

save_zip(zip_file, full_save, folder='')

Serialize and save the agent to the given path on disk.

Parameters:
  • zip_file (ZipFile) – ZipFile where the object needs to be saved;

  • full_save (bool) – flag to specify the amount of data to save for MushroomRL data structures;

  • folder (string, '') – subfolder to be used by the save method.

set_logger(logger, loss_filename='loss_Q')

Setter that can be used to pass a logger to the algorithm.

Parameters:
  • logger (Logger) – the logger to be used by the algorithm;

  • loss_filename (str, 'loss_Q') – optional string to specify the loss filename.

stop()

Method used to stop an agent. Useful when dealing with real-world environments or simulators, or to clean up environment internals after a core learn/evaluate run to enforce consistency.

class CategoricalDQN(mdp_info, policy, approximator_params, n_atoms, v_min, v_max, **params)[source]

Bases: AbstractDQN

Categorical DQN algorithm. “A Distributional Perspective on Reinforcement Learning”. Bellemare M. et al., 2017.

__init__(mdp_info, policy, approximator_params, n_atoms, v_min, v_max, **params)[source]

Constructor.

Parameters:
  • n_atoms (int) – number of atoms;

  • v_min (float) – minimum value of the value function;

  • v_max (float) – maximum value of the value function.
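These three parameters define the fixed support of the return distribution: n_atoms equally spaced points between v_min and v_max, with the Q-value recovered as the expectation over that support. A small numeric illustration:

    import numpy as np

    n_atoms, v_min, v_max = 51, -10.0, 10.0
    z = np.linspace(v_min, v_max, n_atoms)       # fixed atom support
    p = np.full(n_atoms, 1.0 / n_atoms)          # some categorical distribution over the atoms
    q_value = float(np.dot(p, z))                # Q(s, a) = E[Z(s, a)]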

fit(dataset)[source]

Fit step.

Parameters:

dataset (Dataset) – the dataset.

_add_save_attr(**attr_dict)

Add attributes that should be saved for an agent. For every attribute, it is necessary to specify the method to be used to save and load it. Available methods are: numpy, mushroom, torch, json, pickle, primitive and none. The primitive method can be used to store primitive attributes, while the none method always skips the attribute, but ensures that it is initialized to None after the load. The mushroom method can be used with classes that implement the Serializable interface. All the other methods use the named library. If a “!” character is added at the end of the method, the field will be saved only if full_save is set to True.

Parameters:

**attr_dict – dictionary of attributes mapped to the method that should be used to save and load them.

_agent_preprocess(state)

Applies all the agent’s preprocessors to the state.

Parameters:

state (Array) – the state where the agent is;

Returns:

The preprocessed state.

_next_q(next_state, absorbing)
Parameters:
  • next_state (np.ndarray) – the states where the next action has to be evaluated;

  • absorbing (np.ndarray) – the absorbing flag for the states in next_state.

Returns:

Maximum action-value for each state in next_state.

_post_load()

This method can be overwritten to implement logic that is executed after the loading of the agent.

_update_agent_preprocessor(state)

Updates the stats of all the agent’s preprocessors given the state.

Parameters:

state (Array) – the state where the agent is;

_update_target()

Update the target network.

add_agent_preprocessor(preprocessor)

Add preprocessor to the agent’s preprocessor list. The preprocessors are applied in order.

Parameters:

preprocessor (object) – state preprocessors to be applied to state variables before feeding them to the agent.

add_core_preprocessor(preprocessor)

Add preprocessor to the core’s preprocessor list. The preprocessors are applied in order.

Parameters:

preprocessor (object) – state preprocessors to be applied to state variables before feeding them to the agent.

copy()
Returns:

A deepcopy of the agent.

property core_preprocessors

Access to core’s state preprocessors stored in the agent.

draw_action(state, policy_state=None)

Return the action to execute in the given state. It is the action returned by the policy or the action set by the algorithm (e.g. in the case of SARSA).

Parameters:
  • state – the state where the agent is;

  • policy_state – the policy internal state.

Returns:

The action to be executed.

episode_start(initial_state, episode_info)

Called by the Core when a new episode starts.

Parameters:
  • initial_state (Array) – vector representing the initial state of the environment.

  • episode_info (dict) – a dictionary containing the information at reset, such as context.

Returns:

A tuple containing the policy initial state and, optionally, the policy parameters

episode_start_vectorized(initial_states, episode_info, start_mask)

Called by the VectorCore when a new episode starts.

Parameters:
  • initial_states (Array) – the initial states of the environment.

  • episode_info (dict) – a dictionary containing the information at reset, such as context;

  • start_mask (Array) – boolean mask to select the environments that are starting a new episode

Returns:

A tuple containing the policy initial states and, optionally, the policy parameters

classmethod load(path)

Load and deserialize the agent from the given location on disk.

Parameters:

path (Path, string) – Relative or absolute path to the agent’s save location.

Returns:

The loaded agent.

save(path, full_save=False)

Serialize and save the object to the given path on disk.

Parameters:
  • path (Path, str) – Relative or absolute path to the object save location;

  • full_save (bool) – Flag to specify the amount of data to save for MushroomRL data structures.

save_zip(zip_file, full_save, folder='')

Serialize and save the agent to the given path on disk.

Parameters:
  • zip_file (ZipFile) – ZipFile where the object needs to be saved;

  • full_save (bool) – flag to specify the amount of data to save for MushroomRL data structures;

  • folder (string, '') – subfolder to be used by the save method.

set_logger(logger, loss_filename='loss_Q')

Setter that can be used to pass a logger to the algorithm.

Parameters:
  • logger (Logger) – the logger to be used by the algorithm;

  • loss_filename (str, 'loss_Q') – optional string to specify the loss filename.

stop()

Method used to stop an agent. Useful when dealing with real-world environments or simulators, or to clean up environment internals after a core learn/evaluate run to enforce consistency.

class NoisyDQN(mdp_info, policy, approximator_params, **params)[source]

Bases: DQN

Noisy DQN algorithm. “Noisy Networks for Exploration”. Fortunato M. et al., 2018.

__init__(mdp_info, policy, approximator_params, **params)[source]

Constructor.

_add_save_attr(**attr_dict)

Add attributes that should be saved for an agent. For every attribute, it is necessary to specify the method to be used to save and load it. Available methods are: numpy, mushroom, torch, json, pickle, primitive and none. The primitive method can be used to store primitive attributes, while the none method always skips the attribute, but ensures that it is initialized to None after the load. The mushroom method can be used with classes that implement the Serializable interface. All the other methods use the named library. If a “!” character is added at the end of the method, the field will be saved only if full_save is set to True.

Parameters:

**attr_dict – dictionary of attributes mapped to the method that should be used to save and load them.

_agent_preprocess(state)

Applies all the agent’s preprocessors to the state.

Parameters:

state (Array) – the state where the agent is;

Returns:

The preprocessed state.

_next_q(next_state, absorbing)
Parameters:
  • next_state (np.ndarray) – the states where the next action has to be evaluated;

  • absorbing (np.ndarray) – the absorbing flag for the states in next_state.

Returns:

Maximum action-value for each state in next_state.

_post_load()

This method can be overwritten to implement logic that is executed after the loading of the agent.

_update_agent_preprocessor(state)

Updates the stats of all the agent’s preprocessors given the state.

Parameters:

state (Array) – the state where the agent is;

_update_target()

Update the target network.

add_agent_preprocessor(preprocessor)

Add preprocessor to the agent’s preprocessor list. The preprocessors are applied in order.

Parameters:

preprocessor (object) – state preprocessors to be applied to state variables before feeding them to the agent.

add_core_preprocessor(preprocessor)

Add preprocessor to the core’s preprocessor list. The preprocessors are applied in order.

Parameters:

preprocessor (object) – state preprocessors to be applied to state variables before feeding them to the agent.

copy()
Returns:

A deepcopy of the agent.

property core_preprocessors

Access to core’s state preprocessors stored in the agent.

draw_action(state, policy_state=None)

Return the action to execute in the given state. It is the action returned by the policy or the action set by the algorithm (e.g. in the case of SARSA).

Parameters:
  • state – the state where the agent is;

  • policy_state – the policy internal state.

Returns:

The action to be executed.

episode_start(initial_state, episode_info)

Called by the Core when a new episode starts.

Parameters:
  • initial_state (Array) – vector representing the initial state of the environment.

  • episode_info (dict) – a dictionary containing the information at reset, such as context.

Returns:

A tuple containing the policy initial state and, optionally, the policy parameters

episode_start_vectorized(initial_states, episode_info, start_mask)

Called by the VectorCore when a new episode starts.

Parameters:
  • initial_states (Array) – the initial states of the environment.

  • episode_info (dict) – a dictionary containing the information at reset, such as context;

  • start_mask (Array) – boolean mask to select the environments that are starting a new episode

Returns:

A tuple containing the policy initial states and, optionally, the policy parameters

fit(dataset)

Fit step.

Parameters:

dataset (Dataset) – the dataset.

classmethod load(path)

Load and deserialize the agent from the given location on disk.

Parameters:

path (Path, string) – Relative or absolute path to the agent’s save location.

Returns:

The loaded agent.

save(path, full_save=False)

Serialize and save the object to the given path on disk.

Parameters:
  • path (Path, str) – Relative or absolute path to the object save location;

  • full_save (bool) – Flag to specify the amount of data to save for MushroomRL data structures.

save_zip(zip_file, full_save, folder='')

Serialize and save the agent to the given path on disk.

Parameters:
  • zip_file (ZipFile) – ZipFile where the object needs to be saved;

  • full_save (bool) – flag to specify the amount of data to save for MushroomRL data structures;

  • folder (string, '') – subfolder to be used by the save method.

set_logger(logger, loss_filename='loss_Q')

Setter that can be used to pass a logger to the algorithm.

Parameters:
  • logger (Logger) – the logger to be used by the algorithm;

  • loss_filename (str, 'loss_Q') – optional string to specify the loss filename.

stop()

Method used to stop an agent. Useful when dealing with real-world environments or simulators, or to clean up environment internals after a core learn/evaluate run to enforce consistency.

class QuantileDQN(mdp_info, policy, approximator_params, n_quantiles, **params)[source]

Bases: AbstractDQN

Quantile Regression DQN algorithm. “Distributional Reinforcement Learning with Quantile Regression”. Dabney W. et al., 2018.

__init__(mdp_info, policy, approximator_params, n_quantiles, **params)[source]

Constructor.

Parameters:

n_quantiles (int) – number of quantiles.
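In quantile regression DQN the return distribution is represented by n_quantiles estimates at fixed quantile midpoints, and the Q-value is their mean. Illustrative numbers:

    import numpy as np

    n_quantiles = 32
    tau_hat = (np.arange(n_quantiles) + 0.5) / n_quantiles   # quantile midpoints
    theta = np.sort(np.random.randn(n_quantiles))            # arbitrary quantile estimates of the return
    q_value = float(theta.mean())                            # Q(s, a) = mean of the quantile estimates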

fit(dataset)[source]

Fit step.

Parameters:

dataset (Dataset) – the dataset.

_add_save_attr(**attr_dict)

Add attributes that should be saved for an agent. For every attribute, it is necessary to specify the method to be used to save and load it. Available methods are: numpy, mushroom, torch, json, pickle, primitive and none. The primitive method can be used to store primitive attributes, while the none method always skips the attribute, but ensures that it is initialized to None after the load. The mushroom method can be used with classes that implement the Serializable interface. All the other methods use the named library. If a “!” character is added at the end of the method, the field will be saved only if full_save is set to True.

Parameters:

**attr_dict – dictionary of attributes mapped to the method that should be used to save and load them.

_agent_preprocess(state)

Applies all the agent’s preprocessors to the state.

Parameters:

state (Array) – the state where the agent is;

Returns:

The preprocessed state.

_next_q(next_state, absorbing)
Parameters:
  • next_state (np.ndarray) – the states where the next action has to be evaluated;

  • absorbing (np.ndarray) – the absorbing flag for the states in next_state.

Returns:

Maximum action-value for each state in next_state.

_post_load()

This method can be overwritten to implement logic that is executed after the loading of the agent.

_update_agent_preprocessor(state)

Updates the stats of all the agent’s preprocessors given the state.

Parameters:

state (Array) – the state where the agent is;

_update_target()

Update the target network.

add_agent_preprocessor(preprocessor)

Add preprocessor to the agent’s preprocessor list. The preprocessors are applied in order.

Parameters:

preprocessor (object) – state preprocessors to be applied to state variables before feeding them to the agent.

add_core_preprocessor(preprocessor)

Add preprocessor to the core’s preprocessor list. The preprocessors are applied in order.

Parameters:

preprocessor (object) – state preprocessors to be applied to state variables before feeding them to the agent.

copy()
Returns:

A deepcopy of the agent.

property core_preprocessors

Access to core’s state preprocessors stored in the agent.

draw_action(state, policy_state=None)

Return the action to execute in the given state. It is the action returned by the policy or the action set by the algorithm (e.g. in the case of SARSA).

Parameters:
  • state – the state where the agent is;

  • policy_state – the policy internal state.

Returns:

The action to be executed.

episode_start(initial_state, episode_info)

Called by the Core when a new episode starts.

Parameters:
  • initial_state (Array) – vector representing the initial state of the environment.

  • episode_info (dict) – a dictionary containing the information at reset, such as context.

Returns:

A tuple containing the policy initial state and, optionally, the policy parameters

episode_start_vectorized(initial_states, episode_info, start_mask)

Called by the VectorCore when a new episode starts.

Parameters:
  • initial_states (Array) – the initial states of the environment.

  • episode_info (dict) – a dictionary containing the information at reset, such as context;

  • start_mask (Array) – boolean mask to select the environments that are starting a new episode

Returns:

A tuple containing the policy initial states and, optionally, the policy parameters

classmethod load(path)

Load and deserialize the agent from the given location on disk.

Parameters:

path (Path, string) – Relative or absolute path to the agent’s save location.

Returns:

The loaded agent.

save(path, full_save=False)

Serialize and save the object to the given path on disk.

Parameters:
  • path (Path, str) – Relative or absolute path to the object save location;

  • full_save (bool) – Flag to specify the amount of data to save for MushroomRL data structures.

save_zip(zip_file, full_save, folder='')

Serialize and save the agent to the given path on disk.

Parameters:
  • zip_file (ZipFile) – ZipFile where the object needs to be saved;

  • full_save (bool) – flag to specify the amount of data to save for MushroomRL data structures;

  • folder (string, '') – subfolder to be used by the save method.

set_logger(logger, loss_filename='loss_Q')

Setter that can be used to pass a logger to the algorithm.

Parameters:
  • logger (Logger) – the logger to be used by the algorithm;

  • loss_filename (str, 'loss_Q') – optional string to specify the loss filename.

stop()

Method used to stop an agent. Useful when dealing with real-world environments or simulators, or to clean up environment internals after a core learn/evaluate run to enforce consistency.

class Rainbow(mdp_info, policy, approximator_params, n_atoms, v_min, v_max, n_steps_return, alpha_coeff, beta, sigma_coeff=0.5, **params)[source]

Bases: AbstractDQN

Rainbow algorithm. “Rainbow: Combining Improvements in Deep Reinforcement Learning”. Hessel M. et al., 2018.

__init__(mdp_info, policy, approximator_params, n_atoms, v_min, v_max, n_steps_return, alpha_coeff, beta, sigma_coeff=0.5, **params)[source]

Constructor.

Parameters:
  • n_atoms (int) – number of atoms;

  • v_min (float) – minimum value of the value function;

  • v_max (float) – maximum value of the value function;

  • n_steps_return (int) – the number of steps to consider to compute the n-step return;

  • alpha_coeff (float) – prioritization exponent for prioritized experience replay;

  • beta (Parameter) – importance sampling coefficient for prioritized experience replay;

  • sigma_coeff (float, .5) – sigma0 coefficient for noise initialization in noisy layers.
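Two of these parameters can be illustrated numerically: n_steps_return controls the truncated n-step return used as the bootstrap target, and beta scales the importance-sampling correction of prioritized replay. Values below are illustrative; the discount factor comes from mdp_info.

    import numpy as np

    gamma, n_steps_return = 0.99, 3
    rewards = np.array([1.0, 0.0, 2.0])                               # next n rewards
    n_step_return = float(np.sum(gamma ** np.arange(n_steps_return) * rewards))

    replay_size, sample_prob, beta = 100_000, 2e-5, 0.4
    is_weight = (replay_size * sample_prob) ** (-beta)                # unnormalized PER importance weight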

fit(dataset)[source]

Fit step.

Parameters:

dataset (Dataset) – the dataset.

_add_save_attr(**attr_dict)

Add attributes that should be saved for an agent. For every attribute, it is necessary to specify the method to be used to save and load it. Available methods are: numpy, mushroom, torch, json, pickle, primitive and none. The primitive method can be used to store primitive attributes, while the none method always skips the attribute, but ensures that it is initialized to None after the load. The mushroom method can be used with classes that implement the Serializable interface. All the other methods use the named library. If a “!” character is added at the end of the method, the field will be saved only if full_save is set to True.

Parameters:

**attr_dict – dictionary of attributes mapped to the method that should be used to save and load them.

_agent_preprocess(state)

Applies all the agent’s preprocessors to the state.

Parameters:

state (Array) – the state where the agent is;

Returns:

The preprocessed state.

_next_q(next_state, absorbing)
Parameters:
  • next_state (np.ndarray) – the states where the next action has to be evaluated;

  • absorbing (np.ndarray) – the absorbing flag for the states in next_state.

Returns:

Maximum action-value for each state in next_state.

_post_load()

This method can be overwritten to implement logic that is executed after the loading of the agent.

_update_agent_preprocessor(state)

Updates the stats of all the agent’s preprocessors given the state.

Parameters:

state (Array) – the state where the agent is;

_update_target()

Update the target network.

add_agent_preprocessor(preprocessor)

Add preprocessor to the agent’s preprocessor list. The preprocessors are applied in order.

Parameters:

preprocessor (object) – state preprocessors to be applied to state variables before feeding them to the agent.

add_core_preprocessor(preprocessor)

Add preprocessor to the core’s preprocessor list. The preprocessors are applied in order.

Parameters:

preprocessor (object) – state preprocessors to be applied to state variables before feeding them to the agent.

copy()
Returns:

A deepcopy of the agent.

property core_preprocessors

Access to core’s state preprocessors stored in the agent.

draw_action(state, policy_state=None)

Return the action to execute in the given state. It is the action returned by the policy or the action set by the algorithm (e.g. in the case of SARSA).

Parameters:
  • state – the state where the agent is;

  • policy_state – the policy internal state.

Returns:

The action to be executed.

episode_start(initial_state, episode_info)

Called by the Core when a new episode starts.

Parameters:
  • initial_state (Array) – vector representing the initial state of the environment.

  • episode_info (dict) – a dictionary containing the information at reset, such as context.

Returns:

A tuple containing the policy initial state and, optionally, the policy parameters

episode_start_vectorized(initial_states, episode_info, start_mask)

Called by the VectorCore when a new episode starts.

Parameters:
  • initial_states (Array) – the initial states of the environment.

  • episode_info (dict) – a dictionary containing the information at reset, such as context;

  • start_mask (Array) – boolean mask to select the environments that are starting a new episode

Returns:

A tuple containing the policy initial states and, optionally, the policy parameters

classmethod load(path)

Load and deserialize the agent from the given location on disk.

Parameters:

path (Path, string) – Relative or absolute path to the agent’s save location.

Returns:

The loaded agent.

save(path, full_save=False)

Serialize and save the object to the given path on disk.

Parameters:
  • path (Path, str) – Relative or absolute path to the object save location;

  • full_save (bool) – Flag to specify the amount of data to save for MushroomRL data structures.

save_zip(zip_file, full_save, folder='')

Serialize and save the agent to the given path on disk.

Parameters:
  • zip_file (ZipFile) – ZipFile where the object needs to be saved;

  • full_save (bool) – flag to specify the amount of data to save for MushroomRL data structures;

  • folder (string, '') – subfolder to be used by the save method.

set_logger(logger, loss_filename='loss_Q')

Setter that can be used to pass a logger to the algorithm.

Parameters:
  • logger (Logger) – the logger to be used by the algorithm;

  • loss_filename (str, 'loss_Q') – optional string to specify the loss filename.

stop()

Method used to stop an agent. Useful when dealing with real-world environments or simulators, or to clean up environment internals after a core learn/evaluate run to enforce consistency.