Actor-Critic

Classical Actor-Critic Methods

class COPDAC_Q(mdp_info, policy, mu, alpha_theta, alpha_omega, alpha_v, value_function_features=None, policy_features=None)[source]

Bases: Agent

Compatible off-policy deterministic actor-critic algorithm. “Deterministic Policy Gradient Algorithms”. Silver D. et al., 2014.

__init__(mdp_info, policy, mu, alpha_theta, alpha_omega, alpha_v, value_function_features=None, policy_features=None)[source]

Constructor.

Parameters:
  • mu (Regressor) – regressor that describes the deterministic policy to be learned, i.e., the deterministic mapping between state and action.

  • alpha_theta ([float, Parameter]) – learning rate for policy update;

  • alpha_omega ([float, Parameter]) – learning rate for the advantage function;

  • alpha_v ([float, Parameter]) – learning rate for the value function;

  • value_function_features (Features, None) – features used by the value function approximator;

  • policy_features (Features, None) – features used by the policy.
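
A minimal construction sketch follows, loosely based on the pattern used in the MushroomRL test suite; the environment, the tile-coding sizes, the Gaussian exploration noise around mu and the learning rates are illustrative assumptions, not recommended settings.

    import numpy as np

    from mushroom_rl.algorithms.actor_critic import COPDAC_Q
    from mushroom_rl.approximators import Regressor
    from mushroom_rl.approximators.parametric import LinearApproximator
    from mushroom_rl.core import Core
    from mushroom_rl.environments import InvertedPendulum
    from mushroom_rl.features import Features
    from mushroom_rl.features.tiles import Tiles
    from mushroom_rl.policy import GaussianPolicy
    from mushroom_rl.utils.parameters import Parameter

    mdp = InvertedPendulum()

    # Tile-coded features shared by the value function and the policy
    # (number and size of tilings are assumptions).
    tilings = Tiles.generate(10, [10, 10],
                             mdp.info.observation_space.low,
                             mdp.info.observation_space.high)
    phi = Features(tilings=tilings)

    # Deterministic policy mu: linear mapping from features to actions.
    mu = Regressor(LinearApproximator, input_shape=(phi.size,),
                   output_shape=mdp.info.action_space.shape)

    # Gaussian exploration around mu (illustrative noise level).
    policy = GaussianPolicy(mu, 1e-1 * np.eye(1))

    agent = COPDAC_Q(mdp.info, policy, mu,
                     alpha_theta=Parameter(5e-3),
                     alpha_omega=Parameter(5e-1),
                     alpha_v=Parameter(5e-1),
                     value_function_features=phi,
                     policy_features=phi)

    core = Core(agent, mdp)
    core.learn(n_episodes=100, n_episodes_per_fit=1)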

fit(dataset, **info)[source]

Fit step.

Parameters:

dataset (list) – the dataset.

_add_save_attr(**attr_dict)

Add attributes that should be saved for an agent. For every attribute, it is necessary to specify the method to be used to save and load it. Available methods are: numpy, mushroom, torch, json, pickle, primitive and none. The primitive method can be used to store primitive attributes, while the none method always skips the attribute, but ensures that it is initialized to None after the load. The mushroom method can be used with classes that implement the Serializable interface. All the other methods use the corresponding library. If a “!” character is added at the end of the method, the field is saved only if full_save is set to True.

Parameters:

**attr_dict – dictionary of attributes mapped to the method that should be used to save and load them.

_post_load()

This method can be overwritten to implement logic that is executed after the loading of the agent.

add_preprocessor(preprocessor)

Add preprocessor to the preprocessor list. The preprocessors are applied in order.

Parameters:

preprocessor (object) – state preprocessor to be applied to the state variables before feeding them to the agent.

copy()
Returns:

A deepcopy of the agent.

draw_action(state)

Return the action to execute in the given state. It is the action returned by the policy or the action set by the algorithm (e.g. in the case of SARSA).

Parameters:

state (np.ndarray) – the state where the agent is.

Returns:

The action to be executed.

episode_start()

Called by the agent when a new episode starts.

classmethod load(path)

Load and deserialize the agent from the given location on disk.

Parameters:

path (Path, string) – Relative or absolute path to the agent's save location.

Returns:

The loaded agent.

property preprocessors

Access to state preprocessors stored in the agent.

save(path, full_save=False)

Serialize and save the object to the given path on disk.

Parameters:
  • path (Path, str) – Relative or absolute path to the object save location;

  • full_save (bool) – Flag to specify the amount of data to save for MushroomRL data structures.
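
The save and load methods form a round trip; a short sketch (the file name is arbitrary and full_save is optional):

    # Persist the agent; full_save=True also stores the fields marked with "!".
    agent.save('copdac_q_agent.msh', full_save=True)

    # Restore it later through the classmethod of the same class.
    agent = COPDAC_Q.load('copdac_q_agent.msh')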

save_zip(zip_file, full_save, folder='')

Serialize and save the agent to the given path on disk.

Parameters:
  • zip_file (ZipFile) – ZipFile where the object needs to be saved;

  • full_save (bool) – flag to specify the amount of data to save for MushroomRL data structures;

  • folder (string, '') – subfolder to be used by the save method.

set_logger(logger)

Setter that can be used to pass a logger to the algorithm.

Parameters:

logger (Logger) – the logger to be used by the algorithm.

stop()

Method used to stop an agent. Useful when dealing with real-world environments or simulators, or to clean up the environment's internals after a core learn/evaluate run, in order to enforce consistency.

class StochasticAC(mdp_info, policy, alpha_theta, alpha_v, lambda_par=0.9, value_function_features=None, policy_features=None)[source]

Bases: Agent

Stochastic actor-critic in the episodic setting, as presented in: “Model-Free Reinforcement Learning with Continuous Action in Practice”. Degris T. et al., 2012.

__init__(mdp_info, policy, alpha_theta, alpha_v, lambda_par=0.9, value_function_features=None, policy_features=None)[source]

Constructor.

Parameters:
  • alpha_theta ([float, Parameter]) – learning rate for policy update;

  • alpha_v ([float, Parameter]) – learning rate for the value function;

  • lambda_par ([float, Parameter], .9) – trace decay parameter;

  • value_function_features (Features, None) – features used by the value function approximator;

  • policy_features (Features, None) – features used by the policy.
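
A minimal construction sketch, analogous to the COPDAC_Q example above; the state-dependent log-std Gaussian policy, the tile-coding sizes and the learning rates are assumptions chosen for illustration.

    from mushroom_rl.algorithms.actor_critic import StochasticAC
    from mushroom_rl.approximators import Regressor
    from mushroom_rl.approximators.parametric import LinearApproximator
    from mushroom_rl.environments import InvertedPendulum
    from mushroom_rl.features import Features
    from mushroom_rl.features.tiles import Tiles
    from mushroom_rl.policy import StateLogStdGaussianPolicy
    from mushroom_rl.utils.parameters import Parameter

    mdp = InvertedPendulum()

    # Tile-coded features for both actor and critic (assumed sizes).
    tilings = Tiles.generate(10, [10, 10],
                             mdp.info.observation_space.low,
                             mdp.info.observation_space.high)
    phi = Features(tilings=tilings)

    # Gaussian policy with state-dependent mean and log standard deviation.
    mu = Regressor(LinearApproximator, input_shape=(phi.size,),
                   output_shape=mdp.info.action_space.shape)
    log_std = Regressor(LinearApproximator, input_shape=(phi.size,),
                        output_shape=mdp.info.action_space.shape)
    policy = StateLogStdGaussianPolicy(mu, log_std)

    agent = StochasticAC(mdp.info, policy,
                         alpha_theta=Parameter(1e-3),
                         alpha_v=Parameter(1e-1),
                         lambda_par=.9,
                         value_function_features=phi,
                         policy_features=phi)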

episode_start()[source]

Called by the agent when a new episode starts.

fit(dataset, **info)[source]

Fit step.

Parameters:

dataset (list) – the dataset.

_add_save_attr(**attr_dict)

Add attributes that should be saved for an agent. For every attribute, it is necessary to specify the method to be used to save and load it. Available methods are: numpy, mushroom, torch, json, pickle, primitive and none. The primitive method can be used to store primitive attributes, while the none method always skips the attribute, but ensures that it is initialized to None after the load. The mushroom method can be used with classes that implement the Serializable interface. All the other methods use the corresponding library. If a “!” character is added at the end of the method, the field is saved only if full_save is set to True.

Parameters:

**attr_dict – dictionary of attributes mapped to the method that should be used to save and load them.

_post_load()

This method can be overwritten to implement logic that is executed after the loading of the agent.

add_preprocessor(preprocessor)

Add preprocessor to the preprocessor list. The preprocessors are applied in order.

Parameters:

preprocessor (object) – state preprocessor to be applied to the state variables before feeding them to the agent.

copy()
Returns:

A deepcopy of the agent.

draw_action(state)

Return the action to execute in the given state. It is the action returned by the policy or the action set by the algorithm (e.g. in the case of SARSA).

Parameters:

state (np.ndarray) – the state where the agent is.

Returns:

The action to be executed.

classmethod load(path)

Load and deserialize the agent from the given location on disk.

Parameters:

path (Path, string) – Relative or absolute path to the agent's save location.

Returns:

The loaded agent.

property preprocessors

Access to state preprocessors stored in the agent.

save(path, full_save=False)

Serialize and save the object to the given path on disk.

Parameters:
  • path (Path, str) – Relative or absolute path to the object save location;

  • full_save (bool) – Flag to specify the amount of data to save for MushroomRL data structures.

save_zip(zip_file, full_save, folder='')

Serialize and save the agent to the given path on disk.

Parameters:
  • zip_file (ZipFile) – ZipFile where the object needs to be saved;

  • full_save (bool) – flag to specify the amount of data to save for MushroomRL data structures;

  • folder (string, '') – subfolder to be used by the save method.

set_logger(logger)

Setter that can be used to pass a logger to the algorithm.

Parameters:

logger (Logger) – the logger to be used by the algorithm.

stop()

Method used to stop an agent. Useful when dealing with real-world environments or simulators, or to clean up the environment's internals after a core learn/evaluate run, in order to enforce consistency.

class StochasticAC_AVG(mdp_info, policy, alpha_theta, alpha_v, alpha_r, lambda_par=0.9, value_function_features=None, policy_features=None)[source]

Bases: StochasticAC

Stochastic actor-critic in the average-reward setting, as presented in: “Model-Free Reinforcement Learning with Continuous Action in Practice”. Degris T. et al., 2012.

__init__(mdp_info, policy, alpha_theta, alpha_v, alpha_r, lambda_par=0.9, value_function_features=None, policy_features=None)[source]

Constructor.

Parameters:

alpha_r (Parameter) – learning rate for the reward trace.
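
The average-reward variant is constructed like StochasticAC, with the additional reward-trace learning rate; a short sketch reusing the policy and features from the StochasticAC example above (the learning rates are illustrative):

    from mushroom_rl.algorithms.actor_critic import StochasticAC_AVG
    from mushroom_rl.utils.parameters import Parameter

    agent = StochasticAC_AVG(mdp.info, policy,
                             alpha_theta=Parameter(1e-3),
                             alpha_v=Parameter(1e-1),
                             alpha_r=Parameter(1e-2),
                             lambda_par=.9,
                             value_function_features=phi,
                             policy_features=phi)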

_add_save_attr(**attr_dict)

Add attributes that should be saved for an agent. For every attribute, it is necessary to specify the method to be used to save and load it. Available methods are: numpy, mushroom, torch, json, pickle, primitive and none. The primitive method can be used to store primitive attributes, while the none method always skips the attribute, but ensures that it is initialized to None after the load. The mushroom method can be used with classes that implement the Serializable interface. All the other methods use the corresponding library. If a “!” character is added at the end of the method, the field is saved only if full_save is set to True.

Parameters:

**attr_dict – dictionary of attributes mapped to the method that should be used to save and load them.

_post_load()

This method can be overwritten to implement logic that is executed after the loading of the agent.

add_preprocessor(preprocessor)

Add preprocessor to the preprocessor list. The preprocessors are applied in order.

Parameters:

preprocessor (object) – state preprocessor to be applied to the state variables before feeding them to the agent.

copy()
Returns:

A deepcopy of the agent.

draw_action(state)

Return the action to execute in the given state. It is the action returned by the policy or the action set by the algorithm (e.g. in the case of SARSA).

Parameters:

state (np.ndarray) – the state where the agent is.

Returns:

The action to be executed.

episode_start()

Called by the agent when a new episode starts.

fit(dataset, **info)

Fit step.

Parameters:

dataset (list) – the dataset.

classmethod load(path)

Load and deserialize the agent from the given location on disk.

Parameters:

path (Path, string) – Relative or absolute path to the agent's save location.

Returns:

The loaded agent.

property preprocessors

Access to state preprocessors stored in the agent.

save(path, full_save=False)

Serialize and save the object to the given path on disk.

Parameters:
  • path (Path, str) – Relative or absolute path to the object save location;

  • full_save (bool) – Flag to specify the amount of data to save for MushroomRL data structures.

save_zip(zip_file, full_save, folder='')

Serialize and save the agent to the given path on disk.

Parameters:
  • zip_file (ZipFile) – ZipFile where the object needs to be saved;

  • full_save (bool) – flag to specify the amount of data to save for MushroomRL data structures;

  • folder (string, '') – subfolder to be used by the save method.

set_logger(logger)

Setter that can be used to pass a logger to the algorithm.

Parameters:

logger (Logger) – the logger to be used by the algorithm.

stop()

Method used to stop an agent. Useful when dealing with real-world environments or simulators, or to clean up the environment's internals after a core learn/evaluate run, in order to enforce consistency.

Deep Actor-Critic Methods

class DeepAC(mdp_info, policy, actor_optimizer, parameters)[source]

Bases: Agent

Base class for algorithms that use the reparametrization trick, such as SAC, DDPG and TD3.

__init__(mdp_info, policy, actor_optimizer, parameters)[source]

Constructor.

Parameters:
  • actor_optimizer (dict) – parameters to specify the actor optimizer algorithm;

  • parameters (list) – policy parameters to be optimized.

fit(dataset, **info)[source]

Fit step.

Parameters:

dataset (list) – the dataset.

_optimize_actor_parameters(loss)[source]

Method used to update actor parameters to maximize a given loss.

Parameters:

loss (torch.tensor) – the loss computed by the algorithm.
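
A subclass is expected to build a torch loss in its fit step and delegate the actor update to this method. The sketch below is purely illustrative: the import path is assumed from this page, and _compute_actor_loss is a hypothetical placeholder, not a library function.

    from mushroom_rl.algorithms.actor_critic import DeepAC


    class MyDeepAC(DeepAC):
        # Illustrative subclass: only the actor-update plumbing is shown.

        def fit(self, dataset, **info):
            # Build a scalar torch loss from the dataset using the critic and
            # the policy (placeholder helper, not a real objective).
            loss = self._compute_actor_loss(dataset)

            # Zero the gradients, backpropagate, optionally clip, and step the
            # actor optimizer passed to the constructor.
            self._optimize_actor_parameters(loss)

        def _compute_actor_loss(self, dataset):
            raise NotImplementedError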

_add_save_attr(**attr_dict)

Add attributes that should be saved for an agent. For every attribute, it is necessary to specify the method to be used to save and load it. Available methods are: numpy, mushroom, torch, json, pickle, primitive and none. The primitive method can be used to store primitive attributes, while the none method always skips the attribute, but ensures that it is initialized to None after the load. The mushroom method can be used with classes that implement the Serializable interface. All the other methods use the corresponding library. If a “!” character is added at the end of the method, the field is saved only if full_save is set to True.

Parameters:

**attr_dict – dictionary of attributes mapped to the method that should be used to save and load them.

_post_load()[source]

This method can be overwritten to implement logic that is executed after the loading of the agent.

add_preprocessor(preprocessor)

Add preprocessor to the preprocessor list. The preprocessors are applied in order.

Parameters:

preprocessor (object) – state preprocessor to be applied to the state variables before feeding them to the agent.

copy()
Returns:

A deepcopy of the agent.

draw_action(state)

Return the action to execute in the given state. It is the action returned by the policy or the action set by the algorithm (e.g. in the case of SARSA).

Parameters:

state (np.ndarray) – the state where the agent is.

Returns:

The action to be executed.

episode_start()

Called by the agent when a new episode starts.

classmethod load(path)

Load and deserialize the agent from the given location on disk.

Parameters:

path (Path, string) – Relative or absolute path to the agent's save location.

Returns:

The loaded agent.

property preprocessors

Access to state preprocessors stored in the agent.

save(path, full_save=False)

Serialize and save the object to the given path on disk.

Parameters:
  • path (Path, str) – Relative or absolute path to the object save location;

  • full_save (bool) – Flag to specify the amount of data to save for MushroomRL data structures.

save_zip(zip_file, full_save, folder='')

Serialize and save the agent to the given path on disk.

Parameters:
  • zip_file (ZipFile) – ZipFile where the object needs to be saved;

  • full_save (bool) – flag to specify the amount of data to save for MushroomRL data structures;

  • folder (string, '') – subfolder to be used by the save method.

set_logger(logger)

Setter that can be used to pass a logger to the algorithm.

Parameters:

logger (Logger) – the logger to be used by the algorithm.

stop()

Method used to stop an agent. Useful when dealing with real-world environments or simulators, or to clean up the environment's internals after a core learn/evaluate run, in order to enforce consistency.

class A2C(mdp_info, policy, actor_optimizer, critic_params, ent_coeff, max_grad_norm=None, critic_fit_params=None)[source]

Bases: DeepAC

Advantage Actor-Critic algorithm (A2C). Synchronous version of the A3C algorithm. “Asynchronous Methods for Deep Reinforcement Learning”. Mnih V. et al., 2016.

__init__(mdp_info, policy, actor_optimizer, critic_params, ent_coeff, max_grad_norm=None, critic_fit_params=None)[source]

Constructor.

Parameters:
  • policy (TorchPolicy) – torch policy to be learned by the algorithm;

  • actor_optimizer (dict) – parameters to specify the actor optimizer algorithm;

  • critic_params (dict) – parameters of the critic approximator to build;

  • ent_coeff ([float, Parameter], 0) – coefficient for the entropy penalty;

  • max_grad_norm (float, None) – maximum norm for gradient clipping. If None, no clipping will be performed, unless specified otherwise in actor_optimizer;

  • critic_fit_params (dict, None) – parameters of the fitting algorithm of the critic approximator.
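
A construction sketch in the style of the library's Gym examples; the two-layer network, the environment name, the optimizer settings and the coefficients are assumptions for illustration only.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F
    import torch.optim as optim

    from mushroom_rl.algorithms.actor_critic import A2C
    from mushroom_rl.core import Core
    from mushroom_rl.environments import Gym
    from mushroom_rl.policy import GaussianTorchPolicy


    class Network(nn.Module):
        # Small MLP following the (input_shape, output_shape, **kwargs)
        # convention expected by MushroomRL torch approximators.
        def __init__(self, input_shape, output_shape, n_features=64, **kwargs):
            super().__init__()
            self._h1 = nn.Linear(input_shape[-1], n_features)
            self._h2 = nn.Linear(n_features, n_features)
            self._out = nn.Linear(n_features, output_shape[0])

        def forward(self, state, **kwargs):
            features = torch.tanh(self._h1(state.float()))
            features = torch.tanh(self._h2(features))
            return self._out(features)


    mdp = Gym('Pendulum-v1', horizon=200, gamma=.99)

    policy = GaussianTorchPolicy(Network,
                                 mdp.info.observation_space.shape,
                                 mdp.info.action_space.shape,
                                 std_0=1., n_features=64)

    actor_optimizer = {'class': optim.RMSprop,
                       'params': {'lr': 7e-4, 'eps': 1e-5}}
    critic_params = dict(network=Network,
                         optimizer={'class': optim.RMSprop,
                                    'params': {'lr': 7e-4, 'eps': 1e-5}},
                         loss=F.mse_loss,
                         n_features=64,
                         input_shape=mdp.info.observation_space.shape,
                         output_shape=(1,))

    agent = A2C(mdp.info, policy, actor_optimizer, critic_params,
                ent_coeff=.01, max_grad_norm=.5)

    core = Core(agent, mdp)
    core.learn(n_steps=30000, n_steps_per_fit=5)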

fit(dataset, **info)[source]

Fit step.

Parameters:

dataset (list) – the dataset.

_post_load()[source]

This method can be overwritten to implement logic that is executed after the loading of the agent.

_add_save_attr(**attr_dict)

Add attributes that should be saved for an agent. For every attribute, it is necessary to specify the method to be used to save and load it. Available methods are: numpy, mushroom, torch, json, pickle, primitive and none. The primitive method can be used to store primitive attributes, while the none method always skips the attribute, but ensures that it is initialized to None after the load. The mushroom method can be used with classes that implement the Serializable interface. All the other methods use the corresponding library. If a “!” character is added at the end of the method, the field is saved only if full_save is set to True.

Parameters:

**attr_dict – dictionary of attributes mapped to the method that should be used to save and load them.

_optimize_actor_parameters(loss)

Method used to update actor parameters to maximize a given loss.

Parameters:

loss (torch.tensor) – the loss computed by the algorithm.

add_preprocessor(preprocessor)

Add preprocessor to the preprocessor list. The preprocessors are applied in order.

Parameters:

preprocessor (object) – state preprocessor to be applied to the state variables before feeding them to the agent.

copy()
Returns:

A deepcopy of the agent.

draw_action(state)

Return the action to execute in the given state. It is the action returned by the policy or the action set by the algorithm (e.g. in the case of SARSA).

Parameters:

state (np.ndarray) – the state where the agent is.

Returns:

The action to be executed.

episode_start()

Called by the agent when a new episode starts.

classmethod load(path)

Load and deserialize the agent from the given location on disk.

Parameters:

path (Path, string) – Relative or absolute path to the agent's save location.

Returns:

The loaded agent.

property preprocessors

Access to state preprocessors stored in the agent.

save(path, full_save=False)

Serialize and save the object to the given path on disk.

Parameters:
  • path (Path, str) – Relative or absolute path to the object save location;

  • full_save (bool) – Flag to specify the amount of data to save for MushroomRL data structures.

save_zip(zip_file, full_save, folder='')

Serialize and save the agent to the given path on disk.

Parameters:
  • zip_file (ZipFile) – ZipFile where the object needs to be saved;

  • full_save (bool) – flag to specify the amount of data to save for MushroomRL data structures;

  • folder (string, '') – subfolder to be used by the save method.

set_logger(logger)

Setter that can be used to pass a logger to the algorithm.

Parameters:

logger (Logger) – the logger to be used by the algorithm.

stop()

Method used to stop an agent. Useful when dealing with real-world environments or simulators, or to clean up the environment's internals after a core learn/evaluate run, in order to enforce consistency.

class DDPG(mdp_info, policy_class, policy_params, actor_params, actor_optimizer, critic_params, batch_size, initial_replay_size, max_replay_size, tau, policy_delay=1, critic_fit_params=None, actor_predict_params=None, critic_predict_params=None)[source]

Bases: DeepAC

Deep Deterministic Policy Gradient algorithm. “Continuous Control with Deep Reinforcement Learning”. Lillicrap T. P. et al., 2016.

__init__(mdp_info, policy_class, policy_params, actor_params, actor_optimizer, critic_params, batch_size, initial_replay_size, max_replay_size, tau, policy_delay=1, critic_fit_params=None, actor_predict_params=None, critic_predict_params=None)[source]

Constructor.

Parameters:
  • policy_class (Policy) – class of the policy;

  • policy_params (dict) – parameters of the policy to build;

  • actor_params (dict) – parameters of the actor approximator to build;

  • actor_optimizer (dict) – parameters to specify the actor optimizer algorithm;

  • critic_params (dict) – parameters of the critic approximator to build;

  • batch_size ([int, Parameter]) – the number of samples in a batch;

  • initial_replay_size (int) – the number of samples to collect before starting the learning;

  • max_replay_size (int) – the maximum number of samples in the replay memory;

  • tau ([float, Parameter]) – value of the coefficient used for soft updates;

  • policy_delay ([int, Parameter], 1) – the number of updates of the critic after which an actor update is implemented;

  • critic_fit_params (dict, None) – parameters of the fitting algorithm of the critic approximator;

  • actor_predict_params (dict, None) – parameters for the prediction with the actor approximator;

  • critic_predict_params (dict, None) – parameters for the prediction with the critic approximator.
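
A construction sketch in the spirit of the library's pendulum example; the networks follow the same MLP convention as the A2C sketch above (the critic additionally receives the action), and the environment, exploration-noise parameters, sizes and learning rates are assumptions.

    import numpy as np
    import torch
    import torch.nn as nn
    import torch.nn.functional as F
    import torch.optim as optim

    from mushroom_rl.algorithms.actor_critic import DDPG
    from mushroom_rl.environments import Gym
    from mushroom_rl.policy import OrnsteinUhlenbeckPolicy


    class CriticNetwork(nn.Module):
        # Q-network: concatenates state and action.
        def __init__(self, input_shape, output_shape, n_features=64, **kwargs):
            super().__init__()
            self._h1 = nn.Linear(input_shape[-1], n_features)
            self._h2 = nn.Linear(n_features, n_features)
            self._out = nn.Linear(n_features, output_shape[0])

        def forward(self, state, action):
            state_action = torch.cat((state.float(), action.float()), dim=1)
            features = F.relu(self._h1(state_action))
            features = F.relu(self._h2(features))
            return torch.squeeze(self._out(features))


    class ActorNetwork(nn.Module):
        def __init__(self, input_shape, output_shape, n_features=64, **kwargs):
            super().__init__()
            self._h1 = nn.Linear(input_shape[-1], n_features)
            self._h2 = nn.Linear(n_features, n_features)
            self._out = nn.Linear(n_features, output_shape[0])

        def forward(self, state):
            features = F.relu(self._h1(state.float()))
            features = F.relu(self._h2(features))
            return self._out(features)


    mdp = Gym('Pendulum-v1', horizon=200, gamma=.99)
    state_shape = mdp.info.observation_space.shape
    action_shape = mdp.info.action_space.shape

    # Ornstein-Uhlenbeck exploration noise (illustrative parameters).
    policy_class = OrnsteinUhlenbeckPolicy
    policy_params = dict(sigma=np.ones(1) * .2, theta=.15, dt=1e-2)

    actor_params = dict(network=ActorNetwork, n_features=64,
                        input_shape=state_shape, output_shape=action_shape)
    actor_optimizer = {'class': optim.Adam, 'params': {'lr': 1e-4}}

    critic_params = dict(network=CriticNetwork,
                         optimizer={'class': optim.Adam, 'params': {'lr': 1e-3}},
                         loss=F.mse_loss, n_features=64,
                         input_shape=(state_shape[0] + action_shape[0],),
                         output_shape=(1,))

    agent = DDPG(mdp.info, policy_class, policy_params, actor_params,
                 actor_optimizer, critic_params, batch_size=200,
                 initial_replay_size=500, max_replay_size=5000, tau=.001)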

fit(dataset, **info)[source]

Fit step.

Parameters:

dataset (list) – the dataset.

_next_q(next_state, absorbing)[source]
Parameters:
  • next_state (np.ndarray) – the states where next action has to be evaluated;

  • absorbing (np.ndarray) – the absorbing flag for the states in next_state.

Returns:

Action-values returned by the critic for next_state and the action returned by the actor.

_post_load()[source]

This method can be overwritten to implement logic that is executed after the loading of the agent.

_add_save_attr(**attr_dict)

Add attributes that should be saved for an agent. For every attribute, it is necessary to specify the method to be used to save and load it. Available methods are: numpy, mushroom, torch, json, pickle, primitive and none. The primitive method can be used to store primitive attributes, while the none method always skips the attribute, but ensures that it is initialized to None after the load. The mushroom method can be used with classes that implement the Serializable interface. All the other methods use the corresponding library. If a “!” character is added at the end of the method, the field is saved only if full_save is set to True.

Parameters:

**attr_dict – dictionary of attributes mapped to the method that should be used to save and load them.

_optimize_actor_parameters(loss)

Method used to update actor parameters to maximize a given loss.

Parameters:

loss (torch.tensor) – the loss computed by the algorithm.

add_preprocessor(preprocessor)

Add preprocessor to the preprocessor list. The preprocessors are applied in order.

Parameters:

preprocessor (object) – state preprocessor to be applied to the state variables before feeding them to the agent.

copy()
Returns:

A deepcopy of the agent.

draw_action(state)

Return the action to execute in the given state. It is the action returned by the policy or the action set by the algorithm (e.g. in the case of SARSA).

Parameters:

state (np.ndarray) – the state where the agent is.

Returns:

The action to be executed.

episode_start()

Called by the agent when a new episode starts.

classmethod load(path)

Load and deserialize the agent from the given location on disk.

Parameters:

path (Path, string) – Relative or absolute path to the agent's save location.

Returns:

The loaded agent.

property preprocessors

Access to state preprocessors stored in the agent.

save(path, full_save=False)

Serialize and save the object to the given path on disk.

Parameters:
  • path (Path, str) – Relative or absolute path to the object save location;

  • full_save (bool) – Flag to specify the amount of data to save for MushroomRL data structures.

save_zip(zip_file, full_save, folder='')

Serialize and save the agent to the given path on disk.

Parameters:
  • zip_file (ZipFile) – ZipFile where the object needs to be saved;

  • full_save (bool) – flag to specify the amount of data to save for MushroomRL data structures;

  • folder (string, '') – subfolder to be used by the save method.

set_logger(logger)

Setter that can be used to pass a logger to the algorithm.

Parameters:

logger (Logger) – the logger to be used by the algorithm.

stop()

Method used to stop an agent. Useful when dealing with real-world environments or simulators, or to clean up the environment's internals after a core learn/evaluate run, in order to enforce consistency.

class TD3(mdp_info, policy_class, policy_params, actor_params, actor_optimizer, critic_params, batch_size, initial_replay_size, max_replay_size, tau, policy_delay=2, noise_std=0.2, noise_clip=0.5, critic_fit_params=None)[source]

Bases: DDPG

Twin Delayed DDPG algorithm. “Addressing Function Approximation Error in Actor-Critic Methods”. Fujimoto S. et al., 2018.

__init__(mdp_info, policy_class, policy_params, actor_params, actor_optimizer, critic_params, batch_size, initial_replay_size, max_replay_size, tau, policy_delay=2, noise_std=0.2, noise_clip=0.5, critic_fit_params=None)[source]

Constructor.

Parameters:
  • policy_class (Policy) – class of the policy;

  • policy_params (dict) – parameters of the policy to build;

  • actor_params (dict) – parameters of the actor approximator to build;

  • actor_optimizer (dict) – parameters to specify the actor optimizer algorithm;

  • critic_params (dict) – parameters of the critic approximator to build;

  • batch_size ([int, Parameter]) – the number of samples in a batch;

  • initial_replay_size (int) – the number of samples to collect before starting the learning;

  • max_replay_size (int) – the maximum number of samples in the replay memory;

  • tau ([float, Parameter]) – value of coefficient for soft updates;

  • policy_delay ([int, Parameter], 2) – the number of updates of the critic after which an actor update is implemented;

  • noise_std ([float, Parameter], .2) – standard deviation of the noise used for policy smoothing;

  • noise_clip ([float, Parameter], .5) – maximum absolute value for policy smoothing noise;

  • critic_fit_params (dict, None) – parameters of the fitting algorithm of the critic approximator.
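
TD3 shares the DDPG construction interface; the sketch below reuses the policy class, the actor/critic dictionaries and the replay settings from the DDPG example above, and only adds the target-policy smoothing parameters (values are illustrative).

    from mushroom_rl.algorithms.actor_critic import TD3

    # policy_class, policy_params, actor_params, actor_optimizer and
    # critic_params as defined in the DDPG sketch above.
    agent = TD3(mdp.info, policy_class, policy_params, actor_params,
                actor_optimizer, critic_params, batch_size=200,
                initial_replay_size=500, max_replay_size=5000, tau=.005,
                policy_delay=2, noise_std=.2, noise_clip=.5)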

_next_q(next_state, absorbing)[source]
Parameters:
  • next_state (np.ndarray) – the states where next action has to be evaluated;

  • absorbing (np.ndarray) – the absorbing flag for the states in next_state.

Returns:

Action-values returned by the critic for next_state and the action returned by the actor.

_add_save_attr(**attr_dict)

Add attributes that should be saved for an agent. For every attribute, it is necessary to specify the method to be used to save and load it. Available methods are: numpy, mushroom, torch, json, pickle, primitive and none. The primitive method can be used to store primitive attributes, while the none method always skips the attribute, but ensures that it is initialized to None after the load. The mushroom method can be used with classes that implement the Serializable interface. All the other methods use the corresponding library. If a “!” character is added at the end of the method, the field is saved only if full_save is set to True.

Parameters:

**attr_dict – dictionary of attributes mapped to the method that should be used to save and load them.

_optimize_actor_parameters(loss)

Method used to update actor parameters to maximize a given loss.

Parameters:

loss (torch.tensor) – the loss computed by the algorithm.

_post_load()

This method can be overwritten to implement logic that is executed after the loading of the agent.

add_preprocessor(preprocessor)

Add preprocessor to the preprocessor list. The preprocessors are applied in order.

Parameters:

preprocessor (object) – state preprocessor to be applied to the state variables before feeding them to the agent.

copy()
Returns:

A deepcopy of the agent.

draw_action(state)

Return the action to execute in the given state. It is the action returned by the policy or the action set by the algorithm (e.g. in the case of SARSA).

Parameters:

state (np.ndarray) – the state where the agent is.

Returns:

The action to be executed.

episode_start()

Called by the agent when a new episode starts.

fit(dataset, **info)

Fit step.

Parameters:

dataset (list) – the dataset.

classmethod load(path)

Load and deserialize the agent from the given location on disk.

Parameters:

path (Path, string) – Relative or absolute path to the agent's save location.

Returns:

The loaded agent.

property preprocessors

Access to state preprocessors stored in the agent.

save(path, full_save=False)

Serialize and save the object to the given path on disk.

Parameters:
  • path (Path, str) – Relative or absolute path to the object save location;

  • full_save (bool) – Flag to specify the amount of data to save for MushroomRL data structures.

save_zip(zip_file, full_save, folder='')

Serialize and save the agent to the given path on disk.

Parameters:
  • zip_file (ZipFile) – ZipFile where the object needs to be saved;

  • full_save (bool) – flag to specify the amount of data to save for MushroomRL data structures;

  • folder (string, '') – subfolder to be used by the save method.

set_logger(logger)

Setter that can be used to pass a logger to the algorithm.

Parameters:

logger (Logger) – the logger to be used by the algorithm.

stop()

Method used to stop an agent. Useful when dealing with real-world environments or simulators, or to clean up the environment's internals after a core learn/evaluate run, in order to enforce consistency.

class SAC(mdp_info, actor_mu_params, actor_sigma_params, actor_optimizer, critic_params, batch_size, initial_replay_size, max_replay_size, warmup_transitions, tau, lr_alpha, use_log_alpha_loss=False, log_std_min=-20, log_std_max=2, target_entropy=None, critic_fit_params=None)[source]

Bases: DeepAC

Soft Actor-Critic algorithm. “Soft Actor-Critic Algorithms and Applications”. Haarnoja T. et al., 2019.

__init__(mdp_info, actor_mu_params, actor_sigma_params, actor_optimizer, critic_params, batch_size, initial_replay_size, max_replay_size, warmup_transitions, tau, lr_alpha, use_log_alpha_loss=False, log_std_min=-20, log_std_max=2, target_entropy=None, critic_fit_params=None)[source]

Constructor.

Parameters:
  • actor_mu_params (dict) – parameters of the actor mean approximator to build;

  • actor_sigma_params (dict) – parameters of the actor sigma approximator to build;

  • actor_optimizer (dict) – parameters to specify the actor optimizer algorithm;

  • critic_params (dict) – parameters of the critic approximator to build;

  • batch_size ([int, Parameter]) – the number of samples in a batch;

  • initial_replay_size (int) – the number of samples to collect before starting the learning;

  • max_replay_size (int) – the maximum number of samples in the replay memory;

  • warmup_transitions ([int, Parameter]) – number of samples to accumulate in the replay memory to start the policy fitting;

  • tau ([float, Parameter]) – value of coefficient for soft updates;

  • lr_alpha ([float, Parameter]) – Learning rate for the entropy coefficient;

  • use_log_alpha_loss (bool, False) – whether to use the original implementation loss or the one from the paper;

  • log_std_min ([float, Parameter]) – Min value for the policy log std;

  • log_std_max ([float, Parameter]) – Max value for the policy log std;

  • target_entropy (float, None) – target entropy for the policy, if None a default value is computed;

  • critic_fit_params (dict, None) – parameters of the fitting algorithm of the critic approximator.
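
A construction sketch following the pattern of the library's SAC example; the ActorNetwork and CriticNetwork classes are assumed to be the ones defined in the DDPG sketch above (the mean and sigma heads share the actor architecture), and all hyperparameters are illustrative.

    import torch.nn.functional as F
    import torch.optim as optim

    from mushroom_rl.algorithms.actor_critic import SAC
    from mushroom_rl.environments import Gym

    mdp = Gym('Pendulum-v1', horizon=200, gamma=.99)
    state_shape = mdp.info.observation_space.shape
    action_shape = mdp.info.action_space.shape

    # ActorNetwork and CriticNetwork as defined in the DDPG sketch above.
    actor_mu_params = dict(network=ActorNetwork, n_features=64,
                           input_shape=state_shape, output_shape=action_shape)
    actor_sigma_params = dict(network=ActorNetwork, n_features=64,
                              input_shape=state_shape,
                              output_shape=action_shape)
    actor_optimizer = {'class': optim.Adam, 'params': {'lr': 3e-4}}

    critic_params = dict(network=CriticNetwork,
                         optimizer={'class': optim.Adam, 'params': {'lr': 3e-4}},
                         loss=F.mse_loss, n_features=64,
                         input_shape=(state_shape[0] + action_shape[0],),
                         output_shape=(1,))

    agent = SAC(mdp.info, actor_mu_params, actor_sigma_params, actor_optimizer,
                critic_params, batch_size=256, initial_replay_size=500,
                max_replay_size=50000, warmup_transitions=100, tau=.005,
                lr_alpha=3e-4)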

fit(dataset, **info)[source]

Fit step.

Parameters:

dataset (list) – the dataset.

_add_save_attr(**attr_dict)

Add attributes that should be saved for an agent. For every attribute, it is necessary to specify the method to be used to save and load it. Available methods are: numpy, mushroom, torch, json, pickle, primitive and none. The primitive method can be used to store primitive attributes, while the none method always skips the attribute, but ensures that it is initialized to None after the load. The mushroom method can be used with classes that implement the Serializable interface. All the other methods use the corresponding library. If a “!” character is added at the end of the method, the field is saved only if full_save is set to True.

Parameters:

**attr_dict – dictionary of attributes mapped to the method that should be used to save and load them.

_next_q(next_state, absorbing)[source]
Parameters:
  • next_state (np.ndarray) – the states where next action has to be evaluated;

  • absorbing (np.ndarray) – the absorbing flag for the states in next_state.

Returns:

Action-values returned by the critic for next_state and the action returned by the actor.

_optimize_actor_parameters(loss)

Method used to update actor parameters to maximize a given loss.

Parameters:

loss (torch.tensor) – the loss computed by the algorithm.

add_preprocessor(preprocessor)

Add preprocessor to the preprocessor list. The preprocessors are applied in order.

Parameters:

preprocessor (object) – state preprocessor to be applied to the state variables before feeding them to the agent.

copy()
Returns:

A deepcopy of the agent.

draw_action(state)

Return the action to execute in the given state. It is the action returned by the policy or the action set by the algorithm (e.g. in the case of SARSA).

Parameters:

state (np.ndarray) – the state where the agent is.

Returns:

The action to be executed.

episode_start()

Called by the agent when a new episode starts.

classmethod load(path)

Load and deserialize the agent from the given location on disk.

Parameters:

path (Path, string) – Relative or absolute path to the agent's save location.

Returns:

The loaded agent.

property preprocessors

Access to state preprocessors stored in the agent.

save(path, full_save=False)

Serialize and save the object to the given path on disk.

Parameters:
  • path (Path, str) – Relative or absolute path to the object save location;

  • full_save (bool) – Flag to specify the amount of data to save for MushroomRL data structures.

save_zip(zip_file, full_save, folder='')

Serialize and save the agent to the given path on disk.

Parameters:
  • zip_file (ZipFile) – ZipFile where the object needs to be saved;

  • full_save (bool) – flag to specify the amount of data to save for MushroomRL data structures;

  • folder (string, '') – subfolder to be used by the save method.

set_logger(logger)

Setter that can be used to pass a logger to the algorithm.

Parameters:

logger (Logger) – the logger to be used by the algorithm.

stop()

Method used to stop an agent. Useful when dealing with real-world environments or simulators, or to clean up the environment's internals after a core learn/evaluate run, in order to enforce consistency.

_post_load()[source]

This method can be overwritten to implement logic that is executed after the loading of the agent.

class TRPO(mdp_info, policy, critic_params, ent_coeff=0.0, max_kl=0.001, lam=1.0, n_epochs_line_search=10, n_epochs_cg=10, cg_damping=0.01, cg_residual_tol=1e-10, critic_fit_params=None)[source]

Bases: Agent

Trust Region Policy Optimization algorithm. “Trust Region Policy Optimization”. Schulman J. et al., 2015.

__init__(mdp_info, policy, critic_params, ent_coeff=0.0, max_kl=0.001, lam=1.0, n_epochs_line_search=10, n_epochs_cg=10, cg_damping=0.01, cg_residual_tol=1e-10, critic_fit_params=None)[source]

Constructor.

Parameters:
  • policy (TorchPolicy) – torch policy to be learned by the algorithm;

  • critic_params (dict) – parameters of the critic approximator to build;

  • ent_coeff ([float, Parameter], 0) – coefficient for the entropy penalty;

  • max_kl ([float, Parameter], .001) – maximum kl allowed for every policy update;

  • lam ([float, Parameter], 1.) – lambda coefficient used by generalized advantage estimation;

  • n_epochs_line_search ([int, Parameter], 10) – maximum number of iterations of the line search algorithm;

  • n_epochs_cg ([int, Parameter], 10) – maximum number of iterations of the conjugate gradient algorithm;

  • cg_damping ([float, Parameter], 1e-2) – damping factor for the conjugate gradient algorithm;

  • cg_residual_tol ([float, Parameter], 1e-10) – conjugate gradient residual tolerance;

  • critic_fit_params (dict, None) – parameters of the fitting algorithm of the critic approximator.
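
A construction sketch reusing the Network class, the GaussianTorchPolicy and the Gym environment from the A2C example above; the trust-region hyperparameters are illustrative.

    import torch.nn.functional as F
    import torch.optim as optim

    from mushroom_rl.algorithms.actor_critic import TRPO
    from mushroom_rl.policy import GaussianTorchPolicy

    # Network and mdp as defined in the A2C sketch above.
    policy = GaussianTorchPolicy(Network,
                                 mdp.info.observation_space.shape,
                                 mdp.info.action_space.shape,
                                 std_0=1., n_features=64)

    critic_params = dict(network=Network,
                         optimizer={'class': optim.Adam, 'params': {'lr': 3e-4}},
                         loss=F.mse_loss, n_features=64,
                         input_shape=mdp.info.observation_space.shape,
                         output_shape=(1,))

    agent = TRPO(mdp.info, policy, critic_params,
                 ent_coeff=0., max_kl=1e-2, lam=.95,
                 n_epochs_line_search=10, n_epochs_cg=100,
                 cg_damping=1e-2, cg_residual_tol=1e-10)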

fit(dataset, **info)[source]

Fit step.

Parameters:

dataset (list) – the dataset.

_add_save_attr(**attr_dict)

Add attributes that should be saved for an agent. For every attribute, it is necessary to specify the method to be used to save and load it. Available methods are: numpy, mushroom, torch, json, pickle, primitive and none. The primitive method can be used to store primitive attributes, while the none method always skips the attribute, but ensures that it is initialized to None after the load. The mushroom method can be used with classes that implement the Serializable interface. All the other methods use the corresponding library. If a “!” character is added at the end of the method, the field is saved only if full_save is set to True.

Parameters:

**attr_dict – dictionary of attributes mapped to the method that should be used to save and load them.

_post_load()

This method can be overwritten to implement logic that is executed after the loading of the agent.

add_preprocessor(preprocessor)

Add preprocessor to the preprocessor list. The preprocessors are applied in order.

Parameters:

preprocessor (object) – state preprocessor to be applied to the state variables before feeding them to the agent.

copy()
Returns:

A deepcopy of the agent.

draw_action(state)

Return the action to execute in the given state. It is the action returned by the policy or the action set by the algorithm (e.g. in the case of SARSA).

Parameters:

state (np.ndarray) – the state where the agent is.

Returns:

The action to be executed.

episode_start()

Called by the agent when a new episode starts.

classmethod load(path)

Load and deserialize the agent from the given location on disk.

Parameters:

path (Path, string) – Relative or absolute path to the agent's save location.

Returns:

The loaded agent.

property preprocessors

Access to state preprocessors stored in the agent.

save(path, full_save=False)

Serialize and save the object to the given path on disk.

Parameters:
  • path (Path, str) – Relative or absolute path to the object save location;

  • full_save (bool) – Flag to specify the amount of data to save for MushroomRL data structures.

save_zip(zip_file, full_save, folder='')

Serialize and save the agent to the given path on disk.

Parameters:
  • zip_file (ZipFile) – ZipFile where the object needs to be saved;

  • full_save (bool) – flag to specify the amount of data to save for MushroomRL data structures;

  • folder (string, '') – subfolder to be used by the save method.

set_logger(logger)

Setter that can be used to pass a logger to the algorithm.

Parameters:

logger (Logger) – the logger to be used by the algorithm.

stop()

Method used to stop an agent. Useful when dealing with real-world environments or simulators, or to clean up the environment's internals after a core learn/evaluate run, in order to enforce consistency.

class PPO(mdp_info, policy, actor_optimizer, critic_params, n_epochs_policy, batch_size, eps_ppo, lam, ent_coeff=0.0, critic_fit_params=None)[source]

Bases: Agent

Proximal Policy Optimization algorithm. “Proximal Policy Optimization Algorithms”. Schulman J. et al., 2017.

__init__(mdp_info, policy, actor_optimizer, critic_params, n_epochs_policy, batch_size, eps_ppo, lam, ent_coeff=0.0, critic_fit_params=None)[source]

Constructor.

Parameters:
  • policy (TorchPolicy) – torch policy to be learned by the algorithm;

  • actor_optimizer (dict) – parameters to specify the actor optimizer algorithm;

  • critic_params (dict) – parameters of the critic approximator to build;

  • n_epochs_policy ([int, Parameter]) – number of policy updates for every dataset;

  • batch_size ([int, Parameter]) – size of minibatches for every optimization step;

  • eps_ppo ([float, Parameter]) – value for probability ratio clipping;

  • lam ([float, Parameter], 1.) – lambda coefficient used by generalized advantage estimation;

  • ent_coeff ([float, Parameter], 0.) – coefficient for the entropy regularization term;

  • critic_fit_params (dict, None) – parameters of the fitting algorithm of the critic approximator.
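
A construction sketch reusing the Network class, the GaussianTorchPolicy and the Gym environment from the A2C example above; the clipping range, the number of policy epochs and the other values are illustrative.

    import torch.nn.functional as F
    import torch.optim as optim

    from mushroom_rl.algorithms.actor_critic import PPO
    from mushroom_rl.policy import GaussianTorchPolicy

    # Network and mdp as defined in the A2C sketch above.
    policy = GaussianTorchPolicy(Network,
                                 mdp.info.observation_space.shape,
                                 mdp.info.action_space.shape,
                                 std_0=1., n_features=64)

    actor_optimizer = {'class': optim.Adam, 'params': {'lr': 3e-4}}
    critic_params = dict(network=Network,
                         optimizer={'class': optim.Adam, 'params': {'lr': 3e-4}},
                         loss=F.mse_loss, n_features=64,
                         input_shape=mdp.info.observation_space.shape,
                         output_shape=(1,))

    agent = PPO(mdp.info, policy, actor_optimizer, critic_params,
                n_epochs_policy=4, batch_size=64, eps_ppo=.2, lam=.95,
                ent_coeff=0.)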

fit(dataset, **info)[source]

Fit step.

Parameters:

dataset (list) – the dataset.

_post_load()[source]

This method can be overwritten to implement logic that is executed after the loading of the agent.

_add_save_attr(**attr_dict)

Add attributes that should be saved for an agent. For every attribute, it is necessary to specify the method to be used to save and load it. Available methods are: numpy, mushroom, torch, json, pickle, primitive and none. The primitive method can be used to store primitive attributes, while the none method always skips the attribute, but ensures that it is initialized to None after the load. The mushroom method can be used with classes that implement the Serializable interface. All the other methods use the corresponding library. If a “!” character is added at the end of the method, the field is saved only if full_save is set to True.

Parameters:

**attr_dict – dictionary of attributes mapped to the method that should be used to save and load them.

add_preprocessor(preprocessor)

Add preprocessor to the preprocessor list. The preprocessors are applied in order.

Parameters:

preprocessor (object) – state preprocessor to be applied to the state variables before feeding them to the agent.

copy()
Returns:

A deepcopy of the agent.

draw_action(state)

Return the action to execute in the given state. It is the action returned by the policy or the action set by the algorithm (e.g. in the case of SARSA).

Parameters:

state (np.ndarray) – the state where the agent is.

Returns:

The action to be executed.

episode_start()

Called by the agent when a new episode starts.

classmethod load(path)

Load and deserialize the agent from the given location on disk.

Parameters:

path (Path, string) – Relative or absolute path to the agent's save location.

Returns:

The loaded agent.

property preprocessors

Access to state preprocessors stored in the agent.

save(path, full_save=False)

Serialize and save the object to the given path on disk.

Parameters:
  • path (Path, str) – Relative or absolute path to the object save location;

  • full_save (bool) – Flag to specify the amount of data to save for MushroomRL data structures.

save_zip(zip_file, full_save, folder='')

Serialize and save the agent to the given path on disk.

Parameters:
  • zip_file (ZipFile) – ZipFile where the object needs to be saved;

  • full_save (bool) – flag to specify the amount of data to save for MushroomRL data structures;

  • folder (string, '') – subfolder to be used by the save method.

set_logger(logger)

Setter that can be used to pass a logger to the algorithm.

Parameters:

logger (Logger) – the logger to be used by the algorithm.

stop()

Method used to stop an agent. Useful when dealing with real-world environments or simulators, or to clean up the environment's internals after a core learn/evaluate run, in order to enforce consistency.