Policy

class Policy(policy_state_shape=None)[source]

Bases: Serializable

Interface representing a generic policy. A policy is a probability distribution over the actions available in a given state. MushroomRL agents use a policy to interact with the environment.
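A minimal sketch of how the interface documented below can be implemented, assuming the standard mushroom_rl.policy import path; the class name, the action encoding and the n_actions parameter are illustrative, not part of the library:

    import numpy as np
    from mushroom_rl.policy import Policy   # assumed import path

    class UniformDiscretePolicy(Policy):
        """Illustrative stateless policy: uniform over n_actions discrete actions."""

        def __init__(self, n_actions):
            super().__init__(policy_state_shape=None)   # stateless policy
            self._n_actions = n_actions

        def __call__(self, state, action, policy_state=None):
            # Every action has the same probability in every state.
            return 1. / self._n_actions

        def draw_action(self, state, policy_state=None):
            action = np.array([np.random.randint(self._n_actions)])
            return action, None   # the action plus the (absent) next policy state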

__init__(policy_state_shape=None)[source]

Constructor.

Parameters:

policy_state_shape (tuple, None) – the shape of the internal state of the policy.

__call__(state, action, policy_state)[source]

Compute the probability of taking action in a certain state following the policy.

Parameters:
  • state – state where you want to evaluate the policy density;

  • action – action where you want to evaluate the policy density;

  • policy_state – internal_state where you want to evaluate the policy density.

Returns:

The probability of all actions following the policy in the given state if only the state is provided, otherwise the probability of the given action in the given state following the policy. If the action space is continuous, both state and action must be provided.

draw_action(state, policy_state)[source]

Sample an action in state using the policy.

Parameters:
  • state – the state where the agent is;

  • policy_state – the internal state of the policy.

Returns:

The action sampled from the policy and optionally the next policy state.

reset()[source]

Useful when the policy needs a special initialization at the beginning of an episode.

Returns:

The initial policy state (by default None).

_add_save_attr(**attr_dict)

Add attributes that should be saved for an agent. For every attribute, it is necessary to specify the method to be used to save and load it. Available methods are: numpy, mushroom, torch, json, pickle, primitive and none. The primitive method can be used to store primitive attributes, while the none method always skips the attribute but ensures that it is initialized to None after loading. The mushroom method can be used with classes that implement the Serializable interface. All the other methods use the library of the same name. If a “!” character is added at the end of the method, the field will be saved only if full_save is set to True.

Parameters:

**attr_dict – dictionary of attributes mapped to the method that should be used to save and load them.

_post_load()

This method can be overwritten to implement logic that is executed after the loading of the agent.

copy()
Returns:

A deepcopy of the agent.

classmethod load(path)

Load and deserialize the agent from the given location on disk.

Parameters:

path (Path, string) – Relative or absolute path to the agent's save location.

Returns:

The loaded agent.

save(path, full_save=False)

Serialize and save the object to the given path on disk.

Parameters:
  • path (Path, str) – Relative or absolute path to the object save location;

  • full_save (bool) – Flag to specify the amount of data to save for MushroomRL data structures.

save_zip(zip_file, full_save, folder='')

Serialize and save the agent to the given path on disk.

Parameters:
  • zip_file (ZipFile) – ZipFile where the object needs to be saved;

  • full_save (bool) – flag to specify the amount of data to save for MushroomRL data structures;

  • folder (string, '') – subfolder to be used by the save method.

class ParametricPolicy(policy_state_shape=None)[source]

Bases: Policy

Interface for a generic parametric policy. A parametric policy is a policy that depends on a set of parameters, called the policy weights. For differentiable policies, the derivative of the probability for a specified state-action pair can be provided.

__init__(policy_state_shape=None)[source]

Constructor.

Parameters:

policy_state_shape (tuple, None) – the shape of the internal state of the policy.

diff_log(state, action, policy_state)[source]

Compute the gradient of the logarithm of the probability density function, in the specified state and action pair, i.e.:

\[\nabla_{\theta}\log p(s,a)\]
Parameters:
  • state – the state where the gradient is computed;

  • action – the action where the gradient is computed;

  • policy_state – the internal state of the policy.

Returns:

The gradient of the logarithm of the pdf w.r.t. the policy weights

diff(state, action, policy_state=None)[source]

Compute the derivative of the probability density function in the specified state and action pair. Normally, it is computed using the derivative of the logarithm of the probability density function, exploiting the likelihood ratio trick, i.e.:

\[\nabla_{\theta}p(s,a)=p(s,a)\nabla_{\theta}\log p(s,a)\]
Parameters:
  • state – the state where the derivative is computed;

  • action – the action where the derivative is computed;

  • policy_state – the internal state of the policy.

Returns:

The derivative w.r.t. the policy weights

set_weights(weights)[source]

Setter.

Parameters:

weights (np.ndarray) – the vector of the new weights to be used by the policy.

get_weights()[source]

Getter.

Returns:

The current policy weights.

property weights_size

Property.

Returns:

The size of the policy weights.
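As an illustration of how the methods above fit together, a sketch of one REINFORCE-style update under stated assumptions: pi is any differentiable ParametricPolicy, and states, actions and returns form a batch collected with it (all names are illustrative, not part of the library):

    import numpy as np

    def reinforce_step(pi, states, actions, returns, learning_rate=1e-2):
        # theta <- theta + lr * mean(G * grad log pi(a|s)) over the batch.
        grad = np.zeros(pi.weights_size)
        for s, a, g in zip(states, actions, returns):
            grad += g * pi.diff_log(s, a, None).ravel()
        grad /= len(states)
        pi.set_weights(pi.get_weights() + learning_rate * grad)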

__call__(state, action, policy_state)

Compute the probability of taking action in a certain state following the policy.

Parameters:
  • state – state where you want to evaluate the policy density;

  • action – action where you want to evaluate the policy density;

  • policy_state – internal_state where you want to evaluate the policy density.

Returns:

The probability of all actions following the policy in the given state if only the state is provided, otherwise the probability of the given action in the given state following the policy. If the action space is continuous, both state and action must be provided.

_add_save_attr(**attr_dict)

Add attributes that should be saved for an agent. For every attribute, it is necessary to specify the method to be used to save and load it. Available methods are: numpy, mushroom, torch, json, pickle, primitive and none. The primitive method can be used to store primitive attributes, while the none method always skips the attribute but ensures that it is initialized to None after loading. The mushroom method can be used with classes that implement the Serializable interface. All the other methods use the library of the same name. If a “!” character is added at the end of the method, the field will be saved only if full_save is set to True.

Parameters:

**attr_dict – dictionary of attributes mapped to the method that should be used to save and load them.

_post_load()

This method can be overwritten to implement logic that is executed after the loading of the agent.

copy()
Returns:

A deepcopy of the agent.

draw_action(state, policy_state)

Sample an action in state using the policy.

Parameters:
  • state – the state where the agent is;

  • policy_state – the internal state of the policy.

Returns:

The action sampled from the policy and optionally the next policy state.

classmethod load(path)

Load and deserialize the agent from the given location on disk.

Parameters:

path (Path, string) – Relative or absolute path to the agent's save location.

Returns:

The loaded agent.

reset()

Useful when the policy needs a special initialization at the beginning of an episode.

Returns:

The initial policy state (by default None).

save(path, full_save=False)

Serialize and save the object to the given path on disk.

Parameters:
  • path (Path, str) – Relative or absolute path to the object save location;

  • full_save (bool) – Flag to specify the amount of data to save for MushroomRL data structures.

save_zip(zip_file, full_save, folder='')

Serialize and save the agent to the given path on disk.

Parameters:
  • zip_file (ZipFile) – ZipFile where the object needs to be saved;

  • full_save (bool) – flag to specify the amount of data to save for MushroomRL data structures;

  • folder (string, '') – subfolder to be used by the save method.

Deterministic policy

class DeterministicPolicy(mu, policy_state_shape=None)[source]

Bases: ParametricPolicy

Simple parametric policy representing a deterministic policy. As deterministic policies are degenerate probability distributions with all the probability mass on the deterministic action, they are not differentiable, even if the mean value approximator is differentiable.

__init__(mu, policy_state_shape=None)[source]

Constructor.

Parameters:

mu (Regressor) – the regressor representing the action to select in each state.
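A minimal usage sketch, assuming the standard mushroom_rl import paths and an illustrative linear regressor; shapes and weight values are arbitrary:

    import numpy as np
    from mushroom_rl.policy import DeterministicPolicy
    from mushroom_rl.approximators import Regressor
    from mushroom_rl.approximators.parametric import LinearApproximator

    # Linear mapping from a 3-dimensional state to a 1-dimensional action.
    mu = Regressor(LinearApproximator, input_shape=(3,), output_shape=(1,))

    pi = DeterministicPolicy(mu)
    pi.set_weights(np.array([.5, -1., .2]))     # weights_size == 3 for this regressor

    state = np.array([1., 0., 2.])
    action, _ = pi.draw_action(state)           # always mu(state), no sampling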

get_regressor()[source]

Getter.

Returns:

The regressor used to map states to actions.

__call__(state, action, policy_state=None)[source]

Compute the probability of taking action in a certain state following the policy.

Parameters:
  • state – state where you want to evaluate the policy density;

  • action – action where you want to evaluate the policy density;

  • policy_state – internal_state where you want to evaluate the policy density.

Returns:

The probability of all actions following the policy in the given state if only the state is provided, otherwise the probability of the given action in the given state following the policy. If the action space is continuous, both state and action must be provided.

draw_action(state, policy_state=None)[source]

Sample an action in state using the policy.

Parameters:
  • state – the state where the agent is;

  • policy_state – the internal state of the policy.

Returns:

The action sampled from the policy and optionally the next policy state.

set_weights(weights)[source]

Setter.

Parameters:

weights (np.ndarray) – the vector of the new weights to be used by the policy.

get_weights()[source]

Getter.

Returns:

The current policy weights.

property weights_size

Property.

Returns:

The size of the policy weights.

_add_save_attr(**attr_dict)

Add attributes that should be saved for an agent. For every attribute, it is necessary to specify the method to be used to save and load it. Available methods are: numpy, mushroom, torch, json, pickle, primitive and none. The primitive method can be used to store primitive attributes, while the none method always skips the attribute but ensures that it is initialized to None after loading. The mushroom method can be used with classes that implement the Serializable interface. All the other methods use the library of the same name. If a “!” character is added at the end of the method, the field will be saved only if full_save is set to True.

Parameters:

**attr_dict – dictionary of attributes mapped to the method that should be used to save and load them.

_post_load()

This method can be overwritten to implement logic that is executed after the loading of the agent.

copy()
Returns:

A deepcopy of the agent.

diff(state, action, policy_state=None)

Compute the derivative of the probability density function in the specified state and action pair. Normally, it is computed using the derivative of the logarithm of the probability density function, exploiting the likelihood ratio trick, i.e.:

\[\nabla_{\theta}p(s,a)=p(s,a)\nabla_{\theta}\log p(s,a)\]
Parameters:
  • state – the state where the derivative is computed;

  • action – the action where the derivative is computed;

  • policy_state – the internal state of the policy.

Returns:

The derivative w.r.t. the policy weights

diff_log(state, action, policy_state)

Compute the gradient of the logarithm of the probability density function, in the specified state and action pair, i.e.:

\[\nabla_{\theta}\log p(s,a)\]
Parameters:
  • state – the state where the gradient is computed;

  • action – the action where the gradient is computed;

  • policy_state – the internal state of the policy.

Returns:

The gradient of the logarithm of the pdf w.r.t. the policy weights

classmethod load(path)

Load and deserialize the agent from the given location on disk.

Parameters:

path (Path, string) – Relative or absolute path to the agent's save location.

Returns:

The loaded agent.

reset()

Useful when the policy needs a special initialization at the beginning of an episode.

Returns:

The initial policy state (by default None).

save(path, full_save=False)

Serialize and save the object to the given path on disk.

Parameters:
  • path (Path, str) – Relative or absolute path to the object save location;

  • full_save (bool) – Flag to specify the amount of data to save for MushroomRL data structures.

save_zip(zip_file, full_save, folder='')

Serialize and save the agent to the given path on disk.

Parameters:
  • zip_file (ZipFile) – ZipFile where the object needs to be saved;

  • full_save (bool) – flag to specify the amount of data to save for MushroomRL data structures;

  • folder (string, '') – subfolder to be used by the save method.

Gaussian policy

class AbstractGaussianPolicy(policy_state_shape=None)[source]

Bases: ParametricPolicy

Abstract class of Gaussian policies.

__init__(policy_state_shape=None)[source]

Constructor.

__call__(state, action, policy_state=None)[source]

Compute the probability of taking action in a certain state following the policy.

Parameters:
  • state – state where you want to evaluate the policy density;

  • action – action where you want to evaluate the policy density;

  • policy_state – internal_state where you want to evaluate the policy density.

Returns:

The probability of all actions following the policy in the given state if only the state is provided, otherwise the probability of the given action in the given state following the policy. If the action space is continuous, both state and action must be provided.

draw_action(state, policy_state=None)[source]

Sample an action in state using the policy.

Parameters:
  • state – the state where the agent is;

  • policy_state – the internal state of the policy.

Returns:

The action sampled from the policy and optionally the next policy state.

_add_save_attr(**attr_dict)

Add attributes that should be saved for an agent. For every attribute, it is necessary to specify the method to be used to save and load it. Available methods are: numpy, mushroom, torch, json, pickle, primitive and none. The primitive method can be used to store primitive attributes, while the none method always skips the attribute but ensures that it is initialized to None after loading. The mushroom method can be used with classes that implement the Serializable interface. All the other methods use the library of the same name. If a “!” character is added at the end of the method, the field will be saved only if full_save is set to True.

Parameters:

**attr_dict – dictionary of attributes mapped to the method that should be used to save and load them.

_post_load()

This method can be overwritten to implement logic that is executed after the loading of the agent.

copy()
Returns:

A deepcopy of the agent.

diff(state, action, policy_state=None)

Compute the derivative of the probability density function in the specified state and action pair. Normally, it is computed using the derivative of the logarithm of the probability density function, exploiting the likelihood ratio trick, i.e.:

\[\nabla_{\theta}p(s,a)=p(s,a)\nabla_{\theta}\log p(s,a)\]
Parameters:
  • state – the state where the derivative is computed;

  • action – the action where the derivative is computed;

  • policy_state – the internal state of the policy.

Returns:

The derivative w.r.t. the policy weights

diff_log(state, action, policy_state)

Compute the gradient of the logarithm of the probability density function, in the specified state and action pair, i.e.:

\[\nabla_{\theta}\log p(s,a)\]
Parameters:
  • state – the state where the gradient is computed;

  • action – the action where the gradient is computed;

  • policy_state – the internal state of the policy.

Returns:

The gradient of the logarithm of the pdf w.r.t. the policy weights

get_weights()

Getter.

Returns:

The current policy weights.

classmethod load(path)

Load and deserialize the agent from the given location on disk.

Parameters:

path (Path, string) – Relative or absolute path to the agent's save location.

Returns:

The loaded agent.

reset()

Useful when the policy needs a special initialization at the beginning of an episode.

Returns:

The initial policy state (by default None).

save(path, full_save=False)

Serialize and save the object to the given path on disk.

Parameters:
  • path (Path, str) – Relative or absolute path to the object save location;

  • full_save (bool) – Flag to specify the amount of data to save for MushroomRL data structures.

save_zip(zip_file, full_save, folder='')

Serialize and save the agent to the given path on disk.

Parameters:
  • zip_file (ZipFile) – ZipFile where the object needs to be saved;

  • full_save (bool) – flag to specify the amount of data to save for MushroomRL data structures;

  • folder (string, '') – subfolder to be used by the save method.

set_weights(weights)

Setter.

Parameters:

weights (np.ndarray) – the vector of the new weights to be used by the policy.

property weights_size

Property.

Returns:

The size of the policy weights.

class GaussianPolicy(mu, sigma, policy_state_shape=None)[source]

Bases: AbstractGaussianPolicy

Gaussian policy. This is a differentiable policy for continuous action spaces. The policy samples an action in every state following a Gaussian distribution, where the mean is computed from the state and the covariance matrix is fixed.

__init__(mu, sigma, policy_state_shape=None)[source]

Constructor.

Parameters:
  • mu (Regressor) – the regressor representing the mean w.r.t. the state;

  • sigma (np.ndarray) – a square positive definite matrix representing the covariance matrix. The size of this matrix must be n x n, where n is the action dimensionality.
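A minimal usage sketch, assuming the standard mushroom_rl import paths; the regressor, shapes and weight values are illustrative:

    import numpy as np
    from mushroom_rl.policy import GaussianPolicy
    from mushroom_rl.approximators import Regressor
    from mushroom_rl.approximators.parametric import LinearApproximator

    # Linear mean over a 2-dimensional state, 1-dimensional action, fixed variance.
    mu = Regressor(LinearApproximator, input_shape=(2,), output_shape=(1,))
    sigma = .1 * np.eye(1)

    pi = GaussianPolicy(mu, sigma)
    pi.set_weights(np.array([1., -.5]))

    state = np.array([.3, .7])
    action, _ = pi.draw_action(state)      # a ~ N(mu(state), sigma)
    density = pi(state, action)            # Gaussian density of that action
    grad = pi.diff_log(state, action)      # gradient w.r.t. the two mean weights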

set_sigma(sigma)[source]

Setter.

Parameters:

sigma (np.ndarray) – the new covariance matrix. Must be a square positive definite matrix.

diff_log(state, action, policy_state=None)[source]

Compute the gradient of the logarithm of the probability density function, in the specified state and action pair, i.e.:

\[\nabla_{\theta}\log p(s,a)\]
Parameters:
  • state – the state where the gradient is computed;

  • action – the action where the gradient is computed;

  • policy_state – the internal state of the policy.

Returns:

The gradient of the logarithm of the pdf w.r.t. the policy weights

set_weights(weights)[source]

Setter.

Parameters:

weights (np.ndarray) – the vector of the new weights to be used by the policy.

get_weights()[source]

Getter.

Returns:

The current policy weights.

property weights_size

Property.

Returns:

The size of the policy weights.

__call__(state, action, policy_state=None)

Compute the probability of taking action in a certain state following the policy.

Parameters:
  • state – state where you want to evaluate the policy density;

  • action – action where you want to evaluate the policy density;

  • policy_state – internal_state where you want to evaluate the policy density.

Returns:

The probability of all actions following the policy in the given state if only the state is provided, otherwise the probability of the given action in the given state following the policy. If the action space is continuous, both state and action must be provided.

_add_save_attr(**attr_dict)

Add attributes that should be saved for an agent. For every attribute, it is necessary to specify the method to be used to save and load it. Available methods are: numpy, mushroom, torch, json, pickle, primitive and none. The primitive method can be used to store primitive attributes, while the none method always skips the attribute but ensures that it is initialized to None after loading. The mushroom method can be used with classes that implement the Serializable interface. All the other methods use the library of the same name. If a “!” character is added at the end of the method, the field will be saved only if full_save is set to True.

Parameters:

**attr_dict – dictionary of attributes mapped to the method that should be used to save and load them.

_post_load()

This method can be overwritten to implement logic that is executed after the loading of the agent.

copy()
Returns:

A deepcopy of the agent.

diff(state, action, policy_state=None)

Compute the derivative of the probability density function in the specified state and action pair. Normally, it is computed using the derivative of the logarithm of the probability density function, exploiting the likelihood ratio trick, i.e.:

\[\nabla_{\theta}p(s,a)=p(s,a)\nabla_{\theta}\log p(s,a)\]
Parameters:
  • state – the state where the derivative is computed;

  • action – the action where the derivative is computed;

  • policy_state – the internal state of the policy.

Returns:

The derivative w.r.t. the policy weights

draw_action(state, policy_state=None)

Sample an action in state using the policy.

Parameters:
  • state – the state where the agent is;

  • policy_state – the internal state of the policy.

Returns:

The action sampled from the policy and optionally the next policy state.

classmethod load(path)

Load and deserialize the agent from the given location on disk.

Parameters:

path (Path, string) – Relative or absolute path to the agent's save location.

Returns:

The loaded agent.

reset()

Useful when the policy needs a special initialization at the beginning of an episode.

Returns:

The initial policy state (by default None).

save(path, full_save=False)

Serialize and save the object to the given path on disk.

Parameters:
  • path (Path, str) – Relative or absolute path to the object save location;

  • full_save (bool) – Flag to specify the amount of data to save for MushroomRL data structures.

save_zip(zip_file, full_save, folder='')

Serialize and save the agent to the given path on disk.

Parameters:
  • zip_file (ZipFile) – ZipFile where the object needs to be saved;

  • full_save (bool) – flag to specify the amount of data to save for MushroomRL data structures;

  • folder (string, '') – subfolder to be used by the save method.

class DiagonalGaussianPolicy(mu, std, policy_state_shape=None)[source]

Bases: AbstractGaussianPolicy

Gaussian policy with learnable standard deviation. The covariance matrix is constrained to be a diagonal matrix, where the diagonal is the squared standard deviation vector. This is a differentiable policy for continuous action spaces. This policy is similar to the Gaussian policy, but here the weights also include the standard deviation.

__init__(mu, std, policy_state_shape=None)[source]

Constructor.

Parameters:
  • mu (Regressor) – the regressor representing the mean w.r.t. the state;

  • std (np.ndarray) – a vector of standard deviations. The length of this vector must be equal to the action dimensionality.
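A minimal usage sketch, assuming the standard mushroom_rl import paths; the shapes are illustrative, and the exact ordering of the concatenated weight vector is an assumption:

    import numpy as np
    from mushroom_rl.policy import DiagonalGaussianPolicy
    from mushroom_rl.approximators import Regressor
    from mushroom_rl.approximators.parametric import LinearApproximator

    mu = Regressor(LinearApproximator, input_shape=(2,), output_shape=(1,))
    std = np.array([.5])                   # one standard deviation per action dimension

    pi = DiagonalGaussianPolicy(mu, std)

    # The policy weights gather both the mean weights and the standard deviation,
    # so gradient-based policy search adapts the exploration noise as well.
    print(pi.weights_size)                 # 2 mean weights + 1 standard deviation = 3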

set_std(std)[source]

Setter.

Parameters:

std (np.ndarray) – the new standard deviation vector. Its length must be equal to the action dimensionality.

diff_log(state, action, policy_state=None)[source]

Compute the gradient of the logarithm of the probability density function, in the specified state and action pair, i.e.:

\[\nabla_{\theta}\log p(s,a)\]
Parameters:
  • state – the state where the gradient is computed;

  • action – the action where the gradient is computed;

  • policy_state – the internal state of the policy.

Returns:

The gradient of the logarithm of the pdf w.r.t. the policy weights

set_weights(weights)[source]

Setter.

Parameters:

weights (np.ndarray) – the vector of the new weights to be used by the policy.

get_weights()[source]

Getter.

Returns:

The current policy weights.

property weights_size

Property.

Returns:

The size of the policy weights.

__call__(state, action, policy_state=None)

Compute the probability of taking action in a certain state following the policy.

Parameters:
  • state – state where you want to evaluate the policy density;

  • action – action where you want to evaluate the policy density;

  • policy_state – internal_state where you want to evaluate the policy density.

Returns:

The probability of all actions following the policy in the given state if only the state is provided, otherwise the probability of the given action in the given state following the policy. If the action space is continuous, both state and action must be provided.

_add_save_attr(**attr_dict)

Add attributes that should be saved for an agent. For every attribute, it is necessary to specify the method to be used to save and load it. Available methods are: numpy, mushroom, torch, json, pickle, primitive and none. The primitive method can be used to store primitive attributes, while the none method always skips the attribute but ensures that it is initialized to None after loading. The mushroom method can be used with classes that implement the Serializable interface. All the other methods use the library of the same name. If a “!” character is added at the end of the method, the field will be saved only if full_save is set to True.

Parameters:

**attr_dict – dictionary of attributes mapped to the method that should be used to save and load them.

_post_load()

This method can be overwritten to implement logic that is executed after the loading of the agent.

copy()
Returns:

A deepcopy of the agent.

diff(state, action, policy_state=None)

Compute the derivative of the probability density function in the specified state and action pair. Normally, it is computed using the derivative of the logarithm of the probability density function, exploiting the likelihood ratio trick, i.e.:

\[\nabla_{\theta}p(s,a)=p(s,a)\nabla_{\theta}\log p(s,a)\]
Parameters:
  • state – the state where the derivative is computed;

  • action – the action where the derivative is computed;

  • policy_state – the internal state of the policy.

Returns:

The derivative w.r.t. the policy weights

draw_action(state, policy_state=None)

Sample an action in state using the policy.

Parameters:
  • state – the state where the agent is;

  • policy_state – the internal state of the policy.

Returns:

The action sampled from the policy and optionally the next policy state.

classmethod load(path)

Load and deserialize the agent from the given location on disk.

Parameters:

path (Path, string) – Relative or absolute path to the agent's save location.

Returns:

The loaded agent.

reset()

Useful when the policy needs a special initialization at the beginning of an episode.

Returns:

The initial policy state (by default None).

save(path, full_save=False)

Serialize and save the object to the given path on disk.

Parameters:
  • path (Path, str) – Relative or absolute path to the object save location;

  • full_save (bool) – Flag to specify the amount of data to save for MushroomRL data structures.

save_zip(zip_file, full_save, folder='')

Serialize and save the agent to the given path on disk.

Parameters:
  • zip_file (ZipFile) – ZipFile where the object needs to be saved;

  • full_save (bool) – flag to specify the amount of data to save for MushroomRL data structures;

  • folder (string, '') – subfolder to be used by the save method.

class StateStdGaussianPolicy(mu, std, eps=1e-06, policy_state_shape=None)[source]

Bases: AbstractGaussianPolicy

Gaussian policy with learnable, state-dependent standard deviation. The covariance matrix is constrained to be a diagonal matrix, where the diagonal is the squared standard deviation computed in each state. This is a differentiable policy for continuous action spaces. This policy is similar to the diagonal Gaussian policy, but a parametric regressor is used to compute the standard deviation, so the standard deviation depends on the current state.

__init__(mu, std, eps=1e-06, policy_state_shape=None)[source]

Constructor.

Parameters:
  • mu (Regressor) – the regressor representing the mean w.r.t. the state;

  • std (Regressor) – the regressor representing the standard deviations w.r.t. the state. The output dimensionality of the regressor must be equal to the action dimensionality;

  • eps (float, 1e-6) – a positive constant added to the variance to ensure that it is always greater than zero.
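A minimal construction sketch, assuming the standard mushroom_rl import paths; both regressors and their shapes are illustrative:

    import numpy as np
    from mushroom_rl.policy import StateStdGaussianPolicy
    from mushroom_rl.approximators import Regressor
    from mushroom_rl.approximators.parametric import LinearApproximator

    # Mean and standard deviation are both state-dependent linear functions.
    mu = Regressor(LinearApproximator, input_shape=(2,), output_shape=(1,))
    std = Regressor(LinearApproximator, input_shape=(2,), output_shape=(1,))

    pi = StateStdGaussianPolicy(mu, std, eps=1e-6)

    # The single weight vector exposes the weights of both regressors.
    print(pi.weights_size)                 # 2 (mean) + 2 (standard deviation) = 4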

diff_log(state, action, policy_state=None)[source]

Compute the gradient of the logarithm of the probability density function, in the specified state and action pair, i.e.:

\[\nabla_{\theta}\log p(s,a)\]
Parameters:
  • state – the state where the gradient is computed;

  • action – the action where the gradient is computed;

  • policy_state – the internal state of the policy.

Returns:

The gradient of the logarithm of the pdf w.r.t. the policy weights

set_weights(weights)[source]

Setter.

Parameters:

weights (np.ndarray) – the vector of the new weights to be used by the policy.

get_weights()[source]

Getter.

Returns:

The current policy weights.

property weights_size

Property.

Returns:

The size of the policy weights.

__call__(state, action, policy_state=None)

Compute the probability of taking action in a certain state following the policy.

Parameters:
  • state – state where you want to evaluate the policy density;

  • action – action where you want to evaluate the policy density;

  • policy_state – internal_state where you want to evaluate the policy density.

Returns:

The probability of all actions following the policy in the given state if only the state is provided, otherwise the probability of the given action in the given state following the policy. If the action space is continuous, both state and action must be provided.

_add_save_attr(**attr_dict)

Add attributes that should be saved for an agent. For every attribute, it is necessary to specify the method to be used to save and load it. Available methods are: numpy, mushroom, torch, json, pickle, primitive and none. The primitive method can be used to store primitive attributes, while the none method always skips the attribute but ensures that it is initialized to None after loading. The mushroom method can be used with classes that implement the Serializable interface. All the other methods use the library of the same name. If a “!” character is added at the end of the method, the field will be saved only if full_save is set to True.

Parameters:

**attr_dict – dictionary of attributes mapped to the method that should be used to save and load them.

_post_load()

This method can be overwritten to implement logic that is executed after the loading of the agent.

copy()
Returns:

A deepcopy of the agent.

diff(state, action, policy_state=None)

Compute the derivative of the probability density function in the specified state and action pair. Normally, it is computed using the derivative of the logarithm of the probability density function, exploiting the likelihood ratio trick, i.e.:

\[\nabla_{\theta}p(s,a)=p(s,a)\nabla_{\theta}\log p(s,a)\]
Parameters:
  • state – the state where the derivative is computed;

  • action – the action where the derivative is computed;

  • policy_state – the internal state of the policy.

Returns:

The derivative w.r.t. the policy weights

draw_action(state, policy_state=None)

Sample an action in state using the policy.

Parameters:
  • state – the state where the agent is;

  • policy_state – the internal state of the policy.

Returns:

The action sampled from the policy and optionally the next policy state.

classmethod load(path)

Load and deserialize the agent from the given location on disk.

Parameters:

path (Path, string) – Relative or absolute path to the agent's save location.

Returns:

The loaded agent.

reset()

Useful when the policy needs a special initialization at the beginning of an episode.

Returns:

The initial policy state (by default None).

save(path, full_save=False)

Serialize and save the object to the given path on disk.

Parameters:
  • path (Path, str) – Relative or absolute path to the object save location;

  • full_save (bool) – Flag to specify the amount of data to save for MushroomRL data structures.

save_zip(zip_file, full_save, folder='')

Serialize and save the agent to the given path on disk.

Parameters:
  • zip_file (ZipFile) – ZipFile where the object needs to be saved;

  • full_save (bool) – flag to specify the amount of data to save for MushroomRL data structures;

  • folder (string, '') – subfolder to be used by the save method.

class StateLogStdGaussianPolicy(mu, log_std, policy_state_shape=None)[source]

Bases: AbstractGaussianPolicy

Gaussian policy with learnable, state-dependent standard deviation. The covariance matrix is constrained to be a diagonal matrix, whose diagonal is obtained by an exponential transformation of the logarithm of the standard deviation computed in each state. This is a differentiable policy for continuous action spaces. This policy is similar to the StateStdGaussianPolicy, but here the regressor represents the logarithm of the standard deviation.

__init__(mu, log_std, policy_state_shape=None)[source]

Constructor.

Parameters:
  • mu (Regressor) – the regressor representing the mean w.r.t. the state;

  • log_std (Regressor) – a regressor representing the logarithm of the variance w.r.t. the state. The output dimensionality of the regressor must be equal to the action dimensionality.

diff_log(state, action, policy_state=None)[source]

Compute the gradient of the logarithm of the probability density function, in the specified state and action pair, i.e.:

\[\nabla_{\theta}\log p(s,a)\]
Parameters:
  • state – the state where the gradient is computed;

  • action – the action where the gradient is computed;

  • policy_state – the internal state of the policy.

Returns:

The gradient of the logarithm of the pdf w.r.t. the policy weights

set_weights(weights)[source]

Setter.

Parameters:

weights (np.ndarray) – the vector of the new weights to be used by the policy.

get_weights()[source]

Getter.

Returns:

The current policy weights.

property weights_size

Property.

Returns:

The size of the policy weights.

__call__(state, action, policy_state=None)

Compute the probability of taking action in a certain state following the policy.

Parameters:
  • state – state where you want to evaluate the policy density;

  • action – action where you want to evaluate the policy density;

  • policy_state – internal_state where you want to evaluate the policy density.

Returns:

The probability of all actions following the policy in the given state if only the state is provided, otherwise the probability of the given action in the given state following the policy. If the action space is continuous, both state and action must be provided.

_add_save_attr(**attr_dict)

Add attributes that should be saved for an agent. For every attribute, it is necessary to specify the method to be used to save and load it. Available methods are: numpy, mushroom, torch, json, pickle, primitive and none. The primitive method can be used to store primitive attributes, while the none method always skips the attribute but ensures that it is initialized to None after loading. The mushroom method can be used with classes that implement the Serializable interface. All the other methods use the library of the same name. If a “!” character is added at the end of the method, the field will be saved only if full_save is set to True.

Parameters:

**attr_dict – dictionary of attributes mapped to the method that should be used to save and load them.

_post_load()

This method can be overwritten to implement logic that is executed after the loading of the agent.

copy()
Returns:

A deepcopy of the agent.

diff(state, action, policy_state=None)

Compute the derivative of the probability density function in the specified state and action pair. Normally, it is computed using the derivative of the logarithm of the probability density function, exploiting the likelihood ratio trick, i.e.:

\[\nabla_{\theta}p(s,a)=p(s,a)\nabla_{\theta}\log p(s,a)\]
Parameters:
  • state – the state where the derivative is computed;

  • action – the action where the derivative is computed;

  • policy_state – the internal state of the policy.

Returns:

The derivative w.r.t. the policy weights

draw_action(state, policy_state=None)

Sample an action in state using the policy.

Parameters:
  • state – the state where the agent is;

  • policy_state – the internal state of the policy.

Returns:

The action sampled from the policy and optionally the next policy state.

classmethod load(path)

Load and deserialize the agent from the given location on disk.

Parameters:

path (Path, string) – Relative or absolute path to the agent's save location.

Returns:

The loaded agent.

reset()

Useful when the policy needs a special initialization at the beginning of an episode.

Returns:

The initial policy state (by default None).

save(path, full_save=False)

Serialize and save the object to the given path on disk.

Parameters:
  • path (Path, str) – Relative or absolute path to the object save location;

  • full_save (bool) – Flag to specify the amount of data to save for MushroomRL data structures.

save_zip(zip_file, full_save, folder='')

Serialize and save the agent to the given path on disk.

Parameters:
  • zip_file (ZipFile) – ZipFile where the object needs to be saved;

  • full_save (bool) – flag to specify the amount of data to save for MushroomRL data structures;

  • folder (string, '') – subfolder to be used by the save method.

Noise policy

class OrnsteinUhlenbeckPolicy(mu, sigma, theta, dt, x0=None)[source]

Bases: ParametricPolicy

Ornstein-Uhlenbeck process as implemented in: https://github.com/openai/baselines/blob/master/baselines/ddpg/noise.py.

This policy is commonly used in the Deep Deterministic Policy Gradient algorithm.

__init__(mu, sigma, theta, dt, x0=None)[source]

Constructor.

Parameters:
  • mu (Regressor) – the regressor representing the mean w.r.t. the state;

  • sigma (torch.tensor) – average magnitude of the random fluctuations per square-root time;

  • theta (float) – rate of mean reversion;

  • dt (float) – time interval;

  • x0 (torch.tensor, None) – initial values of noise.
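For intuition, a sketch of the underlying recurrence under stated assumptions (illustrative code, not the library's internal implementation; noise_mean is the mean of the noise process, typically zero):

    import numpy as np

    def ou_step(x_prev, theta, noise_mean, sigma, dt):
        # One Euler step of the Ornstein-Uhlenbeck process used as exploration noise:
        # x_t = x_{t-1} + theta * (noise_mean - x_{t-1}) * dt + sigma * sqrt(dt) * N(0, I)
        return (x_prev + theta * (noise_mean - x_prev) * dt
                + sigma * np.sqrt(dt) * np.random.randn(*x_prev.shape))

    # The policy presumably returns mu(state) plus the current noise x_t and keeps x_t
    # as its internal policy state, which is why reset() is called at episode start.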

__call__(state, action=None, policy_state=None)[source]

Compute the probability of taking action in a certain state following the policy.

Parameters:
  • state – state where you want to evaluate the policy density;

  • action – action where you want to evaluate the policy density;

  • policy_state – internal_state where you want to evaluate the policy density.

Returns:

The probability of all actions following the policy in the given state if only the state is provided, otherwise the probability of the given action in the given state following the policy. If the action space is continuous, both state and action must be provided.

draw_action(state, policy_state)[source]

Sample an action in state using the policy.

Parameters:
  • state – the state where the agent is;

  • policy_state – the internal state of the policy.

Returns:

The action sampled from the policy and optionally the next policy state.

set_weights(weights)[source]

Setter.

Parameters:

weights (np.ndarray) – the vector of the new weights to be used by the policy.

get_weights()[source]

Getter.

Returns:

The current policy weights.

property weights_size

Property.

Returns:

The size of the policy weights.

reset()[source]

Useful when the policy needs a special initialization at the beginning of an episode.

Returns:

The initial policy state (by default None).

_add_save_attr(**attr_dict)

Add attributes that should be saved for an agent. For every attribute, it is necessary to specify the method to be used to save and load it. Available methods are: numpy, mushroom, torch, json, pickle, primitive and none. The primitive method can be used to store primitive attributes, while the none method always skips the attribute but ensures that it is initialized to None after loading. The mushroom method can be used with classes that implement the Serializable interface. All the other methods use the library of the same name. If a “!” character is added at the end of the method, the field will be saved only if full_save is set to True.

Parameters:

**attr_dict – dictionary of attributes mapped to the method that should be used to save and load them.

_post_load()

This method can be overwritten to implement logic that is executed after the loading of the agent.

copy()
Returns:

A deepcopy of the agent.

diff(state, action, policy_state=None)

Compute the derivative of the probability density function in the specified state and action pair. Normally, it is computed using the derivative of the logarithm of the probability density function, exploiting the likelihood ratio trick, i.e.:

\[\nabla_{\theta}p(s,a)=p(s,a)\nabla_{\theta}\log p(s,a)\]
Parameters:
  • state – the state where the derivative is computed;

  • action – the action where the derivative is computed;

  • policy_state – the internal state of the policy.

Returns:

The derivative w.r.t. the policy weights

diff_log(state, action, policy_state)

Compute the gradient of the logarithm of the probability density function, in the specified state and action pair, i.e.:

\[\nabla_{\theta}\log p(s,a)\]
Parameters:
  • state – the state where the gradient is computed;

  • action – the action where the gradient is computed;

  • policy_state – the internal state of the policy.

Returns:

The gradient of the logarithm of the pdf w.r.t. the policy weights

classmethod load(path)

Load and deserialize the agent from the given location on disk.

Parameters:

path (Path, string) – Relative or absolute path to the agent's save location.

Returns:

The loaded agent.

save(path, full_save=False)

Serialize and save the object to the given path on disk.

Parameters:
  • path (Path, str) – Relative or absolute path to the object save location;

  • full_save (bool) – Flag to specify the amount of data to save for MushroomRL data structures.

save_zip(zip_file, full_save, folder='')

Serialize and save the agent to the given path on disk.

Parameters:
  • zip_file (ZipFile) – ZipFile where the object needs to be saved;

  • full_save (bool) – flag to specify the amount of data to save for MushroomRL data structures;

  • folder (string, '') – subfolder to be used by the save method.

class ClippedGaussianPolicy(mu, sigma, low, high, policy_state_shape=None)[source]

Bases: ParametricPolicy

Clipped Gaussian policy, as used in:

“Addressing Function Approximation Error in Actor-Critic Methods”. Fujimoto S. et al., 2018.

This is a non-differentiable policy for continuous action spaces. The policy samples an action in every state following a Gaussian distribution, where the mean is computed from the state and the covariance matrix is fixed. The action is then clipped to the given action range. This policy is not a truncated Gaussian: it simply clips the action if its value exceeds the boundaries, hence the non-differentiability.

__init__(mu, sigma, low, high, policy_state_shape=None)[source]

Constructor.

Parameters:
  • mu (Regressor) – the regressor representing the mean w.r.t. the state;

  • sigma (torch.tensor) – a square positive definite matrix representing the covariance matrix. The size of this matrix must be n x n, where n is the action dimensionality;

  • low (torch.tensor) – a vector containing the minimum action for each component;

  • high (torch.tensor) – a vector containing the maximum action for each component.
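A sketch of the sampling rule under stated assumptions (illustrative code, not the library's internal implementation):

    import numpy as np

    def clipped_gaussian_action(mean, sigma, low, high):
        # Sample a ~ N(mean, sigma), then clip each component into [low, high].
        # Clipping, rather than truncating the distribution, is what makes the
        # resulting policy non-differentiable at the boundaries.
        a = np.random.multivariate_normal(mean, sigma)
        return np.clip(a, low, high)

    # e.g. clipped_gaussian_action(np.array([.9]), .2 * np.eye(1),
    #                              low=np.array([-1.]), high=np.array([1.]))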

__call__(state, action=None, policy_state=None)[source]

Compute the probability of taking action in a certain state following the policy.

Parameters:
  • state – state where you want to evaluate the policy density;

  • action – action where you want to evaluate the policy density;

  • policy_state – internal_state where you want to evaluate the policy density.

Returns:

The probability of all actions following the policy in the given state if only the state is provided, otherwise the probability of the given action in the given state following the policy. If the action space is continuous, both state and action must be provided.

draw_action(state, policy_state=None)[source]

Sample an action in state using the policy.

Parameters:
  • state – the state where the agent is;

  • policy_state – the internal state of the policy.

Returns:

The action sampled from the policy and optionally the next policy state.

set_weights(weights)[source]

Setter.

Parameters:

weights (np.ndarray) – the vector of the new weights to be used by the policy.

get_weights()[source]

Getter.

Returns:

The current policy weights.

property weights_size

Property.

Returns:

The size of the policy weights.

_add_save_attr(**attr_dict)

Add attributes that should be saved for an agent. For every attribute, it is necessary to specify the method to be used to save and load it. Available methods are: numpy, mushroom, torch, json, pickle, primitive and none. The primitive method can be used to store primitive attributes, while the none method always skips the attribute but ensures that it is initialized to None after loading. The mushroom method can be used with classes that implement the Serializable interface. All the other methods use the library of the same name. If a “!” character is added at the end of the method, the field will be saved only if full_save is set to True.

Parameters:

**attr_dict – dictionary of attributes mapped to the method that should be used to save and load them.

_post_load()

This method can be overwritten to implement logic that is executed after the loading of the agent.

copy()
Returns:

A deepcopy of the agent.

diff(state, action, policy_state=None)

Compute the derivative of the probability density function in the specified state and action pair. Normally, it is computed using the derivative of the logarithm of the probability density function, exploiting the likelihood ratio trick, i.e.:

\[\nabla_{\theta}p(s,a)=p(s,a)\nabla_{\theta}\log p(s,a)\]
Parameters:
  • state – the state where the derivative is computed;

  • action – the action where the derivative is computed;

  • policy_state – the internal state of the policy.

Returns:

The derivative w.r.t. the policy weights

diff_log(state, action, policy_state)

Compute the gradient of the logarithm of the probability density function, in the specified state and action pair, i.e.:

\[\nabla_{\theta}\log p(s,a)\]
Parameters:
  • state – the state where the gradient is computed;

  • action – the action where the gradient is computed;

  • policy_state – the internal state of the policy.

Returns:

The gradient of the logarithm of the pdf w.r.t. the policy weights

classmethod load(path)

Load and deserialize the agent from the given location on disk.

Parameters:

path (Path, string) – Relative or absolute path to the agent's save location.

Returns:

The loaded agent.

reset()

Useful when the policy needs a special initialization at the beginning of an episode.

Returns:

The initial policy state (by default None).

save(path, full_save=False)

Serialize and save the object to the given path on disk.

Parameters:
  • path (Path, str) – Relative or absolute path to the object save location;

  • full_save (bool) – Flag to specify the amount of data to save for MushroomRL data structures.

save_zip(zip_file, full_save, folder='')

Serialize and save the agent to the given path on disk.

Parameters:
  • zip_file (ZipFile) – ZipFile where the object needs to be saved;

  • full_save (bool) – flag to specify the amount of data to save for MushroomRL data structures;

  • folder (string, '') – subfolder to be used by the save method.

TD policy

class TDPolicy(policy_state_shape=None)[source]

Bases: Policy

__init__(policy_state_shape=None)[source]

Constructor.

set_q(approximator)[source]
Parameters:

approximator (object) – the approximator to use.

get_q()[source]
Returns:

The approximator used by the policy.

__call__(state, action, policy_state)

Compute the probability of taking action in a certain state following the policy.

Parameters:
  • state – state where you want to evaluate the policy density;

  • action – action where you want to evaluate the policy density;

  • policy_state – internal_state where you want to evaluate the policy density.

Returns:

The probability of all actions following the policy in the given state if only the state is provided, otherwise the probability of the given action in the given state following the policy. If the action space is continuous, both state and action must be provided.

_add_save_attr(**attr_dict)

Add attributes that should be saved for an agent. For every attribute, it is necessary to specify the method to be used to save and load it. Available methods are: numpy, mushroom, torch, json, pickle, primitive and none. The primitive method can be used to store primitive attributes, while the none method always skips the attribute but ensures that it is initialized to None after loading. The mushroom method can be used with classes that implement the Serializable interface. All the other methods use the library of the same name. If a “!” character is added at the end of the method, the field will be saved only if full_save is set to True.

Parameters:

**attr_dict – dictionary of attributes mapped to the method that should be used to save and load them.

_post_load()

This method can be overwritten to implement logic that is executed after the loading of the agent.

copy()
Returns:

A deepcopy of the agent.

draw_action(state, policy_state)

Sample an action in state using the policy.

Parameters:
  • state – the state where the agent is;

  • policy_state – the internal state of the policy.

Returns:

The action sampled from the policy and optionally the next policy state.

classmethod load(path)

Load and deserialize the agent from the given location on disk.

Parameters:

path (Path, string) – Relative or absolute path to the agent's save location.

Returns:

The loaded agent.

reset()

Useful when the policy needs a special initialization at the beginning of an episode.

Returns:

The initial policy state (by default None).

save(path, full_save=False)

Serialize and save the object to the given path on disk.

Parameters:
  • path (Path, str) – Relative or absolute path to the object save location;

  • full_save (bool) – Flag to specify the amount of data to save for MushroomRL data structures.

save_zip(zip_file, full_save, folder='')

Serialize and save the agent to the given path on disk.

Parameters:
  • zip_file (ZipFile) – ZipFile where the object needs to be saved;

  • full_save (bool) – flag to specify the amount of data to save for MushroomRL data structures;

  • folder (string, '') – subfolder to be used by the save method.

class EpsGreedy(epsilon, policy_state_shape=None)[source]

Bases: TDPolicy

Epsilon greedy policy.

__init__(epsilon, policy_state_shape=None)[source]

Constructor.

Parameters:

epsilon ([float, Parameter]) – the exploration coefficient. It indicates the probability of performing a random action in the current step.
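A minimal usage sketch, assuming the standard mushroom_rl import paths; the Q approximator and its shapes are illustrative, and in practice TD agents attach the approximator themselves:

    import numpy as np
    from mushroom_rl.policy import EpsGreedy
    from mushroom_rl.approximators import Regressor
    from mushroom_rl.approximators.parametric import LinearApproximator

    # Q(s, .) approximator over a 4-dimensional state and 3 discrete actions.
    q = Regressor(LinearApproximator, input_shape=(4,), output_shape=(3,))

    pi = EpsGreedy(epsilon=.1)             # 10% of the steps take a random action
    pi.set_q(q)

    state = np.array([.1, -.2, 0., .3])
    action, _ = pi.draw_action(state)      # greedy w.r.t. q with probability 0.9
    probs = pi(state)                      # probability of every action in this state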

__call__(*args)[source]

Compute the probability of taking action in a certain state following the policy.

Parameters:
  • state – state where you want to evaluate the policy density;

  • action – action where you want to evaluate the policy density;

  • policy_state – internal_state where you want to evaluate the policy density.

Returns:

The probability of all actions following the policy in the given state if only the state is provided, otherwise the probability of the given action in the given state following the policy. If the action space is continuous, both state and action must be provided.

draw_action(state, policy_state=None)[source]

Sample an action in state using the policy.

Parameters:
  • state – the state where the agent is;

  • policy_state – the internal state of the policy.

Returns:

The action sampled from the policy and optionally the next policy state.

set_epsilon(epsilon)[source]

Setter.

Parameters:
epsilon ([float, Parameter]) – the exploration coefficient. It indicates the probability of performing a random action in the current step.

update(*idx)[source]

Update the value of the epsilon parameter at the provided index (e.g. in case of different values of epsilon for each visited state according to the number of visits).

Parameters:

*idx (list) – index of the parameter to be updated.

_add_save_attr(**attr_dict)

Add attributes that should be saved for an agent. For every attribute, it is necessary to specify the method to be used to save and load it. Available methods are: numpy, mushroom, torch, json, pickle, primitive and none. The primitive method can be used to store primitive attributes, while the none method always skips the attribute but ensures that it is initialized to None after loading. The mushroom method can be used with classes that implement the Serializable interface. All the other methods use the library of the same name. If a “!” character is added at the end of the method, the field will be saved only if full_save is set to True.

Parameters:

**attr_dict – dictionary of attributes mapped to the method that should be used to save and load them.

_post_load()

This method can be overwritten to implement logic that is executed after the loading of the agent.

copy()
Returns:

A deepcopy of the agent.

get_q()
Returns:

The approximator used by the policy.

classmethod load(path)

Load and deserialize the agent from the given location on disk.

Parameters:

path (Path, string) – Relative or absolute path to the agent's save location.

Returns:

The loaded agent.

reset()

Useful when the policy needs a special initialization at the beginning of an episode.

Returns:

The initial policy state (by default None).

save(path, full_save=False)

Serialize and save the object to the given path on disk.

Parameters:
  • path (Path, str) – Relative or absolute path to the object save location;

  • full_save (bool) – Flag to specify the amount of data to save for MushroomRL data structures.

save_zip(zip_file, full_save, folder='')

Serialize and save the agent to the given path on disk.

Parameters:
  • zip_file (ZipFile) – ZipFile where the object needs to be saved;

  • full_save (bool) – flag to specify the amount of data to save for MushroomRL data structures;

  • folder (string, '') – subfolder to be used by the save method.

set_q(approximator)
Parameters:

approximator (object) – the approximator to use.

class Boltzmann(beta, policy_state_shape=None)[source]

Bases: TDPolicy

Boltzmann softmax policy.

__init__(beta, policy_state_shape=None)[source]

Constructor.

Parameters:
beta ([float, Parameter]) – the inverse of the temperature distribution. As the temperature approaches infinity, the policy becomes more and more random; as the temperature approaches 0.0, the policy becomes more and more greedy.
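A minimal usage sketch, assuming the standard mushroom_rl import paths; the Q approximator and its shapes are illustrative:

    import numpy as np
    from mushroom_rl.policy import Boltzmann
    from mushroom_rl.approximators import Regressor
    from mushroom_rl.approximators.parametric import LinearApproximator

    q = Regressor(LinearApproximator, input_shape=(4,), output_shape=(3,))

    pi = Boltzmann(beta=1.)                # larger beta -> closer to greedy
    pi.set_q(q)

    state = np.array([.1, -.2, 0., .3])
    probs = pi(state)                      # softmax(beta * Q(state, .)) over actions
    action, _ = pi.draw_action(state)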

__call__(*args)[source]

Compute the probability of taking action in a certain state following the policy.

Parameters:
  • state – state where you want to evaluate the policy density;

  • action – action where you want to evaluate the policy density;

  • policy_state – internal_state where you want to evaluate the policy density.

Returns:

The probability of all actions following the policy in the given state if only the state is provided, otherwise the probability of the given action in the given state following the policy. If the action space is continuous, both state and action must be provided.

draw_action(state, policy_state=None)[source]

Sample an action in state using the policy.

Parameters:
  • state – the state where the agent is;

  • policy_state – the internal state of the policy.

Returns:

The action sampled from the policy and optionally the next policy state.

set_beta(beta)[source]

Setter.

Parameters:

beta ((float, Parameter)) – the inverse of the temperature distribution.

update(*idx)[source]

Update the value of the beta parameter at the provided index (e.g. in case of different values of beta for each visited state according to the number of visits).

Parameters:

*idx (list) – index of the parameter to be updated.

_add_save_attr(**attr_dict)

Add attributes that should be saved for an agent. For every attribute, it is necessary to specify the method to be used to save and load it. Available methods are: numpy, mushroom, torch, json, pickle, primitive and none. The primitive method can be used to store primitive attributes, while the none method always skips the attribute but ensures that it is initialized to None after loading. The mushroom method can be used with classes that implement the Serializable interface. All the other methods use the library of the same name. If a “!” character is added at the end of the method, the field will be saved only if full_save is set to True.

Parameters:

**attr_dict – dictionary of attributes mapped to the method that should be used to save and load them.

_post_load()

This method can be overwritten to implement logic that is executed after the loading of the agent.

copy()
Returns:

A deepcopy of the agent.

get_q()
Returns:

The approximator used by the policy.

classmethod load(path)

Load and deserialize the agent from the given location on disk.

Parameters:

path (Path, string) – Relative or absolute path to the agent’s save location.

Returns:

The loaded agent.

reset()

Useful when the policy needs a special initialization at the beginning of an episode.

Returns:

The initial policy state (by default None).

save(path, full_save=False)

Serialize and save the object to the given path on disk.

Parameters:
  • path (Path, str) – Relative or absolute path to the object save location;

  • full_save (bool) – Flag to specify the amount of data to save for MushroomRL data structures.

save_zip(zip_file, full_save, folder='')

Serialize and save the agent to the given path on disk.

Parameters:
  • zip_file (ZipFile) – ZipFile where the object needs to be saved;

  • full_save (bool) – flag to specify the amount of data to save for MushroomRL data structures;

  • folder (string, '') – subfolder to be used by the save method.

set_q(approximator)
Parameters:

approximator (object) – the approximator to use.
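
The serialization methods above come from the Serializable interface. A minimal round-trip sketch, assuming the Parameter import path (which has moved between MushroomRL versions) and an arbitrary file name:

    from mushroom_rl.policy import Boltzmann
    from mushroom_rl.rl_utils.parameters import Parameter  # older versions: mushroom_rl.utils.parameters

    pi = Boltzmann(beta=Parameter(1.))
    # The Q approximator is normally set by the agent; it can also be assigned with pi.set_q(...)

    pi.save('boltzmann_policy.msh')                      # the file name is arbitrary
    pi_restored = Boltzmann.load('boltzmann_policy.msh')
    assert isinstance(pi_restored, Boltzmann)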

class Mellowmax(omega, beta_min=-10.0, beta_max=10.0, policy_state_shape=None)[source]

Bases: Boltzmann

Mellowmax policy. “An Alternative Softmax Operator for Reinforcement Learning”. Asadi K. and Littman M.L., 2017.

class MellowmaxParameter(outer, omega, beta_min, beta_max)[source]

Bases: Parameter

__init__(outer, omega, beta_min, beta_max)[source]

Constructor.

Parameters:
  • outer (Mellowmax) – the Mellowmax policy that owns this parameter;

  • omega (Parameter) – the omega parameter from which beta is computed;

  • beta_min (float) – one end of the bracketing interval for Brent’s method;

  • beta_max (float) – the other end of the bracketing interval for Brent’s method.

__call__(state)[source]

Update and return the value of the parameter for the provided state.

Parameters:

state – the state for which the parameter is computed.

Returns:

The updated value of the parameter for the provided state.

_add_save_attr(**attr_dict)

Add attributes that should be saved for an agent. For every attribute, it is necessary to specify the method to be used to save and load it. Available methods are: numpy, mushroom, torch, json, pickle, primitive and none. The primitive method can be used to store primitive attributes, while the none method always skips the attribute, but ensures that it is initialized to None after the load. The mushroom method can be used with classes that implement the Serializable interface. All the other methods use the corresponding library. If a “!” character is added at the end of the method, the field will be saved only if full_save is set to True.

Parameters:

**attr_dict – dictionary of attributes mapped to the method that should be used to save and load them.

_compute(*idx, **kwargs)
Returns:

The value of the parameter in the provided index.

_post_load()

This method can be overwritten to implement logic that is executed after the loading of the agent.

copy()
Returns:

A deepcopy of the agent.

get_value(*idx, **kwargs)

Return the current value of the parameter in the provided index.

Parameters:

*idx (list) – index of the parameter to return.

Returns:

The current value of the parameter in the provided index.

property initial_value

Returns: The initial value of the parameters.

classmethod load(path)

Load and deserialize the agent from the given location on disk.

Parameters:

path (Path, string) – Relative or absolute path to the agent’s save location.

Returns:

The loaded agent.

save(path, full_save=False)

Serialize and save the object to the given path on disk.

Parameters:
  • path (Path, str) – Relative or absolute path to the object save location;

  • full_save (bool) – Flag to specify the amount of data to save for MushroomRL data structures.

save_zip(zip_file, full_save, folder='')

Serialize and save the agent to the given path on disk.

Parameters:
  • zip_file (ZipFile) – ZipFile where the object needs to be saved;

  • full_save (bool) – flag to specify the amount of data to save for MushroomRL data structures;

  • folder (string, '') – subfolder to be used by the save method.

property shape

Returns: The shape of the table of parameters.

update(*idx, **kwargs)

Updates the number of visits of the parameter in the provided index.

Parameters:

*idx (list) – index of the parameter whose number of visits has to be updated.

__init__(omega, beta_min=-10.0, beta_max=10.0, policy_state_shape=None)[source]

Constructor.

Parameters:
  • omega (Parameter) – the omega parameter of the policy from which beta of the Boltzmann policy is computed;

  • beta_min (float, -10.) – one end of the bracketing interval for minimization with Brent’s method;

  • beta_max (float, 10.) – the other end of the bracketing interval for minimization with Brent’s method.
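
For reference, a standalone NumPy/SciPy sketch of the idea behind the class (not its internals): the mellowmax operator replaces the max over Q-values, and the beta used by the Boltzmann distribution is obtained with Brent's method on the bracketing interval [beta_min, beta_max], following Asadi and Littman (2017); this sketch uses root-finding, which is one common way to realize that step:

    import numpy as np
    from scipy.optimize import brentq

    def mellowmax(q, omega):
        # mm_omega(q) = log(mean(exp(omega * q))) / omega
        return np.log(np.mean(np.exp(omega * q))) / omega

    def mellowmax_beta(q, omega, beta_min=-10., beta_max=10.):
        # Find beta such that the exponentially weighted advantages sum to zero.
        adv = q - mellowmax(q, omega)
        f = lambda beta: np.sum(np.exp(beta * adv) * adv)
        return brentq(f, beta_min, beta_max)

    q = np.array([1.0, 2.0, 0.5])
    beta = mellowmax_beta(q, omega=5.0)
    probs = np.exp(beta * q) / np.sum(np.exp(beta * q))  # Boltzmann policy with the found beta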

set_beta(beta)[source]

Setter.

Parameters:

beta ((float, Parameter)) – the inverse of the temperature distribution.

update(*idx)[source]

Update the value of the beta parameter at the provided index (e.g. in case of different values of beta for each visited state according to the number of visits).

Parameters:

*idx (list) – index of the parameter to be updated.

__call__(*args)

Compute the probability of taking action in a certain state following the policy.

Parameters:
  • state – state where you want to evaluate the policy density;

  • action – action where you want to evaluate the policy density;

  • policy_state – internal_state where you want to evaluate the policy density.

Returns:

The probability of every action following the policy in the given state if only the state is provided; otherwise, the probability of the given action in the given state following the policy. If the action space is continuous, both state and action must be provided.

_add_save_attr(**attr_dict)

Add attributes that should be saved for an agent. For every attribute, it is necessary to specify the method to be used to save and load it. Available methods are: numpy, mushroom, torch, json, pickle, primitive and none. The primitive method can be used to store primitive attributes, while the none method always skips the attribute, but ensures that it is initialized to None after the load. The mushroom method can be used with classes that implement the Serializable interface. All the other methods use the corresponding library. If a “!” character is added at the end of the method, the field will be saved only if full_save is set to True.

Parameters:

**attr_dict – dictionary of attributes mapped to the method that should be used to save and load them.

_post_load()

This method can be overwritten to implement logic that is executed after the loading of the agent.

copy()
Returns:

A deepcopy of the agent.

draw_action(state, policy_state=None)

Sample an action in state using the policy.

Parameters:
  • state – the state where the agent is;

  • policy_state – the internal state of the policy.

Returns:

The action sampled from the policy and optionally the next policy state.

get_q()
Returns:

The approximator used by the policy.

classmethod load(path)

Load and deserialize the agent from the given location on disk.

Parameters:

path (Path, string) – Relative or absolute path to the agent’s save location.

Returns:

The loaded agent.

reset()

Useful when the policy needs a special initialization at the beginning of an episode.

Returns:

The initial policy state (by default None).

save(path, full_save=False)

Serialize and save the object to the given path on disk.

Parameters:
  • path (Path, str) – Relative or absolute path to the object save location;

  • full_save (bool) – Flag to specify the amount of data to save for MushroomRL data structures.

save_zip(zip_file, full_save, folder='')

Serialize and save the agent to the given path on disk.

Parameters:
  • zip_file (ZipFile) – ZipFile where the object needs to be saved;

  • full_save (bool) – flag to specify the amount of data to save for MushroomRL data structures;

  • folder (string, '') – subfolder to be used by the save method.

set_q(approximator)
Parameters:

approximator (object) – the approximator to use.

Torch policy

class TorchPolicy(policy_state_shape=None)[source]

Bases: Policy

Interface for a generic PyTorch policy. A PyTorch policy is a policy implemented as a neural network using PyTorch. Methods whose names end with ‘_t’ take tensors as input and, when applicable, return tensors as output.

__init__(policy_state_shape=None)[source]

Constructor.

__call__(state, action, policy_state=None)[source]

Compute the probability of taking action in a certain state following the policy.

Parameters:
  • state – state where you want to evaluate the policy density;

  • action – action where you want to evaluate the policy density;

  • policy_state – internal_state where you want to evaluate the policy density.

Returns:

The probability of every action following the policy in the given state if only the state is provided; otherwise, the probability of the given action in the given state following the policy. If the action space is continuous, both state and action must be provided.

draw_action(state, policy_state=None)[source]

Sample an action in state using the policy.

Parameters:
  • state – the state where the agent is;

  • policy_state – the internal state of the policy.

Returns:

The action sampled from the policy and optionally the next policy state.

distribution(state)[source]

Compute the policy distribution in the given states.

Parameters:

state (np.ndarray) – the set of states where the distribution is computed.

Returns:

The torch distribution for the provided states.

entropy(state=None)[source]

Compute the entropy of the policy.

Parameters:

state (np.ndarray, None) – the set of states to consider. If the entropy of the policy can be computed in closed form, then state can be None.

Returns:

The value of the entropy of the policy.

draw_action_t(state)[source]

Draw an action given a tensor.

Parameters:

state (torch.Tensor) – set of states.

Returns:

The tensor of the actions to perform in each state.

log_prob_t(state, action)[source]

Compute the logarithm of the probability of taking action in state.

Parameters:
  • state (torch.Tensor) – set of states.

  • action (torch.Tensor) – set of actions.

Returns:

The tensor of log-probability.

entropy_t(state)[source]

Compute the entropy of the policy.

Parameters:

state (torch.Tensor) – the set of states to consider. If the entropy of the policy can be computed in closed form, then state can be None.

Returns:

The tensor value of the entropy of the policy.

distribution_t(state)[source]

Compute the policy distribution in the given states.

Parameters:

state (torch.Tensor) – the set of states where the distribution is computed.

Returns:

The torch distribution for the provided states.

set_weights(weights)[source]

Setter.

Parameters:

weights (np.ndarray) – the vector of the new weights to be used by the policy.

get_weights()[source]

Getter.

Returns:

The current policy weights.

parameters()[source]

Returns the trainable policy parameters, as expected by torch optimizers.

Returns:

List of parameters to be optimized.

_add_save_attr(**attr_dict)

Add attributes that should be saved for an agent. For every attribute, it is necessary to specify the method to be used to save and load it. Available methods are: numpy, mushroom, torch, json, pickle, primitive and none. The primitive method can be used to store primitive attributes, while the none method always skips the attribute, but ensures that it is initialized to None after the load. The mushroom method can be used with classes that implement the Serializable interface. All the other methods use the corresponding library. If a “!” character is added at the end of the method, the field will be saved only if full_save is set to True.

Parameters:

**attr_dict – dictionary of attributes mapped to the method that should be used to save and load them.

_post_load()

This method can be overwritten to implement logic that is executed after the loading of the agent.

copy()
Returns:

A deepcopy of the agent.

classmethod load(path)

Load and deserialize the agent from the given location on disk.

Parameters:

path (Path, string) – Relative or absolute path to the agent’s save location.

Returns:

The loaded agent.

reset()

Useful when the policy needs a special initialization at the beginning of an episode.

Returns:

The initial policy state (by default None).

save(path, full_save=False)

Serialize and save the object to the given path on disk.

Parameters:
  • path (Path, str) – Relative or absolute path to the object save location;

  • full_save (bool) – Flag to specify the amount of data to save for MushroomRL data structures.

save_zip(zip_file, full_save, folder='')

Serialize and save the agent to the given path on disk.

Parameters:
  • zip_file (ZipFile) – ZipFile where the object needs to be saved;

  • full_save (bool) – flag to specify the amount of data to save for MushroomRL data structures;

  • folder (string, '') – subfolder to be used by the save method.

class GaussianTorchPolicy(network, input_shape, output_shape, std_0=1.0, policy_state_shape=None, **params)[source]

Bases: TorchPolicy

Torch policy implementing a Gaussian policy with trainable standard deviation. The standard deviation is not state-dependent.

__init__(network, input_shape, output_shape, std_0=1.0, policy_state_shape=None, **params)[source]

Constructor.

Parameters:
  • network (object) – the network class used to implement the mean regressor;

  • input_shape (tuple) – the shape of the state space;

  • output_shape (tuple) – the shape of the action space;

  • std_0 (float, 1.) – initial standard deviation;

  • params (dict) – parameters used by the network constructor.
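
A minimal construction sketch; the network-class convention (shapes passed to the constructor, extra keyword arguments forwarded through **params) follows common MushroomRL examples, and MeanNetwork, n_features and the learning rate are illustrative assumptions:

    import numpy as np
    import torch
    import torch.nn as nn
    from mushroom_rl.policy import GaussianTorchPolicy

    class MeanNetwork(nn.Module):
        # Assumed convention: the policy instantiates network(input_shape, output_shape, **params).
        def __init__(self, input_shape, output_shape, n_features=32, **kwargs):
            super().__init__()
            self._h = nn.Linear(input_shape[0], n_features)
            self._mu = nn.Linear(n_features, output_shape[0])

        def forward(self, state, **kwargs):
            return self._mu(torch.relu(self._h(state.float())))

    pi = GaussianTorchPolicy(MeanNetwork, input_shape=(4,), output_shape=(2,),
                             std_0=1., n_features=32)

    out = pi.draw_action(np.random.rand(4).astype(np.float32))
    # `out` is the sampled action (and, depending on the version, also the next policy state)

    w = pi.get_weights()      # flat np.ndarray of all trainable weights
    pi.set_weights(w)

    # parameters() plugs directly into a torch optimizer
    optimizer = torch.optim.Adam(pi.parameters(), lr=3e-4)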

draw_action_t(state)[source]

Draw an action given a tensor.

Parameters:

state (torch.Tensor) – set of states.

Returns:

The tensor of the actions to perform in each state.

log_prob_t(state, action)[source]

Compute the logarithm of the probability of taking action in state.

Parameters:
  • state (torch.Tensor) – set of states.

  • action (torch.Tensor) – set of actions.

Returns:

The tensor of log-probability.

entropy_t(state=None)[source]

Compute the entropy of the policy.

Parameters:

state (torch.Tensor) – the set of states to consider. If the entropy of the policy can be computed in closed form, then state can be None.

Returns:

The tensor value of the entropy of the policy.

distribution_t(state)[source]

Compute the policy distribution in the given states.

Parameters:

state (torch.Tensor) – the set of states where the distribution is computed.

Returns:

The torch distribution for the provided states.

set_weights(weights)[source]

Setter.

Parameters:

weights (np.ndarray) – the vector of the new weights to be used by the policy.

get_weights()[source]

Getter.

Returns:

The current policy weights.

parameters()[source]

Returns the trainable policy parameters, as expected by torch optimizers.

Returns:

List of parameters to be optimized.

__call__(state, action, policy_state=None)

Compute the probability of taking action in a certain state following the policy.

Parameters:
  • state – state where you want to evaluate the policy density;

  • action – action where you want to evaluate the policy density;

  • policy_state – internal_state where you want to evaluate the policy density.

Returns:

The probability of every action following the policy in the given state if only the state is provided; otherwise, the probability of the given action in the given state following the policy. If the action space is continuous, both state and action must be provided.

_add_save_attr(**attr_dict)

Add attributes that should be saved for an agent. For every attribute, it is necessary to specify the method to be used to save and load it. Available methods are: numpy, mushroom, torch, json, pickle, primitive and none. The primitive method can be used to store primitive attributes, while the none method always skips the attribute, but ensures that it is initialized to None after the load. The mushroom method can be used with classes that implement the Serializable interface. All the other methods use the corresponding library. If a “!” character is added at the end of the method, the field will be saved only if full_save is set to True.

Parameters:

**attr_dict – dictionary of attributes mapped to the method that should be used to save and load them.

_post_load()

This method can be overwritten to implement logic that is executed after the loading of the agent.

copy()
Returns:

A deepcopy of the agent.

distribution(state)

Compute the policy distribution in the given states.

Parameters:

state (np.ndarray) – the set of states where the distribution is computed.

Returns:

The torch distribution for the provided states.

draw_action(state, policy_state=None)

Sample an action in state using the policy.

Parameters:
  • state – the state where the agent is;

  • policy_state – the internal state of the policy.

Returns:

The action sampled from the policy and optionally the next policy state.

entropy(state=None)

Compute the entropy of the policy.

Parameters:

state (np.ndarray, None) – the set of states to consider. If the entropy of the policy can be computed in closed form, then state can be None.

Returns:

The value of the entropy of the policy.

classmethod load(path)

Load and deserialize the agent from the given location on disk.

Parameters:

path (Path, string) – Relative or absolute path to the agent’s save location.

Returns:

The loaded agent.

reset()

Useful when the policy needs a special initialization at the beginning of an episode.

Returns:

The initial policy state (by default None).

save(path, full_save=False)

Serialize and save the object to the given path on disk.

Parameters:
  • path (Path, str) – Relative or absolute path to the object save location;

  • full_save (bool) – Flag to specify the amount of data to save for MushroomRL data structures.

save_zip(zip_file, full_save, folder='')

Serialize and save the agent to the given path on disk.

Parameters:
  • zip_file (ZipFile) – ZipFile where the object needs to be saved;

  • full_save (bool) – flag to specify the amount of data to save for MushroomRL data structures;

  • folder (string, '') – subfolder to be used by the save method.

class BoltzmannTorchPolicy(network, input_shape, output_shape, beta, policy_state_shape=None, **params)[source]

Bases: TorchPolicy

Torch policy implementing a Boltzmann policy.

__init__(network, input_shape, output_shape, beta, policy_state_shape=None, **params)[source]

Constructor.

Parameters:
  • network (object) – the network class used to implement the regressor that computes the action preferences (logits) of the Boltzmann distribution;

  • input_shape (tuple) – the shape of the state space;

  • output_shape (tuple) – the shape of the action space;

  • beta ([float, Parameter]) – the inverse of the temperature distribution. As the temperature approaches infinity, the policy becomes more and more random. As the temperature approaches 0.0, the policy becomes more and more greedy.

  • **params – parameters used by the network constructor.
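
Construction mirrors the GaussianTorchPolicy sketch above, except that the network outputs are interpreted as action preferences for a discrete action space; a minimal sketch with an assumed LogitsNetwork:

    import numpy as np
    import torch
    import torch.nn as nn
    from mushroom_rl.policy import BoltzmannTorchPolicy

    class LogitsNetwork(nn.Module):
        def __init__(self, input_shape, output_shape, n_features=32, **kwargs):
            super().__init__()
            self._h = nn.Linear(input_shape[0], n_features)
            self._out = nn.Linear(n_features, output_shape[0])

        def forward(self, state, **kwargs):
            return self._out(torch.relu(self._h(state.float())))

    # 3 discrete actions; beta (a float or a Parameter) controls how peaked the softmax is
    pi = BoltzmannTorchPolicy(LogitsNetwork, input_shape=(4,), output_shape=(3,),
                              beta=1.0, n_features=32)

    dist = pi.distribution(np.random.rand(1, 4).astype(np.float32))  # torch Categorical-like distribution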

draw_action_t(state)[source]

Draw an action given a tensor.

Parameters:

state (torch.Tensor) – set of states.

Returns:

The tensor of the actions to perform in each state.

log_prob_t(state, action)[source]

Compute the logarithm of the probability of taking action in state.

Parameters:
  • state (torch.Tensor) – set of states.

  • action (torch.Tensor) – set of actions.

Returns:

The tensor of log-probability.

entropy_t(state)[source]

Compute the entropy of the policy.

Parameters:

state (torch.Tensor) – the set of states to consider. If the entropy of the policy can be computed in closed form, then state can be None.

Returns:

The tensor value of the entropy of the policy.

distribution_t(state)[source]

Compute the policy distribution in the given states.

Parameters:

state (torch.Tensor) – the set of states where the distribution is computed.

Returns:

The torch distribution for the provided states.

set_weights(weights)[source]

Setter.

Parameters:

weights (np.ndarray) – the vector of the new weights to be used by the policy.

get_weights()[source]

Getter.

Returns:

The current policy weights.

parameters()[source]

Returns the trainable policy parameters, as expected by torch optimizers.

Returns:

List of parameters to be optimized.

__call__(state, action, policy_state=None)

Compute the probability of taking action in a certain state following the policy.

Parameters:
  • state – state where you want to evaluate the policy density;

  • action – action where you want to evaluate the policy density;

  • policy_state – internal_state where you want to evaluate the policy density.

Returns:

The probability of every action following the policy in the given state if only the state is provided; otherwise, the probability of the given action in the given state following the policy. If the action space is continuous, both state and action must be provided.

_add_save_attr(**attr_dict)

Add attributes that should be saved for an agent. For every attribute, it is necessary to specify the method to be used to save and load it. Available methods are: numpy, mushroom, torch, json, pickle, primitive and none. The primitive method can be used to store primitive attributes, while the none method always skips the attribute, but ensures that it is initialized to None after the load. The mushroom method can be used with classes that implement the Serializable interface. All the other methods use the corresponding library. If a “!” character is added at the end of the method, the field will be saved only if full_save is set to True.

Parameters:

**attr_dict – dictionary of attributes mapped to the method that should be used to save and load them.

_post_load()

This method can be overwritten to implement logic that is executed after the loading of the agent.

copy()
Returns:

A deepcopy of the agent.

distribution(state)

Compute the policy distribution in the given states.

Parameters:

state (np.ndarray) – the set of states where the distribution is computed.

Returns:

The torch distribution for the provided states.

draw_action(state, policy_state=None)

Sample an action in state using the policy.

Parameters:
  • state – the state where the agent is;

  • policy_state – the internal state of the policy.

Returns:

The action sampled from the policy and optionally the next policy state.

entropy(state=None)

Compute the entropy of the policy.

Parameters:

state (np.ndarray, None) – the set of states to consider. If the entropy of the policy can be computed in closed form, then state can be None.

Returns:

The value of the entropy of the policy.

classmethod load(path)

Load and deserialize the agent from the given location on disk.

Parameters:

path (Path, string) – Relative or absolute path to the agent’s save location.

Returns:

The loaded agent.

reset()

Useful when the policy needs a special initialization at the beginning of an episode.

Returns:

The initial policy state (by default None).

save(path, full_save=False)

Serialize and save the object to the given path on disk.

Parameters:
  • path (Path, str) – Relative or absolute path to the object save location;

  • full_save (bool) – Flag to specify the amount of data to save for MushroomRL data structures.

save_zip(zip_file, full_save, folder='')

Serialize and save the agent to the given path on disk.

Parameters:
  • zip_file (ZipFile) – ZipFile where the object needs to be saved;

  • full_save (bool) – flag to specify the amount of data to save for MushroomRL data structures;

  • folder (string, '') – subfolder to be used by the save method.