Utils

Angles

mushroom.utils.angles.normalize_angle_positive(angle)[source]

Wrap the angle between 0 and 2 * pi.

Parameters:angle (float) – angle to wrap.
Returns:The wrapped angle.
mushroom.utils.angles.normalize_angle(angle)[source]

Wrap the angle between -pi and pi.

Parameters:angle (float) – angle to wrap.
Returns:The wrapped angle.
mushroom.utils.angles.shortest_angular_distance(from_angle, to_angle)[source]

Compute the shortest distance between two angles.

Parameters:
  • from_angle (float) – starting angle;
  • to_angle (float) – final angle.
Returns:

The shortest distance between from_angle and to_angle.
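The three angle helpers build on each other: the shortest distance is the raw difference wrapped into [-pi, pi]. A minimal sketch in plain Python, assuming scalar angles in radians; it illustrates the idea and is not necessarily the library's implementation:

```python
import math


def normalize_angle_positive(angle):
    # Map any angle (radians) into [0, 2*pi).
    two_pi = 2.0 * math.pi
    return math.fmod(math.fmod(angle, two_pi) + two_pi, two_pi)


def normalize_angle(angle):
    # First wrap into [0, 2*pi), then shift the upper half down to get [-pi, pi].
    a = normalize_angle_positive(angle)
    if a > math.pi:
        a -= 2.0 * math.pi
    return a


def shortest_angular_distance(from_angle, to_angle):
    # Signed difference, wrapped so that its magnitude never exceeds pi.
    return normalize_angle(to_angle - from_angle)


print(shortest_angular_distance(0.1, 2 * math.pi - 0.1))  # about -0.2
```

The sign tells the direction of rotation: going from 0.1 rad to just below 2*pi is a short clockwise turn, not an almost full counter-clockwise one.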

Callbacks

class mushroom.utils.callbacks.Callback[source]

Bases: object

Interface for all basic callbacks. It implements a list in which data can be stored, together with methods to query and clean the content stored by the callback.

__init__()[source]

Constructor.

__call__(dataset)[source]

Add samples to the samples list.

Parameters:dataset (list) – the samples to collect.
get()[source]
Returns:The current collected data as a list.
clean()[source]

Delete the currently stored data list.

class mushroom.utils.callbacks.CollectDataset[source]

Bases: mushroom.utils.callbacks.Callback

This callback can be used to collect samples during the learning of the agent.

__call__(dataset)[source]

Add samples to the samples list.

Parameters:dataset (list) – the samples to collect.
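The Callback / CollectDataset pair boils down to a list with append, read and clear operations. A minimal sketch, assuming the dataset passed to the callback is a list of samples; the attribute name `_data_list` is hypothetical:

```python
class Callback:
    """Base callback: keeps the collected data in a list."""

    def __init__(self):
        self._data_list = []

    def __call__(self, dataset):
        # Concrete callbacks decide what to store.
        raise NotImplementedError

    def get(self):
        # Return the data collected so far.
        return self._data_list

    def clean(self):
        # Drop everything collected so far.
        self._data_list = []


class CollectDataset(Callback):
    def __call__(self, dataset):
        # Append the new samples to the stored list.
        self._data_list += dataset


collect = CollectDataset()
collect([("s0", 0, 1.0, "s1", False, False)])
collect([("s1", 1, 0.0, "s2", True, True)])
print(len(collect.get()))  # 2
```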
class mushroom.utils.callbacks.CollectQ(approximator)[source]

Bases: mushroom.utils.callbacks.Callback

This callback can be used to collect the action values in all states at the current time step.

__init__(approximator)[source]

Constructor.

Parameters:approximator ([Table, EnsembleTable]) – the approximator to use to predict the action values.
__call__(**kwargs)[source]

Add action values to the action-values list.

Parameters:**kwargs (dict) – empty dictionary.
class mushroom.utils.callbacks.CollectMaxQ(approximator, state)[source]

Bases: mushroom.utils.callbacks.Callback

This callback can be used to collect the maximum action value in a given state at each call.

__init__(approximator, state)[source]

Constructor.

Parameters:
  • approximator ([Table, EnsembleTable]) – the approximator to use;
  • state (np.ndarray) – the state to consider.
__call__(**kwargs)[source]

Add maximum action values to the maximum action-values list.

Parameters:**kwargs (dict) – empty dictionary.
class mushroom.utils.callbacks.CollectParameters(parameter, *idx)[source]

Bases: mushroom.utils.callbacks.Callback

This callback can be used to collect the values of a parameter (e.g. learning rate) during a run of the agent.

__init__(parameter, *idx)[source]

Constructor.

Parameters:
  • parameter (Parameter) – the parameter whose values have to be collected;
  • *idx (list) – index of the parameter when the parameter is tabular.
__call__(**kwargs)[source]

Add the parameter value to the parameter values list.

Parameters:**kwargs (dict) – empty dictionary.

Dataset

mushroom.utils.dataset.parse_dataset(dataset, features=None)[source]

Split the dataset in its different components and return them.

Parameters:
  • dataset (list) – the dataset to parse;
  • features (object, None) – features to apply to the states.
Returns:

The np.ndarray of state, action, reward, next_state, absorbing flag and last step flag. Features are applied to state and next_state, when provided.
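Parsing amounts to transposing a list of transitions into one array per field. A sketch, assuming each sample is a tuple (state, action, reward, next_state, absorbing, last) in the order listed above and that `features` is a callable applied to one state at a time:

```python
import numpy as np


def parse_dataset(dataset, features=None):
    # Assumed sample layout: (state, action, reward, next_state, absorbing, last).
    state = np.array([sample[0] for sample in dataset])
    action = np.array([sample[1] for sample in dataset])
    reward = np.array([sample[2] for sample in dataset], dtype=float)
    next_state = np.array([sample[3] for sample in dataset])
    absorbing = np.array([sample[4] for sample in dataset], dtype=bool)
    last = np.array([sample[5] for sample in dataset], dtype=bool)
    if features is not None:
        # Apply the feature map to both state arrays, one state at a time.
        state = np.array([features(x) for x in state])
        next_state = np.array([features(x) for x in next_state])
    return state, action, reward, next_state, absorbing, last


dataset = [
    (np.array([0.0]), np.array([1]), 1.0, np.array([0.5]), False, False),
    (np.array([0.5]), np.array([0]), 0.0, np.array([1.0]), True, True),
]
s, a, r, ss, absorbing, last = parse_dataset(dataset)
print(s.shape, r)  # (2, 1) [1. 0.]
```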

mushroom.utils.dataset.episodes_length(dataset)[source]

Compute the length of each episode in the dataset.

Parameters:dataset (list) – the dataset to consider.
Returns:A list with the length of each episode in the dataset.
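Episode lengths can be recovered from the last-step flag alone. A sketch, assuming the same (state, action, reward, next_state, absorbing, last) sample layout; in this sketch a trailing episode whose last flag is never set is not counted:

```python
def episodes_length(dataset):
    # Assumed sample layout: (state, action, reward, next_state, absorbing, last).
    lengths = []
    length = 0
    for sample in dataset:
        length += 1
        if sample[5]:
            # The last flag closes the current episode.
            lengths.append(length)
            length = 0
    return lengths


flags = [False, False, True, False, True]
dataset = [(None, None, 0.0, None, False, f) for f in flags]
print(episodes_length(dataset))  # [3, 2]
```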
mushroom.utils.dataset.select_first_episodes(dataset, n_episodes, parse=False)[source]

Return the first n_episodes episodes in the provided dataset.

Parameters:
  • dataset (list) – the dataset to consider;
  • n_episodes (int) – the number of episodes to pick from the dataset;
  • parse (bool, False) – whether to parse the dataset to return.
Returns:

A subset of the dataset containing the first n_episodes episodes.

mushroom.utils.dataset.select_random_samples(dataset, n_samples, parse=False)[source]

Return the desired number of samples, picked at random from the provided dataset.

Parameters:
  • dataset (list) – the dataset to consider;
  • n_samples (int) – the number of samples to pick from the dataset;
  • parse (bool, False) – whether to parse the dataset to return.
Returns:

A subset of the dataset containing randomly picked n_samples samples.

mushroom.utils.dataset.compute_J(dataset, gamma=1.0)[source]

Compute the cumulative discounted reward of each episode in the dataset.

Parameters:
  • dataset (list) – the dataset to consider;
  • gamma (float, 1.) – discount factor.
Returns:

The cumulative discounted reward of each episode in the dataset.
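The discounted return of an episode is J = r_0 + gamma r_1 + gamma^2 r_2 + ..., with the exponent restarting at every new episode. A sketch, assuming the (state, action, reward, next_state, absorbing, last) layout; closing the final episode at the end of the data, even without a last flag, is an assumption of this sketch:

```python
def compute_J(dataset, gamma=1.0):
    # Assumed sample layout: (state, action, reward, next_state, absorbing, last).
    js = []
    j = 0.0
    t = 0
    for i, sample in enumerate(dataset):
        j += gamma ** t * sample[2]
        t += 1
        # Close the episode at its last step, or at the end of the data.
        if sample[5] or i == len(dataset) - 1:
            js.append(j)
            j, t = 0.0, 0
    return js


rewards = [1.0, 1.0, 1.0, 2.0]
flags = [False, False, True, True]
dataset = [(None, None, rw, None, False, f) for rw, f in zip(rewards, flags)]
print(compute_J(dataset, gamma=0.5))  # [1.75, 2.0]
```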

mushroom.utils.dataset.compute_metrics(dataset, gamma=1.0)[source]

Compute the metrics of each complete episode in the dataset.

Parameters:
  • dataset (list) – the dataset to consider;
  • gamma (float, 1.0) – the discount factor.
Returns:

The minimum score reached in an episode, the maximum score reached in an episode, the mean score reached, the number of completed games.

If no episode has been completed, it returns 0 for all values.
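The metrics are simple statistics over the returns of the complete episodes only. A sketch, assuming the (state, action, reward, next_state, absorbing, last) layout and that the rewards of a trailing unfinished episode are discarded:

```python
def compute_metrics(dataset, gamma=1.0):
    # Discounted return of every complete episode (one ended by the last flag).
    returns = []
    j, t = 0.0, 0
    for sample in dataset:
        j += gamma ** t * sample[2]
        t += 1
        if sample[5]:
            returns.append(j)
            j, t = 0.0, 0
    # No completed episode: every metric is 0.
    if not returns:
        return 0, 0, 0, 0
    return min(returns), max(returns), sum(returns) / len(returns), len(returns)


rewards = [1.0, 1.0, 3.0, 5.0]
flags = [False, True, True, False]
dataset = [(None, None, rw, None, False, f) for rw, f in zip(rewards, flags)]
print(compute_metrics(dataset))  # (2.0, 3.0, 2.5, 2)
```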

Eligibility trace

mushroom.utils.eligibility_trace.EligibilityTrace(shape, name='replacing')[source]

Factory method to create an eligibility trace of the provided type.

Parameters:
  • shape (list) – shape of the eligibility trace table;
  • name (str, 'replacing') – type of the eligibility trace ('replacing' or 'accumulating').
Returns:

The eligibility trace table of the provided shape and type.

class mushroom.utils.eligibility_trace.ReplacingTrace(shape, initial_value=0.0, dtype=None)[source]

Bases: mushroom.utils.table.Table

Replacing trace.

reset()[source]
update(state, action)[source]
__init__(shape, initial_value=0.0, dtype=None)

Constructor.

Parameters:
  • shape (tuple) – the shape of the tabular regressor.
  • initial_value (float, 0.) – the initial value for each entry of the tabular regressor.
  • dtype ([int, float], None) – the dtype of the table array.
fit(x, y)
Parameters:
  • x (int) – index of the table to be filled;
  • y (float) – value to fill in the table.
n_actions

The number of actions considered by the table.

predict(*z)

Predict the output of the table given an input.

Parameters:
  • *z (list) – list of inputs of the model. If the table is a Q-table, this list may contain states or states and actions, depending on whether the call requires predicting all the Q-values or only the Q-value corresponding to the provided action.
Returns:

The table prediction.

shape

The shape of the table.

class mushroom.utils.eligibility_trace.AccumulatingTrace(shape, initial_value=0.0, dtype=None)[source]

Bases: mushroom.utils.table.Table

Accumulating trace.

reset()[source]
update(state, action)[source]
__init__(shape, initial_value=0.0, dtype=None)

Constructor.

Parameters:
  • shape (tuple) – the shape of the tabular regressor.
  • initial_value (float, 0.) – the initial value for each entry of the tabular regressor.
  • dtype ([int, float], None) – the dtype of the table array.
fit(x, y)
Parameters:
  • x (int) – index of the table to be filled;
  • y (float) – value to fill in the table.
n_actions

The number of actions considered by the table.

predict(*z)

Predict the output of the table given an input.

Parameters:
  • *z (list) – list of inputs of the model. If the table is a Q-table, this list may contain states or states and actions, depending on whether the call requires predicting all the Q-values or only the Q-value corresponding to the provided action.
Returns:

The table prediction.

shape

The shape of the table.

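The two trace types differ only in how a visited (state, action) entry is updated. A sketch on plain NumPy arrays using the textbook definitions (replacing: set to 1; accumulating: add 1), assuming the per-step decay by gamma * lambda is applied by the learning algorithm rather than by the trace itself; the function names are hypothetical:

```python
import numpy as np


def replacing_update(trace, state, action):
    # Replacing trace: the visited entry is set to 1.
    trace[state, action] = 1.0


def accumulating_update(trace, state, action):
    # Accumulating trace: the visited entry is incremented by 1.
    trace[state, action] += 1.0


def decay(trace, gamma, lam):
    # Between steps every entry fades by gamma * lambda (in place).
    trace *= gamma * lam


rep = np.zeros((3, 2))
acc = np.zeros((3, 2))
for _ in range(2):  # visit (state 0, action 1) on two consecutive steps
    decay(rep, 0.9, 0.5)
    replacing_update(rep, 0, 1)
    decay(acc, 0.9, 0.5)
    accumulating_update(acc, 0, 1)
print(rep[0, 1], acc[0, 1])  # 1.0 and roughly 1.45
```

Repeated visits make an accumulating trace grow above 1, while a replacing trace is capped at 1.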

Features

mushroom.utils.features.uniform_grid(n_centers, low, high)[source]

This function is used to create the parameters of uniformly spaced radial basis functions with 25% of overlap. It creates a uniformly spaced grid of n_centers[i] points in each interval [low[i], high[i]]. It also returns a vector containing the appropriate scales of the radial basis functions.

Parameters:
  • n_centers (list) – number of centers of each dimension;
  • low (np.ndarray) – lowest value for each dimension;
  • high (np.ndarray) – highest value for each dimension.
Returns:

The uniformly spaced grid and the scale vector.
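The grid of centers is the Cartesian product of one evenly spaced set of points per dimension. A sketch of that construction; the scale returned here (squared center spacing per dimension) is a hypothetical stand-in, since the exact formula that yields the 25% overlap is not given above:

```python
import itertools

import numpy as np


def uniform_grid(n_centers, low, high):
    # Evenly spaced centers in [low[i], high[i]] for every dimension i.
    axes = [np.linspace(lo, hi, n) for n, lo, hi in zip(n_centers, low, high)]
    # Cartesian product of the per-dimension centers: one row per RBF center.
    grid = np.array(list(itertools.product(*axes)))
    # Hypothetical scale: squared center spacing per dimension.
    spacing = (np.asarray(high, dtype=float) - np.asarray(low, dtype=float)) / np.asarray(n_centers)
    return grid, spacing ** 2


grid, scale = uniform_grid([3, 2], np.array([0.0, -1.0]), np.array([1.0, 1.0]))
print(grid.shape)  # (6, 2)
```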

Folder

mushroom.utils.folder.mk_dir_recursive(dir_path)[source]

Create a directory and, if needed, the whole directory tree. Unlike os.mkdir, this function does not raise an exception when the directory already exists.

Parameters:dir_path (str) – the path of the directory to create.

Create a symlink, deleting the previous one if it already exists.

Parameters:
  • src (str) – source;
  • dst (str) – destination.
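Both folder helpers map onto the standard library. A sketch using os.makedirs with exist_ok=True and os.symlink; the name `replace_symlink` is hypothetical, since the symlink helper's own name is not shown above:

```python
import os
import tempfile


def mk_dir_recursive(dir_path):
    # exist_ok=True: no error if the directory tree already exists.
    os.makedirs(dir_path, exist_ok=True)


def replace_symlink(src, dst):
    # Hypothetical name. Remove an existing link at dst, then point dst at src.
    if os.path.islink(dst):
        os.remove(dst)
    os.symlink(src, dst)


with tempfile.TemporaryDirectory() as root:
    target = os.path.join(root, "a", "b", "c")
    mk_dir_recursive(target)
    mk_dir_recursive(target)  # second call is a no-op
    link = os.path.join(root, "latest")
    replace_symlink(target, link)
    replace_symlink(target, link)  # replaces the existing link
    print(os.path.realpath(link) == os.path.realpath(target))  # True
```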

Minibatches

mushroom.utils.minibatches.minibatch_number(size, batch_size)[source]

Function to retrieve the number of batches, given a batch size.

Parameters:
  • size (int) – size of the dataset;
  • batch_size (int) – size of the batches.
Returns:

The number of minibatches in the dataset.

mushroom.utils.minibatches.minibatch_generator(batch_size, *dataset)[source]

Generator that creates a minibatch from the full dataset.

Parameters:
  • batch_size (int) – the maximum size of each minibatch;
  • *dataset – the dataset to be split.
Returns:

The current minibatch.
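The number of minibatches is a ceiling division, and the generator yields aligned slices of every array. A sketch that keeps the original sample order; whether the library shuffles the samples is not stated above:

```python
import math

import numpy as np


def minibatch_number(size, batch_size):
    # Ceiling division: a smaller final batch still counts.
    return int(math.ceil(size / batch_size))


def minibatch_generator(batch_size, *dataset):
    # All arrays in dataset must share their first dimension.
    size = len(dataset[0])
    for i in range(minibatch_number(size, batch_size)):
        start = i * batch_size
        stop = min(start + batch_size, size)
        # Aligned slices: the same rows of every array.
        yield [x[start:stop] for x in dataset]


x = np.arange(10)
y = 2 * np.arange(10)
print(minibatch_number(len(x), 4))  # 3
for batch_x, batch_y in minibatch_generator(4, x, y):
    print(batch_x, batch_y)
```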

Numerical gradient

mushroom.utils.numerical_gradient.numerical_diff_policy(policy, state, action, eps=1e-06)[source]

Compute the gradient of a policy in (state, action) numerically.

Parameters:
  • policy (Policy) – the policy whose gradient has to be returned;
  • state (np.ndarray) – the state;
  • action (np.ndarray) – the action;
  • eps (float, 1e-6) – the value of the perturbation.
Returns:

The gradient of the provided policy in (state, action) computed numerically.

mushroom.utils.numerical_gradient.numerical_diff_dist(dist, theta, eps=1e-06)[source]

Compute the gradient of a distribution in theta numerically.

Parameters:
  • dist (Distribution) – the distribution whose gradient has to be returned;
  • theta (np.ndarray) – the parametrization where to compute the gradient;
  • eps (float, 1e-6) – the value of the perturbation.
Returns:

The gradient of the provided distribution in theta, computed numerically.
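Both functions estimate a gradient by perturbing parameters by a small eps. A generic central-difference sketch, assuming `f` is any scalar function of a 1-D parameter vector (for example a policy density seen as a function of its parameters); the helper name `numerical_gradient` is hypothetical and the exact differencing scheme used by the library is not stated above:

```python
import numpy as np


def numerical_gradient(f, theta, eps=1e-6):
    # Central differences on a 1-D vector: perturb one coordinate at a time by +/- eps.
    theta = np.asarray(theta, dtype=float)
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        step = np.zeros_like(theta)
        step[i] = eps
        grad[i] = (f(theta + step) - f(theta - step)) / (2.0 * eps)
    return grad


print(numerical_gradient(lambda x: np.sum(x ** 2), [1.0, -2.0]))  # close to [2, -4]
```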

Parameters

class mushroom.utils.parameters.Parameter(value, min_value=None, max_value=None, size=(1, ))[source]

Bases: object

This class implements functions to manage parameters, such as the learning rate. It also allows having a single parameter for each state or state-action tuple.

__init__(value, min_value=None, max_value=None, size=(1, ))[source]

Constructor.

Parameters:
  • value (float) – initial value of the parameter;
  • min_value (float, None) – minimum value that the parameter can reach when decreasing;
  • max_value (float, None) – maximum value that the parameter can reach when increasing;
  • size (tuple, (1,)) – shape of the matrix of parameters; this shape can be used to have a single parameter for each state or state-action tuple.
__call__(*idx, **kwargs)[source]

Update and return the parameter in the provided index.

Parameters:*idx (list) – index of the parameter to return.
Returns:The updated parameter in the provided index.
get_value(*idx, **kwargs)[source]

Return the current value of the parameter in the provided index.

Parameters:*idx (list) – index of the parameter to return.
Returns:The current value of the parameter in the provided index.
_compute(*idx, **kwargs)[source]
Returns:The value of the parameter in the provided index.
update(*idx, **kwargs)[source]

Updates the number of visits of the parameter in the provided index.

Parameters:*idx (list) – index of the parameter whose number of visits has to be updated.
shape

The shape of the table of parameters.

class mushroom.utils.parameters.LinearParameter(value, threshold_value, n, size=(1, ))[source]

Bases: mushroom.utils.parameters.Parameter

This class implements a linearly changing parameter according to the number of times it has been used.

__init__(value, threshold_value, n, size=(1, ))[source]

Constructor.

Parameters:
  • value (float) – initial value of the parameter;
  • threshold_value (float) – the final value of the parameter: its minimum value when decreasing, its maximum value when increasing;
  • n (int) – the number of updates needed to go linearly from value to threshold_value;
  • size (tuple, (1,)) – shape of the matrix of parameters; this shape can be used to have a single parameter for each state or state-action tuple.
_compute(*idx, **kwargs)[source]

Returns: The value of the parameter in the provided index.

__call__(*idx, **kwargs)

Update and return the parameter in the provided index.

Parameters:*idx (list) – index of the parameter to return.
Returns:The updated parameter in the provided index.
get_value(*idx, **kwargs)

Return the current value of the parameter in the provided index.

Parameters:*idx (list) – index of the parameter to return.
Returns:The current value of the parameter in the provided index.
shape

The shape of the table of parameters.

update(*idx, **kwargs)

Updates the number of visits of the parameter in the provided index.

Parameters:*idx (list) – index of the parameter whose number of visits has to be updated.
class mushroom.utils.parameters.ExponentialParameter(value, exp=1.0, min_value=None, max_value=None, size=(1, ))[source]

Bases: mushroom.utils.parameters.Parameter

This class implements an exponentially changing parameter according to the number of times it has been used.

__init__(value, exp=1.0, min_value=None, max_value=None, size=(1, ))[source]

Constructor.

Parameters:
  • value (float) – initial value of the parameter;
  • exp (float, 1.) – exponent of the decay;
  • min_value (float, None) – minimum value that the parameter can reach when decreasing;
  • max_value (float, None) – maximum value that the parameter can reach when increasing;
  • size (tuple, (1,)) – shape of the matrix of parameters; this shape can be used to have a single parameter for each state or state-action tuple.
_compute(*idx, **kwargs)[source]

Returns: The value of the parameter in the provided index.

__call__(*idx, **kwargs)

Update and return the parameter in the provided index.

Parameters:*idx (list) – index of the parameter to return.
Returns:The updated parameter in the provided index.
get_value(*idx, **kwargs)

Return the current value of the parameter in the provided index.

Parameters:*idx (list) – index of the parameter to return.
Returns:The current value of the parameter in the provided index.
shape

The shape of the table of parameters.

update(*idx, **kwargs)

Updates the number of visits of the parameter in the provided index.

Parameters:*idx (list) – index of the parameter whose number of visits has to be updated.
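The three parameter classes above differ in how `_compute` maps the visit count n of an index to a value. A sketch, assuming the linear schedule moves from `value` to `threshold_value` in `n` updates and is then clipped, and that the exponential one is value / n**exp; these formulas and the `_key` helper are assumptions for illustration, not the library's code:

```python
import numpy as np


class Parameter:
    """Sketch: one value per index, optional clipping, per-index visit counts."""

    def __init__(self, value, min_value=None, max_value=None, size=(1,)):
        self._initial_value = value
        self._min_value = min_value
        self._max_value = max_value
        self._n_updates = np.zeros(size)

    @staticmethod
    def _key(idx):
        # Hypothetical helper: a scalar parameter (no index) uses entry 0.
        return tuple(idx) if idx else (0,)

    def __call__(self, *idx):
        # Count a visit, then return the updated value.
        self.update(*idx)
        return self.get_value(*idx)

    def get_value(self, *idx):
        value = self._compute(*idx)
        if self._min_value is None and self._max_value is None:
            return value
        return float(np.clip(value, self._min_value, self._max_value))

    def _compute(self, *idx):
        # The base class keeps the initial value.
        return self._initial_value

    def update(self, *idx):
        self._n_updates[self._key(idx)] += 1


class LinearParameter(Parameter):
    def __init__(self, value, threshold_value, n, size=(1,)):
        # Slope that moves from value to threshold_value in n updates.
        self._coeff = (threshold_value - value) / n
        if self._coeff >= 0:
            super().__init__(value, None, threshold_value, size)
        else:
            super().__init__(value, threshold_value, None, size)

    def _compute(self, *idx):
        return self._coeff * self._n_updates[self._key(idx)] + self._initial_value


class ExponentialParameter(Parameter):
    def __init__(self, value, exp=1.0, min_value=None, max_value=None, size=(1,)):
        self._exp = exp
        super().__init__(value, min_value, max_value, size)

    def _compute(self, *idx):
        # value / n**exp, treating an unvisited index as n = 1.
        n = max(self._n_updates[self._key(idx)], 1)
        return self._initial_value / n ** self._exp


epsilon = LinearParameter(1.0, threshold_value=0.1, n=10)
print([round(epsilon(), 2) for _ in range(3)])  # [0.91, 0.82, 0.73]
alpha = ExponentialParameter(1.0, exp=1.0, size=(3, 2))  # one rate per (state, action)
print(alpha(0, 1), alpha(0, 1), alpha.get_value(2, 0))  # 1.0 0.5 1.0
```

The `size` argument is what gives a tabular learning rate its own visit count for every (state, action) pair.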
class mushroom.utils.parameters.AdaptiveParameter(value)[source]

Bases: object

This class implements a basic adaptive gradient step. Instead of taking a step proportional to the gradient, it takes a step whose size is limited by a given metric. To specify the metric, the natural gradient has to be provided; if it is not provided, the identity matrix is used.

The step rule is:

\[ \Delta\theta = \underset{\Delta\vartheta}{\arg\max}\; \Delta\vartheta^{T}\nabla_{\theta}J \quad \text{s.t.} \quad \Delta\vartheta^{T}M\Delta\vartheta \leq \varepsilon \]

Lecture notes, Neumann G. http://www.ias.informatik.tu-darmstadt.de/uploads/Geri/lecture-notes-constraint.pdf

__init__(value)[source]

Constructor.

__call__(*args, **kwargs)[source]

Call self as a function.
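The constrained step rule has a closed-form solution. Setting the gradient of the Lagrangian to zero gives \(\Delta\vartheta \propto M^{-1}\nabla_{\theta}J\); making the constraint active then yields \(\Delta\theta = \eta\,\tilde g\) with \(\tilde g = M^{-1}\nabla_{\theta}J\) (the natural gradient) and \(\eta = \sqrt{\varepsilon / (\nabla_{\theta}J^{T}\tilde g)}\). A sketch of that computation, assuming the constructor's `value` plays the role of ε; the function name and the small numerical floor are hypothetical:

```python
import numpy as np


def adaptive_step_length(gradient, nat_gradient=None, eps=1e-2):
    # Solution of max_d d^T g s.t. d^T M d <= eps, with nat_gradient = M^{-1} g.
    # Without a natural gradient, M = I and the natural gradient is g itself.
    if nat_gradient is None:
        nat_gradient = gradient
    g_dot_ng = float(gradient @ nat_gradient)
    # Hypothetical floor to avoid dividing by zero for a null gradient.
    return np.sqrt(eps / max(g_dot_ng, 1e-16))


g = np.array([3.0, 4.0])
eta = adaptive_step_length(g, eps=0.25)
delta = eta * g
print(eta, delta @ delta)  # 0.1 and approximately 0.25
```

Whatever the gradient magnitude, the resulting update always lies on the boundary of the trust region \(\Delta\vartheta^{T}M\Delta\vartheta = \varepsilon\).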

Replay memory

class mushroom.utils.replay_memory.ReplayMemory(initial_size, max_size)[source]

Bases: object

This class implements functions to manage a replay memory like the one used in “Human-Level Control Through Deep Reinforcement Learning” by Mnih V. et al.

__init__(initial_size, max_size)[source]

Constructor.

Parameters:
  • initial_size (int) – initial number of elements in the replay memory;
  • max_size (int) – maximum number of elements that the replay memory can contain.
add(dataset)[source]

Add elements to the replay memory.

Parameters:dataset (list) – list of elements to add to the replay memory.
get(n_samples)[source]

Return the provided number of samples from the replay memory.

Parameters:n_samples (int) – the number of samples to return.
Returns:The requested number of samples.
reset()[source]

Reset the replay memory.

initialized

Whether the replay memory has reached the number of elements that allows it to be used.

size

The number of elements contained in the replay memory.

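A replay memory of bounded size is naturally a circular buffer: once full, new transitions overwrite the oldest ones. A sketch, assuming uniform sampling with replacement and that `initialized` becomes true once `initial_size` transitions are stored; unlike the documented `get`, this sketch returns the raw transitions instead of splitting them into arrays:

```python
import numpy as np


class ReplayMemory:
    """Sketch of a fixed-capacity circular buffer of transitions."""

    def __init__(self, initial_size, max_size):
        self._initial_size = initial_size
        self._max_size = max_size
        self.reset()

    def reset(self):
        self._idx = 0  # next slot to write
        self._full = False
        self._samples = [None] * self._max_size

    def add(self, dataset):
        for sample in dataset:
            self._samples[self._idx] = sample
            self._idx += 1
            if self._idx == self._max_size:
                # Wrap around: from now on the oldest samples are overwritten.
                self._full = True
                self._idx = 0

    def get(self, n_samples):
        # Uniform sampling with replacement among the stored transitions.
        picks = np.random.randint(self.size, size=n_samples)
        return [self._samples[i] for i in picks]

    @property
    def size(self):
        return self._max_size if self._full else self._idx

    @property
    def initialized(self):
        # Usable once initial_size transitions have been collected.
        return self.size >= self._initial_size


memory = ReplayMemory(initial_size=3, max_size=5)
memory.add([(i,) for i in range(8)])  # 8 transitions into 5 slots
print(memory.size, memory.initialized)  # 5 True
```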
class mushroom.utils.replay_memory.SumTree(max_size)[source]

Bases: object

This class implements a sum tree data structure. This is used, for instance, by PrioritizedReplayMemory.

__init__(max_size)[source]

Constructor.

Parameters:max_size (int) – maximum size of the tree.
add(dataset, priority)[source]

Add elements to the tree.

Parameters:
  • dataset (list) – list of elements to add to the tree;
  • priority (np.ndarray) – priority of each sample in the dataset.
get(s)[source]

Return the sample whose interval of cumulative priority contains s.

Parameters:s (float) – the cumulative-priority value used to select the sample.
Returns:The requested sample.
update(idx, priorities)[source]

Update the priority of the sample at the provided index in the dataset.

Parameters:
  • idx (np.ndarray) – indexes of the transitions in the dataset;
  • priorities (np.ndarray) – priorities of the transitions.
size

The current size of the tree.

max_p

The maximum priority among the ones in the tree.

total_p

The sum of the priorities in the tree, i.e. the value of the root node.

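In a sum tree every internal node stores the sum of its children, so the root holds the total priority and a sample can be drawn in O(log n) by walking down from the root. A sketch using an array layout where the leaves are the last `max_size` entries; in this sketch `update` takes leaf positions in that array, which is an assumption about how indexes are exposed:

```python
import numpy as np


class SumTree:
    """Sketch: binary tree in an array; each node holds the sum of its children."""

    def __init__(self, max_size):
        self._max_size = max_size
        self._tree = np.zeros(2 * max_size - 1)  # leaves are the last max_size entries
        self._data = [None] * max_size
        self._idx = 0
        self._n = 0

    def add(self, dataset, priority):
        for sample, p in zip(dataset, priority):
            leaf = self._idx + self._max_size - 1
            self._data[self._idx] = sample
            self.update([leaf], [p])
            self._idx = (self._idx + 1) % self._max_size
            self._n = min(self._n + 1, self._max_size)

    def update(self, idx, priorities):
        # Set each leaf and propagate the change up to the root.
        for i, p in zip(idx, priorities):
            delta = p - self._tree[i]
            self._tree[i] = p
            while i != 0:
                i = (i - 1) // 2
                self._tree[i] += delta

    def get(self, s):
        # Go left if s fits in the left subtree, otherwise subtract its sum and go right.
        i = 0
        while 2 * i + 1 < len(self._tree):
            left = 2 * i + 1
            if s <= self._tree[left]:
                i = left
            else:
                s -= self._tree[left]
                i = left + 1
        return i, self._tree[i], self._data[i - self._max_size + 1]

    @property
    def size(self):
        return self._n

    @property
    def max_p(self):
        return self._tree[-self._max_size:].max()

    @property
    def total_p(self):
        return self._tree[0]


tree = SumTree(4)
tree.add(["a", "b", "c", "d"], [1.0, 2.0, 3.0, 4.0])
print(tree.total_p, tree.get(2.5)[2])  # 10.0 b
```

Drawing s uniformly in [0, total_p] therefore picks each sample with probability proportional to its priority.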
class mushroom.utils.replay_memory.PrioritizedReplayMemory(initial_size, max_size, alpha, beta, epsilon=0.01)[source]

Bases: object

This class implements functions to manage a prioritized replay memory like the one used in “Prioritized Experience Replay” by Schaul et al., 2015.

__init__(initial_size, max_size, alpha, beta, epsilon=0.01)[source]

Constructor.

Parameters:
  • initial_size (int) – initial number of elements in the replay memory;
  • max_size (int) – maximum number of elements that the replay memory can contain;
  • alpha (float) – prioritization coefficient;
  • beta (float) – importance sampling coefficient;
  • epsilon (float, .01) – small value to avoid zero probabilities.
add(dataset, p)[source]

Add elements to the replay memory.

Parameters:
  • dataset (list) – list of elements to add to the replay memory;
  • p (np.ndarray) – priority of each sample in the dataset.
get(n_samples)[source]

Return the provided number of samples from the replay memory.

Parameters:n_samples (int) – the number of samples to return.
Returns:The requested number of samples.
update(error, idx)[source]

Update the priority of the sample at the provided index in the dataset.

Parameters:
  • error (np.ndarray) – errors to consider to compute the priorities;
  • idx (np.ndarray) – indexes of the transitions in the dataset.
initialized

Whether the replay memory has reached the number of elements that allows it to be used.

max_priority

The maximum value of priority inside the replay memory.

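The roles of alpha, beta and epsilon follow the proportional variant of prioritized replay in Schaul et al.: priority (|error| + epsilon)^alpha, sampling probability proportional to priority, and importance-sampling weights (N P(i))^(-beta) normalized by their maximum. A sketch of those formulas on a small array, assuming this is how the memory uses its coefficients; the helper names are hypothetical, and a real memory would sample through a sum tree rather than with a direct choice:

```python
import numpy as np


def priorities(errors, alpha, epsilon=0.01):
    # Proportional prioritization: (|TD error| + epsilon) ** alpha.
    return (np.abs(errors) + epsilon) ** alpha


def sampling_probabilities(p):
    # P(i) = p_i / sum_k p_k
    return p / p.sum()


def importance_weights(probs, beta):
    # w_i = (N * P(i)) ** -beta, rescaled so that the largest weight is 1.
    w = (len(probs) * probs) ** (-beta)
    return w / w.max()


rng = np.random.default_rng(0)
errors = np.array([0.0, 0.5, 2.0])
probs = sampling_probabilities(priorities(errors, alpha=0.6))
idx = rng.choice(len(probs), size=4, p=probs)  # prioritized sampling
weights = importance_weights(probs, beta=0.4)[idx]  # scale each sample's loss by these
```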

Spaces

class mushroom.utils.spaces.Box(low, high, shape=None)[source]

Bases: object

This class implements functions to manage continuous state and action spaces. It is similar to the Box class in gym.spaces.box.

__init__(low, high, shape=None)[source]

Constructor.

Parameters:
  • low ([float, np.ndarray]) – the minimum value of each dimension of the space. If a scalar value is provided, this value is considered as the minimum one for each dimension. If a np.ndarray is provided, each i-th element is considered the minimum value of the i-th dimension;
  • high ([float, np.ndarray]) – the maximum value of each dimension of the space. If a scalar value is provided, this value is considered as the maximum one for each dimension. If a np.ndarray is provided, each i-th element is considered the maximum value of the i-th dimension;
  • shape (np.ndarray, None) – the dimension of the space. Must match the shape of low and high, if they are np.ndarray.
low

The minimum value of each dimension of the space.

high

The maximum value of each dimension of the space.

shape

The dimensions of the space.

class mushroom.utils.spaces.Discrete(n)[source]

Bases: object

This class implements functions to manage discrete state and action spaces. It is similar to the Discrete class in gym.spaces.discrete.

__init__(n)[source]

Constructor.

Parameters:n (int) – the number of values of the space.
size

The number of elements of the space.

shape

The shape of the space that is always (1,).

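The space classes mostly store bounds and sizes. A sketch of the scalar-to-vector broadcasting described for Box and of a Discrete space of n values; the `values` attribute and the error message are hypothetical:

```python
import numpy as np


class Box:
    """Sketch of a continuous space with per-dimension bounds."""

    def __init__(self, low, high, shape=None):
        if shape is None:
            low = np.asarray(low, dtype=float)
            high = np.asarray(high, dtype=float)
        else:
            # Scalar (or matching-array) bounds are broadcast to the requested shape.
            low = low * np.ones(shape)
            high = high * np.ones(shape)
        if low.shape != high.shape:
            raise ValueError("low and high must have the same shape")
        self._low, self._high = low, high

    @property
    def low(self):
        return self._low

    @property
    def high(self):
        return self._high

    @property
    def shape(self):
        return self._low.shape


class Discrete:
    """Sketch of a space with the n values 0, 1, ..., n - 1."""

    def __init__(self, n):
        self.values = np.arange(n)

    @property
    def size(self):
        return len(self.values)

    @property
    def shape(self):
        # Each element is a single integer, so the shape is always (1,).
        return (1,)


box = Box(-1.0, 1.0, shape=(3,))
print(box.low, box.shape)  # [-1. -1. -1.] (3,)
```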

Table

class mushroom.utils.table.Table(shape, initial_value=0.0, dtype=None)[source]

Bases: object

Table regressor. Used for discrete state and action spaces.

__init__(shape, initial_value=0.0, dtype=None)[source]

Constructor.

Parameters:
  • shape (tuple) – the shape of the tabular regressor.
  • initial_value (float, 0.) – the initial value for each entry of the tabular regressor.
  • dtype ([int, float], None) – the dtype of the table array.
fit(x, y)[source]
Parameters:
  • x (int) – index of the table to be filled;
  • y (float) – value to fill in the table.
predict(*z)[source]

Predict the output of the table given an input.

Parameters:
  • *z (list) – list of inputs of the model. If the table is a Q-table, this list may contain states or states and actions, depending on whether the call requires predicting all the Q-values or only the Q-value corresponding to the provided action.
Returns:

The table prediction.

n_actions

The number of actions considered by the table.

shape

The shape of the table.

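For a Q-table of shape (n_states, n_actions), predicting with a state alone returns a row of action values, while a state and an action select a single entry. A sketch backed by a NumPy array, assuming states and actions arrive as one-element arrays and that one sample is predicted at a time:

```python
import numpy as np


class Table:
    """Sketch of a tabular regressor backed by a NumPy array."""

    def __init__(self, shape, initial_value=0.0, dtype=None):
        self.table = np.full(shape, initial_value, dtype=dtype)

    def fit(self, x, y):
        # x is an index into the table, y the value to store there.
        self.table[x] = y

    def predict(self, *z):
        # predict(state) -> all action values; predict(state, action) -> one value.
        idx = tuple(int(np.ravel(v)[0]) for v in z)
        return self.table[idx]

    @property
    def n_actions(self):
        # In a Q-table the last axis indexes the actions.
        return self.table.shape[-1]

    @property
    def shape(self):
        return self.table.shape


q = Table((3, 2))
q.fit((1, 0), 0.5)
print(q.predict(np.array([1])), q.predict(np.array([1]), np.array([0])))  # [0.5 0. ] 0.5
```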
class mushroom.utils.table.EnsembleTable(n_models, shape)[source]

Bases: mushroom.approximators._implementations.ensemble.Ensemble

This class implements functions to manage table ensembles.

__init__(n_models, shape)[source]

Constructor.

Parameters:
  • n_models (int) – number of models in the ensemble;
  • shape (np.ndarray) – shape of each table in the ensemble.
fit(*z, idx=None, **fit_params)

Fit the idx-th model of the ensemble if idx is provided, every model otherwise.

Parameters:
  • *z (list) – a list containing the inputs to use to predict with each regressor of the ensemble;
  • idx (int, None) – index of the model to fit;
  • **fit_params (dict) – other params.
model

The list of the models in the ensemble.

predict(*z, idx=None, prediction='mean', compute_variance=False, **predict_params)

Predict.

Parameters:
  • *z (list) – a list containing the inputs to use to predict with each regressor of the ensemble;
  • idx (int, None) – index of the model to use for prediction;
  • prediction (str, 'mean') – the type of prediction to make. It can be a ‘mean’ of the ensembles, or a ‘sum’;
  • compute_variance (bool, False) – whether to compute the variance of the prediction or not;
  • **predict_params (dict) – other parameters used by the predict method of the regressor.
Returns:

The predictions of the model.

reset()

Reset the model parameters.
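An ensemble prediction combines the members' outputs with a mean or a sum, and the spread between members gives a variance. A sketch on plain NumPy Q-tables, assuming each member maps a state index to a row of action values; `ensemble_predict` is a hypothetical helper, and returning a (prediction, variance) pair is an assumption about how `compute_variance` might be exposed:

```python
import numpy as np


def ensemble_predict(tables, state, prediction="mean", compute_variance=False):
    # Hypothetical helper: each table is an (n_states, n_actions) array.
    preds = np.stack([table[state] for table in tables])
    if prediction == "mean":
        result = preds.mean(axis=0)
    elif prediction == "sum":
        result = preds.sum(axis=0)
    else:
        raise ValueError("prediction must be 'mean' or 'sum'")
    if compute_variance:
        # Disagreement between the members, per action.
        return result, preds.var(axis=0)
    return result


tables = [np.array([[1.0, 2.0], [0.0, 0.0]]), np.array([[3.0, 4.0], [0.0, 0.0]])]
print(ensemble_predict(tables, 0))  # [2. 3.]
```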

Torch

mushroom.utils.torch.set_weights(parameters, weights, use_cuda)[source]

Function used to set the value of a set of torch parameters given a vector of values.

Parameters:
  • parameters (list) – list of parameters to be considered;
  • weights (numpy.ndarray) – array of the new values for the parameters;
  • use_cuda (bool) – whether the parameters are cuda tensors or not.
mushroom.utils.torch.get_weights(parameters)[source]

Function used to get the value of a set of torch parameters as a single vector of values.

Parameters:parameters (list) – list of parameters to be considered.
Returns:A numpy vector containing the values of all the parameters.
mushroom.utils.torch.zero_grad(parameters)[source]

Function used to set to zero the value of the gradient of a set of torch parameters.

Parameters:parameters (list) – list of parameters to be considered.
mushroom.utils.torch.get_gradient(params)[source]

Function used to get the value of the gradient of a set of torch parameters.

Parameters:params (list) – list of parameters to be considered.
mushroom.utils.torch.to_float_tensor(x, use_cuda=False)[source]

Function used to convert a numpy array to a float torch tensor.

Parameters:
  • x (np.ndarray) – numpy array to be converted as torch tensor;
  • use_cuda (bool) – whether to build a cuda tensor or not.
Returns:

A float tensor built from the values contained in the input array.

Value Functions

mushroom.utils.value_functions.compute_advantage_montecarlo(V, s, ss, r, absorbing, gamma)[source]

Function to estimate the advantage and new value function target over a dataset. The value function is estimated using rollouts (Monte Carlo estimation).

Parameters:
  • V (Regressor) – the current value function regressor;
  • s (numpy.ndarray) – the set of states in which we want to evaluate the advantage;
  • ss (numpy.ndarray) – the set of next states in which we want to evaluate the advantage;
  • r (numpy.ndarray) – the reward obtained in each transition from state s to state ss;
  • absorbing (numpy.ndarray) – an array of boolean flags indicating if the reached state is absorbing;
  • gamma (float) – the discount factor of the considered problem.
Returns:

The new estimate for the value function of the next state and the advantage function.

mushroom.utils.value_functions.compute_advantage(V, s, ss, r, absorbing, gamma)[source]

Function to estimate the advantage and new value function target over a dataset. The value function is estimated using bootstrapping.

Parameters:
  • V (Regressor) – the current value function regressor;
  • s (numpy.ndarray) – the set of states in which we want to evaluate the advantage;
  • ss (numpy.ndarray) – the set of next states in which we want to evaluate the advantage;
  • r (numpy.ndarray) – the reward obtained in each transition from state s to state ss;
  • absorbing (numpy.ndarray) – an array of boolean flags indicating if the reached state is absorbing;
  • gamma (float) – the discount factor of the considered problem.
Returns:

The new estimate for the value function of the next state and the advantage function.
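With bootstrapping, the target is the one-step estimate r + gamma V(s') (with no bootstrap from absorbing states) and the advantage is that target minus V(s). A sketch, assuming `V` is any callable returning one value per state as a stand-in for the Regressor, and that both results are returned as flat arrays:

```python
import numpy as np


def compute_advantage(V, s, ss, r, absorbing, gamma):
    # V: any callable returning one value per state (stand-in for the Regressor).
    v = V(s)
    # No bootstrapping from absorbing next states.
    v_next = np.where(absorbing, 0.0, V(ss))
    target = r + gamma * v_next
    return target, target - v


def V(states):
    # Hypothetical value function: sum of the state features.
    return states.sum(axis=1)


s = np.array([[1.0], [2.0]])
ss = np.array([[2.0], [3.0]])
r = np.array([1.0, 1.0])
absorbing = np.array([False, True])
target, adv = compute_advantage(V, s, ss, r, absorbing, gamma=0.9)
print(target, adv)
```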

mushroom.utils.value_functions.compute_gae(V, s, ss, r, absorbing, last, gamma, lam)[source]

Function to compute Generalized Advantage Estimation (GAE) and new value function target over a dataset.

“High-Dimensional Continuous Control Using Generalized Advantage Estimation”. Schulman J. et al., 2016.

Parameters:
  • V (Regressor) – the current value function regressor;
  • s (numpy.ndarray) – the set of states in which we want to evaluate the advantage;
  • ss (numpy.ndarray) – the set of next states in which we want to evaluate the advantage;
  • r (numpy.ndarray) – the reward obtained in each transition from state s to state ss;
  • absorbing (numpy.ndarray) – an array of boolean flags indicating if the reached state is absorbing;
  • last (numpy.ndarray) – an array of boolean flags indicating if the reached state is the last of the trajectory;
  • gamma (float) – the discount factor of the considered problem;
  • lam (float) – the value of the lambda coefficient used by the GAE algorithm.
Returns:

The new estimate for the value function of the next state and the estimated generalized advantage.
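GAE accumulates one-step TD errors delta_t = r_t + gamma V(s'_t) - V(s_t) backwards in time, A_t = delta_t + gamma lambda A_(t+1), restarting at every trajectory boundary. A sketch, assuming `V` is any callable returning one value per state (a stand-in for the Regressor) and that the new value targets are A_t + V(s_t):

```python
import numpy as np


def compute_gae(V, s, ss, r, absorbing, last, gamma, lam):
    # V: any callable returning one value per state (stand-in for the Regressor).
    v = V(s)
    v_next = V(ss)
    gen_adv = np.zeros(len(r))
    running = 0.0
    for k in reversed(range(len(r))):
        if last[k]:
            running = 0.0  # never propagate across trajectory boundaries
        # One-step TD error; an absorbing next state contributes no value.
        bootstrap = 0.0 if absorbing[k] else gamma * v_next[k]
        delta = r[k] + bootstrap - v[k]
        running = delta + gamma * lam * running
        gen_adv[k] = running
    # New value-function targets: advantage plus current estimate.
    return gen_adv + v, gen_adv


def V0(states):
    # Hypothetical value function that is zero everywhere.
    return np.zeros(len(states))


s = np.zeros((3, 1))
r = np.array([1.0, 1.0, 1.0])
absorbing = np.array([False, False, True])
last = np.array([False, False, True])
targets, adv = compute_gae(V0, s, s, r, absorbing, last, gamma=1.0, lam=1.0)
print(adv)  # [3. 2. 1.]
```

With lam = 0 the estimate reduces to the one-step TD error; with lam = 1 and gamma = 1 it becomes the Monte Carlo return minus V(s).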

Variance parameters

class mushroom.utils.variance_parameters.VarianceParameter(value, exponential=False, min_value=None, tol=1.0, size=(1, ))[source]

Bases: mushroom.utils.parameters.Parameter

Abstract class to implement variance-dependent parameters. A target parameter is expected.

__init__(value, exponential=False, min_value=None, tol=1.0, size=(1, ))[source]

Constructor.

Parameters:tol (float) – value of the variance of the target variable such that the parameter value is 0.5.
_compute(*idx, **kwargs)[source]

Returns: The value of the parameter in the provided index.

update(*idx, **kwargs)[source]

Updates the value of the parameter in the provided index.

Parameters:
  • *idx (list) – index of the parameter whose number of visits has to be updated.
  • target (float) – Value of the target variable;
  • factor (float) – Multiplicative factor for the parameter value, useful when the parameter depends on another parameter value.
__call__(*idx, **kwargs)

Update and return the parameter in the provided index.

Parameters:*idx (list) – index of the parameter to return.
Returns:The updated parameter in the provided index.
get_value(*idx, **kwargs)

Return the current value of the parameter in the provided index.

Parameters:*idx (list) – index of the parameter to return.
Returns:The current value of the parameter in the provided index.
shape

The shape of the table of parameters.

class mushroom.utils.variance_parameters.VarianceIncreasingParameter(value, exponential=False, min_value=None, tol=1.0, size=(1, ))[source]

Bases: mushroom.utils.variance_parameters.VarianceParameter

Class implementing a parameter that increases with the target variance.

__call__(*idx, **kwargs)

Update and return the parameter in the provided index.

Parameters:*idx (list) – index of the parameter to return.
Returns:The updated parameter in the provided index.
__init__(value, exponential=False, min_value=None, tol=1.0, size=(1, ))

Constructor.

Parameters:tol (float) – value of the variance of the target variable such that the parameter value is 0.5.
_compute(*idx, **kwargs)

Returns: The value of the parameter in the provided index.

get_value(*idx, **kwargs)

Return the current value of the parameter in the provided index.

Parameters:*idx (list) – index of the parameter to return.
Returns:The current value of the parameter in the provided index.
shape

The shape of the table of parameters.

update(*idx, **kwargs)

Updates the value of the parameter in the provided index.

Parameters:
  • *idx (list) – index of the parameter whose number of visits has to be updated.
  • target (float) – Value of the target variable;
  • factor (float) – Multiplicative factor for the parameter value, useful when the parameter depends on another parameter value.
class mushroom.utils.variance_parameters.VarianceDecreasingParameter(value, exponential=False, min_value=None, tol=1.0, size=(1, ))[source]

Bases: mushroom.utils.variance_parameters.VarianceParameter

Class implementing a parameter that decreases with the target variance.

__call__(*idx, **kwargs)

Update and return the parameter in the provided index.

Parameters:*idx (list) – index of the parameter to return.
Returns:The updated parameter in the provided index.
__init__(value, exponential=False, min_value=None, tol=1.0, size=(1, ))

Constructor.

Parameters:tol (float) – value of the variance of the target variable such that the parameter value is 0.5.
_compute(*idx, **kwargs)

Returns: The value of the parameter in the provided index.

get_value(*idx, **kwargs)

Return the current value of the parameter in the provided index.

Parameters:*idx (list) – index of the parameter to return.
Returns:The current value of the parameter in the provided index.
shape

The shape of the table of parameters.

update(*idx, **kwargs)

Updates the value of the parameter in the provided index.

Parameters:
  • *idx (list) – index of the parameter whose number of visits has to be updated.
  • target (float) – Value of the target variable;
  • factor (float) – Multiplicative factor for the parameter value, useful when the parameter depends on another parameter value.
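The text fixes one point of the mapping from variance to parameter value: when the variance of the target equals tol, the parameter is 0.5. A sketch, assuming the variance is the population variance of the observed targets (computed online with Welford's update) and using the hypothetical mappings tol / (tol + var) for the decreasing case and var / (tol + var) for the increasing one, both of which satisfy that condition; the library's actual formulas and its `exponential` and `factor` options are not modelled:

```python
class RunningVariance:
    """Welford's online mean and (population) variance of target values."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0

    def update(self, target):
        self.n += 1
        delta = target - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (target - self.mean)

    @property
    def var(self):
        return self._m2 / self.n if self.n else 0.0


def decreasing_value(var, tol):
    # Hypothetical mapping: 1 at zero variance, 0.5 when var == tol, towards 0 for large var.
    return tol / (tol + var)


def increasing_value(var, tol):
    # Hypothetical mapping: 0 at zero variance, 0.5 when var == tol, towards 1 for large var.
    return var / (tol + var)


stats = RunningVariance()
for target in [0.0, 2.0]:
    stats.update(target)
print(stats.var, decreasing_value(stats.var, 1.0), increasing_value(stats.var, 1.0))  # 1.0 0.5 0.5
```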
class mushroom.utils.variance_parameters.WindowedVarianceParameter(value, exponential=False, min_value=None, tol=1.0, window=100, size=(1, ))[source]

Bases: mushroom.utils.parameters.Parameter

Abstract class to implement variance-dependent parameters. A target value is expected when updating. Unlike the VarianceParameter class, the variance is computed over a fixed-length window of the most recent target samples.
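The difference from the non-windowed estimate is that old samples stop contributing once they fall out of the window. A minimal sketch, assuming the window holds the last `window` targets; `WindowedVariance` is a hypothetical helper built on the standard library, not the mushroom implementation.

```python
from collections import deque
import statistics


class WindowedVariance:
    """Hypothetical sketch: sample variance of the last `window` targets only."""

    def __init__(self, window=100):
        # a bounded deque discards the oldest sample once it is full
        self._samples = deque(maxlen=window)

    def update(self, target):
        self._samples.append(target)

    def variance(self):
        # the sample variance needs at least two observations
        if len(self._samples) < 2:
            return 0.0
        return statistics.variance(self._samples)


w = WindowedVariance(window=3)
for x in [100.0, 1.0, 2.0, 3.0]:
    w.update(x)
print(w.variance())  # the outlier 100.0 has already left the window
```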

__init__(value, exponential=False, min_value=None, tol=1.0, window=100, size=(1, ))[source]

Constructor.

Parameters:
  • tol (float) – value of the variance of the target variable such that the parameter value is 0.5.
  • window (int) – length of the window over which the variance of the target is computed.
_compute(*idx, **kwargs)[source]

Returns: The value of the parameter in the provided index.

update(*idx, **kwargs)[source]

Updates the value of the parameter in the provided index.

Parameters:
  • *idx (list) – index of the parameter whose number of visits has to be updated.
  • target (float) – value of the target variable;
  • factor (float) – multiplicative factor for the parameter value, useful when the parameter depends on another parameter value.
__call__(*idx, **kwargs)

Update and return the parameter in the provided index.

Parameters:*idx (list) – index of the parameter to return.
Returns:The updated parameter in the provided index.
get_value(*idx, **kwargs)

Return the current value of the parameter in the provided index.

Parameters:*idx (list) – index of the parameter to return.
Returns:The current value of the parameter in the provided index.
shape

Property.

Returns:The shape of the table of parameters.
class mushroom.utils.variance_parameters.WindowedVarianceIncreasingParameter(value, exponential=False, min_value=None, tol=1.0, window=100, size=(1, ))[source]

Bases: mushroom.utils.variance_parameters.WindowedVarianceParameter

Class implementing a parameter that increases with the target variance, where the variance is computed in a fixed-length window.
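A sketch of the combined behaviour, assuming the increasing mapping is σ²/(σ² + tol) applied to the variance of the last `window` targets. The class `WindowedIncreasingSketch` is hypothetical, and returning 0 before two samples are available is an assumption of this sketch only.

```python
from collections import deque
import statistics


class WindowedIncreasingSketch:
    """Hypothetical: value in [0, 1) that grows with the variance of recent targets."""

    def __init__(self, tol=1.0, window=100):
        self._tol = tol
        self._samples = deque(maxlen=window)

    def update(self, target):
        self._samples.append(target)

    def value(self):
        if len(self._samples) < 2:
            return 0.0  # assumption: no variance estimate yet
        var = statistics.variance(self._samples)
        # var / (var + tol): 0 at zero variance, 0.5 when var == tol, tends to 1
        return var / (var + self._tol)


p = WindowedIncreasingSketch(tol=1.0, window=3)
for x in [0.0, 10.0, 0.0]:
    p.update(x)
print(p.value())  # noisy window: high value
for x in [5.0, 5.0, 5.0]:
    p.update(x)
print(p.value())  # noisy samples have left the window: value back to 0
```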

__call__(*idx, **kwargs)

Update and return the parameter in the provided index.

Parameters:*idx (list) – index of the parameter to return.
Returns:The updated parameter in the provided index.
__init__(value, exponential=False, min_value=None, tol=1.0, window=100, size=(1, ))

Constructor.

Parameters:
  • tol (float) – value of the variance of the target variable such that the parameter value is 0.5.
  • window (int) – length of the window over which the variance of the target is computed.
_compute(*idx, **kwargs)

Returns: The value of the parameter in the provided index.

get_value(*idx, **kwargs)

Return the current value of the parameter in the provided index.

Parameters:*idx (list) – index of the parameter to return.
Returns:The current value of the parameter in the provided index.
shape

Property.

Returns:The shape of the table of parameters.
update(*idx, **kwargs)

Updates the value of the parameter in the provided index.

Parameters:
  • *idx (list) – index of the parameter whose number of visits has to be updated.
  • target (float) – value of the target variable;
  • factor (float) – multiplicative factor for the parameter value, useful when the parameter depends on another parameter value.

Viewer

class mushroom.utils.viewer.ImageViewer(size, dt)[source]

Bases: object

Interface to pygame for visualizing plain images.

__init__(size, dt)[source]

Constructor.

Parameters:
  • size ([list, tuple]) – size of the displayed image;
  • dt (float) – duration of a control step.
display(img)[source]

Display the given frame.

Parameters:img – image to display.
class mushroom.utils.viewer.Viewer(env_width, env_height, width=500, height=500, background=(0, 0, 0))[source]

Bases: object

Interface to pygame for visualizing mushroom native environments.

__init__(env_width, env_height, width=500, height=500, background=(0, 0, 0))[source]

Constructor.

Parameters:
  • env_width (int) – The x dimension limit of the desired environment;
  • env_height (int) – The y dimension limit of the desired environment;
  • width (int, 500) – width of the environment window;
  • height (int, 500) – height of the environment window;
  • background (tuple, (0, 0, 0)) – background color of the screen.
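The constructor takes both the environment's extent (`env_width`, `env_height`) and the window size in pixels, which implies a scaling from environment coordinates to screen coordinates. A minimal sketch of such a transform, assuming a linear scale per axis and a flipped y axis (pygame places the origin at the top-left corner of the window); `to_screen` is a hypothetical helper, not necessarily what the viewer does internally.

```python
import numpy as np


def to_screen(point, env_width, env_height, width=500, height=500):
    """Hypothetical environment-to-pixel transform for a Viewer-like window."""
    # pixels per environment unit along each axis
    ratio = np.array([width / env_width, height / env_height])
    x = point[0] * ratio[0]
    # pygame's y axis points down, so measure y from the bottom edge
    y = height - point[1] * ratio[1]
    return np.array([x, y]).astype(int)


print(to_screen(np.array([2.5, 7.5]), env_width=10, env_height=10))
```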
screen

Property.

Returns:The screen created by this viewer.
size

Property.

Returns:The size of the screen.
line(start, end, color=(255, 255, 255), width=1)[source]

Draw a line on the screen.

Parameters:
  • start (np.ndarray) – starting point of the line;
  • end (np.ndarray) – end point of the line;
  • color (tuple, (255, 255, 255)) – color of the line;
  • width (int, 1) – width of the line.
square(center, angle, edge, color=(255, 255, 255), width=0)[source]

Draw a square on the screen and apply a roto-translation to it.

Parameters:
  • center (np.ndarray) – the center of the square;
  • angle (float) – the rotation to apply to the square;
  • edge (float) – length of an edge;
  • color (tuple, (255, 255, 255)) – the color of the square;
  • width (int, 0) – the width of the square line, 0 to fill the square.
polygon(center, angle, points, color=(255, 255, 255), width=0)[source]

Draw a polygon on the screen and apply a roto-translation to it.

Parameters:
  • center (np.ndarray) – the center of the polygon;
  • angle (float) – the rotation to apply to the polygon;
  • points (list) – the points of the polygon w.r.t. the center;
  • color (tuple, (255, 255, 255)) – the color of the polygon;
  • width (int, 0) – the width of the polygon line, 0 to fill the polygon.
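The roto-translation applied to `points` can be sketched with plain NumPy: rotate each point about the origin by `angle`, then shift it to `center`. A square with edge `e` is the special case with corners at ±e/2. `roto_translate` is a hypothetical helper; the assumption that rotation happens before translation follows from the points being given w.r.t. the center.

```python
import numpy as np


def roto_translate(center, angle, points):
    """Rotate `points` (given w.r.t. the origin) by `angle`, then shift them to `center`."""
    c, s = np.cos(angle), np.sin(angle)
    rotation = np.array([[c, -s], [s, c]])
    # each row is a point; p @ R.T applies R to every point at once
    return np.asarray(points, dtype=float) @ rotation.T + np.asarray(center, dtype=float)


# a square of edge 2 is the polygon with corners at +/-1 around its center
square = [[-1, -1], [1, -1], [1, 1], [-1, 1]]
print(roto_translate(np.array([3.0, 3.0]), np.pi / 4, square))
```

The resulting vertex array is in environment coordinates; a viewer would still have to map it to pixels before drawing.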
circle(center, radius, color=(255, 255, 255), width=0)[source]

Draw a circle on the screen.

Parameters:
  • center (np.ndarray) – the center of the circle;
  • radius (float) – the radius of the circle;
  • color (tuple, (255, 255, 255)) – the color of the circle;
  • width (int, 0) – the width of the circle line, 0 to fill the circle.
arrow_head(center, scale, angle, color=(255, 255, 255))[source]

Draw an arrow head.

Parameters:
  • center (np.ndarray) – the position of the arrow head;
  • scale (float) – scale of the arrow, corresponding to its length;
  • angle (float) – the angle of rotation of the arrow head;
  • color (tuple, (255, 255, 255)) – the color of the arrow.
force_arrow(center, direction, force, max_force, max_length, color=(255, 255, 255), width=1)[source]

Draw a force arrow, i.e. a straight arrow representing a force. The length of the arrow is directly proportional to the force value.

Parameters:
  • center (np.ndarray) – the point where the force is applied;
  • direction (np.ndarray) – the direction of the force;
  • force (float) – the applied force value;
  • max_force (float) – the maximum force value;
  • max_length (float) – the length to use for the maximum force;
  • color (tuple, (255, 255, 255)) – the color of the arrow;
  • width (int, 1) – the width of the force arrow.
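The parameters suggest a linear scaling in which `max_force` is drawn with length `max_length`. A sketch of the arrow's end point under that assumption; `force_arrow_end` is hypothetical, and flipping the arrow for a negative force is a choice made in this sketch.

```python
import numpy as np


def force_arrow_end(center, direction, force, max_force, max_length):
    """Hypothetical: tip of an arrow whose length scales linearly with the force."""
    center = np.asarray(center, dtype=float)
    direction = np.asarray(direction, dtype=float)
    # only the orientation of `direction` matters, so normalize it
    unit = direction / np.linalg.norm(direction)
    # max_force maps to max_length; a negative force flips the arrow
    length = force / max_force * max_length
    return center + unit * length


print(force_arrow_end([0.0, 0.0], [3.0, 4.0], force=5.0, max_force=10.0, max_length=2.0))
```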
torque_arrow(center, torque, max_torque, max_radius, color=(255, 255, 255), width=1)[source]

Draw a torque arrow, i.e. a circular arrow representing a torque. The radius of the arrow is directly proportional to the torque value.

Parameters:
  • center (np.ndarray) – the point where the torque is applied;
  • torque (float) – the applied torque value;
  • max_torque (float) – the maximum torque value;
  • max_radius (float) – the radius to use for the maximum torque;
  • color (tuple, (255, 255, 255)) – the color of the arrow;
  • width (int, 1) – the width of the torque arrow.
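The radius rule stated above reduces to a single proportion. A sketch, assuming the radius uses the magnitude of the torque and the sign only selects the sense of rotation (counter-clockwise for positive torque is an assumption); `torque_arrow_geometry` is hypothetical.

```python
def torque_arrow_geometry(torque, max_torque, max_radius):
    """Hypothetical: radius and rotation sense of a circular torque arrow."""
    # radius grows linearly with |torque|, reaching max_radius at max_torque
    radius = abs(torque) / max_torque * max_radius
    # assumption: positive torque is drawn counter-clockwise
    counter_clockwise = torque >= 0
    return radius, counter_clockwise


print(torque_arrow_geometry(-2.0, max_torque=4.0, max_radius=10.0))
```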
background_image(img)[source]

Use the given image as background for the window, rescaling it appropriately.

Parameters:img – the image to be used.
function(x_s, x_e, f, n_points=100, width=1, color=(255, 255, 255))[source]

Draw the graph of a function in the image.

Parameters:
  • x_s (float) – starting x coordinate;
  • x_e (float) – final x coordinate;
  • f (function) – the function that maps x coordinates into y coordinates;
  • n_points (int, 100) – the number of segments used to approximate the function to draw;
  • width (int, 1) – the width of the line drawn;
  • color (tuple, (255, 255, 255)) – the color of the line.
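Drawing a function as `n_points` segments amounts to sampling it on an evenly spaced grid and joining the samples. A sketch, assuming uniform spacing between `x_s` and `x_e`; `function_polyline` is hypothetical and only computes the vertices that would then be connected by lines.

```python
import numpy as np


def function_polyline(x_s, x_e, f, n_points=100):
    """Hypothetical: vertices of a polyline approximating y = f(x) on [x_s, x_e]."""
    # n_points segments need n_points + 1 vertices
    xs = np.linspace(x_s, x_e, n_points + 1)
    ys = np.array([f(x) for x in xs])
    return np.column_stack([xs, ys])


vertices = function_polyline(0.0, np.pi, np.sin, n_points=8)
print(vertices.shape)
```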
display(s)[source]

Display the current frame and initialize the next frame to the background color.

Parameters:s – time to wait in visualization.
close()[source]

Close the viewer and destroy the window.