Utils

Angles

mushroom.utils.angles.normalize_angle_positive(angle)[source]

Wrap the angle between 0 and 2 * pi.

Parameters:angle (float) – angle to wrap.
Returns:The wrapped angle.
mushroom.utils.angles.normalize_angle(angle)[source]

Wrap the angle between -pi and pi.

Parameters:angle (float) – angle to wrap.
Returns:The wrapped angle.
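
The two wrapping conventions above can be sketched with the standard math module alone. This is a stand-alone illustration, not the library source: the helper names wrap_positive and wrap_symmetric are hypothetical, and which endpoint of each interval is included is an assumption of the sketch.

```python
import math


def wrap_positive(angle):
    """Map an angle in radians to the interval [0, 2*pi)."""
    # Python's float modulo takes the sign of the divisor, so negative
    # angles are mapped to positive ones.
    return angle % (2.0 * math.pi)


def wrap_symmetric(angle):
    """Map an angle in radians to the interval (-pi, pi]."""
    wrapped = wrap_positive(angle)
    if wrapped > math.pi:
        wrapped -= 2.0 * math.pi
    return wrapped
```

For instance, wrap_symmetric(1.5 * math.pi) gives an angle close to -pi / 2, the same direction expressed in the symmetric range.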

Callbacks

class mushroom.utils.callbacks.CollectDataset[source]

Bases: object

This callback can be used to collect samples during the learning of the agent.

__init__()[source]

Constructor.

__call__(dataset)[source]

Add samples to the samples list.

Parameters:dataset (list) – the samples to collect.
get()[source]
Returns:The current samples list.
clean()[source]

Delete the current dataset.

class mushroom.utils.callbacks.CollectQ(approximator)[source]

Bases: object

This callback can be used to collect the action values in all states at the current time step.

__init__(approximator)[source]

Constructor.

Parameters:approximator ([Table, EnsembleTable]) – the approximator to use to predict the action values.
__call__(**kwargs)[source]

Add action values to the action-values list.

Parameters:**kwargs (dict) – empty dictionary.
get_values()[source]
Returns:The current action-values list.
class mushroom.utils.callbacks.CollectMaxQ(approximator, state)[source]

Bases: object

This callback can be used to collect the maximum action value in a given state at each call.

__init__(approximator, state)[source]

Constructor.

Parameters:
  • approximator ([Table, EnsembleTable]) – the approximator to use;
  • state (np.ndarray) – the state to consider.
__call__(**kwargs)[source]

Add maximum action values to the maximum action-values list.

Parameters:**kwargs (dict) – empty dictionary.
get_values()[source]
Returns:The current maximum action-values list.
class mushroom.utils.callbacks.CollectParameters(parameter, *idx)[source]

Bases: object

This callback can be used to collect the values of a parameter (e.g. learning rate) during a run of the agent.

__init__(parameter, *idx)[source]

Constructor.

Parameters:
  • parameter (Parameter) – the parameter whose values have to be collected;
  • *idx (list) – index of the parameter when the parameter is tabular.
__call__(**kwargs)[source]

Add the parameter value to the parameter values list.

Parameters:**kwargs (dict) – empty dictionary.
get_values()[source]
Returns:The current parameter values list.

Dataset

mushroom.utils.dataset.parse_dataset(dataset, features=None)[source]

Split the dataset into its different components and return them.

Parameters:
  • dataset (list) – the dataset to parse;
  • features (object, None) – features to apply to the states.
Returns:

The np.ndarray of state, action, reward, next_state, absorbing flag and last step flag. Features are applied to state and next_state, when provided.

mushroom.utils.dataset.episodes_length(dataset)[source]

Compute the length of each episode in the dataset.

Parameters:dataset (list) – the dataset to consider.
Returns:A list containing the length of each episode in the dataset.
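
A minimal sketch of the episode-length computation, assuming each sample is a tuple (state, action, reward, next_state, absorbing, last) in the order listed for parse_dataset. The function name episode_lengths and the toy data are hypothetical, and dropping a trailing unfinished episode is a choice made for this sketch.

```python
def episode_lengths(dataset):
    """Count the steps of each completed episode in a list of samples."""
    lengths = []
    steps = 0
    for sample in dataset:
        steps += 1
        if sample[-1]:  # assumed: the final field is the last-step flag
            lengths.append(steps)
            steps = 0
    return lengths


# Toy dataset: (state, action, reward, next_state, absorbing, last)
toy_dataset = [
    (0, 1, 0.0, 1, False, False),
    (1, 0, 1.0, 2, True, True),
    (0, 0, 0.0, 0, False, True),
]
toy_lengths = episode_lengths(toy_dataset)
```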
mushroom.utils.dataset.select_episodes(dataset, n_episodes, parse=False)[source]

Return the first n_episodes episodes in the provided dataset.

Parameters:
  • dataset (list) – the dataset to consider;
  • n_episodes (int) – the number of episodes to pick from the dataset;
  • parse (bool, False) – whether to parse the dataset to return.
Returns:

A subset of the dataset containing the first n_episodes episodes.

mushroom.utils.dataset.select_samples(dataset, n_samples, parse=False)[source]

Return the desired number of samples, picked at random from the provided dataset.

Parameters:
  • dataset (list) – the dataset to consider;
  • n_samples (int) – the number of samples to pick from the dataset;
  • parse (bool, False) – whether to parse the dataset to return.
Returns:

A subset of the dataset containing randomly picked n_samples samples.

mushroom.utils.dataset.compute_J(dataset, gamma=1.0)[source]

Compute the cumulative discounted reward of each episode in the dataset.

Parameters:
  • dataset (list) – the dataset to consider;
  • gamma (float, 1.) – discount factor.
Returns:

The cumulative discounted reward of each episode in the dataset.
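
The cumulative discounted reward of an episode with rewards r_0, r_1, ... is the sum of gamma^t * r_t over its steps. The sketch below computes it per episode, assuming each sample is a tuple (state, action, reward, next_state, absorbing, last); the name discounted_returns is hypothetical, and skipping a trailing unfinished episode is an assumption of the sketch.

```python
def discounted_returns(dataset, gamma=1.0):
    """Return the discounted sum of rewards of each completed episode."""
    returns = []
    total, discount = 0.0, 1.0
    for sample in dataset:
        reward, last = sample[2], sample[5]
        total += discount * reward
        discount *= gamma
        if last:  # the episode ends here: store the sum and start over
            returns.append(total)
            total, discount = 0.0, 1.0
    return returns


# Two toy episodes with rewards [1, 1, 1] and [2].
toy_dataset = [
    (0, 0, 1.0, 1, False, False),
    (1, 0, 1.0, 2, False, False),
    (2, 0, 1.0, 3, True, True),
    (0, 0, 2.0, 1, True, True),
]
toy_returns = discounted_returns(toy_dataset, gamma=0.5)
```

With gamma=0.5 the first episode yields 1 + 0.5 + 0.25 and the second yields 2.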

mushroom.utils.dataset.compute_scores(dataset)[source]

Compute the scores of each episode in the dataset. This is meant to be used for the Atari environments.

Parameters:dataset (list) – the dataset to consider.
Returns:The minimum score reached in an episode, the maximum score reached in an episode, the mean score reached, the number of completed games.

If no game has been completed, it returns 0 for all values.

Eligibility trace

mushroom.utils.eligibility_trace.EligibilityTrace(shape, name='replacing')[source]

Factory method to create an eligibility trace of the provided type.

Parameters:
  • shape (list) – shape of the eligibility trace table;
  • name (str, 'replacing') – type of the eligibility trace.
Returns:

The eligibility trace table of the provided shape and type.

class mushroom.utils.eligibility_trace.ReplacingTrace(shape, initial_value=0.0, dtype=None)[source]

Bases: mushroom.utils.table.Table

Replacing trace.

reset()[source]
update(state, action)[source]
__init__(shape, initial_value=0.0, dtype=None)

Constructor.

Parameters:
  • shape (tuple) – the shape of the tabular regressor.
  • initial_value (float, 0.) – the initial value for each entry of the tabular regressor.
  • dtype ([int, float], None) – the dtype of the table array.
fit(x, y)
Parameters:
  • x (int) – index of the table to be filled;
  • y (float) – value to fill in the table.
n_actions

Returns:The number of actions considered by the table.
predict(*z)

Predict the output of the table given an input.

Parameters:
  • *z (list) – list of inputs of the model. If the table is a Q-table, this list may contain states or states and actions, depending on whether the call requires predicting all Q-values or only the Q-value corresponding to the provided action.
Returns:

The table prediction.

shape

Returns:The shape of the table.
class mushroom.utils.eligibility_trace.AccumulatingTrace(shape, initial_value=0.0, dtype=None)[source]

Bases: mushroom.utils.table.Table

Accumulating trace.

reset()[source]
update(state, action)[source]
__init__(shape, initial_value=0.0, dtype=None)

Constructor.

Parameters:
  • shape (tuple) – the shape of the tabular regressor.
  • initial_value (float, 0.) – the initial value for each entry of the tabular regressor.
  • dtype ([int, float], None) – the dtype of the table array.
fit(x, y)
Parameters:
  • x (int) – index of the table to be filled;
  • y (float) – value to fill in the table.
n_actions

Returns:The number of actions considered by the table.
predict(*z)

Predict the output of the table given an input.

Parameters:
  • *z (list) – list of inputs of the model. If the table is a Q-table, this list may contain states or states and actions, depending on whether the call requires predicting all Q-values or only the Q-value corresponding to the provided action.
Returns:

The table prediction.

shape

Returns:The shape of the table.
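
The difference between the two trace types can be sketched with the textbook update rules: a replacing trace sets the visited entry to 1, while an accumulating trace adds 1 to it. The classes below are hypothetical stand-ins backed by a dictionary; that the library applies exactly these rules is an assumption, and the decay of the trace between steps is left out.

```python
from collections import defaultdict


class ReplacingTraceSketch:
    """Trace whose visited entry is reset to 1 on every visit."""

    def __init__(self):
        self.values = defaultdict(float)

    def update(self, state, action):
        self.values[(state, action)] = 1.0

    def reset(self):
        self.values.clear()


class AccumulatingTraceSketch(ReplacingTraceSketch):
    """Trace whose visited entry grows by 1 on every visit."""

    def update(self, state, action):
        self.values[(state, action)] += 1.0


replacing = ReplacingTraceSketch()
accumulating = AccumulatingTraceSketch()
for trace in (replacing, accumulating):
    trace.update(0, 1)
    trace.update(0, 1)  # second visit to the same state-action pair
```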

Features

mushroom.utils.features.uniform_grid(n_centers, low, high)[source]

This function creates the parameters of uniformly spaced radial basis functions with 25% overlap. It builds a uniformly spaced grid of n_centers[i] points in each dimension i, between low[i] and high[i]. It also returns a vector containing the appropriate scales of the radial basis functions.

Parameters:
  • n_centers (list) – number of centers of each dimension;
  • low (np.ndarray) – lowest value for each dimension;
  • high (np.ndarray) – highest value for each dimension.
Returns:

The uniformly spaced grid and the scale vector.
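
A rough sketch of how such a grid can be assembled from per-dimension centers. The name grid_sketch is hypothetical, and the scale used here (the spacing between adjacent centers) is only a placeholder: the formula the library uses to obtain the 25% overlap is not reproduced.

```python
import itertools


def grid_sketch(n_centers, low, high):
    """Build the Cartesian grid of evenly spaced centers per dimension."""
    axes, scales = [], []
    for n, lo, hi in zip(n_centers, low, high):
        if n == 1:
            # A single center sits in the middle of the range.
            axes.append([(lo + hi) / 2.0])
            scales.append(hi - lo)
        else:
            step = (hi - lo) / (n - 1)
            axes.append([lo + i * step for i in range(n)])
            scales.append(step)  # placeholder scale, see the note above
    grid = [list(point) for point in itertools.product(*axes)]
    return grid, scales


toy_grid, toy_scales = grid_sketch([3, 2], [0.0, -1.0], [1.0, 1.0])
```

The grid has one row per combination of centers, so its length is the product of the entries of n_centers.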

Folder

mushroom.utils.folder.mk_dir_recursive(dir_path)[source]

Create a directory and, if needed, the whole directory tree. Unlike os.mkdir, this function does not raise an exception when the directory already exists.

Parameters:dir_path (str) – the path of the directory to create.

Create a symlink deleting the previous one, if it already exists.

Parameters:
  • src (str) – source;
  • dst (str) – destination.

Minibatches

mushroom.utils.minibatches.minibatch_number(size, batch_size)[source]

Function to retrieve the number of batches, given a batch size.

Parameters:
  • size (int) – size of the dataset;
  • batch_size (int) – size of the batches.
Returns:

The number of minibatches in the dataset.

mushroom.utils.minibatches.minibatch_generator(batch_size, *dataset)[source]

Generator that creates a minibatch from the full dataset.

Parameters:
  • batch_size (int) – the maximum size of each minibatch;
  • dataset – the dataset to be split.
Returns:

The current minibatch.
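
A minimal sketch of both helpers in plain Python. The names n_minibatches and minibatches are hypothetical; the sketch keeps the samples in their original order, whereas the library may shuffle them, and it assumes that every array passed in *dataset has the same length.

```python
import math


def n_minibatches(size, batch_size):
    """Number of batches needed to cover `size` samples."""
    return math.ceil(size / batch_size)


def minibatches(batch_size, *dataset):
    """Yield aligned slices of every array in `dataset`."""
    size = len(dataset[0])
    for start in range(0, size, batch_size):
        yield tuple(part[start:start + batch_size] for part in dataset)


states = [1, 2, 3, 4, 5]
rewards = [10, 20, 30, 40, 50]
toy_batches = list(minibatches(2, states, rewards))
```

The last batch is smaller than batch_size whenever the dataset size is not a multiple of it, which is why batch_size is described as a maximum.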

Numerical gradient

mushroom.utils.numerical_gradient.numerical_diff_policy(policy, state, action, eps=1e-06)[source]

Compute the gradient of a policy in (state, action) numerically.

Parameters:
  • policy (Policy) – the policy whose gradient has to be returned;
  • state (np.ndarray) – the state;
  • action (np.ndarray) – the action;
  • eps (float, 1e-6) – the value of the perturbation.
Returns:

The gradient of the provided policy in (state, action) computed numerically.

mushroom.utils.numerical_gradient.numerical_diff_dist(dist, theta, eps=1e-06)[source]

Compute the gradient of a distribution in theta numerically.

Parameters:
  • dist (Distribution) – the distribution whose gradient has to be returned;
  • theta (np.ndarray) – the parametrization where to compute the gradient;
  • eps (float, 1e-6) – the value of the perturbation.
Returns:

The gradient of the provided distribution in theta computed numerically.
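
The idea behind both functions can be sketched with a central finite difference on a generic scalar function. The name numerical_gradient is hypothetical, and whether the library uses central or one-sided differences is an assumption; eps plays the role of the perturbation described above.

```python
def numerical_gradient(f, theta, eps=1e-6):
    """Approximate the gradient of f at theta, one coordinate at a time."""
    grad = []
    for i in range(len(theta)):
        plus = list(theta)
        minus = list(theta)
        plus[i] += eps   # perturb coordinate i upwards
        minus[i] -= eps  # and downwards
        grad.append((f(plus) - f(minus)) / (2.0 * eps))
    return grad


def toy_function(t):
    # Known gradient: (2 * t[0], 3)
    return t[0] ** 2 + 3.0 * t[1]


toy_gradient = numerical_gradient(toy_function, [2.0, 5.0])
```

Comparing such an estimate with an analytical gradient is a common way to check that the analytical version is implemented correctly.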

Parameters

class mushroom.utils.parameters.Parameter(value, min_value=None, max_value=None, size=(1, ))[source]

Bases: object

This class implements functions to manage parameters, such as the learning rate. It also allows having a single parameter for each state or state-action tuple.

__init__(value, min_value=None, max_value=None, size=(1, ))[source]

Constructor.

Parameters:
  • value (float) – initial value of the parameter;
  • min_value (float, None) – minimum value that the parameter can reach when decreasing;
  • max_value (float, None) – maximum value that the parameter can reach when increasing;
  • size (tuple, (1,)) – shape of the matrix of parameters; this shape can be used to have a single parameter for each state or state-action tuple.
__call__(*idx, **kwargs)[source]

Update and return the parameter in the provided index.

Parameters:*idx (list) – index of the parameter to return.
Returns:The updated parameter in the provided index.
get_value(*idx, **kwargs)[source]

Return the current value of the parameter in the provided index.

Parameters:*idx (list) – index of the parameter to return.
Returns:The current value of the parameter in the provided index.
_compute(*idx, **kwargs)[source]
Returns:The value of the parameter in the provided index.
update(*idx, **kwargs)[source]

Update the number of visits of the parameter in the provided index.

Parameters:*idx (list) – index of the parameter whose number of visits has to be updated.
shape

Returns:The shape of the table of parameters.
class mushroom.utils.parameters.LinearParameter(value, threshold_value, n, size=(1, ))[source]

Bases: mushroom.utils.parameters.Parameter

This class implements a linearly changing parameter according to the number of times it has been used.

__init__(value, threshold_value, n, size=(1, ))[source]

Constructor.

Parameters:
  • value (float) – initial value of the parameter;
  • min_value (float, None) – minimum value that the parameter can reach when decreasing;
  • max_value (float, None) – maximum value that the parameter can reach when increasing;
  • size (tuple, (1,)) – shape of the matrix of parameters; this shape can be used to have a single parameter for each state or state-action tuple.
_compute(*idx, **kwargs)[source]

Returns: The value of the parameter in the provided index.

__call__(*idx, **kwargs)

Update and return the parameter in the provided index.

Parameters:*idx (list) – index of the parameter to return.
Returns:The updated parameter in the provided index.
get_value(*idx, **kwargs)

Return the current value of the parameter in the provided index.

Parameters:*idx (list) – index of the parameter to return.
Returns:The current value of the parameter in the provided index.
shape

Returns:The shape of the table of parameters.
update(*idx, **kwargs)

Update the number of visits of the parameter in the provided index.

Parameters:*idx (list) – index of the parameter whose number of visits has to be updated.
class mushroom.utils.parameters.ExponentialParameter(value, exp=1.0, min_value=None, max_value=None, size=(1, ))[source]

Bases: mushroom.utils.parameters.Parameter

This class implements an exponentially changing parameter according to the number of times it has been used.

__init__(value, exp=1.0, min_value=None, max_value=None, size=(1, ))[source]

Constructor.

Parameters:
  • value (float) – initial value of the parameter;
  • min_value (float, None) – minimum value that the parameter can reach when decreasing;
  • max_value (float, None) – maximum value that the parameter can reach when increasing;
  • size (tuple, (1,)) – shape of the matrix of parameters; this shape can be used to have a single parameter for each state or state-action tuple.
_compute(*idx, **kwargs)[source]

Returns: The value of the parameter in the provided index.

__call__(*idx, **kwargs)

Update and return the parameter in the provided index.

Parameters:*idx (list) – index of the parameter to return.
Returns:The updated parameter in the provided index.
get_value(*idx, **kwargs)

Return the current value of the parameter in the provided index.

Parameters:*idx (list) – index of the parameter to return.
Returns:The current value of the parameter in the provided index.
shape

Returns:The shape of the table of parameters.
update(*idx, **kwargs)

Update the number of visits of the parameter in the provided index.

Parameters:*idx (list) – index of the parameter whose number of visits has to be updated.
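
To illustrate what a parameter that changes with the number of times it has been used can look like, here is a sketch of two schedules driven by a visit counter. Both formulas are assumptions made for illustration and are not taken from the library: the exponential one divides the initial value by n_updates ** exp, and the linear one interpolates from value to threshold_value over n updates. The names ExponentialDecaySketch and linear_value are hypothetical, and the sketch handles a single scalar parameter only.

```python
class ExponentialDecaySketch:
    """Scalar parameter decaying as value / n_updates ** exp (assumed)."""

    def __init__(self, value, exp=1.0, min_value=None):
        self.value = value
        self.exp = exp
        self.min_value = min_value
        self.n_updates = 0

    def get_value(self):
        # Read the current value without counting a new visit.
        n = max(self.n_updates, 1)
        current = self.value / n ** self.exp
        if self.min_value is not None:
            current = max(current, self.min_value)
        return current

    def __call__(self):
        # Count one more visit, then return the updated value.
        self.n_updates += 1
        return self.get_value()


def linear_value(value, threshold_value, n, n_updates):
    """Linear interpolation from value to threshold_value (assumed)."""
    fraction = min(n_updates, n) / n
    return value + (threshold_value - value) * fraction


learning_rate = ExponentialDecaySketch(1.0, exp=1.0, min_value=0.3)
first_values = [learning_rate() for _ in range(4)]
```

The split between __call__ and get_value mirrors the documented behaviour: the former updates and returns the parameter, the latter only reads it.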
class mushroom.utils.parameters.AdaptiveParameter(value)[source]

Bases: object

This class implements a basic adaptive gradient step. Instead of taking a step proportional to the gradient, it takes a step limited by a given metric. To specify the metric, the natural gradient has to be provided. If the natural gradient is not provided, the identity matrix is used.

The step rule is:

\[ \begin{aligned} \Delta\theta = \underset{\Delta\vartheta}{\operatorname{argmax}}\ & \Delta\vartheta^{T}\nabla_{\theta}J \\ \text{s.t.}\quad & \Delta\vartheta^{T}M\Delta\vartheta\leq\varepsilon \end{aligned} \]

Lecture notes, Neumann G. http://www.ias.informatik.tu-darmstadt.de/uploads/Geri/lecture-notes-constraint.pdf
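
As a worked illustration, assume that M is symmetric positive definite and that the gradient is nonzero. The constraint is then active at the optimum, and setting the gradient of the Lagrangian to zero gives a closed-form solution of the step rule:

\[ \Delta\theta=\sqrt{\frac{\varepsilon}{\nabla_{\theta}J^{T}M^{-1}\nabla_{\theta}J}}\,M^{-1}\nabla_{\theta}J \]

When M is the identity matrix, this reduces to a step of length \(\sqrt{\varepsilon}\) along the gradient direction. This is the standard solution of the stated problem; whether the class computes the step in exactly this form is an assumption.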

__init__(value)[source]

Initialize self. See help(type(self)) for accurate signature.

__call__(*args, **kwargs)[source]

Call self as a function.

Preprocessor

class mushroom.utils.preprocessor.Preprocessor[source]

Bases: object

This is the interface class of the preprocessors.

__call__(x)[source]

Compute the preprocessing of the given input according to the type of preprocessor.

Parameters:x (np.ndarray) – the array to preprocess.
Returns:The preprocessed input data array.
class mushroom.utils.preprocessor.Scaler(coeff)[source]

Bases: mushroom.utils.preprocessor.Preprocessor

This class implements the function to scale the input data by a given coefficient.

__init__(coeff)[source]

Constructor.

Parameters:coeff (float) – the coefficient to use to scale input data.
class mushroom.utils.preprocessor.Binarizer(threshold, geq=True)[source]

Bases: mushroom.utils.preprocessor.Preprocessor

This class implements the function to binarize the values of an array according to a provided threshold value.

__init__(threshold, geq=True)[source]

Constructor.

Parameters:
  • threshold (float) – the threshold value to use to binarize input data;
  • geq (bool, True) – whether the threshold includes equal elements or not.
class mushroom.utils.preprocessor.Filter(idxs)[source]

Bases: mushroom.utils.preprocessor.Preprocessor

This class implements the function to filter the values of an array according to a provided array of indexes.

__init__(idxs)[source]

Constructor.

Parameters:idxs (float) – the array of idxs to use to filter input data.
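
The behaviour of the Binarizer and Filter preprocessors can be sketched on plain lists. The function names binarize and filter_by_index are hypothetical, and two points are assumptions of the sketch: the binarized output uses the integers 0 and 1, and filtering keeps the entries at the given indexes rather than discarding them.

```python
def binarize(values, threshold, geq=True):
    """Map each value to 1 if it passes the threshold, 0 otherwise."""
    if geq:
        # Values equal to the threshold count as passing.
        return [1 if v >= threshold else 0 for v in values]
    return [1 if v > threshold else 0 for v in values]


def filter_by_index(values, idxs):
    """Keep only the entries of `values` at the positions in `idxs`."""
    return [values[i] for i in idxs]


toy_values = [0.2, 0.5, 0.9]
inclusive = binarize(toy_values, 0.5)
exclusive = binarize(toy_values, 0.5, geq=False)
```

The two results differ only for the element equal to the threshold, which is the case the geq flag controls.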

Replay memory

class mushroom.utils.replay_memory.ReplayMemory(initial_size, max_size)[source]

Bases: object

This class implements functions to manage a replay memory like the one used in “Human-Level Control Through Deep Reinforcement Learning” by Mnih V. et al.

__init__(initial_size, max_size)[source]

Constructor.

Parameters:
  • initial_size (int) – initial number of elements in the replay memory;
  • max_size (int) – maximum number of elements that the replay memory can contain.
add(dataset)[source]

Add elements to the replay memory.

Parameters:dataset (list) – list of elements to add to the replay memory.
get(n_samples)[source]

Return the requested number of samples from the replay memory.

Parameters:n_samples (int) – the number of samples to return.
Returns:The requested number of samples.
reset()[source]

Reset the replay memory.

initialized

Returns:Whether the replay memory has reached the number of elements that allows it to be used.
size

Returns:The number of elements contained in the replay memory.
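
A compact sketch of a replay memory with the interface described above, built as a circular buffer on a Python list. The class name ReplayMemorySketch is hypothetical, and several details are assumptions: the oldest element is overwritten once max_size is reached, sampling is uniform with replacement, and the memory counts as initialized when it holds at least initial_size elements.

```python
import random


class ReplayMemorySketch:
    """Fixed-capacity buffer that overwrites its oldest elements."""

    def __init__(self, initial_size, max_size):
        self._initial_size = initial_size
        self._max_size = max_size
        self.reset()

    def add(self, dataset):
        for sample in dataset:
            if len(self._data) < self._max_size:
                self._data.append(sample)
            else:
                self._data[self._idx] = sample  # overwrite the oldest
            self._idx = (self._idx + 1) % self._max_size

    def get(self, n_samples):
        # Uniform sampling with replacement (an assumption).
        return random.choices(self._data, k=n_samples)

    def reset(self):
        self._data = []
        self._idx = 0

    @property
    def initialized(self):
        return self.size >= self._initial_size

    @property
    def size(self):
        return len(self._data)


memory = ReplayMemorySketch(initial_size=2, max_size=3)
memory.add([1])
ready_after_one = memory.initialized
memory.add([2, 3, 4, 5])
batch = memory.get(2)
```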

Spaces

class mushroom.utils.spaces.Box(low, high, shape=None)[source]

Bases: object

This class implements functions to manage continuous state and action spaces. It is similar to the Box class in gym.spaces.box.

__init__(low, high, shape=None)[source]

Constructor.

Parameters:
  • low ([float, np.ndarray]) – the minimum value of each dimension of the space. If a scalar value is provided, this value is considered as the minimum one for each dimension. If a np.ndarray is provided, each i-th element is considered the minimum value of the i-th dimension;
  • high ([float, np.ndarray]) – the maximum value of each dimension of the space. If a scalar value is provided, this value is considered as the maximum one for each dimension. If a np.ndarray is provided, each i-th element is considered the maximum value of the i-th dimension;
  • shape (np.ndarray, None) – the dimension of the space. Must match the shape of low and high, if they are np.ndarray.
low

Returns:The minimum value of each dimension of the space.
high

Returns:The maximum value of each dimension of the space.
shape

Returns:The dimensions of the space.
class mushroom.utils.spaces.Discrete(n)[source]

Bases: object

This class implements functions to manage discrete state and action spaces. It is similar to the Discrete class in gym.spaces.discrete.

__init__(n)[source]

Constructor.

Parameters:n (int) – the number of values of the space.
size

Returns:The number of elements of the space.
shape

Returns:The shape of the space, which is always (1,).

Table

class mushroom.utils.table.Table(shape, initial_value=0.0, dtype=None)[source]

Bases: object

Table regressor. Used for discrete state and action spaces.

__init__(shape, initial_value=0.0, dtype=None)[source]

Constructor.

Parameters:
  • shape (tuple) – the shape of the tabular regressor.
  • initial_value (float, 0.) – the initial value for each entry of the tabular regressor.
  • dtype ([int, float], None) – the dtype of the table array.
fit(x, y)[source]
Parameters:
  • x (int) – index of the table to be filled;
  • y (float) – value to fill in the table.
predict(*z)[source]

Predict the output of the table given an input.

Parameters:
  • *z (list) – list of inputs of the model. If the table is a Q-table, this list may contain states or states and actions, depending on whether the call requires predicting all Q-values or only the Q-value corresponding to the provided action.
Returns:

The table prediction.

n_actions

Returns:The number of actions considered by the table.
shape

Returns:The shape of the table.
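
The two ways of calling predict on a Q-table can be sketched with a nested list. The class TableSketch is hypothetical and restricted to a two-dimensional (n_states, n_actions) table; passing the index to fit as a (state, action) pair is an assumption of the sketch.

```python
class TableSketch:
    """Q-table stored as a list of rows, one row per state."""

    def __init__(self, shape, initial_value=0.0):
        n_states, n_actions = shape
        self._table = [[initial_value] * n_actions for _ in range(n_states)]

    def fit(self, x, y):
        state, action = x  # assumed index layout
        self._table[state][action] = y

    def predict(self, *z):
        if len(z) == 1:
            # Only a state: return the values of all its actions.
            return list(self._table[z[0]])
        state, action = z
        return self._table[state][action]

    @property
    def n_actions(self):
        return len(self._table[0])

    @property
    def shape(self):
        return (len(self._table), self.n_actions)


q_table = TableSketch((3, 2))
q_table.fit((1, 0), 5.0)
all_values = q_table.predict(1)
single_value = q_table.predict(1, 0)
```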
class mushroom.utils.table.EnsembleTable(n_models, shape, prediction='mean')[source]

Bases: mushroom.approximators._implementations.ensemble.Ensemble

This class implements functions to manage table ensembles.

__init__(n_models, shape, prediction='mean')[source]

Constructor.

Parameters:
  • n_models (int) – number of models in the ensemble;
  • shape (np.ndarray) – shape of each table in the ensemble;
  • prediction (str, 'mean') – type of prediction to return.
fit(*z, **fit_params)

Fit the idx-th model of the ensemble if idx is provided, every model otherwise.

Parameters:
  • *z (list) – a list containing the inputs to use to predict with each regressor of the ensemble;
  • **fit_params (dict) – other params.
model

Returns:The list of the models in the ensemble.
predict(*z, **predict_params)

Predict.

Parameters:
  • *z (list) – a list containing the inputs to use to predict with each regressor of the ensemble;
  • **predict_params (dict) – other params.
Returns:

The predictions of the model.

reset()

Reset the model parameters.

Variance parameters

class mushroom.utils.variance_parameters.VarianceParameter(value, exponential=False, min_value=None, tol=1.0, size=(1, ))[source]

Bases: mushroom.utils.parameters.Parameter

Abstract class to implement variance-dependent parameters. A target parameter is expected.

__init__(value, exponential=False, min_value=None, tol=1.0, size=(1, ))[source]

Constructor.

Parameters:
  • value (float) – initial value of the parameter;
  • min_value (float, None) – minimum value that the parameter can reach when decreasing;
  • max_value (float, None) – maximum value that the parameter can reach when increasing;
  • size (tuple, (1,)) – shape of the matrix of parameters; this shape can be used to have a single parameter for each state or state-action tuple.
_compute(*idx, **kwargs)[source]

Returns: The value of the parameter in the provided index.

update(*idx, **kwargs)[source]

Update the number of visits of the parameter in the provided index.

Parameters:*idx (list) – index of the parameter whose number of visits has to be updated.
__call__(*idx, **kwargs)

Update and return the parameter in the provided index.

Parameters:*idx (list) – index of the parameter to return.
Returns:The updated parameter in the provided index.
get_value(*idx, **kwargs)

Return the current value of the parameter in the provided index.

Parameters:*idx (list) – index of the parameter to return.
Returns:The current value of the parameter in the provided index.
shape

Returns:The shape of the table of parameters.
class mushroom.utils.variance_parameters.VarianceIncreasingParameter(value, exponential=False, min_value=None, tol=1.0, size=(1, ))[source]

Bases: mushroom.utils.variance_parameters.VarianceParameter

__call__(*idx, **kwargs)

Update and return the parameter in the provided index.

Parameters:*idx (list) – index of the parameter to return.
Returns:The updated parameter in the provided index.
__init__(value, exponential=False, min_value=None, tol=1.0, size=(1, ))

Constructor.

Parameters:
  • value (float) – initial value of the parameter;
  • min_value (float, None) – minimum value that the parameter can reach when decreasing;
  • max_value (float, None) – maximum value that the parameter can reach when increasing;
  • size (tuple, (1,)) – shape of the matrix of parameters; this shape can be used to have a single parameter for each state or state-action tuple.
_compute(*idx, **kwargs)

Returns: The value of the parameter in the provided index.

get_value(*idx, **kwargs)

Return the current value of the parameter in the provided index.

Parameters:*idx (list) – index of the parameter to return.
Returns:The current value of the parameter in the provided index.
shape

Returns:The shape of the table of parameters.
update(*idx, **kwargs)

Update the number of visits of the parameter in the provided index.

Parameters:*idx (list) – index of the parameter whose number of visits has to be updated.
class mushroom.utils.variance_parameters.VarianceDecreasingParameter(value, exponential=False, min_value=None, tol=1.0, size=(1, ))[source]

Bases: mushroom.utils.variance_parameters.VarianceParameter

__call__(*idx, **kwargs)

Update and return the parameter in the provided index.

Parameters:*idx (list) – index of the parameter to return.
Returns:The updated parameter in the provided index.
__init__(value, exponential=False, min_value=None, tol=1.0, size=(1, ))

Constructor.

Parameters:
  • value (float) – initial value of the parameter;
  • min_value (float, None) – minimum value that the parameter can reach when decreasing;
  • max_value (float, None) – maximum value that the parameter can reach when increasing;
  • size (tuple, (1,)) – shape of the matrix of parameters; this shape can be used to have a single parameter for each state or state-action tuple.
_compute(*idx, **kwargs)

Returns: The value of the parameter in the provided index.

get_value(*idx, **kwargs)

Return the current value of the parameter in the provided index.

Parameters:*idx (list) – index of the parameter to return.
Returns:The current value of the parameter in the provided index.
shape

Returns:The shape of the table of parameters.
update(*idx, **kwargs)

Update the number of visits of the parameter in the provided index.

Parameters:*idx (list) – index of the parameter whose number of visits has to be updated.
class mushroom.utils.variance_parameters.WindowedVarianceParameter(value, exponential=False, min_value=None, tol=1.0, window=100, size=(1, ))[source]

Bases: mushroom.utils.parameters.Parameter

__init__(value, exponential=False, min_value=None, tol=1.0, window=100, size=(1, ))[source]

Constructor.

Parameters:
  • value (float) – initial value of the parameter;
  • min_value (float, None) – minimum value that the parameter can reach when decreasing;
  • max_value (float, None) – maximum value that the parameter can reach when increasing;
  • size (tuple, (1,)) – shape of the matrix of parameters; this shape can be used to have a single parameter for each state or state-action tuple.
_compute(*idx, **kwargs)[source]

Returns: The value of the parameter in the provided index.

update(*idx, **kwargs)[source]

Update the number of visits of the parameter in the provided index.

Parameters:*idx (list) – index of the parameter whose number of visits has to be updated.
__call__(*idx, **kwargs)

Update and return the parameter in the provided index.

Parameters:*idx (list) – index of the parameter to return.
Returns:The updated parameter in the provided index.
get_value(*idx, **kwargs)

Return the current value of the parameter in the provided index.

Parameters:*idx (list) – index of the parameter to return.
Returns:The current value of the parameter in the provided index.
shape

Returns:The shape of the table of parameters.
class mushroom.utils.variance_parameters.WindowedVarianceIncreasingParameter(value, exponential=False, min_value=None, tol=1.0, window=100, size=(1, ))[source]

Bases: mushroom.utils.variance_parameters.WindowedVarianceParameter

__call__(*idx, **kwargs)

Update and return the parameter in the provided index.

Parameters:*idx (list) – index of the parameter to return.
Returns:The updated parameter in the provided index.
__init__(value, exponential=False, min_value=None, tol=1.0, window=100, size=(1, ))

Constructor.

Parameters:
  • value (float) – initial value of the parameter;
  • min_value (float, None) – minimum value that the parameter can reach when decreasing;
  • max_value (float, None) – maximum value that the parameter can reach when increasing;
  • size (tuple, (1,)) – shape of the matrix of parameters; this shape can be used to have a single parameter for each state or state-action tuple.
_compute(*idx, **kwargs)

Returns: The value of the parameter in the provided index.

get_value(*idx, **kwargs)

Return the current value of the parameter in the provided index.

Parameters:*idx (list) – index of the parameter to return.
Returns:The current value of the parameter in the provided index.
shape

Returns:The shape of the table of parameters.
update(*idx, **kwargs)

Update the number of visits of the parameter in the provided index.

Parameters:*idx (list) – index of the parameter whose number of visits has to be updated.

Viewer

class mushroom.utils.viewer.ImageViewer(size, dt)[source]

Bases: object

Interface to pygame for visualizing plain images. Used in mujoco.py.

__init__(size, dt)[source]

Constructor.

Parameters:
  • size ([list, tuple]) – size of the displayed image;
  • dt (float) – duration of a control step.
display(img)[source]

Display given frame.

Parameters:img – image to display.
class mushroom.utils.viewer.Viewer(env_width, env_height, width=500, height=500, background=(0, 0, 0))[source]

Bases: object

Interface to pygame for visualizing mushroom native environments.

__init__(env_width, env_height, width=500, height=500, background=(0, 0, 0))[source]

Constructor.

Parameters:
  • env_width (int) – The x dimension limit of the desired environment;
  • env_height (int) – The y dimension limit of the desired environment;
  • width (int, 500) – width of the environment window;
  • height (int, 500) – height of the environment window;
  • background (tuple, (0, 0, 0)) – background color of the screen.
screen

Property.

Returns:The screen created by this viewer.
size

Property.

Returns:The size of the screen.
line(start, end, color=(255, 255, 255), width=1)[source]

Draw a line on the screen.

Parameters:
  • start (np.ndarray) – starting point of the line;
  • end (np.ndarray) – end point of the line;
  • color (tuple, (255, 255, 255)) – color of the line;
  • width (int, 1) – width of the line.
square(center, angle, edge, color=(255, 255, 255), width=0)[source]

Draw a square on the screen and apply a roto-translation to it.

Parameters:
  • center (np.ndarray) – the center of the polygon;
  • angle (float) – the rotation to apply to the polygon;
  • edge (float) – length of an edge;
  • color (tuple, (255, 255, 255)) – the color of the polygon;
  • width (int, 0) – the width of the polygon line, 0 to fill the polygon.
polygon(center, angle, points, color=(255, 255, 255), width=0)[source]

Draw a polygon on the screen and apply a roto-translation to it.

Parameters:
  • center (np.ndarray) – the center of the polygon;
  • angle (float) – the rotation to apply to the polygon;
  • points (list) – the points of the polygon w.r.t. the center;
  • color (tuple, (255, 255, 255)) – the color of the polygon;
  • width (int, 0) – the width of the polygon line, 0 to fill the polygon.
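
The roto-translation applied to the polygon points can be sketched as a rotation by angle followed by a translation to center. The function name roto_translate is hypothetical. The sketch assumes a counter-clockwise rotation in environment coordinates and leaves out the conversion to screen pixels, where the y axis may be flipped.

```python
import math


def roto_translate(points, center, angle):
    """Rotate points given w.r.t. the center, then move them to it."""
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    return [
        (center[0] + cos_a * x - sin_a * y,
         center[1] + sin_a * x + cos_a * y)
        for x, y in points
    ]


# A unit offset along x, rotated by 90 degrees around the point (2, 3).
moved = roto_translate([(1.0, 0.0)], (2.0, 3.0), math.pi / 2)
```

The rotated offset points along y, so the resulting point lies one unit above the center.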
circle(center, radius, color=(255, 255, 255), width=0)[source]

Draw a circle on the screen.

Parameters:
  • center (np.ndarray) – the center of the circle;
  • radius (float) – the radius of the circle;
  • color (tuple, (255, 255, 255)) – the color of the circle;
  • width (int, 0) – the width of the circle line, 0 to fill the circle.
torque_arrow(center, torque, max_torque, max_radius, color=(255, 255, 255), width=1)[source]

Draw a torque arrow, i.e. a circular arrow representing a torque. The radius of the arrow is directly proportional to the torque value.

Parameters:
  • center (np.ndarray) – the point where the torque is applied;
  • torque (float) – the applied torque value;
  • max_torque (float) – the maximum torque value;
  • max_radius (float) – the radius to use for the maximum torque;
  • color (tuple, (255, 255, 255)) – the color of the arrow;
  • width (int, 1) – the width of the torque arrow.
arrow_head(center, scale, angle, color=(255, 255, 255))[source]

Draw an arrow head.

Parameters:
  • center (np.ndarray) – the position of the arrow head;
  • scale (float) – scale of the arrow, which corresponds to its length;
  • angle (float) – the angle of rotation of the arrow head;
  • color (tuple, (255, 255, 255)) – the color of the arrow.
background_image(img)[source]

Use the given image as background for the window, rescaling it appropriately.

Parameters:img – the image to be used.
display(s)[source]

Display current frame and initialize the next frame to the background color.

Parameters:s – time to wait in visualization.
close()[source]

Close the viewer, destroy the window.