Utils

Angles

normalize_angle_positive(angle)[source]

Wrap the angle between 0 and 2 * pi.

Parameters:: angle (float) – angle to wrap.
Returns:: The wrapped angle.

normalize_angle(angle)[source]

Wrap the angle between -pi and pi.

Parameters:: angle (float) – angle to wrap.
Returns:: The wrapped angle.

shortest_angular_distance(from_angle, to_angle)[source]

Compute the shortest distance between two angles.

Parameters:

from_angle (float) – starting angle;
to_angle (float) – final angle.

Returns:

The shortest distance between from_angle and to_angle.
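
The three wrapping helpers above can be illustrated with a short standard-library sketch. The function names below are hypothetical stand-ins, not the library code, and the choice of which interval endpoint is inclusive is an assumption.

```python
import math

TWO_PI = 2.0 * math.pi


def wrap_positive(angle):
    """Wrap an angle in radians into [0, 2*pi)."""
    # Python's % returns a non-negative result for a positive divisor.
    return angle % TWO_PI


def wrap_symmetric(angle):
    """Wrap an angle in radians into (-pi, pi]."""
    wrapped = wrap_positive(angle)
    if wrapped > math.pi:
        wrapped -= TWO_PI
    return wrapped


def shortest_distance(from_angle, to_angle):
    """Signed smallest rotation that takes from_angle to to_angle."""
    return wrap_symmetric(to_angle - from_angle)
```

With this sketch, going from 0.1 rad to just below a full turn yields a small negative rotation rather than an almost complete positive one.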

quat_to_euler(quat)[source]

Convert a quaternion to euler angles.

Parameters:: quat (np.ndarray) – quaternion to be converted, must be in format [w, x, y, z]
Returns:: The euler angles [x, y, z] representation of the quaternion

euler_to_quat(euler)[source]

Convert euler angles into a quaternion.

Parameters:: euler (np.ndarray) – euler angles to be converted
Returns:: Quaternion in format [w, x, y, z]
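
The documentation fixes the quaternion layout as [w, x, y, z] but does not state the Euler convention. The sketch below assumes roll, pitch and yaw angles about x, y and z composed as Rz * Ry * Rx; the library may use a different convention, so both the function names and the convention are assumptions.

```python
import math


def euler_to_quat_sketch(euler):
    """[roll, pitch, yaw] -> [w, x, y, z], assuming R = Rz * Ry * Rx."""
    roll, pitch, yaw = euler
    cr, sr = math.cos(roll / 2), math.sin(roll / 2)
    cp, sp = math.cos(pitch / 2), math.sin(pitch / 2)
    cy, sy = math.cos(yaw / 2), math.sin(yaw / 2)
    return [
        cr * cp * cy + sr * sp * sy,  # w
        sr * cp * cy - cr * sp * sy,  # x
        cr * sp * cy + sr * cp * sy,  # y
        cr * cp * sy - sr * sp * cy,  # z
    ]


def quat_to_euler_sketch(quat):
    """[w, x, y, z] -> [roll, pitch, yaw] under the same assumed convention."""
    w, x, y, z = quat
    roll = math.atan2(2 * (w * x + y * z), 1 - 2 * (x * x + y * y))
    # Clamp to guard against rounding slightly outside [-1, 1].
    sin_pitch = max(-1.0, min(1.0, 2 * (w * y - z * x)))
    pitch = math.asin(sin_pitch)
    yaw = math.atan2(2 * (w * z + x * y), 1 - 2 * (y * y + z * z))
    return [roll, pitch, yaw]
```

Away from the pitch singularity at plus or minus pi / 2, converting to a quaternion and back recovers the original angles.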

mat_to_euler(mat)[source]

Convert a rotation matrix to euler angles.

Parameters:: mat (np.ndarray) – a 3d rotation matrix.
Returns:: The euler angles [x, y, z] representation of the rotation matrix

euler_to_mat(euler)[source]

Convert euler angles into a rotation matrix.

Parameters:: euler (np.ndarray) – euler angles [x, y, z] to be converted.
Returns:: The rotation matrix representation of the euler angles

Callbacks

class Callback[source]

Bases: object

Interface for all basic callbacks. Implements a list in which it is possible to store data and methods to query and clean the content stored by the callback.

__init__()[source]: Constructor.

__call__(dataset)[source]

Add samples to the samples list.

Parameters:: dataset (list) – the samples to collect.

get()[source]

Returns:: The current collected data as a list.

clean()[source]: Delete the current stored data list

class CollectDataset[source]

Bases: Callback

This callback can be used to collect samples during the learning of the agent.

__call__(dataset)[source]

Add samples to the samples list.

Parameters:: dataset (list) – the samples to collect.

class CollectQ(approximator)[source]

Bases: Callback

This callback can be used to collect the action values in all states at the current time step.

__init__(approximator)[source]

Constructor.

Parameters:: approximator ([Table, EnsembleTable]) – the approximator to use to predict the action values.

__call__(dataset)[source]

Add samples to the samples list.

Parameters:: dataset (list) – the samples to collect.

class CollectMaxQ(approximator, state)[source]

Bases: Callback

This callback can be used to collect the maximum action value in a given state at each call.

__init__(approximator, state)[source]

Constructor.

Parameters:

approximator ([Table, EnsembleTable]) – the approximator to use;
state (np.ndarray) – the state to consider.

__call__(dataset)[source]

Add samples to the samples list.

Parameters:: dataset (list) – the samples to collect.

class CollectParameters(parameter, *idx)[source]

Bases: Callback

This callback can be used to collect the values of a parameter (e.g. learning rate) during a run of the agent.

__init__(parameter, *idx)[source]

Constructor.

Parameters:

parameter (Parameter) – the parameter whose values have to be collected;
*idx (list) – index of the parameter when the parameter is tabular.

__call__(dataset)[source]

Add samples to the samples list.

Parameters:: dataset (list) – the samples to collect.

Dataset

parse_dataset(dataset, features=None)[source]

Split the dataset in its different components and return them.

Parameters:

dataset (list) – the dataset to parse;
features (object, None) – features to apply to the states.

Returns:

The np.ndarray of state, action, reward, next_state, absorbing flag and last step flag. Features are applied to state and next_state, when provided.
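
As a rough illustration of the split, the sketch below separates a non-empty list of transition tuples into its six columns. It is not the library code: it returns plain lists instead of np.ndarray, and it assumes that features is a plain callable applied to one state at a time.

```python
def parse_dataset_sketch(dataset, features=None):
    """Split (state, action, reward, next_state, absorbing, last) tuples."""
    # zip(*dataset) transposes the list of transitions into columns.
    states, actions, rewards, next_states, absorbing, last = (
        list(column) for column in zip(*dataset)
    )
    if features is not None:
        # Features are applied to states and next states only.
        states = [features(s) for s in states]
        next_states = [features(s) for s in next_states]
    return states, actions, rewards, next_states, absorbing, last
```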

arrays_as_dataset(states, actions, rewards, next_states, absorbings, lasts)[source]

Creates a dataset of transitions from the provided arrays.

Parameters:

states (np.ndarray) – array of states;
actions (np.ndarray) – array of actions;
rewards (np.ndarray) – array of rewards;
next_states (np.ndarray) – array of next_states;
absorbings (np.ndarray) – array of absorbing flags;
lasts (np.ndarray) – array of last flags.

Returns:

The list of transitions.

compute_episodes_length(dataset)[source]

Compute the length of each episode in the dataset.

Parameters:: dataset (list) – the dataset to consider.
Returns:: A list with the length of each episode in the dataset.
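
One way to obtain episode lengths is to count transitions until the last step flag is set. The sketch below assumes the flag is the final element of each transition tuple and that a trailing, unfinished episode is ignored; both points are assumptions about the behaviour, not statements about the library.

```python
def episode_lengths_sketch(dataset):
    """Count the transitions of every completed episode."""
    lengths = []
    steps = 0
    for transition in dataset:
        steps += 1
        if transition[-1]:  # the last flag closes the episode
            lengths.append(steps)
            steps = 0
    return lengths
```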

select_first_episodes(dataset, n_episodes, parse=False)[source]

Return the first n_episodes episodes in the provided dataset.

Parameters:

dataset (list) – the dataset to consider;
n_episodes (int) – the number of episodes to pick from the dataset;
parse (bool, False) – whether to parse the dataset to return.

Returns:

A subset of the dataset containing the first n_episodes episodes.

select_random_samples(dataset, n_samples, parse=False)[source]

Return the desired number of randomly picked samples from the provided dataset.

Parameters:

dataset (list) – the dataset to consider;
n_samples (int) – the number of samples to pick from the dataset;
parse (bool, False) – whether to parse the dataset to return.

Returns:

A subset of the dataset containing randomly picked n_samples samples.

get_init_states(dataset)[source]

Get the initial states of a dataset.

Parameters:: dataset (list) – the dataset to consider.
Returns:: An array of initial states of the considered dataset.

compute_J(dataset, gamma=1.0)[source]

Compute the cumulative discounted reward of each episode in the dataset.

Parameters:

dataset (list) – the dataset to consider;
gamma (float, 1.) – discount factor.

Returns:

The cumulative discounted reward of each episode in the dataset.
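
The cumulative discounted reward of an episode is the sum of gamma ** t * r_t over its steps. A minimal sketch, assuming the reward sits at index 2 of each transition and the last flag at the end, and assuming that a trailing unfinished episode is still reported:

```python
def discounted_returns_sketch(dataset, gamma=1.0):
    """Return the discounted return of each episode in the dataset."""
    returns = []
    total, step = 0.0, 0
    for i, transition in enumerate(dataset):
        total += (gamma ** step) * transition[2]
        step += 1
        # Close the episode on the last flag or at the end of the data.
        if transition[-1] or i == len(dataset) - 1:
            returns.append(total)
            total, step = 0.0, 0
    return returns
```

With gamma=1.0 this reduces to the plain sum of rewards per episode.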

compute_metrics(dataset, gamma=1.0)[source]

Compute the metrics of each complete episode in the dataset.

Parameters:

dataset (list) – the dataset to consider;
gamma (float, 1.) – the discount factor.

Returns:

The minimum score reached in an episode, the maximum score reached in an episode, the mean score reached, the median score reached, the number of completed episodes.

If no episode has been completed, it returns 0 for all values.

Eligibility trace

EligibilityTrace(shape, name='replacing')[source]

Factory method to create an eligibility trace of the provided type.

Parameters:

shape (list) – shape of the eligibility trace table;
name (str, 'replacing') – type of the eligibility trace.

Returns:

The eligibility trace table of the provided shape and type.
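
The two trace types differ only in how a visited state-action pair is marked: a replacing trace sets the entry to 1, while an accumulating trace adds 1 to it. The dictionary-based class below is a hypothetical illustration of that difference; the decay helper is not part of the documented API and is included only to show the effect of repeated visits.

```python
class TraceSketch:
    """Hypothetical dictionary-backed eligibility trace."""

    def __init__(self, kind='replacing'):
        if kind not in ('replacing', 'accumulating'):
            raise ValueError('unknown trace type: ' + kind)
        self.kind = kind
        self.table = {}

    def reset(self):
        self.table.clear()

    def update(self, state, action):
        key = (state, action)
        if self.kind == 'replacing':
            self.table[key] = 1.0
        else:
            self.table[key] = self.table.get(key, 0.0) + 1.0

    def decay(self, factor):
        # Hypothetical helper: in TD(lambda) the factor would be gamma * lambda.
        for key in self.table:
            self.table[key] *= factor
```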

class ReplacingTrace(shape, initial_value=0.0, dtype=None)[source]

Bases: Table

Replacing trace.

reset()[source]

update(state, action)[source]

__init__(shape, initial_value=0.0, dtype=None)

Constructor.

Parameters:

shape (tuple) – the shape of the tabular regressor.
initial_value (float, 0.) – the initial value for each entry of the tabular regressor.
dtype ([int, float], None) – the dtype of the table array.

_add_save_attr(**attr_dict)

Add attributes that should be saved for an agent. For every attribute, it is necessary to specify the method to be used to save and load it. Available methods are: numpy, mushroom, torch, json, pickle, primitive and none. The primitive method can be used to store primitive attributes, while the none method always skips the attribute but ensures that it is initialized to None after loading. The mushroom method can be used with classes that implement the Serializable interface. All the other methods use the library of the same name. If a “!” character is added at the end of the method name, the field will be saved only if full_save is set to True.

Parameters:: **attr_dict – dictionary of attributes mapped to the method that should be used to save and load them.

static _append_folder(folder, name)

static _get_serialization_method(class_name)

static _load_json(zip_file, name)

classmethod _load_list(zip_file, folder, length)

static _load_mushroom(zip_file, name)

static _load_numpy(zip_file, name)

static _load_pickle(zip_file, name)

static _load_torch(zip_file, name)

_post_load(): This method can be overwritten to implement logic that is executed after the loading of the agent.

static _save_json(zip_file, name, obj, folder, **_)

static _save_mushroom(zip_file, name, obj, folder, full_save)

static _save_numpy(zip_file, name, obj, folder, **_)

static _save_pickle(zip_file, name, obj, folder, **_)

static _save_torch(zip_file, name, obj, folder, **_)

copy()

Returns:: A deepcopy of the agent.

fit(x, y)

Parameters:

x (int) – index of the table to be filled;
y (float) – value to fill in the table.

classmethod load(path)

Load and deserialize the agent from the given location on disk.

Parameters:: path (Path, string) – Relative or absolute path to the agent's save location.
Returns:: The loaded agent.

classmethod load_zip(zip_file, folder='')

property n_actions: Returns: The number of actions considered by the table.

predict(*z)

Predict the output of the table given an input.

Parameters:

*z (list) – list of inputs of the model. If the table is a Q-table, this list may contain states or states and actions, depending on whether the call requires predicting all q-values or only the q-value corresponding to the provided action.

Returns:

The table prediction.

save(path, full_save=False)

Serialize and save the object to the given path on disk.

Parameters:

path (Path, str) – Relative or absolute path to the object save location;
full_save (bool) – Flag to specify the amount of data to save for MushroomRL data structures.

save_zip(zip_file, full_save, folder='')

Serialize and save the agent to the given path on disk.

Parameters:

zip_file (ZipFile) – ZipFile where the object needs to be saved;
full_save (bool) – flag to specify the amount of data to save for MushroomRL data structures;
folder (string, '') – subfolder to be used by the save method.

property shape: Returns: The shape of the table.

class AccumulatingTrace(shape, initial_value=0.0, dtype=None)[source]

Bases: Table

Accumulating trace.

reset()[source]

update(state, action)[source]

__init__(shape, initial_value=0.0, dtype=None)

Constructor.

Parameters:

shape (tuple) – the shape of the tabular regressor.
initial_value (float, 0.) – the initial value for each entry of the tabular regressor.
dtype ([int, float], None) – the dtype of the table array.

_add_save_attr(**attr_dict)

Add attributes that should be saved for an agent. For every attribute, it is necessary to specify the method to be used to save and load it. Available methods are: numpy, mushroom, torch, json, pickle, primitive and none. The primitive method can be used to store primitive attributes, while the none method always skips the attribute but ensures that it is initialized to None after loading. The mushroom method can be used with classes that implement the Serializable interface. All the other methods use the library of the same name. If a “!” character is added at the end of the method name, the field will be saved only if full_save is set to True.

Parameters:: **attr_dict – dictionary of attributes mapped to the method that should be used to save and load them.

static _append_folder(folder, name)

static _get_serialization_method(class_name)

static _load_json(zip_file, name)

classmethod _load_list(zip_file, folder, length)

static _load_mushroom(zip_file, name)

static _load_numpy(zip_file, name)

static _load_pickle(zip_file, name)

static _load_torch(zip_file, name)

_post_load(): This method can be overwritten to implement logic that is executed after the loading of the agent.

static _save_json(zip_file, name, obj, folder, **_)

static _save_mushroom(zip_file, name, obj, folder, full_save)

static _save_numpy(zip_file, name, obj, folder, **_)

static _save_pickle(zip_file, name, obj, folder, **_)

static _save_torch(zip_file, name, obj, folder, **_)

copy()

Returns:: A deepcopy of the agent.

fit(x, y)

Parameters:

x (int) – index of the table to be filled;
y (float) – value to fill in the table.

classmethod load(path)

Load and deserialize the agent from the given location on disk.

Parameters:: path (Path, string) – Relative or absolute path to the agent's save location.
Returns:: The loaded agent.

classmethod load_zip(zip_file, folder='')

property n_actions: Returns: The number of actions considered by the table.

predict(*z)

Predict the output of the table given an input.

Parameters:

*z (list) – list of inputs of the model. If the table is a Q-table, this list may contain states or states and actions, depending on whether the call requires predicting all q-values or only the q-value corresponding to the provided action.

Returns:

The table prediction.

save(path, full_save=False)

Serialize and save the object to the given path on disk.

Parameters:

path (Path, str) – Relative or absolute path to the object save location;
full_save (bool) – Flag to specify the amount of data to save for MushroomRL data structures.

save_zip(zip_file, full_save, folder='')

Serialize and save the agent to the given path on disk.

Parameters:

zip_file (ZipFile) – ZipFile where the object needs to be saved;
full_save (bool) – flag to specify the amount of data to save for MushroomRL data structures;
folder (string, '') – subfolder to be used by the save method.

property shape: Returns: The shape of the table.

Features

uniform_grid(n_centers, low, high, eta=0.25, cyclic=False)[source]

This function is used to create the parameters of uniformly spaced radial basis functions with eta of overlap. It creates a uniformly spaced grid of n_centers[i] points in each dimension i. Also returns a vector containing the appropriate width of the radial basis functions.

Parameters:

n_centers (list) – number of centers of each dimension;
low (np.ndarray) – lowest value for each dimension;
high (np.ndarray) – highest value for each dimension;
eta (float, 0.25) – overlap between two radial basis functions;
cyclic (bool, False) – whether the state space is a ring or not.

Returns:

The uniformly spaced grid and the width vector.
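
The grid part of this function can be pictured as the Cartesian product of evenly spaced points along each dimension. The sketch below reproduces only that idea with the standard library; the width vector, the eta overlap and the cyclic option are deliberately left out because their formulas are not given here, and the function name is hypothetical.

```python
import itertools


def grid_centers_sketch(n_centers, low, high):
    """Cartesian product of evenly spaced points in each dimension."""
    axes = []
    for n, lo, hi in zip(n_centers, low, high):
        if n == 1:
            # A single center is placed in the middle of the range.
            axes.append([(lo + hi) / 2.0])
        else:
            step = (hi - lo) / (n - 1)
            axes.append([lo + i * step for i in range(n)])
    return [list(point) for point in itertools.product(*axes)]
```

For n_centers=[2, 3] the result contains 2 * 3 = 6 centers, with the last dimension varying fastest.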

Folder

mk_dir_recursive(dir_path)[source]

Create a directory and, if needed, the whole directory tree. Unlike os.mkdir, this function does not raise an exception when the directory already exists.

Parameters:: dir_path (str) – the path of the directory to create.

force_symlink(src, dst)[source]

Create a symlink deleting the previous one, if it already exists.

Parameters:

src (str) – source;
dst (str) – destination.
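
Both folder helpers map naturally onto the standard os module. The functions below are a sketch of the described behaviour under that assumption, with hypothetical names; they are not taken from the library source.

```python
import os


def mk_dir_recursive_sketch(dir_path):
    """Create dir_path and any missing parents; do nothing if it exists."""
    os.makedirs(dir_path, exist_ok=True)


def force_symlink_sketch(src, dst):
    """Create a symlink at dst pointing to src, replacing an existing one."""
    try:
        os.symlink(src, dst)
    except FileExistsError:
        # Remove the previous link, then create the new one.
        os.remove(dst)
        os.symlink(src, dst)
```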

Frames

class LazyFrames(frames, history_length)[source]

Bases: object

From OpenAI Baselines: https://github.com/openai/baselines/blob/master/baselines/common/atari_wrappers.py

This class provides a solution to optimize the use of memory when concatenating different frames, e.g. Atari frames in DQN. The frames are individually stored in a list and, when numpy arrays containing them are created, the reference to each frame is used instead of a copy.

__init__(frames, history_length)[source]

preprocess_frame(obs, img_size)[source]

Convert a frame from rgb to grayscale and resize it.

Parameters:

obs (np.ndarray) – array representing an rgb frame;
img_size (tuple) – target size for images.

Returns:

The transformed frame as 8 bit integer array.

Minibatches

minibatch_number(size, batch_size)[source]

Function to retrieve the number of batches, given a batch size.

Parameters:

size (int) – size of the dataset;
batch_size (int) – size of the batches.

Returns:

The number of minibatches in the dataset.

minibatch_generator(batch_size, *dataset)[source]

Generator that creates a minibatch from the full dataset.

Parameters:

batch_size (int) – the maximum size of each minibatch;
dataset – the dataset to be split.

Returns:

The current minibatch.
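
A minimal sketch of both minibatch helpers, assuming the last batch may be smaller than batch_size and that samples are shuffled before batching (the shuffling and the rng keyword are assumptions introduced for reproducibility, not documented behaviour). Plain lists stand in for arrays.

```python
import math
import random


def minibatch_number_sketch(size, batch_size):
    """Number of batches needed to cover size samples."""
    return math.ceil(size / batch_size)


def minibatch_generator_sketch(batch_size, *dataset, rng=None):
    """Yield aligned minibatches from one or more equally long sequences."""
    rng = rng or random.Random()
    size = len(dataset[0])
    indexes = list(range(size))
    rng.shuffle(indexes)
    for start in range(0, size, batch_size):
        chosen = indexes[start:start + batch_size]
        # The same indexes are used for every sequence, so rows stay aligned.
        yield [[array[i] for i in chosen] for array in dataset]
```

Each yielded item is a list with one entry per input sequence, so inputs and targets can be unpacked together in a training loop.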

Numerical gradient

numerical_diff_policy(policy, state, action, eps=1e-06)[source]

Compute the gradient of a policy in (state, action) numerically.

Parameters:

policy (Policy) – the policy whose gradient has to be returned;
state (np.ndarray) – the state;
action (np.ndarray) – the action;
eps (float, 1e-6) – the value of the perturbation.

Returns:

The gradient of the provided policy in (state, action) computed numerically.

numerical_diff_dist(dist, theta, eps=1e-06)[source]

Compute the gradient of a distribution in theta numerically.

Parameters:

dist (Distribution) – the distribution whose gradient has to be returned;
theta (np.ndarray) – the parametrization where to compute the gradient;
eps (float, 1e-6) – the value of the perturbation.

Returns:

The gradient of the provided distribution in theta, computed numerically.

numerical_diff_function(function, params, eps=1e-06)[source]

Compute the gradient of a function with respect to params numerically.

Parameters:

function – a function whose gradient has to be returned;
params – parameter vector with respect to which the gradient is computed;
eps (float, 1e-6) – the value of the perturbation.

Returns:

The numerical gradient of the function computed w.r.t. parameters params.
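
A numerical gradient perturbs one parameter at a time by eps and measures the change in the function value. The sketch below uses central differences, which is an assumption about the scheme; the function name is hypothetical and plain lists stand in for arrays.

```python
def numerical_gradient_sketch(function, params, eps=1e-6):
    """Central-difference gradient of function at params."""
    gradient = []
    for i in range(len(params)):
        plus = list(params)
        minus = list(params)
        plus[i] += eps
        minus[i] -= eps
        # (f(x + eps) - f(x - eps)) / (2 * eps) for coordinate i.
        gradient.append((function(plus) - function(minus)) / (2.0 * eps))
    return gradient
```

Such a routine is mainly useful for checking analytical gradients, since it needs two function evaluations per parameter.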

Parameters

class Parameter(value, min_value=None, max_value=None, size=(1,))[source]

Bases: Serializable

This class implements functions to manage parameters, such as the learning rate. It also allows having a separate parameter for each state or state-action tuple.

__init__(value, min_value=None, max_value=None, size=(1,))[source]

Constructor.

Parameters:

value (float) – initial value of the parameter;
min_value (float, None) – minimum value that the parameter can reach when decreasing;
max_value (float, None) – maximum value that the parameter can reach when increasing;
size (tuple, (1,)) – shape of the matrix of parameters; this shape can be used to have a single parameter for each state or state-action tuple.

__call__(*idx, **kwargs)[source]

Update and return the parameter in the provided index.

Parameters:: *idx (list) – index of the parameter to return.
Returns:: The updated parameter in the provided index.

get_value(*idx, **kwargs)[source]

Return the current value of the parameter in the provided index.

Parameters:: *idx (list) – index of the parameter to return.
Returns:: The current value of the parameter in the provided index.

_compute(*idx, **kwargs)[source]

Returns:: The value of the parameter in the provided index.

update(*idx, **kwargs)[source]

Updates the number of visits of the parameter in the provided index.

Parameters:: *idx (list) – index of the parameter whose number of visits has to be updated.

property shape: Returns: The shape of the table of parameters.

property initial_value: Returns: The initial value of the parameters.

_add_save_attr(**attr_dict)

Add attributes that should be saved for an agent. For every attribute, it is necessary to specify the method to be used to save and load it. Available methods are: numpy, mushroom, torch, json, pickle, primitive and none. The primitive method can be used to store primitive attributes, while the none method always skips the attribute but ensures that it is initialized to None after loading. The mushroom method can be used with classes that implement the Serializable interface. All the other methods use the library of the same name. If a “!” character is added at the end of the method name, the field will be saved only if full_save is set to True.

Parameters:: **attr_dict – dictionary of attributes mapped to the method that should be used to save and load them.

_post_load(): This method can be overwritten to implement logic that is executed after the loading of the agent.

copy()

Returns:: A deepcopy of the agent.

classmethod load(path)

Load and deserialize the agent from the given location on disk.

Parameters:: path (Path, string) – Relative or absolute path to the agent's save location.
Returns:: The loaded agent.

save(path, full_save=False)

Serialize and save the object to the given path on disk.

Parameters:

path (Path, str) – Relative or absolute path to the object save location;
full_save (bool) – Flag to specify the amount of data to save for MushroomRL data structures.

save_zip(zip_file, full_save, folder='')

Serialize and save the agent to the given path on disk.

Parameters:

zip_file (ZipFile) – ZipFile where the object needs to be saved;
full_save (bool) – flag to specify the amount of data to save for MushroomRL data structures;
folder (string, '') – subfolder to be used by the save method.

class LinearParameter(value, threshold_value, n, size=(1,))[source]

Bases: Parameter

This class implements a linearly changing parameter according to the number of times it has been used.

__init__(value, threshold_value, n, size=(1,))[source]

Constructor.

Parameters:

value (float) – initial value of the parameter;
threshold_value (float) – value at which the parameter stops changing;
n (int) – number of updates needed to reach threshold_value;
size (tuple, (1,)) – shape of the matrix of parameters; this shape can be used to have a single parameter for each state or state-action tuple.
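
A linear schedule of this kind can be sketched as follows. The sketch assumes that the parameter moves from value towards threshold_value in n equal steps and then stays there, and that each call first counts the use and then returns the value; it handles a single scalar rather than a table of parameters, and the class name is hypothetical.

```python
class LinearScheduleSketch:
    """Hypothetical scalar version of a linearly changing parameter."""

    def __init__(self, value, threshold_value, n):
        self._initial = value
        self._threshold = threshold_value
        # Change applied at every update.
        self._coeff = (threshold_value - value) / n
        self._n_updates = 0

    def get_value(self):
        raw = self._initial + self._coeff * self._n_updates
        # Clip at the threshold, from above or below depending on direction.
        if self._coeff >= 0:
            return min(raw, self._threshold)
        return max(raw, self._threshold)

    def __call__(self):
        self._n_updates += 1
        return self.get_value()
```

For example, LinearScheduleSketch(1.0, 0.0, 4) decreases by 0.25 per call and stays at 0.0 from the fourth call onwards.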

_compute(*idx, **kwargs)[source]

Returns:: The value of the parameter in the provided index.

__call__(*idx, **kwargs)

Update and return the parameter in the provided index.

Parameters:: *idx (list) – index of the parameter to return.
Returns:: The updated parameter in the provided index.

_add_save_attr(**attr_dict)

Add attributes that should be saved for an agent. For every attribute, it is necessary to specify the method to be used to save and load it. Available methods are: numpy, mushroom, torch, json, pickle, primitive and none. The primitive method can be used to store primitive attributes, while the none method always skips the attribute but ensures that it is initialized to None after loading. The mushroom method can be used with classes that implement the Serializable interface. All the other methods use the library of the same name. If a “!” character is added at the end of the method name, the field will be saved only if full_save is set to True.

Parameters:: **attr_dict – dictionary of attributes mapped to the method that should be used to save and load them.

_post_load(): This method can be overwritten to implement logic that is executed after the loading of the agent.

copy()

Returns:: A deepcopy of the agent.

get_value(*idx, **kwargs)

Return the current value of the parameter in the provided index.

Parameters:: *idx (list) – index of the parameter to return.
Returns:: The current value of the parameter in the provided index.

property initial_value: Returns: The initial value of the parameters.

classmethod load(path)

Load and deserialize the agent from the given location on disk.

Parameters:: path (Path, string) – Relative or absolute path to the agent's save location.
Returns:: The loaded agent.

save(path, full_save=False)

Serialize and save the object to the given path on disk.

Parameters:

path (Path, str) – Relative or absolute path to the object save location;
full_save (bool) – Flag to specify the amount of data to save for MushroomRL data structures.

save_zip(zip_file, full_save, folder='')

Serialize and save the agent to the given path on disk.

Parameters:

zip_file (ZipFile) – ZipFile where the object needs to be saved;
full_save (bool) – flag to specify the amount of data to save for MushroomRL data structures;
folder (string, '') – subfolder to be used by the save method.

property shape: Returns: The shape of the table of parameters.

update(*idx, **kwargs)

Updates the number of visits of the parameter in the provided index.

Parameters:: *idx (list) – index of the parameter whose number of visits has to be updated.

class ExponentialParameter(value, exp=1.0, min_value=None, max_value=None, size=(1,))[source]

Bases: Parameter

This class implements an exponentially changing parameter according to the number of times it has been used.

__init__(value, exp=1.0, min_value=None, max_value=None, size=(1,))[source]

Constructor.

Parameters:

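value (float) – initial value of the parameter;
exp (float, 1.) – exponent that controls how quickly the parameter changes with the number of updates;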
min_value (float, None) – minimum value that the parameter can reach when decreasing;
max_value (float, None) – maximum value that the parameter can reach when increasing;
size (tuple, (1,)) – shape of the matrix of parameters; this shape can be used to have a single parameter for each state or state-action tuple.
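
The exact update rule is not spelled out here. The sketch below assumes the common form value / n ** exp, where n is the number of times the parameter has been used, followed by clipping to the optional bounds. It handles a single scalar, and both the class name and the formula are assumptions.

```python
class PowerDecaySketch:
    """Hypothetical scalar parameter decaying as value / n ** exp."""

    def __init__(self, value, exp=1.0, min_value=None, max_value=None):
        self._initial = value
        self._exp = exp
        self._min = min_value
        self._max = max_value
        self._n_updates = 0

    def get_value(self):
        n = max(self._n_updates, 1)  # avoid dividing by zero before first use
        value = self._initial / n ** self._exp
        if self._min is not None:
            value = max(value, self._min)
        if self._max is not None:
            value = min(value, self._max)
        return value

    def __call__(self):
        self._n_updates += 1
        return self.get_value()
```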

_compute(*idx, **kwargs)[source]

Returns:: The value of the parameter in the provided index.

__call__(*idx, **kwargs)

Update and return the parameter in the provided index.

Parameters:: *idx (list) – index of the parameter to return.
Returns:: The updated parameter in the provided index.

_add_save_attr(**attr_dict)

Add attributes that should be saved for an agent. For every attribute, it is necessary to specify the method to be used to save and load it. Available methods are: numpy, mushroom, torch, json, pickle, primitive and none. The primitive method can be used to store primitive attributes, while the none method always skips the attribute but ensures that it is initialized to None after loading. The mushroom method can be used with classes that implement the Serializable interface. All the other methods use the library of the same name. If a “!” character is added at the end of the method name, the field will be saved only if full_save is set to True.

Parameters:: **attr_dict – dictionary of attributes mapped to the method that should be used to save and load them.

_post_load(): This method can be overwritten to implement logic that is executed after the loading of the agent.

copy()

Returns:: A deepcopy of the agent.

get_value(*idx, **kwargs)

Return the current value of the parameter in the provided index.

Parameters:: *idx (list) – index of the parameter to return.
Returns:: The current value of the parameter in the provided index.

property initial_value: Returns: The initial value of the parameters.

classmethod load(path)

Load and deserialize the agent from the given location on disk.

Parameters:: path (Path, string) – Relative or absolute path to the agent's save location.
Returns:: The loaded agent.

save(path, full_save=False)

Serialize and save the object to the given path on disk.

Parameters:

path (Path, str) – Relative or absolute path to the object save location;
full_save (bool) – Flag to specify the amount of data to save for MushroomRL data structures.

save_zip(zip_file, full_save, folder='')

Serialize and save the agent to the given path on disk.

Parameters:

zip_file (ZipFile) – ZipFile where the object needs to be saved;
full_save (bool) – flag to specify the amount of data to save for MushroomRL data structures;
folder (string, '') – subfolder to be used by the save method.

property shape: Returns: The shape of the table of parameters.

update(*idx, **kwargs)

Updates the number of visits of the parameter in the provided index.

Parameters:: *idx (list) – index of the parameter whose number of visits has to be updated.

Plots

Replay memory

class ReplayMemory(initial_size, max_size)[source]

Bases: Serializable

This class implements functions to manage a replay memory like the one used in “Human-Level Control Through Deep Reinforcement Learning” by Mnih V. et al.

__init__(initial_size, max_size)[source]

Constructor.

Parameters:

initial_size (int) – initial number of elements in the replay memory;
max_size (int) – maximum number of elements that the replay memory can contain.

add(dataset, n_steps_return=1, gamma=1.0)[source]

Add elements to the replay memory.

Parameters:

dataset (list) – list of elements to add to the replay memory;
n_steps_return (int, 1) – number of steps to consider for computing n-step return;
gamma (float, 1.) – discount factor for n-step return.

get(n_samples)[source]

Returns the provided number of samples from the replay memory.

Parameters:: n_samples (int) – the number of samples to return.
Returns:: The requested number of samples.

reset()[source]: Reset the replay memory.

property initialized: Returns: Whether the replay memory has reached the number of elements that allows it to be used.

property size: Returns: The number of elements contained in the replay memory.

_post_load()[source]: This method can be overwritten to implement logic that is executed after the loading of the agent.

_add_save_attr(**attr_dict)

Add attributes that should be saved for an agent. For every attribute, it is necessary to specify the method to be used to save and load it. Available methods are: numpy, mushroom, torch, json, pickle, primitive and none. The primitive method can be used to store primitive attributes, while the none method always skips the attribute but ensures that it is initialized to None after loading. The mushroom method can be used with classes that implement the Serializable interface. All the other methods use the library of the same name. If a “!” character is added at the end of the method name, the field will be saved only if full_save is set to True.

Parameters:: **attr_dict – dictionary of attributes mapped to the method that should be used to save and load them.

copy()

Returns:: A deepcopy of the agent.

classmethod load(path)

Load and deserialize the agent from the given location on disk.

Parameters:: path (Path, string) – Relative or absolute path to the agent's save location.
Returns:: The loaded agent.

save(path, full_save=False)

Serialize and save the object to the given path on disk.

Parameters:

path (Path, str) – Relative or absolute path to the object save location;
full_save (bool) – Flag to specify the amount of data to save for MushroomRL data structures.

save_zip(zip_file, full_save, folder='')

Serialize and save the agent to the given path on disk.

Parameters:

zip_file (ZipFile) – ZipFile where the object needs to be saved;
full_save (bool) – flag to specify the amount of data to save for MushroomRL data structures;
folder (string, '') – subfolder to be used by the save method.

class SumTree(max_size)[source]

Bases: object

This class implements a sum tree data structure. This is used, for instance, by PrioritizedReplayMemory.

__init__(max_size)[source]

Constructor.

Parameters:: max_size (int) – maximum size of the tree.

add(dataset, priority, n_steps_return, gamma)[source]

Add elements to the tree.

Parameters:

dataset (list) – list of elements to add to the tree;
priority (np.ndarray) – priority of each sample in the dataset;
n_steps_return (int) – number of steps to consider for computing n-step return;
gamma (float) – discount factor for n-step return.

get(s)[source]

Returns the sample in the tree that corresponds to the provided value.

Parameters:: s (float) – the value of the samples to return.
Returns:: The requested sample.

update(idx, priorities)[source]

Update the priority of the sample at the provided index in the dataset.

Parameters:

idx (np.ndarray) – indexes of the transitions in the dataset;
priorities (np.ndarray) – priorities of the transitions.

property size: Returns: The current size of the tree.

property max_p: Returns: The maximum priority among the ones in the tree.

property total_p: Returns: The sum of the priorities in the tree, i.e. the value of the root node.

class PrioritizedReplayMemory(initial_size, max_size, alpha, beta, epsilon=0.01)[source]

Bases: Serializable

This class implements functions to manage a prioritized replay memory like the one used in “Prioritized Experience Replay” by Schaul et al., 2015.

__init__(initial_size, max_size, alpha, beta, epsilon=0.01)[source]

Constructor.

Parameters:

initial_size (int) – initial number of elements in the replay memory;
max_size (int) – maximum number of elements that the replay memory can contain;
alpha (float) – prioritization coefficient;
beta ([float, Parameter]) – importance sampling coefficient;
epsilon (float, .01) – small value to avoid zero probabilities.

add(dataset, p, n_steps_return=1, gamma=1.0)[source]

Add elements to the replay memory.

Parameters:

dataset (list) – list of elements to add to the replay memory;
p (np.ndarray) – priority of each sample in the dataset;
n_steps_return (int, 1) – number of steps to consider for computing n-step return;
gamma (float, 1.) – discount factor for n-step return.

get(n_samples)[source]

Returns the provided number of states from the replay memory.

Parameters:: n_samples (int) – the number of samples to return.
Returns:: The requested number of samples.

update(error, idx)[source]

Update the priority of the sample at the provided index in the dataset.

Parameters:

error (np.ndarray) – errors to consider to compute the priorities;
idx (np.ndarray) – indexes of the transitions in the dataset.

property initialized: Returns: Whether the replay memory has reached the number of elements that allows it to be used.

property max_priority: Returns: The maximum value of priority inside the replay memory.

_post_load()[source]: This method can be overwritten to implement logic that is executed after the loading of the agent.

_add_save_attr(**attr_dict)

Add attributes that should be saved for an agent. For every attribute, it is necessary to specify the method to be used to save and load. Available methods are: numpy, mushroom, torch, json, pickle, primitive and none. The primitive method can be used to store primitive attributes, while the none method always skips the attribute, but ensures that it is initialized to None after the load. The mushroom method can be used with classes that implement the Serializable interface. All the other methods use the library of the same name. If a “!” character is added at the end of the method, the field will be saved only if full_save is set to True.

Parameters:: **attr_dict – dictionary of attributes mapped to the method that should be used to save and load them.
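
The rule for the “!” suffix and the none method can be sketched as a small predicate. The helper should_save and the attribute names are hypothetical and only mirror the description above; they are not part of the Serializable interface.

```python
def should_save(method, full_save):
    """Hypothetical helper: is a field tagged with `method` written to disk?"""
    # A trailing "!" marks fields that are written only on a full save.
    only_full = method.endswith('!')
    base = method.rstrip('!')
    if base == 'none':
        # "none" fields are never written; they are reset to None on load.
        return False
    return full_save or not only_full


# Hypothetical attribute names mapped to save methods.
attrs = {'_weights': 'numpy', '_buffer': 'pickle!', '_cache': 'none'}
light = [k for k, m in attrs.items() if should_save(m, full_save=False)]
full = [k for k, m in attrs.items() if should_save(m, full_save=True)]
```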

copy()

Returns:: A deepcopy of the agent.

classmethod load(path)

Load and deserialize the agent from the given location on disk.

Parameters:: path (Path, string) – Relative or absolute path to the agent's save location.
Returns:: The loaded agent.

save(path, full_save=False)

Serialize and save the object to the given path on disk.

Parameters:

path (Path, str) – Relative or absolute path to the object save location;
full_save (bool) – Flag to specify the amount of data to save for MushroomRL data structures.

save_zip(zip_file, full_save, folder='')

Serialize and save the agent to the given path on disk.

Parameters:

zip_file (ZipFile) – ZipFile where the object needs to be saved;
full_save (bool) – flag to specify the amount of data to save for MushroomRL data structures;
folder (string, '') – subfolder to be used by the save method.

Spaces

class Box(low, high, shape=None)[source]

Bases: Serializable

This class implements functions to manage continuous state and action spaces. It is similar to the Box class in gym.spaces.box.

__init__(low, high, shape=None)[source]

Constructor.

Parameters:

low ([float, np.ndarray]) – the minimum value of each dimension of the space. If a scalar value is provided, this value is considered as the minimum one for each dimension. If a np.ndarray is provided, each i-th element is considered the minimum value of the i-th dimension;
high ([float, np.ndarray]) – the maximum value of each dimension of the space. If a scalar value is provided, this value is considered as the maximum one for each dimension. If a np.ndarray is provided, each i-th element is considered the maximum value of the i-th dimension;
shape (np.ndarray, None) – the dimension of the space. Must match the shape of low and high, if they are np.ndarray.

property low: Returns: The minimum value of each dimension of the space.

property high: Returns: The maximum value of each dimension of the space.

property shape: Returns: The dimensions of the space.

class Discrete(n)[source]

Bases: Serializable

This class implements functions to manage discrete state and action spaces. It is similar to the Discrete class in gym.spaces.discrete.

__init__(n)[source]

Constructor.

Parameters:: n (int) – the number of values of the space.

property size: Returns: The number of elements of the space.

property shape: Returns: The shape of the space that is always (1,).

Table

class Table(shape, initial_value=0.0, dtype=None)[source]

Bases: Serializable

Table regressor. Used for discrete state and action spaces.

__init__(shape, initial_value=0.0, dtype=None)[source]

Constructor.

Parameters:

shape (tuple) – the shape of the tabular regressor.
initial_value (float, 0.) – the initial value for each entry of the tabular regressor.
dtype ([int, float], None) – the dtype of the table array.

fit(x, y)[source]

Parameters:

x (int) – index of the table to be filled;
y (float) – value to fill in the table.

predict(*z)[source]

Predict the output of the table given an input.

Parameters:

*z (list) – list of inputs of the model. If the table is a Q-table, this list may contain states or states and actions, depending on whether the call requires predicting all Q-values or only the Q-value corresponding to the provided action.

Returns:

The table prediction.
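
The two ways of querying a Q-table described above can be illustrated with a plain NumPy array. This sketch does not call Table; it only shows the indexing semantics under the assumption of a table of shape (n_states, n_actions).

```python
import numpy as np

# A 3-state, 2-action Q-table, with every entry initialised to 0.
q = np.zeros((3, 2))

# "fit": write the value 1.5 at (state=1, action=0).
q[1, 0] = 1.5

# "predict" with a state only: all Q-values of that state.
all_values = q[1]

# "predict" with a state and an action: a single Q-value.
one_value = q[1, 0]
```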

property n_actions: Returns: The number of actions considered by the table.

property shape: Returns: The shape of the table.

_add_save_attr(**attr_dict)

Add attributes that should be saved for an agent. For every attribute, it is necessary to specify the method to be used to save and load. Available methods are: numpy, mushroom, torch, json, pickle, primitive and none. The primitive method can be used to store primitive attributes, while the none method always skips the attribute, but ensures that it is initialized to None after the load. The mushroom method can be used with classes that implement the Serializable interface. All the other methods use the library of the same name. If a “!” character is added at the end of the method, the field will be saved only if full_save is set to True.

Parameters:: **attr_dict – dictionary of attributes mapped to the method that should be used to save and load them.

_post_load(): This method can be overwritten to implement logic that is executed after the loading of the agent.

copy()

Returns:: A deepcopy of the agent.

classmethod load(path)

Load and deserialize the agent from the given location on disk.

Parameters:: path (Path, string) – Relative or absolute path to the agent's save location.
Returns:: The loaded agent.

save(path, full_save=False)

Serialize and save the object to the given path on disk.

Parameters:

path (Path, str) – Relative or absolute path to the object save location;
full_save (bool) – Flag to specify the amount of data to save for MushroomRL data structures.

save_zip(zip_file, full_save, folder='')

Serialize and save the agent to the given path on disk.

Parameters:

zip_file (ZipFile) – ZipFile where the object needs to be saved;
full_save (bool) – flag to specify the amount of data to save for MushroomRL data structures;
folder (string, '') – subfolder to be used by the save method.

class EnsembleTable(n_models, shape, **params)[source]

Bases: Ensemble

This class implements functions to manage table ensembles.

__init__(n_models, shape, **params)[source]

Constructor.

Parameters:

n_models (int) – number of models in the ensemble;
shape (np.ndarray) – shape of each table in the ensemble.
**params – parameters dictionary to create each regressor.

_add_save_attr(**attr_dict)

Add attributes that should be saved for an agent. For every attribute, it is necessary to specify the method to be used to save and load. Available methods are: numpy, mushroom, torch, json, pickle, primitive and none. The primitive method can be used to store primitive attributes, while the none method always skips the attribute, but ensures that it is initialized to None after the load. The mushroom method can be used with classes that implement the Serializable interface. All the other methods use the library of the same name. If a “!” character is added at the end of the method, the field will be saved only if full_save is set to True.

Parameters:: **attr_dict – dictionary of attributes mapped to the method that should be used to save and load them.

_post_load(): This method can be overwritten to implement logic that is executed after the loading of the agent.

copy()

Returns:: A deepcopy of the agent.

fit(*z, idx=None, **fit_params)

Fit the idx-th model of the ensemble if idx is provided, every model otherwise.

Parameters:

*z – a list containing the inputs to use to fit each regressor of the ensemble;
idx (int, None) – index of the model to fit;
**fit_params – other params.

classmethod load(path)

Load and deserialize the agent from the given location on disk.

Parameters:: path (Path, string) – Relative or absolute path to the agent's save location.
Returns:: The loaded agent.

property model: Returns: The list of the models in the ensemble.

predict(*z, idx=None, prediction=None, compute_variance=False, **predict_params)

Predict.

Parameters:

*z – a list containing the inputs to use to predict with each regressor of the ensemble;
idx (int, None) – index of the model to use for prediction;
prediction (str, None) – the type of prediction to make. When provided, it overrides the prediction class attribute;
compute_variance (bool, False) – whether to compute the variance of the prediction or not;
**predict_params – other parameters used by the predict method of the regressor.

Returns:

The predictions of the model.

reset(): Reset the model parameters.

save(path, full_save=False)

Serialize and save the object to the given path on disk.

Parameters:

path (Path, str) – Relative or absolute path to the object save location;
full_save (bool) – Flag to specify the amount of data to save for MushroomRL data structures.

save_zip(zip_file, full_save, folder='')

Serialize and save the agent to the given path on disk.

Parameters:

zip_file (ZipFile) – ZipFile where the object needs to be saved;
full_save (bool) – flag to specify the amount of data to save for MushroomRL data structures;
folder (string, '') – subfolder to be used by the save method.

Torch

set_weights(parameters, weights, use_cuda)[source]

Function used to set the value of a set of torch parameters given a vector of values.

Parameters:

parameters (list) – list of parameters to be considered;
weights (numpy.ndarray) – array of the new values for the parameters;
use_cuda (bool) – whether the parameters are cuda tensors or not.

get_weights(parameters)[source]

Function used to get the value of a set of torch parameters as a single vector of values.

Parameters:: parameters (list) – list of parameters to be considered.
Returns:: A numpy vector consisting of all the values of the vectors.
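
The round trip between a list of parameters and a single flat vector, as performed by set_weights and get_weights, can be sketched with NumPy arrays standing in for torch parameters. The helpers flatten and unflatten are hypothetical and do not use torch; they only illustrate the slicing logic, assuming that parameters are visited in list order.

```python
import numpy as np


def flatten(arrays):
    # Concatenate every array, flattened, into one vector.
    return np.concatenate([a.ravel() for a in arrays])


def unflatten(arrays, weights):
    # Copy consecutive slices of `weights` back into each array in place.
    idx = 0
    for a in arrays:
        n = a.size
        a[...] = weights[idx:idx + n].reshape(a.shape)
        idx += n
    # Every value of the vector must have been consumed.
    assert idx == weights.size


params = [np.zeros((2, 3)), np.zeros(3)]
unflatten(params, np.arange(9, dtype=float))
vector = flatten(params)
```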

zero_grad(parameters)[source]

Function used to set to zero the value of the gradient of a set of torch parameters.

Parameters:: parameters (list) – list of parameters to be considered.

get_gradient(params)[source]

Function used to get the value of the gradient of a set of torch parameters.

Parameters:: params (list) – list of parameters to be considered.

to_float_tensor(x, use_cuda=False)[source]

Function used to convert a numpy array to a float torch tensor.

Parameters:

x (np.ndarray) – numpy array to be converted as torch tensor;
use_cuda (bool) – whether to build a cuda tensor or not.

Returns:

A float tensor built from the values contained in the input array.

to_int_tensor(x, use_cuda=False)[source]

Function used to convert a numpy array to an int torch tensor.

Parameters:

x (np.ndarray) – numpy array to be converted as torch tensor;
use_cuda (bool) – whether to build a cuda tensor or not.

Returns:

An int tensor built from the values contained in the input array.

class CategoricalWrapper(*args: Any, **kwargs: Any)[source]

Bases: Categorical

Wrapper for the Torch Categorical distribution.

Needed to convert a vector of mushroom discrete actions into an input with the proper shape for the original distribution implemented in torch.

__init__(logits)[source]

__call__(*args: Any, **kwargs: Any) → Any: Call self as a function.

class DiagonalMultivariateGaussian(*args: Any, **kwargs: Any)[source]

Bases: Normal

Wrapper for the Torch Normal distribution, implementing a diagonal distribution.

It behaves as the MultivariateNormal distribution, but avoids the computation of the full covariance matrix, optimizing the computation time, particularly when a high dimensional vector is sampled.
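
Why the full covariance matrix is unnecessary can be checked numerically: for a diagonal covariance, the multivariate log-density equals the sum of independent univariate normal log-densities. The sketch below uses NumPy only, and the function names are hypothetical; it does not call the torch classes.

```python
import numpy as np


def diag_log_prob(x, loc, scale):
    # Sum of independent univariate normal log-densities.
    z = (x - loc) / scale
    return np.sum(-0.5 * z ** 2 - np.log(scale) - 0.5 * np.log(2 * np.pi))


def full_log_prob(x, loc, cov):
    # Multivariate normal log-density with an explicit covariance matrix.
    d = x - loc
    k = len(x)
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (d @ np.linalg.solve(cov, d) + logdet + k * np.log(2 * np.pi))


x = np.array([0.3, -1.2, 2.0])
loc = np.array([0.0, -1.0, 1.5])
scale = np.array([0.5, 2.0, 1.0])
a = diag_log_prob(x, loc, scale)
b = full_log_prob(x, loc, np.diag(scale ** 2))
```

The diagonal version needs only element-wise operations, while the full version requires a linear solve and a determinant.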

__init__(loc, scale)[source]

__call__(*args: Any, **kwargs: Any) → Any: Call self as a function.

Value Functions

compute_advantage_montecarlo(V, s, ss, r, absorbing, gamma)[source]

Function to estimate the advantage and new value function target over a dataset. The value function is estimated using rollouts (Monte Carlo estimation).

Parameters:

V (Regressor) – the current value function regressor;
s (numpy.ndarray) – the set of states in which we want to evaluate the advantage;
ss (numpy.ndarray) – the set of next states in which we want to evaluate the advantage;
r (numpy.ndarray) – the reward obtained in each transition from state s to state ss;
absorbing (numpy.ndarray) – an array of boolean flags indicating if the reached state is absorbing;
gamma (float) – the discount factor of the considered problem.

Returns:

The new estimate for the value function of the next state and the advantage function.
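
A Monte Carlo estimate of this kind can be sketched by accumulating discounted rewards backwards over the dataset. This is an illustration under stated assumptions, not necessarily what compute_advantage_montecarlo does internally: the function mc_advantage is hypothetical, and v is assumed to be a callable that maps an array of states to an array of values.

```python
import numpy as np


def mc_advantage(v, s, ss, r, absorbing, gamma):
    absorbing = np.asarray(absorbing, dtype=float)
    q = np.zeros(len(r))
    # Bootstrap from the value of the very last next state.
    q_next = float(v(ss[-1:])[0])
    for k in reversed(range(len(r))):
        # An absorbing next state contributes no future return.
        q_next = r[k] + gamma * q_next * (1.0 - absorbing[k])
        q[k] = q_next
    # Advantage: discounted return minus the current value estimate.
    return q, q - v(s)


v_zero = lambda x: np.zeros(len(x))
s_mc = np.arange(3.0)
ss_mc = np.arange(1.0, 4.0)
q_mc, adv_mc = mc_advantage(v_zero, s_mc, ss_mc, np.ones(3),
                            [False, False, True], gamma=0.5)
```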

compute_advantage(V, s, ss, r, absorbing, gamma)[source]

Function to estimate the advantage and new value function target over a dataset. The value function is estimated using bootstrapping.

Parameters:

V (Regressor) – the current value function regressor;
s (numpy.ndarray) – the set of states in which we want to evaluate the advantage;
ss (numpy.ndarray) – the set of next states in which we want to evaluate the advantage;
r (numpy.ndarray) – the reward obtained in each transition from state s to state ss;
absorbing (numpy.ndarray) – an array of boolean flags indicating if the reached state is absorbing;
gamma (float) – the discount factor of the considered problem.

Returns:

The new estimate for the value function of the next state and the advantage function.
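
Bootstrapping replaces the rollout with a one-step target, r + gamma * V(ss), with the bootstrap term dropped when the next state is absorbing. The sketch below is a hypothetical illustration (td_advantage is not a library function), again assuming that v is a callable from an array of states to an array of values.

```python
import numpy as np


def td_advantage(v, s, ss, r, absorbing, gamma):
    not_absorbing = 1.0 - np.asarray(absorbing, dtype=float)
    # One-step bootstrapped target.
    q = r + gamma * v(ss) * not_absorbing
    # Advantage: target minus the current value estimate.
    return q, q - v(s)


v_lin = lambda x: 2.0 * x
q_td, adv_td = td_advantage(v_lin, np.array([0.0, 1.0]), np.array([1.0, 2.0]),
                            np.array([1.0, 1.0]), [False, True], gamma=0.9)
```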

compute_gae(V, s, ss, r, absorbing, last, gamma, lam)[source]

Function to compute Generalized Advantage Estimation (GAE) and new value function target over a dataset.

“High-Dimensional Continuous Control Using Generalized Advantage Estimation”. Schulman J. et al., 2016.

Parameters:

V (Regressor) – the current value function regressor;
s (numpy.ndarray) – the set of states in which we want to evaluate the advantage;
ss (numpy.ndarray) – the set of next states in which we want to evaluate the advantage;
r (numpy.ndarray) – the reward obtained in each transition from state s to state ss;
absorbing (numpy.ndarray) – an array of boolean flags indicating if the reached state is absorbing;
last (numpy.ndarray) – an array of boolean flags indicating if the reached state is the last of the trajectory;
gamma (float) – the discount factor of the considered problem;
lam (float) – the value for the lambda coefficient used by the GAE algorithm.

Returns:

The new estimate for the value function of the next state and the estimated generalized advantage.
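
The GAE recursion from Schulman et al. can be sketched as an exponentially weighted sum of one-step TD residuals, reset at trajectory boundaries. The function gae below is a hypothetical illustration and may differ in details from compute_gae; v is assumed to be a callable from an array of states to an array of values.

```python
import numpy as np


def gae(v, s, ss, r, absorbing, last, gamma, lam):
    values, next_values = v(s), v(ss)
    not_absorbing = 1.0 - np.asarray(absorbing, dtype=float)
    # One-step TD residuals.
    delta = r + gamma * next_values * not_absorbing - values
    adv = np.zeros(len(r))
    running = 0.0
    for k in reversed(range(len(r))):
        if last[k]:
            # Do not leak advantage across trajectory boundaries.
            running = 0.0
        running = delta[k] + gamma * lam * running
        adv[k] = running
    # Value target and generalized advantage.
    return adv + values, adv


v0 = lambda x: np.zeros(len(x))
states = np.arange(4.0)
target, adv_gae = gae(v0, states, states + 1.0, np.ones(4),
                      [False] * 4, [False, True, False, True],
                      gamma=1.0, lam=0.5)
```

With lam set to 0 the recursion reduces to the one-step TD residual, while lam set to 1 sums all the discounted residuals up to the end of the trajectory.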

Variance parameters

class VarianceParameter(value, exponential=False, min_value=None, tol=1.0, size=(1,))[source]

Bases: Parameter

Abstract class to implement variance-dependent parameters. A target parameter is expected.

__init__(value, exponential=False, min_value=None, tol=1.0, size=(1,))[source]

Constructor.

Parameters:: tol (float) – value of the variance of the target variable such that the parameter value is 0.5.
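
The meaning of tol can be illustrated with mappings that return 0.5 exactly when the variance equals tol. The functions below are hypothetical examples consistent with that description; the formulas actually used by the subclasses are not stated in this section and may differ.

```python
import numpy as np


def increasing(variance, tol=1.0, exponential=False):
    # Hypothetical mapping: 0 at zero variance, 0.5 at variance == tol.
    if exponential:
        return 1.0 - np.exp(variance * np.log(0.5) / tol)
    return variance / (variance + tol)


def decreasing(variance, tol=1.0, exponential=False):
    # Complement of the increasing mapping: 1 at zero variance.
    return 1.0 - increasing(variance, tol, exponential)


half_rational = increasing(2.0, tol=2.0)
half_exponential = increasing(2.0, tol=2.0, exponential=True)
```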

_compute(*idx, **kwargs)[source]

Returns:: The value of the parameter in the provided index.

update(*idx, **kwargs)[source]

Updates the value of the parameter in the provided index.

Parameters:

*idx (list) – index of the parameter whose number of visits has to be updated;
target (float) – Value of the target variable;
factor (float) – Multiplicative factor for the parameter value, useful when the parameter depends on another parameter value.

__call__(*idx, **kwargs)

Update and return the parameter in the provided index.

Parameters:: *idx (list) – index of the parameter to return.
Returns:: The updated parameter in the provided index.

_add_save_attr(**attr_dict)

Add attributes that should be saved for an agent. For every attribute, it is necessary to specify the method to be used to save and load. Available methods are: numpy, mushroom, torch, json, pickle, primitive and none. The primitive method can be used to store primitive attributes, while the none method always skips the attribute, but ensures that it is initialized to None after the load. The mushroom method can be used with classes that implement the Serializable interface. All the other methods use the library of the same name. If a “!” character is added at the end of the method, the field will be saved only if full_save is set to True.

Parameters:: **attr_dict – dictionary of attributes mapped to the method that should be used to save and load them.

_post_load(): This method can be overwritten to implement logic that is executed after the loading of the agent.

copy()

Returns:: A deepcopy of the agent.

get_value(*idx, **kwargs)

Return the current value of the parameter in the provided index.

Parameters:: *idx (list) – index of the parameter to return.
Returns:: The current value of the parameter in the provided index.

property initial_value: Returns: The initial value of the parameters.

classmethod load(path)

Load and deserialize the agent from the given location on disk.

Parameters:: path (Path, string) – Relative or absolute path to the agent's save location.
Returns:: The loaded agent.

save(path, full_save=False)

Serialize and save the object to the given path on disk.

Parameters:

path (Path, str) – Relative or absolute path to the object save location;
full_save (bool) – Flag to specify the amount of data to save for MushroomRL data structures.

save_zip(zip_file, full_save, folder='')

Serialize and save the agent to the given path on disk.

Parameters:

zip_file (ZipFile) – ZipFile where the object needs to be saved;
full_save (bool) – flag to specify the amount of data to save for MushroomRL data structures;
folder (string, '') – subfolder to be used by the save method.

property shape: Returns: The shape of the table of parameters.

class VarianceIncreasingParameter(value, exponential=False, min_value=None, tol=1.0, size=(1,))[source]

Bases: VarianceParameter

Class implementing a parameter that increases with the target variance.

__call__(*idx, **kwargs)

Update and return the parameter in the provided index.

Parameters:: *idx (list) – index of the parameter to return.
Returns:: The updated parameter in the provided index.

__init__(value, exponential=False, min_value=None, tol=1.0, size=(1,))

Constructor.

Parameters:: tol (float) – value of the variance of the target variable such that the parameter value is 0.5.

_add_save_attr(**attr_dict)

Add attributes that should be saved for an agent. For every attribute, it is necessary to specify the method to be used to save and load. Available methods are: numpy, mushroom, torch, json, pickle, primitive and none. The primitive method can be used to store primitive attributes, while the none method always skips the attribute, but ensures that it is initialized to None after the load. The mushroom method can be used with classes that implement the Serializable interface. All the other methods use the library of the same name. If a “!” character is added at the end of the method, the field will be saved only if full_save is set to True.

Parameters:: **attr_dict – dictionary of attributes mapped to the method that should be used to save and load them.

_compute(*idx, **kwargs)

Returns:: The value of the parameter in the provided index.

_post_load(): This method can be overwritten to implement logic that is executed after the loading of the agent.

copy()

Returns:: A deepcopy of the agent.

get_value(*idx, **kwargs)

Return the current value of the parameter in the provided index.

Parameters:: *idx (list) – index of the parameter to return.
Returns:: The current value of the parameter in the provided index.

property initial_value: Returns: The initial value of the parameters.

classmethod load(path)

Load and deserialize the agent from the given location on disk.

Parameters:: path (Path, string) – Relative or absolute path to the agent's save location.
Returns:: The loaded agent.

save(path, full_save=False)

Serialize and save the object to the given path on disk.

Parameters:

path (Path, str) – Relative or absolute path to the object save location;
full_save (bool) – Flag to specify the amount of data to save for MushroomRL data structures.

save_zip(zip_file, full_save, folder='')

Serialize and save the agent to the given path on disk.

Parameters:

zip_file (ZipFile) – ZipFile where the object needs to be saved;
full_save (bool) – flag to specify the amount of data to save for MushroomRL data structures;
folder (string, '') – subfolder to be used by the save method.

property shape: Returns: The shape of the table of parameters.

update(*idx, **kwargs)

Updates the value of the parameter in the provided index.

Parameters:

*idx (list) – index of the parameter whose number of visits has to be updated;
target (float) – Value of the target variable;
factor (float) – Multiplicative factor for the parameter value, useful when the parameter depends on another parameter value.

class VarianceDecreasingParameter(value, exponential=False, min_value=None, tol=1.0, size=(1,))[source]

Bases: VarianceParameter

Class implementing a parameter that decreases with the target variance.

__call__(*idx, **kwargs)

Update and return the parameter in the provided index.

Parameters:: *idx (list) – index of the parameter to return.
Returns:: The updated parameter in the provided index.

__init__(value, exponential=False, min_value=None, tol=1.0, size=(1,))

Constructor.

Parameters:: tol (float) – value of the variance of the target variable such that the parameter value is 0.5.

_add_save_attr(**attr_dict)

Add attributes that should be saved for an agent. For every attribute, it is necessary to specify the method to be used to save and load. Available methods are: numpy, mushroom, torch, json, pickle, primitive and none. The primitive method can be used to store primitive attributes, while the none method always skips the attribute, but ensures that it is initialized to None after the load. The mushroom method can be used with classes that implement the Serializable interface. All the other methods use the library of the same name. If a “!” character is added at the end of the method, the field will be saved only if full_save is set to True.

Parameters:: **attr_dict – dictionary of attributes mapped to the method that should be used to save and load them.

_compute(*idx, **kwargs)

Returns:: The value of the parameter in the provided index.

_post_load(): This method can be overwritten to implement logic that is executed after the loading of the agent.

copy()

Returns:: A deepcopy of the agent.

get_value(*idx, **kwargs)

Return the current value of the parameter in the provided index.

Parameters:: *idx (list) – index of the parameter to return.
Returns:: The current value of the parameter in the provided index.

property initial_value: Returns: The initial value of the parameters.

classmethod load(path)

Load and deserialize the agent from the given location on disk.

Parameters:: path (Path, string) – Relative or absolute path to the agent's save location.
Returns:: The loaded agent.

save(path, full_save=False)

Serialize and save the object to the given path on disk.

Parameters:

path (Path, str) – Relative or absolute path to the object save location;
full_save (bool) – Flag to specify the amount of data to save for MushroomRL data structures.

save_zip(zip_file, full_save, folder='')

Serialize and save the agent to the given path on disk.

Parameters:

zip_file (ZipFile) – ZipFile where the object needs to be saved;
full_save (bool) – flag to specify the amount of data to save for MushroomRL data structures;
folder (string, '') – subfolder to be used by the save method.

property shape: Returns: The shape of the table of parameters.

update(*idx, **kwargs)

Updates the value of the parameter in the provided index.

Parameters:

*idx (list) – index of the parameter whose number of visits has to be updated;
target (float) – Value of the target variable;
factor (float) – Multiplicative factor for the parameter value, useful when the parameter depends on another parameter value.

class WindowedVarianceParameter(value, exponential=False, min_value=None, tol=1.0, window=100, size=(1,))[source]

Bases: Parameter

Abstract class to implement variance-dependent parameters. A target parameter is expected. Differently from the VarianceParameter class, the variance is computed over a window interval.

__init__(value, exponential=False, min_value=None, tol=1.0, window=100, size=(1,))[source]

Constructor.

Parameters:

tol (float) – value of the variance of the target variable such that the parameter value is 0.5;
window (int) – length of the window over which the variance is computed.

_compute(*idx, **kwargs)[source]

Returns:: The value of the parameter in the provided index.
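
Computing a variance over a fixed-length window can be sketched with a bounded deque. The class WindowedVariance is a hypothetical helper, not part of the library, and the choice of the population variance (rather than the sample variance) is an assumption made for illustration.

```python
from collections import deque

import numpy as np


class WindowedVariance:
    """Hypothetical helper: variance of the last `window` targets."""

    def __init__(self, window=100):
        # A bounded deque drops the oldest target automatically.
        self._targets = deque(maxlen=window)

    def update(self, target):
        self._targets.append(target)

    @property
    def variance(self):
        # With fewer than two samples the variance is taken to be 0.
        if len(self._targets) < 2:
            return 0.0
        return float(np.var(list(self._targets)))


wv = WindowedVariance(window=3)
for t in [10.0, 1.0, 2.0, 3.0]:
    wv.update(t)
```

Here the first target (10.0) has already left the window, so only the last three targets contribute to the variance.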

update(*idx, **kwargs)[source]

Updates the value of the parameter in the provided index.

Parameters:

*idx (list) – index of the parameter whose number of visits has to be updated;
target (float) – Value of the target variable;
factor (float) – Multiplicative factor for the parameter value, useful when the parameter depends on another parameter value.

__call__(*idx, **kwargs)

Update and return the parameter in the provided index.

Parameters:: *idx (list) – index of the parameter to return.
Returns:: The updated parameter in the provided index.

_add_save_attr(**attr_dict)

Add attributes that should be saved for an agent. For every attribute, it is necessary to specify the method to be used to save and load. Available methods are: numpy, mushroom, torch, json, pickle, primitive and none. The primitive method can be used to store primitive attributes, while the none method always skips the attribute, but ensures that it is initialized to None after the load. The mushroom method can be used with classes that implement the Serializable interface. All the other methods use the library of the same name. If a “!” character is added at the end of the method, the field will be saved only if full_save is set to True.

Parameters:: **attr_dict – dictionary of attributes mapped to the method that should be used to save and load them.

_post_load(): This method can be overwritten to implement logic that is executed after the loading of the agent.

copy()

Returns:: A deepcopy of the agent.

get_value(*idx, **kwargs)

Return the current value of the parameter in the provided index.

Parameters:: *idx (list) – index of the parameter to return.
Returns:: The current value of the parameter in the provided index.

property initial_value: Returns: The initial value of the parameters.

classmethod load(path)

Load and deserialize the agent from the given location on disk.

Parameters:: path (Path, string) – Relative or absolute path to the agent's save location.
Returns:: The loaded agent.

save(path, full_save=False)

Serialize and save the object to the given path on disk.

Parameters:

path (Path, str) – Relative or absolute path to the object save location;
full_save (bool) – Flag to specify the amount of data to save for MushroomRL data structures.

save_zip(zip_file, full_save, folder='')

Serialize and save the agent to the given path on disk.

Parameters:

zip_file (ZipFile) – ZipFile where the object needs to be saved;
full_save (bool) – flag to specify the amount of data to save for MushroomRL data structures;
folder (string, '') – subfolder to be used by the save method.

property shape: Returns: The shape of the table of parameters.

class WindowedVarianceIncreasingParameter(value, exponential=False, min_value=None, tol=1.0, window=100, size=(1,))[source]

Bases: WindowedVarianceParameter

Class implementing a parameter that increases with the target variance, where the variance is computed in a fixed length window.

__call__(*idx, **kwargs)

Update and return the parameter in the provided index.

Parameters:: *idx (list) – index of the parameter to return.
Returns:: The updated parameter in the provided index.

__init__(value, exponential=False, min_value=None, tol=1.0, window=100, size=(1,))

Constructor.

Parameters:

tol (float) – value of the variance of the target variable such that the parameter value is 0.5;
window (int) – length of the window over which the variance is computed.

_add_save_attr(**attr_dict)

Add attributes that should be saved for an agent. For every attribute, it is necessary to specify the method to be used to save and load. Available methods are: numpy, mushroom, torch, json, pickle, primitive and none. The primitive method can be used to store primitive attributes, while the none method always skips the attribute, but ensures that it is initialized to None after the load. The mushroom method can be used with classes that implement the Serializable interface. All the other methods use the library of the same name. If a “!” character is added at the end of the method, the field will be saved only if full_save is set to True.

Parameters:: **attr_dict – dictionary of attributes mapped to the method that should be used to save and load them.

_compute(*idx, **kwargs)

Returns:: The value of the parameter in the provided index.

_post_load(): This method can be overwritten to implement logic that is executed after the loading of the agent.

copy()

Returns:: A deepcopy of the agent.

get_value(*idx, **kwargs)

Return the current value of the parameter in the provided index.

Parameters:: *idx (list) – index of the parameter to return.
Returns:: The current value of the parameter in the provided index.

property initial_value: Returns: The initial value of the parameters.

classmethod load(path)

Load and deserialize the agent from the given location on disk.

Parameters:: path (Path, string) – Relative or absolute path to the agent's save location.
Returns:: The loaded agent.

save(path, full_save=False)

Serialize and save the object to the given path on disk.

Parameters:

path (Path, str) – Relative or absolute path to the object save location;
full_save (bool) – Flag to specify the amount of data to save for MushroomRL data structures.

save_zip(zip_file, full_save, folder='')

Serialize and save the agent to the given path on disk.

Parameters:

zip_file (ZipFile) – ZipFile where the object needs to be saved;
full_save (bool) – flag to specify the amount of data to save for MushroomRL data structures;
folder (string, '') – subfolder to be used by the save method.

property shape: Returns: The shape of the table of parameters.

update(*idx, **kwargs)

Updates the value of the parameter in the provided index.

Parameters:

*idx (list) – index of the parameter whose number of visits has to be updated;
target (float) – Value of the target variable;
factor (float) – Multiplicative factor for the parameter value, useful when the parameter depends on another parameter value.

Viewer

class ImageViewer(size, dt)[source]

Bases: object

Interface to pygame for visualizing plain images.

__init__(size, dt)[source]

Constructor.

Parameters:

size ([list, tuple]) – size of the displayed image;
dt (float) – duration of a control step.

display(img)[source]

Display given frame.

Parameters:: img – image to display.

property size

Property.

Returns:: The size of the screen.

close()[source]: Close the viewer, destroy the window.

class Viewer(env_width, env_height, width=500, height=500, background=(0, 0, 0))[source]

Bases: object

Interface to pygame for visualizing mushroom native environments.

__init__(env_width, env_height, width=500, height=500, background=(0, 0, 0))[source]

Constructor.

Parameters:

env_width (float) – The x dimension limit of the desired environment;
env_height (float) – The y dimension limit of the desired environment;
width (int, 500) – width of the environment window;
height (int, 500) – height of the environment window;
background (tuple, (0, 0, 0)) – background color of the screen.
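
The constructor takes both environment limits and window dimensions, which implies a mapping between the two coordinate systems. The sketch below shows one plausible mapping; env_to_window is a hypothetical helper, and the vertical flip is an assumption (window coordinates usually grow downward) rather than a statement about how Viewer is implemented.

```python
def env_to_window(point, env_size, window_size):
    """Hypothetical helper: map an environment point to integer pixel coordinates."""
    x, y = point
    env_width, env_height = env_size
    width, height = window_size
    # Independent scale factor per axis (pixels per environment unit).
    px = x * width / env_width
    # Assumption: environment y grows upward, while window y grows downward.
    py = height - y * height / env_height
    return int(round(px)), int(round(py))


center = env_to_window((2.5, 2.5), env_size=(5.0, 5.0), window_size=(500, 500))
corner = env_to_window((0.0, 0.0), env_size=(5.0, 5.0), window_size=(500, 500))
```

Under these assumptions, the middle of a 5 x 5 environment maps to the middle of a 500 x 500 window, and the environment origin maps to the bottom-left pixel corner.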

property screen

Property.

Returns:: The screen created by this viewer.

property size

Property.

Returns:: The size of the screen.

line(start, end, color=(255, 255, 255), width=1)[source]

Draw a line on the screen.

Parameters:

start (np.ndarray) – starting point of the line;
end (np.ndarray) – end point of the line;
color (tuple, (255, 255, 255)) – color of the line;
width (int, 1) – width of the line.

square(center, angle, edge, color=(255, 255, 255), width=0)[source]

Draw a square on the screen and apply a roto-translation to it.

Parameters:

center (np.ndarray) – the center of the polygon;
angle (float) – the rotation to apply to the polygon;
edge (float) – length of an edge;
color (tuple, (255, 255, 255)) – the color of the polygon;
width (int, 0) – the width of the polygon line, 0 to fill the polygon.

polygon(center, angle, points, color=(255, 255, 255), width=0)[source]

Draw a polygon on the screen and apply a roto-translation to it.

Parameters:

center (np.ndarray) – the center of the polygon;
angle (float) – the rotation to apply to the polygon;
points (list) – the points of the polygon w.r.t. the center;
color (tuple, (255, 255, 255)) – the color of the polygon;
width (int, 0) – the width of the polygon line, 0 to fill the polygon.
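
A roto-translation rotates each point by the given angle and then shifts it by the center. The sketch below spells out that arithmetic in plain Python; roto_translate is a hypothetical helper, and it assumes the angle is in radians with counterclockwise rotation in a standard mathematical frame, which may not match the on-screen orientation.

```python
import math


def roto_translate(points, center, angle):
    """Rotate each point by angle (radians) about the origin, then shift by center."""
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    cx, cy = center
    return [
        (cx + cos_a * x - sin_a * y, cy + sin_a * x + cos_a * y)
        for x, y in points
    ]


# A single vertex given relative to the polygon center, as in the points argument.
moved = roto_translate([(1.0, 0.0)], center=(2.0, 3.0), angle=math.pi / 2)
```

Here the vertex (1, 0) is rotated a quarter turn to (0, 1) and then translated to approximately (2, 4).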

circle(center, radius, color=(255, 255, 255), width=0)[source]

Draw a circle on the screen.

Parameters:

center (np.ndarray) – the center of the circle;
radius (float) – the radius of the circle;
color (tuple, (255, 255, 255)) – the color of the circle;
width (int, 0) – the width of the circle line, 0 to fill the circle.

arrow_head(center, scale, angle, color=(255, 255, 255))[source]

Draw an arrow head.

Parameters:

center (np.ndarray) – the position of the arrow head;
scale (float) – scale of the arrow, corresponds to the length;
angle (float) – the angle of rotation of the arrow head;
color (tuple, (255, 255, 255)) – the color of the arrow.

force_arrow(center, direction, force, max_force, max_length, color=(255, 255, 255), width=1)[source]

Draw a force arrow, i.e. an arrow representing a force. The length of the arrow is directly proportional to the force value.

Parameters:

center (np.ndarray) – the point where the force is applied;
direction (np.ndarray) – the direction of the force;
force (float) – the applied force value;
max_force (float) – the maximum force value;
max_length (float) – the length to use for the maximum force;
color (tuple, (255, 255, 255)) – the color of the arrow;
width (int, 1) – the width of the force arrow.
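
The proportionality between force and arrow length can be sketched as follows. arrow_length is a hypothetical helper, and clipping forces larger than max_force is an assumption made for illustration; the library may handle that case differently.

```python
def arrow_length(force, max_force, max_length):
    """Hypothetical helper: length proportional to |force|, capped at max_length."""
    if max_force <= 0:
        raise ValueError("max_force must be positive")
    # Assumption: forces above max_force are clipped rather than extrapolated.
    ratio = min(abs(force) / max_force, 1.0)
    return ratio * max_length


half = arrow_length(5.0, max_force=10.0, max_length=2.0)
capped = arrow_length(25.0, max_force=10.0, max_length=2.0)
```

A force at half of max_force yields an arrow of half of max_length; the same linear rule applies to the radius of a torque arrow with max_torque and max_radius.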

torque_arrow(center, torque, max_torque, max_radius, color=(255, 255, 255), width=1)[source]

Draw a torque arrow, i.e. a circular arrow representing a torque. The radius of the arrow is directly proportional to the torque value.

Parameters:

center (np.ndarray) – the point where the torque is applied;
torque (float) – the applied torque value;
max_torque (float) – the maximum torque value;
max_radius (float) – the radius to use for the maximum torque;
color (tuple, (255, 255, 255)) – the color of the arrow;
width (int, 1) – the width of the torque arrow.

background_image(img)[source]

Use the given image as background for the window, rescaling it appropriately.

Parameters:: img – the image to be used.

function(x_s, x_e, f, n_points=100, width=1, color=(255, 255, 255))[source]

Draw the graph of a function in the image.

Parameters:

x_s (float) – starting x coordinate;
x_e (float) – final x coordinate;
f (function) – the function that maps x coordinates into y coordinates;
n_points (int, 100) – the number of segments used to approximate the function to draw;
width (int, 1) – the width of the line drawn;
color (tuple, (255, 255, 255)) – the color of the line.
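
Drawing a function as a polyline amounts to sampling it at evenly spaced x coordinates and joining consecutive samples. The sketch below illustrates this; sample_function is a hypothetical helper, and treating n_points as the number of segments (hence n_points + 1 vertices) is an assumption based on the parameter description above.

```python
import math


def sample_function(x_s, x_e, f, n_points=100):
    """Hypothetical helper: vertices of a polyline approximating f on [x_s, x_e]."""
    # Assumption: n_points segments need n_points + 1 evenly spaced vertices.
    step = (x_e - x_s) / n_points
    xs = [x_s + i * step for i in range(n_points + 1)]
    return [(x, f(x)) for x in xs]


vertices = sample_function(0.0, math.pi, math.sin, n_points=4)
# Each consecutive pair of vertices is one straight segment of the curve.
segments = list(zip(vertices[:-1], vertices[1:]))
```

Larger values of n_points give a smoother curve at the cost of more line draws per frame.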

static get_frame()[source]

Getter.

Returns:: The current Pygame surface as an RGB array.

display(s)[source]

Display current frame and initialize the next frame to the background color.

Parameters:: s – time to wait in visualization.

close()[source]: Close the viewer, destroy the window.

class CV2Viewer(window_name, dt, width, height)[source]

Bases: object

Simple viewer to display rendered images using cv2.

__init__(window_name, dt, width, height)[source]

display(img)[source]

Displays an image.

Parameters:: img (np.ndarray) – Image to display.

_wait()[source]: Wait for the specified amount of time. Time is supposed to be in milliseconds.

_window_was_closed()[source]

Check if a window was closed.

Returns:: True if the window was closed.