Approximators

MushroomRL provides a hierarchy of approximator classes for both tabular and function-approximation settings.

The base class Approximator dispatches to an Ensemble of models when n_models > 1 is passed to the constructor. This means any approximator subclass (Table, LinearApproximator, …) can be turned into an ensemble simply by passing n_models:

# single model
q = Table(shape=(10, 4))

# ensemble of 5 tables — returns an Ensemble instance transparently
q = Table(n_models=5, shape=(10, 4))
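The dispatch described above could be implemented in several ways. The following is a minimal sketch using a metaclass, not MushroomRL's actual code: DispatchMeta, SketchApproximator, SketchEnsemble and SketchTable are hypothetical names introduced only for illustration.

```python
class DispatchMeta(type):
    """Hypothetical metaclass that inspects n_models before construction."""

    def __call__(cls, *args, n_models=1, **kwargs):
        if n_models > 1 and not issubclass(cls, SketchEnsemble):
            # Wrap n_models independent copies of cls in an ensemble.
            return SketchEnsemble(cls, n_models, *args, **kwargs)
        # Regular construction: n_models has already been consumed here.
        return super().__call__(*args, **kwargs)


class SketchApproximator(metaclass=DispatchMeta):
    pass


class SketchEnsemble(SketchApproximator):
    def __init__(self, model, n_models, *args, **params):
        self.models = [model(*args, **params) for _ in range(n_models)]


class SketchTable(SketchApproximator):
    def __init__(self, shape, initial_value=0.0):
        rows, cols = shape
        self.table = [[initial_value] * cols for _ in range(rows)]


single = SketchTable(shape=(10, 4))                # a plain SketchTable
ensemble = SketchTable(n_models=5, shape=(10, 4))  # a SketchEnsemble of 5 tables
```

The caller never names the ensemble class: the same constructor call returns either a single model or a wrapper around several of them.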

Approximator

class Approximator(*args, n_models=1, **kwargs)[source]

Bases: MushroomObject

Base class for all approximators. Handles logger attachment and dispatches to an Ensemble when n_models > 1 is requested at construction time.

__init__(input_shape, output_shape, backend='numpy')[source]

Constructor.

Parameters:

input_shape (tuple, list) – shape of the input. A plain tuple for a single input, or a list of shape tuples (one per input) for a model that takes several distinct inputs (e.g. a critic taking state and action separately);
output_shape (tuple, list) – shape of the output. A plain tuple for a single output, or a list of shape tuples (one per output) for a model that produces several outputs;
backend (str, 'numpy') – array backend to use.

predict(*args, **kwargs)[source]

Predict the output of the model given an input.

Parameters:

*args – list of inputs;
**kwargs – other parameters used by the predict method of the regressor.

Returns:

The model prediction.

__call__(*z, **kw)[source]: Call self as a function.

property input_shape: Returns: The shape of the input of the approximator.

property output_shape: Returns: The shape of the output of the approximator.

class Ensemble(*args, n_models=1, **kwargs)[source]

Bases: Approximator

This class is used to create an ensemble of approximators.

__init__(model, n_models, prediction='mean', backend='numpy', **params)[source]

Constructor.

Parameters:

model (class) – the model class to use for each element of the ensemble;
n_models (int) – number of models in the ensemble;
prediction (str, 'mean') – the type of prediction to make across models. One of 'mean', 'sum', 'min', 'max';
backend (str, 'numpy') – array backend to use;
**params – parameters dictionary to create each model.

predict(*z, idx=None, prediction=None, compute_variance=False, **predict_params)[source]

Predict.

Parameters:

*z – list of inputs to use to predict with each model;
idx (int, None) – index of the model to use for prediction. If None, all models are used and aggregated according to prediction;
prediction (str, None) – aggregation mode, overrides the constructor default. One of 'mean', 'sum', 'min', 'max', or None to return all predictions stacked along axis 0;
compute_variance (bool, False) – if True, also return the variance across models;
**predict_params – other parameters passed to each model’s predict method.

Returns:

The stacked predictions along axis 0 if prediction is None, the aggregated predictions otherwise, or a list [predictions, variance] if compute_variance is True.
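The aggregation rules listed for predict can be sketched with NumPy as follows. The aggregate helper is hypothetical, and the use of the population variance (ddof=0) is an assumption, since the description does not say which estimator is used.

```python
import numpy as np


def aggregate(predictions, prediction='mean', compute_variance=False):
    """Combine per-model predictions given as a list of equally shaped arrays."""
    stacked = np.stack(predictions, axis=0)  # shape: (n_models, ...)
    reducers = {'mean': np.mean, 'sum': np.sum, 'min': np.min, 'max': np.max}
    if prediction is None:
        result = stacked  # no aggregation: return every model's prediction
    elif prediction in reducers:
        result = reducers[prediction](stacked, axis=0)
    else:
        raise ValueError(f'unknown prediction mode: {prediction!r}')
    if compute_variance:
        # Assumption: population variance across models (ddof=0).
        return [result, np.var(stacked, axis=0)]
    return result


per_model = [np.array([1.0, 2.0]), np.array([3.0, 6.0])]
mean_q = aggregate(per_model)
min_q = aggregate(per_model, prediction='min')
all_q = aggregate(per_model, prediction=None)
mean_again, var_q = aggregate(per_model, compute_variance=True)
```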

set_logger(logger, prefix=None, label=None)[source]

Attach the logger to each model of the ensemble so that every model logs its own loss during its fit. The model index is appended to the metric name (e.g. critic/loss_0).

Parameters:

logger (Logger) – the logger object;
prefix (str, None) – optional group prepended to the logged metric names;
label (str, None) – optional name used for the loss. Defaults to 'loss'.
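As an illustration of the naming scheme, the sketch below builds the metric name for a given model index. metric_name is a hypothetical helper; only the critic/loss_0 pattern and the 'loss' default come from the description above.

```python
def metric_name(idx, prefix=None, label=None):
    """Compose '<prefix>/<label>_<idx>', omitting the prefix when it is None."""
    label = 'loss' if label is None else label
    name = f'{label}_{idx}'
    return name if prefix is None else f'{prefix}/{name}'


first = metric_name(0, prefix='critic')
custom = metric_name(2, label='td_error')
```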

property n_actions: Returns: The number of actions of the first model in the ensemble.

reset()[source]: Reset the parameters of all models in the ensemble.

Q-Approximator

QApproximator is the high-level interface for Q-function approximation, used by classical (non-deep) RL algorithms that rely on function approximators. This design makes it possible to integrate many types of approximators seamlessly, including sklearn-style regressors. Its constructor dispatches to one of three concrete implementations depending on the arguments:

QApproximatorSimple — single multi-output model (output_shape[0] == n_actions);
QApproximatorAction — one independent model per action;
QApproximatorEnsemble — ensemble of Q-approximators (n_models > 1).

class QApproximator(approximator=None, n_actions=None, output_shape=(1,), n_models=1, **kwargs)[source]

Bases: Approximator

Interface for Q-function approximators. Selects the appropriate concrete subclass based on the construction arguments:

n_models > 1: QApproximatorEnsemble — ensemble of independent Q-approximators;
output_shape[0] != n_actions: QApproximatorAction — one model per action;
output_shape[0] == n_actions: QApproximatorSimple — single multi-output model.
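The selection rules above amount to the following decision procedure. This is only a sketch: select_q_approximator is a hypothetical helper that returns class names as strings, and it assumes the n_models check takes precedence, as the ordering of the list suggests.

```python
def select_q_approximator(n_actions, output_shape=(1,), n_models=1):
    """Return the name of the concrete class chosen for these arguments."""
    if n_models > 1:
        return 'QApproximatorEnsemble'
    if output_shape[0] == n_actions:
        return 'QApproximatorSimple'
    return 'QApproximatorAction'


multi_output = select_q_approximator(n_actions=4, output_shape=(4,))
per_action = select_q_approximator(n_actions=4)  # default output_shape=(1,)
ensemble = select_q_approximator(n_actions=4, output_shape=(4,), n_models=3)
```

With the default output_shape of (1,) and more than one action, the per-action variant is selected.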

class QApproximatorSimple(approximator=None, n_actions=None, output_shape=(1,), n_models=1, **kwargs)[source]

Bases: QApproximator

Approximates the Q-function with a single multi-output model where each output corresponds to the Q-value of one action. Used when output_shape[0] == n_actions.

__init__(approximator, n_actions, output_shape=(1,), n_models=1, input_shape=None, **params)[source]

Constructor.

Parameters:

approximator (class) – the model class to approximate the Q-function;
n_actions (int) – number of actions;
output_shape (tuple, (1,)) – shape of the output of the model;
input_shape (tuple, None) – shape of the input of the model;
**params – other parameters passed to the approximator.

predict(*z, **predict_params)[source]

Predict.

Parameters:

*z – either (state,) to get all Q-values, or (state, action) to get the Q-value of the selected action;
**predict_params – other parameters passed to the model’s predict method.

Returns:

The Q-value predictions.

property model: Returns: The underlying model.

property weights_size: Returns: The size of the array of weights.

get_weights()[source]

Returns: The set of weights of the model.

set_weights(w)[source]

Setter.

Parameters: w – the set of weights to set.

diff(state, action=None)[source]

Compute the derivative of the output w.r.t. the model parameters.

Parameters:

state – the state input;
action (int, None) – if provided, return the derivative for that action only.

Returns:

The gradient of the Q-value w.r.t. the model parameters.

reset()[source]: Reset the model parameters.

class QApproximatorAction(approximator=None, n_actions=None, output_shape=(1,), n_models=1, **kwargs)[source]

Bases: QApproximator

Approximates the Q-function with one independent model per action. Used when output_shape[0] != n_actions, typically in MDPs with discrete actions where a separate approximator is trained for each action.

__init__(approximator, n_actions, output_shape=(1,), n_models=1, input_shape=None, **params)[source]

Constructor.

Parameters:

approximator (class) – the model class to approximate the Q-function of each action;
n_actions (int) – number of actions, determines the number of models created;
output_shape (tuple, (1,)) – shape of the output of each model;
input_shape (tuple, None) – shape of the input of each model;
**params – other parameters passed to each model.

predict(*z, **predict_params)[source]

Predict.

Parameters:

*z – either (state,) to get all Q-values, or (state, action) to get the Q-value of the selected action;
**predict_params – other parameters passed to each model’s predict method.

Returns:

The Q-value predictions.

property model: Returns: The list of per-action models.

property weights_size: Returns: The total size of the weights across all action models.

get_weights()[source]

Returns: The concatenated weights of all action models.

set_weights(w)[source]

Setter. Splits w evenly across action models.

Parameters: w – the set of weights to set.
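A minimal sketch of the even split performed by set_weights and of the concatenation returned by get_weights, using plain Python lists. split_weights and concat_weights are hypothetical helpers, and the sketch assumes every action model has the same number of weights.

```python
def split_weights(w, n_actions):
    """Split a flat weight vector into n_actions equally sized blocks."""
    if len(w) % n_actions != 0:
        raise ValueError('weights cannot be split evenly across actions')
    size = len(w) // n_actions
    return [w[i * size:(i + 1) * size] for i in range(n_actions)]


def concat_weights(per_action):
    """Inverse operation: concatenate the per-action blocks in action order."""
    return [value for block in per_action for value in block]


flat = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
blocks = split_weights(flat, n_actions=3)
round_trip = concat_weights(blocks)
```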

diff(state, action=None)[source]

Compute the derivative of the output w.r.t. the model parameters.

Parameters:

state – the state input;
action (int, None) – if provided, return a zero-padded gradient vector with the derivative of the selected action’s model in the corresponding block.

Returns:

A list of per-action gradients when action is None, or a single zero-padded gradient vector otherwise.
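The zero-padded layout can be pictured as follows. This is a sketch with plain lists: zero_padded_gradient is a hypothetical helper, and it assumes the blocks are laid out in action order, matching the concatenated weight vector.

```python
def zero_padded_gradient(per_action_grads, action):
    """Keep the gradient block of the selected action and zero out the others."""
    padded = []
    for a, grad in enumerate(per_action_grads):
        padded.extend(grad if a == action else [0.0] * len(grad))
    return padded


grads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # one gradient per action model
g = zero_padded_gradient(grads, action=1)
```

Only the selected action's model contributes to its own Q-value, so the derivative with respect to every other model's weights is zero.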

reset()[source]: Reset the parameters of all action models.

class QApproximatorEnsemble(approximator=None, n_actions=None, output_shape=(1,), n_models=1, **kwargs)[source]

Bases: QApproximator, Ensemble

Ensemble of QApproximator models. Each model is an independent QApproximatorSimple or QApproximatorAction depending on the output shape.

__init__(approximator, n_actions, output_shape=(1,), n_models=1, prediction='mean', **params)[source]

Constructor.

Parameters:

approximator (class) – the model class for each ensemble member;
n_actions (int) – number of actions;
output_shape (tuple, (1,)) – shape of the output of each model;
n_models (int) – number of models in the ensemble;
prediction (str, 'mean') – aggregation mode across models. One of 'mean', 'sum', 'min', 'max';
**params – other parameters passed to each model.

property weights_size: Returns: The shape of the stacked weights matrix (n_models, weights_size_per_model).

get_weights()[source]

Returns: The stacked weights of all models, shape (n_models, weights_size_per_model).

set_weights(w)[source]

Set weights for each model in the ensemble independently.

Parameters: w – stacked weights of shape (n_models, weights_size_per_model).

diff(state, action=None)[source]

Compute the derivative of the output w.r.t. the model parameters for each model, stacked.

Parameters:

state – the state input;
action (int, None) – if provided, return gradients for that action only.

Returns:

The stacked derivatives of all models w.r.t. the model parameters.

Tabular

The simplest approximator in RL is the table. The Table approximator can be used in settings where both states and actions are discrete or can be easily discretized.

class Table(*args, n_models=1, **kwargs)[source]

Bases: Approximator

Table approximator. Used for discrete state and action spaces.

__init__(shape, initial_value=0.0, dtype=None)[source]

Constructor.

Parameters:

shape (tuple) – the shape of the tabular regressor;
initial_value (float, 0.0) – the initial value for each entry of the tabular regressor;
dtype ([int, float], None) – the dtype of the table array.

predict(*z)[source]

Predict the output of the table given an input.

Parameters:

*z (list) – list of inputs of the model. If the table is a Q-table, this list may contain states, or states and actions, depending on whether the call needs to predict all Q-values or only the Q-value corresponding to the provided action.

Returns:

The table prediction.
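The two ways of querying a Q-table can be sketched with a NumPy array. SketchTable is a hypothetical stand-in, and it assumes scalar integer indices rather than the batched inputs a real implementation is likely to accept.

```python
import numpy as np


class SketchTable:
    def __init__(self, shape, initial_value=0.0, dtype=None):
        self.table = np.full(shape, initial_value, dtype=dtype)

    def predict(self, *z):
        # Assumption: each element of z indexes one axis of the table.
        # Passing fewer indices than axes returns the remaining slice.
        return self.table[tuple(z)]


q = SketchTable(shape=(10, 4))
q.table[3, 2] = 1.5
all_q = q.predict(3)     # Q-values of every action in state 3
one_q = q.predict(3, 2)  # Q-value of action 2 in state 3
```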

property n_actions: Returns: The number of actions considered by the table.

property shape: Returns: The shape of the table.

Parametric

Parametric approximators are the most common ones: they learn a function by learning its parameter vector. These approximators are often also differentiable. MushroomRL implements many common parametric approximators and allows the user to set their parameters and compute their gradients.

Linear

class LinearApproximator(*args, n_models=1, **kwargs)[source]

Bases: Approximator

This class implements a linear approximator.

__init__(weights=None, input_shape=None, output_shape=(1,), phi=None, **kwargs)[source]

Constructor.

Parameters:

weights (np.ndarray) – array of weights to initialize the weights of the approximator;
input_shape (np.ndarray, None) – the shape of the input of the model;
output_shape (np.ndarray, (1,)) – the shape of the output of the model;
phi (object, None) – features to extract from the state;
**kwargs – other params of the approximator.

predict(x, **predict_params)[source]

Predict the output of the model given an input.

Parameters:

x (np.ndarray) – input;
**predict_params – other parameters used by the predict method of the regressor.

Returns:

The model prediction.
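A linear approximator computes its output as a weighted sum of features. The sketch below assumes the weights are stored as a matrix of shape (n_outputs, n_features) and that phi is a plain callable. linear_predict and poly are hypothetical helpers, not part of the library.

```python
import numpy as np


def linear_predict(weights, x, phi=None):
    """Return weights @ phi(x); the raw input is used when phi is None."""
    features = x if phi is None else phi(x)
    return features @ weights.T  # weights: (n_outputs, n_features)


def poly(x):
    """Hypothetical feature map: [1, x, x^2] for a one-dimensional input."""
    return np.array([1.0, x[0], x[0] ** 2])


weights = np.array([[1.0, 0.0, 2.0],
                    [0.0, 1.0, -1.0]])
y = linear_predict(weights, np.array([3.0]), phi=poly)
```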

property weights_size: Returns: The size of the array of weights.

get_weights()[source]

Getter.

Returns: The set of weights of the approximator.

set_weights(w)[source]

Setter.

Parameters: w (np.ndarray) – the set of weights to set.

diff(state, action=None)[source]

Compute the derivative of the output w.r.t. state, and action if provided.

Parameters:

state (np.ndarray) – the state;
action (np.ndarray, None) – the action.

Returns:

The derivative of the output w.r.t. state, and action if provided.

CMAC

class CMAC(*args, n_models=1, **kwargs)[source]

Bases: LinearApproximator

This class implements a Cerebellar Model Arithmetic Computer.

__init__(tilings, weights=None, output_shape=(1,), **kwargs)[source]

Constructor.

Parameters:

tilings (list) – list of tilings to discretize the input space;
weights (np.ndarray) – array of weights to initialize the weights of the approximator;
input_shape (np.ndarray, None) – the shape of the input of the model;
output_shape (np.ndarray, (1,)) – the shape of the output of the model;
**kwargs – other params of the approximator.

predict(x, **predict_params)[source]

Predict.

Parameters:

x (np.ndarray) – input;
**predict_params – other parameters used by the predict method of the regressor.

Returns:

The predictions of the model.
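In tile coding, each tiling activates exactly one tile for a given input, and the prediction is the sum of the weights of the active tiles. The sketch below illustrates this for a scalar input with uniformly shifted tilings. active_tiles and cmac_predict are hypothetical helpers, and the layout of tilings and weights is an assumption that does not reflect the tilings objects expected by the constructor.

```python
def active_tiles(x, n_tilings, n_tiles, low, high):
    """Return one flat weight index per tiling for a scalar input x."""
    width = (high - low) / n_tiles
    indices = []
    for t in range(n_tilings):
        offset = t * width / n_tilings  # shift each tiling by a fraction of a tile
        tile = int((x - low + offset) / width)
        tile = min(max(tile, 0), n_tiles)  # n_tiles + 1 tiles cover the shifted range
        indices.append(t * (n_tiles + 1) + tile)
    return indices


def cmac_predict(weights, x, n_tilings, n_tiles, low, high):
    """The prediction is the sum of the weights of the active tiles."""
    return sum(weights[i] for i in active_tiles(x, n_tilings, n_tiles, low, high))


weights = [float(i) for i in range(2 * (4 + 1))]  # one weight per tile
near = active_tiles(0.2, n_tilings=2, n_tiles=4, low=0.0, high=1.0)
far = active_tiles(0.3, n_tilings=2, n_tiles=4, low=0.0, high=1.0)
y = cmac_predict(weights, 0.3, n_tilings=2, n_tiles=4, low=0.0, high=1.0)
```

The inputs 0.2 and 0.3 fall in different tiles of the first tiling but share a tile in the second one, which is how nearby inputs end up sharing part of their prediction.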

diff(state, action=None)[source]

Compute the derivative of the output w.r.t. state, and action if provided.

Parameters:

state (np.ndarray) – the state;
action (np.ndarray, None) – the action.

Returns:

The derivative of the output w.r.t. state, and action if provided.

Torch Approximator

class TorchApproximator(input_shape=None, output_shape=None, network=None, optimizer=None, loss=None, batch_size=0, n_fit_targets=1, reinitialize=False, dropout=False, quiet=True, n_models=None, **params)[source]

Bases: Approximator

Class that interfaces a PyTorch model with the MushroomRL Regressor interface. It implements everything needed to use a generic PyTorch model and train it with a specified optimizer and objective function. Minibatches are also supported. When n_models > 1, construction dispatches to TorchEnsemble.

__init__(input_shape, output_shape, network, optimizer=None, loss=None, batch_size=0, n_fit_targets=1, reinitialize=False, dropout=False, quiet=True, n_models=None, **params)[source]

Constructor.

Parameters:

input_shape (tuple, list) – shape of the input of the network. A plain tuple for a single-input network, or a list of shape tuples (one per positional input) for a network that takes several distinct inputs (e.g. a critic taking state and action separately);
output_shape (tuple, list) – shape of the output of the network. A plain tuple for a single-output network, or a list of shape tuples (one per output tensor) for a network whose forward returns several tensors; the number of outputs to parse is derived from this, not passed separately;
network (torch.nn.Module) – the network class to use;
optimizer (dict) – the optimizer used for every fit step;
loss (torch.nn.functional) – the loss function to optimize in the fit method;
batch_size (int, 0) – the size of each minibatch. If 0, the whole dataset is fed to the optimizer at each epoch;
n_fit_targets (int, 1) – the number of fit targets used by the fit method of the network;
reinitialize (bool, False) – if True, the approximator is reinitialized at every fit call. To perform the initialization, the weights_init method must be defined properly for the selected model network;
dropout (bool, False) – if True, dropout is applied only during training;
quiet (bool, True) – if False, shows two progress bars, one for epochs and one for the minibatches;
**params – dictionary of parameters needed to construct the network.
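The batch_size convention can be illustrated with a small helper that computes the slices visited in one epoch. minibatch_slices is hypothetical; the sketch assumes no shuffling and a smaller final batch when the dataset size is not a multiple of batch_size, neither of which is stated above.

```python
def minibatch_slices(n_samples, batch_size=0):
    """Return the index slices covering one epoch of training data."""
    if batch_size == 0:
        # A batch size of 0 means the whole dataset is used as a single batch.
        return [slice(0, n_samples)]
    return [slice(start, min(start + batch_size, n_samples))
            for start in range(0, n_samples, batch_size)]


full_batch = minibatch_slices(10)
mini_batches = minibatch_slices(10, batch_size=4)
```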

predict(*args, **kwargs)[source]

Predict.

Parameters:

*args – input;
**kwargs – other parameters used by the predict method of the regressor.

Returns:

The predictions of the model.

parameters()[source]

Returns: The list of parameters of the network.

property loss_fit: Returns: The average loss of the last epoch of the last fit call.

set_learning_rate(lr)[source]

Set the learning rate of the optimizer.

Parameters: lr (float) – the new learning rate.

set_weights(weights)[source]

Setter.

Parameters: weights (np.ndarray) – the set of weights to set.

get_weights()[source]

Getter.

Returns: The set of weights of the approximator.

property weights_size: Returns: The size of the array of weights.

diff(*args, **kwargs)[source]

Compute the derivative of the output w.r.t. state, and action if provided.

Parameters:

*args – input;
**kwargs – other parameters used by the diff method of the regressor.

Returns:

The derivative of the output w.r.t. state, and action if provided.

class TorchEnsemble(*args, n_models=1, **kwargs)[source]

Bases: Ensemble

Ensemble of TorchApproximator models trained in parallel using torch.func.vmap and torch.func.grad. Constructed automatically by TorchApproximator when n_models > 1.

__init__(input_shape, output_shape, network, optimizer=None, loss=None, batch_size=0, n_fit_targets=1, reinitialize=False, dropout=False, quiet=True, n_models=None, prediction=None, **params)[source]

Constructor.

Parameters:

input_shape (tuple, list) – shape of the input of the network. A plain tuple for a single-input network, or a list of shape tuples (one per positional input) for a network that takes several distinct inputs;
output_shape (tuple, list) – shape of the output of the network. A plain tuple for a single-output network, or a list of shape tuples (one per output tensor) for a network whose forward returns several tensors;
network (torch.nn.Module) – the network class to use;
optimizer (dict) – the optimizer used for every fit step;
loss (torch.nn.functional) – the loss function to optimize in the fit method;
batch_size (int, 0) – the size of each minibatch. If 0, the whole dataset is fed to the optimizer at each epoch;
n_fit_targets (int, 1) – the number of fit targets used by the fit method of the network;
reinitialize (bool, False) – if True, the approximator is reinitialized at every fit call;
dropout (bool, False) – if True, dropout is applied only during training;
quiet (bool, True) – if False, shows a progress bar over epochs;
n_models (int) – number of models in the ensemble;
prediction (str, None) – how to aggregate predictions across models. One of 'mean', 'min', 'max', 'sum', or None to return all predictions;
**params – dictionary of parameters needed to construct the network.

predict(*args, idx=None, prediction=None, **kwargs)[source]

Predict.

Parameters:

*args – input;
idx (int, None) – if provided, use only the model at that index;
prediction (str, None) – aggregation mode, overrides the constructor default. One of 'mean', 'min', 'max', 'sum', or None to return all;
**kwargs – other parameters used by the predict method of the regressor.

Returns:

The predictions of the model, aggregated according to prediction.

parameters()[source]

Returns: The concatenated parameters of all models in the ensemble.

property network: Returns: The network of the first model in the ensemble.

property loss_fit: Returns: List of per-model losses from the last fit call.

set_learning_rate(lr)[source]

Set the learning rate of the optimizer of all models in the ensemble.

Parameters: lr (float) – the new learning rate.

set_weights(weights)[source]

Set weights for each model in the ensemble independently.

Parameters: weights – tensor of shape (n_models, weights_size_per_model).

get_weights()[source]

Returns: The stacked weights of all models, shape (n_models, weights_size_per_model).

property weights_size: Returns: The shape of the stacked weights matrix (n_models, weights_size_per_model).

diff(*args, **kwargs)[source]

Compute the derivative of the output w.r.t. the input for each model, stacked.

Parameters:

*args – input;
**kwargs – other parameters used by the diff method of the regressor.

Returns:

The stacked derivatives of all models w.r.t. the input, shape (n_models, weights_size, n_outputs).

class NumpyTorchApproximator(input_shape=None, output_shape=None, network=None, optimizer=None, loss=None, batch_size=0, n_fit_targets=1, reinitialize=False, dropout=False, quiet=True, n_models=None, **params)[source]

Bases: TorchApproximator

Wrapper that provides a NumPy interface to the TorchApproximator class. It allows the torch approximator to be used with algorithms that rely on the numpy backend.

predict(*args, **kwargs)[source]

Predict.

Parameters:

*args – input;
**kwargs – other parameters used by the predict method of the regressor.

Returns:

The predictions of the model.

diff(*args, **kwargs)[source]

Compute the derivative of the output w.r.t. state, and action if provided.

Parameters:

*args – input;
**kwargs – other parameters used by the diff method of the regressor.

Returns:

The derivative of the output w.r.t. state, and action if provided.

set_weights(weights)[source]

Setter.

Parameters: weights (np.ndarray) – the set of weights to set.

get_weights()[source]

Getter.

Returns: The set of weights of the approximator.

Networks

Pre-built PyTorch network architectures for use with TorchApproximator.

class AtariNetwork(input_shape, output_shape, **kwargs)[source]

Bases: Module

Convolutional network for Atari from pixel observations, outputting the Q-values for every action.

__init__(input_shape, output_shape, **kwargs)[source]

Constructor.

Parameters:

input_shape (tuple) – shape of the input image (channels, height, width);
output_shape (tuple) – shape of the output (one Q-value per action);
**kwargs – other parameters (unused).

forward(state, action=None)[source]

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class AtariFeatureNetwork(input_shape, output_shape, **kwargs)[source]

Bases: Module

Convolutional feature extractor for Atari, sharing the same body as AtariNetwork but returning the features instead of the Q-values. Used as features_network by the distributional networks.

__init__(input_shape, output_shape, **kwargs)[source]

Constructor.

Parameters:

input_shape (tuple) – shape of the input image (channels, height, width);
output_shape (tuple) – shape of the output;
**kwargs – other parameters (unused).

forward(state, action=None)[source]

class ActorNetwork(input_shape, output_shape, n_features, n_layers=2, activation='relu', gain_scale=1.0, weights_init='xavier', bias_init=None, **kwargs)[source]

Bases: Module

Simple feedforward actor network mapping the state to the network output (e.g. the action).

__init__(input_shape, output_shape, n_features, n_layers=2, activation='relu', gain_scale=1.0, weights_init='xavier', bias_init=None, **kwargs)[source]

Constructor.

Parameters:

input_shape (tuple) – shape of the input (the state);
output_shape (tuple) – shape of the output (e.g. the action);
n_features ([int, list]) – size of the hidden layers, or a list of sizes for each of them;
n_layers (int, 2) – number of hidden layers, used only if n_features is an int;
activation (str, 'relu') – activation function for the hidden layers;
gain_scale (float, 1.0) – scaling factor for the weights initialization gain;
weights_init (str, 'xavier') – weights initialization method;
bias_init (str, None) – bias initialization method;
**kwargs – other parameters (unused).
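The interplay between n_features and n_layers can be summarised by the following sketch. hidden_sizes is a hypothetical helper that only resolves the list of hidden-layer widths and does not build any torch module.

```python
def hidden_sizes(n_features, n_layers=2):
    """Resolve the width of each hidden layer from the constructor arguments."""
    if isinstance(n_features, int):
        # A single int is repeated for every hidden layer.
        return [n_features] * n_layers
    # A list fixes both the number and the width of the hidden layers,
    # so n_layers is ignored.
    return list(n_features)


uniform = hidden_sizes(64)
custom = hidden_sizes([128, 64, 32], n_layers=2)
```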

forward(state, **kwargs)[source]

class CriticNetwork(input_shape, output_shape, n_features, n_layers=2, activation='relu', gain_scale=1.0, weights_init='xavier', bias_init=None, **kwargs)[source]

Bases: Module

Simple critic network taking state and action as two separate inputs.

__init__(input_shape, output_shape, n_features, n_layers=2, activation='relu', gain_scale=1.0, weights_init='xavier', bias_init=None, **kwargs)[source]

Constructor.

Parameters:

input_shape (list) – the network has two inputs, so this must be [state_shape, action_shape];
output_shape (tuple) – shape of the output (the Q-value);
n_features ([int, list]) – size of the hidden layers, or a list of sizes for each of them;
n_layers (int, 2) – number of hidden layers, used only if n_features is an int;
activation (str, 'relu') – activation function for the hidden layers;
gain_scale (float, 1.0) – scaling factor for the weights initialization gain;
weights_init (str, 'xavier') – weights initialization method;
bias_init (str, None) – bias initialization method;
**kwargs – other parameters (unused).

forward(state, action)[source]

class QNetwork(input_shape, output_shape, n_features, n_layers=2, activation='relu', gain_scale=1.0, weights_init='xavier', bias_init=None, **kwargs)[source]

Bases: Module

Simple feedforward network outputting the Q-values for every discrete action given the state, or the Q-value of a specific action when one is provided.

__init__(input_shape, output_shape, n_features, n_layers=2, activation='relu', gain_scale=1.0, weights_init='xavier', bias_init=None, **kwargs)[source]

Constructor.

Parameters:

input_shape (tuple) – shape of the input (the state);
output_shape (tuple) – shape of the output (one Q-value per discrete action);
n_features ([int, list]) – size of the hidden layers, or a list of sizes for each of them;
n_layers (int, 2) – number of hidden layers, used only if n_features is an int;
activation (str, 'relu') – activation function for the hidden layers;
gain_scale (float, 1.0) – scaling factor for the weights initialization gain;
weights_init (str, 'xavier') – weights initialization method;
bias_init (str, None) – bias initialization method;
**kwargs – other parameters (unused).

forward(state, action=None, **kwargs)[source]

class LinearNetwork(input_shape, output_shape, use_bias=False, gain=None, **kwargs)[source]

Bases: Module

Single fully connected layer mapping the state to the network output, with no activation function.

__init__(input_shape, output_shape, use_bias=False, gain=None, **kwargs)[source]

Constructor.

Parameters:

input_shape (tuple) – shape of the input (the state);
output_shape (tuple) – shape of the output;
use_bias (bool, False) – whether to add a bias term to the linear layer;
gain (float, None) – gain used for the weights initialization; if None, the linear gain is used;
**kwargs – other parameters (unused).

forward(state, **kwargs)[source]

class DuelingNetwork(input_shape, output_shape, features_network, n_features, avg_advantage, **kwargs)[source]

Bases: Module

Dueling architecture for DQN, splitting the shared features into a state-value stream and an advantage stream that are recombined into the Q-values.

__init__(input_shape, output_shape, features_network, n_features, avg_advantage, **kwargs)[source]

Constructor.

Parameters:

input_shape (tuple) – shape of the input (the state);
output_shape (tuple) – shape of the output (the number of actions);
features_network (nn.Module) – the network used to compute the features;
n_features (int) – number of features extracted by the features network;
avg_advantage (bool) – whether to subtract the mean (True) or the max (False) advantage;
**kwargs – parameters forwarded to the features network.
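The recombination of the two streams follows the usual dueling rule, Q(s, a) = V(s) + A(s, a) - baseline, where the baseline is the mean or the max advantage depending on avg_advantage. The sketch below works on plain floats for a single state; dueling_q is a hypothetical helper.

```python
def dueling_q(value, advantages, avg_advantage=True):
    """Combine a state value and per-action advantages into Q-values."""
    if avg_advantage:
        baseline = sum(advantages) / len(advantages)
    else:
        baseline = max(advantages)
    return [value + a - baseline for a in advantages]


q_mean = dueling_q(1.0, [1.0, 2.0, 3.0])
q_max = dueling_q(1.0, [1.0, 2.0, 3.0], avg_advantage=False)
```

With the max baseline the best action's Q-value equals the state value, while with the mean baseline the Q-values average to the state value.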

forward(state, action=None)[source]

class NoisyNetwork(input_shape, output_shape, features_network, n_features, **kwargs)[source]

Bases: Module

Network for Noisy DQN, outputting the Q-values through a noisy linear layer whose learnable noise provides state-dependent exploration.

class NoisyLinear(in_features, out_features, sigma_coeff=0.5, bias=True)[source]

Bases: Module

Factorized noisy linear layer, adding learnable Gaussian noise to the weights and biases as described in “Noisy Networks for Exploration” by Fortunato et al.

__init__(in_features, out_features, sigma_coeff=0.5, bias=True)[source]

Constructor.

Parameters:

in_features (int) – size of each input sample;
out_features (int) – size of each output sample;
sigma_coeff (float, .5) – scaling coefficient for the initial noise standard deviation;
bias (bool, True) – whether to add a learnable (noisy) bias term.
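Factorized noise, as described by Fortunato et al., builds the weight noise from only in_features + out_features Gaussian samples, each transformed by f(x) = sign(x) * sqrt(|x|). The sketch below uses fixed sample values so that the result is deterministic. The helper names are hypothetical, and the initial value of sigma follows the paper's convention, which is assumed rather than confirmed for this implementation.

```python
import math


def scale_noise(x):
    """f(x) = sign(x) * sqrt(|x|), applied to each raw Gaussian sample."""
    return math.copysign(math.sqrt(abs(x)), x)


def factorized_noise(eps_in, eps_out):
    """Build weight and bias noise from one sample per input and per output."""
    f_in = [scale_noise(e) for e in eps_in]
    f_out = [scale_noise(e) for e in eps_out]
    weight_noise = [[o * i for i in f_in] for o in f_out]  # out_features x in_features
    return weight_noise, f_out


def initial_sigma(in_features, sigma_coeff=0.5):
    # Assumption: sigma starts at sigma_coeff / sqrt(in_features), as in the paper.
    return sigma_coeff / math.sqrt(in_features)


# In practice the samples are drawn from a standard normal distribution.
weight_noise, bias_noise = factorized_noise(eps_in=[4.0, -9.0], eps_out=[1.0])
sigma0 = initial_sigma(in_features=4)
```

The effective weight of the layer is then mu + sigma * noise, with mu and sigma both learnable.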

forward(input)[source]

extra_repr()[source]

Return the extra representation of the module.

To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.

__init__(input_shape, output_shape, features_network, n_features, **kwargs)[source]

Constructor.

Parameters:

input_shape (tuple) – shape of the input (the state);
output_shape (tuple) – shape of the output (the number of actions);
features_network (nn.Module) – the network used to compute the features;
n_features (int) – number of features extracted by the features network;
**kwargs – parameters forwarded to the features network.

forward(state, action=None)[source]

class CategoricalNetwork(input_shape, output_shape, features_network, n_atoms, v_min, v_max, n_features, **kwargs)[source]

Bases: Module

Distributional network for Categorical DQN (C51), modeling the value distribution of each action as a categorical distribution over a fixed support of n_atoms between v_min and v_max.

__init__(input_shape, output_shape, features_network, n_atoms, v_min, v_max, n_features, **kwargs)[source]

Constructor.

Parameters:

input_shape (tuple) – shape of the input (the state);
output_shape (tuple) – shape of the output (the number of actions);
features_network (nn.Module) – the network used to compute the features;
n_atoms (int) – number of atoms of the support of the value distribution;
v_min (float) – minimum value of the support;
v_max (float) – maximum value of the support;
n_features (int) – number of features extracted by the features network;
**kwargs – parameters forwarded to the features network.
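
To make the roles of n_atoms, v_min and v_max concrete, the sketch below builds an evenly spaced support and recovers a Q-value per action as the expectation of the categorical distribution over that support. It is plain Python with hypothetical helper names (support, softmax, q_values), and it assumes the network emits one vector of n_atoms logits per action; it is not the library's forward implementation.

```python
import math


def support(n_atoms, v_min, v_max):
    """Evenly spaced atoms z_0 = v_min, ..., z_{n_atoms - 1} = v_max."""
    delta = (v_max - v_min) / (n_atoms - 1)
    return [v_min + i * delta for i in range(n_atoms)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def q_values(logits_per_action, atoms):
    """Q(s, a) = sum_i p_i(s, a) * z_i for every action a."""
    qs = []
    for logits in logits_per_action:
        probs = softmax(logits)
        qs.append(sum(p * z for p, z in zip(probs, atoms)))
    return qs


atoms = support(n_atoms=5, v_min=-10.0, v_max=10.0)
# Two actions: a uniform distribution, and one concentrated on the top atom.
logits = [[0.0] * 5, [0.0, 0.0, 0.0, 0.0, 5.0]]
q = q_values(logits, atoms)
```

In this example the uniform distribution has an expected value of zero, while the distribution concentrated near v_max yields a larger Q-value.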

forward(state, action=None, get_distribution=False)[source]

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for the forward pass must be defined within this function, call the Module instance itself rather than forward() directly: calling the instance runs the registered hooks, whereas calling forward() silently ignores them.

class QuantileNetwork(input_shape, output_shape, features_network, n_quantiles, n_features, **kwargs)[source]

Bases: Module

Distributional network for Quantile Regression DQN (QR-DQN), approximating the value distribution of each action with n_quantiles quantiles whose mean gives the Q-value.

__init__(input_shape, output_shape, features_network, n_quantiles, n_features, **kwargs)[source]

Constructor.

Parameters:

input_shape (tuple) – shape of the input (the state);
output_shape (tuple) – shape of the output (the number of actions);
features_network (nn.Module) – the network used to compute the features;
n_quantiles (int) – number of quantiles used to approximate the value distribution;
n_features (int) – number of features extracted by the features network;
**kwargs – parameters forwarded to the features network.
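
The relation between quantiles and Q-values stated above can be sketched in a few lines of plain Python. The helper names are hypothetical, and the quantile midpoints follow the usual QR-DQN formulation tau_i = (2i + 1) / (2N); whether the library exposes them in this form is an assumption.

```python
def quantile_midpoints(n_quantiles):
    """Midpoints tau_i = (2i + 1) / (2N) of N equal-probability bins."""
    return [(2 * i + 1) / (2 * n_quantiles) for i in range(n_quantiles)]


def q_from_quantiles(quantiles_per_action):
    """Q(s, a) is the mean of the predicted quantile values of action a."""
    return [sum(qs) / len(qs) for qs in quantiles_per_action]


def greedy_action(q_values):
    """Index of the action with the largest Q-value."""
    return max(range(len(q_values)), key=lambda a: q_values[a])


taus = quantile_midpoints(4)
# Two actions, four quantile values each (illustrative numbers).
quantiles = [[-1.0, 0.0, 1.0, 2.0], [1.0, 1.0, 1.0, 1.0]]
q = q_from_quantiles(quantiles)
best = greedy_action(q)
```

Because every quantile carries the same probability mass 1 / n_quantiles, the plain average is the expected return under the approximated distribution.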

forward(state, action=None, get_quantiles=False)[source]

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class RainbowNetwork(input_shape, output_shape, features_network, n_atoms, v_min, v_max, n_features, sigma_coeff, **kwargs)[source]

Bases: Module

Network for Rainbow, combining a distributional categorical head (n_atoms over [v_min, v_max]) with a dueling architecture built from noisy linear layers.

__init__(input_shape, output_shape, features_network, n_atoms, v_min, v_max, n_features, sigma_coeff, **kwargs)[source]

Constructor.

Parameters:

input_shape (tuple) – shape of the input (the state);
output_shape (tuple) – shape of the output (the number of actions);
features_network (nn.Module) – the network used to compute the features;
n_atoms (int) – number of atoms of the support of the value distribution;
v_min (float) – minimum value of the support;
v_max (float) – maximum value of the support;
n_features (int) – number of features extracted by the features network;
sigma_coeff (float) – scaling coefficient for the initial noise standard deviation;
**kwargs – parameters forwarded to the features network.
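
One common way to combine a dueling architecture with a categorical head is to add a state-value stream to mean-centred advantage streams atom by atom, then normalise with a softmax. The sketch below shows that combination in plain Python; the function names and the exact aggregation are assumptions made for illustration, not a transcription of the library's code.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dueling_distribution(value_logits, advantage_logits):
    """Return one categorical distribution over atoms per action.

    value_logits: list of n_atoms logits from the value stream.
    advantage_logits: n_actions lists of n_atoms logits from the advantage streams.
    """
    n_actions = len(advantage_logits)
    n_atoms = len(value_logits)
    # Mean advantage over actions, computed separately for every atom.
    mean_adv = [sum(advantage_logits[a][k] for a in range(n_actions)) / n_actions
                for k in range(n_atoms)]
    return [
        softmax([value_logits[k] + advantage_logits[a][k] - mean_adv[k]
                 for k in range(n_atoms)])
        for a in range(n_actions)
    ]


value = [0.0, 0.0, 0.0]
advantage = [[1.0, 0.0, -1.0], [-1.0, 0.0, 1.0]]
dist = dueling_distribution(value, advantage)
```

In RainbowNetwork the two streams are built from noisy linear layers, so each forward pass would additionally depend on the sampled noise.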

forward(state, action=None, get_distribution=False)[source]

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for the forward pass must be defined within this function, call the Module instance itself rather than forward() directly: calling the instance runs the registered hooks, whereas calling forward() silently ignores them.

class RecurrentActorNetwork(input_shape, output_shape, n_features, rnn_type, n_hidden_features, num_hidden_layers=1, use_prev_action=False, **kwargs)[source]

Bases: Module

Recurrent actor network returning both the action mean and the next hidden state, so output_shape must be [action_shape, policy_state_shape].

__init__(input_shape, output_shape, n_features, rnn_type, n_hidden_features, num_hidden_layers=1, use_prev_action=False, **kwargs)[source]

Constructor.

Parameters:

input_shape (tuple) – shape of the input (the state);
output_shape (list) – the network has two outputs, so this must be [action_shape, policy_state_shape];
n_features (int) – size of the layers feeding into and out of the recurrent layer;
rnn_type (str) – type of recurrent layer, see TorchUtils.get_recurrent_network;
n_hidden_features (int) – size of the recurrent layer’s hidden state;
num_hidden_layers (int, 1) – number of stacked recurrent layers;
use_prev_action (bool, False) – whether the previous action is concatenated to the observation of each timestep;
**kwargs – other parameters (unused).
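
The effect of use_prev_action can be pictured as follows: at every timestep the observation is extended with the action taken at the previous timestep. The sketch below is plain Python with a hypothetical helper, with_prev_action; padding the first timestep with zeros and the way the action history is aligned with the states are assumptions made for this example.

```python
def with_prev_action(observations, actions):
    """Concatenate the previous action to the observation of each timestep.

    observations: list of per-timestep observation vectors.
    actions: list of per-timestep action vectors, aligned with observations.
    """
    dim_action = len(actions[0])
    # Assumed convention: no action precedes the first timestep, so use zeros.
    prev_actions = [[0.0] * dim_action] + actions[:-1]
    return [obs + prev for obs, prev in zip(observations, prev_actions)]


observations = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
actions = [[1.0], [2.0], [3.0]]
inputs = with_prev_action(observations, actions)
```

Each input vector then has size len(observation) + dim_action, which is the size the first layer of the network would have to accept when the flag is enabled.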

forward(state, policy_state, lengths, action_history=None)[source]

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for the forward pass must be defined within this function, call the Module instance itself rather than forward() directly: calling the instance runs the registered hooks, whereas calling forward() silently ignores them.

class RecurrentCriticNetwork(input_shape, output_shape, dim_action, rnn_type, n_hidden_features=128, n_features=128, num_hidden_layers=1, hidden_state_treatment='zero_initial', use_prev_action=False, **kwargs)[source]

Bases: Module

Recurrent critic network, returning the value function of the input sequence.

__init__(input_shape, output_shape, dim_action, rnn_type, n_hidden_features=128, n_features=128, num_hidden_layers=1, hidden_state_treatment='zero_initial', use_prev_action=False, **kwargs)[source]

Constructor.

Parameters:

input_shape (tuple) – shape of the input (the state);
output_shape (tuple) – shape of the output (the value function);
dim_action (int) – dimensionality of the action space;
rnn_type (str) – type of recurrent layer, see TorchUtils.get_recurrent_network;
n_hidden_features (int, 128) – size of the recurrent layer’s hidden state;
n_features (int, 128) – size of the layers feeding into and out of the recurrent layer;
num_hidden_layers (int, 1) – number of stacked recurrent layers;
hidden_state_treatment (str, 'zero_initial') – either 'zero_initial', to start the recurrent layer from a zero hidden state, or 'use_policy_hidden_state', to start it from the policy’s hidden state instead;
use_prev_action (bool, False) – whether the previous action is concatenated to the observation of each timestep;
**kwargs – other parameters (unused).
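
The two values of hidden_state_treatment amount to a choice of initial hidden state for the recurrent layer. A minimal sketch of that choice, using a hypothetical helper initial_hidden_state and plain lists in place of tensors, could look like this:

```python
def initial_hidden_state(hidden_state_treatment, policy_state, n_hidden_features):
    """Pick the hidden state the recurrent layer starts from (illustrative)."""
    if hidden_state_treatment == "zero_initial":
        # Ignore the policy state and start from zeros.
        return [0.0] * n_hidden_features
    if hidden_state_treatment == "use_policy_hidden_state":
        # Reuse the hidden state produced by the policy.
        return list(policy_state)
    raise ValueError(f"Unknown hidden_state_treatment: {hidden_state_treatment!r}")


policy_state = [0.3, -0.1, 0.7]
h_zero = initial_hidden_state("zero_initial", policy_state, 3)
h_policy = initial_hidden_state("use_policy_hidden_state", policy_state, 3)
```

Reusing the policy's hidden state presupposes that it has the same size as the critic's recurrent layer, which is why the example uses three features in both cases.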

forward(state, policy_state, lengths, action_history=None)[source]

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for the forward pass must be defined within this function, call the Module instance itself rather than forward() directly: calling the instance runs the registered hooks, whereas calling forward() silently ignores them.