How to use approximators

MushroomRL provides a common interface for function approximators. The root class is Approximator. Every concrete subclass (Table, LinearApproximator, TorchApproximator, …) automatically dispatches to an Ensemble when n_models > 1 is passed to the constructor, so ensemble methods can be built without any extra glue code:

# single table
q = Table(shape=(10, 4))

# ensemble of 5 tables — returns an Ensemble transparently
q = Table(n_models=5, shape=(10, 4))
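
As a rough illustration of what an ensemble of tables amounts to, the sketch below keeps several independent NumPy tables and averages their predictions. It uses plain NumPy rather than MushroomRL: the class TableEnsembleSketch, its method signatures, and the choice of averaging are assumptions made for illustration, not the library's Ensemble implementation.

```python
import numpy as np


class TableEnsembleSketch:
    """Hypothetical stand-in for an ensemble of tabular approximators."""

    def __init__(self, n_models, shape, initial_value=0.0):
        # One independent table per ensemble member
        self.tables = [np.full(shape, initial_value) for _ in range(n_models)]

    def fit(self, idx, state, action, target):
        # Update a single member, leaving the others untouched
        self.tables[idx][state, action] = target

    def predict(self, state, action):
        # Aggregate the members by averaging (an assumed aggregation rule)
        return float(np.mean([t[state, action] for t in self.tables]))


q_sketch = TableEnsembleSketch(n_models=5, shape=(10, 4))
q_sketch.fit(idx=0, state=3, action=1, target=5.0)
prediction = q_sketch.predict(state=3, action=1)  # mean over the 5 tables
```

Only one of the five tables was updated, so the averaged prediction moves by a fifth of the target.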

Function approximation

An approximator is a class representing any type of function approximation (parametric, non-parametric, or tabular). It exposes a fit / predict interface and, for parametric approximators, also provides access to the weights and the gradient.

The example below fits a LinearApproximator to points sampled from a line with Gaussian noise. Polynomial features of degree 1 are built by hand, by prepending a column of ones to x, so that the model can learn both intercept and slope:

import numpy as np
from matplotlib import pyplot as plt

from mushroom_rl.approximators.parametric import LinearApproximator


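# Inputs: 10 points on the x axis, as a column vector
x = np.arange(10).reshape(-1, 1)

# Targets: a line with slope 2 and intercept 10, plus unit-variance Gaussian noise
intercept = 10
noise = np.random.randn(10, 1)
y = 2 * x + intercept + noise

# Features [1, x]: the column of ones lets the model learn the intercept
phi = np.concatenate((np.ones(10).reshape(-1, 1), x), axis=1)

# Two input features, one scalar output
approximator = LinearApproximator(input_shape=(2,), output_shape=(1,))
approximator.fit(phi, y)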

After fitting, the weights, the gradient at a specific input, and a plot of the approximated function can be obtained:

print('Weights: ' + str(approximator.get_weights()))
# Evaluate the gradient at the feature vector [1, x] for x = 5, matching input_shape=(2,)
print('Gradient: ' + str(approximator.diff(np.array([1., 5.]))))

plt.scatter(x, y)
plt.plot(x, approximator.predict(phi))
plt.show()
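
To make the fit / predict steps concrete, here is a NumPy-only sketch of a least-squares fit on the same features, using noise-free targets so the result is exact. It assumes that fitting a linear model comes down to ordinary least squares. The helper names fit_least_squares and predict_linear are hypothetical and are not part of MushroomRL.

```python
import numpy as np


def fit_least_squares(phi, y):
    """Return the weights minimizing ||phi @ w - y||^2."""
    w, *_ = np.linalg.lstsq(phi, y, rcond=None)
    return w


def predict_linear(phi, w):
    """Linear prediction: one dot product per row of phi."""
    return phi @ w


# Noise-free version of the line used in the example
x_demo = np.arange(10, dtype=float).reshape(-1, 1)
y_demo = 2.0 * x_demo + 10.0
phi_demo = np.concatenate((np.ones((10, 1)), x_demo), axis=1)

w_demo = fit_least_squares(phi_demo, y_demo)   # column vector [intercept, slope]
y_hat = predict_linear(phi_demo, w_demo)

# For a linear model, the gradient of the prediction with respect to the
# weights is the feature vector itself, here for x = 5
grad_at_5 = phi_demo[5].copy()
```

With noisy targets, as in the example above, the recovered weights would only be close to the true intercept and slope.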

Q-function approximation

For classical RL algorithms with discrete action spaces, MushroomRL provides QApproximator — a unified interface that selects the appropriate concrete implementation based on the constructor arguments:

n_models > 1: QApproximatorEnsemble — ensemble of independent Q-approximators;
output_shape[0] != n_actions: QApproximatorAction — one independent model per action;
output_shape[0] == n_actions: QApproximatorSimple — a single multi-output model with one output per action.

Algorithms that accept a parametric approximator class (e.g. SARSALambdaContinuous, TrueOnlineSARSALambda, FQI) pass it through QApproximator internally, so the same algorithm code handles all three cases transparently.
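
The selection rule listed above can be sketched as a small function. This is only an illustration: the helper select_q_approximator is hypothetical, it returns class names as strings rather than instantiating anything, and it assumes the conditions are checked in the order in which they are listed.

```python
def select_q_approximator(output_shape, n_actions, n_models=1):
    """Hypothetical helper mirroring the selection rule described above."""
    if n_models > 1:
        return 'QApproximatorEnsemble'
    if output_shape[0] != n_actions:
        return 'QApproximatorAction'
    return 'QApproximatorSimple'


# Illustrative values: an environment with 3 discrete actions
kind_simple = select_q_approximator(output_shape=(3,), n_actions=3)
kind_action = select_q_approximator(output_shape=(1,), n_actions=3)
kind_ensemble = select_q_approximator(output_shape=(3,), n_actions=3,
                                      n_models=5)
```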

QApproximatorSimple is preferred when the number of actions is large, since a single model outputs the Q-values of all actions at once. QApproximatorAction trains a separate model per action and is useful when the complexity of the Q-function differs across actions.
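
The difference between the two layouts can be illustrated with plain NumPy linear models. This is a conceptual sketch, not the library's implementation; all variable names are made up for the example.

```python
import numpy as np

n_features, n_actions = 4, 3
rng = np.random.default_rng(0)
phi_s = rng.random(n_features)            # features of a single state

# "Simple" layout: one model with one output per action
W = rng.random((n_actions, n_features))
q_simple = W @ phi_s                      # shape (n_actions,)

# "Action" layout: one single-output model per action
per_action_w = [W[a].copy() for a in range(n_actions)]
q_action = np.array([w_a @ phi_s for w_a in per_action_w])

# Updating the model of action 0 cannot affect the other actions
per_action_w[0] += 1.0
```

With linear models both layouts hold the same number of weights and give the same Q-values. The choice matters more for models such as neural networks, where a single multi-output model can share its hidden layers across actions.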

Example

The following example trains a SARSA(λ) agent on the MountainCar environment using tile-coded features and a LinearApproximator.

First, the MDP, the policy and the features are set up:

from mushroom_rl.algorithms.value import SARSALambdaContinuous
from mushroom_rl.approximators.parametric import LinearApproximator
from mushroom_rl.core import Core
from mushroom_rl.environments import Gymnasium
from mushroom_rl.features import Features
from mushroom_rl.features.tiles import Tiles
from mushroom_rl.policy import EpsGreedy
from mushroom_rl.utils.callbacks import CollectDataset
from mushroom_rl.rl_utils.parameters import Parameter


# MDP
mdp = Gymnasium(name='MountainCar-v0', horizon=3000, gamma=1.)

# Policy
epsilon = Parameter(value=0.)
pi = EpsGreedy(epsilon=epsilon)

# Q-function approximator
n_tilings = 10
tilings = Tiles.generate(n_tilings, [10, 10],
                         mdp.info.observation_space.low,
                         mdp.info.observation_space.high)
features = Features(tilings=tilings)
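
For readers unfamiliar with tile coding, the sketch below builds binary features by hand with NumPy. It is not how Tiles.generate is implemented: the function tile_features and its offset scheme are assumptions for illustration, and the bounds are the MountainCar-v0 position and velocity limits written out explicitly.

```python
import numpy as np


def tile_features(state, low, high, n_tilings=10, n_tiles=(10, 10)):
    """Binary tile-coding features for a state (illustrative sketch)."""
    state, low, high = (np.asarray(v, dtype=float)
                        for v in (state, low, high))
    n_tiles = np.asarray(n_tiles)
    tile_width = (high - low) / n_tiles
    tiles_per_tiling = int(np.prod(n_tiles))
    phi = np.zeros(n_tilings * tiles_per_tiling)
    for t in range(n_tilings):
        # Assumed offset scheme: shift each tiling by a fraction of a tile
        offset = tile_width * t / n_tilings
        idx = np.floor((state - low + offset) / tile_width).astype(int)
        idx = np.clip(idx, 0, n_tiles - 1)
        flat = np.ravel_multi_index(tuple(idx), tuple(n_tiles))
        # Exactly one active tile per tiling
        phi[t * tiles_per_tiling + flat] = 1.0
    return phi


# MountainCar-v0 observation bounds: position and velocity
low_obs, high_obs = [-1.2, -0.07], [0.6, 0.07]
phi_tiles = tile_features([-0.5, 0.0], low_obs, high_obs)
```

Each state activates one tile per tiling, so the feature vector has n_tilings ones. A weight update of size alpha on every active feature changes the prediction by n_tilings * alpha, which is the usual reason for dividing the learning rate by n_tilings.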

Setting output_shape to the number of actions creates a QApproximatorSimple inside the algorithm:

# Agent
learning_rate = Parameter(.1 / n_tilings)
approximator_params = dict(input_shape=(features.size,),
                           output_shape=(mdp.info.action_space.n,),
                           n_actions=mdp.info.action_space.n,
                           phi=features)
agent = SARSALambdaContinuous(mdp.info, pi, LinearApproximator,
                              approximator_params=approximator_params,
                              learning_rate=learning_rate,
                              lambda_coeff=.9)

To use a QApproximatorAction instead, with one independent model per action, set output_shape to (1,):

approximator_params = dict(input_shape=(features.size,),
                           output_shape=(1,),
                           n_actions=mdp.info.action_space.n,
                           phi=features)
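
To show what learning_rate and lambda_coeff control, here is a textbook SARSA(λ) update with accumulating eligibility traces for a linear Q-function, written in NumPy. The text above does not state whether SARSALambdaContinuous uses exactly this form, so treat it as a sketch. The function sarsa_lambda_step and its arguments are hypothetical.

```python
import numpy as np


def sarsa_lambda_step(w, e, phi, a, r, phi_next, a_next, absorbing,
                      alpha, gamma, lam):
    """One SARSA(lambda) update; w and e have shape (n_actions, n_features)."""
    q = w[a] @ phi
    q_next = 0.0 if absorbing else w[a_next] @ phi_next
    delta = r + gamma * q_next - q    # temporal-difference error
    e = gamma * lam * e               # decay all traces (new array)
    e[a] += phi                       # add the gradient of Q(s, a)
    w = w + alpha * delta * e         # move weights along the traces
    return w, e


w_q = np.zeros((2, 3))
e_q = np.zeros((2, 3))
phi_now = np.array([1.0, 0.0, 1.0])
phi_after = np.array([0.0, 1.0, 0.0])
w_q, e_q = sarsa_lambda_step(w_q, e_q, phi_now, a=0, r=-1.0,
                             phi_next=phi_after, a_next=1, absorbing=False,
                             alpha=0.1, gamma=1.0, lam=0.9)
```

Here alpha plays the role of learning_rate and lam the role of lambda_coeff. A larger lam makes the traces decay more slowly, so a reward is propagated further back along the trajectory.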

The remaining code builds the Core, which runs the interaction between agent and environment, then trains and evaluates the agent:

# Algorithm
collect_dataset = CollectDataset()
callbacks = [collect_dataset]
core = Core(agent, mdp, callbacks_fit=callbacks)

# Train
core.learn(n_episodes=100, n_steps_per_fit=1)

# Evaluate
core.evaluate(n_episodes=1, render=True)