How to create a regressor

MushroomRL offers a high-level interface to build function regressors. Indeed, it transparently manages regressors for generic functions and Q-function regressors. The user should not care about the low-level implementation of these regressors and should only use the Regressor interface. This interface creates a Q-function regressor or a GenericRegressor depending on whether the n_actions parameter is provided to the constructor or not.

Usage of the Regressor interface

When the action space of RL problems is finite and the adopted approach is value-based,

we want to compute the Q-function of each action. In MushroomRL, this is possible using:

  • a Q-function regressor with a different approximator for each action (ActionRegressor);

  • a single Q-function regressor with a different output for each action (QRegressor).

The QRegressor is suggested when the number of discrete actions is high, due to memory reasons.

The user can create create a QRegressor or an ActionRegressor, setting the output_shape parameter of the Regressor interface. If it is set to (1,), an ActionRegressor is created; otherwise if it is set to the number of discrete actions, a QRegressor is created.

Example

Initially, the MDP, the policy and the features are created:

import numpy as np

from mushroom_rl.algorithms.value import SARSALambdaContinuous
from mushroom_rl.approximators.parametric import LinearApproximator
from mushroom_rl.core import Core
from mushroom_rl.environments import *
from mushroom_rl.features import Features
from mushroom_rl.features.tiles import Tiles
from mushroom_rl.policy import EpsGreedy
from mushroom_rl.utils.callbacks import CollectDataset
from mushroom_rl.rl_utils.parameters import Parameter


# MDP
mdp = Gym(name='MountainCar-v0', horizon=np.inf, gamma=1.)

# Policy
epsilon = Parameter(value=0.)
pi = EpsGreedy(epsilon=epsilon)

# Q-function approximator
n_tilings = 10
tilings = Tiles.generate(n_tilings, [10, 10],
                         mdp.info.observation_space.low,
                         mdp.info.observation_space.high)
features = Features(tilings=tilings)

# Agent
learning_rate = Parameter(.1 / n_tilings)

The following snippet, sets the output shape of the regressor to the number of actions, creating a QRegressor:

approximator_params = dict(input_shape=(features.size,),
                           output_shape=(mdp.info.action_space.n,),
                           n_actions=mdp.info.action_space.n)

If you prefer to use an ActionRegressor, simply set the number of actions to (1,):

approximator_params = dict(input_shape=(features.size,),
                           output_shape=(1,),
                           n_actions=mdp.info.action_space.n)

Then, the rest of the code fits the approximator and runs the evaluation rendering the behaviour of the agent:

agent = SARSALambdaContinuous(mdp.info, pi, LinearApproximator,
                              approximator_params=approximator_params,
                              learning_rate=learning_rate,
                              lambda_coeff= .9, features=features)

# Algorithm
collect_dataset = CollectDataset()
callbacks = [collect_dataset]
core = Core(agent, mdp, callbacks_fit=callbacks)

# Train
core.learn(n_episodes=100, n_steps_per_fit=1)

# Evaluate
core.evaluate(n_episodes=1, render=True)

Generic regressor

Whenever the n_actions parameter is not provided, the Regressor interface creates a GenericRegressor. This regressor can be used for general purposes and it is more flexible to be used. It is commonly used in policy search algorithms.

Example

Create a dataset of points distributed on a line with random gaussian noise.

import numpy as np
from matplotlib import pyplot as plt

from mushroom_rl.approximators import Regressor
from mushroom_rl.approximators.parametric import LinearApproximator


x = np.arange(10).reshape(-1, 1)

intercept = 10
noise = np.random.randn(10, 1) * 1
y = 2 * x + intercept + noise

To fit the intercept, polynomial features of degree 1 are created by hand:

phi = np.concatenate((np.ones(10).reshape(-1, 1), x), axis=1)

The regressor is then created and fit (note that n_actions is not provided):

regressor = Regressor(LinearApproximator,
                      input_shape=(2,),
                      output_shape=(1,))

regressor.fit(phi, y)

Eventually, the approximated function of the regressor is plotted together with the target points. Moreover, the weights and the gradient in point 5 of the linear approximator are printed.

print('Weights: ' + str(regressor.get_weights()))
print('Gradient: ' + str(regressor.diff(np.array([[5.]]))))

plt.scatter(x, y)
plt.plot(x, regressor.predict(phi))
plt.show()