# How to make an advanced experiment

Continuous MDPs are a challenging class of problems in RL. In these
problems, a tabular representation cannot approximate the Q-function, since
there are infinitely many states and/or actions. The usual solution is to
use a function approximator (e.g. a neural network) fed with the raw values
of states and actions. When a linear approximator is used, it is
convenient to enlarge the input space with non-linear **features**
extracted from the raw values. This way, the linear approximator is often able
to solve the MDP despite its simplicity. Many RL algorithms rely on
a linear approximator to solve an MDP, so the use of features is very
important.
This tutorial shows how to solve a continuous MDP in Mushroom using an
algorithm that requires a linear approximator.
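
To make the idea of a linear approximator on non-linear features concrete, the following is a minimal sketch in plain Python, independent of Mushroom. The feature map `polynomial_features`, the helper `q_values` and the weight values are hypothetical and chosen only for illustration; the Q-value of each action is the dot product between that action's weight vector and the features of the state.

```python
def polynomial_features(state):
    """Hypothetical feature map: a bias term, the raw values and their products."""
    x, v = state
    return [1.0, x, v, x * x, x * v, v * v]


def q_values(weights, state):
    """Q(s, a) = w_a . phi(s), with one weight vector per action."""
    phi = polynomial_features(state)
    return [sum(w_i * p_i for w_i, p_i in zip(w_a, phi)) for w_a in weights]


# Three actions, six features each; the weights are arbitrary illustration values.
weights = [
    [0.0, 1.0, 0.0, 0.0, 0.0, 0.0],
    [0.5, 0.0, 0.0, -1.0, 0.0, 0.0],
    [0.0, 0.0, 2.0, 0.0, 0.0, 1.0],
]

q = q_values(weights, (0.5, -0.1))

# A greedy policy picks the action with the largest Q-value.
greedy_action = max(range(len(q)), key=q.__getitem__)
```

The model stays linear in the weights, so learning remains simple, while the features let it represent a function that is non-linear in the raw state.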

Initially, the MDP and the policy are created:

```
import numpy as np

from mushroom.algorithms.value import SARSALambdaContinuous
from mushroom.approximators.parametric import LinearApproximator
from mushroom.core import Core
from mushroom.environments import Gym
from mushroom.features import Features
from mushroom.features.tiles import Tiles
from mushroom.policy import EpsGreedy
from mushroom.utils.callbacks import CollectDataset
from mushroom.utils.parameters import Parameter

# MDP
mdp = Gym(name='MountainCar-v0', horizon=np.inf, gamma=1.)

# Policy (epsilon = 0 makes the policy purely greedy)
epsilon = Parameter(value=0.)
pi = EpsGreedy(epsilon=epsilon)
```

This is an environment created with the Mushroom interface to the OpenAI Gym
library. Any environment offered by OpenAI Gym can be created this way by simply
providing the corresponding id in the `name` parameter, except for the Atari
environments, which are managed by a separate class.
After the creation of the MDP, the tiles features are created:

```
# Q-function approximator
n_tilings = 10
tilings = Tiles.generate(n_tilings, [10, 10],
                         mdp.info.observation_space.low,
                         mdp.info.observation_space.high)
features = Features(tilings=tilings)

approximator_params = dict(input_shape=(features.size,),
                           output_shape=(mdp.info.action_space.n,),
                           n_actions=mdp.info.action_space.n)
```

In this example, we use sparse coding by means of **tiles** features. The
`generate` method generates `n_tilings` evenly spaced 10x10 grids of tiles
(the way the tilings are created is explained in *“Reinforcement Learning: An Introduction”,
Sutton & Barto, 1998*). The tilings are then passed to the `Features`
factory method, which returns the features class.
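
The tile coding idea can be sketched in plain Python as follows. This is an illustrative approximation, not Mushroom's implementation: the helpers `make_tilings` and `active_tiles`, the uniform offset scheme and the hard-coded MountainCar bounds are assumptions made for the example. Each tiling is a grid shifted by a fraction of a tile width, and a state activates exactly one tile per tiling, which gives a sparse binary feature vector.

```python
def make_tilings(n_tilings, n_tiles, low, high):
    """Build n_tilings shifted grids; each grid stores (shifted_low, width) per dimension."""
    tilings = []
    for t in range(n_tilings):
        grid = []
        for n, lo, hi in zip(n_tiles, low, high):
            # One tile of slack so that every shifted grid still covers [lo, hi].
            width = (hi - lo) / (n - 1)
            offset = width * t / n_tilings
            grid.append((lo - offset, width))
        tilings.append(grid)
    return tilings


def active_tiles(tilings, n_tiles, state):
    """Return one flat feature index per tiling: the tile containing the state."""
    tiles_per_tiling = 1
    for n in n_tiles:
        tiles_per_tiling *= n
    indices = []
    for t, grid in enumerate(tilings):
        flat = 0
        for (shifted_low, width), n, x in zip(grid, n_tiles, state):
            i = min(int((x - shifted_low) / width), n - 1)
            flat = flat * n + i
        indices.append(t * tiles_per_tiling + flat)
    return indices


n_tilings, n_tiles = 10, [10, 10]
# Assumed observation bounds (position, velocity) of MountainCar-v0.
low, high = [-1.2, -0.07], [0.6, 0.07]

tilings = make_tilings(n_tilings, n_tiles, low, high)
idx = active_tiles(tilings, n_tiles, [-0.5, 0.0])
n_features = n_tilings * n_tiles[0] * n_tiles[1]
```

Out of `n_features` binary features, only `n_tilings` are active for any given state. Nearby states share most of their active tiles, which is what gives tile coding its local generalization.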

Mushroom offers other types of features, such as **radial basis functions** and
**polynomial** features. The former also have a faster implementation written in
TensorFlow that can be used transparently.
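
For comparison with tiles, here is a minimal sketch of Gaussian radial basis features in plain Python. It does not use Mushroom's classes; the helper `gaussian_rbf_features`, the 3x3 grid of centers, the choice of scales and the MountainCar bounds are all assumptions made for illustration. Each feature measures how close the state is to one center, so the resulting vector is dense and smooth rather than sparse and binary.

```python
import math


def gaussian_rbf_features(state, centers, scales):
    """phi_i(s) = exp(-sum_d ((s_d - c_id) / scale_d) ** 2) for every center c_i."""
    features = []
    for center in centers:
        squared_distance = sum(((s - c) / w) ** 2
                               for s, c, w in zip(state, center, scales))
        features.append(math.exp(-squared_distance))
    return features


# Assumed observation bounds (position, velocity) of MountainCar-v0.
low, high = [-1.2, -0.07], [0.6, 0.07]
n_per_dim = 3

# Evenly spaced center coordinates along each dimension.
axes = [[lo + i * (hi - lo) / (n_per_dim - 1) for i in range(n_per_dim)]
        for lo, hi in zip(low, high)]
centers = [(x, v) for x in axes[0] for v in axes[1]]

# Use the spacing between neighboring centers as the width of each Gaussian.
scales = [(hi - lo) / (n_per_dim - 1) for lo, hi in zip(low, high)]

phi = gaussian_rbf_features(centers[4], centers, scales)
```

Evaluated exactly at a center, the corresponding feature equals 1 and the others decay with distance.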

Then, the agent is created as usual, but this time passing the features to it.
Note that the learning rate is divided by the number of
tilings for the correctness of the update (see *“Reinforcement Learning: An Introduction”,
Sutton & Barto, 1998* for details). After that, the learning is run as usual:

```
# Agent
learning_rate = Parameter(.1 / n_tilings)
agent = SARSALambdaContinuous(LinearApproximator, pi, mdp.info,
                              approximator_params=approximator_params,
                              learning_rate=learning_rate,
                              lambda_coeff=.9, features=features)

# Algorithm
collect_dataset = CollectDataset()
callbacks = [collect_dataset]
core = Core(agent, mdp, callbacks=callbacks)

# Train
core.learn(n_episodes=100, n_steps_per_fit=1)
```
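
The reason for dividing the learning rate by the number of tilings can be seen with a simplified example. The sketch below is plain Python and performs a single gradient step on binary features, without eligibility traces, so it is not the full SARSA(λ) update; the helper `gradient_step` and the target value are hypothetical. Since every state activates one tile per tiling, each step changes `n_tilings` weights at once, and the prediction moves by `n_tilings` times the per-weight increment.

```python
def gradient_step(weights, active, target, alpha):
    """One gradient step on the squared error for binary features.

    The prediction is the sum of the weights of the active features, so
    every active weight receives the same increment alpha * error.
    Returns the prediction after the update.
    """
    error = target - sum(weights[i] for i in active)
    for i in active:
        weights[i] += alpha * error
    return sum(weights[i] for i in active)


n_tilings = 10
active = list(range(n_tilings))  # one active tile per tiling
target = 1.0

# Step size divided by the number of tilings, as in the tutorial.
scaled = gradient_step([0.0] * n_tilings, active, target,
                       alpha=0.1 / n_tilings)

# Same nominal step size without the division.
unscaled = gradient_step([0.0] * n_tilings, active, target, alpha=0.1)
```

With the division, the prediction covers about 10% of the error, which matches the nominal step size of 0.1. Without it, the prediction jumps almost all the way to the target in one step, and any larger step size would overshoot.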

To visualize the learned policy, the rendering method of OpenAI Gym is used. To
activate rendering in the environments that support it, it is necessary to
set `render=True`.

```
# Evaluate
core.evaluate(n_episodes=1, render=True)
```