Mushroom provides implementations of several algorithms from the main categories of RL:

  • value-based;
  • policy-search;
  • actor-critic.

Custom algorithms can be implemented by following the structure of the ones already available.


class mushroom.algorithms.agent.Agent(policy, mdp_info, features=None)[source]

Bases: object

This class implements the functions to manage the agent (e.g. move the agent following its policy).

__init__(policy, mdp_info, features=None)[source]


Parameters:

  • policy (Policy) – the policy followed by the agent;
  • mdp_info (MDPInfo) – information about the MDP;
  • features (object, None) – features to extract from the state.

Fit step.

Parameters:dataset (list) – the dataset.
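
As an illustration of what a fit step over a dataset might look like, the sketch below applies a tabular Q-learning update to each sample in a list. This is not the library's implementation: the function name `fit_step`, the layout of each sample as `(state, action, reward, next_state, absorbing, last)`, and the `alpha` and `gamma` parameters are all assumptions made for the example.

```python
import numpy as np


def fit_step(q, dataset, alpha=0.5, gamma=0.9):
    """Hypothetical fit step: one Q-learning update per sample in the dataset.

    Each sample is assumed to be a tuple
    (state, action, reward, next_state, absorbing, last).
    """
    for state, action, reward, next_state, absorbing, _last in dataset:
        # No bootstrapping from an absorbing (terminal) state.
        if absorbing:
            target = reward
        else:
            target = reward + gamma * q[next_state].max()
        q[state, action] += alpha * (target - q[state, action])
    return q


# Toy problem with 2 states and 2 actions.
q = np.zeros((2, 2))
dataset = [
    (0, 1, 0.0, 1, False, False),
    (1, 0, 1.0, 1, True, True),
]
q = fit_step(q, dataset)
```

An actual algorithm would keep the table (or approximator) as internal state of the agent rather than passing it in; it is a function argument here only to keep the sketch short.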

Return the action to execute in the given state. It is the action returned by the policy or the action set by the algorithm (e.g. in the case of SARSA).

Parameters:state (np.ndarray) – the state where the agent is.
Returns:The action to be executed.
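
The selection rule described above (use the policy unless the algorithm has already set an action) can be sketched as follows. The classes `ConstantPolicy` and `SketchAgent`, the method name `draw_action`, and the attribute names `phi` and `next_action` are assumptions for this example and are not taken from the text above.

```python
import numpy as np


class ConstantPolicy:
    """Hypothetical stand-in policy that always returns the same action."""

    def __init__(self, action):
        self._action = np.atleast_1d(action)

    def draw_action(self, state):
        return self._action


class SketchAgent:
    """Hypothetical agent mirroring the action-selection rule described above."""

    def __init__(self, policy, features=None):
        self.policy = policy
        self.phi = features        # optional feature extractor applied to the state
        self.next_action = None    # action preset by the algorithm, if any

    def draw_action(self, state):
        if self.phi is not None:
            state = self.phi(state)
        if self.next_action is None:
            # No preset action: ask the policy.
            return self.policy.draw_action(state)
        # Consume the preset action so that it is used only once.
        action, self.next_action = self.next_action, None
        return action


agent = SketchAgent(ConstantPolicy(0))
state = np.array([0.5, -1.0])

from_policy = agent.draw_action(state)    # taken from the policy
agent.next_action = np.array([1])         # e.g. set by a SARSA-like update
preset = agent.draw_action(state)         # the preset action is returned
after_preset = agent.draw_action(state)   # back to the policy
```

Clearing the preset action after use matters for on-policy methods such as SARSA: the action chosen during the update is executed exactly once, and later calls fall back to the policy.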

Called by the agent when a new episode starts.


Method used to stop an agent. Useful when dealing with real-world environments or simulators, or to clean up the environment's internals after a core learn/evaluate call to enforce consistency.
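
A minimal sketch of how the two lifecycle hooks above might be used by a driver loop. The method names `episode_start` and `stop`, the class `LifecycleAgent`, and the function `run` are assumptions for illustration; the actual loop in the library's core may differ.

```python
class LifecycleAgent:
    """Hypothetical agent that only tracks its lifecycle events."""

    def __init__(self):
        self.next_action = None
        self.episodes_started = 0
        self.stopped = False

    def episode_start(self):
        # Reset per-episode state when a new episode begins.
        self.next_action = None
        self.episodes_started += 1

    def stop(self):
        # Release resources (e.g. a simulator connection) held by the agent.
        self.stopped = True


def run(agent, n_episodes):
    """Hypothetical driver: notify the agent of each episode, then stop it."""
    try:
        for _ in range(n_episodes):
            agent.episode_start()
            # Interaction with the environment would happen here.
    finally:
        # Stop the agent even if an episode raises an exception.
        agent.stop()


agent = LifecycleAgent()
run(agent, 3)
```

Placing the stop call in a `finally` clause is one way to guarantee the cleanup the text describes, even when the run is interrupted.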