Core

class mushroom.core.core.Core(agent, mdp, callbacks=None)

Bases: object

Implements the functions to run a generic algorithm.

__init__(agent, mdp, callbacks=None)

Constructor.

Parameters:
  • agent (Agent) – the agent moving according to a policy;
  • mdp (Environment) – the environment in which the agent moves;
  • callbacks (list) – list of callbacks to execute at the end of each learn iteration.

learn(n_steps=None, n_episodes=None, n_steps_per_fit=None, n_episodes_per_fit=None, render=False, quiet=False)

This function moves the agent in the environment and fits the policy using the collected samples. The agent can be moved for a given number of steps or for a given number of episodes; independently of this choice, the policy can be fitted after a given number of steps or after a given number of episodes. By default, the environment is reset.

Parameters:
  • n_steps (int, None) – number of steps to move the agent;
  • n_episodes (int, None) – number of episodes to move the agent;
  • n_steps_per_fit (int, None) – number of steps between each fit of the policy;
  • n_episodes_per_fit (int, None) – number of episodes between each fit of the policy;
  • render (bool, False) – whether to render the environment or not;
  • quiet (bool, False) – if True, the progress bar is not shown.
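
The interplay between n_steps and n_steps_per_fit can be illustrated with a minimal, self-contained sketch. It does not use the library: ToyEnv, ToyAgent and the standalone learn function below are hypothetical stand-ins, and the assumption that the collected samples are discarded after each fit is made only for this illustration.

```python
class ToyEnv:
    """Hypothetical environment: every episode lasts `horizon` steps."""

    def __init__(self, horizon=5):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return 0

    def step(self, action):
        self.t += 1
        absorbing = False                # this toy chain has no absorbing state
        last = self.t >= self.horizon    # the episode ends at the horizon
        return self.t, float(action), absorbing, last


class ToyAgent:
    """Hypothetical agent that only records how many samples each fit receives."""

    def __init__(self):
        self.fit_sizes = []

    def draw_action(self, state):
        return 1

    def fit(self, dataset):
        self.fit_sizes.append(len(dataset))


def learn(agent, env, n_steps, n_steps_per_fit):
    """Move the agent for n_steps, fitting every n_steps_per_fit steps."""
    dataset = []
    state = env.reset()
    for _ in range(n_steps):
        action = agent.draw_action(state)
        next_state, reward, absorbing, last = env.step(action)
        dataset.append((state, action, reward, next_state, absorbing, last))
        # Start a new episode when the current one is over.
        state = env.reset() if last else next_state
        if len(dataset) == n_steps_per_fit:
            agent.fit(dataset)
            dataset = []  # assumption: samples are discarded after each fit


agent = ToyAgent()
learn(agent, ToyEnv(horizon=5), n_steps=12, n_steps_per_fit=4)
print(agent.fit_sizes)  # [4, 4, 4]
```

In this sketch the fit schedule ignores episode boundaries: the second fit uses samples from two different episodes. A schedule based on n_episodes_per_fit would instead count the last-step flags before fitting.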

evaluate(initial_states=None, n_steps=None, n_episodes=None, render=False, quiet=False)

This function moves the agent in the environment using its policy, without fitting it. The agent is moved for a given number of steps, for a given number of episodes, or for one whole episode from each state in a set of initial states. By default, the environment is reset.

Parameters:
  • initial_states (np.ndarray, None) – the starting states of each episode;
  • n_steps (int, None) – number of steps to move the agent;
  • n_episodes (int, None) – number of episodes to move the agent;
  • render (bool, False) – whether to render the environment or not;
  • quiet (bool, False) – if True, the progress bar is not shown.
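
A rough sketch of evaluation from a set of initial states follows. It is not the library code: ToyEnv, the standalone evaluate function and the lambda policy are hypothetical, and returning the samples as a list of tuples is an assumption made for the example.

```python
class ToyEnv:
    """Hypothetical counter: the state grows by the action until it reaches `goal`."""

    def __init__(self, goal=3):
        self.goal = goal
        self.state = 0

    def reset(self, state=None):
        # Start from the given state, or from 0 when none is provided.
        self.state = 0 if state is None else state
        return self.state

    def step(self, action):
        self.state += action
        absorbing = self.state >= self.goal
        last = absorbing
        return self.state, -1.0, absorbing, last


def evaluate(policy, env, initial_states=None, n_episodes=None):
    """Run one whole episode per initial state, or n_episodes from the default start."""
    starts = initial_states if initial_states is not None else [None] * n_episodes
    dataset = []
    for start in starts:
        state = env.reset(start)
        last = False
        while not last:
            action = policy(state)
            next_state, reward, absorbing, last = env.step(action)
            dataset.append((state, action, reward, next_state, absorbing, last))
            state = next_state
    return dataset  # no fit is performed: the policy is only executed


dataset = evaluate(lambda state: 1, ToyEnv(goal=3), initial_states=[0, 2])
print(len(dataset))  # 4: three steps from state 0, one step from state 2
```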

_step(render)

Single step.

Parameters: render (bool) – whether to render or not.
Returns: A tuple containing the previous state, the action sampled by the agent, the reward obtained, the reached state, the absorbing flag of the reached state, and the last step flag.
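
The difference between the absorbing flag and the last step flag can be sketched as below. The standalone step function, its transition rule, and the horizon and goal parameters are all hypothetical; in particular, setting the last step flag when the reached state is absorbing or when a horizon is hit is an assumption of this example, not a statement about the library.

```python
def step(state, action, episode_steps, horizon=3, goal=2):
    """Return (state, action, reward, next_state, absorbing, last) for one transition."""
    next_state = state + action
    absorbing = next_state == goal          # the reached state ends the task
    reward = 1.0 if absorbing else 0.0
    # Assumption: the episode also stops when the step budget is exhausted.
    last = absorbing or episode_steps + 1 >= horizon
    return state, action, reward, next_state, absorbing, last


ordinary = step(state=0, action=1, episode_steps=0)
reached_goal = step(state=1, action=1, episode_steps=1)
timed_out = step(state=0, action=0, episode_steps=2)

print(ordinary)      # (0, 1, 0.0, 1, False, False)
print(reached_goal)  # (1, 1, 1.0, 2, True, True)
print(timed_out)     # (0, 0, 0.0, 0, False, True)
```

The third sample shows a step that ends the episode without reaching an absorbing state, which is why the two flags are reported separately.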

reset(initial_states=None)

Reset the state of the agent.