Environments

In mushroom_rl we distinguish between two different types of environment classes:

proper environments
generators

While environments directly implement the Environment interface, generators are a set of methods used to generate finite Markov chains that represent a specific environment (e.g., grid worlds).

Environments

Atari

class Atari(name, width=84, height=84, ends_at_life=False, max_pooling=True, history_length=4, max_no_op_actions=30)

Bases: Environment

The Atari environment as presented in: “Human-level control through deep reinforcement learning”. Mnih et al.. 2015.

__init__(name, width=84, height=84, ends_at_life=False, max_pooling=True, history_length=4, max_no_op_actions=30)

Constructor.

Parameters:

name (str) – id name of the Atari game in Gym;
width (int, 84) – width of the screen;
height (int, 84) – height of the screen;
ends_at_life (bool, False) – whether the episode ends when a life is lost or not;
max_pooling (bool, True) – whether to do max-pooling or average-pooling of the last two frames when using NoFrameskip;
history_length (int, 4) – number of frames to form a state;
max_no_op_actions (int, 30) – maximum number of no-op actions to execute at the beginning of an episode.
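The `max_pooling` and `history_length` options describe two frame-level operations: combining the last two raw frames pixel by pixel, and stacking the most recent frames into one state. The pure-Python sketch below illustrates both, assuming frames are 2D lists of grayscale values; `max_pool_frames`, `average_pool_frames`, and `FrameHistory` are hypothetical helpers, not the mushroom_rl implementation, and filling the history with copies of the first frame at reset is an assumed (though common) choice.

```python
from collections import deque


def max_pool_frames(frame_a, frame_b):
    """Pixel-wise maximum of two equally sized 2D frames (hypothetical helper)."""
    return [[max(a, b) for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(frame_a, frame_b)]


def average_pool_frames(frame_a, frame_b):
    """Pixel-wise mean of two frames, i.e. the max_pooling=False alternative."""
    return [[(a + b) / 2 for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(frame_a, frame_b)]


class FrameHistory:
    """Keeps the last `history_length` frames; the stack forms one state."""

    def __init__(self, history_length=4):
        self._frames = deque(maxlen=history_length)

    def reset(self, first_frame):
        # Assumption: at episode start, fill the history with copies of the first frame.
        for _ in range(self._frames.maxlen):
            self._frames.append(first_frame)
        return list(self._frames)

    def push(self, frame):
        # Once maxlen is reached, the oldest frame is dropped automatically.
        self._frames.append(frame)
        return list(self._frames)


# Pool two consecutive 2x2 frames, then stack the result into a 4-frame state.
f1 = [[0, 10], [20, 30]]
f2 = [[5, 5], [25, 0]]
pooled = max_pool_frames(f1, f2)        # [[5, 10], [25, 30]]
history = FrameHistory(history_length=4)
state = history.reset(pooled)
print(len(state), pooled)
```

Using a `deque` with `maxlen` keeps the update O(1) per step and makes the "drop the oldest frame" behaviour implicit.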

reset(state=None)

Reset the current state.

Parameters: state (np.ndarray, None) – the state to set as the current state.
Returns: The current state.

step(action)

Move the agent from its current state according to the action.

Parameters: action (np.ndarray) – the action to execute.
Returns: The state reached by the agent executing action in its current state, the reward obtained in the transition, and a flag signaling whether the next state is absorbing. An additional dictionary (possibly empty) is also returned.
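The reset/step contract can be exercised with a toy environment. `ToyChain` and `run_episode` are hypothetical, written only to show the shape of the interaction loop: `reset` returns a state, and `step` returns `(next_state, reward, absorbing, info)`. The class does not subclass the real mushroom_rl `Environment`, and it uses plain integers instead of `np.ndarray` states and actions.

```python
import random


class ToyChain:
    """Hypothetical 1D chain: states 0..n-1, action 1 moves right, action 0 moves left."""

    def __init__(self, n_states=5, horizon=20):
        self.n_states = n_states
        self.horizon = horizon
        self._state = 0

    def reset(self, state=None):
        # Mirror the documented signature: optionally force the initial state.
        self._state = 0 if state is None else state
        return self._state

    def step(self, action):
        move = 1 if action == 1 else -1
        self._state = min(max(self._state + move, 0), self.n_states - 1)
        absorbing = self._state == self.n_states - 1
        reward = 1.0 if absorbing else 0.0
        return self._state, reward, absorbing, {}


def run_episode(env, policy):
    """Roll out one episode until an absorbing state or the horizon is reached."""
    state = env.reset()
    total_reward = 0.0
    for _ in range(env.horizon):
        state, reward, absorbing, _info = env.step(policy(state))
        total_reward += reward
        if absorbing:
            break
    return state, total_reward


env = ToyChain()
print(run_episode(env, lambda s: 1))                      # (4, 1.0)
print(run_episode(env, lambda s: random.choice([0, 1])))
```

The loop stops on whichever comes first, the absorbing flag or the horizon, which is the usual way the absorbing flag and the horizon interact.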

render(record=False)

Parameters: record (bool, False) – whether the visualized image should be returned or not.
Returns: The visualized image, or None if the record flag is set to False.

static _bound(x, min_value, max_value)

Method used to bound state and action variables.

Parameters:

x – the variable to bound;
min_value – the minimum value;
max_value – the maximum value;

Returns:

The bounded variable.
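Bounding amounts to clipping a value into `[min_value, max_value]`. The plain-Python stand-in below is written for illustration only; it is not the library's implementation and handles scalars and flat lists. With NumPy arrays, per-element bounds are usually handled with `np.clip`.

```python
def bound(x, min_value, max_value):
    """Clip a scalar, or each element of a list, into [min_value, max_value]."""
    if isinstance(x, (list, tuple)):
        return [bound(v, min_value, max_value) for v in x]
    return max(min_value, min(x, max_value))


print(bound(7.5, -5.0, 5.0))          # 5.0
print(bound([-9, 0, 3], -5, 5))       # [-5, 0, 3]
```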

property info: Returns: An object containing the info of the environment.

static list_registered()

List registered environments.

Returns: The list of the registered environments.

static make(env_name, *args, **kwargs)

Generate an environment given an environment name and parameters. The environment is created using the generate method, if available; otherwise, the constructor is used. The generate method has a simpler interface than the constructor, making it easier to generate a standard version of the environment. If the environment name contains a ‘.’ separator, the string is split: the first element is used to select the environment and the other elements are passed as positional parameters.

Parameters:

env_name (str) – name of the environment;
*args – positional arguments to be provided to the environment generator;
**kwargs – keyword arguments to be provided to the environment generator.

Returns:

An instance of the constructed environment.
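The name-splitting rule of `make` can be illustrated with a tiny registry. Everything here (`REGISTRY`, `register`, `make`, the `ChainEnv` class and its `generate` classmethod) is hypothetical and only mimics the documented behaviour: split on ‘.’, use the first token to pick the class, pass the remaining tokens as positional arguments, and prefer `generate` over the constructor when it exists.

```python
REGISTRY = {}


def register(cls):
    """Add a class to the registry under its own name (hypothetical helper)."""
    REGISTRY[cls.__name__] = cls
    return cls


def make(env_name, *args, **kwargs):
    # 'ChainEnv.7' -> class 'ChainEnv', extra positional argument '7'.
    if '.' in env_name:
        env_name, *extra = env_name.split('.')
        args = tuple(extra) + args
    env_class = REGISTRY[env_name]
    # Prefer the simpler generate() factory when the class provides one.
    if hasattr(env_class, 'generate'):
        return env_class.generate(*args, **kwargs)
    return env_class(*args, **kwargs)


@register
class ChainEnv:
    def __init__(self, n_states, gamma):
        self.n_states = n_states
        self.gamma = gamma

    @classmethod
    def generate(cls, n_states='5', gamma=0.9):
        # Tokens taken from the name arrive as strings, so convert them here.
        return cls(int(n_states), gamma)


env = make('ChainEnv.7')
print(env.n_states, env.gamma)     # 7 0.9
print(make('ChainEnv').n_states)   # 5
```

Because name tokens are strings, a generate-style factory is a natural place to convert them to the types the constructor expects.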

classmethod register(): Register an environment in the environment list.

seed(seed)

Set the seed of the environment.

Parameters: seed (float) – the value of the seed.

stop()[source]: Method used to stop an mdp. Useful when dealing with real world environments, simulators, or when using openai-gym rendering

set_episode_end(ends_at_life)

Setter.

Parameters: ends_at_life (bool) – whether the episode ends when a life is lost or not.

Car on hill

class CarOnHill(horizon=100, gamma=0.95)

Bases: Environment

The Car On Hill environment as presented in: “Tree-Based Batch Mode Reinforcement Learning”. Ernst D. et al.. 2005.

__init__(horizon=100, gamma=0.95)

Constructor.

Parameters:

horizon (int, 100) – horizon of the problem;
gamma (float, .95) – discount factor.
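The two constructor arguments define the return an agent optimizes: rewards are discounted by `gamma` per step, and the episode is truncated after `horizon` steps. A minimal sketch, assuming a plain list of per-step rewards; `discounted_return` is a hypothetical helper and the reward values are made up for illustration.

```python
def discounted_return(rewards, gamma=0.95, horizon=100):
    """Sum of gamma**t * r_t over at most `horizon` steps."""
    return sum(gamma ** t * r for t, r in enumerate(rewards[:horizon]))


# A single reward of 1 received at step 3 is worth gamma**3.
print(discounted_return([0, 0, 0, 1]))   # approximately 0.857375
```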

reset(state=None)

Reset the current state.

Parameters: state (np.ndarray, None) – the state to set as the current state.
Returns: The current state.

step(action)

Move the agent from its current state according to the action.

Parameters: action (np.ndarray) – the action to execute.
Returns: The state reached by the agent executing action in its current state, the reward obtained in the transition, and a flag signaling whether the next state is absorbing. An additional dictionary (possibly empty) is also returned.

render(record=False)

Parameters: record (bool, False) – whether the visualized image should be returned or not.
Returns: The visualized image, or None if the record flag is set to False.

stop(): Method used to stop an MDP. Useful when dealing with real-world environments, simulators, or when using OpenAI Gym rendering.

static _bound(x, min_value, max_value)

Method used to bound state and action variables.

Parameters:

x – the variable to bound;
min_value – the minimum value;
max_value – the maximum value;

Returns:

The bounded variable.

property info: Returns: An object containing the info of the environment.

static list_registered()

List registered environments.

Returns: The list of the registered environments.

static make(env_name, *args, **kwargs)

Generate an environment given an environment name and parameters. The environment is created using the generate method, if available; otherwise, the constructor is used. The generate method has a simpler interface than the constructor, making it easier to generate a standard version of the environment. If the environment name contains a ‘.’ separator, the string is split: the first element is used to select the environment and the other elements are passed as positional parameters.

Parameters:

env_name (str) – name of the environment;
*args – positional arguments to be provided to the environment generator;
**kwargs – keyword arguments to be provided to the environment generator.

Returns:

An instance of the constructed environment.

classmethod register(): Register an environment in the environment list.

seed(seed)

Set the seed of the environment.

Parameters: seed (float) – the value of the seed.

DeepMind Control Suite

class DMControl(domain_name, task_name, horizon=None, gamma=0.99, task_kwargs=None, dt=0.01, width_screen=480, height_screen=480, camera_id=0, use_pixels=False, pixels_width=64, pixels_height=64)

Bases: Environment

Interface for dm_control suite MuJoCo environments. It makes it possible to use every dm_control suite MuJoCo environment by just providing the necessary information.

__init__(domain_name, task_name, horizon=None, gamma=0.99, task_kwargs=None, dt=0.01, width_screen=480, height_screen=480, camera_id=0, use_pixels=False, pixels_width=64, pixels_height=64)

Constructor.

Parameters:

domain_name (str) – name of the environment;
task_name (str) – name of the task of the environment;
horizon (int, None) – the horizon;
gamma (float, 0.99) – the discount factor;
task_kwargs (dict, None) – parameters of the task;
dt (float, .01) – duration of a control step;
width_screen (int, 480) – width of the screen;
height_screen (int, 480) – height of the screen;
camera_id (int, 0) – position of camera to render the environment;
use_pixels (bool, False) – if True, pixel observations are used rather than the state vector;
pixels_width (int, 64) – width of the pixel observation;
pixels_height (int, 64) – height of the pixel observation.

reset(state=None)

Reset the current state.

Parameters: state (np.ndarray, None) – the state to set as the current state.
Returns: The current state.

step(action)

Move the agent from its current state according to the action.

Parameters: action (np.ndarray) – the action to execute.
Returns: The state reached by the agent executing action in its current state, the reward obtained in the transition, and a flag signaling whether the next state is absorbing. An additional dictionary (possibly empty) is also returned.

render(record=False)

Parameters: record (bool, False) – whether the visualized image should be returned or not.
Returns: The visualized image, or None if the record flag is set to False.

stop(): Method used to stop an MDP. Useful when dealing with real-world environments, simulators, or when using OpenAI Gym rendering.

static _bound(x, min_value, max_value)

Method used to bound state and action variables.

Parameters:

x – the variable to bound;
min_value – the minimum value;
max_value – the maximum value;

Returns:

The bounded variable.

property info: Returns: An object containing the info of the environment.

static list_registered()

List registered environments.

Returns: The list of the registered environments.

static make(env_name, *args, **kwargs)

Generate an environment given an environment name and parameters. The environment is created using the generate method, if available; otherwise, the constructor is used. The generate method has a simpler interface than the constructor, making it easier to generate a standard version of the environment. If the environment name contains a ‘.’ separator, the string is split: the first element is used to select the environment and the other elements are passed as positional parameters.

Parameters:

env_name (str) – name of the environment;
*args – positional arguments to be provided to the environment generator;
**kwargs – keyword arguments to be provided to the environment generator.

Returns:

An instance of the constructed environment.

classmethod register(): Register an environment in the environment list.

seed(seed)

Set the seed of the environment.

Parameters: seed (float) – the value of the seed.

Finite MDP

class FiniteMDP(p, rew, mu=None, gamma=0.9, horizon=inf, dt=0.1)

Bases: Environment

Finite Markov Decision Process.

__init__(p, rew, mu=None, gamma=0.9, horizon=inf, dt=0.1)

Constructor.

Parameters:

p (np.ndarray) – transition probability matrix;
rew (np.ndarray) – reward matrix;
mu (np.ndarray, None) – initial state probability distribution;
gamma (float, .9) – discount factor;
horizon (int, np.inf) – the horizon;
dt (float, 1e-1) – the control timestep of the environment.
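To make the roles of `p` and `rew` concrete, the sketch below assumes both are indexed as `[state][action][next_state]`, so that `p[s][a]` is a probability distribution over next states; this is a common convention for tabular MDPs, and the library source should be checked before relying on it. `check_transition_matrix` and `sample_step` are hypothetical helpers, and nested lists are used instead of NumPy arrays to stay dependency-free.

```python
import random


def check_transition_matrix(p, tol=1e-8):
    """Each p[s][a] must be a probability distribution over next states."""
    for s, per_action in enumerate(p):
        for a, dist in enumerate(per_action):
            if any(q < 0 for q in dist) or abs(sum(dist) - 1.0) > tol:
                raise ValueError(f"p[{s}][{a}] is not a distribution")


def sample_step(p, rew, state, action, rng=random):
    """Sample the next state from p[state][action] and look up the reward."""
    n_states = len(p[state][action])
    next_state = rng.choices(range(n_states), weights=p[state][action])[0]
    return next_state, rew[state][action][next_state]


# Two states, two actions: action 1 in state 0 reaches state 1 with probability 0.8.
p = [[[1.0, 0.0], [0.2, 0.8]],
     [[0.0, 1.0], [0.0, 1.0]]]
rew = [[[0.0, 0.0], [0.0, 1.0]],
       [[0.0, 0.0], [0.0, 0.0]]]
check_transition_matrix(p)
print(sample_step(p, rew, state=0, action=1, rng=random.Random(0)))
```

Passing an explicit `random.Random` instance keeps sampling reproducible, which plays the same role as calling `seed` on an environment.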

reset(state=None)

Reset the current state.

Parameters: state (np.ndarray, None) – the state to set as the current state.
Returns: The current state.

step(action)

Move the agent from its current state according to the action.

Parameters: action (np.ndarray) – the action to execute.
Returns: The state reached by the agent executing action in its current state, the reward obtained in the transition, and a flag signaling whether the next state is absorbing. An additional dictionary (possibly empty) is also returned.

static _bound(x, min_value, max_value)

Method used to bound state and action variables.

Parameters:

x – the variable to bound;
min_value – the minimum value;
max_value – the maximum value;

Returns:

The bounded variable.

property info: Returns: An object containing the info of the environment.

static list_registered()

List registered environments.

Returns: The list of the registered environments.

static make(env_name, *args, **kwargs)

Generate an environment given an environment name and parameters. The environment is created using the generate method, if available; otherwise, the constructor is used. The generate method has a simpler interface than the constructor, making it easier to generate a standard version of the environment. If the environment name contains a ‘.’ separator, the string is split: the first element is used to select the environment and the other elements are passed as positional parameters.

Parameters:

env_name (str) – name of the environment;
*args – positional arguments to be provided to the environment generator;
**kwargs – keyword arguments to be provided to the environment generator.

Returns:

An instance of the constructed environment.

classmethod register(): Register an environment in the environment list.

render(record=False)

Parameters: record (bool, False) – whether the visualized image should be returned or not.
Returns: The visualized image, or None if the record flag is set to False.

seed(seed)

Set the seed of the environment.

Parameters: seed (float) – the value of the seed.

stop(): Method used to stop an MDP. Useful when dealing with real-world environments, simulators, or when using OpenAI Gym rendering.

Grid World

class AbstractGridWorld(mdp_info, height, width, start, goal)

Bases: Environment

Abstract class to build a grid world.

__init__(mdp_info, height, width, start, goal)

Constructor.

Parameters:

mdp_info (MDPInfo) – information about the MDP;
height (int) – height of the grid;
width (int) – width of the grid;
start (tuple) – x-y coordinates of the starting state;
goal (tuple) – x-y coordinates of the goal.

reset(state=None)

Reset the current state.

Parameters: state (np.ndarray, None) – the state to set as the current state.
Returns: The current state.

step(action)

Move the agent from its current state according to the action.

Parameters: action (np.ndarray) – the action to execute.
Returns: The state reached by the agent executing action in its current state, the reward obtained in the transition, and a flag signaling whether the next state is absorbing. An additional dictionary (possibly empty) is also returned.

render(record=False)

Parameters: record (bool, False) – whether the visualized image should be returned or not.
Returns: The visualized image, or None if the record flag is set to False.

stop(): Method used to stop an MDP. Useful when dealing with real-world environments, simulators, or when using OpenAI Gym rendering.

static _bound(x, min_value, max_value)

Method used to bound state and action variables.

Parameters:

x – the variable to bound;
min_value – the minimum value;
max_value – the maximum value;

Returns:

The bounded variable.

property info: Returns: An object containing the info of the environment.

static list_registered()

List registered environments.

Returns: The list of the registered environments.

static make(env_name, *args, **kwargs)

Generate an environment given an environment name and parameters. The environment is created using the generate method, if available; otherwise, the constructor is used. The generate method has a simpler interface than the constructor, making it easier to generate a standard version of the environment. If the environment name contains a ‘.’ separator, the string is split: the first element is used to select the environment and the other elements are passed as positional parameters.

Parameters:

env_name (str) – name of the environment;
*args – positional arguments to be provided to the environment generator;
**kwargs – keyword arguments to be provided to the environment generator.

Returns:

An instance of the constructed environment.

classmethod register(): Register an environment in the environment list.

seed(seed)

Set the seed of the environment.

Parameters: seed (float) – the value of the seed.

class GridWorld(height, width, goal, start=(0, 0), dt=0.1)

Bases: AbstractGridWorld

Standard grid world.

__init__(height, width, goal, start=(0, 0), dt=0.1)

Constructor.

Parameters:

height (int) – height of the grid;
width (int) – width of the grid;
goal (tuple) – 2D coordinates of the goal state;
start (tuple, (0, 0)) – 2D coordinates of the starting state;
dt (float, 0.1) – the control timestep of the environment.
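A grid world with `height` rows and `width` columns is a finite MDP whose cells can be flattened to integer states. The sketch below assumes row-major indexing (`row * width + col`) and a hypothetical action encoding (0 = up, 1 = down, 2 = left, 3 = right); moves that would leave the grid keep the agent in place. Neither choice is taken from the mushroom_rl source, and `MOVES`, `to_index`, and `move` are illustrative names.

```python
MOVES = {0: (-1, 0), 1: (1, 0), 2: (0, -1), 3: (0, 1)}  # hypothetical encoding


def to_index(cell, width):
    """Row-major flattening of a (row, col) cell."""
    return cell[0] * width + cell[1]


def move(cell, action, height, width):
    """Apply an action, clipping the result to the grid boundaries."""
    d_row, d_col = MOVES[action]
    row = min(max(cell[0] + d_row, 0), height - 1)
    col = min(max(cell[1] + d_col, 0), width - 1)
    return row, col


# 3x3 grid with start (2, 0) and goal (0, 2), the GridWorldVanHasselt defaults.
height, width, start, goal = 3, 3, (2, 0), (0, 2)
cell = start
for action in [0, 0, 3, 3]:          # up, up, right, right
    cell = move(cell, action, height, width)
print(cell == goal, to_index(cell, width))   # True 2
```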

static _bound(x, min_value, max_value)

Method used to bound state and action variables.

Parameters:

x – the variable to bound;
min_value – the minimum value;
max_value – the maximum value;

Returns:

The bounded variable.

property info: Returns: An object containing the info of the environment.

static list_registered()

List registered environments.

Returns: The list of the registered environments.

static make(env_name, *args, **kwargs)

Generate an environment given an environment name and parameters. The environment is created using the generate method, if available; otherwise, the constructor is used. The generate method has a simpler interface than the constructor, making it easier to generate a standard version of the environment. If the environment name contains a ‘.’ separator, the string is split: the first element is used to select the environment and the other elements are passed as positional parameters.

Parameters:

env_name (str) – name of the environment;
*args – positional arguments to be provided to the environment generator;
**kwargs – keyword arguments to be provided to the environment generator.

Returns:

An instance of the constructed environment.

classmethod register(): Register an environment in the environment list.

render(record=False)

Parameters: record (bool, False) – whether the visualized image should be returned or not.
Returns: The visualized image, or None if the record flag is set to False.

reset(state=None)

Reset the current state.

Parameters: state (np.ndarray, None) – the state to set as the current state.
Returns: The current state.

seed(seed)

Set the seed of the environment.

Parameters: seed (float) – the value of the seed.

step(action)

Move the agent from its current state according to the action.

Parameters: action (np.ndarray) – the action to execute.
Returns: The state reached by the agent executing action in its current state, the reward obtained in the transition, and a flag signaling whether the next state is absorbing. An additional dictionary (possibly empty) is also returned.

stop(): Method used to stop an MDP. Useful when dealing with real-world environments, simulators, or when using OpenAI Gym rendering.

class GridWorldVanHasselt(height=3, width=3, goal=(0, 2), start=(2, 0), dt=0.1)

Bases: AbstractGridWorld

A variant of the grid world as presented in: “Double Q-Learning”. Hasselt H. V.. 2010.

__init__(height=3, width=3, goal=(0, 2), start=(2, 0), dt=0.1)

Constructor.

Parameters:

height (int, 3) – height of the grid;
width (int, 3) – width of the grid;
goal (tuple, (0, 2)) – 2D coordinates of the goal state;
start (tuple, (2, 0)) – 2D coordinates of the starting state;
dt (float, 0.1) – the control timestep of the environment.

static _bound(x, min_value, max_value)

Method used to bound state and action variables.

Parameters:

x – the variable to bound;
min_value – the minimum value;
max_value – the maximum value;

Returns:

The bounded variable.

property info: Returns: An object containing the info of the environment.

static list_registered()

List registered environments.

Returns: The list of the registered environments.

static make(env_name, *args, **kwargs)

Generate an environment given an environment name and parameters. The environment is created using the generate method, if available; otherwise, the constructor is used. The generate method has a simpler interface than the constructor, making it easier to generate a standard version of the environment. If the environment name contains a ‘.’ separator, the string is split: the first element is used to select the environment and the other elements are passed as positional parameters.

Parameters:

env_name (str) – name of the environment;
*args – positional arguments to be provided to the environment generator;
**kwargs – keyword arguments to be provided to the environment generator.

Returns:

An instance of the constructed environment.

classmethod register(): Register an environment in the environment list.

render(record=False)

Parameters: record (bool, False) – whether the visualized image should be returned or not.
Returns: The visualized image, or None if the record flag is set to False.

reset(state=None)

Reset the current state.

Parameters: state (np.ndarray, None) – the state to set as the current state.
Returns: The current state.

seed(seed)

Set the seed of the environment.

Parameters: seed (float) – the value of the seed.

step(action)

Move the agent from its current state according to the action.

Parameters: action (np.ndarray) – the action to execute.
Returns: The state reached by the agent executing action in its current state, the reward obtained in the transition, and a flag signaling whether the next state is absorbing. An additional dictionary (possibly empty) is also returned.

stop(): Method used to stop an MDP. Useful when dealing with real-world environments, simulators, or when using OpenAI Gym rendering.

Gym

class Gym(name, horizon=None, gamma=0.99, wrappers=None, wrappers_args=None, **env_args)

Bases: Environment

Interface for OpenAI Gym environments. It makes it possible to use every Gym environment by just providing its id, except for the Atari games, which are managed by a separate class.

__init__(name, horizon=None, gamma=0.99, wrappers=None, wrappers_args=None, **env_args)

Constructor.

Parameters:

name (str) – gym id of the environment;
horizon (int, None) – the horizon. If None, the one from Gym is used;
gamma (float, 0.99) – the discount factor;
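wrappers (list, None) – list of wrappers to apply over the environment. It is possible to pass arguments to the wrappers by providing a tuple with two elements: the gym wrapper class and a dictionary containing the parameters needed by the wrapper constructor;
wrappers_args (list, None) – list of arguments for each wrapper;
**env_args – other parameters passed to the Gym environment.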
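The wrapper convention (either a plain wrapper class or a `(wrapper_class, kwargs)` tuple) can be sketched independently of Gym. `apply_wrappers`, `BaseEnv`, `ScaleReward`, and `Negate` are hypothetical stand-ins, not Gym classes; with a real Gym environment the list would contain classes from `gym.wrappers` or custom `gym.Wrapper` subclasses.

```python
def apply_wrappers(env, wrappers):
    """Wrap `env` with each entry: a wrapper class or a (class, kwargs) tuple."""
    for entry in wrappers or []:
        if isinstance(entry, tuple):
            wrapper_cls, wrapper_kwargs = entry
            env = wrapper_cls(env, **wrapper_kwargs)
        else:
            env = entry(env)
    return env


class BaseEnv:
    def step(self, action):
        return action, 1.0   # (observation, reward) for this toy example


class ScaleReward:
    """Toy wrapper multiplying the reward by a constant."""

    def __init__(self, env, scale=1.0):
        self.env = env
        self.scale = scale

    def step(self, action):
        obs, reward = self.env.step(action)
        return obs, reward * self.scale


class Negate:
    """Toy wrapper without constructor arguments."""

    def __init__(self, env):
        self.env = env

    def step(self, action):
        obs, reward = self.env.step(action)
        return obs, -reward


env = apply_wrappers(BaseEnv(), [(ScaleReward, {'scale': 10.0}), Negate])
print(env.step(0))   # (0, -10.0)
```

Wrappers are applied in list order, so the last entry ends up outermost and sees the output of all earlier ones.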

reset(state=None)

Reset the current state.

Parameters: state (np.ndarray, None) – the state to set as the current state.
Returns: The current state.

step(action)

Move the agent from its current state according to the action.

Parameters: action (np.ndarray) – the action to execute.
Returns: The state reached by the agent executing action in its current state, the reward obtained in the transition, and a flag signaling whether the next state is absorbing. An additional dictionary (possibly empty) is also returned.

static _bound(x, min_value, max_value)

Method used to bound state and action variables.

Parameters:

x – the variable to bound;
min_value – the minimum value;
max_value – the maximum value;

Returns:

The bounded variable.

property info: Returns: An object containing the info of the environment.

static list_registered()

List registered environments.

Returns: The list of the registered environments.

static make(env_name, *args, **kwargs)

Generate an environment given an environment name and parameters. The environment is created using the generate method, if available; otherwise, the constructor is used. The generate method has a simpler interface than the constructor, making it easier to generate a standard version of the environment. If the environment name contains a ‘.’ separator, the string is split: the first element is used to select the environment and the other elements are passed as positional parameters.

Parameters:

env_name (str) – name of the environment;
*args – positional arguments to be provided to the environment generator;
**kwargs – keyword arguments to be provided to the environment generator.

Returns:

An instance of the constructed environment.

classmethod register(): Register an environment in the environment list.

render(record=False)

Parameters: record (bool, False) – whether the visualized image should be returned or not.
Returns: The visualized image, or None if the record flag is set to False.

seed(seed)

Set the seed of the environment.

Parameters: seed (float) – the value of the seed.

stop(): Method used to stop an MDP. Useful when dealing with real-world environments, simulators, or when using OpenAI Gym rendering.

Habitat

class Habitat(wrapper, config_file, base_config_file=None, horizon=None, gamma=0.99, width=None, height=None)

Bases: Gym

Interface for Habitat RL environments. This class is very generic and can be used for many Habitat tasks. Depending on the robot / task, you have to use different wrappers, since observation and action spaces may vary.

See <MUSHROOM_RL PATH>/examples/habitat/ for more details.

__init__(wrapper, config_file, base_config_file=None, horizon=None, gamma=0.99, width=None, height=None)

Constructor. For more details on how to pass YAML configuration files, please see <MUSHROOM_RL PATH>/examples/habitat/README.md

Parameters:

wrapper (str) – wrapper for converting observations and actions (e.g., HabitatRearrangeWrapper);
config_file (str) – path to the YAML file specifying the RL task configuration (see <HABITAT_LAB PATH>/habitat_baselines/configs/);
base_config_file (str, None) – path to an optional YAML file, used as ‘BASE_TASK_CONFIG_PATH’ in the first YAML (see <HABITAT_LAB PATH>/configs/);
horizon (int, None) – the horizon;
gamma (float, 0.99) – the discount factor;
width (int, None) – width of the pixel observation. If None, the value specified in the config file is used.
height (int, None) – height of the pixel observation. If None, the value specified in the config file is used.

reset(state=None)

Reset the current state.

Parameters: state (np.ndarray, None) – the state to set as the current state.
Returns: The current state.

step(action)

Move the agent from its current state according to the action.

Parameters: action (np.ndarray) – the action to execute.
Returns: The state reached by the agent executing action in its current state, the reward obtained in the transition, and a flag signaling whether the next state is absorbing. An additional dictionary (possibly empty) is also returned.

stop(): Method used to stop an MDP. Useful when dealing with real-world environments, simulators, or when using OpenAI Gym rendering.

render(record=False)

Parameters: record (bool, False) – whether the visualized image should be returned or not.
Returns: The visualized image, or None if the record flag is set to False.

static _bound(x, min_value, max_value)

Method used to bound state and action variables.

Parameters:

x – the variable to bound;
min_value – the minimum value;
max_value – the maximum value;

Returns:

The bounded variable.

property info: Returns: An object containing the info of the environment.

static list_registered()

List registered environments.

Returns: The list of the registered environments.

static make(env_name, *args, **kwargs)

Generate an environment given an environment name and parameters. The environment is created using the generate method, if available; otherwise, the constructor is used. The generate method has a simpler interface than the constructor, making it easier to generate a standard version of the environment. If the environment name contains a ‘.’ separator, the string is split: the first element is used to select the environment and the other elements are passed as positional parameters.

Parameters:

env_name (str) – name of the environment;
*args – positional arguments to be provided to the environment generator;
**kwargs – keyword arguments to be provided to the environment generator.

Returns:

An instance of the constructed environment.

classmethod register(): Register an environment in the environment list.

seed(seed)

Set the seed of the environment.

Parameters: seed (float) – the value of the seed.

class HabitatNavigationWrapper(env)

Use it for navigation tasks, where the agent has to go from point A to point B. The action space is discrete: stop, turn left, turn right, move forward. The amount of degrees / distance the agent turns / moves is defined in the YAML file. ‘STOP’ ends the episode, and the agent must execute it to get the success reward: for example, if the agent successfully completes the task but does not execute ‘STOP’, it will not get the success reward. The observation is the RGB view of what the agent sees in front of itself. The agent’s true (x,y) position is also added to the ‘info’ dictionary.

__init__(env)

Wraps an environment to allow a modular transformation of the step() and reset() methods.

Parameters: env – the environment to wrap.

reset(): Resets the environment with kwargs.

step(action): Steps through the environment with action.

get_shortest_path(): Returns observations and actions corresponding to the shortest path to the goal. If the goal cannot be reached within the episode step limit, the best path (closest to the goal) is returned.

get_optimal_policy_return(): Returns the undiscounted sum of rewards of the optimal policy.

class HabitatRearrangeWrapper(env)

Use it for the rearrange task, where the robot has to interact with objects. There are several ‘rearrange’ tasks, such as ‘pick’, ‘place’, ‘open X’, ‘close X’, where X can be a door, the fridge, …

Each task has its own actions, observations, rewards, and terminal conditions. Please check the Habitat 2.0 paper for more details: https://arxiv.org/pdf/2106.14405.pdf

This wrapper, instead, uses a common observation / action space.

The observation is the RGB image returned by the sensor mounted on the head of the robot. We also return the end-effector position in the ‘info’ dictionary.

The action is mixed:
- The first elements of the action vector are continuous values for velocity control of the arm’s joints.
- The second-to-last element is a scalar value for picking / placing an object. If this scalar is positive, the gripper is not currently holding an object, and the end-effector is within 15cm of an object, then the object closest to the end-effector is grasped. If the scalar is negative and the gripper is carrying an object, the object is released.
- The last element is a scalar value for stopping the robot. This action ends the episode and allows the agent to get a positive reward if the task has been completed.
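The mixed action vector can be sketched as a decoding function. This assumes the layout `[arm joint velocities..., grip, stop]`, with the gripper scalar second-to-last and the stop scalar last, and it treats a positive stop value as a stop request; that threshold, the function `decode_rearrange_action`, and the command strings are hypothetical. The 15cm proximity check is left to the simulator.

```python
def decode_rearrange_action(action, holding_object):
    """Split a flat action vector into arm, gripper, and stop commands.

    Assumed layout: [joint_vel_0, ..., joint_vel_k, grip, stop].
    """
    *arm_velocities, grip, stop = action
    if grip > 0 and not holding_object:
        grip_command = 'try_grasp'      # succeeds only if an object is close enough
    elif grip < 0 and holding_object:
        grip_command = 'release'
    else:
        grip_command = 'none'
    return arm_velocities, grip_command, stop > 0


print(decode_rearrange_action([0.1, -0.2, 0.0, 0.5, -1.0], holding_object=False))
# ([0.1, -0.2, 0.0], 'try_grasp', False)
```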

__init__(env)

Wraps an environment to allow a modular transformation of the step() and reset() methods.

Parameters: env – the environment to wrap.

reset(): Resets the environment with kwargs.

step(action): Steps through the environment with action.

iGibson

class iGibson(config_file, horizon=None, gamma=0.99, is_discrete=False, width=None, height=None, debug_gui=False, verbose=False)

Bases: Gym

Interface for iGibson https://github.com/StanfordVL/iGibson

There are both navigation and interaction tasks. Observations are pixel images of what the agent sees in front of itself. Image resolution is specified in the config file. By default, actions are continuous, but can be discretized automatically using a flag. Note that not all robots support discrete actions.

Scene and task details are defined in the YAML config file.

__init__(config_file, horizon=None, gamma=0.99, is_discrete=False, width=None, height=None, debug_gui=False, verbose=False)

Constructor.

Parameters:

config_file (str) – path to the YAML file specifying the task (see igibson/examples/configs/ and igibson/test/);
horizon (int, None) – the horizon;
gamma (float, 0.99) – the discount factor;
is_discrete (bool, False) – if True, actions are automatically discretized by iGibson’s set_up_discrete_action_space. Please note that not all robots support discrete actions;
width (int, None) – width of the pixel observation. If None, the value specified in the config file is used;
height (int, None) – height of the pixel observation. If None, the value specified in the config file is used;
debug_gui (bool, False) – if True, iGibson runs in GUI mode, showing the pybullet rendering and the robot camera;
verbose (bool, False) – if False, iGibson default messages are disabled.

reset(state=None)[source]

Reset the current state.

Parameters:: state (np.ndarray, None) – the state to set to the current state.
Returns:: The current state.

step(action)[source]

Move the agent from its current state according to the action.

Parameters:: action (np.ndarray) – the action to execute.
Returns:: The state reached by the agent executing action in its current state, the reward obtained in the transition and a flag to signal if the next state is absorbing. Also, an additional dictionary is returned (possibly empty).

stop()[source]: Method used to stop an mdp. Useful when dealing with real world environments, simulators, or when using openai-gym rendering

render(record=False)[source]

Parameters:: record (bool, False) – whether the visualized image should be returned or not.
Returns:: The visualized image, or None if the record flag is set to false.

static _bound(x, min_value, max_value)

Method used to bound state and action variables.

Parameters:

x – the variable to bound;
min_value – the minimum value;
max_value – the maximum value;

Returns:

The bounded variable.
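The clipping performed by _bound can be sketched in pure Python as below. This is a minimal illustration assuming element-wise clipping of x into [min_value, max_value]; the library method works on NumPy arrays, and bound is a hypothetical stand-alone helper, not part of MushroomRL.

```python
from typing import List, Sequence, Union

Number = Union[int, float]


def bound(x: Sequence[Number],
          min_value: Union[Number, Sequence[Number]],
          max_value: Union[Number, Sequence[Number]]) -> List[float]:
    """Hypothetical helper: clip every component of x into [min_value, max_value].

    Scalar limits apply to all components; sequence limits apply per component.
    """
    n = len(x)
    lows = [min_value] * n if isinstance(min_value, (int, float)) else list(min_value)
    highs = [max_value] * n if isinstance(max_value, (int, float)) else list(max_value)
    return [max(lo, min(v, hi)) for v, lo, hi in zip(x, lows, highs)]


# Clip three torque values to a symmetric limit of 5.0 (the InvertedPendulum default max_u)
print(bound([7.5, -2.0, -9.0], -5.0, 5.0))  # [5.0, -2.0, -5.0]
```

Per-component limits are passed as sequences, e.g. bound([0.5, 3.0], [0.0, 0.0], [1.0, 2.0]).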

property info: Returns: An object containing the info of the environment.

static list_registered()

List registered environments.

Returns:: The list of the registered environments.

static make(env_name, *args, **kwargs)

Generate an environment given an environment name and parameters. The environment is created using the generate method, if available; otherwise, the constructor is used. The generate method has a simpler interface than the constructor, making it easier to generate a standard version of the environment. If the environment name contains a ‘.’ separator, the string is split: the first element selects the environment and the remaining elements are passed as positional parameters.

Parameters:

env_name (str) – name of the environment;
*args – positional arguments to be provided to the environment generator;
**kwargs – keyword arguments to be provided to the environment generator.

Returns:

An instance of the constructed environment.
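The lookup rule described for make can be illustrated with a small, self-contained registry. Everything here (REGISTRY, register, make and the toy Chain class) is hypothetical and only mirrors the documented behaviour: prefer a generate method, otherwise call the constructor, and pass the dot-separated parts of the name as leading positional arguments. Note that those parts arrive as strings, so the target has to convert them.

```python
# Hypothetical registry mirroring the documented make()/register() behaviour.
REGISTRY = {}


def register(cls):
    """Store a class under its own name."""
    REGISTRY[cls.__name__] = cls
    return cls


def make(env_name, *args, **kwargs):
    """Build an environment from a possibly dot-separated name."""
    if "." in env_name:
        parts = env_name.split(".")
        env_name = parts[0]
        # Name parts come first, followed by any explicit positional arguments.
        args = tuple(parts[1:]) + args
    env_cls = REGISTRY[env_name]
    factory = getattr(env_cls, "generate", None)
    if factory is not None:
        return factory(*args, **kwargs)
    return env_cls(*args, **kwargs)


@register
class Chain:
    """Hypothetical toy environment used only to exercise the registry."""

    def __init__(self, size, gamma=0.9):
        self.size = size
        self.gamma = gamma

    @staticmethod
    def generate(size="5", gamma=0.9):
        # Arguments taken from the name are strings, so convert them here.
        return Chain(int(size), gamma)


env = make("Chain.10", gamma=0.99)
print(env.size, env.gamma)  # 10 0.99
```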

classmethod register(): Register an environment in the environment list.

seed(seed)

Set the seed of the environment.

Parameters:: seed (float) – the value of the seed.

Inverted pendulum

class InvertedPendulum(random_start=False, m=1.0, l=1.0, g=9.8, mu=0.01, max_u=5.0, horizon=5000, gamma=0.99)[source]

Bases: Environment

The Inverted Pendulum environment (continuous version) as presented in: “Reinforcement Learning In Continuous Time and Space”. Doya K.. 2000. “Off-Policy Actor-Critic”. Degris T. et al.. 2012. “Deterministic Policy Gradient Algorithms”. Silver D. et al. 2014.

__init__(random_start=False, m=1.0, l=1.0, g=9.8, mu=0.01, max_u=5.0, horizon=5000, gamma=0.99)[source]

Constructor.

Parameters:

random_start (bool, False) – whether to start from a random position or from the horizontal one;
m (float, 1.0) – mass of the pendulum;
l (float, 1.0) – length of the pendulum;
g (float, 9.8) – gravity acceleration constant;
mu (float, 1e-2) – friction constant of the pendulum;
max_u (float, 5.0) – maximum allowed input torque;
horizon (int, 5000) – horizon of the problem;
gamma (float, .99) – discount factor.

reset(state=None)[source]

Reset the current state.

Parameters:: state (np.ndarray, None) – the state to set to the current state.
Returns:: The current state.

step(action)[source]

Move the agent from its current state according to the action.

Parameters:: action (np.ndarray) – the action to execute.
Returns:: The state reached by the agent executing action in its current state, the reward obtained in the transition and a flag to signal if the next state is absorbing. Also, an additional dictionary is returned (possibly empty).
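The reset/step contract above is enough to write an episode loop. The sketch below uses a hypothetical one-dimensional ToyEnv in place of InvertedPendulum, so it runs without MushroomRL, and assumes step returns the (next_state, reward, absorbing, info) tuple described above. The loop accumulates the discounted return and stops on an absorbing state or when the horizon is reached.

```python
class ToyEnv:
    """Hypothetical stand-in: the state drifts by the action; |x| >= 3 is absorbing."""

    def __init__(self, horizon=20, gamma=0.99):
        self.horizon = horizon
        self.gamma = gamma
        self._state = 0.0

    def reset(self, state=None):
        # Mirror the documented signature: an explicit state overrides the default.
        self._state = 0.0 if state is None else float(state)
        return self._state

    def step(self, action):
        self._state += action
        absorbing = abs(self._state) >= 3.0
        reward = -abs(self._state)
        return self._state, reward, absorbing, {}


def run_episode(env, policy, initial_state=None):
    """Roll out one episode and return (discounted return, number of steps)."""
    state = env.reset(initial_state)
    ret, discount = 0.0, 1.0
    for t in range(env.horizon):
        state, reward, absorbing, _info = env.step(policy(state))
        ret += discount * reward
        discount *= env.gamma
        if absorbing:
            return ret, t + 1
    return ret, env.horizon


ret, steps = run_episode(ToyEnv(gamma=0.5), policy=lambda s: 1.0)
print(steps, ret)  # 3 -2.75
```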

render(record=False)[source]

Parameters:: record (bool, False) – whether the visualized image should be returned or not.
Returns:: The visualized image, or None if the record flag is set to false.

stop()[source]: Method used to stop an mdp. Useful when dealing with real world environments, simulators, or when using openai-gym rendering

static _bound(x, min_value, max_value)

Method used to bound state and action variables.

Parameters:

x – the variable to bound;
min_value – the minimum value;
max_value – the maximum value;

Returns:

The bounded variable.

property info: Returns: An object containing the info of the environment.

static list_registered()

List registered environments.

Returns:: The list of the registered environments.

static make(env_name, *args, **kwargs)

Generate an environment given an environment name and parameters. The environment is created using the generate method, if available; otherwise, the constructor is used. The generate method has a simpler interface than the constructor, making it easier to generate a standard version of the environment. If the environment name contains a ‘.’ separator, the string is split: the first element selects the environment and the remaining elements are passed as positional parameters.

Parameters:

env_name (str) – name of the environment;
*args – positional arguments to be provided to the environment generator;
**kwargs – keyword arguments to be provided to the environment generator.

Returns:

An instance of the constructed environment.

classmethod register(): Register an environment in the environment list.

seed(seed)

Set the seed of the environment.

Parameters:: seed (float) – the value of the seed.

Cart Pole

class CartPole(m=2.0, M=8.0, l=0.5, g=9.8, mu=0.01, max_u=50.0, noise_u=10.0, horizon=3000, gamma=0.95)[source]

Bases: Environment

The Inverted Pendulum on a Cart environment as presented in: “Least-Squares Policy Iteration”. Lagoudakis M. G. and Parr R.. 2003.

__init__(m=2.0, M=8.0, l=0.5, g=9.8, mu=0.01, max_u=50.0, noise_u=10.0, horizon=3000, gamma=0.95)[source]

Constructor.

Parameters:

m (float, 2.0) – mass of the pendulum;
M (float, 8.0) – mass of the cart;
l (float, .5) – length of the pendulum;
g (float, 9.8) – gravity acceleration constant;
max_u (float, 50.) – maximum allowed input force applied to the cart;
noise_u (float, 10.) – maximum noise on the action;
horizon (int, 3000) – horizon of the problem;
gamma (float, .95) – discount factor.
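The role of max_u and noise_u can be sketched as follows. This assumes, as in the LSPI benchmark that CartPole cites, three discrete actions mapped to a force of -max_u, 0 or +max_u, perturbed by noise drawn uniformly from [-noise_u, noise_u]. action_to_force is a hypothetical helper, not part of MushroomRL.

```python
import random


def action_to_force(action, max_u=50.0, noise_u=10.0, rng=random):
    """Hypothetical helper: map a discrete action {0, 1, 2} to a noisy force."""
    nominal = (-max_u, 0.0, max_u)[action]  # push left, no force, push right
    # Assumed noise model: uniform perturbation bounded by noise_u.
    return nominal + rng.uniform(-noise_u, noise_u)


rng = random.Random(0)
forces = [action_to_force(a, rng=rng) for a in (0, 1, 2)]
# Each force lies within noise_u of its nominal value.
print([round(f, 1) for f in forces])
```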

reset(state=None)[source]

Reset the current state.

Parameters:: state (np.ndarray, None) – the state to set to the current state.
Returns:: The current state.

step(action)[source]

Move the agent from its current state according to the action.

Parameters:: action (np.ndarray) – the action to execute.
Returns:: The state reached by the agent executing action in its current state, the reward obtained in the transition and a flag to signal if the next state is absorbing. Also, an additional dictionary is returned (possibly empty).

render(record=False)[source]

Parameters:: record (bool, False) – whether the visualized image should be returned or not.
Returns:: The visualized image, or None if the record flag is set to false.

stop()[source]: Method used to stop an mdp. Useful when dealing with real world environments, simulators, or when using openai-gym rendering

static _bound(x, min_value, max_value)

Method used to bound state and action variables.

Parameters:

x – the variable to bound;
min_value – the minimum value;
max_value – the maximum value;

Returns:

The bounded variable.

property info: Returns: An object containing the info of the environment.

static list_registered()

List registered environments.

Returns:: The list of the registered environments.

static make(env_name, *args, **kwargs)

Generate an environment given an environment name and parameters. The environment is created using the generate method, if available; otherwise, the constructor is used. The generate method has a simpler interface than the constructor, making it easier to generate a standard version of the environment. If the environment name contains a ‘.’ separator, the string is split: the first element selects the environment and the remaining elements are passed as positional parameters.

Parameters:

env_name (str) – name of the environment;
*args – positional arguments to be provided to the environment generator;
**kwargs – keyword arguments to be provided to the environment generator.

Returns:

An instance of the constructed environment.

classmethod register(): Register an environment in the environment list.

seed(seed)

Set the seed of the environment.

Parameters:: seed (float) – the value of the seed.

LQR

class LQR(A, B, Q, R, max_pos=inf, max_action=inf, random_init=False, episodic=False, gamma=0.9, horizon=50, initial_state=None, dt=0.1)[source]

Bases: Environment

This class implements a Linear-Quadratic Regulator. The task is to minimize undesired deviations from the nominal values of some controller settings in a control problem. The system equations are:

\[x_{t+1} = Ax_t + Bu_t\]

where x is the state and u is the control signal.

The reward function is given by:

\[r_t = -\left( x_t^TQx_t + u_t^TRu_t \right)\]

“Policy gradient approaches for multi-objective sequential decision making”. Parisi S., Pirotta M., Smacchia N., Bascetta L., Restelli M.. 2014
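A single transition of the system above can be checked by hand. The sketch below implements the two equations directly with plain Python lists (the helper names are illustrative, not MushroomRL functions) and evaluates them for a hypothetical 2-D system. Following the reward equation, the reward uses the current state and action, before the state is updated.

```python
def mat_vec(M, v):
    """Matrix-vector product, with M given as a list of rows."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]


def quad(v, M):
    """Quadratic form v^T M v."""
    return sum(v_i * mv_i for v_i, mv_i in zip(v, mat_vec(M, v)))


def lqr_step(x, u, A, B, Q, R):
    """Return (x_{t+1}, r_t) for x_{t+1} = A x_t + B u_t and r_t = -(x_t^T Q x_t + u_t^T R u_t)."""
    reward = -(quad(x, Q) + quad(u, R))
    next_x = [a + b for a, b in zip(mat_vec(A, x), mat_vec(B, u))]
    return next_x, reward


# Hypothetical 2-D system with a single control input.
A = [[1.0, 0.1], [0.0, 1.0]]
B = [[0.0], [0.1]]
Q = [[1.0, 0.0], [0.0, 1.0]]
R = [[0.5]]
x_next, r = lqr_step([1.0, 2.0], [-1.0], A, B, Q, R)
print(x_next, r)
```

By hand: x^T Q x = 1 + 4 = 5 and u^T R u = 0.5, so r = -5.5, while A x + B u = [1.2, 2.0] + [0.0, -0.1] = [1.2, 1.9].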

__init__(A, B, Q, R, max_pos=inf, max_action=inf, random_init=False, episodic=False, gamma=0.9, horizon=50, initial_state=None, dt=0.1)[source]

Constructor.

static generate(dimensions=None, s_dim=None, a_dim=None, max_pos=inf, max_action=inf, eps=0.1, index=0, scale=1.0, random_init=False, episodic=False, gamma=0.9, horizon=50, initial_state=None)[source]

Factory method that generates an LQR with identity dynamics and symmetric reward matrices.

Parameters:

dimensions (int) – number of state-action dimensions;
s_dim (int) – number of state dimensions;
a_dim (int) – number of action dimensions;
max_pos (float, np.inf) – maximum value of the state;
max_action (float, np.inf) – maximum value of the action;
eps (float, .1) – reward matrix weights specifier;
index (int, 0) – selector for the principal state;
scale (float, 1.0) – scaling factor for the reward function;
random_init (bool, False) – start from a random state;
episodic (bool, False) – end the episode when the state goes over the threshold;
gamma (float, .9) – discount factor;
horizon (int, 50) – horizon of the MDP.
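One way eps, index and scale can shape the reward matrices is sketched below for the square case (dimensions set; the s_dim/a_dim variant is not covered). The construction is an assumption based on the multi-objective setting of the cited Parisi et al. paper: identity A and B, and diagonal Q and R in which the principal dimension index gets weight 1 - eps in Q and eps in R, with the weights swapped elsewhere, all multiplied by scale. generate_matrices is a hypothetical helper; check the MushroomRL source before relying on these exact values.

```python
def generate_matrices(dimensions, eps=0.1, index=0, scale=1.0):
    """Hypothetical helper: build (A, B, Q, R) as lists of rows for a square LQR."""

    def diag(values):
        return [[values[i] if i == j else 0.0 for j in range(dimensions)]
                for i in range(dimensions)]

    identity = diag([1.0] * dimensions)
    q = [eps * scale] * dimensions
    r = [(1.0 - eps) * scale] * dimensions
    # Assumed: the principal dimension trades state cost against action cost the other way round.
    q[index] = (1.0 - eps) * scale
    r[index] = eps * scale
    return identity, [row[:] for row in identity], diag(q), diag(r)


A, B, Q, R = generate_matrices(3, eps=0.1, index=1)
print([Q[i][i] for i in range(3)], [R[i][i] for i in range(3)])
```

Diagonal matrices are trivially symmetric, which matches the description of the factory above.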

reset(state=None)[source]

Reset the current state.

Parameters:: state (np.ndarray, None) – the state to set to the current state.
Returns:: The current state.

step(action)[source]

Move the agent from its current state according to the action.

Parameters:: action (np.ndarray) – the action to execute.
Returns:: The state reached by the agent executing action in its current state, the reward obtained in the transition and a flag to signal if the next state is absorbing. Also, an additional dictionary is returned (possibly empty).

static _bound(x, min_value, max_value)

Method used to bound state and action variables.

Parameters:

x – the variable to bound;
min_value – the minimum value;
max_value – the maximum value;

Returns:

The bounded variable.

property info: Returns: An object containing the info of the environment.

static list_registered()

List registered environments.

Returns:: The list of the registered environments.

static make(env_name, *args, **kwargs)

Generate an environment given an environment name and parameters. The environment is created using the generate method, if available; otherwise, the constructor is used. The generate method has a simpler interface than the constructor, making it easier to generate a standard version of the environment. If the environment name contains a ‘.’ separator, the string is split: the first element selects the environment and the remaining elements are passed as positional parameters.

Parameters:

env_name (str) – name of the environment;
*args – positional arguments to be provided to the environment generator;
**kwargs – keyword arguments to be provided to the environment generator.

Returns:

An instance of the constructed environment.

classmethod register(): Register an environment in the environment list.

render(record=False)

Parameters:: record (bool, False) – whether the visualized image should be returned or not.
Returns:: The visualized image, or None if the record flag is set to false.

seed(seed)

Set the seed of the environment.

Parameters:: seed (float) – the value of the seed.

stop(): Method used to stop an mdp. Useful when dealing with real world environments, simulators, or when using openai-gym rendering

Minigrid

class MiniGrid(name, horizon=None, gamma=0.99, history_length=4, fixed_seed=None, use_pixels=False)[source]

Bases: Gym

Interface for gym_minigrid environments. It makes it possible to use all MiniGrid environments that do not use text instructions, such as MultiRoom, KeyCorridor, BlockedUnblockPickup and ObstructedMaze. This environment uses either MiniGrid's default 7x7x3 observations or their 56x56x3 pixel version. In both cases, the state is partially observable. To compensate for the partial observability, the last history_length observations are stacked into the state using LazyFrames.

__init__(name, horizon=None, gamma=0.99, history_length=4, fixed_seed=None, use_pixels=False)[source]

Constructor.

Parameters:

name (str) – name of the environment;
horizon (int, None) – the horizon;
gamma (float, 0.99) – the discount factor;
history_length (int, 4) – number of frames to form a state;
fixed_seed (int, None) – if passed, it fixes the seed of the environment at every reset. This way, the environment is fixed rather than procedurally generated;
use_pixels (bool, False) – if True, MiniGrid's default 7x7x3 observations are converted to an image of resolution 56x56x3.
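The history mechanism can be pictured as a fixed-length queue of the most recent history_length observations. The sketch below uses collections.deque in place of LazyFrames (which, as assumed here, additionally avoid copying the stacked frames) and fills the queue with copies of the first observation on reset, an assumption borrowed from common Atari-style preprocessing. FrameStack is hypothetical.

```python
from collections import deque


class FrameStack:
    """Hypothetical sketch: keep the last `history_length` observations as the state."""

    def __init__(self, history_length=4):
        self.history_length = history_length
        self._frames = deque(maxlen=history_length)

    def reset(self, first_obs):
        # Assumed: pad the history with the first observation so the state has a fixed size.
        self._frames.clear()
        for _ in range(self.history_length):
            self._frames.append(first_obs)
        return tuple(self._frames)

    def step(self, obs):
        # The oldest frame is dropped automatically once maxlen is reached.
        self._frames.append(obs)
        return tuple(self._frames)


stack = FrameStack(history_length=3)
print(stack.reset("o0"))  # ('o0', 'o0', 'o0')
print(stack.step("o1"))   # ('o0', 'o0', 'o1')
print(stack.step("o2"))   # ('o0', 'o1', 'o2')
```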

reset(state=None)[source]

Reset the current state.

Parameters:: state (np.ndarray, None) – the state to set to the current state.
Returns:: The current state.

step(action)[source]

Move the agent from its current state according to the action.

Parameters:: action (np.ndarray) – the action to execute.
Returns:: The state reached by the agent executing action in its current state, the reward obtained in the transition and a flag to signal if the next state is absorbing. Also, an additional dictionary is returned (possibly empty).

static _bound(x, min_value, max_value)

Method used to bound state and action variables.

Parameters:

x – the variable to bound;
min_value – the minimum value;
max_value – the maximum value;

Returns:

The bounded variable.

property info: Returns: An object containing the info of the environment.

static list_registered()

List registered environments.

Returns:: The list of the registered environments.

static make(env_name, *args, **kwargs)

Generate an environment given an environment name and parameters. The environment is created using the generate method, if available; otherwise, the constructor is used. The generate method has a simpler interface than the constructor, making it easier to generate a standard version of the environment. If the environment name contains a ‘.’ separator, the string is split: the first element selects the environment and the remaining elements are passed as positional parameters.

Parameters:

env_name (str) – name of the environment;
*args – positional arguments to be provided to the environment generator;
**kwargs – keyword arguments to be provided to the environment generator.

Returns:

An instance of the constructed environment.

classmethod register(): Register an environment in the environment list.

render(record=False)

Parameters:: record (bool, False) – whether the visualized image should be returned or not.
Returns:: The visualized image, or None if the record flag is set to false.

seed(seed)

Set the seed of the environment.

Parameters:: seed (float) – the value of the seed.

stop(): Method used to stop an mdp. Useful when dealing with real world environments, simulators, or when using openai-gym rendering

Mujoco

class MuJoCo(xml_file, actuation_spec, observation_spec, gamma, horizon, timestep=None, n_substeps=1, n_intermediate_steps=1, additional_data_spec=None, collision_groups=None, max_joint_vel=None, **viewer_params)[source]

Bases: Environment

Class to create a Mushroom environment using the MuJoCo simulator.

__init__(xml_file, actuation_spec, observation_spec, gamma, horizon, timestep=None, n_substeps=1, n_intermediate_steps=1, additional_data_spec=None, collision_groups=None, max_joint_vel=None, **viewer_params)[source]

Constructor.

Parameters:

xml_file (str/xml handle) – A string with a path to the xml or a MuJoCo xml handle;
actuation_spec (list) – A list specifying the names of the joints which should be controllable by the agent. Can be left empty when all actuators should be used;
observation_spec (list) – A list containing the names of data that should be made available to the agent as an observation and their type (ObservationType). They are combined with a key, which is used to access the data. An entry in the list is given by: (key, name, type). The name can later be used to retrieve specific observations;
gamma (float) – The discounting factor of the environment;
horizon (int) – The maximum horizon for the environment;
timestep (float) – The timestep used by the MuJoCo simulator. If None, the default timestep specified in the XML will be used;
n_substeps (int, 1) – The number of substeps to use by the MuJoCo simulator. An action given by the agent will be applied for n_substeps before the agent receives the next observation and can act accordingly;
n_intermediate_steps (int, 1) – The number of steps between every action taken by the agent. Similar to n_substeps but allows the user to modify, control and access intermediate states.
additional_data_spec (list, None) – A list containing the data fields of interest, which should be read from or written to during simulation. The entries are given as tuples (key, name, type), where key is a string for later referencing in the “read_data” and “write_data” methods, name is the name of the object in the XML specification, and type is the ObservationType;
collision_groups (list, None) – A list containing groups of geoms for which collisions should be checked during simulation via check_collision. The entries are given as: (key, geom_names), where key is a string for later referencing in the “check_collision” method, and geom_names is a list of geom names in the XML specification.
max_joint_vel (list, None) – A list with the maximum joint velocities which are provided in the mdp_info. The list has to define a maximum velocity for every occurrence of JOINT_VEL in the observation_spec. The velocity is not actually limited in the MuJoCo simulation;
**viewer_params – other parameters to be passed to the viewer. See MujocoViewer documentation for the available options.
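Two of the constructor constraints can be checked before building the environment. The sketch assumes, following the parameter descriptions above, that one agent step advances the simulation by n_intermediate_steps × n_substeps MuJoCo timesteps, and it validates that max_joint_vel has one entry per JOINT_VEL observation. ObsType is a hypothetical stand-in for ObservationType, and the joint and body names are illustrative.

```python
from enum import Enum, auto


class ObsType(Enum):
    """Hypothetical stand-in for MushroomRL's ObservationType."""
    JOINT_POS = auto()
    JOINT_VEL = auto()
    BODY_POS = auto()


def agent_dt(timestep, n_substeps=1, n_intermediate_steps=1):
    """Assumed: simulated seconds between two consecutive agent decisions."""
    return timestep * n_substeps * n_intermediate_steps


def check_max_joint_vel(observation_spec, max_joint_vel):
    """Require one velocity limit per JOINT_VEL entry of (key, name, type) tuples."""
    n_vel = sum(1 for _key, _name, obs_type in observation_spec
                if obs_type is ObsType.JOINT_VEL)
    if max_joint_vel is not None and len(max_joint_vel) != n_vel:
        raise ValueError(f"expected {n_vel} velocity limits, got {len(max_joint_vel)}")


observation_spec = [
    ("hinge_pos", "hinge", ObsType.JOINT_POS),
    ("hinge_vel", "hinge", ObsType.JOINT_VEL),
    ("torso_pos", "torso", ObsType.BODY_POS),
]
check_max_joint_vel(observation_spec, [10.0])  # one JOINT_VEL entry, one limit
print(agent_dt(0.002, n_substeps=5, n_intermediate_steps=2))  # about 0.02
```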

seed(seed)[source]

Set the seed of the environment.

Parameters:: seed (float) – the value of the seed.

reset(obs=None)[source]

Reset the current state.

Parameters:: obs (np.ndarray, None) – the state to set to the current state.
Returns:: The current state.

step(action)[source]

Move the agent from its current state according to the action.

Parameters:: action (np.ndarray) – the action to execute.
Returns:: The state reached by the agent executing action in its current state, the reward obtained in the transition and a flag to signal if the next state is absorbing. Also, an additional dictionary is returned (possibly empty).

render(record=False)[source]

Parameters:: record (bool, False) – whether the visualized image should be returned or not.
Returns:: The visualized image, or None if the record flag is set to false.

stop()[source]: Method used to stop an mdp. Useful when dealing with real world environments, simulators, or when using openai-gym rendering

_modify_mdp_info(mdp_info)[source]

This method can be overridden to modify the automatically generated MDPInfo data structure. By default, returns the given mdp_info structure unchanged.

Parameters:: mdp_info (MDPInfo) – the MDPInfo structure automatically computed by the environment.
Returns:: The modified MDPInfo data structure.

_create_observation(obs)[source]

This method can be overridden to create a custom observation. It should be used to append observations that have been registered via obs_help.add_obs(self, name, o_type, length, min_value, max_value).

Parameters:: obs (np.ndarray) – the generated observation
Returns:: The environment observation.

_create_info_dictionary(obs)[source]

This method can be overridden to create a custom info dictionary.

Parameters:: obs (np.ndarray) – the generated observation
Returns:: The information dictionary.

_modify_observation(obs)[source]

This method can be overridden to edit the created observation. This is done after the reward and absorbing functions are evaluated. Especially useful to transform the observation into different frames. If the original observation order is not preserved, the helper functions in ObservationHelper break.

Parameters:: obs (np.ndarray) – the generated observation
Returns:: The environment observation.

_preprocess_action(action)[source]

Compute a transformation of the action provided to the environment.

Parameters:: action (np.ndarray) – numpy array with the actions provided to the environment.
Returns:: The action to be used for the current step

_step_init(obs, action)[source]: Allows information to be initialized at the start of a step.

_compute_action(obs, action)[source]

Compute a transformation of the action at every intermediate step. Useful to add control signals simulated directly in python.

Parameters:

obs (np.ndarray) – numpy array with the current state of the simulation;
action (np.ndarray) – numpy array with the actions, provided at every step.

Returns:

The action to be set in the actual MuJoCo simulation.

_simulation_pre_step()[source]

Allows information to be accessed and changed at every intermediate step before taking a step in the MuJoCo simulation. Can be useful to apply an external force/torque to the specified bodies.

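For example, to apply a force along the X axis to the torso:

    force = [200, 0, 0]
    torque = [0, 0, 0]
    # list concatenation yields the 6D [force, torque] vector stored per body in xfrc_applied
    self.sim.data.xfrc_applied[self.sim.model._body_name2id["torso"], :] = force + torque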

_simulation_post_step()[source]: Allows information to be accesed at every intermediate step after taking a step in the mujoco simulation. Can be usefull to average forces over all intermediate steps.

_step_finalize()[source]: Allows information to be accesed at the end of a step.

_read_data(name)[source]

Read data from the MuJoCo data structure.

Parameters:: name (string) – A name referring to an entry contained in the additional_data_spec list handed to the constructor.
Returns:: The desired data as a one-dimensional numpy array.

_write_data(name, value)[source]

Write data to the MuJoCo data structure.

Parameters:

name (string) – A name referring to an entry contained in the additional_data_spec list handed to the constructor;
value (ndarray) – The data that should be written.

_check_collision(group1, group2)[source]

Check for collision between the specified groups.

Parameters:

group1 (string) – A name referring to an entry contained in the collision_groups list handed to the constructor;
group2 (string) – A name referring to an entry contained in the collision_groups list handed to the constructor.

Returns:

A flag indicating whether a collision occurred between the given groups or not.

_get_collision_force(group1, group2)[source]

Returns the collision force and torques between the specified groups.

Parameters:

group1 (string) – A name referring to an entry contained in the collision_groups list handed to the constructor;
group2 (string) – A name referring to an entry contained in the collision_groups list handed to the constructor.

Returns:

A 6D vector specifying the collision forces/torques (3D force + 3D torque) between the given groups, or a vector of zeros if there was no collision. http://mujoco.org/book/programming.html#siContact

reward(obs, action, next_obs, absorbing)[source]

Compute the reward based on the given transition.

Parameters:

obs (np.array) – the current state of the system;
action (np.array) – the action that is applied in the current state;
next_obs (np.array) – the state reached after applying the given action;
absorbing (bool) – whether next_obs is an absorbing state or not.

Returns:

The reward as a floating point scalar value.

is_absorbing(obs)[source]

Check whether the given state is an absorbing state or not.

Parameters:: obs (np.array) – the state of the system.
Returns:: A boolean flag indicating whether this state is absorbing or not.
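A subclass typically overrides reward and is_absorbing. The functions below sketch that pattern for a hypothetical hopper-like task whose observation is laid out as [torso_height, forward_velocity]; the layout, the height threshold and the control-cost weight are assumptions made for illustration only.

```python
def reward(obs, action, next_obs, absorbing):
    """Hypothetical reward: forward velocity minus a quadratic control cost."""
    if absorbing:
        return 0.0  # assumed: nothing is earned once the robot has fallen
    forward_velocity = next_obs[1]
    control_cost = 1e-3 * sum(a * a for a in action)
    return forward_velocity - control_cost


def is_absorbing(obs, min_height=0.7):
    """Hypothetical termination rule: the torso dropped below min_height."""
    return obs[0] < min_height


next_obs = [1.2, 0.8]
done = is_absorbing(next_obs)
print(done, reward([1.25, 0.5], [1.0, -2.0], next_obs, done))
```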

setup(obs)[source]: A function that allows to execute setup code after an environment reset.

get_all_observation_keys()[source]

A function that returns all observation keys defined in the observation specification.

Returns:: A list of observation keys.

static get_action_indices(model, data, actuation_spec)[source]

Returns the action indices given the MuJoCo model, data, and actuation_spec.

Parameters:

model – MuJoCo model;
data – MuJoCo data structure;
actuation_spec (list) – A list specifying the names of the joints which should be controllable by the agent. Can be left empty when all actuators should be used.

Returns:

A list of actuator indices.
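The rule that an empty actuation_spec selects every actuator can be sketched as a name-to-index lookup. A plain list of names stands in for the MuJoCo model here, the actuator names are hypothetical, and raising KeyError on unknown names is a choice of this sketch rather than documented behaviour.

```python
def action_indices(actuator_names, actuation_spec):
    """Hypothetical helper: indices of the controllable actuators, in actuation_spec order."""
    if len(actuation_spec) == 0:
        # An empty specification means every actuator is controllable.
        return list(range(len(actuator_names)))
    index_of = {name: i for i, name in enumerate(actuator_names)}
    missing = [name for name in actuation_spec if name not in index_of]
    if missing:
        raise KeyError(f"unknown actuators: {missing}")
    return [index_of[name] for name in actuation_spec]


names = ["hip", "knee", "ankle"]
print(action_indices(names, []))                # [0, 1, 2]
print(action_indices(names, ["ankle", "hip"]))  # [2, 0]
```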

static get_action_space(action_indices, model)[source]

Returns the action space bounding box given the action_indices and the model.

static user_warning_raise_exception(warning)[source]

Detects warnings in MuJoCo and raises the respective exception.

Parameters:: warning – MuJoCo warning.

static load_model(xml_file)[source]

Takes an xml_file and compiles and loads the model.

Parameters:: xml_file (str/xml handle) – A string with a path to the xml or a MuJoCo xml handle.
Returns:: MuJoCo model.

static _bound(x, min_value, max_value)

Method used to bound state and action variables.

Parameters:

x – the variable to bound;
min_value – the minimum value;
max_value – the maximum value;

Returns:

The bounded variable.

property info: Returns: An object containing the info of the environment.

static list_registered()

List registered environments.

Returns:: The list of the registered environments.

static make(env_name, *args, **kwargs)

Generate an environment given an environment name and parameters. The environment is created using the generate method, if available; otherwise, the constructor is used. The generate method has a simpler interface than the constructor, making it easier to generate a standard version of the environment. If the environment name contains a ‘.’ separator, the string is split: the first element selects the environment and the remaining elements are passed as positional parameters.

Parameters:

env_name (str) – name of the environment;
*args – positional arguments to be provided to the environment generator;
**kwargs – keyword arguments to be provided to the environment generator.

Returns:

An instance of the constructed environment.

classmethod register(): Register an environment in the environment list.

class MultiMuJoCo(xml_files, actuation_spec, observation_spec, gamma, horizon, timestep=None, n_substeps=1, n_intermediate_steps=1, additional_data_spec=None, collision_groups=None, max_joint_vel=None, random_env_reset=True, **viewer_params)[source]

Bases: MuJoCo

Class to create N environments at the same time using the MuJoCo simulator. This class is not meant to run N environments in parallel, but to load and create N environments, and randomly sample one of the environments at every episode.
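The per-episode choice among the loaded models can be pictured as below. The sketch assumes a uniform draw at every reset; how random_env_reset=False changes this is not described here, so it is left out. EpisodeSampler and the file names are hypothetical.

```python
import random


class EpisodeSampler:
    """Hypothetical sketch: pick which of N loaded models drives the next episode."""

    def __init__(self, xml_files, seed=None):
        self.xml_files = list(xml_files)
        self._rng = random.Random(seed)
        self.current = 0

    def reset(self):
        # Assumed: uniformly sample one environment index per episode.
        self.current = self._rng.randrange(len(self.xml_files))
        return self.xml_files[self.current]


sampler = EpisodeSampler(["light.xml", "medium.xml", "heavy.xml"], seed=1)
episodes = [sampler.reset() for _ in range(5)]
print(episodes)
```

Seeding the generator makes the sequence of sampled models reproducible across runs.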

__init__(xml_files, actuation_spec, observation_spec, gamma, horizon, timestep=None, n_substeps=1, n_intermediate_steps=1, additional_data_spec=None, collision_groups=None, max_joint_vel=None, random_env_reset=True, **viewer_params)[source]

Constructor.

Parameters:

xml_files (list) – A list containing strings with a path to the xml or MuJoCo xml handles;
actuation_spec (list) – A list specifying the names of the joints which should be controllable by the agent. Can be left empty when all actuators should be used;
observation_spec (list) – A list containing the names of data that should be made available to the agent as an observation and their type (ObservationType). They are combined with a key, which is used to access the data. An entry in the list is given by: (key, name, type);
gamma (float) – The discounting factor of the environment;
horizon (int) – The maximum horizon for the environment;
timestep (float) – The timestep used by the MuJoCo simulator. If None, the default timestep specified in the XML will be used;
n_substeps (int, 1) – The number of substeps to use by the MuJoCo simulator. An action given by the agent will be applied for n_substeps before the agent receives the next observation and can act accordingly;
n_intermediate_steps (int, 1) – The number of steps between every action taken by the agent. Similar to n_substeps but allows the user to modify, control and access intermediate states.
additional_data_spec (list, None) – A list containing the data fields of interest, which should be read from or written to during simulation. The entries are given as tuples (key, name, type), where key is a string for later referencing in the “read_data” and “write_data” methods, name is the name of the object in the XML specification, and type is the ObservationType;
collision_groups (list, None) – A list containing groups of geoms for which collisions should be checked during simulation via check_collision. The entries are given as: (key, geom_names), where key is a string for later referencing in the “check_collision” method, and geom_names is a list of geom names in the XML specification.
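max_joint_vel (list, None) – A list with the maximum joint velocities which are provided in the mdp_info. The list has to define a maximum velocity for every occurrence of JOINT_VEL in the observation_spec. The velocity is not actually limited in the MuJoCo simulation;
**viewer_params – other parameters to be passed to the viewer. See MujocoViewer documentation for the available options.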

static _bound(x, min_value, max_value)

Method used to bound state and action variables.

Parameters:

x – the variable to bound;
min_value – the minimum value;
max_value – the maximum value;

Returns:

The bounded variable.

_check_collision(group1, group2)

Check for collision between the specified groups.

Parameters:

group1 (string) – A name referring to an entry contained in the collision_groups list handed to the constructor;
group2 (string) – A name referring to an entry contained in the collision_groups list handed to the constructor.

Returns:

A flag indicating whether a collision occurred between the given groups or not.

_compute_action(obs, action)

Compute a transformation of the action at every intermediate step. Useful to add control signals simulated directly in python.

Parameters:

obs (np.ndarray) – numpy array with the current state of the simulation;
action (np.ndarray) – numpy array with the actions, provided at every step.

Returns:

The action to be set in the actual MuJoCo simulation.

_create_info_dictionary(obs)

This method can be overridden to create a custom info dictionary.

Parameters:: obs (np.ndarray) – the generated observation
Returns:: The information dictionary.

_create_observation(obs)

This method can be overridden to create a custom observation. It should be used to append observations that have been registered via obs_help.add_obs(self, name, o_type, length, min_value, max_value).

Parameters:: obs (np.ndarray) – the generated observation
Returns:: The environment observation.

_get_collision_force(group1, group2)

Returns the collision force and torques between the specified groups.

Parameters:

group1 (string) – A name referring to an entry contained in the collision_groups list handed to the constructor;
group2 (string) – A name referring to an entry contained in the collision_groups list handed to the constructor.

Returns:

A 6D vector specifying the collision forces/torques (3D force + 3D torque) between the given groups, or a vector of zeros if there was no collision. http://mujoco.org/book/programming.html#siContact

_modify_mdp_info(mdp_info)

This method can be overridden to modify the automatically generated MDPInfo data structure. By default, returns the given mdp_info structure unchanged.

Parameters:: mdp_info (MDPInfo) – the MDPInfo structure automatically computed by the environment.
Returns:: The modified MDPInfo data structure.

_modify_observation(obs)

This method can be overridden to edit the created observation. This is done after the reward and absorbing functions are evaluated. Especially useful to transform the observation into different frames. If the original observation order is not preserved, the helper functions in ObservationHelper break.

Parameters:: obs (np.ndarray) – the generated observation
Returns:: The environment observation.
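Since `_modify_observation` is typically used to change reference frames, here is a sketch of the planar transform involved: a 2D world-frame position is expressed in a robot base frame located at `base_pos` and rotated by `base_yaw`. The helper name and its arguments are hypothetical. Applied in place to the relevant entries, such a transform keeps the length and order of the observation unchanged, which the ObservationHelper functions rely on.

```python
import numpy as np


def world_to_base(pos_world, base_pos, base_yaw):
    """Express a 2D world-frame position in a base frame (hypothetical helper)."""
    c, s = np.cos(base_yaw), np.sin(base_yaw)
    # R maps base-frame vectors to the world frame; its transpose does the inverse
    rot_world_to_base = np.array([[c, s],
                                  [-s, c]])
    return rot_world_to_base @ (np.asarray(pos_world, dtype=float)
                                - np.asarray(base_pos, dtype=float))


shifted = world_to_base([2.0, 3.0], [1.0, 1.0], 0.0)        # pure translation
rotated = world_to_base([1.0, 0.0], [0.0, 0.0], np.pi / 2)  # base rotated by 90 degrees
```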

_preprocess_action(action)

Compute a transformation of the action provided to the environment.

Parameters:: action (np.ndarray) – numpy array with the actions provided to the environment.
Returns:: The action to be used for the current step.
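A common use of `_preprocess_action` is to keep the agent's action inside the actuator limits before it reaches the simulation. The sketch below clips it into a box; the class name and the `low`/`high` bounds are hypothetical.

```python
import numpy as np


class ClippedActionMixin:
    """Hypothetical: clip every incoming action into the box [low, high]."""

    def __init__(self, low, high):
        self._low = np.asarray(low, dtype=float)
        self._high = np.asarray(high, dtype=float)

    def _preprocess_action(self, action):
        # Element-wise clipping; values already inside the box are unchanged
        return np.clip(action, self._low, self._high)


clipper = ClippedActionMixin(low=[-1.0, -2.0], high=[1.0, 2.0])
safe_action = clipper._preprocess_action(np.array([1.5, -0.5]))
```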

_read_data(name)

Read data from the MuJoCo data structure.

Parameters:: name (string) – A name referring to an entry contained in the additional_data_spec list handed to the constructor.
Returns:: The desired data as a one-dimensional numpy array.

_simulation_post_step(): Allows information to be accesed at every intermediate step after taking a step in the mujoco simulation. Can be usefull to average forces over all intermediate steps.

_simulation_pre_step()

Allows information to be accessed and changed at every intermediate step before taking a step in the MuJoCo simulation. Can be useful to apply an external force/torque to the specified bodies.

Example: apply a force along X to the torso:

force = [200, 0, 0]
torque = [0, 0, 0]
self.sim.data.xfrc_applied[self.sim.model._body_name2id["torso"], :] = force + torque
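The example above works because `force` and `torque` are Python lists, for which `+` concatenates them into the 6D [force, torque] vector that `xfrc_applied` expects for each body. If they are NumPy arrays, `+` adds them element-wise instead and yields only 3 entries. The snippet below contrasts the two; it only builds the vectors and does not touch a simulator.

```python
import numpy as np

force = [200.0, 0.0, 0.0]  # force along X
torque = [0.0, 0.0, 0.0]   # no torque

# Lists: "+" concatenates -> 6 entries [fx, fy, fz, tx, ty, tz]
wrench_from_lists = force + torque

# Arrays: "+" adds element-wise -> only 3 entries, not a valid wrench
wrong_sum = np.array(force) + np.array(torque)

# With arrays, concatenate explicitly to get the 6D vector
wrench_from_arrays = np.concatenate([np.array(force), np.array(torque)])
```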

_step_finalize(): Allows information to be accesed at the end of a step.

_step_init(obs, action): Allows information to be initialized at the start of a step.

_write_data(name, value)

Write data to the MuJoCo data structure.

Parameters:

name (string) – A name referring to an entry contained in the additional_data_spec list handed to the constructor;
value (ndarray) – The data that should be written.

static get_action_indices(model, data, actuation_spec)

Returns the action indices given the MuJoCo model, data, and actuation_spec.

Parameters:

model – MuJoCo model;
data – MuJoCo data structure;
actuation_spec (list) – A list specifying the names of the joints which should be controllable by the agent. Can be left empty when all actuators should be used.

Returns:

A list of actuator indices.
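The selection logic described for `get_action_indices` can be sketched without MuJoCo. Here the model's actuators are represented by a hypothetical list of names in model order, and each entry of `actuation_spec` is assumed to name exactly one of them; the real method queries the MuJoCo model and data instead.

```python
def get_action_indices_sketch(actuator_names, actuation_spec):
    """Map the names in actuation_spec to indices into actuator_names."""
    if len(actuation_spec) == 0:
        # Empty spec: every actuator is controllable
        return list(range(len(actuator_names)))
    # list.index raises ValueError for names that are not in the model
    return [actuator_names.index(name) for name in actuation_spec]


all_indices = get_action_indices_sketch(["shoulder", "elbow", "wrist"], [])
some_indices = get_action_indices_sketch(["shoulder", "elbow", "wrist"], ["wrist", "shoulder"])
```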

static get_action_space(action_indices, model)

Returns the action space bounding box given the action_indices and the model.
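A sketch of how such a bounding box can be assembled. It assumes the bounds come from MuJoCo's per-actuator `actuator_ctrlrange` and `actuator_ctrllimited` arrays (passed in here as plain arrays) and that actuators without control limits get infinite bounds; the helper name is hypothetical.

```python
import numpy as np


def action_box_sketch(action_indices, ctrlrange, ctrllimited):
    """Return (low, high) arrays for the selected actuators."""
    low, high = [], []
    for i in action_indices:
        if ctrllimited[i]:
            low.append(ctrlrange[i][0])
            high.append(ctrlrange[i][1])
        else:
            # Assumption: an actuator without control limits is unbounded
            low.append(-np.inf)
            high.append(np.inf)
    return np.array(low, dtype=float), np.array(high, dtype=float)


ctrlrange = np.array([[-1.0, 1.0], [-2.0, 2.0], [0.0, 0.0]])
ctrllimited = np.array([1, 1, 0])
low, high = action_box_sketch([0, 2], ctrlrange, ctrllimited)
```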

get_all_observation_keys()

A function that returns all observation keys defined in the observation specification.

Returns:: A list of observation keys.

property info: Returns: An object containing the info of the environment.

is_absorbing(obs)

Check whether the given state is an absorbing state or not.

Parameters:: obs (np.array) – the state of the system.
Returns:: A boolean flag indicating whether this state is absorbing or not.

static list_registered()

List registered environments.

Returns:: The list of the registered environments.

static load_model(xml_file)

Takes an xml_file and compiles and loads the model.

Parameters:: xml_file (str/xml handle) – A string with a path to the xml or a MuJoCo xml handle.
Returns:: Mujoco model.

static make(env_name, *args, **kwargs)

Generate an environment given an environment name and parameters. The environment is created using the generate method, if available; otherwise, the constructor is used. The generate method has a simpler interface than the constructor, making it easier to generate a standard version of the environment. If the environment name contains a ‘.’ separator, the string is split: the first element is used to select the environment and the remaining elements are passed as positional parameters.

Parameters:

env_name (str) – Name of the environment;
*args – positional arguments to be provided to the environment generator;
**kwargs – keyword arguments to be provided to the environment generator.

Returns:

An instance of the constructed environment.

classmethod register(): Register an environment in the environment list.

render(record=False)

Parameters:: record (bool, False) – whether the visualized image should be returned or not.
Returns:: The visualized image, or None if the record flag is set to false.

reward(obs, action, next_obs, absorbing)

Compute the reward based on the given transition.

Parameters:

obs (np.array) – the current state of the system;
action (np.array) – the action that is applied in the current state;
next_obs (np.array) – the state reached after applying the given action;
absorbing (bool) – whether next_obs is an absorbing state or not.

Returns:

The reward as a floating point scalar value.

seed(seed)

Set the seed of the environment.

Parameters:: seed (float) – the value of the seed.

setup(obs): A function that allows to execute setup code after an environment reset.

step(action)

Move the agent from its current state according to the action.

Parameters:: action (np.ndarray) – the action to execute.
Returns:: The state reached by the agent executing action in its current state, the reward obtained in the transition and a flag to signal if the next state is absorbing. Also, an additional dictionary is returned (possibly empty).
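The `reset`/`step` contract can be illustrated with a toy environment and a minimal interaction loop. `ToyLineEnv` and `run_episode` are hypothetical; the only things carried over from the documentation are that `reset` returns the current state and that `step` returns the next state, the reward, the absorbing flag and an info dictionary.

```python
import numpy as np


class ToyLineEnv:
    """Hypothetical 1D environment: the episode ends when |x| reaches 1."""

    def reset(self, state=None):
        self._state = np.zeros(1) if state is None else np.asarray(state, dtype=float)
        return self._state

    def step(self, action):
        self._state = self._state + action
        reward = -float(abs(self._state[0]))
        absorbing = bool(abs(self._state[0]) >= 1.0)
        return self._state, reward, absorbing, {}


def run_episode(env, policy, horizon):
    """Roll out one episode and return the undiscounted return."""
    state = env.reset()
    total_reward = 0.0
    for _ in range(horizon):
        state, reward, absorbing, info = env.step(policy(state))
        total_reward += reward
        if absorbing:  # no further transitions from an absorbing state
            break
    return total_reward


episode_return = run_episode(ToyLineEnv(), lambda s: np.array([0.5]), horizon=10)
```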

stop(): Method used to stop an mdp. Useful when dealing with real world environments, simulators, or when using openai-gym rendering

static user_warning_raise_exception(warning)

Detects warnings in Mujoco and raises the respective exception.

Parameters:: warning – Mujoco warning.

reset(obs=None)[source]

Reset the current state.

Parameters:: obs (np.ndarray, None) – the state to set as the current state.
Returns:: The current state.

static _get_env_id_map(current_model_idx, n_models)[source]

Returns a binary vector identifying the current environment. This vector can be passed to the observation space.

Parameters:

current_model_idx (int) – index of the current model.
n_models (int) – total number of models.

Returns:

ndarray containing a binary vector identifying the current environment.
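One way to build such a binary identifier is to write the model index in base 2, using just enough bits for `n_models` models. The sketch below assumes that encoding, most significant bit first; the bit order and vector length used by the library may differ, and a one-hot vector would be an alternative design. `env_id_bits` is a hypothetical name.

```python
import numpy as np


def env_id_bits(current_model_idx, n_models):
    """Binary encoding of current_model_idx with ceil(log2(n_models)) bits (at least 1)."""
    n_bits = max(1, (n_models - 1).bit_length())
    # Most significant bit first
    bits = [(current_model_idx >> (n_bits - 1 - i)) & 1 for i in range(n_bits)]
    return np.array(bits, dtype=float)


id_5_of_8 = env_id_bits(5, 8)
id_0_of_3 = env_id_bits(0, 3)
```

Compared with one-hot encoding, the binary form grows logarithmically with the number of models, at the cost of a less direct mapping from entries to models.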

Air Hockey

class AirHockeyBase(n_agents=1, env_noise=False, obs_noise=False, gamma=0.99, horizon=500, timestep=0.004166666666666667, n_substeps=1, n_intermediate_steps=1, default_camera_mode='top_static', **viewer_params)[source]

Bases: MuJoCo

Abstract class for all AirHockey Environments.

__init__(n_agents=1, env_noise=False, obs_noise=False, gamma=0.99, horizon=500, timestep=0.004166666666666667, n_substeps=1, n_intermediate_steps=1, default_camera_mode='top_static', **viewer_params)[source]

Constructor.

Parameters:

n_agents (int, 1) – number of agents to be used in the environment (one or two);
env_noise (bool, False) – if True, the environment uses noisy dynamics.
obs_noise (bool, False) – if True, the environment uses noisy observations.

_simulation_pre_step()[source]

Allows information to be accessed and changed at every intermediate step before taking a step in the MuJoCo simulation. Can be useful to apply an external force/torque to the specified bodies.

Example: apply a force along X to the torso:

force = [200, 0, 0]
torque = [0, 0, 0]
self.sim.data.xfrc_applied[self.sim.model._body_name2id["torso"], :] = force + torque

is_absorbing(obs)[source]

Check whether the given state is an absorbing state or not.

Parameters:: obs (np.array) – the state of the system.
Returns:: A boolean flag indicating whether this state is absorbing or not.
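For an air hockey task, an absorbing condition is naturally tied to the puck's position on the table. The sketch below flags a state as absorbing once the puck leaves a rectangular table centered at the origin; the helper, its arguments and this termination rule are assumptions for illustration, not the condition implemented by AirHockeyBase.

```python
def puck_out_of_bounds(puck_pos, half_length, half_width):
    """Hypothetical termination test for a table centered at the origin."""
    x, y = puck_pos[0], puck_pos[1]
    return bool(abs(x) > half_length or abs(y) > half_width)


inside = puck_out_of_bounds([0.2, -0.1], half_length=1.0, half_width=0.5)
outside = puck_out_of_bounds([1.2, 0.0], half_length=1.0, half_width=0.5)
```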

static _bound(x, min_value, max_value)

Method used to bound state and action variables.

Parameters:

x – the variable to bound;
min_value – the minimum value;
max_value – the maximum value;

Returns:

The bounded variable.
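The bounding described above is an element-wise clamp. A minimal NumPy sketch as a hypothetical standalone function, assuming scalar or array bounds that broadcast against `x`:

```python
import numpy as np


def bound(x, min_value, max_value):
    # Clamp every entry of x into [min_value, max_value]; equivalent to np.clip
    return np.maximum(min_value, np.minimum(x, max_value))


clamped = bound(np.array([-2.0, 0.5, 3.0]), -1.0, 1.0)
```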

_check_collision(group1, group2)

Check for collision between the specified groups.

Parameters:

group1 (string) – A name referring to an entry contained in the collision_groups list handed to the constructor;
group2 (string) – A name referring to an entry contained in the collision_groups list handed to the constructor.

Returns:

A flag indicating whether a collision occurred between the given groups or not.
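Conceptually, a collision check between two groups scans the active contacts and looks for a pair with one geometry in each group. In this sketch the contacts are given as (geom_a, geom_b) id pairs and `collision_groups` maps each group name to a set of geometry ids; both representations are assumptions (in MuJoCo the pairs would come from the geom1/geom2 fields of the active contacts), and the group names and ids are hypothetical.

```python
def check_collision_sketch(contacts, collision_groups, group1, group2):
    """Return True if any contact pair links a geom of group1 with a geom of group2."""
    ids1 = collision_groups[group1]
    ids2 = collision_groups[group2]
    for geom_a, geom_b in contacts:
        # Contacts are unordered, so test both orientations
        if (geom_a in ids1 and geom_b in ids2) or (geom_a in ids2 and geom_b in ids1):
            return True
    return False


groups = {"puck": {4}, "rim": {7, 8, 9}, "mallet": {2}}
hit_rim = check_collision_sketch([(8, 4)], groups, "puck", "rim")
hit_mallet = check_collision_sketch([(8, 4)], groups, "puck", "mallet")
```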

_compute_action(obs, action)

Compute a transformation of the action at every intermediate step. Useful to add control signals simulated directly in python.

Parameters:

obs (np.ndarray) – numpy array with the current state of the simulation;
action (np.ndarray) – numpy array with the actions, provided at every step.

Returns:

The action to be set in the actual MuJoCo simulation.

_create_info_dictionary(obs)

This method can be overridden to create a custom info dictionary.

Parameters:: obs (np.ndarray) – the generated observation
Returns:: The information dictionary.

_create_observation(obs)

This method can be overridden to create a custom observation. It should be used to append observations that have been registered via obs_help.add_obs(self, name, o_type, length, min_value, max_value).

Parameters:: obs (np.ndarray) – the generated observation
Returns:: The environment observation.

_get_collision_force(group1, group2)

Returns the collision force and torques between the specified groups.

Parameters:

group1 (string) – A name referring to an entry contained in the collision_groups list handed to the constructor;
group2 (string) – A name referring to an entry contained in the collision_groups list handed to the constructor.

Returns:

A 6D vector specifying the collision forces/torques [3D force + 3D torque] between the given groups, or a vector of zeros if there was no collision. See http://mujoco.org/book/programming.html#siContact

_modify_mdp_info(mdp_info)

This method can be overridden to modify the automatically generated MDPInfo data structure. By default, returns the given mdp_info structure unchanged.

Parameters:: mdp_info (MDPInfo) – the MDPInfo structure automatically computed by the environment.
Returns:: The modified MDPInfo data structure.

_modify_observation(obs)

This method can be overridden to edit the created observation. This is done after the reward and absorbing functions are evaluated. It is especially useful to transform the observation into a different frame. If the original observation order is not preserved, the helper functions in ObservationHelper break.

Parameters:: obs (np.ndarray) – the generated observation
Returns:: The environment observation.

_preprocess_action(action)

Compute a transformation of the action provided to the environment.

Parameters:: action (np.ndarray) – numpy array with the actions provided to the environment.
Returns:: The action to be used for the current step.

_read_data(name)

Read data from the MuJoCo data structure.

Parameters:: name (string) – A name referring to an entry contained in the additional_data_spec list handed to the constructor.
Returns:: The desired data as a one-dimensional numpy array.

_simulation_post_step(): Allows information to be accesed at every intermediate step after taking a step in the mujoco simulation. Can be usefull to average forces over all intermediate steps.

_step_finalize(): Allows information to be accesed at the end of a step.

_step_init(obs, action): Allows information to be initialized at the start of a step.

_write_data(name, value)

Write data to the MuJoCo data structure.

Parameters:

name (string) – A name referring to an entry contained in the additional_data_spec list handed to the constructor;
value (ndarray) – The data that should be written.

static get_action_indices(model, data, actuation_spec)

Returns the action indices given the MuJoCo model, data, and actuation_spec.

Parameters:

model – MuJoCo model;
data – MuJoCo data structure;
actuation_spec (list) – A list specifying the names of the joints which should be controllable by the agent. Can be left empty when all actuators should be used.

Returns:

A list of actuator indices.

static get_action_space(action_indices, model)

Returns the action space bounding box given the action_indices and the model.

get_all_observation_keys()

A function that returns all observation keys defined in the observation specification.

Returns:: A list of observation keys.

property info: Returns: An object containing the info of the environment.

static list_registered()

List registered environments.

Returns:: The list of the registered environments.

static load_model(xml_file)

Takes an xml_file and compiles and loads the model.

Parameters:: xml_file (str/xml handle) – A string with a path to the xml or a MuJoCo xml handle.
Returns:: Mujoco model.

static make(env_name, *args, **kwargs)

Generate an environment given an environment name and parameters. The environment is created using the generate method, if available; otherwise, the constructor is used. The generate method has a simpler interface than the constructor, making it easier to generate a standard version of the environment. If the environment name contains a ‘.’ separator, the string is split: the first element is used to select the environment and the remaining elements are passed as positional parameters.

Parameters:

env_name (str) – Name of the environment;
*args – positional arguments to be provided to the environment generator;
**kwargs – keyword arguments to be provided to the environment generator.

Returns:

An instance of the constructed environment.

classmethod register(): Register an environment in the environment list.

render(record=False)

Parameters:: record (bool, False) – whether the visualized image should be returned or not.
Returns:: The visualized image, or None if the record flag is set to false.

reset(obs=None)

Reset the current state.

Parameters:: obs (np.ndarray, None) – the state to set as the current state.
Returns:: The current state.

reward(obs, action, next_obs, absorbing)

Compute the reward based on the given transition.

Parameters:

obs (np.array) – the current state of the system;
action (np.array) – the action that is applied in the current state;
next_obs (np.array) – the state reached after applying the given action;
absorbing (bool) – whether next_obs is an absorbing state or not.

Returns:

The reward as a floating point scalar value.

seed(seed)

Set the seed of the environment.

Parameters:: seed (float) – the value of the seed.

setup(obs): A function that allows to execute setup code after an environment reset.

step(action)

Move the agent from its current state according to the action.

Parameters:: action (np.ndarray) – the action to execute.
Returns:: The state reached by the agent executing action in its current state, the reward obtained in the transition and a flag to signal if the next state is absorbing. Also, an additional dictionary is returned (possibly empty).

stop(): Method used to stop an mdp. Useful when dealing with real world environments, simulators, or when using openai-gym rendering

static user_warning_raise_exception(warning)

Detects warnings in Mujoco and raises the respective exception.

Parameters:: warning – Mujoco warning.

class AirHockeySingle(gamma=0.99, horizon=120, env_noise=False, obs_noise=False, timestep=0.004166666666666667, n_intermediate_steps=1, **viewer_params)[source]

Bases: AirHockeyBase

Base class for single agent air hockey tasks.

__init__(gamma=0.99, horizon=120, env_noise=False, obs_noise=False, timestep=0.004166666666666667, n_intermediate_steps=1, **viewer_params)[source]: Constructor.

get_puck(obs)[source]

Get the puck properties from the observation.

Parameters:: obs – the current observation.
Returns:: ([pos_x, pos_y], [lin_vel_x, lin_vel_y], ang_vel_z)
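The return structure of `get_puck` amounts to slicing the observation vector. The layout assumed below (puck position, yaw, linear velocity and angular velocity in the first six entries) is hypothetical and chosen only to show the unpacking; the actual environment determines the indices through its observation specification.

```python
import numpy as np

# Hypothetical layout, not taken from the library:
# [puck_x, puck_y, puck_yaw, puck_vx, puck_vy, puck_vyaw, ...robot state...]
PUCK_POS = slice(0, 2)
PUCK_LIN_VEL = slice(3, 5)
PUCK_ANG_VEL = 5


def get_puck_sketch(obs):
    """Return ([pos_x, pos_y], [lin_vel_x, lin_vel_y], ang_vel_z)."""
    obs = np.asarray(obs, dtype=float)
    return obs[PUCK_POS], obs[PUCK_LIN_VEL], obs[PUCK_ANG_VEL]


pos, lin_vel, ang_vel = get_puck_sketch(np.arange(9.0))
```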

get_ee()[source]

Get the end-effector (ee) properties from the current internal state.

Returns:: ([pos_x, pos_y, pos_z], [ang_vel_x, ang_vel_y, ang_vel_z, lin_vel_x, lin_vel_y, lin_vel_z])

_modify_observation(obs)[source]

This method can be overridden to edit the created observation. This is done after the reward and absorbing functions are evaluated. It is especially useful to transform the observation into a different frame. If the original observation order is not preserved, the helper functions in ObservationHelper break.

Parameters:: obs (np.ndarray) – the generated observation
Returns:: The environment observation.

setup(obs)[source]: A function that allows to execute setup code after an environment reset.

_simulation_post_step()[source]: Allows information to be accesed at every intermediate step after taking a step in the mujoco simulation. Can be usefull to average forces over all intermediate steps.

_create_observation(state)[source]

This method can be overridden to create a custom observation. It should be used to append observations that have been registered via obs_help.add_obs(self, name, o_type, length, min_value, max_value).

Parameters:: state (np.ndarray) – the generated observation
Returns:: The environment observation.

_create_info_dictionary(obs)[source]

This method can be overridden to create a custom info dictionary.

Parameters:: obs (np.ndarray) – the generated observation
Returns:: The information dictionary.

static _bound(x, min_value, max_value)

Method used to bound state and action variables.

Parameters:

x – the variable to bound;
min_value – the minimum value;
max_value – the maximum value;

Returns:

The bounded variable.

_check_collision(group1, group2)

Check for collision between the specified groups.

Parameters:

group1 (string) – A name referring to an entry contained in the collision_groups list handed to the constructor;
group2 (string) – A name referring to an entry contained in the collision_groups list handed to the constructor.

Returns:

A flag indicating whether a collision occurred between the given groups or not.

_compute_action(obs, action)

Compute a transformation of the action at every intermediate step. Useful to add control signals simulated directly in python.

Parameters:

obs (np.ndarray) – numpy array with the current state of the simulation;
action (np.ndarray) – numpy array with the actions, provided at every step.

Returns:

The action to be set in the actual MuJoCo simulation.

_get_collision_force(group1, group2)

Returns the collision force and torques between the specified groups.

Parameters:

group1 (string) – A name referring to an entry contained in the collision_groups list handed to the constructor;
group2 (string) – A name referring to an entry contained in the collision_groups list handed to the constructor.

Returns:

A 6D vector specifying the collision forces/torques [3D force + 3D torque] between the given groups, or a vector of zeros if there was no collision. See http://mujoco.org/book/programming.html#siContact

_modify_mdp_info(mdp_info)

This method can be overridden to modify the automatically generated MDPInfo data structure. By default, returns the given mdp_info structure unchanged.

Parameters:: mdp_info (MDPInfo) – the MDPInfo structure automatically computed by the environment.
Returns:: The modified MDPInfo data structure.

_preprocess_action(action)

Compute a transformation of the action provided to the environment.

Parameters:: action (np.ndarray) – numpy array with the actions provided to the environment.
Returns:: The action to be used for the current step.

_read_data(name)

Read data from the MuJoCo data structure.

Parameters:: name (string) – A name referring to an entry contained in the additional_data_spec list handed to the constructor.
Returns:: The desired data as a one-dimensional numpy array.

_simulation_pre_step()

Allows information to be accessed and changed at every intermediate step before taking a step in the MuJoCo simulation. Can be useful to apply an external force/torque to the specified bodies.

Example: apply a force along X to the torso:

force = [200, 0, 0]
torque = [0, 0, 0]
self.sim.data.xfrc_applied[self.sim.model._body_name2id["torso"], :] = force + torque

_step_finalize(): Allows information to be accesed at the end of a step.

_step_init(obs, action): Allows information to be initialized at the start of a step.

_write_data(name, value)

Write data to the MuJoCo data structure.

Parameters:

name (string) – A name referring to an entry contained in the additional_data_spec list handed to the constructor;
value (ndarray) – The data that should be written.

static get_action_indices(model, data, actuation_spec)

Returns the action indices given the MuJoCo model, data, and actuation_spec.

Parameters:

model – MuJoCo model;
data – MuJoCo data structure;
actuation_spec (list) – A list specifying the names of the joints which should be controllable by the agent. Can be left empty when all actuators should be used.

Returns:

A list of actuator indices.

static get_action_space(action_indices, model)

Returns the action space bounding box given the action_indices and the model.

get_all_observation_keys()

A function that returns all observation keys defined in the observation specification.

Returns:: A list of observation keys.

property info: Returns: An object containing the info of the environment.

is_absorbing(obs)

Check whether the given state is an absorbing state or not.

Parameters:: obs (np.array) – the state of the system.
Returns:: A boolean flag indicating whether this state is absorbing or not.

static list_registered()

List registered environments.

Returns:: The list of the registered environments.

static load_model(xml_file)

Takes an xml_file and compiles and loads the model.

Parameters:: xml_file (str/xml handle) – A string with a path to the xml or a MuJoCo xml handle.
Returns:: Mujoco model.

static make(env_name, *args, **kwargs)

Generate an environment given an environment name and parameters. The environment is created using the generate method, if available; otherwise, the constructor is used. The generate method has a simpler interface than the constructor, making it easier to generate a standard version of the environment. If the environment name contains a ‘.’ separator, the string is split: the first element is used to select the environment and the remaining elements are passed as positional parameters.

Parameters:

env_name (str) – Name of the environment;
*args – positional arguments to be provided to the environment generator;
**kwargs – keyword arguments to be provided to the environment generator.

Returns:

An instance of the constructed environment.

classmethod register(): Register an environment in the environment list.

render(record=False)

Parameters:: record (bool, False) – whether the visualized image should be returned or not.
Returns:: The visualized image, or None if the record flag is set to false.

reset(obs=None)

Reset the current state.

Parameters:: obs (np.ndarray, None) – the state to set as the current state.
Returns:: The current state.

reward(obs, action, next_obs, absorbing)

Compute the reward based on the given transition.

Parameters:

obs (np.array) – the current state of the system;
action (np.array) – the action that is applied in the current state;
next_obs (np.array) – the state reached after applying the given action;
absorbing (bool) – whether next_obs is an absorbing state or not.

Returns:

The reward as a floating point scalar value.

seed(seed)

Set the seed of the environment.

Parameters:: seed (float) – the value of the seed.

step(action)

Move the agent from its current state according to the action.

Parameters:: action (np.ndarray) – the action to execute.
Returns:: The state reached by the agent executing action in its current state, the reward obtained in the transition and a flag to signal if the next state is absorbing. Also, an additional dictionary is returned (possibly empty).

stop(): Method used to stop an mdp. Useful when dealing with real world environments, simulators, or when using openai-gym rendering

static user_warning_raise_exception(warning)

Detects warnings in Mujoco and raises the respective exception.

Parameters:: warning – Mujoco warning.

class AirHockeyDouble(gamma=0.99, horizon=120, env_noise=False, obs_noise=False, timestep=0.004166666666666667, n_intermediate_steps=1, **viewer_params)[source]

Bases: AirHockeyBase

Base class for two-agent air hockey tasks.

__init__(gamma=0.99, horizon=120, env_noise=False, obs_noise=False, timestep=0.004166666666666667, n_intermediate_steps=1, **viewer_params)[source]: Constructor.

_modify_observation(obs)[source]

This method can be overridden to edit the created observation. This is done after the reward and absorbing functions are evaluated. It is especially useful to transform the observation into a different frame. If the original observation order is not preserved, the helper functions in ObservationHelper break.

Parameters:: obs (np.ndarray) – the generated observation
Returns:: The environment observation.

reward(state, action, next_state, absorbing)[source]

Compute the reward based on the given transition.

Parameters:

state (np.array) – the current state of the system;
action (np.array) – the action that is applied in the current state;
next_state (np.array) – the state reached after applying the given action;
absorbing (bool) – whether next_state is an absorbing state or not.

Returns:

The reward as a floating point scalar value.

setup(obs)[source]: A function that allows to execute setup code after an environment reset.

_simulation_post_step()[source]: Allows information to be accesed at every intermediate step after taking a step in the mujoco simulation. Can be usefull to average forces over all intermediate steps.

_create_observation(state)[source]

This method can be overridden to create a custom observation. It should be used to append observations that have been registered via obs_help.add_obs(self, name, o_type, length, min_value, max_value).

Parameters:: state (np.ndarray) – the generated observation
Returns:: The environment observation.

_create_info_dictionary(obs)[source]

This method can be overridden to create a custom info dictionary.

Parameters:: obs (np.ndarray) – the generated observation
Returns:: The information dictionary.

static _bound(x, min_value, max_value)

Method used to bound state and action variables.

Parameters:

x – the variable to bound;
min_value – the minimum value;
max_value – the maximum value;

Returns:

The bounded variable.

_check_collision(group1, group2)

Check for collision between the specified groups.

Parameters:

group1 (string) – A name referring to an entry contained in the collision_groups list handed to the constructor;
group2 (string) – A name referring to an entry contained in the collision_groups list handed to the constructor.

Returns:

A flag indicating whether a collision occurred between the given groups or not.

_compute_action(obs, action)

Compute a transformation of the action at every intermediate step. Useful to add control signals simulated directly in python.

Parameters:

obs (np.ndarray) – numpy array with the current state of the simulation;
action (np.ndarray) – numpy array with the actions, provided at every step.

Returns:

The action to be set in the actual MuJoCo simulation.

_get_collision_force(group1, group2)

Returns the collision force and torques between the specified groups.

Parameters:

group1 (string) – A name referring to an entry contained in the collision_groups list handed to the constructor;
group2 (string) – A name referring to an entry contained in the collision_groups list handed to the constructor.

Returns:

A 6D vector specifying the collision forces/torques [3D force + 3D torque] between the given groups, or a vector of zeros if there was no collision. See http://mujoco.org/book/programming.html#siContact

_modify_mdp_info(mdp_info)

This method can be overridden to modify the automatically generated MDPInfo data structure. By default, returns the given mdp_info structure unchanged.

Parameters:: mdp_info (MDPInfo) – the MDPInfo structure automatically computed by the environment.
Returns:: The modified MDPInfo data structure.

_preprocess_action(action)

Compute a transformation of the action provided to the environment.

Parameters:: action (np.ndarray) – numpy array with the actions provided to the environment.
Returns:: The action to be used for the current step.

_read_data(name)

Read data from the MuJoCo data structure.

Parameters:: name (string) – A name referring to an entry contained in the additional_data_spec list handed to the constructor.
Returns:: The desired data as a one-dimensional numpy array.

_simulation_pre_step()

Allows information to be accessed and changed at every intermediate step before taking a step in the MuJoCo simulation. Can be useful to apply an external force/torque to the specified bodies.

Example: apply a force along X to the torso:

force = [200, 0, 0]
torque = [0, 0, 0]
self.sim.data.xfrc_applied[self.sim.model._body_name2id["torso"], :] = force + torque

_step_finalize(): Allows information to be accesed at the end of a step.

_step_init(obs, action): Allows information to be initialized at the start of a step.

_write_data(name, value)

Write data to the MuJoCo data structure.

Parameters:

name (string) – A name referring to an entry contained in the additional_data_spec list handed to the constructor;
value (ndarray) – The data that should be written.

static get_action_indices(model, data, actuation_spec)

Returns the action indices given the MuJoCo model, data, and actuation_spec.

Parameters:

model – MuJoCo model;
data – MuJoCo data structure;
actuation_spec (list) – A list specifying the names of the joints which should be controllable by the agent. Can be left empty when all actuators should be used.

Returns:

A list of actuator indices.

static get_action_space(action_indices, model)

Returns the action space bounding box given the action_indices and the model.

get_all_observation_keys()

A function that returns all observation keys defined in the observation specification.

Returns:: A list of observation keys.

property info: Returns: An object containing the info of the environment.

is_absorbing(obs)

Check whether the given state is an absorbing state or not.

Parameters:: obs (np.array) – the state of the system.
Returns:: A boolean flag indicating whether this state is absorbing or not.

static list_registered()

List registered environments.

Returns:: The list of the registered environments.

static load_model(xml_file)

Takes an xml_file and compiles and loads the model.

Parameters:: xml_file (str/xml handle) – A string with a path to the xml or a MuJoCo xml handle.
Returns:: Mujoco model.

static make(env_name, *args, **kwargs)

Generate an environment given an environment name and parameters. The environment is created using the generate method, if available; otherwise, the constructor is used. The generate method has a simpler interface than the constructor, making it easier to generate a standard version of the environment. If the environment name contains a ‘.’ separator, the string is split: the first element is used to select the environment and the remaining elements are passed as positional parameters.

Parameters:

env_name (str) – Name of the environment;
*args – positional arguments to be provided to the environment generator;
**kwargs – keyword arguments to be provided to the environment generator.

Returns:

An instance of the constructed environment.

classmethod register(): Register an environment in the environment list.

render(record=False)

Parameters:: record (bool, False) – whether the visualized image should be returned or not.
Returns:: The visualized image, or None if the record flag is set to false.

reset(obs=None)

Reset the current state.

Parameters:: obs (np.ndarray, None) – the state to set as the current state.
Returns:: The current state.

seed(seed)

Set the seed of the environment.

Parameters:: seed (float) – the value of the seed.

step(action)

Move the agent from its current state according to the action.

Parameters:: action (np.ndarray) – the action to execute.
Returns:: The state reached by the agent executing action in its current state, the reward obtained in the transition and a flag to signal if the next state is absorbing. Also, an additional dictionary is returned (possibly empty).

stop(): Method used to stop an mdp. Useful when dealing with real world environments, simulators, or when using openai-gym rendering

static user_warning_raise_exception(warning)

Detects warnings in MuJoCo and raises the respective exception.

Parameters:: warning – MuJoCo warning.

class AirHockeyHit(random_init=False, action_penalty=0.001, init_robot_state='right', gamma=0.99, horizon=120, env_noise=False, obs_noise=False, timestep=0.004166666666666667, n_intermediate_steps=1, **viewer_params)[source]

Bases: AirHockeySingle

Class for the air hockey hitting task. As long as no hit has happened, the agent is encouraged to get close to the puck; it receives a bonus reward if the robot scores a goal.

__init__(random_init=False, action_penalty=0.001, init_robot_state='right', gamma=0.99, horizon=120, env_noise=False, obs_noise=False, timestep=0.004166666666666667, n_intermediate_steps=1, **viewer_params)[source]

Constructor.

Parameters:

random_init (bool, False) – if True, initialize the puck at a random position;
action_penalty (float, 1e-3) – the penalty on the action applied to the reward at each time step;
init_robot_state (string, "right") – the configuration in which the robot is initialized; “right”, “left” and “random” are available.
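The default timing values can be turned into concrete durations. This sketch assumes that every agent step advances the simulation by timestep * n_intermediate_steps seconds (one physics step per intermediate step). Under that assumption the default timestep of 0.004166666666666667 s is 1/240 s, so the horizon of 120 steps covers 0.5 s of simulated time, while a horizon of 500 steps covers about 2.08 s.

```python
from fractions import Fraction

# Default timestep of the air hockey environments: 0.004166666666666667 == 1/240 s.
TIMESTEP = Fraction(1, 240)


def control_period(timestep, n_intermediate_steps=1):
    # Simulated time per agent step, assuming one physics step per intermediate step.
    return timestep * n_intermediate_steps


def episode_duration(horizon, timestep, n_intermediate_steps=1):
    # Simulated time covered by a full episode of `horizon` agent steps.
    return horizon * control_period(timestep, n_intermediate_steps)


print(float(episode_duration(120, TIMESTEP)))  # 0.5
print(round(float(episode_duration(500, TIMESTEP)), 3))  # 2.083
```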

setup(obs)[source]: A function that allows setup code to be executed after an environment reset.

reward(state, action, next_state, absorbing)[source]

Compute the reward based on the given transition.

Parameters:

state (np.array) – the current state of the system;
action (np.array) – the action that is applied in the current state;
next_state (np.array) – the state reached after applying the given action;
absorbing (bool) – whether next_state is an absorbing state or not.

Returns:

The reward as a floating point scalar value.
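A function with the same (state, action, next_state, absorbing) signature can be sketched as follows. It is not AirHockeyHit's reward: the observation layout (ee_x, ee_y, puck_x, puck_y), the distance term and the quadratic use of action_penalty are assumptions made for illustration, loosely following the class description (approach the puck, pay a small price for large actions).

```python
import math


def shaped_reward(state, action, next_state, absorbing, action_penalty=1e-3):
    # Hypothetical observation layout: (ee_x, ee_y, puck_x, puck_y).
    ee_x, ee_y, puck_x, puck_y = next_state
    # Distance term: the closer the end effector gets to the puck, the higher the reward.
    distance_term = -math.hypot(puck_x - ee_x, puck_y - ee_y)
    # Hypothetical quadratic action penalty weighted by action_penalty.
    penalty = action_penalty * sum(a * a for a in action)
    # `state` and `absorbing` are unused in this sketch but kept to match the signature.
    return distance_term - penalty
```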

static _bound(x, min_value, max_value)

Method used to bound state and action variables.

Parameters:

x – the variable to bound;
min_value – the minimum value;
max_value – the maximum value.

Returns:

The bounded variable.
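Bounding amounts to element-wise clipping. The sketch below is pure Python so that it runs without dependencies; with NumPy the same effect is obtained with np.clip(x, min_value, max_value). Accepting either scalars or equal-length sequences for the bounds is a choice of this sketch, not a documented property of _bound.

```python
from numbers import Real


def bound(x, min_value, max_value):
    # Scalar case: clip a single number.
    if isinstance(x, Real):
        return max(min_value, min(x, max_value))
    # Sequence case: broadcast scalar bounds to the length of x.
    n = len(x)
    lows = [min_value] * n if isinstance(min_value, Real) else min_value
    highs = [max_value] * n if isinstance(max_value, Real) else max_value
    return [max(lo, min(v, hi)) for v, lo, hi in zip(x, lows, highs)]
```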

_check_collision(group1, group2)

Check for collision between the specified groups.

Parameters:

group1 (string) – A name referring to an entry contained in the collision_groups list handed to the constructor;
group2 (string) – A name referring to an entry contained in the collision_groups list handed to the constructor.

Returns:

A flag indicating whether a collision occurred between the given groups or not.
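The group-based check can be sketched with plain data structures. Here contacts is a hypothetical list of (geom_a, geom_b) id pairs currently in contact (in a real MuJoCo simulation such pairs would come from the active contacts stored in the data structure), and collision_groups is a hypothetical mapping from a group name to a set of geom ids.

```python
def check_collision(contacts, collision_groups, group1, group2):
    # Geom ids belonging to each of the two named groups.
    ids1 = collision_groups[group1]
    ids2 = collision_groups[group2]
    for geom_a, geom_b in contacts:
        # A contact pair is unordered, so check both orientations.
        if (geom_a in ids1 and geom_b in ids2) or (geom_a in ids2 and geom_b in ids1):
            return True
    return False
```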

_compute_action(obs, action)

Compute a transformation of the action at every intermediate step. Useful to add control signals simulated directly in Python.

Parameters:

obs (np.ndarray) – numpy array with the current state of the simulation;
action (np.ndarray) – numpy array with the actions, provided at every step.

Returns:

The action to be set in the actual MuJoCo simulation.

_create_info_dictionary(obs)

This method can be overridden to create a custom info dictionary.

Parameters:: obs (np.ndarray) – the generated observation
Returns:: The information dictionary.

_create_observation(state)

This method can be overridden to create a custom observation. It should be used to append observations that have been registered via obs_help.add_obs(self, name, o_type, length, min_value, max_value).

Parameters:: obs (np.ndarray) – the generated observation
Returns:: The environment observation.

_get_collision_force(group1, group2)

Returns the collision force and torques between the specified groups.

Parameters:

group1 (string) – A name referring to an entry contained in the collision_groups list handed to the constructor;
group2 (string) – A name referring to an entry contained in the collision_groups list handed to the constructor.

Returns:

A 6D vector specifying the collision forces/torques [3D force + 3D torque] between the given groups, or a vector of zeros if there was no collision. http://mujoco.org/book/programming.html#siContact

_modify_mdp_info(mdp_info)

This method can be overridden to modify the automatically generated MDPInfo data structure. By default, returns the given mdp_info structure unchanged.

Parameters:: mdp_info (MDPInfo) – the MDPInfo structure automatically computed by the environment.
Returns:: The modified MDPInfo data structure.

_modify_observation(obs)

This method can be overridden to edit the created observation. This is done after the reward and absorbing functions are evaluated. It is especially useful to transform the observation into different frames. If the original observation order is not preserved, the helper functions in ObservationHelper break.

Parameters:: obs (np.ndarray) – the generated observation
Returns:: The environment observation.
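As an example of the kind of frame transformation this hook is meant for, the sketch below re-expresses a 2-D position in a frame attached to a robot base. The observation layout (puck position in the first two entries) and the base pose are hypothetical; the remaining entries are copied in place so that the original observation order is preserved.

```python
import math


def world_to_base(point_xy, base_xy, base_yaw):
    # Translate into the base origin, then apply the inverse rotation R(-yaw).
    dx = point_xy[0] - base_xy[0]
    dy = point_xy[1] - base_xy[1]
    c, s = math.cos(base_yaw), math.sin(base_yaw)
    return (c * dx + s * dy, -s * dx + c * dy)


def modify_observation(obs, base_xy, base_yaw):
    # Hypothetical layout: obs[0:2] is the puck position in the world frame.
    x, y = world_to_base(obs[0:2], base_xy, base_yaw)
    # Keep every other entry where it was, so index-based helpers still work.
    return [x, y] + list(obs[2:])
```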

_preprocess_action(action)

Compute a transformation of the action provided to the environment.

Parameters:: action (np.ndarray) – numpy array with the actions provided to the environment.
Returns:: The action to be used for the current step

_read_data(name)

Read data from the MuJoCo data structure.

Parameters:: name (string) – A name referring to an entry contained in the additional_data_spec list handed to the constructor.
Returns:: The desired data as a one-dimensional numpy array.

_simulation_post_step(): Allows information to be accessed at every intermediate step after taking a step in the MuJoCo simulation. Can be useful to average forces over all intermediate steps.

_simulation_pre_step()

Allows information to be accessed and changed at every intermediate step before taking a step in the MuJoCo simulation. Can be useful to apply an external force/torque to the specified bodies.

Example: apply a force along the X axis to the torso: force = [200, 0, 0]; torque = [0, 0, 0]; self.sim.data.xfrc_applied[self.sim.model._body_name2id["torso"], :] = force + torque

_step_finalize(): Allows information to be accessed at the end of a step.

_step_init(obs, action): Allows information to be initialized at the start of a step.
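The hooks above are called at different points of a step. The skeleton below shows one plausible call order, inferred from the hook descriptions: _preprocess_action and _step_init once per agent step; _compute_action, _simulation_pre_step and _simulation_post_step once per intermediate step; _step_finalize at the end. HookedSim is hypothetical, a scalar position stands in for the physics state, and the real step implementation may differ in details such as where the reward and absorbing checks happen.

```python
class HookedSim:
    """Hypothetical skeleton recording the order in which step hooks run."""

    def __init__(self, n_intermediate_steps=2):
        self.n_intermediate_steps = n_intermediate_steps
        self.calls = []
        self.position = 0.0

    # Overridable hooks: identity/no-op defaults that only record the call.
    def _preprocess_action(self, action):
        self.calls.append("preprocess")
        return action

    def _step_init(self, obs, action):
        self.calls.append("step_init")

    def _compute_action(self, obs, action):
        self.calls.append("compute_action")
        return action

    def _simulation_pre_step(self):
        self.calls.append("pre_step")

    def _simulation_post_step(self):
        self.calls.append("post_step")

    def _step_finalize(self):
        self.calls.append("finalize")

    def step(self, action):
        obs = self.position
        action = self._preprocess_action(action)      # once per agent step
        self._step_init(obs, action)
        for _ in range(self.n_intermediate_steps):
            ctrl = self._compute_action(obs, action)  # every intermediate step
            self._simulation_pre_step()
            self.position += ctrl                     # stand-in for one physics step
            self._simulation_post_step()
        self._step_finalize()
        return self.position
```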

_write_data(name, value)

Write data to the MuJoCo data structure.

Parameters:

name (string) – A name referring to an entry contained in the additional_data_spec list handed to the constructor;
value (ndarray) – The data that should be written.

static get_action_indices(model, data, actuation_spec)

Returns the action indices given the MuJoCo model, data, and actuation_spec.

Parameters:

model – MuJoCo model;
data – MuJoCo data structure;
actuation_spec (list) – a list specifying the names of the joints which should be controllable by the agent. Can be left empty when all actuators should be used.

Returns:

A list of actuator indices.

static get_action_space(action_indices, model)

Returns the action space bounding box given the action_indices and the model.
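The two static helpers above can be sketched with plain lists. actuator_joints (entry i is the joint driven by actuator i) and ctrl_range (per-actuator (low, high) limits, similar in spirit to MuJoCo's actuator_ctrlrange array) are hypothetical inputs; the sketch returns plain lists instead of a space object, and an unknown name raises ValueError from list.index.

```python
def get_action_indices(actuator_joints, actuation_spec):
    # An empty specification means that every actuator is controllable.
    if not actuation_spec:
        return list(range(len(actuator_joints)))
    # Otherwise keep the order given in the specification.
    return [actuator_joints.index(name) for name in actuation_spec]


def get_action_space(action_indices, ctrl_range):
    # Bounding box made of the control limits of the selected actuators.
    low = [ctrl_range[i][0] for i in action_indices]
    high = [ctrl_range[i][1] for i in action_indices]
    return low, high
```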

get_all_observation_keys()

A function that returns all observation keys defined in the observation specification.

Returns:: A list of observation keys.

get_ee()

Get the end-effector (ee) properties from the current internal state.

Returns:: ([pos_x, pos_y, pos_z], [ang_vel_x, ang_vel_y, ang_vel_z, lin_vel_x, lin_vel_y, lin_vel_z])

get_puck(obs)

Get the puck properties from the observation.

Parameters:: obs – the current observation.

Returns:: ([pos_x, pos_y], [lin_vel_x, lin_vel_y], ang_vel_z)
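The returned tuple can be unpacked directly. The helpers below are hypothetical consumers of that documented layout: one computes the planar puck speed, the other estimates when the puck crosses a given x coordinate under a constant-velocity assumption that ignores friction, spin and bounces.

```python
import math


def puck_speed(puck):
    # Layout: ([pos_x, pos_y], [lin_vel_x, lin_vel_y], ang_vel_z)
    _, (vel_x, vel_y), _ = puck
    return math.hypot(vel_x, vel_y)


def time_to_reach_x(puck, x_line):
    # Constant-velocity extrapolation along x.
    (pos_x, _), (vel_x, _), _ = puck
    if vel_x == 0:
        return math.inf
    t = (x_line - pos_x) / vel_x
    # A negative time means the puck is moving away from the line.
    return t if t >= 0 else math.inf
```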

property info: Returns: An object containing the info of the environment.

is_absorbing(obs)

Check whether the given state is an absorbing state or not.

Parameters:: obs (np.array) – the state of the system.
Returns:: A boolean flag indicating whether this state is absorbing or not.

static list_registered()

List registered environments.

Returns:: The list of the registered environments.

static load_model(xml_file)

Takes an xml_file and compiles and loads the model.

Parameters:: xml_file (str/xml handle) – a string with the path to the xml file, or a MuJoCo xml handle.
Returns:: The MuJoCo model.

static make(env_name, *args, **kwargs)

Generate an environment given an environment name and parameters. The environment is created using the generate method, if available; otherwise, the constructor is used. The generate method has a simpler interface than the constructor, making it easier to generate a standard version of the environment. If the environment name contains a ‘.’ separator, the string is split: the first element selects the environment and the remaining elements are passed as positional parameters.

Parameters:

env_name (str) – name of the environment;
*args – positional arguments to be provided to the environment generator;
**kwargs – keyword arguments to be provided to the environment generator.

Returns:

An instance of the constructed environment.

classmethod register(): Register an environment in the environment list.

render(record=False)

Parameters:: record (bool, False) – whether the visualized image should be returned or not.
Returns:: The visualized image, or None if the record flag is set to False.

reset(obs=None)

Reset the current state.

Parameters:: obs (np.ndarray, None) – the state to set as the current state.
Returns:: The current state.

seed(seed)

Set the seed of the environment.

Parameters:: seed (float) – the value of the seed.

step(action)

Move the agent from its current state according to the action.

Parameters:: action (np.ndarray) – the action to execute.
Returns:: The state reached by the agent executing action in its current state, the reward obtained in the transition and a flag to signal if the next state is absorbing. Also, an additional dictionary is returned (possibly empty).

stop(): Method used to stop an mdp. Useful when dealing with real world environments, simulators, or when using openai-gym rendering

static user_warning_raise_exception(warning)

Detects warnings in MuJoCo and raises the respective exception.

Parameters:: warning – MuJoCo warning.

class AirHockeyDefend(random_init=False, action_penalty=0.001, init_velocity_range=(1, 2.2), gamma=0.99, horizon=500, env_noise=False, obs_noise=False, timestep=0.004166666666666667, n_intermediate_steps=1, **viewer_params)[source]

Bases: AirHockeySingle

Class for the air hockey defending task. The agent tries to stop the puck at the line x = -0.6. If the puck gets into the goal, the agent is penalized.

__init__(random_init=False, action_penalty=0.001, init_velocity_range=(1, 2.2), gamma=0.99, horizon=500, env_noise=False, obs_noise=False, timestep=0.004166666666666667, n_intermediate_steps=1, **viewer_params)[source]

Constructor.

Parameters:

random_init (bool, False) – if True, initialize the puck at a random position;
action_penalty (float, 1e-3) – the penalty on the action applied to the reward at each time step;
init_velocity_range ((float, float), (1, 2.2)) – the range in which the initial velocity is initialized.
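One way to draw an initial speed from init_velocity_range is sketched below. Uniform sampling is an assumption, since the distribution actually used by the environment is not documented here; passing a seeded random.Random instance makes the draw reproducible, in the same spirit as seed().

```python
import random


def sample_initial_speed(init_velocity_range=(1, 2.2), rng=None):
    # Assumption: uniform sampling inside the range (not documented by the source).
    if rng is None:
        rng = random.Random()
    low, high = init_velocity_range
    return rng.uniform(low, high)
```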

setup(obs)[source]: A function that allows setup code to be executed after an environment reset.

reward(state, action, next_state, absorbing)[source]

Compute the reward based on the given transition.

Parameters:

state (np.array) – the current state of the system;
action (np.array) – the action that is applied in the current state;
next_state (np.array) – the state reached after applying the given action;
absorbing (bool) – whether next_state is an absorbing state or not.

Returns:

The reward as a floating point scalar value.

is_absorbing(state)[source]

Check whether the given state is an absorbing state or not.

Parameters:: state (np.array) – the state of the system.
Returns:: A boolean flag indicating whether this state is absorbing or not.

static _bound(x, min_value, max_value)

Method used to bound state and action variables.

Parameters:

x – the variable to bound;
min_value – the minimum value;
max_value – the maximum value.

Returns:

The bounded variable.

_check_collision(group1, group2)

Check for collision between the specified groups.

Parameters:

group1 (string) – A name referring to an entry contained in the collision_groups list handed to the constructor;
group2 (string) – A name referring to an entry contained in the collision_groups list handed to the constructor.

Returns:

A flag indicating whether a collision occurred between the given groups or not.

_compute_action(obs, action)

Compute a transformation of the action at every intermediate step. Useful to add control signals simulated directly in Python.

Parameters:

obs (np.ndarray) – numpy array with the current state of the simulation;
action (np.ndarray) – numpy array with the actions, provided at every step.

Returns:

The action to be set in the actual MuJoCo simulation.

_create_info_dictionary(obs)

This method can be overridden to create a custom info dictionary.

Parameters:: obs (np.ndarray) – the generated observation
Returns:: The information dictionary.

_create_observation(state)

This method can be overridden to create a custom observation. It should be used to append observations that have been registered via obs_help.add_obs(self, name, o_type, length, min_value, max_value).

Parameters:: obs (np.ndarray) – the generated observation
Returns:: The environment observation.

_get_collision_force(group1, group2)

Returns the collision force and torques between the specified groups.

Parameters:

group1 (string) – A name referring to an entry contained in the collision_groups list handed to the constructor;
group2 (string) – A name referring to an entry contained in the collision_groups list handed to the constructor.

Returns:

A 6D vector specifying the collision forces/torques [3D force + 3D torque] between the given groups, or a vector of zeros if there was no collision. http://mujoco.org/book/programming.html#siContact

_modify_mdp_info(mdp_info)

This method can be overridden to modify the automatically generated MDPInfo data structure. By default, returns the given mdp_info structure unchanged.

Parameters:: mdp_info (MDPInfo) – the MDPInfo structure automatically computed by the environment.
Returns:: The modified MDPInfo data structure.

_modify_observation(obs)

This method can be overridden to edit the created observation. This is done after the reward and absorbing functions are evaluated. It is especially useful to transform the observation into different frames. If the original observation order is not preserved, the helper functions in ObservationHelper break.

Parameters:: obs (np.ndarray) – the generated observation
Returns:: The environment observation.

_preprocess_action(action)

Compute a transformation of the action provided to the environment.

Parameters:: action (np.ndarray) – numpy array with the actions provided to the environment.
Returns:: The action to be used for the current step

_read_data(name)

Read data from the MuJoCo data structure.

Parameters:: name (string) – A name referring to an entry contained in the additional_data_spec list handed to the constructor.
Returns:: The desired data as a one-dimensional numpy array.

_simulation_post_step(): Allows information to be accessed at every intermediate step after taking a step in the MuJoCo simulation. Can be useful to average forces over all intermediate steps.

_simulation_pre_step()

Allows information to be accessed and changed at every intermediate step before taking a step in the MuJoCo simulation. Can be useful to apply an external force/torque to the specified bodies.

Example: apply a force along the X axis to the torso: force = [200, 0, 0]; torque = [0, 0, 0]; self.sim.data.xfrc_applied[self.sim.model._body_name2id["torso"], :] = force + torque

_step_finalize(): Allows information to be accessed at the end of a step.

_step_init(obs, action): Allows information to be initialized at the start of a step.

_write_data(name, value)

Write data to the MuJoCo data structure.

Parameters:

name (string) – A name referring to an entry contained in the additional_data_spec list handed to the constructor;
value (ndarray) – The data that should be written.

static get_action_indices(model, data, actuation_spec)

Returns the action indices given the MuJoCo model, data, and actuation_spec.

Parameters:

model – MuJoCo model;
data – MuJoCo data structure;
actuation_spec (list) – a list specifying the names of the joints which should be controllable by the agent. Can be left empty when all actuators should be used.

Returns:

A list of actuator indices.

static get_action_space(action_indices, model)

Returns the action space bounding box given the action_indices and the model.

get_all_observation_keys()

A function that returns all observation keys defined in the observation specification.

Returns:: A list of observation keys.

get_ee()

Get the end-effector (ee) properties from the current internal state.

Returns:: ([pos_x, pos_y, pos_z], [ang_vel_x, ang_vel_y, ang_vel_z, lin_vel_x, lin_vel_y, lin_vel_z])

get_puck(obs)

Get the puck properties from the observation.

Parameters:: obs – the current observation.

Returns:: ([pos_x, pos_y], [lin_vel_x, lin_vel_y], ang_vel_z)

property info: Returns: An object containing the info of the environment.

static list_registered()

List registered environments.

Returns:: The list of the registered environments.

static load_model(xml_file)

Takes an xml_file and compiles and loads the model.

Parameters:: xml_file (str/xml handle) – a string with the path to the xml file, or a MuJoCo xml handle.
Returns:: The MuJoCo model.

static make(env_name, *args, **kwargs)

Generate an environment given an environment name and parameters. The environment is created using the generate method, if available; otherwise, the constructor is used. The generate method has a simpler interface than the constructor, making it easier to generate a standard version of the environment. If the environment name contains a ‘.’ separator, the string is split: the first element selects the environment and the remaining elements are passed as positional parameters.

Parameters:

env_name (str) – name of the environment;
*args – positional arguments to be provided to the environment generator;
**kwargs – keyword arguments to be provided to the environment generator.

Returns:

An instance of the constructed environment.

classmethod register(): Register an environment in the environment list.

render(record=False)

Parameters:: record (bool, False) – whether the visualized image should be returned or not.
Returns:: The visualized image, or None if the record flag is set to False.

reset(obs=None)

Reset the current state.

Parameters:: obs (np.ndarray, None) – the state to set as the current state.
Returns:: The current state.

seed(seed)

Set the seed of the environment.

Parameters:: seed (float) – the value of the seed.

step(action)

Move the agent from its current state according to the action.

Parameters:: action (np.ndarray) – the action to execute.
Returns:: The state reached by the agent executing action in its current state, the reward obtained in the transition and a flag to signal if the next state is absorbing. Also, an additional dictionary is returned (possibly empty).

stop(): Method used to stop an mdp. Useful when dealing with real world environments, simulators, or when using openai-gym rendering

static user_warning_raise_exception(warning)

Detects warnings in MuJoCo and raises the respective exception.

Parameters:: warning – MuJoCo warning.

class AirHockeyPrepare(random_init=False, action_penalty=0.001, sub_problem='side', gamma=0.99, horizon=500, env_noise=False, obs_noise=False, timestep=0.004166666666666667, n_intermediate_steps=1, **viewer_params)[source]

Bases: AirHockeySingle

Class for the air hockey preparation task. The agent tries to move the puck to a better position, at y = 0. If the agent loses control of the puck, it is penalized.

__init__(random_init=False, action_penalty=0.001, sub_problem='side', gamma=0.99, horizon=500, env_noise=False, obs_noise=False, timestep=0.004166666666666667, n_intermediate_steps=1, **viewer_params)[source]

Constructor.

Parameters:

random_init (bool, False) – if True, initialize the puck at a random position;
action_penalty (float, 1e-3) – the penalty on the action applied to the reward at each time step;
sub_problem (string, "side") – determines which area is considered for the initial puck position. Currently “side” and “bottom” are available.

setup(obs)[source]: A function that allows setup code to be executed after an environment reset.

static _bound(x, min_value, max_value)

Method used to bound state and action variables.

Parameters:

x – the variable to bound;
min_value – the minimum value;
max_value – the maximum value.

Returns:

The bounded variable.

_check_collision(group1, group2)

Check for collision between the specified groups.

Parameters:

group1 (string) – A name referring to an entry contained in the collision_groups list handed to the constructor;
group2 (string) – A name referring to an entry contained in the collision_groups list handed to the constructor.

Returns:

A flag indicating whether a collision occurred between the given groups or not.

_compute_action(obs, action)

Compute a transformation of the action at every intermediate step. Useful to add control signals simulated directly in Python.

Parameters:

obs (np.ndarray) – numpy array with the current state of the simulation;
action (np.ndarray) – numpy array with the actions, provided at every step.

Returns:

The action to be set in the actual MuJoCo simulation.

_create_info_dictionary(obs)

This method can be overridden to create a custom info dictionary.

Parameters:: obs (np.ndarray) – the generated observation
Returns:: The information dictionary.

_create_observation(state)

This method can be overridden to create a custom observation. It should be used to append observations that have been registered via obs_help.add_obs(self, name, o_type, length, min_value, max_value).

Parameters:: obs (np.ndarray) – the generated observation
Returns:: The environment observation.

_get_collision_force(group1, group2)

Returns the collision force and torques between the specified groups.

Parameters:

group1 (string) – A name referring to an entry contained in the collision_groups list handed to the constructor;
group2 (string) – A name referring to an entry contained in the collision_groups list handed to the constructor.

Returns:

A 6D vector specifying the collision forces/torques [3D force + 3D torque] between the given groups, or a vector of zeros if there was no collision. http://mujoco.org/book/programming.html#siContact

_modify_mdp_info(mdp_info)

This method can be overridden to modify the automatically generated MDPInfo data structure. By default, returns the given mdp_info structure unchanged.

Parameters:: mdp_info (MDPInfo) – the MDPInfo structure automatically computed by the environment.
Returns:: The modified MDPInfo data structure.

_modify_observation(obs)

This method can be overridden to edit the created observation. This is done after the reward and absorbing functions are evaluated. It is especially useful to transform the observation into different frames. If the original observation order is not preserved, the helper functions in ObservationHelper break.

Parameters:: obs (np.ndarray) – the generated observation
Returns:: The environment observation.

_preprocess_action(action)

Compute a transformation of the action provided to the environment.

Parameters:: action (np.ndarray) – numpy array with the actions provided to the environment.
Returns:: The action to be used for the current step

_read_data(name)

Read data from the MuJoCo data structure.

Parameters:: name (string) – A name referring to an entry contained in the additional_data_spec list handed to the constructor.
Returns:: The desired data as a one-dimensional numpy array.

_simulation_post_step(): Allows information to be accessed at every intermediate step after taking a step in the MuJoCo simulation. Can be useful to average forces over all intermediate steps.

_simulation_pre_step()

Allows information to be accessed and changed at every intermediate step before taking a step in the MuJoCo simulation. Can be useful to apply an external force/torque to the specified bodies.

Example: apply a force along the X axis to the torso: force = [200, 0, 0]; torque = [0, 0, 0]; self.sim.data.xfrc_applied[self.sim.model._body_name2id["torso"], :] = force + torque

_step_finalize(): Allows information to be accessed at the end of a step.

_step_init(obs, action): Allows information to be initialized at the start of a step.

_write_data(name, value)

Write data to the MuJoCo data structure.

Parameters:

name (string) – A name referring to an entry contained in the additional_data_spec list handed to the constructor;
value (ndarray) – The data that should be written.

static get_action_indices(model, data, actuation_spec)

Returns the action indices given the MuJoCo model, data, and actuation_spec.

Parameters:

model – MuJoCo model;
data – MuJoCo data structure;
actuation_spec (list) – a list specifying the names of the joints which should be controllable by the agent. Can be left empty when all actuators should be used.

Returns:

A list of actuator indices.

static get_action_space(action_indices, model)

Returns the action space bounding box given the action_indices and the model.

get_all_observation_keys()

A function that returns all observation keys defined in the observation specification.

Returns:: A list of observation keys.

get_ee()

Get the end-effector (ee) properties from the current internal state.

Returns:: ([pos_x, pos_y, pos_z], [ang_vel_x, ang_vel_y, ang_vel_z, lin_vel_x, lin_vel_y, lin_vel_z])

get_puck(obs)

Get the puck properties from the observation.

Parameters:: obs – the current observation.

Returns:: ([pos_x, pos_y], [lin_vel_x, lin_vel_y], ang_vel_z)

property info: Returns: An object containing the info of the environment.

static list_registered()

List registered environments.

Returns:: The list of the registered environments.

static load_model(xml_file)

Takes an xml_file and compiles and loads the model.

Parameters:: xml_file (str/xml handle) – a string with the path to the xml file, or a MuJoCo xml handle.
Returns:: The MuJoCo model.

static make(env_name, *args, **kwargs)

Generate an environment given an environment name and parameters. The environment is created using the generate method, if available; otherwise, the constructor is used. The generate method has a simpler interface than the constructor, making it easier to generate a standard version of the environment. If the environment name contains a ‘.’ separator, the string is split: the first element selects the environment and the remaining elements are passed as positional parameters.

Parameters:

env_name (str) – name of the environment;
*args – positional arguments to be provided to the environment generator;
**kwargs – keyword arguments to be provided to the environment generator.

Returns:

An instance of the constructed environment.

classmethod register(): Register an environment in the environment list.

render(record=False)

Parameters:: record (bool, False) – whether the visualized image should be returned or not.
Returns:: The visualized image, or None if the record flag is set to False.

reset(obs=None)

Reset the current state.

Parameters:: obs (np.ndarray, None) – the state to set as the current state.
Returns:: The current state.

reward(state, action, next_state, absorbing)[source]

Compute the reward based on the given transition.

Parameters:

state (np.array) – the current state of the system;
action (np.array) – the action that is applied in the current state;
next_state (np.array) – the state reached after applying the given action;
absorbing (bool) – whether next_state is an absorbing state or not.

Returns:

The reward as a floating point scalar value.

seed(seed)

Set the seed of the environment.

Parameters:: seed (float) – the value of the seed.

step(action)

Move the agent from its current state according to the action.

Parameters:: action (np.ndarray) – the action to execute.
Returns:: The state reached by the agent executing action in its current state, the reward obtained in the transition and a flag to signal if the next state is absorbing. Also, an additional dictionary is returned (possibly empty).

stop(): Method used to stop an mdp. Useful when dealing with real world environments, simulators, or when using openai-gym rendering

static user_warning_raise_exception(warning)

Detects warnings in MuJoCo and raises the respective exception.

Parameters:: warning – MuJoCo warning.

is_absorbing(state)[source]

Check whether the given state is an absorbing state or not.

Parameters:: state (np.array) – the state of the system.
Returns:: A boolean flag indicating whether this state is absorbing or not.

class AirHockeyRepel(random_init=False, action_penalty=0.001, init_velocity_range=(1, 2.2), gamma=0.99, horizon=500, env_noise=False, obs_noise=False, timestep=0.004166666666666667, n_intermediate_steps=1, **viewer_params)[source]

Bases: AirHockeySingle

Class for the air hockey repel task. The agent tries to repel the puck towards the opponent. If the puck gets into the goal, the agent is penalized.

__init__(random_init=False, action_penalty=0.001, init_velocity_range=(1, 2.2), gamma=0.99, horizon=500, env_noise=False, obs_noise=False, timestep=0.004166666666666667, n_intermediate_steps=1, **viewer_params)[source]

Constructor.

Parameters:

random_init (bool, False) – if True, initialize the puck at a random position;
action_penalty (float, 1e-3) – the penalty on the action applied to the reward at each time step;
init_velocity_range ((float, float), (1, 2.2)) – the range in which the initial velocity is initialized.

setup(obs)[source]: A function that allows setup code to be executed after an environment reset.

reward(state, action, next_state, absorbing)[source]

Compute the reward based on the given transition.

Parameters:

state (np.array) – the current state of the system;
action (np.array) – the action that is applied in the current state;
next_state (np.array) – the state reached after applying the given action;
absorbing (bool) – whether next_state is an absorbing state or not.

Returns:

The reward as a floating point scalar value.

is_absorbing(state)[source]

Check whether the given state is an absorbing state or not.

Parameters:: state (np.array) – the state of the system.
Returns:: A boolean flag indicating whether this state is absorbing or not.

static _bound(x, min_value, max_value)

Method used to bound state and action variables.

Parameters:

x – the variable to bound;
min_value – the minimum value;
max_value – the maximum value.

Returns:

The bounded variable.

_check_collision(group1, group2)

Check for collision between the specified groups.

Parameters:

group1 (string) – A name referring to an entry contained in the collision_groups list handed to the constructor;
group2 (string) – A name referring to an entry contained in the collision_groups list handed to the constructor.

Returns:

A flag indicating whether a collision occurred between the given groups or not.

_compute_action(obs, action)

Compute a transformation of the action at every intermediate step. Useful to add control signals simulated directly in Python.

Parameters:

obs (np.ndarray) – numpy array with the current state of the simulation;
action (np.ndarray) – numpy array with the actions, provided at every step.

Returns:

The action to be set in the actual MuJoCo simulation.

_create_info_dictionary(obs)

This method can be overridden to create a custom info dictionary.

Parameters:: obs (np.ndarray) – the generated observation
Returns:: The information dictionary.

_create_observation(state)

This method can be overridden to create a custom observation. It should be used to append observations that have been registered via obs_help.add_obs(self, name, o_type, length, min_value, max_value).

Parameters:: obs (np.ndarray) – the generated observation
Returns:: The environment observation.

_get_collision_force(group1, group2)

Returns the collision force and torques between the specified groups.

Parameters:

group1 (string) – A name referring to an entry contained in the collision_groups list handed to the constructor;
group2 (string) – A name referring to an entry contained in the collision_groups list handed to the constructor.

Returns:

A 6D vector specifying the collision forces/torques [3D force + 3D torque] between the given groups, or a vector of zeros if there was no collision. http://mujoco.org/book/programming.html#siContact

_modify_mdp_info(mdp_info)

This method can be overridden to modify the automatically generated MDPInfo data structure. By default, returns the given mdp_info structure unchanged.

Parameters:: mdp_info (MDPInfo) – the MDPInfo structure automatically computed by the environment.
Returns:: The modified MDPInfo data structure.

_modify_observation(obs)

This method can be overridden to edit the created observation. This is done after the reward and absorbing functions are evaluated. Especially useful to transform the observation into different frames. If the original observation order is not preserved, the helper functions in ObservationHelper break.

Parameters:: obs (np.ndarray) – the generated observation
Returns:: The environment observation.

_preprocess_action(action)

Compute a transformation of the action provided to the environment.

Parameters:: action (np.ndarray) – numpy array with the actions provided to the environment.
Returns:: The action to be used for the current step

_read_data(name)

Read data from the MuJoCo data structure.

Parameters:: name (string) – A name referring to an entry contained in the additional_data_spec list handed to the constructor.
Returns:: The desired data as a one-dimensional numpy array.

_simulation_post_step(): Allows information to be accesed at every intermediate step after taking a step in the mujoco simulation. Can be usefull to average forces over all intermediate steps.

_simulation_pre_step()

Allows information to be accessed and changed at every intermediate step before taking a step in the MuJoCo simulation. Can be useful to apply an external force/torque to the specified bodies.

Example: apply a force along X to the torso:

force = [200, 0, 0]
torque = [0, 0, 0]
self.sim.data.xfrc_applied[self.sim.model._body_name2id["torso"], :] = force + torque
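The torso example relies on `force` and `torque` being Python lists, where `+` concatenates them into the 6-element wrench expected by `xfrc_applied`. If they are NumPy arrays, `+` adds them elementwise instead, so they should be concatenated explicitly. The sketch below shows this with a stand-in `(n_bodies, 6)` array and a hypothetical body-name lookup rather than a live MuJoCo model.

```python
import numpy as np

# Stand-ins for the simulator data: one 6D wrench row per body and a hypothetical name lookup.
body_name2id = {"world": 0, "torso": 1}
xfrc_applied = np.zeros((len(body_name2id), 6))

force = np.array([200.0, 0.0, 0.0])
torque = np.array([0.0, 0.0, 0.0])

# With NumPy arrays, force + torque would be an elementwise sum of length 3,
# so build the 6D [force, torque] row explicitly.
xfrc_applied[body_name2id["torso"], :] = np.concatenate([force, torque])

print(xfrc_applied[1].tolist())  # [200.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```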

_step_finalize(): Allows information to be accesed at the end of a step.

_step_init(obs, action): Allows information to be initialized at the start of a step.

_write_data(name, value)

Write data to the MuJoCo data structure.

Parameters:

name (string) – A name referring to an entry contained in the additional_data_spec list handed to the constructor;
value (ndarray) – The data that should be written.

static get_action_indices(model, data, actuation_spec)

Returns the action indices given the MuJoCo model, data, and actuation_spec.

Parameters:

model – MuJoCo model;
data – MuJoCo data structure;
actuation_spec (list) – A list specifying the names of the joints which should be controllable by the agent. Can be left empty when all actuators should be used.

Returns:

A list of actuator indices.
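How an `actuation_spec` of names can be turned into actuator indices, as a pure-Python sketch. It assumes the model exposes an ordered list of actuator names; `actuator_names` and `action_indices_from_spec` are hypothetical, whereas the real method looks the names up in the MuJoCo model itself.

```python
def action_indices_from_spec(actuator_names, actuation_spec):
    """Map actuator names to indices; an empty spec selects every actuator (hypothetical helper)."""
    if len(actuation_spec) == 0:
        return list(range(len(actuator_names)))
    index_of = {name: i for i, name in enumerate(actuator_names)}
    missing = [name for name in actuation_spec if name not in index_of]
    if missing:
        raise KeyError(f"unknown actuators: {missing}")
    return [index_of[name] for name in actuation_spec]


names = ["shoulder", "elbow", "wrist"]  # hypothetical actuator names
print(action_indices_from_spec(names, ["wrist", "shoulder"]))  # [2, 0]
print(action_indices_from_spec(names, []))  # [0, 1, 2]
```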

static get_action_space(action_indices, model)

Returns the action space bounding box given the action_indices and the model.
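A sketch of how such a bounding box can be derived: select the control range of each controlled actuator. It assumes the ranges are stored as an `(n_actuators, 2)` array of `[low, high]` rows, as in MuJoCo's `actuator_ctrlrange`, and it returns plain `(low, high)` arrays instead of a MushroomRL space object. The function name is hypothetical.

```python
import numpy as np


def action_bounds(action_indices, ctrlrange):
    """Return (low, high) bounds for the selected actuators (hypothetical helper)."""
    ctrlrange = np.asarray(ctrlrange, dtype=float)
    low = ctrlrange[action_indices, 0]
    high = ctrlrange[action_indices, 1]
    return low, high


ctrlrange = [[-1.0, 1.0], [-2.0, 2.0], [-0.5, 0.5]]  # example ranges for three actuators
low, high = action_bounds([2, 0], ctrlrange)
print(low.tolist(), high.tolist())  # [-0.5, -1.0] [0.5, 1.0]
```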

get_all_observation_keys()

A function that returns all observation keys defined in the observation specification.

Returns:: A list of observation keys.

get_ee()

Get the end-effector (ee) properties from the current internal state.

Returns:: ([pos_x, pos_y, pos_z], [ang_vel_x, ang_vel_y, ang_vel_z, lin_vel_x, lin_vel_y, lin_vel_z])

get_puck(obs)

Get the puck properties from the observation.

Parameters:: obs – The current observation.

Returns:: ([pos_x, pos_y], [lin_vel_x, lin_vel_y], ang_vel_z)

property info: Returns: An object containing the info of the environment.

static list_registered()

List registered environments.

Returns:: The list of the registered environments.

static load_model(xml_file)

Takes an xml_file and compiles and loads the model.

Parameters:: xml_file (str/xml handle) – A string with a path to the xml or a MuJoCo xml handle.
Returns:: Mujoco model.

static make(env_name, *args, **kwargs)

Generate an environment given an environment name and parameters. The environment is created using the generate method, if available. Otherwise, the constructor is used. The generate method has a simpler interface than the constructor, making it easier to generate a standard version of the environment. If the environment name contains a ‘.’ separator, the string is split: the first element is used to select the environment and the other elements are passed as positional parameters.

Parameters:

env_name (str) – Name of the environment;
*args – positional arguments to be provided to the environment generator;
**kwargs – keyword arguments to be provided to the environment generator.

Returns:

An instance of the constructed environment.

classmethod register(): Register an environment in the environment list.

render(record=False)

Parameters:: record (bool, False) – whether the visualized image should be returned or not.
Returns:: The visualized image, or None if the record flag is set to false.

reset(obs=None)

Reset the current state.

Parameters:: obs (np.ndarray, None) – the state to set as the current state.
Returns:: The current state.

seed(seed)

Set the seed of the environment.

Parameters:: seed (float) – the value of the seed.

step(action)

Move the agent from its current state according to the action.

Parameters:: action (np.ndarray) – the action to execute.
Returns:: The state reached by the agent executing action in its current state, the reward obtained in the transition and a flag to signal if the next state is absorbing. Also, an additional dictionary is returned (possibly empty).
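The return contract of step (next state, reward, absorbing flag, info dictionary) and of reset can be exercised with a small rollout loop. The environment below is a hypothetical one-dimensional toy, not a MushroomRL class; it only mirrors the method signatures documented here.

```python
class LineWorld:
    """Hypothetical 1D environment: move left/right until reaching position `goal`."""

    def __init__(self, goal=3, horizon=20):
        self.goal, self.horizon = goal, horizon
        self._state = 0

    def reset(self, state=None):
        # Use the given state if provided, otherwise start from the origin.
        self._state = 0 if state is None else state
        return self._state

    def step(self, action):
        self._state += 1 if action == 1 else -1
        absorbing = self._state == self.goal
        reward = 1.0 if absorbing else -0.1
        return self._state, reward, absorbing, {}


def rollout(env, policy, start=None):
    """Run one episode, stopping at an absorbing state or at the horizon."""
    state = env.reset(start)
    total = 0.0
    for _ in range(env.horizon):
        state, reward, absorbing, info = env.step(policy(state))
        total += reward
        if absorbing:
            break
    return state, total


final_state, ret = rollout(LineWorld(), policy=lambda s: 1)
print(final_state, round(ret, 2))  # 3 0.8
```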

stop(): Method used to stop an mdp. Useful when dealing with real world environments, simulators, or when using openai-gym rendering

static user_warning_raise_exception(warning)

Detects warnings in Mujoco and raises the respective exception.

Parameters:: warning – Mujoco warning.

Ball In A Cup

class BallInACup[source]

Bases: MuJoCo

MuJoCo simulation of the Ball In A Cup task, using the Barrett WAM robot.

__init__()[source]: Constructor.

reward(cur_obs, action, obs, absorbing)[source]

Compute the reward based on the given transition.

Parameters:

cur_obs (np.array) – the current state of the system;
action (np.array) – the action that is applied in the current state;
obs (np.array) – the state reached after applying the given action;
absorbing (bool) – whether obs is an absorbing state or not.

Returns:

The reward as a floating point scalar value.

is_absorbing(state)[source]

Check whether the given state is an absorbing state or not.

Parameters:: state (np.array) – the state of the system.
Returns:: A boolean flag indicating whether this state is absorbing or not.

setup(obs)[source]: A function that allows to execute setup code after an environment reset.

static _bound(x, min_value, max_value)

Method used to bound state and action variables.

Parameters:

x – the variable to bound;
min_value – the minimum value;
max_value – the maximum value;

Returns:

The bounded variable.

_check_collision(group1, group2)

Check for collision between the specified groups.

Parameters:

group1 (string) – A name referring to an entry contained in the collision_groups list handed to the constructor;
group2 (string) – A name referring to an entry contained in the collision_groups list handed to the constructor.

Returns:

A flag indicating whether a collision occurred between the given groups or not.

_compute_action(obs, action)

Compute a transformation of the action at every intermediate step. Useful to add control signals simulated directly in python.

Parameters:

obs (np.ndarray) – numpy array with the current state of the simulation;
action (np.ndarray) – numpy array with the actions, provided at every step.

Returns:

The action to be set in the actual MuJoCo simulation.

_create_info_dictionary(obs)

This method can be overridden to create a custom info dictionary.

Parameters:: obs (np.ndarray) – the generated observation
Returns:: The information dictionary.

_create_observation(obs)

This method can be overridden to create a custom observation. It should be used to append observations that have been registered via obs_help.add_obs(self, name, o_type, length, min_value, max_value).

Parameters:: obs (np.ndarray) – the generated observation
Returns:: The environment observation.

_get_collision_force(group1, group2)

Returns the collision force and torques between the specified groups.

Parameters:

group1 (string) – A name referring to an entry contained in the collision_groups list handed to the constructor;
group2 (string) – A name referring to an entry contained in the collision_groups list handed to the constructor.

Returns:

A 6D vector specifying the collision forces/torques [3D force + 3D torque] between the given groups. A vector of zeros in case there was no collision. http://mujoco.org/book/programming.html#siContact

_modify_mdp_info(mdp_info)

This method can be overridden to modify the automatically generated MDPInfo data structure. By default, returns the given mdp_info structure unchanged.

Parameters:: mdp_info (MDPInfo) – the MDPInfo structure automatically computed by the environment.
Returns:: The modified MDPInfo data structure.

_modify_observation(obs)

This method can be overridden to edit the created observation. This is done after the reward and absorbing functions are evaluated. Especially useful to transform the observation into different frames. If the original observation order is not preserved, the helper functions in ObservationHelper break.

Parameters:: obs (np.ndarray) – the generated observation
Returns:: The environment observation.

_preprocess_action(action)

Compute a transformation of the action provided to the environment.

Parameters:: action (np.ndarray) – numpy array with the actions provided to the environment.
Returns:: The action to be used for the current step

_read_data(name)

Read data from the MuJoCo data structure.

Parameters:: name (string) – A name referring to an entry contained in the additional_data_spec list handed to the constructor.
Returns:: The desired data as a one-dimensional numpy array.

_simulation_post_step(): Allows information to be accesed at every intermediate step after taking a step in the mujoco simulation. Can be usefull to average forces over all intermediate steps.

_simulation_pre_step()

Allows information to be accessed and changed at every intermediate step before taking a step in the MuJoCo simulation. Can be useful to apply an external force/torque to the specified bodies.

Example: apply a force along X to the torso:

force = [200, 0, 0]
torque = [0, 0, 0]
self.sim.data.xfrc_applied[self.sim.model._body_name2id["torso"], :] = force + torque

_step_finalize(): Allows information to be accesed at the end of a step.

_step_init(obs, action): Allows information to be initialized at the start of a step.

_write_data(name, value)

Write data to the MuJoCo data structure.

Parameters:

name (string) – A name referring to an entry contained in the additional_data_spec list handed to the constructor;
value (ndarray) – The data that should be written.

static get_action_indices(model, data, actuation_spec)

Returns the action indices given the MuJoCo model, data, and actuation_spec.

Parameters:

model – MuJoCo model;
data – MuJoCo data structure;
actuation_spec (list) – A list specifying the names of the joints which should be controllable by the agent. Can be left empty when all actuators should be used.

Returns:

A list of actuator indices.

static get_action_space(action_indices, model)

Returns the action space bounding box given the action_indices and the model.

get_all_observation_keys()

A function that returns all observation keys defined in the observation specification.

Returns:: A list of observation keys.

property info: Returns: An object containing the info of the environment.

static list_registered()

List registered environments.

Returns:: The list of the registered environments.

static load_model(xml_file)

Takes an xml_file and compiles and loads the model.

Parameters:: xml_file (str/xml handle) – A string with a path to the xml or a MuJoCo xml handle.
Returns:: Mujoco model.

static make(env_name, *args, **kwargs)

Generate an environment given an environment name and parameters. The environment is created using the generate method, if available. Otherwise, the constructor is used. The generate method has a simpler interface than the constructor, making it easier to generate a standard version of the environment. If the environment name contains a ‘.’ separator, the string is split: the first element is used to select the environment and the other elements are passed as positional parameters.

Parameters:

env_name (str) – Name of the environment;
*args – positional arguments to be provided to the environment generator;
**kwargs – keyword arguments to be provided to the environment generator.

Returns:

An instance of the constructed environment.

classmethod register(): Register an environment in the environment list.

render(record=False)

Parameters:: record (bool, False) – whether the visualized image should be returned or not.
Returns:: The visualized image, or None if the record flag is set to false.

reset(obs=None)

Reset the current state.

Parameters:: obs (np.ndarray, None) – the state to set as the current state.
Returns:: The current state.

seed(seed)

Set the seed of the environment.

Parameters:: seed (float) – the value of the seed.

step(action)

Move the agent from its current state according to the action.

Parameters:: action (np.ndarray) – the action to execute.
Returns:: The state reached by the agent executing action in its current state, the reward obtained in the transition and a flag to signal if the next state is absorbing. Also, an additional dictionary is returned (possibly empty).

stop(): Method used to stop an mdp. Useful when dealing with real world environments, simulators, or when using openai-gym rendering

static user_warning_raise_exception(warning)

Detects warnings in Mujoco and raises the respective exception.

Parameters:: warning – Mujoco warning.

Puddle World

class PuddleWorld(start=None, goal=None, goal_threshold=0.1, noise_step=0.025, noise_reward=0, reward_goal=0.0, thrust=0.05, puddle_center=None, puddle_width=None, gamma=0.99, horizon=5000)[source]

Bases: Environment

Puddle world as presented in: “Off-Policy Actor-Critic”. Degris T. et al., 2012.

__init__(start=None, goal=None, goal_threshold=0.1, noise_step=0.025, noise_reward=0, reward_goal=0.0, thrust=0.05, puddle_center=None, puddle_width=None, gamma=0.99, horizon=5000)[source]

Constructor.

Parameters:

start (np.array, None) – starting position of the agent;
goal (np.array, None) – goal position;
goal_threshold (float, .1) – distance threshold of the agent from the goal to consider it reached;
noise_step (float, .025) – noise in actions;
noise_reward (float, 0) – standard deviation of the Gaussian noise in the reward;
reward_goal (float, 0) – reward obtained when reaching the goal state;
thrust (float, .05) – distance walked during each action;
puddle_center (np.array, None) – center of the puddle;
puddle_width (np.array, None) – width of the puddle;
gamma (float, .99) – discount factor;
horizon (int, 5000) – horizon of the problem.
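The goal_threshold parameter implies a distance test between the agent and the goal. A minimal sketch, assuming the Euclidean distance and a strict comparison (the library may use a different norm or `<=`); `goal_reached` and the example positions are hypothetical.

```python
import math


def goal_reached(position, goal, goal_threshold=0.1):
    """Return True when the agent is closer than goal_threshold to the goal."""
    return math.dist(position, goal) < goal_threshold


goal = (1.0, 1.0)  # example goal position
print(goal_reached((0.95, 0.97), goal))  # True: distance is about 0.058
print(goal_reached((0.8, 1.0), goal))  # False: distance is 0.2
```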

reset(state=None)[source]

Reset the current state.

Parameters:: state (np.ndarray, None) – the state to set as the current state.
Returns:: The current state.

step(action)[source]

Move the agent from its current state according to the action.

Parameters:: action (np.ndarray) – the action to execute.
Returns:: The state reached by the agent executing action in its current state, the reward obtained in the transition and a flag to signal if the next state is absorbing. Also, an additional dictionary is returned (possibly empty).

render(record=False)[source]

Parameters:: record (bool, False) – whether the visualized image should be returned or not.
Returns:: The visualized image, or None if the record flag is set to false.

stop()[source]: Method used to stop an mdp. Useful when dealing with real world environments, simulators, or when using openai-gym rendering

static _bound(x, min_value, max_value)

Method used to bound state and action variables.

Parameters:

x – the variable to bound;
min_value – the minimum value;
max_value – the maximum value;

Returns:

The bounded variable.

property info: Returns: An object containing the info of the environment.

static list_registered()

List registered environments.

Returns:: The list of the registered environments.

static make(env_name, *args, **kwargs)

Generate an environment given an environment name and parameters. The environment is created using the generate method, if available. Otherwise, the constructor is used. The generate method has a simpler interface than the constructor, making it easier to generate a standard version of the environment. If the environment name contains a ‘.’ separator, the string is split: the first element is used to select the environment and the other elements are passed as positional parameters.

Parameters:

env_name (str) – Name of the environment;
*args – positional arguments to be provided to the environment generator;
**kwargs – keyword arguments to be provided to the environment generator.

Returns:

An instance of the constructed environment.

classmethod register(): Register an environment in the environment list.

seed(seed)

Set the seed of the environment.

Parameters:: seed (float) – the value of the seed.

Pybullet

class PyBullet(files, actuation_spec, observation_spec, gamma, horizon, timestep=0.004166666666666667, n_intermediate_steps=1, enforce_joint_velocity_limits=False, debug_gui=False, **viewer_params)[source]

Bases: Environment

Class to create a Mushroom environment using the PyBullet simulator.

__init__(files, actuation_spec, observation_spec, gamma, horizon, timestep=0.004166666666666667, n_intermediate_steps=1, enforce_joint_velocity_limits=False, debug_gui=False, **viewer_params)[source]

Constructor.

Parameters:

files (dict) – dictionary of the URDF/MJCF/SDF files to load (key) and parameters dictionary (value);
actuation_spec (list) – A list of tuples specifying the names of the joints which should be controllable by the agent and their control mode. Can be left empty when all actuators should be used in position control;
observation_spec (list) – A list containing the names of data that should be made available to the agent as an observation and their type (ObservationType). An entry in the list is given by: (name, type);
gamma (float) – The discount factor of the environment;
horizon (int) – The maximum horizon for the environment;
timestep (float, 0.00416666666) – The timestep used by the PyBullet simulator;
n_intermediate_steps (int, 1) – The number of simulation steps between every action taken by the agent. Allows the user to modify, control and access intermediate states;
enforce_joint_velocity_limits (bool, False) – flag to enforce the velocity limits;
debug_gui (bool, False) – flag to activate the default PyBullet visualizer, which can be used for debugging purposes;
**viewer_params – other parameters to be passed to the viewer. See PyBulletViewer documentation for the available options.
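The default timestep 0.004166666666666667 equals 1/240 s, so the simulator advances at 240 Hz. Since the agent acts once every n_intermediate_steps simulator steps, the agent's control period is timestep × n_intermediate_steps. A short worked example (the `control_frequency` helper is illustrative only):

```python
from fractions import Fraction

timestep = 0.004166666666666667  # default PyBullet timestep in seconds
print(Fraction(timestep).limit_denominator(1000))  # 1/240


def control_frequency(timestep, n_intermediate_steps):
    """Agent decisions per second when each action spans n_intermediate_steps simulator steps."""
    return 1.0 / (timestep * n_intermediate_steps)


for n in (1, 4, 8):
    print(n, round(control_frequency(timestep, n), 3))
# 1 240.0
# 4 60.0
# 8 30.0
```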

seed(seed)[source]

Set the seed of the environment.

Parameters:: seed (float) – the value of the seed.

reset(state=None)[source]

Reset the current state.

Parameters:: state (np.ndarray, None) – the state to set as the current state.
Returns:: The current state.

render(record=False)[source]

Parameters:: record (bool, False) – whether the visualized image should be returned or not.
Returns:: The visualized image, or None if the record flag is set to false.

stop()[source]: Method used to stop an mdp. Useful when dealing with real world environments, simulators, or when using openai-gym rendering

step(action)[source]

Move the agent from its current state according to the action.

Parameters:: action (np.ndarray) – the action to execute.
Returns:: The state reached by the agent executing action in its current state, the reward obtained in the transition and a flag to signal if the next state is absorbing. Also, an additional dictionary is returned (possibly empty).

get_sim_state(obs, name, obs_type)[source]

Returns a specific observation value

Parameters:

obs (np.ndarray) – the observation vector;
name (str) – the name of the object to consider;
obs_type (PyBulletObservationType) – the type of observation to be used.

Returns:

The required elements of the input state vector.

_modify_mdp_info(mdp_info)[source]

This method can be overridden to modify the automatically generated MDPInfo data structure. By default, returns the given mdp_info structure unchanged.

Parameters:: mdp_info (MDPInfo) – the MDPInfo structure automatically computed by the environment.
Returns:: The modified MDPInfo data structure.

_create_observation(state)[source]

This method can be overridden to create an observation vector from the simulator state vector. By default, returns the simulator state vector unchanged.

Parameters:: state (np.ndarray) – the simulator state vector.
Returns:: The environment observation.

_preprocess_action(action)[source]

Compute a transformation of the action provided to the environment.

Parameters:: action (np.ndarray) – numpy array with the actions provided to the environment.
Returns:: The action to be used for the current step

_step_init(state, action)[source]: Allows information to be initialized at the start of a step.

_compute_action(state, action)[source]

Compute a transformation of the action at every intermediate step. Useful to add control signals simulated directly in python.

Parameters:

state (np.ndarray) – numpy array with the current state of the simulation;
action (np.ndarray) – numpy array with the actions, provided at every step.

Returns:

The action to be set in the actual pybullet simulation.

_simulation_pre_step()[source]: Allows information to be accesed and changed at every intermediate step before taking a step in the pybullet simulation. Can be usefull to apply an external force/torque to the specified bodies.

_simulation_post_step()[source]: Allows information to be accesed at every intermediate step after taking a step in the pybullet simulation. Can be usefull to average forces over all intermediate steps.

_step_finalize()[source]: Allows information to be accesed at the end of a step.

_custom_load_models()[source]

Allows loading a custom set of objects into the simulation.

Returns:: A dictionary with the names and the ids of the loaded objects

reward(state, action, next_state, absorbing)[source]

Compute the reward based on the given transition.

Parameters:

state (np.array) – the current state of the system;
action (np.array) – the action that is applied in the current state;
next_state (np.array) – the state reached after applying the given action;
absorbing (bool) – whether next_state is an absorbing state or not.

Returns:

The reward as a floating point scalar value.

is_absorbing(state)[source]

Check whether the given state is an absorbing state or not.

Parameters:: state (np.array) – the state of the system.
Returns:: A boolean flag indicating whether this state is absorbing or not.

setup(state)[source]

A function that allows executing setup code after an environment reset.

Parameters:

state (np.ndarray) – the state to be restored. If the state should be chosen by the environment, state is None. Environments can ignore this value if the initial state cannot be set programmatically.
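The state argument of setup follows the convention that None means the environment picks the initial state itself. A hypothetical override illustrating that convention; the class, the `_initial_state` attribute and the sampling range are assumptions for illustration only.

```python
import random


class SetupDemo:
    """Hypothetical environment showing the state=None convention of setup()."""

    def __init__(self, seed=0):
        self._rng = random.Random(seed)
        self._initial_state = None

    def setup(self, state=None):
        if state is None:
            # The environment chooses the initial state itself.
            self._initial_state = [self._rng.uniform(-0.1, 0.1) for _ in range(2)]
        else:
            # Restore the requested state; an environment that cannot set
            # its state programmatically could ignore this value instead.
            self._initial_state = list(state)
        return self._initial_state


env = SetupDemo()
print(env.setup([0.5, -0.5]))  # [0.5, -0.5]
```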

static _bound(x, min_value, max_value)

Method used to bound state and action variables.

Parameters:

x – the variable to bound;
min_value – the minimum value;
max_value – the maximum value;

Returns:

The bounded variable.

property info: Returns: An object containing the info of the environment.

static list_registered()

List registered environments.

Returns:: The list of the registered environments.

static make(env_name, *args, **kwargs)

Generate an environment given an environment name and parameters. The environment is created using the generate method, if available. Otherwise, the constructor is used. The generate method has a simpler interface than the constructor, making it easier to generate a standard version of the environment. If the environment name contains a ‘.’ separator, the string is split: the first element is used to select the environment and the other elements are passed as positional parameters.

Parameters:

env_name (str) – Name of the environment;
*args – positional arguments to be provided to the environment generator;
**kwargs – keyword arguments to be provided to the environment generator.

Returns:

An instance of the constructed environment.

classmethod register(): Register an environment in the environment list.

Air Hockey

class AirHockeyBaseBullet(gamma=0.99, horizon=500, n_agents=1, env_noise=False, obs_noise=False, obs_delay=False, torque_control=True, step_action_function=None, timestep=0.004166666666666667, n_intermediate_steps=1, debug_gui=False, table_boundary_terminate=False)[source]

Bases: PyBullet

Base class for the air hockey environment. The environment is designed for a 3-joint planar robot playing air hockey.

__init__(gamma=0.99, horizon=500, n_agents=1, env_noise=False, obs_noise=False, obs_delay=False, torque_control=True, step_action_function=None, timestep=0.004166666666666667, n_intermediate_steps=1, debug_gui=False, table_boundary_terminate=False)[source]

Constructor.

Parameters:

gamma (float, 0.99) – discount factor;
horizon (int, 500) – horizon of the task;
n_agents (int, 1) – number of agents;
env_noise (bool, False) – If true, the puck’s movement is affected by the air-flow noise;
obs_noise (bool, False) – If true, the noise is added in the observation;
obs_delay (bool, False) – If true, the velocity is observed through a low-pass filter;
torque_control (bool, True) – If false, the robot is in position control mode;
step_action_function (object, None) – A callable function to wrap the policy action into an environment command;
table_boundary_terminate (bool, False) – If true, the episode terminates when the mallet is outside the boundary.

_compute_action(state, action)[source]

Compute a transformation of the action at every intermediate step. Useful to add control signals simulated directly in python.

Parameters:

state (np.ndarray) – numpy array with the current state of the simulation;
action (np.ndarray) – numpy array with the actions, provided at every step.

Returns:

The action to be set in the actual pybullet simulation.

_simulation_pre_step()[source]: Allows information to be accesed and changed at every intermediate step before taking a step in the pybullet simulation. Can be usefull to apply an external force/torque to the specified bodies.

is_absorbing(state)[source]

Check whether the given state is an absorbing state or not.

Parameters:: state (np.array) – the state of the system.
Returns:: A boolean flag indicating whether this state is absorbing or not.

static _bound(x, min_value, max_value)

Method used to bound state and action variables.

Parameters:

x – the variable to bound;
min_value – the minimum value;
max_value – the maximum value;

Returns:

The bounded variable.

_create_observation(state)

This method can be overridden to create an observation vector from the simulator state vector. By default, returns the simulator state vector unchanged.

Parameters:: state (np.ndarray) – the simulator state vector.
Returns:: The environment observation.

_custom_load_models()

Allows loading a custom set of objects into the simulation.

Returns:: A dictionary with the names and the ids of the loaded objects

_modify_mdp_info(mdp_info)

This method can be overridden to modify the automatically generated MDPInfo data structure. By default, returns the given mdp_info structure unchanged.

Parameters:: mdp_info (MDPInfo) – the MDPInfo structure automatically computed by the environment.
Returns:: The modified MDPInfo data structure.

_preprocess_action(action)

Compute a transformation of the action provided to the environment.

Parameters:: action (np.ndarray) – numpy array with the actions provided to the environment.
Returns:: The action to be used for the current step

_simulation_post_step(): Allows information to be accesed at every intermediate step after taking a step in the pybullet simulation. Can be usefull to average forces over all intermediate steps.

_step_finalize(): Allows information to be accesed at the end of a step.

_step_init(state, action): Allows information to be initialized at the start of a step.

get_sim_state(obs, name, obs_type)

Returns a specific observation value

Parameters:

obs (np.ndarray) – the observation vector;
name (str) – the name of the object to consider;
obs_type (PyBulletObservationType) – the type of observation to be used.

Returns:

The required elements of the input state vector.

property info: Returns: An object containing the info of the environment.

static list_registered()

List registered environments.

Returns:: The list of the registered environments.

static make(env_name, *args, **kwargs)

Generate an environment given an environment name and parameters. The environment is created using the generate method, if available. Otherwise, the constructor is used. The generate method has a simpler interface than the constructor, making it easier to generate a standard version of the environment. If the environment name contains a ‘.’ separator, the string is split: the first element is used to select the environment and the other elements are passed as positional parameters.

Parameters:

env_name (str) – Name of the environment;
*args – positional arguments to be provided to the environment generator;
**kwargs – keyword arguments to be provided to the environment generator.

Returns:

An instance of the constructed environment.

classmethod register(): Register an environment in the environment list.

render(record=False)

Parameters:: record (bool, False) – whether the visualized image should be returned or not.
Returns:: The visualized image, or None if the record flag is set to false.

reset(state=None)

Reset the current state.

Parameters:: state (np.ndarray, None) – the state to set as the current state.
Returns:: The current state.

reward(state, action, next_state, absorbing)

Compute the reward based on the given transition.

Parameters:

state (np.array) – the current state of the system;
action (np.array) – the action that is applied in the current state;
next_state (np.array) – the state reached after applying the given action;
absorbing (bool) – whether next_state is an absorbing state or not.

Returns:

The reward as a floating point scalar value.

seed(seed)

Set the seed of the environment.

Parameters:: seed (float) – the value of the seed.

setup(state)

A function that allows executing setup code after an environment reset.

Parameters:

state (np.ndarray) – the state to be restored. If the state should be chosen by the environment, state is None. Environments can ignore this value if the initial state cannot be set programmatically.

step(action)

Move the agent from its current state according to the action.

Parameters:: action (np.ndarray) – the action to execute.
Returns:: The state reached by the agent executing action in its current state, the reward obtained in the transition and a flag to signal if the next state is absorbing. Also, an additional dictionary is returned (possibly empty).

stop(): Method used to stop an mdp. Useful when dealing with real world environments, simulators, or when using openai-gym rendering

class AirHockeySingleBullet(gamma=0.99, horizon=120, env_noise=False, obs_noise=False, obs_delay=False, torque_control=True, step_action_function=None, timestep=0.004166666666666667, n_intermediate_steps=1, debug_gui=False, table_boundary_terminate=False, number_flags=0)[source]

Bases: AirHockeyBaseBullet

Base class for single agent air hockey tasks.

__init__(gamma=0.99, horizon=120, env_noise=False, obs_noise=False, obs_delay=False, torque_control=True, step_action_function=None, timestep=0.004166666666666667, n_intermediate_steps=1, debug_gui=False, table_boundary_terminate=False, number_flags=0)[source]

Constructor.

Parameters:: number_flags (int, 0) – number of flags added to the observation space.

_modify_mdp_info(mdp_info)[source]: puck position indexes: [0, 1] puck velocity indexes: [7, 8, 9] joint position indexes: [13, 14, 15] joint velocity indexes: [16, 17, 18]

_create_observation(state)[source]

This method can be overridden to create an observation vector from the simulator state vector. By default, it returns the simulator state vector unchanged.

Parameters:: state (np.ndarray) – the simulator state vector.
Returns:: The environment observation.

static _bound(x, min_value, max_value)

Method used to bound state and action variables.

Parameters:

x – the variable to bound;
min_value – the minimum value;
max_value – the maximum value;

Returns:

The bounded variable.
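Bounding means clipping `x` into the interval `[min_value, max_value]`. The sketch below is a pure-Python, element-wise version written for illustration. MushroomRL's own `_bound` works on NumPy arrays and may differ in details, for example in how it broadcasts bounds.

```python
def bound(x, min_value, max_value):
    """Clip a scalar, or a list/tuple element-wise, into [min_value, max_value]."""
    if isinstance(x, (list, tuple)):
        # Scalar bounds are repeated; sequence bounds are used per element.
        lows = min_value if isinstance(min_value, (list, tuple)) else [min_value] * len(x)
        highs = max_value if isinstance(max_value, (list, tuple)) else [max_value] * len(x)
        return [max(low, min(v, high)) for v, low, high in zip(x, lows, highs)]
    return max(min_value, min(x, max_value))


print(bound(3.5, -1.0, 1.0))               # 1.0
print(bound([-2.0, 0.3, 5.0], -1.0, 1.0))  # [-1.0, 0.3, 1.0]
```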

_compute_action(state, action)

Compute a transformation of the action at every intermediate step. Useful to add control signals simulated directly in python.

Parameters:

state (np.ndarray) – numpy array with the current state of the simulation;
action (np.ndarray) – numpy array with the actions, provided at every step.

Returns:

The action to be set in the actual pybullet simulation.

_custom_load_models()

Allows loading a custom set of objects into the simulation.

Returns:: A dictionary with the names and the ids of the loaded objects

_preprocess_action(action)

Compute a transformation of the action provided to the environment.

Parameters:: action (np.ndarray) – numpy array with the actions provided to the environment.
Returns:: The action to be used for the current step

_simulation_post_step(): Allows information to be accesed at every intermediate step after taking a step in the pybullet simulation. Can be usefull to average forces over all intermediate steps.

_simulation_pre_step(): Allows information to be accesed and changed at every intermediate step before taking a step in the pybullet simulation. Can be usefull to apply an external force/torque to the specified bodies.

_step_finalize(): Allows information to be accesed at the end of a step.

_step_init(state, action): Allows information to be initialized at the start of a step.

get_sim_state(obs, name, obs_type)

Returns a specific observation value

Parameters:

obs (np.ndarray) – the observation vector;
name (str) – the name of the object to consider;
obs_type (PyBulletObservationType) – the type of observation to be used.

Returns:

The required elements of the input state vector.

property info: Returns: An object containing the info of the environment.

is_absorbing(state)

Check whether the given state is an absorbing state or not.

Parameters:: state (np.array) – the state of the system.
Returns:: A boolean flag indicating whether this state is absorbing or not.

static list_registered()

List registered environments.

Returns:: The list of the registered environments.

static make(env_name, *args, **kwargs)

Generate an environment given an environment name and parameters. The environment is created using the generate method, if available. Otherwise, the constructor is used. The generate method has a simpler interface than the constructor, making it easier to generate a standard version of the environment. If the environment name contains a ‘.’ separator, the string is split: the first element is used to select the environment and the other elements are passed as positional parameters.

Parameters:

env_name (str) – name of the environment;
*args – positional arguments to be provided to the environment generator;
**kwargs – keyword arguments to be provided to the environment generator.

Returns:

An instance of the constructed environment.
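The registration and name-splitting rules of `register`, `list_registered` and `make` can be sketched with a tiny registry. `Env` and `Chain` are hypothetical stand-ins written for illustration, not MushroomRL classes. In this sketch, the pieces after the ‘.’ are placed before any other positional arguments and reach the generator as strings.

```python
class Env:
    """Hypothetical base class mimicking the documented registry methods."""
    _registered = {}

    @classmethod
    def register(cls):
        # Called on a subclass, e.g. Chain.register().
        Env._registered[cls.__name__] = cls

    @staticmethod
    def list_registered():
        return list(Env._registered)

    @staticmethod
    def make(env_name, *args, **kwargs):
        if "." in env_name:
            # "Name.a.b" selects Name; "a" and "b" are prepended to args as strings.
            env_name, *extra = env_name.split(".")
            args = tuple(extra) + args
        env_cls = Env._registered[env_name]
        # Use generate() when the class provides it, otherwise the constructor.
        factory = getattr(env_cls, "generate", env_cls)
        return factory(*args, **kwargs)


class Chain(Env):
    def __init__(self, n_states, prob):
        self.n_states = n_states
        self.prob = prob

    @classmethod
    def generate(cls, n_states="5", prob=0.9):
        # Simpler interface that also accepts string arguments from the name.
        return cls(int(n_states), float(prob))


Chain.register()
env = Env.make("Chain.7", prob=0.8)
print(Env.list_registered(), env.n_states, env.prob)  # ['Chain'] 7 0.8
```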

classmethod register(): Register an environment in the environment list.

render(record=False)

Parameters:: record (bool, False) – whether the visualized image should be returned or not.
Returns:: The visualized image, or None if the record flag is set to false.

reset(state=None)

Reset the current state.

Parameters:: state (np.ndarray, None) – the state to set to the current state.
Returns:: The current state.

reward(state, action, next_state, absorbing)

Compute the reward based on the given transition.

Parameters:

state (np.array) – the current state of the system;
action (np.array) – the action that is applied in the current state;
next_state (np.array) – the state reached after applying the given action;
absorbing (bool) – whether next_state is an absorbing state or not.

Returns:

The reward as a floating point scalar value.

seed(seed)

Set the seed of the environment.

Parameters:: seed (float) – the value of the seed.

setup(state)

A function that allows executing setup code after an environment reset.

Parameters:

state (np.ndarray) – the state to be restored. If the state should be chosen by the environment, state is None. Environments can ignore this value if the initial state cannot be set programmatically.

step(action)

Move the agent from its current state according to the action.

Parameters:: action (np.ndarray) – the action to execute.
Returns:: The state reached by the agent executing action in its current state, the reward obtained in the transition and a flag to signal if the next state is absorbing. Also, an additional dictionary is returned (possibly empty).

stop(): Method used to stop an mdp. Useful when dealing with real world environments, simulators, or when using openai-gym rendering

class AirHockeyHitBullet(gamma=0.99, horizon=120, env_noise=False, obs_noise=False, obs_delay=False, torque_control=True, step_action_function=None, timestep=0.004166666666666667, n_intermediate_steps=1, debug_gui=False, random_init=False, action_penalty=0.001, table_boundary_terminate=False, init_robot_state='right')[source]

Bases: AirHockeySingleBullet

Class for the air hockey hitting task. If no hit has happened yet, the agent is rewarded for getting close to the puck; it receives a bonus reward if the robot scores a goal.

__init__(gamma=0.99, horizon=120, env_noise=False, obs_noise=False, obs_delay=False, torque_control=True, step_action_function=None, timestep=0.004166666666666667, n_intermediate_steps=1, debug_gui=False, random_init=False, action_penalty=0.001, table_boundary_terminate=False, init_robot_state='right')[source]

Constructor.

Parameters:

random_init (bool, False) – If true, initialize the puck at a random position.
action_penalty (float, 1e-3) – The penalty of the action on the reward at each time step
init_robot_state (string, "right") – The configuration in which the robot is initialized. “right”, “left”, “random” available
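The following hypothetical shaping function roughly illustrates the reward structure described above: the agent approaches the puck while no hit has happened, receives a bonus on a goal, and pays a per-step action penalty weighted by `action_penalty`. The distance term, the bonus value of 10, the squared-norm penalty and all argument names are assumptions chosen for this sketch. They are not the formula used by `AirHockeyHitBullet.reward`.

```python
import math


def hit_reward(ee_pos, puck_pos, action, has_hit, scored,
               action_penalty=1e-3, goal_bonus=10.0):
    """Hypothetical shaped reward for a hitting task (not MushroomRL's formula)."""
    r = 0.0
    if not has_hit:
        # Before the hit: the closer the end effector is to the puck, the better.
        r -= math.dist(ee_pos, puck_pos)
    if scored:
        r += goal_bonus
    # Penalize large actions at every time step.
    r -= action_penalty * sum(a * a for a in action)
    return r


print(hit_reward((0.0, 0.0), (0.3, 0.4), (1.0, 1.0), has_hit=False, scored=False))
```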

setup(state)[source]

A function that allows executing setup code after an environment reset.

Parameters:

state (np.ndarray) – the state to be restored. If the state should be chosen by the environment, state is None. Environments can ignore this value if the initial state cannot be set programmatically.

reward(state, action, next_state, absorbing)[source]

Compute the reward based on the given transition.

Parameters:

state (np.array) – the current state of the system;
action (np.array) – the action that is applied in the current state;
next_state (np.array) – the state reached after applying the given action;
absorbing (bool) – whether next_state is an absorbing state or not.

Returns:

The reward as a floating point scalar value.

is_absorbing(state)[source]

Check whether the given state is an absorbing state or not.

Parameters:: state (np.array) – the state of the system.
Returns:: A boolean flag indicating whether this state is absorbing or not.

_simulation_post_step()[source]: Allows information to be accesed at every intermediate step after taking a step in the pybullet simulation. Can be usefull to average forces over all intermediate steps.

_create_observation(state)[source]

This method can be overridden to create an observation vector from the simulator state vector. By default, it returns the simulator state vector unchanged.

Parameters:: state (np.ndarray) – the simulator state vector.
Returns:: The environment observation.

static _bound(x, min_value, max_value)

Method used to bound state and action variables.

Parameters:

x – the variable to bound;
min_value – the minimum value;
max_value – the maximum value;

Returns:

The bounded variable.

_compute_action(state, action)

Compute a transformation of the action at every intermediate step. Useful to add control signals simulated directly in python.

Parameters:

state (np.ndarray) – numpy array with the current state of the simulation;
action (np.ndarray) – numpy array with the actions, provided at every step.

Returns:

The action to be set in the actual pybullet simulation.

_custom_load_models()

Allows loading a custom set of objects into the simulation.

Returns:: A dictionary with the names and the ids of the loaded objects

_modify_mdp_info(mdp_info): puck position indexes: [0, 1] puck velocity indexes: [7, 8, 9] joint position indexes: [13, 14, 15] joint velocity indexes: [16, 17, 18]

_preprocess_action(action)

Compute a transformation of the action provided to the environment.

Parameters:: action (np.ndarray) – numpy array with the actions provided to the environment.
Returns:: The action to be used for the current step

_simulation_pre_step(): Allows information to be accesed and changed at every intermediate step before taking a step in the pybullet simulation. Can be usefull to apply an external force/torque to the specified bodies.

_step_finalize(): Allows information to be accesed at the end of a step.

_step_init(state, action): Allows information to be initialized at the start of a step.

get_sim_state(obs, name, obs_type)

Returns a specific observation value

Parameters:

obs (np.ndarray) – the observation vector;
name (str) – the name of the object to consider;
obs_type (PyBulletObservationType) – the type of observation to be used.

Returns:

The required elements of the input state vector.

property info: Returns: An object containing the info of the environment.

static list_registered()

List registered environments.

Returns:: The list of the registered environments.

static make(env_name, *args, **kwargs)

Generate an environment given an environment name and parameters. The environment is created using the generate method, if available. Otherwise, the constructor is used. The generate method has a simpler interface than the constructor, making it easier to generate a standard version of the environment. If the environment name contains a ‘.’ separator, the string is split: the first element is used to select the environment and the other elements are passed as positional parameters.

Parameters:

env_name (str) – name of the environment;
*args – positional arguments to be provided to the environment generator;
**kwargs – keyword arguments to be provided to the environment generator.

Returns:

An instance of the constructed environment.

classmethod register(): Register an environment in the environment list.

render(record=False)

Parameters:: record (bool, False) – whether the visualized image should be returned or not.
Returns:: The visualized image, or None if the record flag is set to false.

reset(state=None)

Reset the current state.

Parameters:: state (np.ndarray, None) – the state to set to the current state.
Returns:: The current state.

seed(seed)

Set the seed of the environment.

Parameters:: seed (float) – the value of the seed.

step(action)

Move the agent from its current state according to the action.

Parameters:: action (np.ndarray) – the action to execute.
Returns:: The state reached by the agent executing action in its current state, the reward obtained in the transition and a flag to signal if the next state is absorbing. Also, an additional dictionary is returned (possibly empty).

stop(): Method used to stop an mdp. Useful when dealing with real world environments, simulators, or when using openai-gym rendering

class AirHockeyDefendBullet(gamma=0.99, horizon=500, env_noise=False, obs_noise=False, obs_delay=False, torque_control=True, step_action_function=None, timestep=0.004166666666666667, n_intermediate_steps=1, debug_gui=False, random_init=False, action_penalty=0.001, table_boundary_terminate=False, init_velocity_range=(1, 2.2))[source]

Bases: AirHockeySingleBullet

Class for the air hockey defending task. The agent tries to stop the puck at the line x=-0.6. If the puck gets into the goal, the agent receives a penalty.

__init__(gamma=0.99, horizon=500, env_noise=False, obs_noise=False, obs_delay=False, torque_control=True, step_action_function=None, timestep=0.004166666666666667, n_intermediate_steps=1, debug_gui=False, random_init=False, action_penalty=0.001, table_boundary_terminate=False, init_velocity_range=(1, 2.2))[source]

Constructor.

Parameters:

random_init (bool, False) – If true, initialize the puck at a random position.
action_penalty (float, 1e-3) – The penalty of the action on the reward at each time step
init_velocity_range ((float, float), (1, 2.2)) – The range in which the initial velocity is initialized

setup(state=None)[source]

A function that allows executing setup code after an environment reset.

Parameters:

state (np.ndarray) – the state to be restored. If the state should be chosen by the environment, state is None. Environments can ignore this value if the initial state cannot be set programmatically.

reward(state, action, next_state, absorbing)[source]

Compute the reward based on the given transition.

Parameters:

state (np.array) – the current state of the system;
action (np.array) – the action that is applied in the current state;
next_state (np.array) – the state reached after applying the given action;
absorbing (bool) – whether next_state is an absorbing state or not.

Returns:

The reward as a floating point scalar value.

is_absorbing(state)[source]

Check whether the given state is an absorbing state or not.

Parameters:: state (np.array) – the state of the system.
Returns:: A boolean flag indicating whether this state is absorbing or not.

_simulation_post_step()[source]: Allows information to be accesed at every intermediate step after taking a step in the pybullet simulation. Can be usefull to average forces over all intermediate steps.

_create_observation(state)[source]

This method can be overridden to create an observation vector from the simulator state vector. By default, it returns the simulator state vector unchanged.

Parameters:: state (np.ndarray) – the simulator state vector.
Returns:: The environment observation.

static _bound(x, min_value, max_value)

Method used to bound state and action variables.

Parameters:

x – the variable to bound;
min_value – the minimum value;
max_value – the maximum value;

Returns:

The bounded variable.

_compute_action(state, action)

Compute a transformation of the action at every intermediate step. Useful to add control signals simulated directly in python.

Parameters:

state (np.ndarray) – numpy array with the current state of the simulation;
action (np.ndarray) – numpy array with the actions, provided at every step.

Returns:

The action to be set in the actual pybullet simulation.

_custom_load_models()

Allows loading a custom set of objects into the simulation.

Returns:: A dictionary with the names and the ids of the loaded objects

_modify_mdp_info(mdp_info): puck position indexes: [0, 1] puck velocity indexes: [7, 8, 9] joint position indexes: [13, 14, 15] joint velocity indexes: [16, 17, 18]

_preprocess_action(action)

Compute a transformation of the action provided to the environment.

Parameters:: action (np.ndarray) – numpy array with the actions provided to the environment.
Returns:: The action to be used for the current step

_simulation_pre_step(): Allows information to be accesed and changed at every intermediate step before taking a step in the pybullet simulation. Can be usefull to apply an external force/torque to the specified bodies.

_step_finalize(): Allows information to be accesed at the end of a step.

_step_init(state, action): Allows information to be initialized at the start of a step.

get_sim_state(obs, name, obs_type)

Returns a specific observation value

Parameters:

obs (np.ndarray) – the observation vector;
name (str) – the name of the object to consider;
obs_type (PyBulletObservationType) – the type of observation to be used.

Returns:

The required elements of the input state vector.

property info: Returns: An object containing the info of the environment.

static list_registered()

List registered environments.

Returns:: The list of the registered environments.

static make(env_name, *args, **kwargs)

Generate an environment given an environment name and parameters. The environment is created using the generate method, if available. Otherwise, the constructor is used. The generate method has a simpler interface than the constructor, making it easier to generate a standard version of the environment. If the environment name contains a ‘.’ separator, the string is split: the first element is used to select the environment and the other elements are passed as positional parameters.

Parameters:

env_name (str) – name of the environment;
*args – positional arguments to be provided to the environment generator;
**kwargs – keyword arguments to be provided to the environment generator.

Returns:

An instance of the constructed environment.

classmethod register(): Register an environment in the environment list.

render(record=False)

Parameters:: record (bool, False) – whether the visualized image should be returned or not.
Returns:: The visualized image, or None if the record flag is set to false.

reset(state=None)

Reset the current state.

Parameters:: state (np.ndarray, None) – the state to set to the current state.
Returns:: The current state.

seed(seed)

Set the seed of the environment.

Parameters:: seed (float) – the value of the seed.

step(action)

Move the agent from its current state according to the action.

Parameters:: action (np.ndarray) – the action to execute.
Returns:: The state reached by the agent executing action in its current state, the reward obtained in the transition and a flag to signal if the next state is absorbing. Also, an additional dictionary is returned (possibly empty).

stop(): Method used to stop an mdp. Useful when dealing with real world environments, simulators, or when using openai-gym rendering

Segway

class Segway(random_start=False)[source]

Bases: Environment

The Segway environment (continuous version) as presented in: “Deep Learning for Actor-Critic Reinforcement Learning”. Xueli Jia. 2015.

__init__(random_start=False)[source]

Constructor.

Parameters:: random_start (bool, False) – whether to start from a random position or from the horizontal one.

reset(state=None)[source]

Reset the current state.

Parameters:: state (np.ndarray, None) – the state to set to the current state.
Returns:: The current state.

step(action)[source]

Move the agent from its current state according to the action.

Parameters:: action (np.ndarray) – the action to execute.
Returns:: The state reached by the agent executing action in its current state, the reward obtained in the transition and a flag to signal if the next state is absorbing. Also, an additional dictionary is returned (possibly empty).

render(record=False)[source]

Parameters:: record (bool, False) – whether the visualized image should be returned or not.
Returns:: The visualized image, or None if the record flag is set to false.

stop()[source]: Method used to stop an mdp. Useful when dealing with real world environments, simulators, or when using openai-gym rendering

static _bound(x, min_value, max_value)

Method used to bound state and action variables.

Parameters:

x – the variable to bound;
min_value – the minimum value;
max_value – the maximum value;

Returns:

The bounded variable.

property info: Returns: An object containing the info of the environment.

static list_registered()

List registered environments.

Returns:: The list of the registered environments.

static make(env_name, *args, **kwargs)

Generate an environment given an environment name and parameters. The environment is created using the generate method, if available. Otherwise, the constructor is used. The generate method has a simpler interface than the constructor, making it easier to generate a standard version of the environment. If the environment name contains a ‘.’ separator, the string is split: the first element is used to select the environment and the other elements are passed as positional parameters.

Parameters:

env_name (str) – name of the environment;
*args – positional arguments to be provided to the environment generator;
**kwargs – keyword arguments to be provided to the environment generator.

Returns:

An instance of the constructed environment.

classmethod register(): Register an environment in the environment list.

seed(seed)

Set the seed of the environment.

Parameters:: seed (float) – the value of the seed.

Ship steering

class ShipSteering(small=True, n_steps_action=3)[source]

Bases: Environment

The Ship Steering environment as presented in: “Hierarchical Policy Gradient Algorithms”. Ghavamzadeh M. and Mahadevan S.. 2003.

__init__(small=True, n_steps_action=3)[source]

Constructor.

Parameters:

small (bool, True) – whether to use a small state space or not.
n_steps_action (int, 3) – number of integration intervals for each step of the mdp.
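`n_steps_action` splits each MDP step into several integration intervals. The sketch below illustrates the idea with explicit Euler sub-steps. The integrator choice, the `derivative` callback and the toy models are assumptions for illustration. The actual ship dynamics and step length used by `ShipSteering` are not reproduced here.

```python
def integrate(state, derivative, dt, n_steps_action=3):
    """Advance `state` by one MDP step of length dt using n_steps_action Euler sub-steps."""
    h = dt / n_steps_action
    for _ in range(n_steps_action):
        d = derivative(state)
        state = [s + h * ds for s, ds in zip(state, d)]
    return state


def constant_velocity(state):
    # Hypothetical model: position (x, y) moving with constant velocity (1.0, 0.5).
    return [1.0, 0.5]


def decay(state):
    # Hypothetical model: exponential decay, ds/dt = -s.
    return [-state[0]]


print(integrate([0.0, 0.0], constant_velocity, dt=3.0, n_steps_action=3))  # [3.0, 1.5]
```

More sub-steps make each Euler increment smaller, so the result tracks the continuous dynamics more closely at the cost of extra computation per MDP step.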

reset(state=None)[source]

Reset the current state.

Parameters:: state (np.ndarray, None) – the state to set to the current state.
Returns:: The current state.

step(action)[source]

Move the agent from its current state according to the action.

Parameters:: action (np.ndarray) – the action to execute.
Returns:: The state reached by the agent executing action in its current state, the reward obtained in the transition and a flag to signal if the next state is absorbing. Also, an additional dictionary is returned (possibly empty).

render(record=False)[source]

Parameters:: record (bool, False) – whether the visualized image should be returned or not.
Returns:: The visualized image, or None if the record flag is set to false.

stop()[source]: Method used to stop an mdp. Useful when dealing with real world environments, simulators, or when using openai-gym rendering

static _bound(x, min_value, max_value)

Method used to bound state and action variables.

Parameters:

x – the variable to bound;
min_value – the minimum value;
max_value – the maximum value;

Returns:

The bounded variable.

property info: Returns: An object containing the info of the environment.

static list_registered()

List registered environments.

Returns:: The list of the registered environments.

static make(env_name, *args, **kwargs)

Generate an environment given an environment name and parameters. The environment is created using the generate method, if available. Otherwise, the constructor is used. The generate method has a simpler interface than the constructor, making it easier to generate a standard version of the environment. If the environment name contains a ‘.’ separator, the string is split: the first element is used to select the environment and the other elements are passed as positional parameters.

Parameters:

env_name (str) – name of the environment;
*args – positional arguments to be provided to the environment generator;
**kwargs – keyword arguments to be provided to the environment generator.

Returns:

An instance of the constructed environment.

classmethod register(): Register an environment in the environment list.

seed(seed)

Set the seed of the environment.

Parameters:: seed (float) – the value of the seed.

Generators

Grid world

generate_grid_world(grid, prob, pos_rew, neg_rew, gamma=0.9, horizon=100)[source]

This Grid World generator requires a .txt file specifying the shape of the grid world and its cells. There are five types of cells: ‘S’ is a starting position of the agent; ‘G’ is the goal state; ‘.’ is a normal cell; ‘*’ is a hole: when the agent steps on a hole, it receives a negative reward and the episode ends; ‘#’ is a wall: if the agent tries to move into a wall, it remains in its current state. The initial state distribution is uniform over all the starting cells provided.

The grid is expected to be rectangular.

Parameters:

grid (str) – the path of the file containing the grid structure;
prob (float) – probability of success of an action;
pos_rew (float) – reward obtained in goal states;
neg_rew (float) – reward obtained in “hole” states;
gamma (float, .9) – discount factor;
horizon (int, 100) – the horizon.

Returns:

A FiniteMDP object built with the provided parameters.

parse_grid(grid)[source]

Parse the grid file.

Parameters:: grid (str) – the path of the file containing the grid structure.
Returns:: A list containing the grid structure.
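A parser for the grid format described above can be sketched as follows. `parse_grid_text` is a hypothetical helper that reads from a string instead of a file path. The returned layout is an assumption about what the real `parse_grid` produces: a list of character rows, plus the list of non-wall `(row, col)` cells that become MDP states.

```python
def parse_grid_text(text):
    """Return (grid_map, cell_list) from a rectangular grid given as text."""
    grid_map = [list(row) for row in text.strip().splitlines()]
    width = len(grid_map[0])
    if any(len(row) != width for row in grid_map):
        raise ValueError("the grid is expected to be rectangular")
    # Non-wall cells become the states of the finite MDP.
    cell_list = [(r, c) for r, row in enumerate(grid_map)
                 for c, ch in enumerate(row) if ch != "#"]
    return grid_map, cell_list


grid_map, cell_list = parse_grid_text("""
S.#
.*G
""")
print(len(cell_list))  # 5
```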

compute_probabilities(grid_map, cell_list, prob)[source]

Compute the transition probability matrix.

Parameters:

grid_map (list) – list containing the grid structure;
cell_list (list) – list of non-wall cells;
prob (float) – probability of success of an action.

Returns:

The transition probability matrix.

compute_reward(grid_map, cell_list, pos_rew, neg_rew)[source]

Compute the reward matrix.

Parameters:

grid_map (list) – list containing the grid structure;
cell_list (list) – list of non-wall cells;
pos_rew (float) – reward obtained in goal states;
neg_rew (float) – reward obtained in “hole” states;

Returns:

The reward matrix.

compute_mu(grid_map, cell_list)[source]

Compute the initial states distribution.

Parameters:

grid_map (list) – list containing the grid structure;
cell_list (list) – list of non-wall cells.

Returns:

The initial states distribution.
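Because the initial state distribution is uniform over the starting cells, it can be computed by counting the ‘S’ cells. The sketch assumes that `grid_map` is a list of character rows and that `cell_list` holds `(row, col)` tuples whose order defines the state indexes. `uniform_start_distribution` is a hypothetical name and not the library function.

```python
def uniform_start_distribution(grid_map, cell_list):
    """Uniform probability over the 'S' cells, indexed like cell_list."""
    starts = [i for i, (r, c) in enumerate(cell_list) if grid_map[r][c] == "S"]
    if not starts:
        raise ValueError("the grid contains no 'S' cell")
    mu = [0.0] * len(cell_list)
    for i in starts:
        mu[i] = 1.0 / len(starts)
    return mu


grid_map = [list("S.S"), list("#.G")]
cell_list = [(0, 0), (0, 1), (0, 2), (1, 1), (1, 2)]
print(uniform_start_distribution(grid_map, cell_list))  # [0.5, 0.0, 0.5, 0.0, 0.0]
```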

Simple chain

generate_simple_chain(state_n, goal_states, prob, rew, mu=None, gamma=0.9, horizon=100)[source]

Simple chain generator.

Parameters:

state_n (int) – number of states;
goal_states (list) – list of goal states;
prob (float) – probability of success of an action;
rew (float) – reward obtained in goal states;
mu (np.ndarray) – initial state probability distribution;
gamma (float, .9) – discount factor;
horizon (int, 100) – the horizon.

Returns:

A FiniteMDP object built with the provided parameters.

compute_probabilities(state_n, prob)[source]

Compute the transition probability matrix.

Parameters:

state_n (int) – number of states;
prob (float) – probability of success of an action.

Returns:

The transition probability matrix.
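The following sketch builds a chain transition tensor `P[s][a][s']` from `state_n` and `prob`. It assumes two actions (0 moves right, 1 moves left). A move succeeds with probability `prob`; otherwise the agent stays where it is. Moving past either end of the chain also leaves the agent in place. These semantics are assumptions for illustration, so consult the generator source for the exact convention.

```python
def chain_probabilities(state_n, prob):
    """Return P[s][a][s'] for a chain with actions 0 (right) and 1 (left)."""
    P = [[[0.0] * state_n for _ in range(2)] for _ in range(state_n)]
    for s in range(state_n):
        for a, step in ((0, 1), (1, -1)):
            target = s + step
            if 0 <= target < state_n:
                P[s][a][target] = prob
                P[s][a][s] = 1.0 - prob
            else:
                # Moving off the end of the chain keeps the agent in place.
                P[s][a][s] = 1.0
    return P


P = chain_probabilities(state_n=4, prob=0.75)
print(P[0][0])  # [0.25, 0.75, 0.0, 0.0]
```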

compute_reward(state_n, goal_states, rew)[source]

Compute the reward matrix.

Parameters:

state_n (int) – number of states;
goal_states (list) – list of goal states;
rew (float) – reward obtained in goal states.

Returns:

The reward matrix.
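A matching reward tensor `R[s][a][s']` can be sketched under the assumption that `rew` is paid on every transition that lands in a goal state and that all other transitions give 0. `chain_rewards` is a hypothetical helper, and the two-action layout mirrors the assumption used for the transitions.

```python
def chain_rewards(state_n, goal_states, rew):
    """Return R[s][a][s']: rew for any transition landing in a goal state, else 0."""
    goals = set(goal_states)
    return [[[rew if s_next in goals else 0.0 for s_next in range(state_n)]
             for _ in range(2)]
            for _ in range(state_n)]


R = chain_rewards(state_n=4, goal_states=[3], rew=1.0)
print(R[2][0])  # [0.0, 0.0, 0.0, 1.0]
```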

Taxi

generate_taxi(grid, prob=0.9, rew=(0, 1, 3, 15), gamma=0.99, horizon=inf)[source]

This Taxi generator requires a .txt file specifying the shape of the grid world and its cells. There are five types of cells: ‘S’ is the starting position of the agent; ‘G’ is the goal state; ‘.’ is a normal cell; ‘F’ is a passenger: when the agent steps on a passenger cell, it picks the passenger up; ‘#’ is a wall: if the agent tries to move into a wall, it remains in its current state. The initial state distribution is uniform over all the starting cells provided. The episode terminates when the agent reaches the goal state. The reward is always 0, except in the goal state, where it depends on the number of collected passengers. Each action succeeds with a certain probability; if it fails, the agent moves in a direction perpendicular to the intended one.

The grid is expected to be rectangular.

This problem is inspired by: “Bayesian Q-Learning”. Dearden R. et al. 1998.

Parameters:

grid (str) – the path of the file containing the grid structure;
prob (float, .9) – probability of success of an action;
rew (tuple, (0, 1, 3, 15)) – rewards obtained in goal states;
gamma (float, .99) – discount factor;
horizon (int, np.inf) – the horizon.

Returns:

A FiniteMDP object built with the provided parameters.

parse_grid(grid)[source]

Parse the grid file.

Parameters:: grid (str) – the path of the file containing the grid structure.
Returns:: A list containing the grid structure.

compute_probabilities(grid_map, cell_list, passenger_list, prob)[source]

Compute the transition probability matrix.

Parameters:

grid_map (list) – list containing the grid structure;
cell_list (list) – list of non-wall cells;
passenger_list (list) – list of passenger cells;
prob (float) – probability of success of an action.

Returns:

The transition probability matrix.
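The taxi's action noise can be sketched as a distribution over movement directions. The intended move succeeds with probability `prob`; otherwise the taxi moves perpendicular to it. Splitting the failure probability evenly between the two perpendicular directions is an assumption of this sketch. So are the direction names, the helpers and the representation of `grid_map` as a list of row strings. When a move is blocked by a wall or the border, its probability is added to staying in place.

```python
DIRECTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
PERPENDICULAR = {"up": ("left", "right"), "down": ("left", "right"),
                 "left": ("up", "down"), "right": ("up", "down")}


def move_distribution(intended, prob=0.9):
    """Probability of each actual movement direction given the intended one."""
    dist = {intended: prob}
    for d in PERPENDICULAR[intended]:
        dist[d] = (1.0 - prob) / 2.0
    return dist


def next_cell(cell, direction, grid_map):
    """Apply a move; walls ('#') and the grid border leave the taxi in place."""
    dr, dc = DIRECTIONS[direction]
    r, c = cell[0] + dr, cell[1] + dc
    if 0 <= r < len(grid_map) and 0 <= c < len(grid_map[0]) and grid_map[r][c] != "#":
        return (r, c)
    return cell


def cell_distribution(cell, intended, grid_map, prob=0.9):
    """Distribution over next cells, accumulating mass when moves are blocked."""
    out = {}
    for direction, p in move_distribution(intended, prob).items():
        target = next_cell(cell, direction, grid_map)
        out[target] = out.get(target, 0.0) + p
    return out


print(cell_distribution((0, 0), "right", ["S.", "#G"], prob=0.5))  # {(0, 1): 0.5, (0, 0): 0.5}
```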

compute_reward(grid_map, cell_list, passenger_list, rew)[source]

Compute the reward matrix.

Parameters:

grid_map (list) – list containing the grid structure;
cell_list (list) – list of non-wall cells;
passenger_list (list) – list of passenger cells;
rew (tuple) – rewards obtained in goal states.

Returns:

The reward matrix.
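The following sketch shows how a goal reward could depend on the number of collected passengers, given the default `rew=(0, 1, 3, 15)`. It assumes that the state tracks the collected passengers as a bitmask and that `rew[k]` is paid on reaching ‘G’ with `k` passengers aboard; every other transition pays 0. The bitmask encoding and both function names are hypothetical.

```python
def goal_reward(passenger_mask, rew=(0, 1, 3, 15)):
    """Hypothetical: reward on reaching 'G', given a bitmask of collected passengers."""
    collected = bin(passenger_mask).count("1")
    return rew[collected]


def transition_reward(next_is_goal, passenger_mask, rew=(0, 1, 3, 15)):
    # Only transitions into the goal state are rewarded.
    return goal_reward(passenger_mask, rew) if next_is_goal else 0


print(goal_reward(0b101))  # 3
```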

compute_mu(grid_map, cell_list, passenger_list)[source]

Compute the initial states distribution.

Parameters:

grid_map (list) – list containing the grid structure;
cell_list (list) – list of non-wall cells;
passenger_list (list) – list of passenger cells.

Returns:

The initial states distribution.