Policy search
Policy gradient
- class REINFORCE(mdp_info, policy, optimizer)[source]
Bases:
PolicyGradient
REINFORCE algorithm. “Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning”, Williams R. J., 1992. A usage sketch follows this method reference.
- __init__(mdp_info, policy, optimizer)[source]
Constructor.
- Parameters:
optimizer – the gradient optimizer.
- _compute_gradient(J)[source]
Return the gradient computed by the algorithm.
- Parameters:
J (list) – list of the cumulative discounted rewards for each episode in the dataset.
- Returns:
The gradient computed by the algorithm.
- _step_update(x, u, r)[source]
This function is called at each episode step while parsing the dataset.
- Parameters:
x (np.ndarray) – the state at the current step;
u (np.ndarray) – the action at the current step;
r (np.ndarray) – the reward at the current step.
- _episode_end_update()[source]
This function is called at the end of each episode while parsing the dataset. The implementation depends on the algorithm (e.g. REINFORCE updates some data structures).
- _init_update()[source]
This function is called at the beginning of each episode while parsing the dataset. The implementation depends on the algorithm (e.g. REINFORCE resets some data structures).
- _add_save_attr(**attr_dict)
Add attributes that should be saved for an agent. For every attribute, it is necessary to specify the method to be used to save and load it. Available methods are: numpy, mushroom, torch, json, pickle, primitive and none. The primitive method can be used to store primitive attributes, while the none method always skips the attribute, but ensures that it is initialized to None after the load. The mushroom method can be used with classes that implement the Serializable interface. All the other methods use the library of the same name. If a “!” character is added at the end of the method name, the field will be saved only if full_save is set to True.
- Parameters:
**attr_dict – dictionary of attributes mapped to the method that should be used to save and load them.
- _agent_preprocess(state)
Applies all the agent’s preprocessors to the state.
- Parameters:
state (Array) – the state where the agent is.
- Returns:
The preprocessed state.
- _parse(sample)
Utility to parse the sample.
- Parameters:
sample (list) – the current episode step.
- Returns:
A tuple containing state, action, reward, next state, and the absorbing and last flags. If features are provided, the state is preprocessed with them.
- _post_load()
This method can be overwritten to implement logic that is executed after the loading of the agent.
- _update_agent_preprocessor(state)
Updates the stats of all the agent’s preprocessors given the state.
- Parameters:
state (Array) – the state where the agent is.
- _update_parameters(J)
Update the parameters of the policy.
- Parameters:
J (list) – list of the cumulative discounted rewards for each episode in the dataset.
- add_agent_preprocessor(preprocessor)
Add preprocessor to the agent’s preprocessor list. The preprocessors are applied in order.
- Parameters:
preprocessor (object) – state preprocessor to be applied to state variables before feeding them to the agent.
- add_core_preprocessor(preprocessor)
Add preprocessor to the core’s preprocessor list. The preprocessors are applied in order.
- Parameters:
preprocessor (object) – state preprocessor to be applied to state variables before feeding them to the agent.
- copy()
- Returns:
A deepcopy of the agent.
- property core_preprocessors
Access to core’s state preprocessors stored in the agent.
- draw_action(state, policy_state=None)
Return the action to execute in the given state. It is the action returned by the policy or the action set by the algorithm (e.g. in the case of SARSA).
- Parameters:
state – the state where the agent is;
policy_state – the policy internal state.
- Returns:
The action to be executed.
- episode_start(initial_state, episode_info)
Called by the Core when a new episode starts.
- Parameters:
initial_state (Array) – vector representing the initial state of the environment;
episode_info (dict) – a dictionary containing the information at reset, such as context.
- Returns:
A tuple containing the policy initial state and, optionally, the policy parameters.
- episode_start_vectorized(initial_states, episode_info, start_mask)
Called by the VectorCore when a new episode starts.
- Parameters:
initial_states (Array) – the initial states of the environment;
episode_info (dict) – a dictionary containing the information at reset, such as context;
start_mask (Array) – boolean mask to select the environments that are starting a new episode.
- Returns:
A tuple containing the policy initial states and, optionally, the policy parameters.
- fit(dataset)
Fit step.
- Parameters:
dataset (Dataset) – the dataset.
- classmethod load(path)
Load and deserialize the agent from the given location on disk.
- Parameters:
path (Path, str) – Relative or absolute path to the agent’s save location.
- Returns:
The loaded agent.
- save(path, full_save=False)
Serialize and save the object to the given path on disk.
- Parameters:
path (Path, str) – Relative or absolute path to the object save location;
full_save (bool) – flag to specify the amount of data to save for MushroomRL data structures.
- save_zip(zip_file, full_save, folder='')
Serialize and save the agent to the given path on disk.
- Parameters:
zip_file (ZipFile) – ZipFile where the object needs to be saved;
full_save (bool) – flag to specify the amount of data to save for MushroomRL data structures;
folder (string, '') – subfolder to be used by the save method.
- set_logger(logger)
Setter that can be used to pass a logger to the algorithm.
- Parameters:
logger (Logger) – the logger to be used by the algorithm.
- stop()
Method used to stop an agent. Useful when dealing with real-world environments or simulators, or to clean up environment internals after a core learn/evaluate run to enforce consistency.
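A minimal usage sketch for REINFORCE on the LQR toy environment, assuming the public API of a recent MushroomRL release (import paths are assumptions and vary across versions, e.g. AdaptiveOptimizer lives under mushroom_rl.utils.optimizers in 1.x and mushroom_rl.rl_utils.optimizers in 2.x):

    import numpy as np

    from mushroom_rl.algorithms.policy_search import REINFORCE
    from mushroom_rl.approximators import Regressor
    from mushroom_rl.approximators.parametric import LinearApproximator
    from mushroom_rl.core import Core
    from mushroom_rl.environments import LQR
    from mushroom_rl.policy import StateStdGaussianPolicy
    from mushroom_rl.utils.optimizers import AdaptiveOptimizer

    # Toy problem: a 1-dimensional LQR MDP.
    mdp = LQR.generate(dimensions=1)

    # Linear approximators for the mean and standard deviation of a
    # state-dependent Gaussian policy.
    mu = Regressor(LinearApproximator,
                   input_shape=mdp.info.observation_space.shape,
                   output_shape=mdp.info.action_space.shape)
    sigma = Regressor(LinearApproximator,
                      input_shape=mdp.info.observation_space.shape,
                      output_shape=mdp.info.action_space.shape)
    sigma.set_weights(2. * np.ones(sigma.weights_size))
    policy = StateStdGaussianPolicy(mu, sigma)

    # REINFORCE fits on complete episodes, so learn with n_episodes_per_fit.
    agent = REINFORCE(mdp.info, policy, AdaptiveOptimizer(eps=.01))
    core = Core(agent, mdp)
    core.learn(n_episodes=100, n_episodes_per_fit=25)

    # The agent can be serialized and restored with save/load.
    agent.save('/tmp/reinforce_lqr', full_save=True)
    agent = REINFORCE.load('/tmp/reinforce_lqr')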
- class GPOMDP(mdp_info, policy, optimizer)[source]
Bases:
PolicyGradient
GPOMDP algorithm. “Infinite-Horizon Policy-Gradient Estimation”, Baxter J. and Bartlett P. L., 2001. A sketch of the gradient estimate follows this method reference.
- __init__(mdp_info, policy, optimizer)[source]
Constructor.
- Parameters:
optimizer – the gradient optimizer.
- _compute_gradient(J)[source]
Return the gradient computed by the algorithm.
- Parameters:
J (list) – list of the cumulative discounted rewards for each episode in the dataset.
- Returns:
The gradient computed by the algorithm.
- _step_update(x, u, r)[source]
This function is called at each episode step while parsing the dataset.
- Parameters:
x (np.ndarray) – the state at the current step;
u (np.ndarray) – the action at the current step;
r (np.ndarray) – the reward at the current step.
- _episode_end_update()[source]
This function is called at the end of each episode while parsing the dataset. The implementation depends on the algorithm (e.g. REINFORCE updates some data structures).
- _init_update()[source]
This function is called at the beginning of each episode while parsing the dataset. The implementation depends on the algorithm (e.g. REINFORCE resets some data structures).
- _add_save_attr(**attr_dict)
Add attributes that should be saved for an agent. For every attribute, it is necessary to specify the method to be used to save and load it. Available methods are: numpy, mushroom, torch, json, pickle, primitive and none. The primitive method can be used to store primitive attributes, while the none method always skips the attribute, but ensures that it is initialized to None after the load. The mushroom method can be used with classes that implement the Serializable interface. All the other methods use the library of the same name. If a “!” character is added at the end of the method name, the field will be saved only if full_save is set to True.
- Parameters:
**attr_dict – dictionary of attributes mapped to the method that should be used to save and load them.
- _agent_preprocess(state)
Applies all the agent’s preprocessors to the state.
- Parameters:
state (Array) – the state where the agent is.
- Returns:
The preprocessed state.
- _parse(sample)
Utility to parse the sample.
- Parameters:
sample (list) – the current episode step.
- Returns:
A tuple containing state, action, reward, next state, and the absorbing and last flags. If features are provided, the state is preprocessed with them.
- _post_load()
This method can be overwritten to implement logic that is executed after the loading of the agent.
- _update_agent_preprocessor(state)
Updates the stats of all the agent’s preprocessors given the state.
- Parameters:
state (Array) – the state where the agent is.
- _update_parameters(J)
Update the parameters of the policy.
- Parameters:
J (list) – list of the cumulative discounted rewards for each episode in the dataset.
- add_agent_preprocessor(preprocessor)
Add preprocessor to the agent’s preprocessor list. The preprocessors are applied in order.
- Parameters:
preprocessor (object) – state preprocessor to be applied to state variables before feeding them to the agent.
- add_core_preprocessor(preprocessor)
Add preprocessor to the core’s preprocessor list. The preprocessors are applied in order.
- Parameters:
preprocessor (object) – state preprocessor to be applied to state variables before feeding them to the agent.
- copy()
- Returns:
A deepcopy of the agent.
- property core_preprocessors
Access to core’s state preprocessors stored in the agent.
- draw_action(state, policy_state=None)
Return the action to execute in the given state. It is the action returned by the policy or the action set by the algorithm (e.g. in the case of SARSA).
- Parameters:
state – the state where the agent is;
policy_state – the policy internal state.
- Returns:
The action to be executed.
- episode_start(initial_state, episode_info)
Called by the Core when a new episode starts.
- Parameters:
initial_state (Array) – vector representing the initial state of the environment;
episode_info (dict) – a dictionary containing the information at reset, such as context.
- Returns:
A tuple containing the policy initial state and, optionally, the policy parameters.
- episode_start_vectorized(initial_states, episode_info, start_mask)
Called by the VectorCore when a new episode starts.
- Parameters:
initial_states (Array) – the initial states of the environment;
episode_info (dict) – a dictionary containing the information at reset, such as context;
start_mask (Array) – boolean mask to select the environments that are starting a new episode.
- Returns:
A tuple containing the policy initial states and, optionally, the policy parameters.
- fit(dataset)
Fit step.
- Parameters:
dataset (Dataset) – the dataset.
- classmethod load(path)
Load and deserialize the agent from the given location on disk.
- Parameters:
path (Path, str) – Relative or absolute path to the agent’s save location.
- Returns:
The loaded agent.
- save(path, full_save=False)
Serialize and save the object to the given path on disk.
- Parameters:
path (Path, str) – Relative or absolute path to the object save location;
full_save (bool) – flag to specify the amount of data to save for MushroomRL data structures.
- save_zip(zip_file, full_save, folder='')
Serialize and save the agent to the given path on disk.
- Parameters:
zip_file (ZipFile) – ZipFile where the object needs to be saved;
full_save (bool) – flag to specify the amount of data to save for MushroomRL data structures;
folder (string, '') – subfolder to be used by the save method.
- set_logger(logger)
Setter that can be used to pass a logger to the algorithm.
- Parameters:
logger (Logger) – the logger to be used by the algorithm.
- stop()
Method used to stop an agent. Useful when dealing with real-world environments or simulators, or to clean up environment internals after a core learn/evaluate run to enforce consistency.
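For intuition about what _compute_gradient computes, here is a hedged NumPy sketch of the single-episode GPOMDP estimate: each reward is credited only to the actions that preceded it, which is what distinguishes GPOMDP from REINFORCE. The function name is hypothetical and the per-step baseline used by the actual implementation is omitted for brevity:

    import numpy as np

    def gpomdp_gradient(scores, rewards, gamma):
        # scores: (T, K) array, row t is grad log pi(u_t | x_t) w.r.t. the
        # K policy parameters; rewards: (T,) array; gamma: discount factor.
        # (Illustrative helper; the baseline of the real algorithm is omitted.)
        cum_scores = np.cumsum(scores, axis=0)  # score of actions up to step t
        discounted = gamma ** np.arange(len(rewards)) * rewards
        # Weight each discounted reward by the cumulative score of the actions
        # taken before it, then sum over the steps of the episode.
        return (cum_scores * discounted[:, None]).sum(axis=0)

Averaged over the episodes in the dataset, this is roughly the quantity that _update_parameters feeds to the optimizer.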
- class eNAC(mdp_info, policy, optimizer, critic_features=None)[source]
Bases:
PolicyGradient
Episodic Natural Actor Critic algorithm. “A Survey on Policy Search for Robotics”, Deisenroth M. P., Neumann G. and Peters J., 2013. A construction sketch follows this method reference.
- __init__(mdp_info, policy, optimizer, critic_features=None)[source]
Constructor.
- Parameters:
critic_features (Features, None) – features used by the critic.
- _compute_gradient(J)[source]
Return the gradient computed by the algorithm.
- Parameters:
J (list) – list of the cumulative discounted rewards for each episode in the dataset.
- Returns:
The gradient computed by the algorithm.
- _step_update(x, u, r)[source]
This function is called at each episode step while parsing the dataset.
- Parameters:
x (np.ndarray) – the state at the current step;
u (np.ndarray) – the action at the current step;
r (np.ndarray) – the reward at the current step.
- _episode_end_update()[source]
This function is called at the end of each episode while parsing the dataset. The implementation depends on the algorithm (e.g. REINFORCE updates some data structures).
- _init_update()[source]
This function is called at the beginning of each episode while parsing the dataset. The implementation depends on the algorithm (e.g. REINFORCE resets some data structures).
- _add_save_attr(**attr_dict)
Add attributes that should be saved for an agent. For every attribute, it is necessary to specify the method to be used to save and load it. Available methods are: numpy, mushroom, torch, json, pickle, primitive and none. The primitive method can be used to store primitive attributes, while the none method always skips the attribute, but ensures that it is initialized to None after the load. The mushroom method can be used with classes that implement the Serializable interface. All the other methods use the library of the same name. If a “!” character is added at the end of the method name, the field will be saved only if full_save is set to True.
- Parameters:
**attr_dict – dictionary of attributes mapped to the method that should be used to save and load them.
- _agent_preprocess(state)
Applies all the agent’s preprocessors to the state.
- Parameters:
state (Array) – the state where the agent is.
- Returns:
The preprocessed state.
- _parse(sample)
Utility to parse the sample.
- Parameters:
sample (list) – the current episode step.
- Returns:
A tuple containing state, action, reward, next state, and the absorbing and last flags. If features are provided, the state is preprocessed with them.
- _post_load()
This method can be overwritten to implement logic that is executed after the loading of the agent.
- _update_agent_preprocessor(state)
Updates the stats of all the agent’s preprocessors given the state.
- Parameters:
state (Array) – the state where the agent is.
- _update_parameters(J)
Update the parameters of the policy.
- Parameters:
J (list) – list of the cumulative discounted rewards for each episode in the dataset.
- add_agent_preprocessor(preprocessor)
Add preprocessor to the agent’s preprocessor list. The preprocessors are applied in order.
- Parameters:
preprocessor (object) – state preprocessor to be applied to state variables before feeding them to the agent.
- add_core_preprocessor(preprocessor)
Add preprocessor to the core’s preprocessor list. The preprocessors are applied in order.
- Parameters:
preprocessor (object) – state preprocessor to be applied to state variables before feeding them to the agent.
- copy()
- Returns:
A deepcopy of the agent.
- property core_preprocessors
Access to core’s state preprocessors stored in the agent.
- draw_action(state, policy_state=None)
Return the action to execute in the given state. It is the action returned by the policy or the action set by the algorithm (e.g. in the case of SARSA).
- Parameters:
state – the state where the agent is;
policy_state – the policy internal state.
- Returns:
The action to be executed.
- episode_start(initial_state, episode_info)
Called by the Core when a new episode starts.
- Parameters:
initial_state (Array) – vector representing the initial state of the environment;
episode_info (dict) – a dictionary containing the information at reset, such as context.
- Returns:
A tuple containing the policy initial state and, optionally, the policy parameters.
- episode_start_vectorized(initial_states, episode_info, start_mask)
Called by the VectorCore when a new episode starts.
- Parameters:
initial_states (Array) – the initial states of the environment;
episode_info (dict) – a dictionary containing the information at reset, such as context;
start_mask (Array) – boolean mask to select the environments that are starting a new episode.
- Returns:
A tuple containing the policy initial states and, optionally, the policy parameters.
- fit(dataset)
Fit step.
- Parameters:
dataset (Dataset) – the dataset.
- classmethod load(path)
Load and deserialize the agent from the given location on disk.
- Parameters:
path (Path, str) – Relative or absolute path to the agent’s save location.
- Returns:
The loaded agent.
- save(path, full_save=False)
Serialize and save the object to the given path on disk.
- Parameters:
path (Path, str) – Relative or absolute path to the object save location;
full_save (bool) – flag to specify the amount of data to save for MushroomRL data structures.
- save_zip(zip_file, full_save, folder='')
Serialize and save the agent to the given path on disk.
- Parameters:
zip_file (ZipFile) – ZipFile where the object needs to be saved;
full_save (bool) – flag to specify the amount of data to save for MushroomRL data structures;
folder (string, '') – subfolder to be used by the save method.
- set_logger(logger)
Setter that can be used to pass a logger to the algorithm.
- Parameters:
logger (Logger) – the logger to be used by the algorithm.
- stop()
Method used to stop an agent. Useful when dealing with real-world environments or simulators, or to clean up environment internals after a core learn/evaluate run to enforce consistency.
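Constructing eNAC mirrors the examples above, with the optional critic feature map passed at construction. A hedged sketch, assuming the Features/PolynomialBasis helpers of recent MushroomRL releases (exact names and import paths may differ across versions) and reusing the mdp and policy built in the REINFORCE example:

    from mushroom_rl.algorithms.policy_search import eNAC
    from mushroom_rl.features import Features
    from mushroom_rl.features.basis import PolynomialBasis
    from mushroom_rl.utils.optimizers import AdaptiveOptimizer

    # Optional critic features: a degree-1 polynomial basis over the state,
    # used when fitting the value offset in the natural gradient estimation.
    basis = PolynomialBasis.generate(1, mdp.info.observation_space.shape[0])
    critic_features = Features(basis_list=basis)

    agent = eNAC(mdp.info, policy, AdaptiveOptimizer(eps=.01),
                 critic_features=critic_features)

Omitting critic_features (the default None) makes eNAC estimate the natural gradient from the accumulated scores alone.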