Policy search
Policy gradient
- class REINFORCE(mdp_info, policy, optimizer, features=None)[source]
Bases:
PolicyGradient
REINFORCE algorithm. “Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning”, Williams R. J., 1992.
- __init__(mdp_info, policy, optimizer, features=None)[source]
Constructor.
- Parameters:
optimizer – the gradient optimizer.
- _compute_gradient(J)[source]
Return the gradient computed by the algorithm.
- Parameters:
J (list) – list of the cumulative discounted rewards for each episode in the dataset.
- Returns:
The gradient computed by the algorithm.
- _step_update(x, u, r)[source]
This function is called, when parsing the dataset, at each episode step.
- Parameters:
x (np.ndarray) – the state at the current step;
u (np.ndarray) – the action at the current step;
r (np.ndarray) – the reward at the current step.
- _episode_end_update()[source]
This function is called, when parsing the dataset, at the end of each episode. The implementation depends on the algorithm (e.g. REINFORCE updates some data structures).
- _init_update()[source]
This function is called, when parsing the dataset, at the beginning of each episode. The implementation depends on the algorithm (e.g. REINFORCE resets some data structures).
- _add_save_attr(**attr_dict)
Add attributes that should be saved for an agent. For every attribute, it is necessary to specify the method to be used to save and load it. Available methods are: numpy, mushroom, torch, json, pickle, primitive and none. The primitive method can be used to store primitive attributes, while the none method always skips the attribute but ensures that it is initialized to None after the load. The mushroom method can be used with classes that implement the Serializable interface. All the other methods use the library of the same name. If a “!” character is appended to the method name, the field is saved only if full_save is set to True.
- Parameters:
**attr_dict – dictionary of attributes mapped to the method that should be used to save and load them.
- _parse(sample)
Utility to parse the sample.
- Parameters:
sample (list) – the current episode step.
- Returns:
A tuple containing state, action, reward, next state, absorbing and last flag. If provided, the state is preprocessed with the features.
- _post_load()
This method can be overwritten to implement logic that is executed after the loading of the agent.
- _update_parameters(J)
Update the parameters of the policy.
- Parameters:
J (list) – list of the cumulative discounted rewards for each episode in the dataset.
- add_preprocessor(preprocessor)
Add a preprocessor to the preprocessor list. The preprocessors are applied in order.
- Parameters:
preprocessor (object) – state preprocessor to be applied to state variables before feeding them to the agent.
- copy()
- Returns:
A deepcopy of the agent.
- draw_action(state)
Return the action to execute in the given state. It is the action returned by the policy or the action set by the algorithm (e.g. in the case of SARSA).
- Parameters:
state (np.ndarray) – the state where the agent is.
- Returns:
The action to be executed.
- episode_start()
Called by the agent when a new episode starts.
- fit(dataset, **info)
Fit step.
- Parameters:
dataset (list) – the dataset.
- classmethod load(path)
Load and deserialize the agent from the given location on disk.
- Parameters:
path (Path, string) – Relative or absolute path to the agent's save location.
- Returns:
The loaded agent.
- property preprocessors
Access to state preprocessors stored in the agent.
- save(path, full_save=False)
Serialize and save the object to the given path on disk.
- Parameters:
path (Path, str) – Relative or absolute path to the object save location;
full_save (bool) – Flag to specify the amount of data to save for MushroomRL data structures.
- save_zip(zip_file, full_save, folder='')
Serialize and save the agent to the given path on disk.
- Parameters:
zip_file (ZipFile) – ZipFile where the object needs to be saved;
full_save (bool) – flag to specify the amount of data to save for MushroomRL data structures;
folder (string, '') – subfolder to be used by the save method.
- set_logger(logger)
Setter that can be used to pass a logger to the algorithm.
- Parameters:
logger (Logger) – the logger to be used by the algorithm.
- stop()
Method used to stop an agent. Useful when dealing with real-world environments or simulators, or to clean up an environment's internals after a core learn/evaluate run to enforce consistency.
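To make _compute_gradient concrete, the REINFORCE estimate with Williams' per-component optimal baseline can be sketched in plain NumPy. This is a standalone illustration, not the library's implementation; the names score_sums and returns are hypothetical stand-ins for the quantities the agent accumulates while parsing the dataset.

```python
import numpy as np

def reinforce_gradient(score_sums, returns):
    """Monte Carlo policy-gradient estimate in the REINFORCE style.

    score_sums: (n_episodes, n_params) array; each row is the sum over an
                episode's steps of the score grad_theta log pi(u|x).
    returns:    (n_episodes,) array of cumulative discounted rewards J.
    """
    sq = score_sums ** 2
    # per-component optimal baseline b_k = E[g_k^2 J] / E[g_k^2]
    baseline = (sq * returns[:, None]).sum(axis=0) / (sq.sum(axis=0) + 1e-10)
    return (score_sums * (returns[:, None] - baseline)).mean(axis=0)
```

When every episode has the same return, the baseline cancels it exactly and the estimate is numerically zero, which is precisely the variance-reduction role of the baseline.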
- class GPOMDP(mdp_info, policy, optimizer, features=None)[source]
Bases:
PolicyGradient
GPOMDP algorithm. “Infinite-Horizon Policy-Gradient Estimation”, Baxter J. and Bartlett P. L., 2001.
- __init__(mdp_info, policy, optimizer, features=None)[source]
Constructor.
- Parameters:
optimizer – the gradient optimizer.
- _compute_gradient(J)[source]
Return the gradient computed by the algorithm.
- Parameters:
J (list) – list of the cumulative discounted rewards for each episode in the dataset.
- Returns:
The gradient computed by the algorithm.
- _step_update(x, u, r)[source]
This function is called, when parsing the dataset, at each episode step.
- Parameters:
x (np.ndarray) – the state at the current step;
u (np.ndarray) – the action at the current step;
r (np.ndarray) – the reward at the current step.
- _episode_end_update()[source]
This function is called, when parsing the dataset, at the end of each episode. The implementation depends on the algorithm (e.g. REINFORCE updates some data structures).
- _init_update()[source]
This function is called, when parsing the dataset, at the beginning of each episode. The implementation depends on the algorithm (e.g. REINFORCE resets some data structures).
- _add_save_attr(**attr_dict)
Add attributes that should be saved for an agent. For every attribute, it is necessary to specify the method to be used to save and load it. Available methods are: numpy, mushroom, torch, json, pickle, primitive and none. The primitive method can be used to store primitive attributes, while the none method always skips the attribute but ensures that it is initialized to None after the load. The mushroom method can be used with classes that implement the Serializable interface. All the other methods use the library of the same name. If a “!” character is appended to the method name, the field is saved only if full_save is set to True.
- Parameters:
**attr_dict – dictionary of attributes mapped to the method that should be used to save and load them.
- _parse(sample)
Utility to parse the sample.
- Parameters:
sample (list) – the current episode step.
- Returns:
A tuple containing state, action, reward, next state, absorbing and last flag. If provided, the state is preprocessed with the features.
- _post_load()
This method can be overwritten to implement logic that is executed after the loading of the agent.
- _update_parameters(J)
Update the parameters of the policy.
- Parameters:
J (list) – list of the cumulative discounted rewards for each episode in the dataset.
- add_preprocessor(preprocessor)
Add a preprocessor to the preprocessor list. The preprocessors are applied in order.
- Parameters:
preprocessor (object) – state preprocessor to be applied to state variables before feeding them to the agent.
- copy()
- Returns:
A deepcopy of the agent.
- draw_action(state)
Return the action to execute in the given state. It is the action returned by the policy or the action set by the algorithm (e.g. in the case of SARSA).
- Parameters:
state (np.ndarray) – the state where the agent is.
- Returns:
The action to be executed.
- episode_start()
Called by the agent when a new episode starts.
- fit(dataset, **info)
Fit step.
- Parameters:
dataset (list) – the dataset.
- classmethod load(path)
Load and deserialize the agent from the given location on disk.
- Parameters:
path (Path, string) – Relative or absolute path to the agent's save location.
- Returns:
The loaded agent.
- property preprocessors
Access to state preprocessors stored in the agent.
- save(path, full_save=False)
Serialize and save the object to the given path on disk.
- Parameters:
path (Path, str) – Relative or absolute path to the object save location;
full_save (bool) – Flag to specify the amount of data to save for MushroomRL data structures.
- save_zip(zip_file, full_save, folder='')
Serialize and save the agent to the given path on disk.
- Parameters:
zip_file (ZipFile) – ZipFile where the object needs to be saved;
full_save (bool) – flag to specify the amount of data to save for MushroomRL data structures;
folder (string, '') – subfolder to be used by the save method.
- set_logger(logger)
Setter that can be used to pass a logger to the algorithm.
- Parameters:
logger (Logger) – the logger to be used by the algorithm.
- stop()
Method used to stop an agent. Useful when dealing with real-world environments or simulators, or to clean up an environment's internals after a core learn/evaluate run to enforce consistency.
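What distinguishes GPOMDP from REINFORCE is causal, per-step credit assignment: the reward at step t is paired only with the scores of steps up to t. A minimal NumPy sketch of this estimator, without the per-step baselines the full algorithm also uses (all names are illustrative):

```python
import numpy as np

def gpomdp_gradient(episode_scores, episode_rewards, gamma):
    """episode_scores: list of (T, n_params) arrays of per-step scores
    grad_theta log pi(u_t|x_t); episode_rewards: list of (T,) reward
    arrays; gamma: discount factor."""
    grads = []
    for scores, rewards in zip(episode_scores, episode_rewards):
        cum_scores = np.cumsum(scores, axis=0)               # scores up to step t
        disc = gamma ** np.arange(len(rewards)) * rewards    # gamma^t * r_t
        grads.append((cum_scores * disc[:, None]).sum(axis=0))
    return np.mean(grads, axis=0)
```

Because later rewards multiply only earlier scores, this estimator typically has lower variance than REINFORCE on long episodes.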
- class eNAC(mdp_info, policy, optimizer, features=None, critic_features=None)[source]
Bases:
PolicyGradient
Episodic Natural Actor Critic algorithm. “A Survey on Policy Search for Robotics”, Deisenroth M. P., Neumann G., Peters J., 2013.
- __init__(mdp_info, policy, optimizer, features=None, critic_features=None)[source]
Constructor.
- Parameters:
critic_features (Features, None) – features used by the critic.
- _compute_gradient(J)[source]
Return the gradient computed by the algorithm.
- Parameters:
J (list) – list of the cumulative discounted rewards for each episode in the dataset.
- Returns:
The gradient computed by the algorithm.
- _step_update(x, u, r)[source]
This function is called, when parsing the dataset, at each episode step.
- Parameters:
x (np.ndarray) – the state at the current step;
u (np.ndarray) – the action at the current step;
r (np.ndarray) – the reward at the current step.
- _episode_end_update()[source]
This function is called, when parsing the dataset, at the end of each episode. The implementation depends on the algorithm (e.g. REINFORCE updates some data structures).
- _init_update()[source]
This function is called, when parsing the dataset, at the beginning of each episode. The implementation depends on the algorithm (e.g. REINFORCE resets some data structures).
- _add_save_attr(**attr_dict)
Add attributes that should be saved for an agent. For every attribute, it is necessary to specify the method to be used to save and load it. Available methods are: numpy, mushroom, torch, json, pickle, primitive and none. The primitive method can be used to store primitive attributes, while the none method always skips the attribute but ensures that it is initialized to None after the load. The mushroom method can be used with classes that implement the Serializable interface. All the other methods use the library of the same name. If a “!” character is appended to the method name, the field is saved only if full_save is set to True.
- Parameters:
**attr_dict – dictionary of attributes mapped to the method that should be used to save and load them.
- _parse(sample)
Utility to parse the sample.
- Parameters:
sample (list) – the current episode step.
- Returns:
A tuple containing state, action, reward, next state, absorbing and last flag. If provided, the state is preprocessed with the features.
- _post_load()
This method can be overwritten to implement logic that is executed after the loading of the agent.
- _update_parameters(J)
Update the parameters of the policy.
- Parameters:
J (list) – list of the cumulative discounted rewards for each episode in the dataset.
- add_preprocessor(preprocessor)
Add a preprocessor to the preprocessor list. The preprocessors are applied in order.
- Parameters:
preprocessor (object) – state preprocessor to be applied to state variables before feeding them to the agent.
- copy()
- Returns:
A deepcopy of the agent.
- draw_action(state)
Return the action to execute in the given state. It is the action returned by the policy or the action set by the algorithm (e.g. in the case of SARSA).
- Parameters:
state (np.ndarray) – the state where the agent is.
- Returns:
The action to be executed.
- episode_start()
Called by the agent when a new episode starts.
- fit(dataset, **info)
Fit step.
- Parameters:
dataset (list) – the dataset.
- classmethod load(path)
Load and deserialize the agent from the given location on disk.
- Parameters:
path (Path, string) – Relative or absolute path to the agent's save location.
- Returns:
The loaded agent.
- property preprocessors
Access to state preprocessors stored in the agent.
- save(path, full_save=False)
Serialize and save the object to the given path on disk.
- Parameters:
path (Path, str) – Relative or absolute path to the object save location;
full_save (bool) – Flag to specify the amount of data to save for MushroomRL data structures.
- save_zip(zip_file, full_save, folder='')
Serialize and save the agent to the given path on disk.
- Parameters:
zip_file (ZipFile) – ZipFile where the object needs to be saved;
full_save (bool) – flag to specify the amount of data to save for MushroomRL data structures;
folder (string, '') – subfolder to be used by the save method.
- set_logger(logger)
Setter that can be used to pass a logger to the algorithm.
- Parameters:
logger (Logger) – the logger to be used by the algorithm.
- stop()
Method used to stop an agent. Useful when dealing with real-world environments or simulators, or to clean up an environment's internals after a core learn/evaluate run to enforce consistency.
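The natural-gradient estimate in eNAC can be obtained by linear regression: the episode returns are regressed on the per-episode summed scores plus a constant, the coefficients on the scores form the gradient, and the intercept acts as a baseline. A hedged sketch that omits the critic_features part (all names are illustrative):

```python
import numpy as np

def enac_natural_gradient(score_sums, returns):
    """score_sums: (n_episodes, n_params) summed scores per episode;
    returns: (n_episodes,) episode returns J.

    Solve the least-squares system [score_sums, 1] @ [w; c] = J;
    w is the natural-gradient estimate, c the baseline."""
    X = np.hstack([score_sums, np.ones((len(returns), 1))])
    coef, *_ = np.linalg.lstsq(X, returns, rcond=None)
    return coef[:-1]   # drop the intercept
```

With more episodes than policy parameters the system is overdetermined and lstsq returns the usual regression solution.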
Black-Box optimization
- class RWR(mdp_info, distribution, policy, beta, features=None)[source]
Bases:
BlackBoxOptimization
Reward-Weighted Regression algorithm. “A Survey on Policy Search for Robotics”, Deisenroth M. P., Neumann G., Peters J., 2013.
- __init__(mdp_info, distribution, policy, beta, features=None)[source]
Constructor.
- Parameters:
beta ([float, Parameter]) – the temperature for the exponential reward transformation.
- _update(Jep, theta)[source]
Function that implements the update routine of the distribution parameters. Every black-box algorithm should implement this function with the proper update.
- Parameters:
Jep (np.ndarray) – a vector containing the J of the considered trajectories;
theta (np.ndarray) – a matrix of policy parameters of the considered trajectories.
- _add_save_attr(**attr_dict)
Add attributes that should be saved for an agent. For every attribute, it is necessary to specify the method to be used to save and load it. Available methods are: numpy, mushroom, torch, json, pickle, primitive and none. The primitive method can be used to store primitive attributes, while the none method always skips the attribute but ensures that it is initialized to None after the load. The mushroom method can be used with classes that implement the Serializable interface. All the other methods use the library of the same name. If a “!” character is appended to the method name, the field is saved only if full_save is set to True.
- Parameters:
**attr_dict – dictionary of attributes mapped to the method that should be used to save and load them.
- _post_load()
This method can be overwritten to implement logic that is executed after the loading of the agent.
- add_preprocessor(preprocessor)
Add a preprocessor to the preprocessor list. The preprocessors are applied in order.
- Parameters:
preprocessor (object) – state preprocessor to be applied to state variables before feeding them to the agent.
- copy()
- Returns:
A deepcopy of the agent.
- draw_action(state)
Return the action to execute in the given state. It is the action returned by the policy or the action set by the algorithm (e.g. in the case of SARSA).
- Parameters:
state (np.ndarray) – the state where the agent is.
- Returns:
The action to be executed.
- episode_start()
Called by the agent when a new episode starts.
- fit(dataset, **info)
Fit step.
- Parameters:
dataset (list) – the dataset.
- classmethod load(path)
Load and deserialize the agent from the given location on disk.
- Parameters:
path (Path, string) – Relative or absolute path to the agent's save location.
- Returns:
The loaded agent.
- property preprocessors
Access to state preprocessors stored in the agent.
- save(path, full_save=False)
Serialize and save the object to the given path on disk.
- Parameters:
path (Path, str) – Relative or absolute path to the object save location;
full_save (bool) – Flag to specify the amount of data to save for MushroomRL data structures.
- save_zip(zip_file, full_save, folder='')
Serialize and save the agent to the given path on disk.
- Parameters:
zip_file (ZipFile) – ZipFile where the object needs to be saved;
full_save (bool) – flag to specify the amount of data to save for MushroomRL data structures;
folder (string, '') – subfolder to be used by the save method.
- set_logger(logger)
Setter that can be used to pass a logger to the algorithm.
- Parameters:
logger (Logger) – the logger to be used by the algorithm.
- stop()
Method used to stop an agent. Useful when dealing with real-world environments or simulators, or to clean up an environment's internals after a core learn/evaluate run to enforce consistency.
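RWR's _update amounts to a weighted maximum-likelihood fit of the parameter distribution, with weights given by the exponential transformation of the returns controlled by beta. A standalone sketch for a Gaussian distribution (this mirrors the idea, not the library's internals; names are illustrative):

```python
import numpy as np

def rwr_update(theta, Jep, beta):
    """theta: (n_episodes, n_params) sampled policy parameters;
    Jep: (n_episodes,) episode returns; beta: temperature."""
    w = np.exp(beta * (Jep - Jep.max()))    # shift by the max for numerical stability
    w /= w.sum()
    mu = w @ theta                          # weighted mean
    diff = theta - mu
    sigma = (w[:, None] * diff).T @ diff    # weighted covariance
    return mu, sigma
```

Larger beta concentrates the weights on the best episodes; beta close to 0 recovers a plain average over all sampled parameters.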
- class PGPE(mdp_info, distribution, policy, optimizer, features=None)[source]
Bases:
BlackBoxOptimization
Policy Gradient with Parameter Exploration algorithm. “A Survey on Policy Search for Robotics”, Deisenroth M. P., Neumann G., Peters J., 2013.
- __init__(mdp_info, distribution, policy, optimizer, features=None)[source]
Constructor.
- Parameters:
optimizer – the gradient step optimizer.
- _update(Jep, theta)[source]
Function that implements the update routine of the distribution parameters. Every black-box algorithm should implement this function with the proper update.
- Parameters:
Jep (np.ndarray) – a vector containing the J of the considered trajectories;
theta (np.ndarray) – a matrix of policy parameters of the considered trajectories.
- _add_save_attr(**attr_dict)
Add attributes that should be saved for an agent. For every attribute, it is necessary to specify the method to be used to save and load it. Available methods are: numpy, mushroom, torch, json, pickle, primitive and none. The primitive method can be used to store primitive attributes, while the none method always skips the attribute but ensures that it is initialized to None after the load. The mushroom method can be used with classes that implement the Serializable interface. All the other methods use the library of the same name. If a “!” character is appended to the method name, the field is saved only if full_save is set to True.
- Parameters:
**attr_dict – dictionary of attributes mapped to the method that should be used to save and load them.
- _post_load()
This method can be overwritten to implement logic that is executed after the loading of the agent.
- add_preprocessor(preprocessor)
Add a preprocessor to the preprocessor list. The preprocessors are applied in order.
- Parameters:
preprocessor (object) – state preprocessor to be applied to state variables before feeding them to the agent.
- copy()
- Returns:
A deepcopy of the agent.
- draw_action(state)
Return the action to execute in the given state. It is the action returned by the policy or the action set by the algorithm (e.g. in the case of SARSA).
- Parameters:
state (np.ndarray) – the state where the agent is.
- Returns:
The action to be executed.
- episode_start()
Called by the agent when a new episode starts.
- fit(dataset, **info)
Fit step.
- Parameters:
dataset (list) – the dataset.
- classmethod load(path)
Load and deserialize the agent from the given location on disk.
- Parameters:
path (Path, string) – Relative or absolute path to the agent's save location.
- Returns:
The loaded agent.
- property preprocessors
Access to state preprocessors stored in the agent.
- save(path, full_save=False)
Serialize and save the object to the given path on disk.
- Parameters:
path (Path, str) – Relative or absolute path to the object save location;
full_save (bool) – Flag to specify the amount of data to save for MushroomRL data structures.
- save_zip(zip_file, full_save, folder='')
Serialize and save the agent to the given path on disk.
- Parameters:
zip_file (ZipFile) – ZipFile where the object needs to be saved;
full_save (bool) – flag to specify the amount of data to save for MushroomRL data structures;
folder (string, '') – subfolder to be used by the save method.
- set_logger(logger)
Setter that can be used to pass a logger to the algorithm.
- Parameters:
logger (Logger) – the logger to be used by the algorithm.
- stop()
Method used to stop an agent. Useful when dealing with real-world environments or simulators, or to clean up an environment's internals after a core learn/evaluate run to enforce consistency.
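PGPE moves the likelihood-ratio trick from action space to parameter space: the gradient of the expected return is taken with respect to the hyperparameters of the distribution that samples the policy parameters. A sketch for a diagonal Gaussian with a simple mean baseline (the library's optimizer and baseline choices may differ; names are illustrative):

```python
import numpy as np

def pgpe_gradient(theta, Jep, mu, sigma):
    """theta: (n_episodes, n_params) sampled parameters; Jep: (n_episodes,)
    returns; mu, sigma: (n_params,) mean and std of the sampling Gaussian.
    Returns the gradients of E[J] w.r.t. mu and sigma."""
    adv = (Jep - Jep.mean())[:, None]       # mean baseline (a simplification)
    diff = theta - mu
    # score functions of a diagonal Gaussian w.r.t. mean and std
    grad_mu = (adv * diff / sigma**2).mean(axis=0)
    grad_sigma = (adv * (diff**2 - sigma**2) / sigma**3).mean(axis=0)
    return grad_mu, grad_sigma
```

Because the randomness sits in the parameters rather than in the actions, each rollout can use a deterministic policy, which is what makes the approach suitable for episodic settings.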
- class REPS(mdp_info, distribution, policy, eps, features=None)[source]
Bases:
BlackBoxOptimization
Episodic Relative Entropy Policy Search algorithm. “A Survey on Policy Search for Robotics”, Deisenroth M. P., Neumann G., Peters J., 2013.
- __init__(mdp_info, distribution, policy, eps, features=None)[source]
Constructor.
- Parameters:
eps ([float, Parameter]) – the maximum admissible value for the Kullback-Leibler divergence between the new distribution and the previous one at each update step.
- _update(Jep, theta)[source]
Function that implements the update routine of the distribution parameters. Every black-box algorithm should implement this function with the proper update.
- Parameters:
Jep (np.ndarray) – a vector containing the J of the considered trajectories;
theta (np.ndarray) – a matrix of policy parameters of the considered trajectories.
- _add_save_attr(**attr_dict)
Add attributes that should be saved for an agent. For every attribute, it is necessary to specify the method to be used to save and load it. Available methods are: numpy, mushroom, torch, json, pickle, primitive and none. The primitive method can be used to store primitive attributes, while the none method always skips the attribute but ensures that it is initialized to None after the load. The mushroom method can be used with classes that implement the Serializable interface. All the other methods use the library of the same name. If a “!” character is appended to the method name, the field is saved only if full_save is set to True.
- Parameters:
**attr_dict – dictionary of attributes mapped to the method that should be used to save and load them.
- _post_load()
This method can be overwritten to implement logic that is executed after the loading of the agent.
- add_preprocessor(preprocessor)
Add a preprocessor to the preprocessor list. The preprocessors are applied in order.
- Parameters:
preprocessor (object) – state preprocessor to be applied to state variables before feeding them to the agent.
- copy()
- Returns:
A deepcopy of the agent.
- draw_action(state)
Return the action to execute in the given state. It is the action returned by the policy or the action set by the algorithm (e.g. in the case of SARSA).
- Parameters:
state (np.ndarray) – the state where the agent is.
- Returns:
The action to be executed.
- episode_start()
Called by the agent when a new episode starts.
- fit(dataset, **info)
Fit step.
- Parameters:
dataset (list) – the dataset.
- classmethod load(path)
Load and deserialize the agent from the given location on disk.
- Parameters:
path (Path, string) – Relative or absolute path to the agent's save location.
- Returns:
The loaded agent.
- property preprocessors
Access to state preprocessors stored in the agent.
- save(path, full_save=False)
Serialize and save the object to the given path on disk.
- Parameters:
path (Path, str) – Relative or absolute path to the object save location;
full_save (bool) – Flag to specify the amount of data to save for MushroomRL data structures.
- save_zip(zip_file, full_save, folder='')
Serialize and save the agent to the given path on disk.
- Parameters:
zip_file (ZipFile) – ZipFile where the object needs to be saved;
full_save (bool) – flag to specify the amount of data to save for MushroomRL data structures;
folder (string, '') – subfolder to be used by the save method.
- set_logger(logger)
Setter that can be used to pass a logger to the algorithm.
- Parameters:
logger (Logger) – the logger to be used by the algorithm.
- stop()
Method used to stop an agent. Useful when dealing with real-world environments or simulators, or to clean up an environment's internals after a core learn/evaluate run to enforce consistency.
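Episodic REPS computes its episode weights by minimizing a convex dual over a temperature eta; at the optimum, the KL divergence between the reweighted and the previous distribution is bounded by eps. The sketch below replaces the usual quasi-Newton solve with a crude log-spaced grid search over eta to stay dependency-free (all names are illustrative):

```python
import numpy as np

def reps_weights(Jep, eps):
    """Jep: (n_episodes,) returns; eps: KL bound.
    Returns normalized episode weights and the selected temperature eta."""
    J = Jep - Jep.max()                      # shift for numerical stability

    def dual(eta):
        # episodic REPS dual: g(eta) = eta*eps + eta*log mean exp(J/eta)
        return eta * eps + eta * np.log(np.mean(np.exp(J / eta)))

    etas = np.logspace(-5, 5, 2001)          # crude 1-D search over eta > 0
    eta = etas[np.argmin([dual(e) for e in etas])]
    w = np.exp(J / eta)
    return w / w.sum(), eta
```

The weights can then feed a weighted maximum-likelihood update of the distribution, as in RWR, but with the temperature chosen by the eps constraint instead of a fixed beta.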
- class ConstrainedREPS(mdp_info, distribution, policy, eps, kappa, features=None)[source]
Bases:
BlackBoxOptimization
Episodic Relative Entropy Policy Search algorithm with constrained policy update.
- __init__(mdp_info, distribution, policy, eps, kappa, features=None)[source]
Constructor.
- Parameters:
eps ([float, Parameter]) – the maximum admissible value for the Kullback-Leibler divergence between the new distribution and the previous one at each update step.
kappa ([float, Parameter]) – the maximum admissible value for the entropy decrease between the new distribution and the previous one at each update step.
- _update(Jep, theta)[source]
Function that implements the update routine of the distribution parameters. Every black-box algorithm should implement this function with the proper update.
- Parameters:
Jep (np.ndarray) – a vector containing the J of the considered trajectories;
theta (np.ndarray) – a matrix of policy parameters of the considered trajectories.
- _add_save_attr(**attr_dict)
Add attributes that should be saved for an agent. For every attribute, it is necessary to specify the method to be used to save and load it. Available methods are: numpy, mushroom, torch, json, pickle, primitive and none. The primitive method can be used to store primitive attributes, while the none method always skips the attribute but ensures that it is initialized to None after the load. The mushroom method can be used with classes that implement the Serializable interface. All the other methods use the library of the same name. If a “!” character is appended to the method name, the field is saved only if full_save is set to True.
- Parameters:
**attr_dict – dictionary of attributes mapped to the method that should be used to save and load them.
- _post_load()
This method can be overwritten to implement logic that is executed after the loading of the agent.
- add_preprocessor(preprocessor)
Add a preprocessor to the preprocessor list. The preprocessors are applied in order.
- Parameters:
preprocessor (object) – state preprocessor to be applied to state variables before feeding them to the agent.
- copy()
- Returns:
A deepcopy of the agent.
- draw_action(state)
Return the action to execute in the given state. It is the action returned by the policy or the action set by the algorithm (e.g. in the case of SARSA).
- Parameters:
state (np.ndarray) – the state where the agent is.
- Returns:
The action to be executed.
- episode_start()
Called by the agent when a new episode starts.
- fit(dataset, **info)
Fit step.
- Parameters:
dataset (list) – the dataset.
- classmethod load(path)
Load and deserialize the agent from the given location on disk.
- Parameters:
path (Path, string) – Relative or absolute path to the agent's save location.
- Returns:
The loaded agent.
- property preprocessors
Access to state preprocessors stored in the agent.
- save(path, full_save=False)
Serialize and save the object to the given path on disk.
- Parameters:
path (Path, str) – Relative or absolute path to the object save location;
full_save (bool) – Flag to specify the amount of data to save for MushroomRL data structures.
- save_zip(zip_file, full_save, folder='')
Serialize and save the agent to the given path on disk.
- Parameters:
zip_file (ZipFile) – ZipFile where the object needs to be saved;
full_save (bool) – flag to specify the amount of data to save for MushroomRL data structures;
folder (string, '') – subfolder to be used by the save method.
- set_logger(logger)
Setter that can be used to pass a logger to the algorithm.
- Parameters:
logger (Logger) – the logger to be used by the algorithm.
- stop()
Method used to stop an agent. Useful when dealing with real-world environments or simulators, or to clean up an environment's internals after a core learn/evaluate run to enforce consistency.
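The role of kappa can be illustrated on the covariance of a Gaussian distribution: since the entropy of a Gaussian depends on the covariance only through its log-determinant, an over-aggressive update can be rescaled so that the entropy drops by at most kappa. This is only a sketch of the constraint itself; the library's constrained update may be implemented differently.

```python
import numpy as np

def constrain_entropy(sigma_old, sigma_new, kappa):
    """Rescale the updated covariance so that the Gaussian entropy decreases
    by at most kappa relative to the previous covariance."""
    d = sigma_old.shape[0]
    # entropy difference of Gaussians = 0.5 * (log det(old) - log det(new))
    h_drop = 0.5 * (np.linalg.slogdet(sigma_old)[1] - np.linalg.slogdet(sigma_new)[1])
    if h_drop <= kappa:
        return sigma_new
    # scaling sigma by c raises the entropy by (d/2) * log(c)
    return sigma_new * np.exp(2.0 * (h_drop - kappa) / d)
```

Bounding the entropy decrease keeps the search distribution from collapsing prematurely, which is the failure mode plain REPS can exhibit with small eps and noisy returns.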
- class MORE(mdp_info, distribution, policy, eps, h0=-75, kappa=0.99, features=None)[source]
Bases:
BlackBoxOptimization
Model-Based Relative Entropy Stochastic Search algorithm. “Model-Based Relative Entropy Stochastic Search”, Abdolmaleki, Abbas and Lioutikov, Rudolf and Peters, Jan R. and Lau, Nuno and Paulo Reis, Luis and Neumann, Gerhard, 2015.
- __init__(mdp_info, distribution, policy, eps, h0=-75, kappa=0.99, features=None)[source]
Constructor.
- Parameters:
distribution (GaussianCholeskyDistribution) – the distribution of policy parameters.
eps ([float, Parameter]) – the maximum admissible value for the Kullback-Leibler divergence between the new distribution and the previous one at each update step.
h0 ([float, Parameter]) – minimum exploration policy.
kappa ([float, Parameter]) – regularization parameter for the entropy decrease.
- _update(Jep, theta)[source]
Function that implements the update routine of the distribution parameters. Every black-box algorithm should implement this function with the proper update.
- Parameters:
Jep (np.ndarray) – a vector containing the J of the considered trajectories;
theta (np.ndarray) – a matrix of policy parameters of the considered trajectories.
- _add_save_attr(**attr_dict)
Add attributes that should be saved for an agent. For every attribute, it is necessary to specify the method to be used to save and load it. Available methods are: numpy, mushroom, torch, json, pickle, primitive and none. The primitive method can be used to store primitive attributes, while the none method always skips the attribute but ensures that it is initialized to None after the load. The mushroom method can be used with classes that implement the Serializable interface. All the other methods use the library of the same name. If a “!” character is appended to the method name, the field is saved only if full_save is set to True.
- Parameters:
**attr_dict – dictionary of attributes mapped to the method that should be used to save and load them.
- _post_load()
This method can be overwritten to implement logic that is executed after the loading of the agent.
- add_preprocessor(preprocessor)
Add a preprocessor to the preprocessor list. The preprocessors are applied in order.
- Parameters:
preprocessor (object) – state preprocessor to be applied to state variables before feeding them to the agent.
- copy()
- Returns:
A deepcopy of the agent.
- draw_action(state)
Return the action to execute in the given state. It is the action returned by the policy or the action set by the algorithm (e.g. in the case of SARSA).
- Parameters:
state (np.ndarray) – the state where the agent is.
- Returns:
The action to be executed.
- episode_start()
Called by the agent when a new episode starts.
- fit(dataset, **info)
Fit step.
- Parameters:
dataset (list) – the dataset.
- classmethod load(path)
Load and deserialize the agent from the given location on disk.
- Parameters:
path (Path, string) – Relative or absolute path to the agent's save location.
- Returns:
The loaded agent.
- property preprocessors
Access to state preprocessors stored in the agent.
- save(path, full_save=False)
Serialize and save the object to the given path on disk.
- Parameters:
path (Path, str) – Relative or absolute path to the object save location;
full_save (bool) – Flag to specify the amount of data to save for MushroomRL data structures.
- save_zip(zip_file, full_save, folder='')
Serialize and save the agent to the given path on disk.
- Parameters:
zip_file (ZipFile) – ZipFile where the object needs to be saved;
full_save (bool) – flag to specify the amount of data to save for MushroomRL data structures;
folder (string, '') – subfolder to be used by the save method.
- set_logger(logger)
Setter that can be used to pass a logger to the algorithm.
- Parameters:
logger (Logger) – the logger to be used by the algorithm.
- stop()
Method used to stop an agent. Useful when dealing with real-world environments or simulators, or to clean up an environment's internals after a core learn/evaluate run to enforce consistency.
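The "model-based" part of MORE is a quadratic surrogate of the return as a function of the policy parameters, fitted from the sampled (theta, Jep) pairs and then optimized in closed form under the KL (eps) and entropy (h0, kappa) constraints. The sketch below shows only the surrogate fit; the dual optimization step is omitted, and all names are illustrative.

```python
import numpy as np

def fit_quadratic_model(theta, Jep):
    """Least-squares fit of R(theta) ~ theta^T A theta + b^T theta + c.

    theta: (n_episodes, n_params) sampled parameters;
    Jep:   (n_episodes,) episode returns.
    Returns the symmetric matrix A, vector b and scalar c."""
    n, d = theta.shape
    quad = np.einsum('ni,nj->nij', theta, theta).reshape(n, d * d)
    X = np.hstack([quad, theta, np.ones((n, 1))])
    # lstsq handles the duplicated cross terms via the minimum-norm solution
    coef, *_ = np.linalg.lstsq(X, Jep, rcond=None)
    A = coef[:d * d].reshape(d, d)
    A = 0.5 * (A + A.T)                 # symmetrize the quadratic term
    return A, coef[d * d:-1], coef[-1]
```

With a concave surrogate (negative-definite A), the constrained Gaussian update of MORE has a closed-form solution, which is what allows the algorithm to take large yet controlled steps.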