Solvers¶
Dynamic programming¶
mushroom_rl.solvers.dynamic_programming.value_iteration(prob, reward, gamma, eps)[source]¶
Value iteration algorithm to solve a dynamic programming problem.
Parameters: - prob (np.ndarray) – transition probability matrix;
- reward (np.ndarray) – reward matrix;
- gamma (float) – discount factor;
- eps (float) – accuracy threshold.
Returns: The optimal value of each state.
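As a concrete illustration, the Bellman optimality backup this solver performs can be sketched in plain NumPy on a toy MDP. The `(n_states, n_actions, n_states)` layout assumed below for `prob` and `reward`, and the function name `value_iteration_sketch`, are assumptions for this sketch, not the library's internals:

```python
import numpy as np

def value_iteration_sketch(prob, reward, gamma, eps):
    # prob[s, a, s']: transition probabilities; reward[s, a, s']: rewards.
    n_states = prob.shape[0]
    value = np.zeros(n_states)
    while True:
        # Expected immediate reward r(s, a), then the Bellman optimality backup.
        r_sa = np.einsum('ijk,ijk->ij', prob, reward)
        q = r_sa + gamma * prob @ value        # shape (n_states, n_actions)
        new_value = q.max(axis=1)
        if np.max(np.abs(new_value - value)) < eps:
            return new_value
        value = new_value

# Toy two-state MDP: in state 0, action 1 moves to the absorbing state 1
# with reward 1; every other transition stays in place with reward 0.
prob = np.zeros((2, 2, 2))
prob[0, 0, 0] = prob[0, 1, 1] = prob[1, 0, 1] = prob[1, 1, 1] = 1.
reward = np.zeros((2, 2, 2))
reward[0, 1, 1] = 1.
v = value_iteration_sketch(prob, reward, 0.9, 1e-6)   # v ≈ [1., 0.]
```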
mushroom_rl.solvers.dynamic_programming.policy_iteration(prob, reward, gamma)[source]¶
Policy iteration algorithm to solve a dynamic programming problem.
Parameters: - prob (np.ndarray) – transition probability matrix;
- reward (np.ndarray) – reward matrix;
- gamma (float) – discount factor.
Returns: The optimal value of each state and the optimal policy.
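The evaluation/improvement loop can likewise be sketched in NumPy, under the same assumed array layout as above (an assumption for illustration, not the library's internal convention):

```python
import numpy as np

def policy_iteration_sketch(prob, reward, gamma):
    # prob[s, a, s']: transition probabilities; reward[s, a, s']: rewards.
    n_states = prob.shape[0]
    idx = np.arange(n_states)
    r_sa = np.einsum('ijk,ijk->ij', prob, reward)   # expected reward r(s, a)
    policy = np.zeros(n_states, dtype=int)
    while True:
        # Exact policy evaluation: solve (I - gamma * P_pi) v = r_pi.
        p_pi, r_pi = prob[idx, policy], r_sa[idx, policy]
        value = np.linalg.solve(np.eye(n_states) - gamma * p_pi, r_pi)
        # Greedy policy improvement; stop when the policy is stable.
        new_policy = (r_sa + gamma * prob @ value).argmax(axis=1)
        if np.array_equal(new_policy, policy):
            return value, policy
        policy = new_policy

# Toy two-state MDP: in state 0, action 1 moves to the absorbing state 1
# with reward 1; every other transition stays in place with reward 0.
prob = np.zeros((2, 2, 2))
prob[0, 0, 0] = prob[0, 1, 1] = prob[1, 0, 1] = prob[1, 1, 1] = 1.
reward = np.zeros((2, 2, 2))
reward[0, 1, 1] = 1.
value, policy = policy_iteration_sketch(prob, reward, 0.9)
```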
Car-On-Hill brute-force solver¶
mushroom_rl.solvers.car_on_hill.step(mdp, state, action)[source]¶
Perform a step in the tree.
Parameters: - mdp (CarOnHill) – the Car-On-Hill environment;
- state (np.array) – the state;
- action (np.array) – the action.
Returns: The resulting transition from executing action in state.
mushroom_rl.solvers.car_on_hill.bfs(mdp, frontier, k, max_k)[source]¶
Perform Breadth-First tree search.
Parameters: - mdp (CarOnHill) – the Car-On-Hill environment;
- frontier (list) – the states at the frontier of the BFS;
- k (int) – the current depth of the tree;
- max_k (int) – maximum depth to consider.
Returns: A tuple containing a flag for the algorithm ending, and the updated depth of the tree.
mushroom_rl.solvers.car_on_hill.solve_car_on_hill(mdp, states, actions, gamma, max_k=50)[source]¶
Solver of the Car-On-Hill environment.
Parameters: - mdp (CarOnHill) – the Car-On-Hill environment;
- states (np.ndarray) – the states;
- actions (np.ndarray) – the actions;
- gamma (float) – the discount factor;
- max_k (int, 50) – maximum depth to consider.
Returns: The Q-value for each state-action pair.
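The brute-force idea — expand the deterministic dynamics up to a depth cap and back up the discounted rewards — can be illustrated on a toy deterministic chain. The `step_fn` signature and the depth-first recursion below are hypothetical stand-ins for illustration, not the library's BFS implementation:

```python
def brute_force_q(step_fn, state, action, gamma, k=0, max_k=50):
    # step_fn(state, action) -> (next_state, reward, absorbing) is a
    # hypothetical stand-in for the environment's deterministic dynamics.
    next_state, reward, absorbing = step_fn(state, action)
    if absorbing or k >= max_k:
        return reward
    # Deterministic Bellman backup over the two discrete actions.
    best = max(brute_force_q(step_fn, next_state, a, gamma, k + 1, max_k)
               for a in (0, 1))
    return reward + gamma * best

# Toy chain: action 1 moves right, action 0 stays; reaching state 3 is
# absorbing and pays reward 1, everything else pays 0.
def step_fn(s, a):
    s_next = s + 1 if a == 1 else s
    if s_next >= 3:
        return s_next, 1.0, True
    return s_next, 0.0, False

q = brute_force_q(step_fn, 0, 1, 0.9, max_k=20)   # two steps to the goal: ≈ 0.81
```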
LQR solver¶
mushroom_rl.solvers.lqr.compute_lqr_feedback_gain(lqr, max_iterations=100)[source]¶
Computes the optimal gain matrix K.
Parameters: - lqr (LQR) – LQR environment;
- max_iterations (int) – max iterations for convergence.
Returns: Feedback gain matrix K.
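One standard way to obtain K with a fixed iteration budget — plausibly what an iterative solver like this does — is to iterate the discrete-time Riccati equation. The A, B, Q, R matrices below are passed explicitly because how the LQR object stores them is not shown here; this is a sketch of the technique, not the library's code:

```python
import numpy as np

def lqr_gain_sketch(A, B, Q, R, max_iterations=100):
    # Iterate P <- Q + A'PA - A'PB (R + B'PB)^{-1} B'PA, then read off
    # K = (R + B'PB)^{-1} B'PA, so that the control law is a = -K s.
    P = Q.copy()
    for _ in range(max_iterations):
        BtPA = B.T @ P @ A
        P = Q + A.T @ P @ A - BtPA.T @ np.linalg.solve(R + B.T @ P @ B, BtPA)
    return np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

# Scalar sanity check: with A = B = Q = R = 1 the Riccati fixed point is
# the golden ratio, and K converges to (sqrt(5) - 1) / 2 ≈ 0.618.
K = lqr_gain_sketch(np.eye(1), np.eye(1), np.eye(1), np.eye(1))
```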
mushroom_rl.solvers.lqr.compute_lqr_P(lqr, K)[source]¶
Computes the P matrix for a given gain matrix K.
Parameters: - lqr (LQR) – LQR environment;
- K (np.ndarray) – controller matrix.
Returns: The P matrix of the value function.
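For a fixed (not necessarily optimal) gain, P solves a discrete Lyapunov equation for the closed-loop dynamics A - BK. A vectorised sketch, again with the problem matrices passed explicitly as an assumption about the setup:

```python
import numpy as np

def lqr_P_sketch(A, B, Q, R, K):
    # Solve P = Q + K'RK + (A - BK)' P (A - BK) by vectorisation:
    # flatten P and invert (I - M' kron M') with M the closed-loop matrix.
    M = A - B @ K
    n = A.shape[0]
    rhs = (Q + K.T @ R @ K).reshape(-1)
    P = np.linalg.solve(np.eye(n * n) - np.kron(M.T, M.T), rhs)
    return P.reshape(n, n)

# Scalar check: at the optimal gain K = (sqrt(5) - 1) / 2 for
# A = B = Q = R = 1, P comes out as the golden ratio (1 + sqrt(5)) / 2.
K = np.array([[(np.sqrt(5) - 1) / 2]])
P = lqr_P_sketch(np.eye(1), np.eye(1), np.eye(1), np.eye(1), K)
```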
mushroom_rl.solvers.lqr.compute_lqr_V(s, lqr, K)[source]¶
Computes the value function at a state s, with the given gain matrix K.
Parameters: - s (np.ndarray) – state;
- lqr (LQR) – LQR environment;
- K (np.ndarray) – controller matrix.
Returns: The value function at s.
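Assuming the reward is the negative quadratic cost -(s'Qs + a'Ra) — a sign convention assumed here, common when LQR is posed as an RL problem — the value of the linear policy a = -Ks is a negative quadratic form in the P matrix of compute_lqr_P:

```python
import numpy as np

def lqr_V_sketch(s, P):
    # V(s) = -s' P s under the assumed negative-cost reward convention.
    return -s.T @ P @ s

v = lqr_V_sketch(np.array([1.0, 1.0]), np.diag([2.0, 3.0]))   # -> -5.0
```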
mushroom_rl.solvers.lqr.compute_lqr_V_gaussian_policy(s, lqr, K, Sigma)[source]¶
Computes the value function at a state s, with the given gain matrix K and covariance Sigma.
Parameters: - s (np.ndarray) – state;
- lqr (LQR) – LQR environment;
- K (np.ndarray) – controller matrix;
- Sigma (np.ndarray) – covariance matrix.
Returns: The value function at s.
mushroom_rl.solvers.lqr.compute_lqr_Q(s, a, lqr, K)[source]¶
Computes the state-action value function Q at a state-action pair (s, a), with the given gain matrix K.
Parameters: - s (np.ndarray) – state;
- a (np.ndarray) – action;
- lqr (LQR) – LQR environment;
- K (np.ndarray) – controller matrix.
Returns: The Q function at s, a.
mushroom_rl.solvers.lqr.compute_lqr_Q_gaussian_policy(s, a, lqr, K, Sigma)[source]¶
Computes the state-action value function Q at a state-action pair (s, a), with the given gain matrix K and covariance Sigma.
Parameters: - s (np.ndarray) – state;
- a (np.ndarray) – action;
- lqr (LQR) – LQR environment;
- K (np.ndarray) – controller matrix;
- Sigma (np.ndarray) – covariance matrix.
Returns: The Q function at (s, a).
mushroom_rl.solvers.lqr.compute_lqr_V_gaussian_policy_gradient_K(s, lqr, K, Sigma)[source]¶
Computes the gradient of the objective function J (equal to the value function V) at state s w.r.t. the controller matrix K, under the current policy parameters K and Sigma, where J(s, K, Sigma) = V(s, K, Sigma).
Parameters: - s (np.ndarray) – state;
- lqr (LQR) – LQR environment;
- K (np.ndarray) – controller matrix;
- Sigma (np.ndarray) – covariance matrix.
Returns: The gradient of J w.r.t. K.
mushroom_rl.solvers.lqr.compute_lqr_Q_gaussian_policy_gradient_K(s, a, lqr, K, Sigma)[source]¶
Computes the gradient of the state-action value function Q at the state-action pair (s, a) w.r.t. the controller matrix K, under the current policy parameters K and Sigma.
Parameters: - s (np.ndarray) – state;
- a (np.ndarray) – action;
- lqr (LQR) – LQR environment;
- K (np.ndarray) – controller matrix;
- Sigma (np.ndarray) – covariance matrix.
Returns: The gradient of Q w.r.t. K.