# Solvers

## Dynamic programming

mushroom_rl.solvers.dynamic_programming.value_iteration(prob, reward, gamma, eps)[source]

Value iteration algorithm to solve a dynamic programming problem.

Parameters:

- prob (np.ndarray) – transition probability matrix
- reward (np.ndarray) – reward matrix
- gamma (float) – discount factor
- eps (float) – accuracy threshold

Returns: The optimal value of each state.
mushroom_rl.solvers.dynamic_programming.policy_iteration(prob, reward, gamma)[source]

Policy iteration algorithm to solve a dynamic programming problem.

Parameters:

- prob (np.ndarray) – transition probability matrix
- reward (np.ndarray) – reward matrix
- gamma (float) – discount factor

Returns: The optimal value of each state and the optimal policy.
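The backup these two solvers perform can be sketched in plain NumPy. The snippet below is an illustrative re-implementation of value iteration, assuming `prob` and `reward` are indexed as `[state, action, next_state]`; it is a sketch under those assumptions, not the library code, whose exact array layout may differ.

```python
import numpy as np

def value_iteration_sketch(prob, reward, gamma, eps):
    """Illustrative value iteration; prob[s, a, s'] and reward[s, a, s']
    are assumed shapes, not necessarily mushroom_rl's internal layout."""
    n_states = prob.shape[0]
    value = np.zeros(n_states)
    while True:
        # Expected immediate reward plus discounted next-state value,
        # computed for every (state, action) pair at once.
        q = (prob * reward).sum(axis=2) + gamma * (prob @ value)
        new_value = q.max(axis=1)
        # Stop once the Bellman residual drops below the accuracy threshold.
        if np.max(np.abs(new_value - value)) < eps:
            return new_value
        value = new_value
```

Policy iteration differs only in alternating a full policy-evaluation step with a greedy policy-improvement step instead of taking the max inside the loop.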

## Car-On-Hill brute-force solver

mushroom_rl.solvers.car_on_hill.step(mdp, state, action)[source]

Perform a step in the tree.

Parameters:

- mdp (CarOnHill) – the Car-On-Hill environment
- state (np.array) – the state
- action (np.array) – the action

Returns: The transition resulting from executing action in state.
mushroom_rl.solvers.car_on_hill.bfs(mdp, frontier, k, max_k)[source]

Perform a step of breadth-first search in the tree.

Parameters:

- mdp (CarOnHill) – the Car-On-Hill environment
- frontier (list) – the states at the frontier of the BFS
- k (int) – the current depth of the tree
- max_k (int) – maximum depth to consider

Returns: A tuple containing a flag indicating whether the algorithm has ended, and the updated depth of the tree.
mushroom_rl.solvers.car_on_hill.solve_car_on_hill(mdp, states, actions, gamma, max_k=50)[source]

Solver of the Car-On-Hill environment.

Parameters:

- mdp (CarOnHill) – the Car-On-Hill environment
- states (np.ndarray) – the states
- actions (np.ndarray) – the actions
- gamma (float) – the discount factor
- max_k (int, 50) – maximum depth to consider

Returns: The Q-value for each state-action tuple.
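The brute-force idea behind this module can be illustrated on a toy deterministic environment: expand every action sequence up to a depth limit and discount the first goal reward found back to the root. The helper below is a hypothetical sketch of that search, not the CarOnHill API: `step_fn`, `is_goal`, and the reward-of-1-at-the-goal convention are all assumptions made for illustration.

```python
import numpy as np

def brute_force_q(step_fn, is_goal, state, action, actions, gamma, max_k=50):
    """Depth-limited brute-force estimate of Q(state, action) in a toy
    deterministic environment where the only reward is 1 at the goal.
    All names here are illustrative stand-ins, not mushroom_rl's API."""
    # Apply the queried action first, then search over all continuations.
    frontier = [step_fn(state, action)]
    for k in range(1, max_k + 1):
        if any(is_goal(s) for s in frontier):
            # The goal reward is collected at depth k and discounted
            # back to the root state-action pair.
            return gamma ** (k - 1)
        # Expand every frontier state with every action (exponential in
        # depth, so this is only viable for small max_k or early goals).
        frontier = [step_fn(s, a) for s in frontier for a in actions]
    return 0.0
```

The real solver avoids storing rewards along the way for the same reason: in a goal-reward problem, the Q-value is fully determined by the discount raised to the depth of the shortest path to the goal.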

## LQR solver

mushroom_rl.solvers.lqr.compute_lqr_feedback_gain(lqr, max_iterations=100)[source]

Computes the optimal gain matrix K.

Parameters:

- lqr (LQR) – LQR environment
- max_iterations (int) – max iterations for convergence

Returns: Feedback gain matrix K.
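A standard way to obtain K is to iterate the discrete-time Riccati equation. The sketch below assumes an undiscounted LQR with dynamics s' = As + Ba, cost s'Qs + a'Ra, and control law a = -Ks; the matrix names are illustrative stand-ins for what the LQR environment stores, and the library's own routine may handle discounting differently.

```python
import numpy as np

def lqr_feedback_gain_sketch(A, B, Q, R, max_iterations=100):
    """Riccati iteration for the discrete-time, undiscounted LQR.
    Returns the gain K of the control law a = -K s and the cost
    matrix P. Matrix names are illustrative assumptions."""
    P = Q.copy()
    for _ in range(max_iterations):
        BtP = B.T @ P
        # K = (R + B'PB)^{-1} B'PA
        K = np.linalg.solve(R + BtP @ B, BtP @ A)
        # Riccati backup: P = Q + A'PA - A'PB K
        P = Q + A.T @ P @ A - A.T @ P @ B @ K
    return K, P
```

For production use, `scipy.linalg.solve_discrete_are` solves the same fixed point directly rather than by iteration.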
mushroom_rl.solvers.lqr.compute_lqr_P(lqr, K)[source]

Computes the P matrix for a given gain matrix K.

Parameters:

- lqr (LQR) – LQR environment
- K (np.ndarray) – controller matrix

Returns: The P matrix of the value function.
mushroom_rl.solvers.lqr.compute_lqr_V(s, lqr, K)[source]

Computes the value function at a state s, with the given gain matrix K.

Parameters:

- s (np.ndarray) – state
- lqr (LQR) – LQR environment
- K (np.ndarray) – controller matrix

Returns: The value function at s.
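For a fixed linear policy a = -Ks, the value function is the negative quadratic -s'Ps, where P satisfies a Lyapunov equation for the closed-loop system. The sketch below evaluates it by fixed-point iteration, again with illustrative matrix names and an undiscounted cost s'Qs + a'Ra (so the reward, and hence the value, is the negative cost):

```python
import numpy as np

def lqr_value_sketch(s, A, B, Q, R, K):
    """Value of state s under the linear policy a = -K s; a hedged
    sketch with illustrative matrix names, not the library routine."""
    # Closed-loop dynamics and per-step cost under a = -K s.
    Acl = A - B @ K
    C = Q + K.T @ R @ K
    # Fixed-point iteration for the Lyapunov equation P = C + Acl' P Acl.
    P = C.copy()
    for _ in range(500):
        P = C + Acl.T @ P @ Acl
    # Reward is the negative cost, so V(s) = -s' P s.
    return float(-s @ P @ s)
```

This only converges when the closed-loop system A - BK is stable; for an unstable K the value is unbounded below.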
mushroom_rl.solvers.lqr.compute_lqr_V_gaussian_policy(s, lqr, K, Sigma)[source]

Computes the value function at a state s, with the given gain matrix K and covariance Sigma.

Parameters:

- s (np.ndarray) – state
- lqr (LQR) – LQR environment
- K (np.ndarray) – controller matrix
- Sigma (np.ndarray) – covariance matrix

Returns: The value function at s.
mushroom_rl.solvers.lqr.compute_lqr_Q(s, a, lqr, K)[source]

Computes the state-action value function Q at a state-action pair (s, a), with the given gain matrix K.

Parameters:

- s (np.ndarray) – state
- a (np.ndarray) – action
- lqr (LQR) – LQR environment
- K (np.ndarray) – controller matrix

Returns: The Q function at (s, a).
mushroom_rl.solvers.lqr.compute_lqr_Q_gaussian_policy(s, a, lqr, K, Sigma)[source]

Computes the state-action value function Q at a state-action pair (s, a), with the given gain matrix K and covariance Sigma.

Parameters:

- s (np.ndarray) – state
- a (np.ndarray) – action
- lqr (LQR) – LQR environment
- K (np.ndarray) – controller matrix
- Sigma (np.ndarray) – covariance matrix

Returns: The Q function at (s, a).
mushroom_rl.solvers.lqr.compute_lqr_V_gaussian_policy_gradient_K(s, lqr, K, Sigma)[source]

Computes the gradient of the objective function J (equal to the value function V) at state s, w.r.t. the controller matrix K, under the current policy parameters K and Sigma, i.e. J(s, K, Sigma) = V(s, K, Sigma).

Parameters:

- s (np.ndarray) – state
- lqr (LQR) – LQR environment
- K (np.ndarray) – controller matrix
- Sigma (np.ndarray) – covariance matrix

Returns: The gradient of J w.r.t. K.
mushroom_rl.solvers.lqr.compute_lqr_Q_gaussian_policy_gradient_K(s, a, lqr, K, Sigma)[source]

Computes the gradient of the state-action value function Q at the state-action pair (s, a), w.r.t. the controller matrix K, under the current policy parameters K and Sigma.

Parameters:

- s (np.ndarray) – state
- a (np.ndarray) – action
- lqr (LQR) – LQR environment
- K (np.ndarray) – controller matrix
- Sigma (np.ndarray) – covariance matrix

Returns: The gradient of Q w.r.t. K.
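Analytic gradients like the two above are easy to sanity-check against central finite differences of any scalar function of K. The generic checker below is independent of the library; `value_fn` is a placeholder for whatever scalar objective is being differentiated.

```python
import numpy as np

def finite_diff_grad_K(value_fn, K, h=1e-6):
    """Central finite-difference gradient of the scalar value_fn(K)
    w.r.t. each entry of the controller matrix K; a generic way to
    sanity-check analytic policy gradients."""
    G = np.zeros_like(K)
    for i in range(K.shape[0]):
        for j in range(K.shape[1]):
            # Perturb one entry of K in each direction.
            E = np.zeros_like(K)
            E[i, j] = h
            G[i, j] = (value_fn(K + E) - value_fn(K - E)) / (2.0 * h)
    return G
```

Agreement between the analytic gradient and this estimate (to a few decimal places, given the step size h) is a strong indication the closed-form expression is implemented correctly.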