Solvers
Dynamic programming
- value_iteration(prob, reward, gamma, eps)[source]
Value iteration algorithm to solve a dynamic programming problem.
- Parameters:
prob (np.ndarray) – transition probability matrix;
reward (np.ndarray) – reward matrix;
gamma (float) – discount factor;
eps (float) – accuracy threshold.
- Returns:
The optimal value of each state.
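As a reference for the update rule, here is a minimal NumPy sketch of value iteration. It assumes prob and reward are indexed as [state, action, next_state], which may differ from the library's internal layout:

```python
import numpy as np

def value_iteration_sketch(prob, reward, gamma, eps):
    # prob[s, a, s']: transition probability; reward[s, a, s']: reward.
    n_states = prob.shape[0]
    v = np.zeros(n_states)
    while True:
        # Bellman optimality backup: Q(s, a) = sum_s' P(s'|s,a) (r + gamma v(s'))
        q = np.einsum('ijk,ijk->ij', prob, reward + gamma * v[None, None, :])
        v_new = q.max(axis=1)
        if np.max(np.abs(v_new - v)) < eps:
            return v_new
        v = v_new
```

The loop stops once the sup-norm change of the value estimate falls below eps, which bounds the distance to the fixed point of the Bellman operator.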
- policy_iteration(prob, reward, gamma)[source]
Policy iteration algorithm to solve a dynamic programming problem.
- Parameters:
prob (np.ndarray) – transition probability matrix;
reward (np.ndarray) – reward matrix;
gamma (float) – discount factor.
- Returns:
The optimal value of each state and the optimal policy.
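A minimal sketch of policy iteration under the same assumed [state, action, next_state] layout, using an exact linear solve for the policy-evaluation step:

```python
import numpy as np

def policy_iteration_sketch(prob, reward, gamma):
    # prob[s, a, s'], reward[s, a, s']; returns (value, policy).
    n_states = prob.shape[0]
    pi = np.zeros(n_states, dtype=int)
    while True:
        # Policy evaluation: solve (I - gamma P_pi) v = r_pi exactly.
        p_pi = prob[np.arange(n_states), pi]
        r_pi = np.sum(p_pi * reward[np.arange(n_states), pi], axis=1)
        v = np.linalg.solve(np.eye(n_states) - gamma * p_pi, r_pi)
        # Policy improvement: act greedily w.r.t. the evaluated value.
        q = np.einsum('ijk,ijk->ij', prob, reward + gamma * v[None, None, :])
        pi_new = q.argmax(axis=1)
        if np.array_equal(pi_new, pi):
            return v, pi
        pi = pi_new
```

Iteration stops when the greedy policy no longer changes, which in finite MDPs happens after finitely many sweeps.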
Car-On-Hill brute-force solver
- step(mdp, state, action)[source]
Perform a step in the tree.
- Parameters:
mdp (CarOnHill) – the Car-On-Hill environment;
state (np.array) – the state;
action (np.array) – the action.
- Returns:
The resulting transition obtained by executing action in state.
- bfs(mdp, frontier, k, max_k)[source]
Perform Breadth-First tree search.
- Parameters:
mdp (CarOnHill) – the Car-On-Hill environment;
frontier (list) – the states at the frontier of the BFS;
k (int) – the current depth of the tree;
max_k (int) – maximum depth to consider.
- Returns:
A tuple containing a flag for the algorithm ending, and the updated depth of the tree.
- solve_car_on_hill(mdp, states, actions, gamma, max_k=50)[source]
Solve the Car-On-Hill environment by brute-force tree search.
- Parameters:
mdp (CarOnHill) – the Car-On-Hill environment;
states (np.ndarray) – the states;
actions (np.ndarray) – the actions;
gamma (float) – the discount factor;
max_k (int, 50) – maximum depth to consider.
- Returns:
The Q-value for each state-action pair.
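The functions above expand the actual CarOnHill dynamics; the generic sketch below illustrates the brute-force idea over a hypothetical step_fn(state, action) returning (next_state, reward, absorbing). It exhaustively searches all action sequences up to max_k steps, which is only feasible for small depths:

```python
def brute_force_q_sketch(step_fn, actions, state, action, gamma, max_k=10):
    # Depth-limited exhaustive search: the Q-value of (state, action) is the
    # first-step reward plus the best discounted return in the tree below.
    def best_return(s, k):
        if k == max_k:
            return 0.0
        values = []
        for a in actions:
            s2, r, absorbing = step_fn(s, a)
            values.append(r if absorbing else r + gamma * best_return(s2, k + 1))
        return max(values)

    s2, r, absorbing = step_fn(state, action)
    return r if absorbing else r + gamma * best_return(s2, 0)
```

With two actions the tree has 2^max_k leaves, which is why the solver caps the depth with max_k.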
LQR solver
- compute_lqr_feedback_gain(lqr, max_iterations=100)[source]
Computes the optimal gain matrix K.
- Parameters:
lqr (LQR) – LQR environment;
max_iterations (int) – max iterations for convergence.
- Returns:
Feedback gain matrix K.
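A sketch of the gain computation, assuming the standard discrete-time LQR setting x' = Ax + Bu with quadratic stage cost x'Qx + u'Ru (the LQR environment typically encodes this cost as a negative reward, so signs may differ from the library). It iterates the discounted Riccati recursion until the cost-to-go matrix P converges:

```python
import numpy as np

def riccati_gain_sketch(A, B, Q, R, gamma=1.0, max_iterations=100):
    # Iterate the discounted discrete-time Riccati recursion; at the fixed
    # point, u = -K x is the optimal controller.
    P = np.copy(Q)
    for _ in range(max_iterations):
        K = np.linalg.solve(R + gamma * B.T @ P @ B, gamma * B.T @ P @ A)
        P_new = Q + gamma * A.T @ P @ (A - B @ K)
        if np.allclose(P_new, P):
            break
        P = P_new
    return K
```

For a closed-form alternative, SciPy's solve_discrete_are solves the same algebraic Riccati equation directly in the undiscounted case.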
- compute_lqr_P(lqr, K)[source]
Computes the P matrix for a given gain matrix K.
- Parameters:
lqr (LQR) – LQR environment;
K (np.ndarray) – controller matrix.
- Returns:
The P matrix of the value function.
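For a fixed controller u = -Kx, P is the solution of a Lyapunov-type fixed-point equation; a sketch under the same assumed cost conventions as above:

```python
import numpy as np

def lqr_P_sketch(A, B, Q, R, K, gamma=1.0, n_iter=500):
    # For the fixed policy u = -K x, the cost-to-go is x'Px where P solves
    # P = Q + K'RK + gamma (A - BK)' P (A - BK); iterate to the fixed point.
    L = A - B @ K
    M = Q + K.T @ R @ K
    P = np.zeros_like(Q)
    for _ in range(n_iter):
        P = M + gamma * L.T @ P @ L
    return P
```

The iteration converges whenever the closed-loop matrix scaled by sqrt(gamma) is stable.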
- compute_lqr_V(s, lqr, K)[source]
Computes the value function at a state s, with the given gain matrix K.
- Parameters:
s (np.ndarray) – state;
lqr (LQR) – LQR environment;
K (np.ndarray) – controller matrix.
- Returns:
The value function at s.
- compute_lqr_V_gaussian_policy(s, lqr, K, Sigma)[source]
Computes the value function at a state s, with the given gain matrix K and covariance Sigma.
- Parameters:
s (np.ndarray) – state;
lqr (LQR) – LQR environment;
K (np.ndarray) – controller matrix;
Sigma (np.ndarray) – covariance matrix.
- Returns:
The value function at s.
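In cost-to-go form, a Gaussian policy u = -Ks + eps with eps ~ N(0, Sigma) leaves the quadratic part of the value unchanged and only adds a constant noise penalty. A sketch of that decomposition, where P is the fixed-policy matrix from the previous step (signs may differ from the library's reward convention):

```python
import numpy as np

def lqr_V_gaussian_sketch(s, B, R, Sigma, P, gamma):
    # Exploration noise contributes a state-independent offset:
    # V(s) = s'Ps + tr((R + gamma B'PB) Sigma) / (1 - gamma).
    b = np.trace((R + gamma * B.T @ P @ B) @ Sigma) / (1.0 - gamma)
    return s @ P @ s + b
```

The offset collects the per-step expected noise cost tr(R Sigma) plus its propagation through the dynamics, discounted over an infinite horizon.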
- compute_lqr_Q(s, a, lqr, K)[source]
Computes the state-action value function Q at a state-action pair (s, a), with the given gain matrix K.
- Parameters:
s (np.ndarray) – state;
a (np.ndarray) – action;
lqr (LQR) – LQR environment;
K (np.ndarray) – controller matrix.
- Returns:
The Q function at s, a.
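Since the LQR dynamics are deterministic, Q follows from a one-step Bellman expansion of the value function. A sketch in cost-to-go form, with P taken from the fixed-policy evaluation above:

```python
import numpy as np

def lqr_Q_sketch(s, a, A, B, Q, R, P, gamma=1.0):
    # Q(s, a) = stage cost + gamma * V(A s + B a), with V(x) = x'Px.
    cost = s @ Q @ s + a @ R @ a
    s_next = A @ s + B @ a
    return cost + gamma * s_next @ P @ s_next
```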
- compute_lqr_Q_gaussian_policy(s, a, lqr, K, Sigma)[source]
Computes the state-action value function Q at a state-action pair (s, a), with the given gain matrix K and covariance Sigma.
- Parameters:
s (np.ndarray) – state;
a (np.ndarray) – action;
lqr (LQR) – LQR environment;
K (np.ndarray) – controller matrix;
Sigma (np.ndarray) – covariance matrix.
- Returns:
The Q function at (s, a).
- compute_lqr_V_gaussian_policy_gradient_K(s, lqr, K, Sigma)[source]
Computes the gradient of the objective function J, equal to the value function V, at state s, w.r.t. the controller matrix K, with the current policy parameters K and Sigma, i.e. J(s, K, Sigma) = V(s, K, Sigma).
- Parameters:
s (np.ndarray) – state;
lqr (LQR) – LQR environment;
K (np.ndarray) – controller matrix;
Sigma (np.ndarray) – covariance matrix.
- Returns:
The gradient of J w.r.t. K.
- compute_lqr_Q_gaussian_policy_gradient_K(s, a, lqr, K, Sigma)[source]
Computes the gradient of the state-action value function Q at the state-action pair (s, a), w.r.t. the controller matrix K, with the current policy parameters K and Sigma.
- Parameters:
s (np.ndarray) – state;
a (np.ndarray) – action;
lqr (LQR) – LQR environment;
K (np.ndarray) – controller matrix;
Sigma (np.ndarray) – covariance matrix.
- Returns:
The gradient of Q w.r.t. K.