Solvers¶
Dynamic programming¶
mushroom_rl.solvers.dynamic_programming.value_iteration(prob, reward, gamma, eps)[source]¶
Value iteration algorithm to solve a dynamic programming problem.
Parameters: - prob (np.ndarray) – transition probability matrix;
- reward (np.ndarray) – reward matrix;
- gamma (float) – discount factor;
- eps (float) – accuracy threshold.
Returns: The optimal value of each state.
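A minimal self-contained sketch of the computation this function performs, assuming prob and reward are laid out as (n_states, n_actions, n_states) arrays (the exact layout expected by MushroomRL may differ):

```python
import numpy as np

def value_iteration(prob, reward, gamma, eps):
    """Sketch of value iteration: iterate the Bellman optimality backup
    until the value function changes by less than eps."""
    n_states = prob.shape[0]
    value = np.zeros(n_states)
    while True:
        # Q(s, a) = sum_s' P(s'|s, a) * (r(s, a, s') + gamma * V(s'))
        q = np.einsum('san,san->sa', prob, reward + gamma * value[None, None, :])
        new_value = q.max(axis=1)
        if np.max(np.abs(new_value - value)) < eps:
            return new_value
        value = new_value

# Toy 2-state, 2-action deterministic MDP: action 1 in state 0 moves to
# state 1 with reward 1; every other transition stays put with reward 0.
prob = np.zeros((2, 2, 2))
prob[0, 0, 0] = prob[0, 1, 1] = prob[1, 0, 1] = prob[1, 1, 1] = 1.
reward = np.zeros((2, 2, 2))
reward[0, 1, 1] = 1.
v = value_iteration(prob, reward, gamma=0.9, eps=1e-6)
```

State 1 is absorbing with zero reward, so its value is 0; from state 0 the best action collects the unit reward immediately, giving value 1.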
mushroom_rl.solvers.dynamic_programming.policy_iteration(prob, reward, gamma)[source]¶
Policy iteration algorithm to solve a dynamic programming problem.
Parameters: - prob (np.ndarray) – transition probability matrix;
- reward (np.ndarray) – reward matrix;
- gamma (float) – discount factor.
Returns: The optimal value of each state and the optimal policy.
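A minimal self-contained sketch of the same computation, under the same assumed (n_states, n_actions, n_states) layout for prob and reward; it alternates exact policy evaluation (a linear solve) with greedy improvement:

```python
import numpy as np

def policy_iteration(prob, reward, gamma):
    """Sketch of policy iteration for a finite MDP."""
    n_states = prob.shape[0]
    policy = np.zeros(n_states, dtype=int)
    while True:
        # Policy evaluation: solve (I - gamma * P_pi) V = r_pi exactly.
        p_pi = prob[np.arange(n_states), policy]           # (n_states, n_states)
        r_pi = np.einsum('sn,sn->s', p_pi, reward[np.arange(n_states), policy])
        value = np.linalg.solve(np.eye(n_states) - gamma * p_pi, r_pi)
        # Policy improvement: act greedily w.r.t. the resulting Q-values.
        q = np.einsum('san,san->sa', prob, reward + gamma * value[None, None, :])
        new_policy = q.argmax(axis=1)
        if np.array_equal(new_policy, policy):
            return value, policy
        policy = new_policy

# Same toy MDP as above: action 1 in state 0 earns reward 1 and moves to
# the absorbing state 1; all other transitions stay put with reward 0.
prob = np.zeros((2, 2, 2))
prob[0, 0, 0] = prob[0, 1, 1] = prob[1, 0, 1] = prob[1, 1, 1] = 1.
reward = np.zeros((2, 2, 2))
reward[0, 1, 1] = 1.
value, pi = policy_iteration(prob, reward, gamma=0.9)
```

The exact evaluation step is what distinguishes this from value iteration: each iteration is more expensive, but the number of iterations is bounded by the number of deterministic policies.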
Car-On-Hill brute-force solver¶
mushroom_rl.solvers.car_on_hill.step(mdp, state, action)[source]¶
Perform a step in the tree.
Parameters: - mdp (CarOnHill) – the Car-On-Hill environment;
- state (np.array) – the state;
- action (np.array) – the action.
Returns: The transition resulting from executing action in state.
mushroom_rl.solvers.car_on_hill.bfs(mdp, frontier, k, max_k)[source]¶
Perform Breadth-First tree search.
Parameters: - mdp (CarOnHill) – the Car-On-Hill environment;
- frontier (list) – the states at the frontier of the BFS;
- k (int) – the current depth of the tree;
- max_k (int) – maximum depth to consider.
Returns: A tuple containing a flag for the algorithm ending, and the updated depth of the tree.
mushroom_rl.solvers.car_on_hill.solve_car_on_hill(mdp, states, actions, gamma, max_k=50)[source]¶
Solver of the Car-On-Hill environment.
Parameters: - mdp (CarOnHill) – the Car-On-Hill environment;
- states (np.ndarray) – the states;
- actions (np.ndarray) – the actions;
- gamma (float) – the discount factor;
- max_k (int, 50) – maximum depth to consider.
Returns: The Q-value for each state-action tuple.
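The brute-force idea can be sketched on a generic deterministic environment. The `chain_step` callable below is a hypothetical stand-in for the Car-On-Hill dynamics, returning (next_state, reward, absorbing); as in Car-On-Hill, the only positive reward is assumed to occur at an absorbing success state, so Q(s, a) is the discounted reward of the shallowest success in the transition tree:

```python
import numpy as np

def solve_brute_force(step, states, actions, gamma, max_k=50):
    """Sketch of a brute-force tree solver for a deterministic MDP whose
    only positive reward sits at an absorbing success state.

    `step(state, action)` must return (next_state, reward, absorbing).
    """
    q = np.zeros((len(states), len(actions)))
    for i, s in enumerate(states):
        for j, a in enumerate(actions):
            frontier = [step(s, a)]  # depth-1 transitions of the tree
            q_val = 0.
            for k in range(1, max_k + 1):
                next_frontier = []
                for ns, r, absorbing in frontier:
                    if absorbing and r > 0:
                        # Shallowest success found at depth k: discount it.
                        q_val = gamma ** (k - 1) * r
                        break
                    if not absorbing:
                        next_frontier += [step(ns, a2) for a2 in actions]
                if q_val > 0 or not next_frontier:
                    break
                frontier = next_frontier
            q[i, j] = q_val
    return q

# Toy chain: move left/right on {0, 1, 2}; reaching 3 is a success (+1),
# stepping below 0 terminates with no reward.
def chain_step(s, a):
    ns = s + a
    if ns >= 3:
        return 3, 1., True
    if ns < 0:
        return 0, 0., True
    return ns, 0., False

q = solve_brute_force(chain_step, [0, 1, 2], [-1, 1], gamma=0.9)
```

Because the dynamics are deterministic, breadth-first expansion visits transitions in order of depth, so the first success found is the optimal one and deeper branches never need to be examined.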