Solvers

Dynamic programming

mushroom_rl.solvers.dynamic_programming.value_iteration(prob, reward, gamma, eps)

Value iteration algorithm to solve a dynamic programming problem.

Parameters:	prob (np.ndarray) – transition probability matrix; reward (np.ndarray) – reward matrix; gamma (float) – discount factor; eps (float) – accuracy threshold.
Returns:	The optimal value of each state.
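As a rough illustration of what value iteration computes, here is a minimal self-contained NumPy sketch. It assumes that `prob[s, a, s2]` is the probability of reaching `s2` from `s` under action `a` and that `reward` has the same `(n_states, n_actions, n_states)` layout. That layout, the helper name `value_iteration_sketch`, the max-norm stopping rule, and the toy MDP are all assumptions for illustration, not the library's code.

```python
import numpy as np


def value_iteration_sketch(prob, reward, gamma, eps):
    """Illustrative value iteration on a finite MDP (hypothetical helper).

    Assumed layout: prob[s, a, s2] = P(s2 | s, a) and reward[s, a, s2] is the
    reward for that transition; both have shape (n_states, n_actions, n_states).
    """
    value = np.zeros(prob.shape[0])
    while True:
        # Bellman optimality backup:
        # Q[s, a] = sum_s2 P(s2 | s, a) * (r(s, a, s2) + gamma * V(s2))
        q = (prob * (reward + gamma * value)).sum(axis=2)
        new_value = q.max(axis=1)
        # Assumed stopping rule: the largest change in V is at most eps
        if np.max(np.abs(new_value - value)) <= eps:
            return new_value
        value = new_value


# Toy 2-state MDP (hypothetical): action 0 stays, action 1 switches state,
# and every transition into state 1 yields reward 1.
prob = np.zeros((2, 2, 2))
prob[0, 0, 0] = prob[1, 0, 1] = 1.0  # action 0: stay
prob[0, 1, 1] = prob[1, 1, 0] = 1.0  # action 1: switch
reward = np.zeros((2, 2, 2))
reward[:, :, 1] = 1.0

v = value_iteration_sketch(prob, reward, gamma=0.9, eps=1e-6)
print(v)  # both entries close to 1 / (1 - 0.9) = 10
```

In this toy problem, staying in state 1 forever earns 1 / (1 - γ) = 10, and from state 0 the best choice is to switch first, which also yields 1 + 0.9 · 10 = 10.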

mushroom_rl.solvers.dynamic_programming.policy_iteration(prob, reward, gamma)

Policy iteration algorithm to solve a dynamic programming problem.

Parameters:	prob (np.ndarray) – transition probability matrix; reward (np.ndarray) – reward matrix; gamma (float) – discount factor.
Returns:	The optimal value of each state and the optimal policy.
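Policy iteration alternates exact policy evaluation with greedy policy improvement until the policy stops changing. The sketch below illustrates this under the same kind of assumptions: `prob[s, a, s2]` and `reward[s, a, s2]` with shape `(n_states, n_actions, n_states)`, a deterministic policy stored as one action index per state, and evaluation done by solving a linear system. The helper name `policy_iteration_sketch` and the toy MDP are hypothetical, not the library's implementation.

```python
import numpy as np


def policy_iteration_sketch(prob, reward, gamma):
    """Illustrative policy iteration on a finite MDP (hypothetical helper).

    Assumed layout: prob[s, a, s2] = P(s2 | s, a) and reward[s, a, s2] is the
    reward for that transition; both have shape (n_states, n_actions, n_states).
    """
    n_states = prob.shape[0]
    states = np.arange(n_states)
    policy = np.zeros(n_states, dtype=int)  # start from "always take action 0"
    while True:
        # Policy evaluation: solve V = r_pi + gamma * P_pi V exactly
        p_pi = prob[states, policy]                          # (n_states, n_states)
        r_pi = (p_pi * reward[states, policy]).sum(axis=1)   # expected reward
        value = np.linalg.solve(np.eye(n_states) - gamma * p_pi, r_pi)
        # Policy improvement: act greedily on the one-step lookahead Q-values
        q = (prob * (reward + gamma * value)).sum(axis=2)
        new_policy = q.argmax(axis=1)
        if np.array_equal(new_policy, policy):
            return value, policy
        policy = new_policy


# Toy 2-state MDP (hypothetical): action 0 stays, action 1 switches state,
# and every transition into state 1 yields reward 1.
prob = np.zeros((2, 2, 2))
prob[0, 0, 0] = prob[1, 0, 1] = 1.0  # action 0: stay
prob[0, 1, 1] = prob[1, 1, 0] = 1.0  # action 1: switch
reward = np.zeros((2, 2, 2))
reward[:, :, 1] = 1.0

value, policy = policy_iteration_sketch(prob, reward, gamma=0.9)
print(value, policy)  # values close to [10, 10]; policy [1, 0]
```

Starting from "always stay", the first improvement step switches state 0 to action 1 and keeps state 1 on action 0; the second evaluation confirms that this policy is stable.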

Car-On-Hill solver

mushroom_rl.solvers.car_on_hill.step(mdp, state, action)

Perform one step in the search tree by simulating the execution of `action` in `state`.

Parameters:	mdp (CarOnHill) – the Car-On-Hill environment; state (np.array) – the state; action (np.array) – the action.
Returns:	The transition resulting from executing `action` in `state`.
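One way to execute an arbitrary `action` in an arbitrary `state` is to restart the simulator from that state and step it once. The sketch below assumes an environment whose `reset` accepts a state and whose `step` returns a `(next_state, reward, absorbing, info)` tuple; this interface is an assumption here. `ToyChainEnv` and `step_sketch` are hypothetical: integer positions stand in for Car-On-Hill's continuous state, and the reward scheme (+1 at the goal, -1 on failure, 0 otherwise) only loosely mimics Car-On-Hill.

```python
class ToyChainEnv:
    """Hypothetical deterministic environment on an integer chain."""

    def __init__(self, goal=3, cliff=-2):
        self.goal, self.cliff = goal, cliff
        self._state = 0

    def reset(self, state=None):
        # Restart from `state` if one is given, otherwise from position 0
        self._state = 0 if state is None else state
        return self._state

    def step(self, action):
        # Action 1 moves right, any other action moves left
        self._state += 1 if action == 1 else -1
        if self._state >= self.goal:
            return self._state, 1, True, {}    # goal reached
        if self._state <= self.cliff:
            return self._state, -1, True, {}   # failure
        return self._state, 0, False, {}      # episode continues


def step_sketch(mdp, state, action):
    # Put the simulator in `state`, then apply `action` once
    mdp.reset(state)
    return mdp.step(action)


env = ToyChainEnv()
print(step_sketch(env, 2, 1))  # (3, 1, True, {})
print(step_sketch(env, 2, 0))  # (1, 0, False, {})
```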

mushroom_rl.solvers.car_on_hill.bfs(mdp, frontier, k, max_k)

Perform a breadth-first tree search.

Parameters:	mdp (CarOnHill) – the Car-On-Hill environment; frontier (list) – the states at the frontier of the BFS; k (int) – the current depth of the tree; max_k (int) – maximum depth to consider.
Returns:	A tuple containing a flag indicating whether the algorithm has ended and the updated depth of the tree.
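A depth-limited breadth-first tree search of this kind can be sketched as follows. To keep the example self-contained, the `mdp` argument is replaced by a hypothetical deterministic `transition(state, action) -> (next_state, reward)` function, and rewards of +1, -1 and 0 are assumed to mean goal, failure, and continue. `bfs_sketch` and `chain_transition` are illustrative names, and whether the library's `bfs` uses the same flag and depth conventions is an assumption.

```python
def bfs_sketch(transition, frontier, k, max_k, actions=(0, 1)):
    """Level-by-level breadth-first tree search (hypothetical helper).

    Returns (found, depth): whether a goal transition was found, and the
    depth k at which the search stopped.
    """
    while frontier and k < max_k:
        new_frontier = []
        for state in frontier:
            for action in actions:
                next_state, reward = transition(state, action)
                if reward == 1:
                    return True, k  # goal reached at this depth
                if reward == 0:
                    # Non-terminal successor: expand it at the next depth
                    new_frontier.append(next_state)
                # reward == -1: failure, this branch is pruned
        frontier = new_frontier
        k += 1
    return False, k


def chain_transition(position, action):
    # Hypothetical model: action 1 moves right, action 0 moves left;
    # reaching 3 is the goal, reaching -2 is a failure.
    next_position = position + (1 if action == 1 else -1)
    if next_position >= 3:
        return next_position, 1
    if next_position <= -2:
        return next_position, -1
    return next_position, 0


print(bfs_sketch(chain_transition, [0], 1, 50))  # (True, 3)
print(bfs_sketch(chain_transition, [0], 1, 2))   # (False, 2)
```

Because this is tree search rather than graph search, duplicate states are not merged: in the example, the frontier expanded at k = 3 is [0, 0, 2]. The frontier can therefore grow exponentially with depth, and `max_k` bounds that cost.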

mushroom_rl.solvers.car_on_hill.solve_car_on_hill(mdp, states, actions, gamma, max_k=50)

Solver for the Car-On-Hill environment: computes the Q-value of each given state-action pair.

Parameters:	mdp (CarOnHill) – the Car-On-Hill environment; states (np.ndarray) – the states; actions (np.ndarray) – the actions; gamma (float) – the discount factor; max_k (int, 50) – maximum depth to consider.
Returns:	The Q-value for each `state`-`action` tuple.
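The discounted return shows why a search depth is enough to produce a Q-value. Assume deterministic dynamics and Car-On-Hill's reward scheme: 0 on every non-terminal step, +1 on reaching the goal, and -1 on failure. Under these assumptions only the terminal reward contributes. If executing a in s and then acting optimally ends the episode after k steps with terminal reward r_k ∈ {+1, -1}, then:

$$
Q(s, a) = \sum_{t=1}^{k} \gamma^{t-1} r_t = \gamma^{k-1} r_k = \pm\gamma^{k-1}
$$

For example, with γ = 0.95, a goal reached after k = 3 steps gives Q(s, a) = 0.95² = 0.9025. Since γ < 1, the shallowest success depth found by a breadth-first search yields the largest such value. How `solve_car_on_hill` maps the depth returned by `bfs` to k is an assumption here and is not documented above.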