Solvers

Dynamic programming

mushroom.solvers.dynamic_programming.value_iteration(prob, reward, gamma, eps)

Value iteration algorithm to solve a dynamic programming problem.

Parameters:
  • prob (np.ndarray) – transition probability matrix;
  • reward (np.ndarray) – reward matrix;
  • gamma (float) – discount factor;
  • eps (float) – accuracy threshold used as the stopping criterion.
Returns:

The optimal value of each state.
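
To make the role of each parameter concrete, the following is a minimal NumPy sketch of value iteration. It is not the library's implementation. The function name value_iteration_sketch, the assumed array layout (prob[s, a, s'] and reward[s, a, s'], both of shape (n_states, n_actions, n_states)), and the exact stopping rule are assumptions made for illustration only.

```python
import numpy as np


def value_iteration_sketch(prob, reward, gamma, eps):
    """Illustrative value iteration (hypothetical helper, not the library code).

    Assumes prob[s, a, s2] and reward[s, a, s2] share the shape
    (n_states, n_actions, n_states).
    """
    n_states = prob.shape[0]
    value = np.zeros(n_states)
    while True:
        # Bellman optimality backup:
        # Q(s, a) = sum_s2 P(s2 | s, a) * (R(s, a, s2) + gamma * V(s2))
        q = np.sum(prob * (reward + gamma * value), axis=2)
        new_value = q.max(axis=1)
        # Assumed stopping rule: change between sweeps is at most eps.
        if np.linalg.norm(new_value - value) <= eps:
            return new_value
        value = new_value


# Toy MDP with two states and two actions.
prob = np.zeros((2, 2, 2))
prob[0, 0, 0] = 1.0  # state 0, action 0: stay in state 0
prob[0, 1, 1] = 1.0  # state 0, action 1: move to state 1
prob[1, :, 1] = 1.0  # state 1 is absorbing under both actions

reward = np.zeros((2, 2, 2))
reward[0, 1, 1] = 1.0  # reward 1 for the transition 0 -> 1

optimal_value = value_iteration_sketch(prob, reward, gamma=0.9, eps=1e-8)
```

On this toy problem the sketch assigns value 1 to state 0 (take the rewarded transition once) and value 0 to the absorbing state 1.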

mushroom.solvers.dynamic_programming.policy_iteration(prob, reward, gamma)

Policy iteration algorithm to solve a dynamic programming problem.

Parameters:
  • prob (np.ndarray) – transition probability matrix;
  • reward (np.ndarray) – reward matrix;
  • gamma (float) – discount factor.
Returns:

The optimal value of each state and the optimal policy.
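
As a rough illustration of the two alternating steps (policy evaluation and policy improvement), here is a minimal NumPy sketch of policy iteration. It is not the library's implementation. The name policy_iteration_sketch, the assumed array layout (prob[s, a, s'] and reward[s, a, s']), the exact evaluation by solving a linear system, and the (value, policy) return order are all assumptions for illustration. The linear solve also assumes gamma < 1.

```python
import numpy as np


def policy_iteration_sketch(prob, reward, gamma):
    """Illustrative policy iteration (hypothetical helper, not the library code).

    Assumes prob[s, a, s2] and reward[s, a, s2] share the shape
    (n_states, n_actions, n_states), and that gamma < 1.
    """
    n_states = prob.shape[0]
    states = np.arange(n_states)
    policy = np.zeros(n_states, dtype=int)
    # Expected immediate reward for each (state, action) pair.
    r_sa = np.sum(prob * reward, axis=2)
    while True:
        # Policy evaluation: solve (I - gamma * P_pi) V = R_pi exactly.
        p_pi = prob[states, policy]
        r_pi = r_sa[states, policy]
        value = np.linalg.solve(np.eye(n_states) - gamma * p_pi, r_pi)
        # Policy improvement: act greedily with respect to V.
        q = r_sa + gamma * (prob @ value)
        new_policy = q.argmax(axis=1)
        if np.array_equal(new_policy, policy):
            return value, policy
        policy = new_policy


# Toy MDP with two states and two actions.
prob = np.zeros((2, 2, 2))
prob[0, 0, 0] = 1.0  # state 0, action 0: stay in state 0
prob[0, 1, 1] = 1.0  # state 0, action 1: move to state 1
prob[1, :, 1] = 1.0  # state 1 is absorbing under both actions

reward = np.zeros((2, 2, 2))
reward[0, 1, 1] = 1.0  # reward 1 for the transition 0 -> 1

value, policy = policy_iteration_sketch(prob, reward, gamma=0.9)
```

On this toy problem the sketch selects action 1 in state 0 and returns the same state values as a value iteration run would, namely 1 for state 0 and 0 for state 1.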