Solvers

Dynamic programming

mushroom.solvers.dynamic_programming.value_iteration(prob, reward, gamma, eps)[source]

Value iteration algorithm to solve a dynamic programming problem.

Parameters:
  • prob (np.ndarray) – transition probability matrix;
  • reward (np.ndarray) – reward matrix;
  • gamma (float) – discount factor;
  • eps (float) – accuracy threshold.
Returns:

The optimal value of each state.
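The behavior can be sketched with a self-contained NumPy analogue (this is not mushroom's actual implementation; the `(n_states, n_actions, n_states)` shapes for `prob` and `reward` and the toy MDP are assumptions for illustration):

```python
import numpy as np

def value_iteration(prob, reward, gamma, eps):
    """Sketch of value iteration.

    prob:   assumed shape (n_states, n_actions, n_states), transition probabilities;
    reward: assumed shape (n_states, n_actions, n_states), rewards.
    """
    value = np.zeros(prob.shape[0])
    while True:
        # Bellman optimality backup:
        # Q(s, a) = sum_s' P[s, a, s'] * (R[s, a, s'] + gamma * V(s'))
        q = np.sum(prob * (reward + gamma * value), axis=2)
        new_value = q.max(axis=1)
        if np.max(np.abs(new_value - value)) < eps:
            return new_value
        value = new_value

# Hypothetical 2-state, 2-action MDP: in state 0, action 1 moves to the
# absorbing state 1 with reward 1; every other transition has reward 0.
prob = np.array([[[1., 0.], [0., 1.]],
                 [[0., 1.], [0., 1.]]])
reward = np.array([[[0., 0.], [0., 1.]],
                   [[0., 0.], [0., 0.]]])
v = value_iteration(prob, reward, gamma=0.9, eps=1e-6)
# v is approximately [1., 0.]: state 0 collects the unit reward once.
```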

mushroom.solvers.dynamic_programming.policy_iteration(prob, reward, gamma)[source]

Policy iteration algorithm to solve a dynamic programming problem.

Parameters:
  • prob (np.ndarray) – transition probability matrix;
  • reward (np.ndarray) – reward matrix;
  • gamma (float) – discount factor.
Returns:

The optimal value of each state and the optimal policy.
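A self-contained sketch of the same idea (again not mushroom's code; shapes and the exact linear-system evaluation step are assumptions), alternating exact policy evaluation with greedy improvement:

```python
import numpy as np

def policy_iteration(prob, reward, gamma):
    """Sketch of policy iteration.

    prob, reward: assumed shape (n_states, n_actions, n_states).
    Returns the optimal value function and a deterministic policy.
    """
    n_states = prob.shape[0]
    policy = np.zeros(n_states, dtype=int)
    while True:
        # Policy evaluation: solve (I - gamma * P_pi) V = R_pi exactly.
        p_pi = prob[np.arange(n_states), policy]
        r_pi = np.sum(p_pi * reward[np.arange(n_states), policy], axis=1)
        value = np.linalg.solve(np.eye(n_states) - gamma * p_pi, r_pi)
        # Policy improvement: act greedily w.r.t. a one-step lookahead.
        q = np.sum(prob * (reward + gamma * value), axis=2)
        new_policy = q.argmax(axis=1)
        if np.array_equal(new_policy, policy):
            return value, policy
        policy = new_policy

# Same hypothetical MDP as above: action 1 in state 0 reaches the
# absorbing state 1 with reward 1.
prob = np.array([[[1., 0.], [0., 1.]],
                 [[0., 1.], [0., 1.]]])
reward = np.array([[[0., 0.], [0., 1.]],
                   [[0., 0.], [0., 0.]]])
v, pi = policy_iteration(prob, reward, gamma=0.9)
# pi picks action 1 in state 0; v is approximately [1., 0.]
```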

Car-On-Hill brute-force solver

mushroom.solvers.car_on_hill.step(mdp, state, action)[source]

Perform a step in the tree.

Parameters:
  • mdp (CarOnHill) – the Car-On-Hill environment;
  • state (np.array) – the state;
  • action (np.array) – the action.
Returns:

The transition resulting from executing action in state.

mushroom.solvers.car_on_hill.bfs(mdp, frontier, k, max_k)[source]

Perform Breadth-First tree search.

Parameters:
  • mdp (CarOnHill) – the Car-On-Hill environment;
  • frontier (list) – the states at the frontier of the BFS;
  • k (int) – the current depth of the tree;
  • max_k (int) – maximum depth to consider.
Returns:

A tuple containing a flag indicating whether the search has terminated, and the updated depth of the tree.
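The level-by-level expansion can be illustrated with a generic sketch (the `successors` and `is_goal` callables are hypothetical stand-ins for the Car-On-Hill dynamics, not part of mushroom's API):

```python
def bfs(successors, is_goal, frontier, k, max_k):
    """Generic breadth-first tree search sketch.

    successors(state) -> list of child states;
    is_goal(state)    -> True when a target state is reached.
    Returns (goal_found, depth reached).
    """
    while frontier and k < max_k:
        next_frontier = []
        for state in frontier:
            for child in successors(state):
                if is_goal(child):
                    return True, k + 1
                next_frontier.append(child)
        # Replace the frontier with the next level of the tree.
        frontier = next_frontier
        k += 1
    return False, k

# Toy binary tree over integers: node n has children 2n and 2n + 1.
found, depth = bfs(lambda n: [2 * n, 2 * n + 1],
                   lambda n: n == 5,
                   frontier=[1], k=0, max_k=10)
# Node 5 is a grandchild of the root, so it is found at depth 2.
```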

mushroom.solvers.car_on_hill.solve_car_on_hill(mdp, states, actions, gamma, max_k=50)[source]

Brute-force solver for the Car-On-Hill environment.

Parameters:
  • mdp (CarOnHill) – the Car-On-Hill environment;
  • states (np.ndarray) – the states;
  • actions (np.ndarray) – the actions;
  • gamma (float) – the discount factor;
  • max_k (int, 50) – maximum depth to consider.
Returns:

The Q-value for each state-action tuple.
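The principle behind the brute-force solver can be sketched on a deterministic toy chain (this is an illustrative analogue, not the Car-On-Hill dynamics; `dynamics`, `brute_force_q`, and the chain MDP are all hypothetical names): each Q-value is obtained by exhaustively unrolling the dynamics up to a maximum depth and taking the best discounted return.

```python
from functools import lru_cache

def brute_force_q(dynamics, states, actions, gamma, max_k=50):
    """Q(s, a) by exhaustive depth-limited search over deterministic dynamics.

    dynamics(state, action) -> (next_state, reward, absorbing).
    """
    @lru_cache(maxsize=None)
    def best_return(state, depth):
        if depth == max_k:
            return 0.0
        returns = []
        for a in actions:
            s_next, r, absorbing = dynamics(state, a)
            returns.append(r if absorbing
                           else r + gamma * best_return(s_next, depth + 1))
        return max(returns)

    q = {}
    for s in states:
        for a in actions:
            s_next, r, absorbing = dynamics(s, a)
            q[(s, a)] = r if absorbing else r + gamma * best_return(s_next, 1)
    return q

# Toy chain 0 -> 1 -> 2: entering state 2 gives reward 1 and terminates.
def dynamics(s, a):
    s_next = min(max(s + a, 0), 2)
    return s_next, float(s_next == 2), s_next == 2

q = brute_force_q(dynamics, states=(0, 1), actions=(-1, 1), gamma=0.9)
# Moving right from state 1 terminates immediately: q[(1, 1)] == 1.0;
# from state 0 the reward is one step away: q[(0, 1)] == 0.9.
```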