Markov Decision Process (MDP) Concept
An MDP extends the Markov Reward Process (MRP) by adding actions (A) and a policy (π). Whereas the goal of an MRP is to calculate the overall value of an episode or environment, an MDP aims to determine a policy that maximizes the value of the env…
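
To make the contrast concrete, here is a minimal Python sketch of finding a value-maximizing policy on a toy MDP. Everything in it is an assumption made for illustration and does not come from the text above: the two states, the "stay" and "move" actions, the rewards, the discount factor GAMMA, and the choice of value iteration as the solution method.

```python
# Hypothetical two-state MDP, invented purely for illustration.
# P[s][a] is a list of (probability, next_state, reward) triples.
P = {
    0: {"stay": [(1.0, 0, 0.0)], "move": [(1.0, 1, 1.0)]},
    1: {"stay": [(1.0, 1, 2.0)], "move": [(1.0, 0, 0.0)]},
}
GAMMA = 0.9  # discount factor (assumed value)


def q_value(V, s, a):
    """Expected return of taking action a in state s, then following V."""
    return sum(p * (r + GAMMA * V[s2]) for p, s2, r in P[s][a])


def value_iteration(tol=1e-8):
    """Apply the Bellman optimality backup until the values stop changing."""
    V = {s: 0.0 for s in P}
    while True:
        new_V = {s: max(q_value(V, s, a) for a in P[s]) for s in P}
        if max(abs(new_V[s] - V[s]) for s in P) < tol:
            return new_V
        V = new_V


def greedy_policy(V):
    """For each state, pick the action with the highest expected return."""
    return {s: max(P[s], key=lambda a: q_value(V, s, a)) for s in P}


V = value_iteration()
policy = greedy_policy(V)
print(policy)
```

If the policy were fixed in advance, each state would have only one action and the problem would reduce to an MRP, where only the values are computed. The max over actions in value_iteration is the part that turns value calculation into policy search.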