Policy Evaluation and Policy Control

Reinforcement learning uses a number of terms of its own. They may seem obvious once you understand them, but if you don't know them they can get in the way of understanding other concepts. Let's look at a few key terms.

Policy Evaluation and Policy Control

In an MDP, evaluating a policy means finding the state-value function. The state-value function matters because it gives the expected total reward obtained when following a policy π, so computing it tells us how effective that policy is. The larger the state-value function, the better the policy.
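The post does not spell out the formula, but for reference the state-value function is usually written as the expected discounted return obtained by starting in state s and then following π (the discount factor γ is an assumption of this sketch, not something defined above):

v_\pi(s) = \mathbb{E}_\pi\!\left[\sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1} \;\middle|\; S_t = s\right]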

Policy control means changing the policy. If evaluating the policy shows that the current policy yields less reward than we would like, the policy needs to be adjusted. Since the ultimate goal of an MDP is to find a policy that maximizes value, the optimal policy can be discovered through repeated policy control, as sketched below.
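As a rough illustration, one common form of policy control is to make the policy greedy with respect to the current state-value estimate. The sketch below assumes a small, fully known MDP stored in hypothetical dictionaries P (transition probabilities) and R (expected rewards); none of these names come from the post itself.

def greedy_policy_improvement(P, R, V, gamma=0.9):
    # P[s][a]: list of (probability, next_state) pairs; R[s][a]: expected reward.
    # V: current state-value estimate. All of these structures are assumptions
    # made for this sketch, not definitions given in the post.
    new_policy = {}
    for s in P:
        # Action value = expected immediate reward + discounted value of the successor state.
        q = {a: R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a]) for a in P[s]}
        new_policy[s] = max(q, key=q.get)  # pick the action with the largest value
    return new_policy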

Policy evaluation and policy control are complementary. The policy is evaluated to check how good it is, policy control updates it to a new policy, and that new policy can then be evaluated again.
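Putting the two together gives the familiar evaluate-then-improve loop. The sketch below is only a minimal illustration under the same assumed P and R structures as above, reusing the greedy_policy_improvement sketch and a simple sweep-based evaluation step; it is essentially the policy iteration idea that the dynamic programming discussion develops properly.

def policy_iteration(P, R, gamma=0.9, theta=1e-6):
    # Start from an arbitrary deterministic policy and a zero value estimate.
    policy = {s: next(iter(P[s])) for s in P}
    V = {s: 0.0 for s in P}
    while True:
        # Policy evaluation: sweep until the value function barely changes.
        while True:
            delta = 0.0
            for s in P:
                a = policy[s]
                v_new = R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])
                delta = max(delta, abs(v_new - V[s]))
                V[s] = v_new
            if delta < theta:
                break
        # Policy control: make the policy greedy with respect to V.
        new_policy = greedy_policy_improvement(P, R, V, gamma)
        if new_policy == policy:  # no further change means the policy has converged
            return policy, V
        policy = new_policy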

How policy evaluation and policy control are applied concretely will be explored later, when we cover dynamic programming.

