Reinforcement learning uses a number of recurring terms. They may seem straightforward once understood, but not knowing them can hinder comprehension of other concepts. Let's look at a few key terms.
Policy Evaluation and Policy Control
In an MDP, evaluating a policy is equivalent to finding its state-value function. The state-value function matters because it gives the expected total reward obtained when following a policy (π) from each state, written v_π(s) = E_π[G_t | S_t = s], where G_t is the discounted return. The state-value function therefore reflects the policy's effectiveness: the larger its values, the better the policy.
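As a concrete illustration, here is a minimal Python sketch of policy evaluation on a hypothetical two-state, two-action MDP. The transition table P, the reward values, and the discount factor GAMMA = 0.9 are made-up for illustration, not taken from the text. The sketch repeatedly applies the Bellman expectation backup until the value estimates stop changing.

```python
# Hypothetical two-state, two-action MDP; all numbers are illustrative.
# P[s][a] is a list of (probability, next_state, reward) transitions.
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 1.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 2.0)]},
}
GAMMA = 0.9  # assumed discount factor

def evaluate_policy(policy, P, gamma, theta=1e-8):
    """Policy evaluation: sweep the Bellman expectation backup until
    the state-value estimates stop changing by more than theta."""
    v = [0.0] * len(P)
    while True:
        delta = 0.0
        for s in P:
            # Expected one-step reward plus discounted value of successors,
            # averaged over the policy's action probabilities.
            v_new = sum(
                policy[s][a] * sum(p * (r + gamma * v[s2]) for p, s2, r in P[s][a])
                for a in P[s]
            )
            delta = max(delta, abs(v_new - v[s]))
            v[s] = v_new
        if delta < theta:
            break
    return v

# A uniform random policy: policy[s][a] = probability of action a in state s.
uniform = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.5, 1: 0.5}}
print(evaluate_policy(uniform, P, GAMMA))  # larger values => better policy
```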
Policy control involves changing the policy. If evaluating the policy reveals that it yields less reward than it could, the policy must be adjusted. Since the ultimate goal of an MDP is to find a policy that maximizes value, the optimal policy can be discovered through iterative policy control.
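Continuing the same toy setup, one way a policy control step can "change the policy" is to make it greedy with respect to the current value estimate. The improve_policy helper below is a hypothetical sketch of that idea, not a fixed algorithm from the text.

```python
def improve_policy(v, P, gamma):
    """Policy control step: make the policy greedy with respect to the
    current value estimate v, state by state (a minimal sketch)."""
    new_policy = {}
    for s in P:
        # Action value q(s, a) of each action under the current v.
        q = {a: sum(p * (r + gamma * v[s2]) for p, s2, r in P[s][a])
             for a in P[s]}
        best = max(q, key=q.get)
        # Deterministic greedy policy: all probability on the best action.
        new_policy[s] = {a: 1.0 if a == best else 0.0 for a in P[s]}
    return new_policy
```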
Policy evaluation and policy control work in a complementary loop: the policy is evaluated to check its adequacy, policy control updates it to a new policy, and the new policy is then evaluated again.
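Combining the two sketches above gives this alternation in code form, essentially the policy iteration scheme that the dynamic programming discussion develops in full; policy_iteration here is again an illustrative helper built on the hypothetical functions defined earlier.

```python
def policy_iteration(P, gamma):
    """Alternate evaluation and control until the policy stops changing;
    on a finite MDP the resulting fixed point is an optimal policy."""
    # Start from a uniform random policy over each state's actions.
    policy = {s: {a: 1.0 / len(P[s]) for a in P[s]} for s in P}
    while True:
        v = evaluate_policy(policy, P, gamma)     # evaluate: how good is it?
        new_policy = improve_policy(v, P, gamma)  # control: can we do better?
        if new_policy == policy:                  # no change => optimal
            return policy, v
        policy = new_policy

best_policy, best_v = policy_iteration(P, GAMMA)
print(best_policy, best_v)
```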
How policy evaluation and policy control are applied in practice will be explored later, when we cover dynamic programming.