Model-Based and Model-Free
When studying reinforcement learning algorithms, the
terms model-based and model-free come up frequently. Simply put,
model-based means having full knowledge of the environment. Here the
environment is everything the agent interacts with as the MDP unfolds, and it
is described by the elements of the MDP: the states, actions, state transition
probabilities, rewards, and discount factor. If all of these are known
explicitly, as in the diagrams examined earlier, the setting is model-based.
Conversely, if the environment is a black box that simply returns the next
state and reward for a given state and action, the setting is model-free.
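To make the distinction concrete, here is a minimal sketch in Python, assuming a toy two-state MDP invented for illustration (the names P, BlackBoxEnv, and step are hypothetical, not from any particular library). The model-based view exposes the full transition table; the model-free view hides the same dynamics behind a step() call.

```python
import random

# Model-based view: the dynamics are known explicitly.
# P[(state, action)] is a list of (probability, next_state, reward) triples.
P = {
    ("s0", "a0"): [(0.7, "s1", 1.0), (0.3, "s0", 0.0)],
    ("s0", "a1"): [(1.0, "s0", 0.0)],
    ("s1", "a0"): [(1.0, "s1", 0.0)],
    ("s1", "a1"): [(0.4, "s0", 5.0), (0.6, "s1", 1.0)],
}

# Model-free view: the same dynamics hidden inside a black box.
# The agent only observes the next state and reward returned by step().
class BlackBoxEnv:
    def __init__(self):
        self._state = "s0"

    def step(self, action):
        outcomes = P[(self._state, action)]   # internal, not visible to the agent
        weights = [prob for prob, _, _ in outcomes]
        _, next_state, reward = random.choices(outcomes, weights=weights)[0]
        self._state = next_state
        return next_state, reward
```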
The major practical difference between model-based and
model-free reinforcement learning is whether the agent can predict the next
state. In the example above, each state is connected by arrows to the states
it can reach at the next time step, so the possible successors can be read off
the diagram directly, with no complicated algorithm required. This is
model-based reinforcement learning. In a model-free setting, the agent cannot
predict in advance which state it will move to at the next time step, so it
must learn from sampled experience using more elaborate algorithms. Most of
the problems reinforcement learning is used to solve are model-free.
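Continuing the sketch above (same assumed P and BlackBoxEnv), the snippet below illustrates why this matters: with the model, an expected one-step reward can be computed exactly by enumerating the known transitions, whereas without the model the agent can only estimate the same quantity by repeatedly sampling the black box.

```python
# With the model: enumerate the known transitions and compute the expectation.
def expected_reward(state, action):
    return sum(prob * reward for prob, _, reward in P[(state, action)])

# Without the model: estimate the same quantity from sampled experience.
def sampled_reward(action, episodes=1000):
    total = 0.0
    for _ in range(episodes):
        env = BlackBoxEnv()              # fresh black box, always starts in "s0"
        _, reward = env.step(action)
        total += reward
    return total / episodes

print(expected_reward("s0", "a0"))       # exact: 0.7 * 1.0 + 0.3 * 0.0 = 0.7
print(sampled_reward("a0"))              # approaches 0.7 as episodes grows
```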