A Markov Chain describes the changes in the state of a system over time, provided the system adheres to the Markov property. In other words, when the future state depends only on the current state and is independent of the past, the process is known as a Markov Chain. If the state space is discrete, it is called a Markov Chain, while a process with a continuous state space is called a Markov Process.
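Formally, the Markov property is often written as a conditional-probability statement (using S_t to denote the state at time t, a standard notation introduced here for illustration):

P(S_{t+1} | S_t, S_{t-1}, ..., S_0) = P(S_{t+1} | S_t)

The entire history on the left collapses to just the current state on the right, which is exactly what "independent of the past" means.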
A Quick Note: Discrete and Continuous
The term "discrete" means separated or distinct. In contrast, "continuous" implies being connected or uninterrupted. Thinking in terms of numbers makes this easier to understand. For example, integers like 1, 2, and 3 are distinct and separate, so they are considered discrete. On the other hand, real numbers like 1.000000001 and 1.00000000002 are difficult to distinguish precisely and are connected, so they are considered continuous.
A Markov Chain consists of two elements: a set of states (S) and a state transition matrix (P). The state transition matrix organizes, for every pair of states, the probability of transitioning from one to the other.
Let’s use a weather prediction system as an example. Although simplified, suppose that statistical analysis of past data has determined the probability of the next day’s weather given today’s weather.
State Transition Matrix
Assuming two weather states, clear and rainy, there
are four conditional probabilities: (1) Clear → Clear, (2) Clear → Rainy, (3) Rainy → Clear, and (4) Rainy → Rainy. For example, if today is clear, the
probability of another clear day tomorrow is 0.6 (based on historical
analysis). The probabilities for the other transitions are 0.4, 0.7, and 0.3,
respectively. This can be represented in a matrix, known as a state transition
matrix.
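As a minimal sketch, here is how this matrix might be written in code (using NumPy; the row and column order, clear then rainy, is an assumption of this sketch):

import numpy as np

# State transition matrix for the weather example.
# Rows are today's weather, columns are tomorrow's,
# in the order (clear, rainy); each row sums to 1.
P = np.array([
    [0.6, 0.4],  # clear today -> clear / rainy tomorrow
    [0.7, 0.3],  # rainy today -> clear / rainy tomorrow
])

assert np.allclose(P.sum(axis=1), 1.0)  # each row is a probability distribution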
Using this state transition matrix, we can predict the weather three days from now, assuming it is clear today. Since the Markov property lets us ignore historical data, we only need the conditional probabilities of future events. To predict the weather three days from now, we multiply the state transition matrix by itself three times. Simple matrix multiplication allows us to make a plausible prediction about the future.
Weather Prediction for 3 Days Later
Carrying out the multiplication, the state transition matrix after three days is as follows: 0.636 and 0.364 (first row, clear today), and 0.637 and 0.363 (second row, rainy today). Here, the probability of clear weather in three days, given that it’s clear today, is 0.636, while the probability of rain is 0.364. Therefore, if it’s clear today, clear weather is also the more likely outcome three days from now.
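The same three-day prediction, as a short runnable sketch (the matrix and state order are carried over from the sketch above):

import numpy as np

P = np.array([
    [0.6, 0.4],  # clear today -> clear / rainy tomorrow
    [0.7, 0.3],  # rainy today -> clear / rainy tomorrow
])

# Multiply the transition matrix by itself three times
# to get the three-step transition probabilities.
P3 = np.linalg.matrix_power(P, 3)

print(P3[0])  # (clear, rainy) three days after a clear day -> [0.636, 0.364]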
Markov Chains represent the state changes of a system over time and consist of a set of states and a state transition matrix. In the weather prediction example, we expressed the conditional probabilities in matrix form. Now, let’s look at a system (or environment) represented as a network to explore the Markov Chain in more depth.
Various Representations of a Markov Chain
In the figure, you can see a system where a path from the starting point (S) to the destination (F) passes through routers (R). The arrows indicate the possible directions of movement, and each arrow has an associated probability. The probability of moving from S to R1 is 0.4, and the probability of moving to R2 is 0.6. The probabilities of all options leaving S add up to 1.
The Markov Chain can be represented as a network
(left) or in matrix form (right), depending on the problem being solved.
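As a sketch of these two representations in code: the figure itself is not reproduced here, so apart from the two arrows leaving S (0.4 and 0.6), the probabilities below are hypothetical placeholders rather than values from the figure.

import numpy as np

states = ["S", "R1", "R2", "R3", "F"]

# Network form: each state maps to its outgoing arrows.
# Only the arrows from S come from the text; the rest are placeholders.
network = {
    "S":  {"R1": 0.4, "R2": 0.6},
    "R1": {"R2": 0.6, "R3": 0.4},   # hypothetical
    "R2": {"R3": 0.4, "F": 0.6},    # hypothetical
    "R3": {"F": 0.7, "R2": 0.3},    # hypothetical
    "F":  {"F": 1.0},               # stay at the destination once reached
}

# Equivalent matrix form: entry [i, j] holds P(states[j] | states[i]).
P = np.zeros((len(states), len(states)))
for src, arrows in network.items():
    for dst, prob in arrows.items():
        P[states.index(src), states.index(dst)] = prob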
Suppose we move one step per time unit and want to determine the probability of reaching the destination (F) in exactly three time units from the start (S). Each of these one-step time units is also called a timestep.
Probability Calculation in a Markov Chain
Let’s identify all possible routes from the starting point to the destination that take exactly three timesteps (excluding routes that arrive in two timesteps). There are three such routes: (S, R1, R3, F), (S, R1, R2, F), and (S, R2, R3, F). A series of consecutive state changes like this is called an "episode." In this example, there are three episodes that lead to the destination in three timesteps. Because each step is a conditional probability, the probability of a route is the product of the probabilities along it, and summing these values gives the probability of reaching the destination in exactly three timesteps (0.424). The purpose of using a Markov Chain is to determine the probability of an event like this.
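This product-then-sum calculation can be sketched as follows, continuing the hypothetical network above (the placeholder probabilities were chosen so that the total happens to match the 0.424 quoted from the figure; the figure’s actual values are what determine the real answer):

# Hypothetical network from the previous sketch; only the arrows
# leaving S (0.4 and 0.6) are given in the text.
network = {
    "S":  {"R1": 0.4, "R2": 0.6},
    "R1": {"R2": 0.6, "R3": 0.4},
    "R2": {"R3": 0.4, "F": 0.6},
    "R3": {"F": 0.7, "R2": 0.3},
    "F":  {"F": 1.0},
}

def episode_prob(route):
    # The probability of an episode is the product of its step probabilities.
    prob = 1.0
    for src, dst in zip(route, route[1:]):
        prob *= network[src].get(dst, 0.0)
    return prob

# The three episodes that reach F in exactly three timesteps.
routes = [("S", "R1", "R3", "F"),
          ("S", "R1", "R2", "F"),
          ("S", "R2", "R3", "F")]

total = sum(episode_prob(r) for r in routes)
print(round(total, 3))  # 0.424 with these placeholder values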
Environment and Episode in Reinforcement Learning
Markov Chains are widely applied in practice. They are notably popular in baseball statistics, as illustrated in the 2011 movie Moneyball, in which the protagonist uses Markov Chains to predict baseball outcomes. By analyzing past baseball data, average scoring probabilities were obtained for each player, and models were built to estimate the expected score of the next game, helping decide which players to field. The Markov Chain is a simple concept, yet its application to baseball opened an era of scientific analysis in sports.
The purpose of using Markov Chain theory is to calculate the probability of an event occurring. This event could be the batting average of the fourth hitter three days from now or the projected sales of a department store three years later. Based on the calculated probabilities, a team can select players for upcoming games, and a department store can plan its sales strategy.