A Markov Chain describes the changes in the state of a system over time, provided the system adheres to the Markov property. In other words, when the future state depends only on the current state and is independent of the past, the process is known as a Markov Chain. If the state space is discrete, it is called a Markov Chain, while a process with a continuous state space is called a Markov Process.
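Formally, the Markov property is often written as a conditional-probability statement (using S_t to denote the state at time t, a standard notation introduced here for illustration):

P(S_{t+1} | S_t, S_{t-1}, ..., S_0) = P(S_{t+1} | S_t)

The entire history on the left collapses to just the current state on the right, which is exactly what "independent of the past" means.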
A Quick Note: Discrete and Continuous
The term "discrete" means separated or distinct. In contrast, "continuous" implies being connected or uninterrupted. Thinking in terms of numbers makes this easier to understand. For example, integers like 1, 2, and 3 are distinct and separate, so they are considered discrete. On the other hand, real numbers like 1.000000001 and 1.00000000002 are difficult to distinguish precisely and are connected, so they are considered continuous.
A Markov Chain consists of two elements: a set of states (S) and a state transition matrix (P). The state transition matrix organizes, for every pair of states, the probability of transitioning from one to the other.
Let’s use a weather prediction system as an example. Although simplified, suppose that statistical analysis of past data has determined the probability of the next day’s weather given today’s weather.
State Transition Matrix
Assuming two weather states, clear and rainy, there
are four conditional probabilities: (1) Clear → Clear, (2) Clear → Rainy, (3) Rainy → Clear, and (4) Rainy → Rainy. For example, if today is clear, the
probability of another clear day tomorrow is 0.6 (based on historical
analysis). The probabilities for the other transitions are 0.4, 0.7, and 0.3,
respectively. This can be represented in a matrix, known as a state transition
matrix.
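As a minimal sketch, here is how this matrix might be written in code (using NumPy; the row and column order, clear then rainy, is an assumption of this sketch):

import numpy as np

# State transition matrix for the weather example.
# Rows are today's weather, columns are tomorrow's,
# in the order (clear, rainy); each row sums to 1.
P = np.array([
    [0.6, 0.4],  # clear today -> clear / rainy tomorrow
    [0.7, 0.3],  # rainy today -> clear / rainy tomorrow
])

assert np.allclose(P.sum(axis=1), 1.0)  # each row is a probability distribution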
Using this state transition matrix, we can predict the weather three days from now, assuming it is clear today. Since the Markov property lets us ignore historical data, we only need the conditional probabilities of future events. To predict the weather three days from now, we multiply the state transition matrix by itself three times. Simple matrix multiplication allows us to make a plausible prediction about the future.
Weather Prediction for 3 Days Later
Carrying out the multiplication, the state transition matrix after three days is as follows: 0.636 and 0.364 (first row, clear today), and 0.637 and 0.363 (second row, rainy today). Here, the probability of clear weather in three days, given that it’s clear today, is 0.636, while the probability of rain is 0.364. Therefore, if it’s clear today, clear weather is also the more likely outcome three days from now.
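The same three-day prediction, as a short runnable sketch (the matrix and state order are carried over from the sketch above):

import numpy as np

P = np.array([
    [0.6, 0.4],  # clear today -> clear / rainy tomorrow
    [0.7, 0.3],  # rainy today -> clear / rainy tomorrow
])

# Multiply the transition matrix by itself three times
# to get the three-step transition probabilities.
P3 = np.linalg.matrix_power(P, 3)

print(P3[0])  # (clear, rainy) three days after a clear day -> [0.636, 0.364]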
Markov Chains represent the state changes of a system over time and consist of a set of states and a state transition matrix. In the weather prediction example, we expressed the conditional probabilities in matrix form. Now, let’s look at a system (or environment) represented as a network to explore the Markov Chain in more depth.
Various Representations of a Markov Chain
In the figure, you can see a system where a path from the starting point (S) to the destination (F) passes through routers (R). The arrows indicate the possible directions of movement, and each arrow has an associated probability. The probability of moving from S to R1 is 0.4, and the probability of moving to R2 is 0.6. The probabilities of all options leaving S add up to 1.
The Markov Chain can be represented as a network
(left) or in matrix form (right), depending on the problem being solved.
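As a sketch of these two representations in code: the figure itself is not reproduced here, so apart from the two arrows leaving S (0.4 and 0.6), the probabilities below are hypothetical placeholders rather than values from the figure.

import numpy as np

states = ["S", "R1", "R2", "R3", "F"]

# Network form: each state maps to its outgoing arrows.
# Only the arrows from S come from the text; the rest are placeholders.
network = {
    "S":  {"R1": 0.4, "R2": 0.6},
    "R1": {"R2": 0.6, "R3": 0.4},   # hypothetical
    "R2": {"R3": 0.4, "F": 0.6},    # hypothetical
    "R3": {"F": 0.7, "R2": 0.3},    # hypothetical
    "F":  {"F": 1.0},               # stay at the destination once reached
}

# Equivalent matrix form: entry [i, j] holds P(states[j] | states[i]).
P = np.zeros((len(states), len(states)))
for src, arrows in network.items():
    for dst, prob in arrows.items():
        P[states.index(src), states.index(dst)] = prob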
Suppose we move one step per time unit and want to determine the probability of reaching the destination (F) in exactly three time units from the start (S). Each of these one-step time units is also called a timestep.
Probability Calculation in a Markov Chain
Let’s identify all possible routes from the starting point to the destination that take exactly three timesteps (excluding routes that arrive in two timesteps). There are three such routes: (S, R1, R3, F), (S, R1, R2, F), and (S, R2, R3, F). A series of consecutive state changes like this is called an "episode." In this example, there are three episodes that lead to the destination in three timesteps. Because each step is a conditional probability, the probability of a route is the product of the probabilities along it, and summing these values gives the probability of reaching the destination in exactly three timesteps (0.424). The purpose of using a Markov Chain is to determine the probability of an event like this.
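This product-then-sum calculation can be sketched as follows, continuing the hypothetical network above (the placeholder probabilities were chosen so that the total happens to match the 0.424 quoted from the figure; the figure’s actual values are what determine the real answer):

# Hypothetical network from the previous sketch; only the arrows
# leaving S (0.4 and 0.6) are given in the text.
network = {
    "S":  {"R1": 0.4, "R2": 0.6},
    "R1": {"R2": 0.6, "R3": 0.4},
    "R2": {"R3": 0.4, "F": 0.6},
    "R3": {"F": 0.7, "R2": 0.3},
    "F":  {"F": 1.0},
}

def episode_prob(route):
    # The probability of an episode is the product of its step probabilities.
    prob = 1.0
    for src, dst in zip(route, route[1:]):
        prob *= network[src].get(dst, 0.0)
    return prob

# The three episodes that reach F in exactly three timesteps.
routes = [("S", "R1", "R3", "F"),
          ("S", "R1", "R2", "F"),
          ("S", "R2", "R3", "F")]

total = sum(episode_prob(r) for r in routes)
print(round(total, 3))  # 0.424 with these placeholder values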
Environment and Episode in Reinforcement Learning
Markov Chains are widely applied in practice. They are notably popular in baseball statistics, as illustrated in the 2011 movie Moneyball, in which the protagonist uses Markov Chains to predict baseball outcomes. By analyzing past baseball data, average scoring probabilities were obtained for each player, and models were built to estimate the expected score of the next game, helping decide which players to field. The Markov Chain is a simple concept, yet its application to baseball opened an era of scientific analysis in sports.
The purpose of using Markov Chain theory is to calculate the probability of an event occurring. This event could be the batting average of the fourth hitter three days from now or the projected sales of a department store three years later. Based on the calculated probabilities, a team can select players for upcoming games, and a department store can plan its sales strategy.