Linear Regression Analysis

To understand the concept of machine learning, let's examine simple one-dimensional linear regression analysis. Linear regression analysis is a type of supervised learning used to create a predictive model that can forecast outcomes for unknown data.

[Figure: Linear Regression Analysis]

In the figure, the training data consists of two-dimensional (x, y) data. The goal is to create a model that can predict the value of y when an x value that isn’t in the training data is given. While the data doesn’t follow a perfect one-dimensional linear distribution, identifying a linear equation that approximates the data can provide an estimated value for y, even with some error.
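For concreteness, here is a small Python sketch of the kind of training data involved. The actual points in the figure are not given, so these (x, y) pairs are made up: they follow a line plus random noise, with the slope and intercept chosen to match the values the article arrives at later.

```python
import numpy as np

# Made-up (x, y) training pairs that roughly follow a line plus noise.
# The slope (0.15) and intercept (5) are illustrative, chosen to match
# the optimal values found later in the article; the noise is arbitrary.
rng = np.random.default_rng(seed=0)
x_train = rng.uniform(0, 20, size=30)
y_train = 0.15 * x_train + 5 + rng.normal(0, 0.5, size=30)
```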

Cost Function

As shown in the figure, the model takes the form of a linear equation, y = W × x + b, requiring only the values of the coefficient W and the intercept b. In machine learning, W is called the weight, and b is called the bias. The goal of one-dimensional linear regression analysis is to find the values of W and b that best describe the data.
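In code, such a model is nothing more than a function of those two parameters (a minimal Python sketch, not taken from the article):

```python
def model(x, W, b):
    """One-dimensional linear model: returns y = W * x + b."""
    return W * x + b
```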

Let's see how to find W and b. Start by assigning arbitrary values to W and b. For example, if W = 1 and b = 3, substituting the known data point (x, y) = (10, 6) into the model yields y = 1 × 10 + 3 = 13. The calculated value of 13 differs from the known value of 6 by −7 (that is, 6 − 13 = −7). This difference between the known and calculated values is called the cost or loss.
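Walking through that arithmetic in Python:

```python
W, b = 1.0, 3.0               # the arbitrary initial guesses from the text
x_known, y_known = 10.0, 6.0  # the known training point (x, y) = (10, 6)

y_calculated = W * x_known + b   # 1 * 10 + 3 = 13
cost = y_known - y_calculated    # 6 - 13 = -7
print(y_calculated, cost)        # 13.0 -7.0
```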

To find the values of W and b that minimize the difference between actual and calculated values, we define a loss function (or cost function). Since the sign of the difference (positive or negative) doesn’t matter, we square it.
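The article doesn't spell out how the squared differences are combined across data points; averaging them over the whole training set, which gives the mean squared error, is the usual choice. A minimal Python sketch under that assumption:

```python
import numpy as np

def loss(W, b, x, y):
    """Mean squared error: the average of (actual - calculated)^2
    over all training points, given numpy arrays x and y."""
    y_calculated = W * x + b
    return np.mean((y - y_calculated) ** 2)
```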

The remaining question is how to adjust W and b. This is where gradient descent comes in.

Gradient Descent

By taking the partial derivatives of the loss function defined above with respect to W and b, and adjusting W and b in the direction opposite to the gradient, we can eventually reach the point of minimum error. This algorithm is called gradient descent. This book doesn't delve deeply into the specifics of gradient descent, but it is essential to know that it is the algorithm used to minimize the loss function.
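A minimal Python sketch of gradient descent for the mean-squared-error loss assumed above. The starting values, learning rate, and step count are arbitrary choices for illustration, not values from the article:

```python
import numpy as np

def gradient_descent(x, y, lr=0.001, steps=20_000):
    """Minimize the mean squared error by repeatedly stepping
    W and b against the gradient of the loss."""
    W, b = 0.0, 0.0                                # arbitrary starting values
    n = len(x)
    for _ in range(steps):
        residual = y - (W * x + b)                 # actual minus calculated
        grad_W = -2.0 / n * np.sum(x * residual)   # dL/dW
        grad_b = -2.0 / n * np.sum(residual)       # dL/db
        W -= lr * grad_W                           # step opposite the gradient
        b -= lr * grad_b
    return W, b
```

Run on data like the made-up training set above, this should land near the underlying slope and intercept, though the learning rate and step count may need tuning for data on a different scale.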

If gradient descent finds that the optimal values of W and b are (0.15, 5), then the model becomes y = 0.15 × x + 5. Now, the model can calculate the value of y for unknown x values.
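Using the trained model for a new input (the input value here is made up for illustration):

```python
W, b = 0.15, 5.0       # the optimal values found by gradient descent

x_new = 12.0           # an x value not in the training data (made up)
y_new = W * x_new + b  # 0.15 * 12 + 5 = 6.8
print(y_new)           # 6.8
```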



