Notes on Linear Regression From Scratch

A simple model is still a full model: assumptions, parameters, loss, optimization, and evaluation.

Linear regression is a useful place to begin because the machinery is visible. A prediction is only a weighted sum, but the surrounding process is the same shape as much larger models.

The model

For one feature, the model can be written as y_hat = wx + b, where w is the weight (the slope) and b is the bias (the intercept). With only two parameters, each one can be inspected directly, which makes every update feel less abstract.
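As a minimal sketch of the one-feature model, the prediction can be written as a single function. The name `predict` and the parameter values below are illustrative assumptions, not part of any library.

```python
def predict(x, w, b):
    """Return the one-feature linear prediction y_hat = w * x + b."""
    return w * x + b


# Hypothetical parameter values, chosen only for illustration.
w, b = 2.0, 1.0
predictions = [predict(x, w, b) for x in [0.0, 1.0, 2.0]]
print(predictions)  # [1.0, 3.0, 5.0]
```

Both parameters are plain floats here, so printing them after each training step is enough to watch the model change.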

The loss

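Mean squared error, the average of the squared differences between predictions and targets, gives the model a surface to move across. Because each error is squared, the loss grows quadratically as predictions move away from targets. That sharp penalty is not always ideal, since a few outliers can dominate the loss, but it is excellent for learning the mechanics.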
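To make the penalty concrete, here is a small sketch of mean squared error in plain Python. The function name `mse` and the toy numbers are assumptions made for this example.

```python
def mse(predictions, targets):
    """Mean squared error: the average of squared prediction errors."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    n = len(targets)
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / n


# Toy data, made up for illustration.
targets = [1.0, 3.0, 5.0]
close = mse([1.0, 3.0, 6.0], targets)  # one prediction is off by 1
far = mse([1.0, 3.0, 9.0], targets)    # one prediction is off by 4
print(close, far)
```

In this toy case, an error four times larger produces a loss sixteen times larger, which is the quick growth described above.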

The update

Gradient descent moves each parameter a small step against its gradient, which is the direction that reduces loss. The learning rate controls how bold that step is. Too small, and learning looks frozen. Too large, and the model overshoots the low point and can diverge.
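The update can be sketched as follows, assuming mean squared error as the loss and full-batch gradient descent. The names `gradient_step` and `fit`, the starting values, the learning rate, and the toy data are all illustrative choices rather than a definitive implementation.

```python
def gradient_step(xs, ys, w, b, learning_rate):
    """Apply one full-batch gradient descent update for MSE loss."""
    n = len(xs)
    # Signed error of each prediction: y_hat - y.
    errors = [(w * x + b) - y for x, y in zip(xs, ys)]
    # Partial derivatives of MSE with respect to w and b.
    grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
    grad_b = 2.0 / n * sum(errors)
    # Step against the gradient, scaled by the learning rate.
    return w - learning_rate * grad_w, b - learning_rate * grad_b


def fit(xs, ys, learning_rate=0.05, steps=2000):
    """Run repeated updates from w = 0, b = 0 (an arbitrary starting point)."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        w, b = gradient_step(xs, ys, w, b, learning_rate)
    return w, b


# Toy data generated from y = 2x + 1, made up for illustration.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = fit(xs, ys)
print(round(w, 3), round(b, 3))  # approximately 2.0 and 1.0
```

On this toy data, a very small learning rate such as 0.0001 barely moves the parameters over the same number of steps, while a rate of 0.2 makes them grow without bound instead of settling.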

The lesson

The value of implementing linear regression from scratch is not speed. Libraries already solve it better. The value is seeing the skeleton that later models keep wearing under more impressive clothes.