Part 2 · Chapter 04

DIY Linear Regression

Baby's first model. Predict stuff with a straight line, from scratch.

Alright, buckle up. The theory is over. The hand-holding is done. It's time to write some code and build our very first model from scratch. No sklearn, no Keras, no magic black boxes. We're opening up the machine and building the engine ourselves with nothing but Python and NumPy.

Why? Because once you've built a car engine with your own hands, you'll never be afraid to look under the hood again.

Our mission is to build a Linear Regression model. It's the "Hello, World!" of machine learning. The goal is simple: predict a continuous value (like a house price or an exam score) by fitting a straight line to the data.

The Goal: Predicting Stuff with a Straight Line

Let's imagine a simple problem. We want to predict a student's final exam score based on the number of hours they studied. We have some data:

Hours Studied (x)	Exam Score (y)
2	65
3	70
5	75
6	85
8	90

You can see a clear trend: more hours studied generally leads to a higher score. A straight line seems like a reasonable way to model this relationship. And what's the equation for a straight line? You know this from middle school.

y = m x + b

y: The value we want to predict (Exam Score).
x: Our input feature (Hours Studied).
m: The slope of the line. How much the score increases for each extra hour of study.
b: The y-intercept (or bias). The score someone would get with 0 hours of study (maybe they're just a genius).

In machine learning, we often call m the weight and b the bias. The "learning" part of linear regression is just finding the best possible values for m and b that make our line fit the data as closely as possible.

Code-First Implementation: Let's Get Our Hands Dirty

Let's fire up our editor and build this thing piece by piece.

Step 1: The predict() Function

First, we need a function that, given an input x and our line's parameters m and b, can predict what y should be. This is just the line equation.

step1_predict.py
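A minimal sketch of what this step could look like, assuming the data lives in NumPy arrays and we start from an untrained line with m = 0 and b = 0. The names x, y_true, and y_pred are illustrative choices, not anything fixed.

```python
import numpy as np

# Data from the table above: hours studied (x) and exam scores (y_true).
x = np.array([2, 3, 5, 6, 8], dtype=float)
y_true = np.array([65, 70, 75, 85, 90], dtype=float)


def predict(x, m, b):
    """Return the line's prediction y = m * x + b for every input in x."""
    return m * x + b


# Assumed starting point: an untrained line with m = 0 and b = 0.
m, b = 0.0, 0.0
y_pred = predict(x, m, b)
print(y_pred)  # every prediction is 0.0, nowhere near the real scores
```

Because x is a NumPy array, one call predicts all five students at once. No loop needed.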

Step 2: The loss() Function (Mean Squared Error)

With untrained parameters (say, m = 0 and b = 0, so every prediction is 0), our model is terrible. But how terrible? We need to quantify the error. We'll use the most common loss function for regression: Mean Squared Error (MSE).

The logic is simple:

1. For each data point, calculate the difference between the true score and our predicted score. This is the error.
2. Square the error. This makes all errors positive and punishes big errors way more than small ones. An error of 4 becomes 16, while an error of 2 only becomes 4.
3. Calculate the average of all these squared errors.

$MSE = \frac{1}{n} \sum_{i=1}^{n} (y_{true, i} - y_{pred, i})^{2}$

step2_loss.py
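Here is one way the MSE recipe could be written, as a sketch under the same assumptions as before (NumPy arrays, an untrained line with m = 0 and b = 0). The function and variable names are illustrative.

```python
import numpy as np

# Data from the table above.
x = np.array([2, 3, 5, 6, 8], dtype=float)
y_true = np.array([65, 70, 75, 85, 90], dtype=float)


def predict(x, m, b):
    """The line equation y = m * x + b."""
    return m * x + b


def loss(y_true, y_pred):
    """Mean Squared Error: the average of the squared differences."""
    errors = y_true - y_pred      # difference for each data point
    return np.mean(errors ** 2)   # square them, then average


# Score the untrained line (m = 0, b = 0).
initial_loss = loss(y_true, predict(x, 0.0, 0.0))
print(initial_loss)  # 6015.0
```

A perfect fit would score 0, so 6015 tells us there is a lot of hill left to descend.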

Step 3: The update() Function (Gradient Descent)

This is the heart of the machine. How do we find better values for m and b? We use the "blindfolded hiker" method: Gradient Descent.

We need to calculate the gradient of our loss function. The gradient is just a vector of partial derivatives—one for m and one for b. These derivatives tell us the slope of the loss with respect to each parameter.

I'll spare you the full calculus derivation. The partial derivatives of our MSE loss function are:

Derivative w.r.t. m: $\frac{\partial L}{\partial m} = -2 \cdot \text{mean}(x \cdot (y_{true} - y_{pred}))$
Derivative w.r.t. b: $\frac{\partial L}{\partial b} = -2 \cdot \text{mean}(y_{true} - y_{pred})$
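If you do want a peek at where those come from, here is a short sketch. It assumes only the line equation, so $y_{pred, i} = m x_i + b$ where $x_i$ is the i-th input, and it applies the chain rule to each squared term of the MSE:

$$\frac{\partial L}{\partial m} = \frac{1}{n} \sum_{i=1}^{n} 2 (y_{true, i} - y_{pred, i}) \cdot (-x_i) = -\frac{2}{n} \sum_{i=1}^{n} x_i (y_{true, i} - y_{pred, i})$$

$$\frac{\partial L}{\partial b} = \frac{1}{n} \sum_{i=1}^{n} 2 (y_{true, i} - y_{pred, i}) \cdot (-1) = -\frac{2}{n} \sum_{i=1}^{n} (y_{true, i} - y_{pred, i})$$

The $(-x_i)$ and $(-1)$ factors are just the derivatives of $(y_{true, i} - m x_i - b)$ with respect to m and b, and $\frac{1}{n} \sum$ is the mean.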

To descend the hill, we just need to take a small step in the opposite direction of the gradient. The size of that step is controlled by the learning rate.

step3_update.py
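A sketch of a single gradient descent step, using the two derivatives above. The function signature and the learning_rate value of 0.02 are assumptions for illustration.

```python
import numpy as np

# Data from the table above.
x = np.array([2, 3, 5, 6, 8], dtype=float)
y_true = np.array([65, 70, 75, 85, 90], dtype=float)


def update(x, y_true, m, b, learning_rate):
    """Take one gradient descent step and return the new (m, b)."""
    y_pred = m * x + b
    error = y_true - y_pred

    # Partial derivatives of the MSE with respect to m and b.
    grad_m = -2 * np.mean(x * error)
    grad_b = -2 * np.mean(error)

    # Step against the gradient, scaled by the learning rate.
    new_m = m - learning_rate * grad_m
    new_b = b - learning_rate * grad_b
    return new_m, new_b


# One step from the untrained line (m = 0, b = 0).
m_new, b_new = update(x, y_true, m=0.0, b=0.0, learning_rate=0.02)
print(m_new, b_new)  # roughly 15.56 and 3.08
```

Both gradients are negative at the start, so subtracting them pushes m and b upward, which is exactly where a line predicting all zeros needs to go.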

Putting It All Together: The Training Loop

Now we just need to put these pieces into a loop. We'll repeat the process of predicting, calculating the loss, and updating our parameters a set number of times. Each pass over the data is called an epoch.

train.py
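Wired together, the loop could look something like this. It is a sketch, not the one true implementation: the 2000 epochs, the 0.02 learning rate, and the loss_history list are all assumptions picked for illustration.

```python
import numpy as np

# Data from the table above.
x = np.array([2, 3, 5, 6, 8], dtype=float)
y_true = np.array([65, 70, 75, 85, 90], dtype=float)


def predict(x, m, b):
    return m * x + b


def loss(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)


def update(x, y_true, m, b, learning_rate):
    error = y_true - predict(x, m, b)
    grad_m = -2 * np.mean(x * error)
    grad_b = -2 * np.mean(error)
    return m - learning_rate * grad_m, b - learning_rate * grad_b


# Assumed hyperparameters for this sketch.
m, b = 0.0, 0.0
learning_rate = 0.02
epochs = 2000
loss_history = []

for epoch in range(epochs):
    # predict -> loss -> update, once per epoch.
    y_pred = predict(x, m, b)
    loss_history.append(loss(y_true, y_pred))
    m, b = update(x, y_true, m, b, learning_rate)

    if epoch % 200 == 0:
        print(f"epoch {epoch:4d}  loss {loss_history[-1]:.3f}  m {m:.3f}  b {b:.3f}")

print(f"final: m = {m:.2f}, b = {b:.2f}")  # settles near m = 4.25, b = 56.58
```

Those final values match the ordinary least-squares line for this data, which is a nice sanity check: np.polyfit(x, y_true, 1) should land on the same slope and intercept.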

Visualizing the Heartbeat

The most satisfying part is watching the loss curve. It's like a heartbeat monitor for your model's learning process. Hit train and watch it descend in real time:

[Interactive demo: live gradient descent on hours studied → exam score. Starting state: m (weight) = 0.000, b (bias) = 0.000, loss (MSE) = 6015.000, learning rate = 0.0200.]

Try a tiny learning rate (0.0005) — it crawls. Bump it to 0.05 — it converges fast. Push it to 0.1 — it diverges. Same algorithm, different vibes.
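You can run the same experiment in plain code. The sketch below assumes the −2-scaled gradients from Step 3, the raw (unscaled) data, and 50 epochs per run. The train helper is a made-up name for illustration. With those assumptions the divergence cutoff sits at roughly 0.035, so 0.05 already blows up here. A demo that scales its gradient differently (for instance, without the factor of 2) would have a different cutoff.

```python
import numpy as np

# Data from the table above.
x = np.array([2, 3, 5, 6, 8], dtype=float)
y_true = np.array([65, 70, 75, 85, 90], dtype=float)


def train(learning_rate, epochs=50):
    """Run gradient descent from m = 0, b = 0 and return the final MSE."""
    m, b = 0.0, 0.0
    for _ in range(epochs):
        error = y_true - (m * x + b)
        grad_m = -2 * np.mean(x * error)
        grad_b = -2 * np.mean(error)
        m -= learning_rate * grad_m
        b -= learning_rate * grad_b
    return np.mean((y_true - (m * x + b)) ** 2)


# Compare a tiny, a moderate, and two too-large learning rates.
for lr in (0.0005, 0.02, 0.05, 0.1):
    print(f"learning rate {lr:<6}  final loss {train(lr):.3e}")
```

The tiny rate ends with a loss well above the moderate one after the same number of epochs, and the too-large rates finish far above the starting loss of 6015.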

This is it. This predict → loss → update loop is the fundamental engine of nearly all supervised learning, from this simple line-fitter to massive neural networks like GPT. The models get more complex, the predict function becomes a monstrous beast, and the update step uses more advanced calculus (hello, backpropagation), but the core logic you just built remains the same. You didn't just build a linear regression model. You built the blueprint.