Part 2 · Chapter 07

KNN: The Neighborhood Watch

No training, just vibes. You are the company you keep.

Alright, let's talk about the laziest, most intuitive, and possibly most relatable algorithm in all of machine learning: K-Nearest Neighbors (KNN).

If Decision Trees are your judgmental auntie, KNN is your chill, go-with-the-flow friend who makes decisions based purely on peer pressure. It has no grand theory, no complex model, no training phase. It operates on a single, simple principle: You are the company you keep.

That's it. To classify a new, mysterious data point, KNN just looks at its closest neighbors and makes it join their clique. It's classification by social circle.

The Philosophy: No Training, Just Vibes

Seriously. KNN is a "lazy learner". This is a technical term, not an insult (mostly). Unlike linear regression or decision trees, KNN doesn't have a distinct "training" phase where it learns parameters like weights or rules.

Training a Linear Regression model: Find the best m and b.
Training a Decision Tree: Find the best questions to ask.
"Training" a KNN model: Store the entire training dataset in memory.

...yep, that's the whole training process. It just memorizes every single data point. All the real work happens at prediction time, which is why it's called "lazy."

The Algorithm in 3 Simple Steps

Let's say we get a new data point and we want to classify it as a blue square or a red triangle.

Live · click the plot to move the probe

k = 3 · votes: A 3 · B 0 → prediction: A

The gray dashed lines are the 3 nearest neighbors. Their majority class is your answer.

Here's the entire KNN algorithm:

Calculate Distance: Measure the distance from our new point to every single other point in the dataset.
Find Neighbors: Find the k closest points. k is a number we choose.
Vote! Have the k neighbors vote on the class. Majority wins.

The choice of k is crucial. It's a hyperparameter we have to tune.

From-Scratch Implementation: Let's Get Building

We're going to build this lazy beast from scratch. The core components are a distance function and a prediction function.

1. Distance Metrics: How We Travel

The concept of "closest" depends on how you measure distance. The two most common ways are Euclidean and Manhattan distance.

Euclidean Distance: "As the crow flies." It's the straight-line distance between two points: √((x₂-x₁)² + (y₂-y₁)²). This is the default for most problems where features are continuous and comparable.
Manhattan Distance: "The Taxicab distance." Imagine you're in a city with a perfect grid layout. You can't cut through buildings; you have to travel along the blocks. The Manhattan distance is the sum of the absolute differences along each axis: |x₂-x₁| + |y₂-y₁|. Better for high-dimensional data or features on different scales.

distances.py

2. The predict() Function

This function will orchestrate the whole process.

knn_predict.py

And that's it! A fully functional KNN classifier. So simple, so elegant. What could possibly go wrong?

The Curse of Dimensionality: Lost in High-Dimensional Space

Here's KNN's Achilles' heel. The algorithm relies entirely on the idea of "distance." But what happens to distance when you have a lot of features (dimensions)?

Analogy: Finding Your Friend

1 Dimension: Your friend is somewhere on a 100-meter line. You can probably find them pretty quickly.
2 Dimensions: Your friend is somewhere on a 100m × 100m field. It's harder, but the concept of "nearby" still makes sense.
3 Dimensions: Your friend is in a 100m × 100m × 100m building. Now it's getting really tough.
1000 Dimensions: Your friend is in a 1000-dimensional hypercube. Good luck. You'll never see them again.

This is the Curse of Dimensionality. As you add more dimensions, the volume of the space increases exponentially. Your data points, which might have been dense in a few dimensions, become incredibly sparse. Everything becomes far away from everything else. The concept of a "nearest neighbor" becomes meaningless because even your closest neighbor might be astronomically far away.

This isn't just a problem for KNN; it's a fundamental challenge in all of machine learning. It's the reason why simply adding more features to your model often makes it worse, not better. It increases the risk of finding random, meaningless patterns (overfitting) and breaks algorithms that rely on distance. This is the primary motivation for an entire subfield of ML dedicated to dimensionality reduction—techniques that try to squash high-dimensional data down to a lower-dimensional space while preserving the important information.

So while KNN is simple and effective in low dimensions, it gets lost in space as your feature count grows.

Find the Right K

Let's explore the most important hyperparameter in KNN: k.

Grab a standard dataset like the Iris or Breast Cancer dataset from sklearn.datasets.

Split it into a training and testing set.

Using your from-scratch predict_knn function, write a loop that trains and evaluates the model for different values of k (from 1 to, say, 21, using only odd numbers to avoid ties).

For each k, calculate the accuracy on the test set.

Plot the accuracy as a function of k.

Analyze the plot:

What happens when k=1? Is the accuracy high or low? Why might this be a case of high variance (overfitting)?
What happens when k is very large (e.g., the size of the entire training set)? What does the model predict every time? Why is this a case of high bias (underfitting)?

You've just manually plotted your first bias-variance tradeoff curve, one of the most important concepts for debugging any ML model.

find_k.py