K-Nearest Neighbors (KNN)

A machine learning algorithm that makes predictions by looking at the most similar past examples and choosing the majority outcome.

Understanding KNN

k-Nearest Neighbors is one of the simplest machine learning algorithms, and it works by comparing new data points to ones it has already seen.

The core idea behind k-Nearest Neighbors is that things that are alike tend to have similar characteristics.

If you want to make a prediction about something new — like whether someone will like a movie — you find other people in your data who are most similar to them. Then you look at what those similar people did and assume the new person will probably do the same.

In kNN, “similarity” is usually measured by how close two data points are based on their features. For example:

  • Age, income, or interests for predicting behavior

  • Pixel values in images for image classification

  • Product ratings in recommendation systems

The “k” in kNN is just a number you choose that tells the algorithm how many neighbors to consider. For example, if k = 5, it looks at the five most similar examples in the dataset. If most of those five examples belong to a certain category, the new point is predicted to belong to that category too.

Example: KNN Predicting Movie Preference

To see how k-Nearest Neighbors works in practice, let’s look at a simple example where we want to predict whether someone will like an action movie based on what similar people liked.

You have a small dataset of people. For each person, you know:

  • Their age

  • Whether they like action movies (yes or no)

Now, someone new comes in. You know their age, but you don’t know if they like action movies. Your goal is to predict whether this new person likes action movies.

Now a new person comes in — they’re 25 years old, but we don’t know if they like action movies.

Let’s use k=3, that means we’ll look at the 3 people closest in age to this new person:

  • Person E (24 years old) → distance = 1

  • Person B (22 years old) → distance = 3

  • Person C (27 years old) → distance = 2

So the three nearest neighbors are:

  • E (Yes), C (No), and B (Yes)

Since the majority says yes, we can predict that the new person likes action movies.