Density-Based Spatial Clustering of Applications with Noise (DBSCAN)

A Simple Guide to How DBSCAN Uses Data to Find Clusters

Understanding DBSCAN

DBSCAN is an unsupervised learning algorithm used to find clusters in data. Instead of assigning each point to the nearest cluster center (like k-means), it groups points based on density.

That means DBSCAN looks for areas where points are closely packed together, and treats areas with fewer points as noise or outliers. This makes it especially good for:

  • Odd-shaped clusters

  • Data with noise

  • Situations where you don’t know how many clusters there should be

DBSCAN groups points based on how many neighbors they have within a certain radius. It classifies points into three types:

  • Core Points:

    • A point with enough neighbors nearby (at least MinPts within radius ε). These are the “center” of clusters.

  • Border Points:

    • Not enough neighbors to be a core point, but within ε of a core point.

  • Noise Points (Outliers):

    • Not a core point, and not within ε of any core point.
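The three point types can be sketched in a few lines of plain Python. This is a minimal illustration rather than a reference implementation: the function name `classify_points`, the sample coordinates, and the choice to count a point as its own neighbor are all assumptions made for the example.

```python
from math import dist  # Euclidean distance, Python 3.8+


def classify_points(points, eps, min_pts):
    """Label each point 'core', 'border', or 'noise'.

    Assumption: a point counts as its own neighbor, so min_pts
    includes the point itself.
    """
    # For every point, the indices of all points within eps of it.
    neighbors = [
        [j for j, q in enumerate(points) if dist(p, q) <= eps]
        for p in points
    ]
    is_core = [len(n) >= min_pts for n in neighbors]

    labels = []
    for i in range(len(points)):
        if is_core[i]:
            labels.append("core")
        elif any(is_core[j] for j in neighbors[i]):
            labels.append("border")  # not dense itself, but next to a core point
        else:
            labels.append("noise")
    return labels


# Hypothetical toy data: a small square, one point hanging off it, one far away.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (2.0, 1.0), (8.0, 8.0)]
labels = classify_points(points, eps=1.0, min_pts=3)
print(labels)
```

With these values the four corners of the square come out as core points, (2, 1) is a border point because its only neighbor is the core point (1, 1), and (8, 8) is noise.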

Key Parameters

  1. ε (epsilon):
    The maximum distance at which two points count as neighbors. This defines how close points need to be to be considered part of the same cluster.

  2. Minimum Points (MinPts):
    The minimum number of neighbors a point must have (within ε) to be considered a core point.

A common starting rule: Minimum Points ≈ 2 × number of features
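To make the two parameters concrete, the sketch below counts the neighbors of one point for two different values of ε and applies the "2 × number of features" starting rule. The helper names `region_query` and `suggest_min_pts` and the sample data are hypothetical, and the point is assumed to count as its own neighbor.

```python
from math import dist  # Euclidean distance, Python 3.8+


def region_query(points, index, eps):
    """Indices of all points within eps of points[index] (itself included)."""
    return [j for j, q in enumerate(points) if dist(points[index], q) <= eps]


def suggest_min_pts(points):
    """Starting rule of thumb: about twice the number of features."""
    n_features = len(points[0])
    return 2 * n_features


# Hypothetical 2-feature data set.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (5.0, 5.0)]

min_pts = suggest_min_pts(points)

small = region_query(points, 0, eps=1.0)  # misses the diagonal point (1, 1)
large = region_query(points, 0, eps=1.5)  # reaches the diagonal point

is_core_small = len(small) >= min_pts
is_core_large = len(large) >= min_pts
print(min_pts, small, large, is_core_small, is_core_large)
```

The same point is a core point at ε = 1.5 but not at ε = 1.0, which is why the two parameters are usually tuned together rather than one at a time.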

Source: Daily Dose of Data Science

Step-by-Step: How DBSCAN Works

  1. Start with any point that hasn’t been visited.

  2. Find all points within ε of it.

    • If there are fewer than MinPts points, label it as noise (for now; it may later be claimed by a cluster as a border point).

    • If there are at least MinPts points, mark it as a core point, and start a new cluster.

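  3. Add all points within ε of the core point to this cluster:

    • If any of those are also core points, add their neighbors to the same cluster as well, and keep expanding until no new core points are found. Border points join the cluster but are not expanded further.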

  4. Move to the next unvisited point and repeat.

  5. After all points are visited, you’ll have:

    • A set of clusters of different shapes/sizes.

    • A set of noise points.
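The steps above can be sketched as a short, unoptimized Python function. It is meant as an illustration under stated assumptions, not a production implementation: neighbors are found by brute force, a point counts as its own neighbor, noise is labelled -1, and the function name `dbscan` and the sample points are made up for the example.

```python
from math import dist  # Euclidean distance, Python 3.8+

NOISE = -1        # label for noise points
UNVISITED = None  # label for points not yet processed


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    labels = [UNVISITED] * len(points)

    def neighbors_of(i):
        # Brute-force search: every point within eps of point i (i included).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue                    # step 1: start only from unvisited points
        neighbors = neighbors_of(i)     # step 2: find all points within eps
        if len(neighbors) < min_pts:
            labels[i] = NOISE           # noise for now; may become a border point
            continue
        cluster_id += 1                 # core point: start a new cluster
        labels[i] = cluster_id
        queue = list(neighbors)
        while queue:                    # step 3: expand the cluster
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # earlier "noise" turns out to be a border point
            if labels[j] is not UNVISITED:
                continue                # already assigned; nothing more to do
            labels[j] = cluster_id
            j_neighbors = neighbors_of(j)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also a core point: keep expanding
    return labels                       # step 5: cluster ids plus noise


# Hypothetical data: one group near the origin, one near (10, 10), one stray point.
points = [
    (2.0, 1.0), (0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
    (5.0, 5.0),
    (10.0, 10.0), (10.0, 11.0), (11.0, 10.0),
]
labels = dbscan(points, eps=1.0, min_pts=3)
print(labels)
```

In this run the first point, (2, 1), is initially marked as noise because it has too few neighbors, and is later pulled into cluster 0 as a border point once the core point (1, 1) is expanded. The stray point (5, 5) stays labelled as noise.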