Unsupervised Learning

An overview of what unsupervised learning is, the main types of tasks it solves, and the most common models used to uncover hidden patterns in data.

What is Unsupervised Learning?

Unsupervised learning is a type of machine learning where the model finds patterns in data without labeled outcomes. Unlike supervised learning, there’s no “correct answer” to guide the training—just raw input data.

The goal of unsupervised learning is to find hidden patterns in data. For example:

- It might group similar things together into clusters, such as putting customers with similar shopping habits into the same group.
- It might simplify the data by reducing the number of features while keeping the most important information, which makes complex datasets easier to understand and work with.

These insights often take the form of clusters, associations, or reduced-dimensional representations that simplify and organize complex data.

Core Unsupervised Learning Tasks

Clustering
- Clustering groups similar data points into clusters based on shared features.
- It’s used to find natural groupings in datasets—such as customer segments, product types, or behavior patterns—without knowing the correct labels ahead of time.
Association Rules
- Association learning discovers interesting relationships between variables in large datasets. It's widely used in market basket analysis to understand which items are often bought together. Similar co-occurrence signals inform recommendation features such as:
  1. Amazon’s “Customers Also Bought”
  2. Spotify’s “Discover Weekly”
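To make "items that are often bought together" concrete, here is a minimal sketch of the two measures most association-rule methods rely on: support (how often an itemset appears) and confidence (how often the rule holds when its left side appears). The toy baskets, the function names, and the threshold values are all assumptions made for illustration; real systems typically use an algorithm such as Apriori or FP-Growth to avoid enumerating every pair.

```python
from itertools import combinations

# Hypothetical toy data: each set is one shopping basket.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "milk", "butter", "eggs"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Estimate of P(consequent in basket | antecedent in basket)."""
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / support(antecedent, baskets)

def pair_rules(baskets, min_support=0.4, min_confidence=0.7):
    """Brute-force search for one-item -> one-item rules passing both thresholds."""
    items = sorted(set().union(*baskets))
    rules = []
    for a, b in combinations(items, 2):
        for lhs, rhs in ((a, b), (b, a)):
            s = support({lhs, rhs}, baskets)
            if s < min_support:
                continue  # the pair is too rare to be interesting
            c = confidence({lhs}, {rhs}, baskets)
            if c >= min_confidence:
                rules.append((lhs, rhs, s, c))
    return rules

rules = pair_rules(transactions)
for lhs, rhs, s, c in rules:
    print(f"{lhs} -> {rhs}: support={s:.2f}, confidence={c:.2f}")
```

Note that rules are directional: in this toy data every basket with butter also contains bread, but not every basket with bread contains butter.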
Dimensionality Reduction
- High-dimensional data (lots of features) can be hard to visualize and may lead to overfitting.
- Dimensionality reduction techniques simplify this by projecting the data into fewer dimensions while preserving important patterns.
  - This is often a preprocessing step before modeling or visualization.

Types of Unsupervised Learning Models

Learn how models find structure in unlabeled data across a range of applications.

K-Means Clustering

A clustering algorithm that partitions data into k groups by assigning each point to the nearest cluster centroid and updating the centroids until the assignments stop changing.
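As a rough illustration of K-means, the sketch below alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its points (often called Lloyd's algorithm). The function name `kmeans`, the toy points, and the choice to initialise centroids from randomly sampled data points are assumptions for this example, not a reference implementation.

```python
import random

def kmeans(points, k, n_iters=100, seed=0):
    """Lloyd's algorithm on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k distinct data points
    labels = []
    for _ in range(n_iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # leave an empty cluster in place
        if new_centroids == centroids:
            break  # assignments have stabilised
        centroids = new_centroids
    return centroids, labels

# Hypothetical toy data: two well-separated groups of three points each.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1),
          (8.0, 8.0), (8.2, 7.9), (7.9, 8.3)]
centroids, labels = kmeans(points, k=2)
print(centroids, labels)
```

The result can depend on the initial centroids, so practical implementations usually run several initialisations and keep the best one.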

Hierarchical Clustering

A clustering algorithm that builds a tree of nested clusters (a dendrogram) by repeatedly merging the closest groups, or by repeatedly splitting larger ones.
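For hierarchical clustering, one common variant is agglomerative (bottom-up): start with every point as its own cluster and repeatedly merge the two closest clusters. The sketch below is a minimal version of that idea, assuming Euclidean distance and single or complete linkage; the function name `agglomerative` and the toy points are hypothetical, and the brute-force search is only practical for small datasets.

```python
def agglomerative(points, n_clusters, linkage="single"):
    """Merge the closest pair of clusters until `n_clusters` remain.

    Returns the final clusters (lists of point indices) and the merge
    history, which is the information a dendrogram would display.
    """
    def dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

    def cluster_dist(c1, c2):
        pair_dists = [dist(points[i], points[j]) for i in c1 for j in c2]
        # Single linkage uses the closest pair; complete uses the farthest.
        return min(pair_dists) if linkage == "single" else max(pair_dists)

    clusters = [[i] for i in range(len(points))]  # every point starts alone
    merges = []
    while len(clusters) > n_clusters:
        a, b = min(
            ((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
            key=lambda ij: cluster_dist(clusters[ij[0]], clusters[ij[1]]),
        )
        merges.append((clusters[a], clusters[b],
                       cluster_dist(clusters[a], clusters[b])))
        clusters[a] = clusters[a] + clusters[b]  # merge b into a
        del clusters[b]
    return clusters, merges

# Hypothetical toy data: two tight pairs and one isolated point.
points = [(0.0, 0.0), (0.0, 0.5), (5.0, 5.0), (5.0, 5.4), (10.0, 0.0)]
clusters, merges = agglomerative(points, n_clusters=3)
print(clusters)  # [[0, 1], [2, 3], [4]]
```

Unlike K-means, the number of clusters does not have to be fixed in advance: the merge history can be cut at any level after the fact.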

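DBSCAN

A density-based clustering algorithm that groups points lying in dense regions and marks isolated points as noise, without needing the number of clusters in advance.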
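DBSCAN (Density-Based Spatial Clustering of Applications with Noise) grows clusters outward from "core" points that have enough neighbors within a given radius. The sketch below is a simplified, brute-force version of that procedure; the function name `dbscan`, the toy points, and the parameter values `eps=0.3` and `min_samples=3` are assumptions chosen for illustration.

```python
def dbscan(points, eps, min_samples):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points)
                if sum((a - b) ** 2 for a, b in zip(points[i], q)) <= eps ** 2]

    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE  # may still be claimed later as a border point
            continue
        cluster_id += 1        # i is a core point, so it starts a new cluster
        labels[i] = cluster_id
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: reachable but not core
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster_id
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is also core: keep expanding
    return labels

# Hypothetical toy data: two dense groups and one isolated point.
points = [(1.0, 1.0), (1.1, 1.0), (1.0, 1.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),
          (9.0, 1.0)]
labels = dbscan(points, eps=0.3, min_samples=3)
print(labels)  # [0, 0, 0, 1, 1, 1, -1]
```

The isolated point receives the label -1, which shows how this approach separates outliers rather than forcing them into a cluster.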

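Principal Component Analysis

A dimensionality reduction technique that projects data onto a small number of orthogonal directions (principal components) that capture the most variance.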
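To show the idea behind Principal Component Analysis without any external libraries, the sketch below finds only the first principal component: it centers the data, builds the covariance matrix, and uses power iteration to approximate the direction of greatest variance. The function name, the toy data, and the use of power iteration are assumptions for illustration; in practice a numerical library would normally compute all components through an eigendecomposition or SVD. The all-ones starting vector is also an assumption and would fail if it happened to be orthogonal to the dominant direction.

```python
def first_principal_component(rows, n_iters=200):
    """Return (direction, scores, explained_ratio) for the top component."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[k] for r in rows) / n for k in range(d)]
    centered = [[r[k] - mean[k] for k in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    # Power iteration: repeatedly apply cov and renormalise so the vector
    # converges towards the eigenvector with the largest eigenvalue.
    v = [1.0] * d
    for _ in range(n_iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    # Project each centered row onto the direction: one number per row.
    scores = [sum(c[k] * v[k] for k in range(d)) for c in centered]
    # Share of total variance captured by this single component.
    explained = (sum(s * s for s in scores) / (n - 1)) / sum(
        cov[k][k] for k in range(d))
    return v, scores, explained

# Hypothetical toy data: two features that lie close to the line y = 2x.
data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8), (5.0, 10.1)]
direction, scores, explained = first_principal_component(data)
print(direction, explained)
```

Because the two features are almost perfectly correlated here, a single score per row retains nearly all of the variance, which is the sense in which PCA reduces dimensionality while keeping the important patterns.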