Unsupervised Machine Learning Algorithms

A concise overview of the key concepts, history, contributors, and examples of unsupervised learning.

Chapter 1: History

Unsupervised machine learning, a subfield of artificial intelligence, aims to discover patterns and structure in data without using labeled responses (data points with known outcomes or categories; in a dataset of animal images, for example, each image might be labeled as "cat" or "dog" — unsupervised learning does not use these labels). Its origins trace back to early work in statistics and pattern recognition (classifying data based on regularities, trends, or structures extracted from the data) in the mid-20th century.

Chapter 2: Major Contributors

  • Karl Pearson: originated Principal Component Analysis (1901), later developed independently by Harold Hotelling (1933).
  • Stuart Lloyd: proposed the standard k-means algorithm (1957); James MacQueen coined the term "k-means" (1967).
  • Arthur Dempster, Nan Laird, and Donald Rubin: formalized the EM algorithm (1977) used to fit Gaussian Mixture Models.
  • Teuvo Kohonen: introduced Self-Organizing Maps (1982).
  • Martin Ester, Hans-Peter Kriegel, Jörg Sander, and Xiaowei Xu: introduced DBSCAN (1996).
  • Geoffrey Hinton: advanced autoencoders and deep representation learning.

Chapter 3: Major Algorithms

  1. k-means Clustering
    Partitions data into k clusters by minimizing within-cluster variance.
  2. Hierarchical Clustering
    Builds a hierarchy of clusters using agglomerative or divisive approaches.
  3. Principal Component Analysis (PCA)
    Reduces data dimensionality by finding new axes (principal components) that retain most of the data variance.
  4. DBSCAN (Density-Based Spatial Clustering of Applications with Noise)
    Groups data points that are closely packed together, marking outliers as noise.
  5. Gaussian Mixture Models (GMM)
    Models data as a mixture of multiple Gaussian distributions.
  6. Autoencoders
    Neural networks that learn to encode data efficiently, mainly used for dimensionality reduction or feature learning.
  7. Self-Organizing Maps (SOM)
    Maps high-dimensional data onto a low-dimensional (typically 2D) grid of neurons so that similar inputs land on neighboring cells.
    How Self-Organizing Maps (SOM) Work
    1. Initialize a 2D grid of "neurons" (each with a weight vector, e.g. RGB color).
    2. For each data point (e.g., a color), find the "Best Matching Unit" (BMU) — the neuron whose weight is closest to the data point.
    3. Update the BMU and its neighbors to move closer to the data point (using a learning rate and neighborhood function).
    4. Repeat for many iterations, gradually reducing the learning rate and neighborhood size.
    5. Result: Similar colors are mapped to nearby neurons on the grid, forming smooth color regions.
    Let's see a simple example:
    Step 1: Fake Color Dataset
    We'll use 25 random colors (as RGB triples) to mimic a palette:
    [ [237, 50, 55], [ 40, 230, 90], ... , [ 55, 65, 210] ]
    Step 2: Organize on a 2D Grid
    We'll lay out a 5×5 grid (25 neurons), and use a simple SOM algorithm so that each neuron learns to represent one of the input colors. Neighboring neurons on the grid will end up with similar colors.
    Step 3: Result
    The grid below shows how the SOM has organized the original colors into a smooth map—colors that are similar are close together on the grid!
    Each cell shows a learned color.
    Neighboring cells have similar colors, revealing natural groupings.
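The five-step walkthrough above can be sketched in a few lines of NumPy. The 25 random colors, the 5×5 grid, and the decay schedules for the learning rate and neighborhood radius below are illustrative choices, not values taken from the text:

```python
import numpy as np

rng = np.random.default_rng(42)
colors = rng.random((25, 3))                  # 25 random RGB colors in [0, 1]

grid_h = grid_w = 5
weights = rng.random((grid_h, grid_w, 3))     # step 1: random neuron weight vectors

# Grid coordinates of every neuron, used by the neighborhood function
coords = np.stack(np.meshgrid(np.arange(grid_h), np.arange(grid_w),
                              indexing="ij"), axis=-1)

n_iters = 2000
for t in range(n_iters):
    lr = 0.5 * (1 - t / n_iters)              # learning rate decays toward 0
    sigma = 2.5 * (1 - t / n_iters) + 0.5     # neighborhood radius shrinks
    color = colors[rng.integers(len(colors))]
    # Step 2: find the Best Matching Unit (neuron with the closest weight)
    dists = np.linalg.norm(weights - color, axis=-1)
    bmu = np.unravel_index(np.argmin(dists), dists.shape)
    # Step 3: pull the BMU and its grid neighbors toward the color,
    # weighted by a Gaussian falloff over grid distance
    grid_d2 = np.sum((coords - np.array(bmu)) ** 2, axis=-1)
    influence = np.exp(-grid_d2 / (2 * sigma ** 2))
    weights += lr * influence[..., None] * (color - weights)
# After training, neighboring neurons hold similar colors (step 5)
```

After the loop, adjacent cells of `weights` hold similar RGB values, which is exactly the smooth color map described above.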

Chapter 4: Easy-to-Understand Examples

Clustering Customers: Understanding the K-Means Algorithm
Suppose you have a list of customers and you want to group them based on their spending patterns. The k-means algorithm is a simple, popular way to do this.
How K-Means Works:
  1. Choose k: Decide how many clusters (k) you want to find.
  2. Initialize Centers: Place k cluster centers randomly on your data space.
  3. Assign Points: Assign each customer to the nearest cluster center.
  4. Update Centers: Move each center to the mean of all customers assigned to it.
  5. Repeat: Repeat steps 3–4 until clusters stop changing.
Result: Each customer belongs to a cluster with others who have similar spending patterns.
K-means finds structure in your data, even if you don't know the groups ahead of time.
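The five steps above fit in a short, dependency-free sketch. The customer data here is hypothetical (monthly visits, average spend) pairs invented for illustration:

```python
import random

def dist2(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def kmeans(points, k, iters=100, seed=0):
    rng = random.Random(seed)
    centers = rng.sample(points, k)                  # step 2: random initial centers
    assign = []
    for _ in range(iters):
        # Step 3: assign each point to its nearest center
        assign = [min(range(k), key=lambda c: dist2(p, centers[c]))
                  for p in points]
        # Step 4: move each center to the mean of its assigned points
        new_centers = []
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                new_centers.append((sum(x for x, _ in members) / len(members),
                                    sum(y for _, y in members) / len(members)))
            else:
                new_centers.append(centers[c])       # keep an empty cluster's center
        if new_centers == centers:                   # step 5: stop when nothing moves
            break
        centers = new_centers
    return centers, assign

# Hypothetical customers as (monthly visits, average spend) pairs
customers = [(2, 20), (3, 25), (2, 22), (9, 90), (10, 95), (9, 85)]
centers, assign = kmeans(customers, k=2)
```

With this well-separated data, the low spenders and the high spenders end up in different clusters regardless of which two points are sampled as initial centers.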
K-Means Step-by-Step Simulation
[Interactive demo: steps through k-means on customer data, showing Clusters 1–3 and their centers forming.]
Dimensionality Reduction for Visualization: Understanding PCA
How Principal Component Analysis (PCA) Works
  1. Standardize the data: Subtract the mean and divide by the standard deviation for each feature.
  2. Compute the covariance matrix: Measure how variables vary together.
  3. Calculate eigenvectors and eigenvalues: Find the directions (principal components) that capture the most variance.
  4. Select top components: Keep the components with the largest eigenvalues (e.g., the top 2).
  5. Project the data: Transform the original data onto the selected principal components (lower dimensions).
Let's see a simplified example:
Step 1: Create a Fake Dataset
Imagine you have 12 samples (customers), each with 100 features (spending in 100 product categories).
For demonstration, here are the first 3 of the 12 samples (a real PCA would use all 12 samples and all 100 features):
Sample 1: [2.1, -1.4, 0.5, ... , 1.2]
Sample 2: [2.0, -1.5, 0.8, ... , 1.0]
Sample 3: [1.9, -1.2, 0.3, ... , 0.9]
...
Each row is a customer, each column is a product.
Step 2: Standardize Each Feature
For each feature (column), subtract the mean and divide by the standard deviation.
X_std = (X - mean) / std
Step 3: Compute the Covariance Matrix
The covariance matrix shows how features vary together.
Cov(X) = (X_std)^T × X_std / (n_samples - 1)
Step 4: Calculate Eigenvectors and Eigenvalues
Find the directions (principal components) that capture the most variance.
Suppose the top 2 eigenvectors are:
PC1: [0.09, -0.10, ..., 0.13]
PC2: [0.05, 0.14, ..., 0.07]
Step 5: Project the Data onto Principal Components
Multiply your standardized data by the principal component vectors:
X_pca = X_std × [PC1, PC2]
Now each customer is a point in 2D space!
Each dot is a customer projected from 100D to 2D using PCA.
PC1 and PC2 are new axes that capture the most variance.
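The five PCA steps can be reproduced in a few lines of NumPy. The 12×100 matrix below is random stand-in data (not the sample values shown above), used only to demonstrate the mechanics:

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in data: 12 customers x 100 spending features
X = rng.normal(size=(12, 100))

# Step 2: standardize each feature (zero mean, unit variance per column)
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# Step 3: covariance matrix (features x features)
cov = X_std.T @ X_std / (X_std.shape[0] - 1)

# Step 4: eigen-decomposition; eigh applies because cov is symmetric
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]          # sort by descending variance
top2 = eigvecs[:, order[:2]]               # the two leading principal components

# Step 5: project each 100-D customer down to 2-D
X_pca = X_std @ top2
print(X_pca.shape)   # (12, 2)
```

Note that `np.linalg.eigh` returns eigenvalues in ascending order, which is why the explicit descending sort is needed before picking the top two components.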
Anomaly Detection: Understanding DBSCAN
How DBSCAN Works
  1. Choose parameters: Set eps (radius for neighborhood) and minPts (minimum neighbors to form a cluster).
  2. Classify points: For each point, count how many others are within eps distance.
    • If at least minPts neighbors: core point (part of a cluster)
    • If not a core, but reachable from a core: border point
    • If not reachable: noise (anomaly)
  3. Expand clusters: Connect core points and their neighbors to form clusters.
  4. Result: Points not assigned to any cluster are considered anomalies.
Let's see a step-by-step anomaly detection example:
Step 1: Fake Network Traffic Data
Each point is a network session with 2 features: duration (x) and bytes transferred (y).
[2.1, 8.2], [2.2, 7.9], [2.3, 8.4], ... (normal traffic)
[7.0, 2.2], [7.3, 2.1], ... (another pattern)
[5.5, 13.5] (anomaly)
Step 2: Choose Parameters
eps = 1.2    minPts = 3
Step 3: For Each Point, Find Neighbors
Count how many other points are within eps distance.
Example: Point [2.2, 7.9] has 3 close neighbors ⇒ core point.
[5.5, 13.5] has no close neighbors ⇒ noise (anomaly).
Step 4: Expand Clusters
Connect core and border points together. Any point not connected is an anomaly.
Step 5: Result
Points not assigned to any cluster are flagged as anomalies.
[Figure: two clusters with one point marked × as an anomaly (noise); eps neighborhoods drawn around core points.]
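A from-scratch sketch of these steps follows. It uses the document's convention that a point is core if it has at least minPts neighbors within eps, not counting itself; the session data is illustrative, chosen to mimic the two dense patterns and one outlier above:

```python
import math

def neighbors_of(pts, i, eps):
    """Indices of all other points within eps of point i."""
    return [j for j in range(len(pts))
            if j != i and math.dist(pts[i], pts[j]) <= eps]

def dbscan(pts, eps, min_pts):
    labels = [None] * len(pts)            # None = unvisited, -1 = noise, 0.. = cluster id
    cluster = 0
    for i in range(len(pts)):
        if labels[i] is not None:
            continue
        nbrs = neighbors_of(pts, i, eps)
        if len(nbrs) < min_pts:           # step 2: too few neighbors -> noise (for now)
            labels[i] = -1
            continue
        labels[i] = cluster               # i is a core point: start a new cluster
        queue = list(nbrs)
        while queue:                      # step 3: expand the cluster
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster       # noise reachable from a core -> border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs_j = neighbors_of(pts, j, eps)
            if len(nbrs_j) >= min_pts:    # j is also a core point: keep expanding
                queue.extend(nbrs_j)
        cluster += 1
    return labels

# Illustrative sessions: two dense traffic patterns plus one outlier
sessions = [(2.1, 8.2), (2.2, 7.9), (2.3, 8.4), (2.0, 8.0),
            (7.0, 2.2), (7.3, 2.1), (7.1, 2.4), (6.9, 2.0),
            (5.5, 13.5)]
labels = dbscan(sessions, eps=1.2, min_pts=3)
print(labels)   # [0, 0, 0, 0, 1, 1, 1, 1, -1] -> the last session is flagged as noise
```

Be aware that some libraries count the point itself toward minPts (scikit-learn's `min_samples` does), so parameter values are not always directly portable between implementations.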
Image Compression: Understanding Autoencoders
How Autoencoders Work
  1. Input Layer: Each pixel of the image is an input node.
  2. Encoder: Compress the input into a small set of numbers (the bottleneck).
  3. Decoder: Expand the bottleneck back into a full-size image.
  4. Training: Adjust the network so the output image is as close as possible to the input image.
  5. Compression: After training, the bottleneck representation is a compressed version of the original image.
Let's see a toy example of image compression:
Step 1: Fake Image Data
Suppose you have a 6×6 grayscale image (each value is a pixel brightness 0-1):
[[0, 0, 1, 1, 0, 0],
 [0, 1, 1, 1, 1, 0],
 [1, 1, 0.8, 0.8, 1, 1],
 [1, 1, 0.8, 0.8, 1, 1],
 [0, 1, 1, 1, 1, 0],
 [0, 0, 1, 1, 0, 0]]
Step 2: Encode (Compress)
How does the compression work?
The encoder learns to summarize the important structure of the image using only a few numbers.

In this example, we split the image into 4 quadrants (each 3×3 block):
  • Quadrant 1 (top-left): upper-left 3×3 pixels
  • Quadrant 2 (top-right): upper-right 3×3 pixels
  • Quadrant 3 (bottom-left): lower-left 3×3 pixels
  • Quadrant 4 (bottom-right): lower-right 3×3 pixels
For each quadrant, we average the pixel values to get a single number.
bottleneck[0] = average(Quadrant 1) = 0.64
bottleneck[1] = average(Quadrant 2) = 0.64
bottleneck[2] = average(Quadrant 3) = 0.64
bottleneck[3] = average(Quadrant 4) = 0.64
The bottleneck [0.64, 0.64, 0.64, 0.64] is the compressed representation.
In a real autoencoder, these would be learned values, not just averages—but the idea is the same: compress the information down to just the essentials.
Step 3: Decode (Reconstruct)
How does the reconstruction work?
The decoder takes the 4 bottleneck values and "spreads" each value back over its corresponding 3×3 quadrant. So, every pixel in each quadrant gets set to the same value as its bottleneck:
  • All pixels in Quadrant 1 become 0.64
  • All pixels in Quadrant 2 become 0.64
  • All pixels in Quadrant 3 become 0.64
  • All pixels in Quadrant 4 become 0.64
The result is a "blocky" version of the original image, but it still captures the main structure. In a real autoencoder, the decoder learns to reconstruct finer details from the compressed representation.
Left: Original image   Right: Reconstructed from compressed code
(Compression: 36 → 4 values!)
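The quadrant-averaging encoder and decoder from the toy example can be written directly. This is the hand-crafted stand-in described above, not a trained neural network:

```python
# Toy "autoencoder": encode a 6x6 image as 4 quadrant averages, then decode.
image = [
    [0, 0, 1, 1, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [1, 1, 0.8, 0.8, 1, 1],
    [1, 1, 0.8, 0.8, 1, 1],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 1, 1, 0, 0],
]

QUADRANTS = [(0, 0), (0, 3), (3, 0), (3, 3)]  # top-left corner of each 3x3 block

def encode(img):
    """Compress: average each 3x3 quadrant into a single number."""
    bottleneck = []
    for r0, c0 in QUADRANTS:
        block = [img[r][c] for r in range(r0, r0 + 3) for c in range(c0, c0 + 3)]
        bottleneck.append(sum(block) / 9)
    return bottleneck

def decode(bottleneck):
    """Reconstruct: spread each bottleneck value over its 3x3 quadrant."""
    out = [[0.0] * 6 for _ in range(6)]
    for k, (r0, c0) in enumerate(QUADRANTS):
        for r in range(r0, r0 + 3):
            for c in range(c0, c0 + 3):
                out[r][c] = bottleneck[k]
    return out

code = encode(image)            # 36 pixels -> 4 values
reconstruction = decode(code)   # 4 values -> blocky 36-pixel image
print([round(v, 2) for v in code])   # [0.64, 0.64, 0.64, 0.64]
```

A real autoencoder would learn the encode and decode mappings by gradient descent instead of fixing them to quadrant averages, but the compression-then-reconstruction structure is the same.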
Visualizing Colors: Understanding Self-Organizing Maps (SOM)
Imagine you have a palette of hundreds of colors. You want to organize them so that similar colors are close together on a map. A Self-Organizing Map (SOM) can take all these colors and arrange them on a 2D grid, where neighboring cells represent similar colors. This helps you see color families and smooth gradients at a glance, making it easy to navigate and choose colors.

Why is this helpful? SOMs help reveal underlying patterns in high-dimensional data (like RGB color values) by organizing and visualizing them in an intuitive way.

Machine Learning Concepts Quiz

Test your knowledge of clustering, PCA, DBSCAN, autoencoders, self-organizing maps, and more!