Chapter 1: Early History
The roots of Convolutional Neural Networks (CNNs) trace back to neuroscience and early computer science. The concept was inspired by discoveries about how animals process visual information, and by early attempts to build machines that could learn.
- 1958: Frank Rosenblatt invents the perceptron, the first artificial neural network model with trainable weights, laying the groundwork for future neural nets.
- 1959: David Hubel and Torsten Wiesel discover that neurons in the cat visual cortex respond to specific regions of the visual field ("receptive fields"). This idea of local processing is crucial for CNNs.
- 1980: Kunihiko Fukushima proposes the Neocognitron, the first artificial vision system with convolution and pooling, inspired by biological vision.
- 1989: Yann LeCun and collaborators develop the first practical convolutional neural network for recognizing handwritten digits, using supervised learning.
- 1998: LeCun's team publishes LeNet-5, a classic CNN used for automatic check and digit recognition.
- 2012: Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton train a deep CNN on GPUs, decisively winning the ImageNet competition and launching the deep learning revolution.
Chapter 2: Design of Convolutional Neural Networks
CNNs are designed to process data with a grid-like topology, such as images, by learning local patterns and combining them into complex features.
[Figure: CNN pipeline, from a 28×28 input image through convolutional filters to 10 output classes]
Core Building Blocks
- Input Layer: Receives raw image data (pixels).
- Convolutional Layers: Apply learned filters to extract local features (edges, curves).
- Pooling Layers: Downsample feature maps, making the network less sensitive to small shifts.
- Flatten Layer: Converts 2D feature maps into 1D vectors.
- Dense (Fully Connected) Layers: Combine features for prediction.
- Output Layer: Gives final prediction, often as probabilities per class.
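The convolutional and pooling layers above can be sketched in plain NumPy. This is illustrative only: the 3×3 vertical-edge kernel is hand-coded here, whereas a real CNN learns its filters during training.

```python
import numpy as np

def conv2d(image, kernel):
    # "Valid" convolution as used in deep learning (technically
    # cross-correlation): slide the kernel, no padding, stride 1.
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

def max_pool(fmap, size=2):
    # Non-overlapping max pooling: keep the strongest response per patch.
    h, w = fmap.shape[0] // size, fmap.shape[1] // size
    return fmap[:h*size, :w*size].reshape(h, size, w, size).max(axis=(1, 3))

# A 6x6 toy "image": left half dark (0), right half bright (1)
img = np.zeros((6, 6))
img[:, 3:] = 1.0

# A hand-coded vertical-edge detector (hypothetical filter)
kernel = np.array([[-1., 0., 1.],
                   [-1., 0., 1.],
                   [-1., 0., 1.]])

fmap = conv2d(img, kernel)       # 4x4 feature map; strong response at the edge
pooled = max_pool(fmap)          # 2x2 map; the edge location survives pooling
print(fmap.shape, pooled.shape)  # (4, 4) (2, 2)
```

Note how pooling shrinks the map while preserving the strong edge response, which is exactly why CNNs tolerate small shifts in the input.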
Chapter 3: Example & How to Use a CNN to Study Images
Let’s see how to use a CNN to classify images of handwritten digits (MNIST dataset) in Python using TensorFlow/Keras.
- Import Libraries
import tensorflow as tf
from tensorflow.keras import layers, models
- Load and Prepare Data
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
# Add a trailing channel axis and scale uint8 pixels to [0, 1]
x_train = x_train[..., None] / 255.0
x_test = x_test[..., None] / 255.0
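The `[..., None]` indexing adds the trailing channel axis that `Conv2D` expects, and dividing by 255 rescales the pixel values to [0, 1]. A minimal NumPy check on a dummy batch (no dataset download needed):

```python
import numpy as np

# Dummy batch shaped like raw MNIST: 4 grayscale 28x28 images, uint8 pixels
fake = np.random.randint(0, 256, size=(4, 28, 28), dtype=np.uint8)

scaled = fake[..., None] / 255.0  # add channel axis, rescale to [0, 1]

print(scaled.shape)                                 # (4, 28, 28, 1)
print(scaled.min() >= 0.0 and scaled.max() <= 1.0)  # True
```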
- Build the CNN Model
model = models.Sequential([
layers.Conv2D(32, (3,3), activation='relu', input_shape=(28,28,1)),
layers.MaxPooling2D((2,2)),
layers.Conv2D(64, (3,3), activation='relu'),
layers.MaxPooling2D((2,2)),
layers.Flatten(),
layers.Dense(64, activation='relu'),
layers.Dense(10, activation='softmax')
])
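Keras infers the layer shapes automatically, but tracing them by hand shows where the `Flatten` size comes from: each 3×3 "valid" convolution shrinks the spatial size by 2, and each 2×2 pooling halves it (rounding down). A quick sketch:

```python
def conv_out(n, k=3):
    # "Valid" convolution, stride 1: output shrinks by k - 1
    return n - k + 1

def pool_out(n, p=2):
    # Non-overlapping 2x2 pooling: floor-halve the size
    return n // p

n = 28
n = conv_out(n)    # 26 after Conv2D(32, (3,3))
n = pool_out(n)    # 13 after MaxPooling2D((2,2))
n = conv_out(n)    # 11 after Conv2D(64, (3,3))
n = pool_out(n)    # 5  after MaxPooling2D((2,2))
flat = n * n * 64  # 1600 features fed into the Dense layers
print(n, flat)     # 5 1600
```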
- Train and Evaluate
model.compile(optimizer='adam',
loss='sparse_categorical_crossentropy',
metrics=['accuracy'])
model.fit(x_train, y_train, epochs=5, validation_data=(x_test, y_test))
- Use the Model
test_loss, test_acc = model.evaluate(x_test, y_test)
print(f'Test accuracy: {test_acc:.3f}')
# Predict a digit:
import numpy as np
prediction = model.predict(x_test[0:1])
print("Predicted digit:", np.argmax(prediction))
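`model.predict` returns one softmax row per input image, so the predicted class is the index of the largest probability. A tiny self-contained illustration with a made-up probability row (hypothetical values, not real model output):

```python
import numpy as np

# Hypothetical softmax output for one image: 10 class probabilities
probs = np.array([[0.01, 0.02, 0.05, 0.02, 0.01, 0.03, 0.01, 0.80, 0.03, 0.02]])

print(round(float(probs.sum()), 6))  # 1.0 -- softmax rows sum to 1
print(int(np.argmax(probs)))         # 7   -- the predicted digit
```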
The CNN automatically learns which patterns (edges, shapes) distinguish one class of image from another, making it ideal for visual recognition.
Chapter 4: Major Contributors
- Kunihiko Fukushima: Created the first network with convolution and pooling (the Neocognitron), directly inspiring CNNs.
- Yann LeCun: Developed LeNet and LeNet-5, showing CNNs can solve real-world problems like digit recognition.
- Yoshua Bengio: Major advances in deep nets and optimization, key to modern deep learning (and CNNs).
- Geoffrey Hinton: Pushed backpropagation and deep learning; supervised breakthroughs like AlexNet.
- Alex Krizhevsky and Ilya Sutskever: Built and trained the breakthrough deep CNN that won ImageNet 2012 and sparked the deep learning boom.
- David Hubel and Torsten Wiesel: Their work on receptive fields in vision laid the biological foundation for CNNs.
Chapter 5: References
- Fukushima, K. (1980). Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biological Cybernetics, 36(4), 193–202.
- LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278–2324.
- Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. NIPS 2012.
- Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. MIT Press.
- Stanford CS231n: Convolutional Neural Networks for Visual Recognition
- TensorFlow: Convolutional Neural Network Tutorial