Convolutional Neural Networks

A Brief Book on History, Design, Application, and Contributors

Chapter 1: Early History

The roots of Convolutional Neural Networks (CNNs) trace back to neuroscience and early computer science. The concept was inspired by discoveries about how animals process visual information, and by early attempts to build machines that could learn.

1958 Perceptron
Frank Rosenblatt invents the perceptron, the first artificial neural network model with trainable weights, laying the groundwork for future neural nets.
1962 Biological Inspiration
David Hubel and Torsten Wiesel discover that neurons in the cat visual cortex respond to specific regions of the visual field—“receptive fields.” This idea of local processing is crucial for CNNs.
1980 Neocognitron
Kunihiko Fukushima proposes the Neocognitron, the first artificial vision system with convolution and pooling, inspired by biological vision.
1989 LeNet
Yann LeCun and collaborators develop the first practical Convolutional Neural Network for recognizing handwritten digits, using supervised learning.
1998 LeNet-5
LeCun's team publishes LeNet-5, a classic CNN deployed for automated check reading and handwritten-digit recognition.
2012 AlexNet
Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton train a deep CNN on GPUs, winning the ImageNet competition by a wide margin and launching the deep learning revolution.

Chapter 2: Design of Convolutional Neural Networks

CNNs are designed to process data with a grid-like topology, such as images, by learning local patterns and combining them into complex features.

Input (28×28) → Conv Layer (Filters) → Pooling → Conv Layer → Pooling → Flatten → Dense → Output (10)
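The layer sizes implied by the diagram above can be traced with a small helper. This is a sketch: the sizes assume "valid" 3×3 convolutions and non-overlapping 2×2 pooling, matching the Keras model built in Chapter 3.

```python
def conv_out(n, k=3, stride=1):
    """Output size of a 'valid' convolution along one dimension."""
    return (n - k) // stride + 1

def pool_out(n, k=2):
    """Output size of a non-overlapping k-wide max-pool along one dimension."""
    return n // k

size = 28
size = conv_out(size)    # 26 after the first 3x3 conv
size = pool_out(size)    # 13 after 2x2 pooling
size = conv_out(size)    # 11 after the second 3x3 conv
size = pool_out(size)    # 5 after 2x2 pooling
print(size * size * 64)  # 1600 features after flattening 64 maps
```

Tracing shapes like this is a quick sanity check before training: if a dimension reaches zero, the network has too many conv/pool stages for the input size.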

Core Building Blocks

  • Input Layer: Receives raw image data (pixels).
  • Convolutional Layers: Apply learned filters to extract local features (edges, curves).
  • Pooling Layers: Downsample feature maps, making the network less sensitive to small shifts.
  • Flatten Layer: Converts 2D feature maps into 1D vectors.
  • Dense (Fully Connected) Layers: Combine features for prediction.
  • Output Layer: Gives final prediction, often as probabilities per class.

Why this design? The combination of convolution and pooling layers lets CNNs recognize complex objects by building up from simple shapes, while keeping the number of parameters manageable.
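To make the convolution and pooling steps concrete, here is a minimal NumPy sketch. The filter is hand-crafted for illustration (a vertical-edge detector); in a real CNN such filters are learned during training.

```python
import numpy as np

def conv2d(image, kernel):
    """'Valid' 2D cross-correlation, as computed by CNN conv layers."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(fmap, k=2):
    """Non-overlapping k x k max-pooling."""
    h, w = fmap.shape[0] // k, fmap.shape[1] // k
    return fmap[:h * k, :w * k].reshape(h, k, w, k).max(axis=(1, 3))

# Toy image: a bright vertical stripe on a dark background.
image = np.zeros((6, 6))
image[:, 3] = 1.0

# Hand-crafted vertical-edge filter (one column of -1s, one of +1s).
kernel = np.array([[-1.0, 0.0, 1.0]] * 3)

fmap = conv2d(image, kernel)  # strong responses beside the stripe
pooled = max_pool(fmap)       # downsampled, shift-tolerant summary
print(pooled.shape)           # (2, 2)
```

The feature map lights up where the filter's pattern appears, and pooling keeps the strongest response in each region, which is exactly the "local features, then summarize" design described above.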

Chapter 3: Example & How to Use a CNN to Study Images

Let’s see how to use a CNN to classify images of handwritten digits (MNIST dataset) in Python using TensorFlow/Keras.

  1. Import Libraries
import tensorflow as tf
from tensorflow.keras import layers, models
  2. Load and Prepare Data

# Load MNIST, add a channel axis, and scale pixels to [0, 1]
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train[..., None] / 255.0
x_test = x_test[..., None] / 255.0
  3. Build the CNN Model

model = models.Sequential([
    layers.Conv2D(32, (3,3), activation='relu', input_shape=(28,28,1)),
    layers.MaxPooling2D((2,2)),
    layers.Conv2D(64, (3,3), activation='relu'),
    layers.MaxPooling2D((2,2)),
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dense(10, activation='softmax')
])
  4. Train and Evaluate

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.fit(x_train, y_train, epochs=5, validation_data=(x_test, y_test))
  5. Use the Model

test_loss, test_acc = model.evaluate(x_test, y_test)
print(f'Test accuracy: {test_acc:.3f}')
# Predict a digit for the first test image
import numpy as np
prediction = model.predict(x_test[0:1])
print("Predicted digit:", np.argmax(prediction))

How does it study images?
During training, the CNN automatically learns which patterns (edges, shapes, textures) distinguish one class of image from another, which makes it well suited to visual recognition.
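The shift tolerance that pooling provides (mentioned in Chapter 2) can be illustrated directly: a toy sketch, not the trained model, showing that max-pooling produces the same summary for an activation and a slightly shifted copy of it.

```python
import numpy as np

def max_pool(fmap, k=2):
    """Non-overlapping k x k max-pooling."""
    h, w = fmap.shape[0] // k, fmap.shape[1] // k
    return fmap[:h * k, :w * k].reshape(h, k, w, k).max(axis=(1, 3))

# Two feature maps: the same activation, shifted by one pixel.
a = np.zeros((4, 4)); a[1, 1] = 1.0
b = np.zeros((4, 4)); b[0, 0] = 1.0

# Both positions fall inside the same 2x2 pooling window,
# so the pooled outputs are identical.
print(np.array_equal(max_pool(a), max_pool(b)))  # True
```

This is why a digit written slightly off-center is still recognized: small shifts vanish after pooling.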

Chapter 4: Major Contributors

Kunihiko Fukushima
Inventor of Neocognitron

Created the first network with convolution and pooling, directly inspiring CNNs.

Yann LeCun
Father of CNNs

Developed LeNet and LeNet-5, showing CNNs can solve real-world problems like digit recognition.

Yoshua Bengio
Deep Learning Pioneer

Major advances in deep nets and optimization, key to modern deep learning (and CNNs).

Geoffrey Hinton
Deep Learning Visionary

Pushed backpropagation and deep learning; supervised breakthroughs like AlexNet.

Alex Krizhevsky
Creator of AlexNet

Built and trained the breakthrough deep CNN that won ImageNet 2012 and sparked the deep learning boom.

David Hubel & Torsten Wiesel
Neuroscientists

Their work on receptive fields in vision laid the biological foundation for CNNs.

Chapter 5: References

  1. Fukushima, K. (1980). Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biological Cybernetics, 36(4), 193–202.
  2. LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278–2324.
  3. Krizhevsky, A., Sutskever, I., & Hinton, G.E. (2012). ImageNet Classification with Deep Convolutional Neural Networks. NIPS 2012.
  4. Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. MIT Press.
  5. Stanford CS231n: Convolutional Neural Networks for Visual Recognition
  6. TensorFlow: Convolutional Neural Network Tutorial