1. History
The concept of artificial neural networks (ANNs) dates back to the 1940s. The first mathematical model of a neuron was the McCulloch-Pitts neuron (1943), which inspired further exploration of computational models of the brain. In 1958, Frank Rosenblatt introduced the Perceptron, a single-layer neural network that could learn simple patterns.
However, neural networks faced skepticism after the 1969 publication of "Perceptrons" by Minsky and Papert, which proved formal limitations of single-layer networks. In particular, a single-layer perceptron cannot solve problems that are not linearly separable, such as the XOR function. This led to a decline in neural network research for over a decade, as many believed the approach was fundamentally limited.
The field saw a resurgence in the 1980s with the development of the backpropagation algorithm (Rumelhart, Hinton, and Williams, 1986), enabling the training of multi-layer networks.
In the 2000s, increased computational power, large datasets, and algorithmic advances led to the rise of deep learning, referring to neural networks with many layers. Landmark achievements in image recognition and speech recognition have since established deep learning as a dominant approach in AI.
2. Major Events & Contributors
- 1943: McCulloch & Pitts propose the first simplified brain cell model.
- 1958: Frank Rosenblatt invents the Perceptron model.
- 1969: Minsky & Papert highlight limitations of single-layer perceptrons.
- 1986: Rumelhart, Hinton & Williams popularize backpropagation.
- 1998: Yann LeCun et al. develop LeNet, an early convolutional neural network (CNN).
- 2006: Geoffrey Hinton introduces deep belief networks and unsupervised pre-training.
- 2012: Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton win the ImageNet competition by a large margin with AlexNet, a deep CNN.
- 2014: Ian Goodfellow et al. introduce Generative Adversarial Networks (GANs).
- 2017: Vaswani et al. propose the Transformer architecture.
Key Contributors: Geoffrey Hinton, Yann LeCun, Yoshua Bengio, Andrew Ng, Demis Hassabis, Ian Goodfellow, Ilya Sutskever, Fei-Fei Li, and many others.
3. Structure of Deep Learning Neural Networks
Deep learning neural networks are composed of multiple layers of interconnected nodes ("neurons"). Each layer transforms its input data using learned weights and activation functions, passing the result to the next layer. The basic structure includes:
- Input Layer: Receives raw data (e.g., pixels, text vectors).
- Hidden Layers: One or more layers that extract features through nonlinear transformations.
- Output Layer: Produces final predictions (e.g., classification labels).
Figure: Example of a feedforward deep neural network with two hidden layers.
Specialized architectures include Convolutional Neural Networks (CNNs) for spatial data (e.g., images), Recurrent Neural Networks (RNNs) for sequential data (e.g., language), and Transformers for attention-based processing.
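The layered structure described above can be sketched in a few lines of NumPy. This is a minimal, illustrative forward pass only (no training): the layer sizes, ReLU hidden activations, and softmax output are assumptions chosen for the example, not prescribed by the text.

```python
import numpy as np

def relu(x):
    """ReLU activation: max(0, x) applied element-wise."""
    return np.maximum(0, x)

def softmax(x):
    """Softmax over the last axis, numerically stabilized."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)

# Layer sizes: 4 inputs -> two hidden layers of 8 units -> 3 output classes.
sizes = [4, 8, 8, 3]
weights = [rng.normal(0, 0.1, (m, n)) for m, n in zip(sizes, sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

def forward(x):
    """Forward pass: each hidden layer computes relu(x @ W + b)."""
    for W, b in zip(weights[:-1], biases[:-1]):
        x = relu(x @ W + b)
    # Output layer: softmax turns scores into class probabilities.
    return softmax(x @ weights[-1] + biases[-1])

x = rng.normal(size=(1, 4))   # one example with 4 input features
probs = forward(x)
print(probs.shape)            # (1, 3)
```

In training, the weights would be updated by backpropagation; here they are random, so the output is just a valid probability distribution over the three classes.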
4. Major Models
- LSTM (1997): Long Short-Term Memory networks for sequence modeling.
- LeNet (1998): Early CNN for handwritten digit recognition.
- AlexNet (2012): Deep CNN that revolutionized image classification.
- VGGNet (2014): Demonstrated very deep networks built from small (3x3) filters.
- GoogLeNet / Inception (2014): Introduced inception modules for efficient computation.
- GANs (2014): Generative Adversarial Networks for data generation.
- ResNet (2015): Introduced residual connections, enabling very deep networks.
- AlphaGo (2016): DeepMind's system combining deep learning and reinforcement learning for the game of Go.
- Transformer (2017): Attention-based model, foundation for modern NLP (e.g., BERT, GPT).
- BERT (2018): Bidirectional language model pre-training for NLP.
- GPT Series (2018-): Generative Pre-trained Transformers for language generation.
5. Transformer
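The Transformer (Vaswani et al., 2017) replaces recurrence with self-attention, letting every position in a sequence attend directly to every other position. Its core operation is scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V. A minimal NumPy sketch of this single operation (dimensions and variable names here are illustrative; a full Transformer also has multi-head attention, feedforward sublayers, residual connections, and positional encodings):

```python
import numpy as np

def softmax(x):
    """Softmax over the last axis, numerically stabilized."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-1, -2) / np.sqrt(d_k)  # (seq, seq) similarity
    weights = softmax(scores)                       # each row sums to 1
    return weights @ V, weights                     # weighted mix of values

rng = np.random.default_rng(1)
seq_len, d_model = 5, 16
# In a real model, Q, K, V come from learned projections of the input.
Q = rng.normal(size=(seq_len, d_model))
K = rng.normal(size=(seq_len, d_model))
V = rng.normal(size=(seq_len, d_model))

out, attn = scaled_dot_product_attention(Q, K, V)
print(out.shape)   # (5, 16)
```

The 1/sqrt(d_k) scaling keeps the dot products from growing with dimension, which would otherwise push the softmax into regions with vanishing gradients.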
6. References
- McCulloch, W. S., & Pitts, W. (1943). A logical calculus of the ideas immanent in nervous activity. The Bulletin of Mathematical Biophysics.
- Rosenblatt, F. (1958). The perceptron: a probabilistic model for information storage and organization in the brain. Psychological Review.
- Minsky, M., & Papert, S. (1969). Perceptrons. MIT Press.
- Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning representations by back-propagating errors. Nature.
- LeCun, Y., et al. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE.
- Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. NeurIPS.
- Goodfellow, I., et al. (2014). Generative adversarial nets. NeurIPS.
- Vaswani, A., et al. (2017). Attention is all you need. NeurIPS.
- LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature.