Deep Learning Neural Networks

An Overview of History, Structure, Major Events, Models, and References

1. History

The concept of artificial neural networks (ANNs) dates back to the 1940s. The first mathematical model of a neuron was the McCulloch-Pitts neuron (1943), which inspired further exploration in computational models of the brain. In the 1950s and 1960s, Frank Rosenblatt introduced the Perceptron, a single-layer neural network, which could learn simple patterns.

However, neural networks faced skepticism after the 1969 publication of "Perceptrons" by Minsky and Papert, which proved limitations of single-layer networks. Specifically, the perceptron was unable to solve problems that were not linearly separable, like the XOR function. This led to a decline in neural network research for over a decade, as many believed the approach was fundamentally limited.
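The linear-separability limitation can be demonstrated directly. The sketch below (a minimal NumPy illustration, not from the original text) trains a classic single-layer perceptron with the standard error-driven update rule: it learns AND, which is linearly separable, but can never classify all four XOR cases correctly.

```python
import numpy as np

def train_perceptron(X, y, epochs=20, lr=0.1):
    """Single-layer perceptron with a step activation and the classic update rule."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            pred = 1 if xi @ w + b > 0 else 0
            w += lr * (yi - pred) * xi   # update weights toward the target
            b += lr * (yi - pred)
    return w, b

def predict(X, w, b):
    return (X @ w + b > 0).astype(int)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y_and = np.array([0, 0, 0, 1])  # linearly separable
y_xor = np.array([0, 1, 1, 0])  # not linearly separable

w, b = train_perceptron(X, y_and)
print("AND accuracy:", (predict(X, w, b) == y_and).mean())  # reaches 1.0

w, b = train_perceptron(X, y_xor)
print("XOR accuracy:", (predict(X, w, b) == y_xor).mean())  # stays below 1.0
```

No line can separate {(0,1), (1,0)} from {(0,0), (1,1)} in the plane, so no choice of `w` and `b` solves XOR; a hidden layer is required.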

The field saw a resurgence in the 1980s with the development of the backpropagation algorithm (Rumelhart, Hinton, and Williams, 1986), enabling the training of multi-layer networks.
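Backpropagation applies the chain rule layer by layer to compute gradients of the loss with respect to every weight. As a minimal sketch (NumPy, illustrative sizes and learning rate chosen for this toy example), a two-layer sigmoid network trained by hand-coded backpropagation can learn XOR, which a single perceptron cannot:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# XOR: solvable by a multi-layer network, not by a single-layer perceptron
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0.0], [1.0], [1.0], [0.0]])

W1 = rng.normal(size=(2, 4)); b1 = np.zeros(4)   # hidden layer: 4 units
W2 = rng.normal(size=(4, 1)); b2 = np.zeros(1)   # output layer

lr = 1.0
losses = []
for step in range(5000):
    # forward pass
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    losses.append(np.mean((out - y) ** 2))

    # backward pass: chain rule through output layer, then hidden layer
    d_out = 2 * (out - y) / len(X) * out * (1 - out)
    dW2 = h.T @ d_out;  db2 = d_out.sum(axis=0)
    d_h = d_out @ W2.T * h * (1 - h)
    dW1 = X.T @ d_h;    db1 = d_h.sum(axis=0)

    # gradient descent update
    W2 -= lr * dW2;  b2 -= lr * db2
    W1 -= lr * dW1;  b1 -= lr * db1

print("initial loss:", losses[0], "final loss:", losses[-1])
```

The `d_h` line is the key step: the error signal at the output is propagated backward through `W2` before being scaled by the hidden layer's activation derivative.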

In the 2000s, increased computational power, large datasets, and algorithmic advances led to the rise of deep learning, referring to neural networks with many layers. Landmark achievements in image recognition and speech recognition have since established deep learning as a dominant approach in AI.

2. Major Events & Contributors

Major events: the publication of the backpropagation paper (1986), gradient-based learning with LeNet for document recognition (1998), AlexNet's breakthrough on ImageNet (2012), the introduction of generative adversarial networks (2014), and the Transformer architecture (2017).

Key contributors: Geoffrey Hinton, Yann LeCun, Yoshua Bengio, Andrew Ng, Demis Hassabis, Ian Goodfellow, Ilya Sutskever, Fei-Fei Li, and many others.

3. Structure of Deep Learning Neural Networks

Deep learning neural networks are composed of multiple layers of interconnected nodes ("neurons"). Each layer transforms its input data using learned weights and activation functions, passing the result to the next layer. The basic structure includes:

Figure: Example of a feedforward deep neural network with two hidden layers (input layer, hidden layers, output layer).
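The layer-by-layer transformation described above can be sketched in a few lines. This is a minimal illustrative forward pass (NumPy, random weights, sizes chosen arbitrarily), not a trained model:

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def forward(x, layers):
    """Pass the input through each (weights, bias, activation) layer in turn."""
    for W, b, act in layers:
        x = act(x @ W + b)
    return x

rng = np.random.default_rng(42)
# 3 inputs -> two hidden layers (5 and 4 units) -> 2 outputs
layers = [
    (rng.normal(size=(3, 5)), np.zeros(5), relu),
    (rng.normal(size=(5, 4)), np.zeros(4), relu),
    (rng.normal(size=(4, 2)), np.zeros(2), lambda z: z),  # linear output layer
]

x = rng.normal(size=(1, 3))          # one example with 3 features
print(forward(x, layers).shape)      # (1, 2)
```

Each layer is just a learned affine map followed by a nonlinearity; stacking many such layers is what makes the network "deep".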

Specialized architectures include Convolutional Neural Networks (CNNs) for spatial data (e.g., images), Recurrent Neural Networks (RNNs) for sequential data (e.g., language), and Transformers for attention-based processing.

4. Major Models

Landmark models include LeNet for document recognition, AlexNet for large-scale image classification, generative adversarial networks (GANs) for data generation, and the Transformer for sequence modeling.

5. Transformer

Figure: Transformer model overview. The transformer consists of an encoder stack and a decoder stack, each built around self-attention layers. The self-attention mechanism allows the model to weigh the importance of different words in the input sequence, enabling more effective learning of context and relationships.

Self-Attention: Each word in a sequence such as "I love AI models" can "attend" to every other word, allowing the model to dynamically capture relationships regardless of position.
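Scaled dot-product self-attention, as introduced in "Attention Is All You Need", can be written compactly. The sketch below is illustrative only (random toy embeddings stand in for learned ones); it computes, for each token, a softmax-weighted mixture of all token values:

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence of token vectors."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])         # pairwise similarities
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)  # softmax over each row
    return weights @ V, weights                    # mix values by attention weight

rng = np.random.default_rng(0)
d = 8
tokens = ["I", "love", "AI", "models"]
X = rng.normal(size=(len(tokens), d))              # toy embeddings, one row per token
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

out, weights = self_attention(X, Wq, Wk, Wv)
print(out.shape)            # (4, 8): one contextualized vector per token
print(weights.sum(axis=1))  # each attention row sums to 1
```

Row i of `weights` says how much token i attends to every token in the sequence, which is exactly the position-independent relationship-capturing described above.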

6. References

  1. McCulloch, W. S., & Pitts, W. (1943). A logical calculus of the ideas immanent in nervous activity. The Bulletin of Mathematical Biophysics.
  2. Rosenblatt, F. (1958). The perceptron: a probabilistic model for information storage and organization in the brain. Psychological Review.
  3. Minsky, M., & Papert, S. (1969). Perceptrons. MIT Press.
  4. Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning representations by back-propagating errors. Nature.
  5. LeCun, Y., et al. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE.
  6. Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. NeurIPS.
  7. Goodfellow, I., et al. (2014). Generative adversarial nets. NeurIPS.
  8. Vaswani, A., et al. (2017). Attention is all you need. NeurIPS.
  9. LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature.