LeNet

Short description: Convolutional neural network structure

LeNet is a convolutional neural network structure proposed by LeCun et al. in 1998.[1] In general, LeNet refers to LeNet-5, a simple convolutional neural network. Convolutional neural networks are a kind of feed-forward neural network in which each artificial neuron responds only to inputs within a local region (its receptive field); they perform well in large-scale image processing.

Development history

LeNet-5 was one of the earliest convolutional neural networks and promoted the development of deep learning. The line of work began in 1988; after years of research and many successful iterations, the resulting network was named LeNet-5.

In 1989, Yann LeCun et al. at Bell Labs first applied the backpropagation algorithm to a practical problem, arguing that a network's ability to generalize could be greatly enhanced by building in constraints from the task's domain. They trained a convolutional neural network with backpropagation to read handwritten digits and applied it successfully to handwritten zip code digits provided by the US Postal Service. This was the prototype of what later came to be called LeNet.[2] In the same year, LeCun described a small handwritten digit recognition problem in another paper and showed that, even though the problem is linearly separable, single-layer networks generalized poorly. A multi-layered, constrained network using shift-invariant feature detectors, by contrast, performed very well. He concluded that minimizing the number of free parameters in a neural network could enhance its generalization ability.[3]

In 1990, a further paper by the group again described the application of backpropagation networks to handwritten digit recognition. The data received only minimal preprocessing, while the model was carefully designed for the task and highly constrained. Each input image contained a single digit, and tests on the zip code digit data provided by the US Postal Service showed an error rate of only 1% with a rejection rate of about 9%.[4]

Their research continued for the next four years. In 1994 the MNIST database was developed; LeNet-1 was too small for it, so a new network, LeNet-4, was trained on it.[5] A year later the AT&T Bell Labs group introduced LeNet-5 and, in a paper, reviewed various methods for handwritten character recognition, comparing them on a standard handwritten digit recognition benchmark. The results showed that the latest network outperformed the other models.[6] By 1998 Yann LeCun, Leon Bottou, Yoshua Bengio, and Patrick Haffner were able to provide examples of practical applications of neural networks, such as two systems for recognizing handwritten characters online and models that could read millions of checks per day.[1]

The research achieved great success and aroused scholars' interest in neural networks. Although the architectures of today's best-performing neural networks differ from that of LeNet, the network was the starting point for a large number of neural network architectures and brought inspiration to the field.

Timeline
1989: Yann LeCun et al. proposed the original form of LeNet. LeCun, Y.; Boser, B.; Denker, J. S.; Henderson, D.; Howard, R. E.; Hubbard, W. & Jackel, L. D. (1989). Backpropagation applied to handwritten zip code recognition. Neural Computation, 1(4): 541-551.[2]
1989: Yann LeCun argued that minimizing the number of free parameters in neural networks can enhance their generalization ability. LeCun, Y. (1989). Generalization and network design strategies. Technical Report CRG-TR-89-4, Department of Computer Science, University of Toronto.[3]
1990: A further paper again described the application of backpropagation networks to handwritten digit recognition. LeCun, Y.; Boser, B.; Denker, J. S.; Henderson, D.; Howard, R. E.; Hubbard, W. & Jackel, L. D. (1990). Handwritten digit recognition with a back-propagation network. Advances in Neural Information Processing Systems 2 (NIPS*89).[4]
1994: MNIST database and LeNet-4 developed.
1995: LeNet-5 developed; various methods for handwritten character recognition were reviewed and compared on a standard handwritten digit recognition benchmark. The results showed that convolutional neural networks outperformed all other models.
1998: Practical applications. LeCun, Y.; Bottou, L.; Bengio, Y. & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11): 2278-2324.[1]

Structure

As a representative early convolutional neural network, LeNet contains the basic building blocks of convolutional neural networks (convolutional layers, pooling layers, and fully connected layers), laying a foundation for their later development. As shown in the figure (the input is an image of 32x32 pixels), LeNet-5 consists of seven layers. Not counting the input, every layer has trainable parameters. In the figure, Cx denotes a convolutional layer, Sx a subsampling layer, and Fx a fully connected layer, where x is the layer index.[2][7][8]

Layer C1 is a convolutional layer with six convolution kernels of size 5x5. Its feature maps are 28x28: because the 32x32 input is not padded, a 5x5 kernel fits at only 28 positions along each axis, which keeps the kernel from extending beyond the boundary of the input image.

Layer S2 is a subsampling (pooling) layer that outputs six feature maps of size 14x14. Each unit in each feature map is connected to a 2x2 neighborhood in the corresponding feature map in C1.

Layer C3 is a convolutional layer with 16 convolution kernels of size 5x5. Each of the first six C3 feature maps takes its input from a different contiguous subset of three feature maps in S2, each of the next six from a contiguous subset of four, and each of the next three from a non-contiguous subset of four. The last feature map takes its input from all feature maps of S2.
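
The connection scheme can be made concrete with a short Python sketch. It is an illustration under stated assumptions rather than the authors' code: the S2 maps are indexed 0 to 5, contiguous subsets are assumed to wrap around (otherwise six maps do not yield six distinct contiguous subsets), and the three non-contiguous subsets are assumed to follow the connection table of the 1998 paper.[1] All function names are hypothetical.

```python
NUM_S2_MAPS = 6  # number of feature maps produced by S2
KERNEL = 5       # C3 uses 5x5 kernels


def contiguous(start, length, n=NUM_S2_MAPS):
    """Indices of `length` consecutive S2 maps from `start`, wrapping around (assumption)."""
    return tuple(sorted((start + i) % n for i in range(length)))


def c3_connections():
    """For each of the 16 C3 maps, the S2 maps it reads from."""
    table = [contiguous(s, 3) for s in range(6)]    # maps 0-5: three contiguous inputs
    table += [contiguous(s, 4) for s in range(6)]   # maps 6-11: four contiguous inputs
    # maps 12-14: four non-contiguous inputs (assumed to match the 1998 paper)
    table += [(0, 1, 3, 4), (1, 2, 4, 5), (0, 2, 3, 5)]
    table.append(tuple(range(NUM_S2_MAPS)))         # map 15: all six inputs
    return table


def c3_trainable_parameters(table, kernel=KERNEL):
    """One kernel x kernel filter per connected input map, plus one bias per C3 map."""
    return sum(len(inputs) * kernel * kernel + 1 for inputs in table)


if __name__ == "__main__":
    table = c3_connections()
    for index, inputs in enumerate(table):
        print(index, inputs)
    print("trainable parameters:", c3_trainable_parameters(table))
```

Under these assumptions the sparse scheme needs 1,516 trainable parameters, compared with 2,416 if every C3 map were connected to all six S2 maps.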

Layer S4 is similar to S2: it subsamples over 2x2 neighborhoods and outputs 16 feature maps of size 5x5.

Layer C5 is a convolutional layer with 120 convolution kernels of size 5x5. Each unit is connected to the 5x5 neighborhood on all 16 feature maps of S4. Because the S4 feature maps are themselves 5x5, the output of C5 is 1x1, so S4 and C5 are in effect fully connected. C5 is nevertheless labeled a convolutional layer rather than a fully connected layer: if the input to LeNet-5 were larger and the structure unchanged, the C5 output would be larger than 1x1 and the layer would no longer be fully connected.

Layer F6 is fully connected to C5 and has 84 output units.
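
The feature-map sizes quoted above follow from simple arithmetic, which the following Python sketch traces. It assumes unpadded, stride-1 convolutions and non-overlapping 2x2 subsampling, consistent with the sizes given in the text; the table `LAYERS` and the function `lenet5_shapes` are hypothetical names introduced for illustration.

```python
# (layer name, kind, number of feature maps, kernel or window size), as listed in the text
LAYERS = [
    ("C1", "conv", 6, 5),
    ("S2", "pool", 6, 2),
    ("C3", "conv", 16, 5),
    ("S4", "pool", 16, 2),
    ("C5", "conv", 120, 5),
]


def lenet5_shapes(input_size=32):
    """Return {layer: (maps, height, width)} for a square input image."""
    size = input_size
    shapes = {}
    for name, kind, maps, k in LAYERS:
        if kind == "conv":
            size = size - k + 1   # unpadded, stride-1 convolution
        else:
            size = size // k      # non-overlapping k x k subsampling
        shapes[name] = (maps, size, size)
    return shapes


if __name__ == "__main__":
    for name, shape in lenet5_shapes().items():
        print(name, shape)
    # A larger input leaves C5 with more than one unit per map.
    print("C5 for a 36x36 input:", lenet5_shapes(36)["C5"])
```

With a 32x32 input the sizes are 28, 14, 10, 5, and 1 for C1 through C5. With a 36x36 input, C5 would produce 2x2 maps, which illustrates why it is described as a convolutional layer rather than a fully connected one.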

Features

  • Every convolutional layer includes three parts: convolution, pooling, and a nonlinear activation function
  • Convolution to extract spatial features (convolution was originally described in terms of receptive fields)
  • Subsampling by average pooling
  • tanh activation function
  • A multilayer perceptron (MLP) as the final classifier
  • Sparse connections between layers to reduce computational complexity
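
The first four features can be illustrated with a dependency-free Python sketch of a single convolution, pooling, and activation stage. It is a simplified illustration, not the original implementation: the weights are random placeholders, the pooling is a plain average without trainable coefficients, the kernel is applied without flipping (as in most CNN code), and all function names are hypothetical.

```python
import math
import random


def conv2d_valid(image, kernel, bias=0.0):
    """Unpadded, stride-1 convolution (no kernel flip) of one 2-D image with one kernel."""
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    return [
        [
            bias + sum(image[r + i][c + j] * kernel[i][j]
                       for i in range(k) for j in range(k))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def average_pool(fmap, window=2):
    """Non-overlapping window x window average pooling of one feature map."""
    out_h = len(fmap) // window
    out_w = len(fmap[0]) // window
    area = window * window
    return [
        [
            sum(fmap[r * window + i][c * window + j]
                for i in range(window) for j in range(window)) / area
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def conv_block(image, kernels, biases):
    """Convolution, then pooling, then tanh, in the order listed above."""
    maps = []
    for kernel, bias in zip(kernels, biases):
        pooled = average_pool(conv2d_valid(image, kernel, bias))
        maps.append([[math.tanh(v) for v in row] for row in pooled])
    return maps


def random_kernels(count, size, seed=0):
    """Placeholder weights; a trained network would learn these."""
    rng = random.Random(seed)
    return [[[rng.uniform(-0.1, 0.1) for _ in range(size)] for _ in range(size)]
            for _ in range(count)]


if __name__ == "__main__":
    rng = random.Random(1)
    image = [[rng.random() for _ in range(32)] for _ in range(32)]
    maps = conv_block(image, random_kernels(6, 5), [0.0] * 6)
    print(len(maps), len(maps[0]), len(maps[0][0]))  # 6 14 14
```

Applied to a 32x32 image with six 5x5 kernels, the stage yields six 14x14 maps, matching the sizes of C1 and S2 in the structure described earlier.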

Application

The classic application of LeNet is recognizing simple digit images, the task it was created for.

Yann LeCun et al. created the initial form of LeNet in 1989. The paper Backpropagation Applied to Handwritten Zip Code Recognition[2] demonstrates how constraints from the task domain can be built into a backpropagation network through its architecture. The network was successfully applied to the recognition of handwritten zip code digits provided by the U.S. Postal Service.[2]

Development analysis

LeNet-5 marked the emergence of the CNN and defined its basic components.[1] It was not popular at the time, however, because suitable hardware, especially GPUs, was lacking and because other algorithms, such as SVMs, could match or even exceed LeNet's performance.

Since the success of AlexNet in 2012, CNNs have become the preferred choice for computer vision applications, and many different types of CNN have been created, such as the R-CNN series. Today's CNN models are quite different from LeNet, but they all build on it.

A three-layer tree architecture that imitates LeNet-5 and consists of only one convolutional layer has achieved a similar success rate on the CIFAR-10 dataset.[9]

Increasing the number of filters in the LeNet architecture results in a power-law decay of the error rate. These results indicate that a shallow network can achieve the same performance as deep learning architectures.[10]

References

  1. LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. (1998). "Gradient-based learning applied to document recognition". Proceedings of the IEEE 86 (11): 2278–2324. doi:10.1109/5.726791. http://yann.lecun.com/exdb/publis/pdf/lecun-01a.pdf. 
  2. LeCun, Y.; Boser, B.; Denker, J. S.; Henderson, D.; Howard, R. E.; Hubbard, W.; Jackel, L. D. (December 1989). "Backpropagation Applied to Handwritten Zip Code Recognition". Neural Computation 1 (4): 541–551. doi:10.1162/neco.1989.1.4.541. ISSN 0899-7667. 
  3. LeCun, Yann (June 1989). "Generalization and network design strategies". Technical Report CRG-TR-89-4 (Department of Computer Science, University of Toronto). http://yann.lecun.com/exdb/publis/pdf/lecun-89.pdf. 
  4. LeCun, Y.; Boser, B.; Denker, J. S.; Henderson, D.; Howard, R. E.; Hubbard, W.; Jackel, L. D. (June 1990). "Handwritten digit recognition with a back-propagation network". Advances in Neural Information Processing Systems 2: 396–404. http://yann.lecun.com/exdb/publis/pdf/lecun-90c.pdf. 
  5. http://yann.lecun.com/exdb/publis/pdf/bottou-94.pdf
  6. https://www.eecis.udel.edu/~shatkay/Course/papers/NetworksAndCNNClasifiersIntroVapnik95.pdf
  7. "卷积神经网络之LeNet - Brook_icv - 博客园" (in zh-cn). https://www.cnblogs.com/wangguchangqing/p/10329402.html. 
  8. "深度学习 CNN 卷积神经网络 LeNet-5 详解" (in zh-CN). https://blog.csdn.net/happyorg/article/details/78274066. 
  9. Meir, Yuval; Ben-Noam, Itamar; Tzach, Yarden; Hodassman, Shiri; Kanter, Ido (2023-01-30). "Learning on tree architectures outperforms a convolutional feedforward network" (in en). Scientific Reports 13 (1): 962. doi:10.1038/s41598-023-27986-6. ISSN 2045-2322. PMID 36717568. Bibcode: 2023NatSR..13..962M. 
  10. Meir, Yuval; Tevet, Ofek; Tzach, Yarden; Hodassman, Shiri; Gross, Ronit D.; Kanter, Ido (2023-04-20). "Efficient shallow learning as an alternative to deep learning" (in en). Scientific Reports 13 (1): 5423. doi:10.1038/s41598-023-32559-8. ISSN 2045-2322. PMID 37080998. Bibcode: 2023NatSR..13.5423M.