MgNet

MgNet[1] is an abstract and unified mathematical framework that simultaneously recovers some residual neural network (ResNet)[2][3] type convolutional neural networks (CNNs) and multigrid methods[4][5] for solving discretized partial differential equations (PDEs). As a CNN model, MgNet can be obtained by making very minor modifications to a classic geometric multigrid method. Connections between ResNet and classical multigrid methods were already acknowledged in the original ResNet paper[2] from the viewpoint of how residuals are used in both methods. MgNet[1] makes this connection more direct and explicit: a class of efficient CNN models can be obtained by making only minor modifications to a typical multigrid cycle while keeping the same algorithmic structure.

Main structure and connections with ResNet

One core concept in MgNet, motivated by research on algebraic multigrid methods,[5] is the distinction between the so-called data and feature spaces (which are dual to each other). Based on this concept, MgNet and a follow-up work (He, Juncai; Chen, Yuyan; Xu, Jinchao (2019). "Constrained Linear Data-feature Mapping for Image Classification". arXiv:1911.10428v1 [eess.IV]) propose the constrained data-feature mapping model on every grid as

[math]\displaystyle{ A \ast u=f, }[/math]

where [math]\displaystyle{ f }[/math] belongs to the data space and [math]\displaystyle{ u }[/math] belongs to the feature space such that

[math]\displaystyle{ u \ge 0 }[/math].
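
For concreteness, the mapping [math]\displaystyle{ A }[/math] can be viewed as a convolution acting on a nonnegative feature tensor. The following minimal PyTorch-style sketch illustrates the relation; the channel count, grid size and kernel size are illustrative assumptions, not values from the paper.

<syntaxhighlight lang="python">
import torch
import torch.nn.functional as F

# Illustrative shapes only (assumptions, not from the paper):
# 64 channels on a 32x32 grid, 3x3 data-feature kernel A.
A = torch.randn(64, 64, 3, 3)                 # data-feature mapping kernel A
u = torch.relu(torch.randn(1, 64, 32, 32))    # feature u, satisfying u >= 0
f = F.conv2d(u, A, padding=1)                 # data f = A * u
</syntaxhighlight>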

The feature extraction process can then be obtained through an iterative procedure for solving the above system on each grid. For example, if a single-step residual correction scheme is applied to the above system, it becomes

[math]\displaystyle{ u^{i} = u^{i-1} + \sigma \circ B^{i} \ast \sigma(f - A\ast u^{i-1}), \quad i = 1:\nu, }[/math]

with [math]\displaystyle{ u \approx u^{\nu} }[/math].
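
A minimal PyTorch sketch of this residual correction iteration is given below. The use of 3×3 kernels and the assumption that the data and feature tensors share the same shape are simplifications made here for brevity, not prescriptions from the paper.

<syntaxhighlight lang="python">
import torch
import torch.nn.functional as F

def feature_extraction(f, A, B):
    """Sketch of the single-step residual correction (smoothing) iteration
    u^i = u^{i-1} + sigma(B^i * sigma(f - A * u^{i-1})),  i = 1..nu,
    assuming A and the kernels in B are 3x3 convolutions and that the data
    and feature tensors have the same shape (a simplifying assumption)."""
    u = torch.zeros_like(f)                            # initial guess u^0 = 0
    for B_i in B:                                      # nu = len(B) smoothing steps
        r = F.relu(f - F.conv2d(u, A, padding=1))      # sigma(f - A * u^{i-1})
        u = u + F.relu(F.conv2d(r, B_i, padding=1))    # residual correction step
    return u                                           # u ~ u^nu
</syntaxhighlight>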

If the residual of the above iteration, [math]\displaystyle{ r^i = f - A\ast u^i }[/math], is considered instead, then applying [math]\displaystyle{ A\ast }[/math] to the update and subtracting the result from [math]\displaystyle{ f }[/math] gives

[math]\displaystyle{ r^{i} = r^{i-1} - A\ast \sigma \circ B^i\ast\sigma(r^{i-1}), \quad i=1:\nu. }[/math]

This is almost exactly the basic block scheme of pre-activation ResNet (Pre-act ResNet),[3] which has the form

[math]\displaystyle{ r^{i} = r^{i-1} - A^i \ast \sigma \circ B^i\ast\sigma(r^{i-1}), \quad i=1:\nu. }[/math]
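
For comparison, the following sketch shows a pre-activation ResNet basic block with the same two-convolution, two-activation pattern. It is an illustration rather than the reference ResNet implementation: batch normalization is omitted, and the sign of [math]\displaystyle{ A^i }[/math] is absorbed into the learnable kernel.

<syntaxhighlight lang="python">
import torch.nn as nn

class PreActBlock(nn.Module):
    """Sketch of a pre-activation ResNet basic block, matching
    r^i = r^{i-1} - A^i * sigma(B^i * sigma(r^{i-1})) up to the sign of A^i,
    which is absorbed into the learnable kernel; batch normalization is
    omitted to keep the correspondence with the formula explicit."""
    def __init__(self, channels):
        super().__init__()
        self.B = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.A = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.act = nn.ReLU()

    def forward(self, r):
        return r + self.A(self.act(self.B(self.act(r))))
</syntaxhighlight>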

MgNet is summarized as Algorithm 1 in the original paper:[1] the above feature extraction iteration is performed on each grid, after which features and data are transferred to the next coarser grid.
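
The following sketch outlines such a cycle in the spirit of Algorithm 1. The uniform channel count across grids, the 3×3 kernels, the use of stride-2 convolutions as grid-transfer operators, and the specific form of the coarse-grid data are simplifying assumptions made here for illustration; the exact operator choices are given in the original paper.[1]

<syntaxhighlight lang="python">
import torch
import torch.nn.functional as F

def mgnet(f, A, B, Pi, R):
    """Hedged sketch of an MgNet cycle. Assumed setup (for illustration only):
    every grid uses the same channel count; A[l] is the data-feature kernel
    on grid l, B[l] a list of smoothing kernels, and Pi[l], R[l] are stride-2
    convolutions that move features and data to the next coarser grid l+1."""
    J = len(A)
    u = torch.zeros_like(f)                                    # u^{1,0} = 0
    for l in range(J):
        for B_li in B[l]:                                      # feature extraction (smoothing) on grid l
            r = F.relu(f - F.conv2d(u, A[l], padding=1))
            u = u + F.relu(F.conv2d(r, B_li, padding=1))
        if l < J - 1:                                          # transfer to the coarser grid
            r = f - F.conv2d(u, A[l], padding=1)               # data residual on grid l
            u = F.conv2d(u, Pi[l], stride=2, padding=1)        # u^{l+1,0} = Pi * u^l
            f = F.conv2d(r, R[l], stride=2, padding=1) \
                + F.conv2d(u, A[l + 1], padding=1)             # coarse-grid data f^{l+1}
    return u                                                   # features on the coarsest grid
</syntaxhighlight>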

It is important to note that MgNet (Algorithm 1) becomes identical to a multigrid cycle[4][5] if the nonlinear operations [math]\displaystyle{ \sigma }[/math] (the boxed operations in the original pseudocode) are removed from the algorithm.

Summary

By revealing such a direct connection between CNNs and multigrid methods, MgNet opens a new door to the design and study of deep learning models from a more mathematical viewpoint; in particular, the rich mathematical techniques developed for multigrid methods can be applied to the study of deep learning.

References

  1. He, Juncai; Xu, Jinchao (July 2019). "MgNet: A unified framework of multigrid and convolutional neural network". Science China Mathematics 62 (7): 1331–1354. doi:10.1007/s11425-019-9547-2. ISSN 1674-7283.
  2. He, Kaiming; Zhang, Xiangyu; Ren, Shaoqing; Sun, Jian (2015-12-10). "Deep Residual Learning for Image Recognition". arXiv:1512.03385v1 [cs.CV].
  3. He, Kaiming; Zhang, Xiangyu; Ren, Shaoqing; Sun, Jian (2016-03-16). "Identity Mappings in Deep Residual Networks". arXiv:1603.05027v3 [cs.CV].
  4. Xu, Jinchao (1992-12-01). "Iterative Methods by Space Decomposition and Subspace Correction". SIAM Review 34 (4): 581–613. doi:10.1137/1034116. ISSN 0036-1445.
  5. Xu, Jinchao; Zikatanov, Ludmil (May 2017). "Algebraic multigrid methods". Acta Numerica 26: 591–721. doi:10.1017/S0962492917000083. ISSN 0962-4929. https://www.cambridge.org/core/journals/acta-numerica/article/algebraic-multigrid-methods/8FFBCDA39DB9631667396C9CD1F223BF.