Filters, random fields, and maximum entropy model

In the domain of physics and probability, the filters, random fields, and maximum entropy (FRAME) model[1][2] is a Markov random field model (or a Gibbs distribution) of stationary spatial processes, in which the energy function is a sum of translation-invariant potential functions that are one-dimensional non-linear transformations of linear filter responses. The FRAME model was originally developed by Song-Chun Zhu, Ying Nian Wu, and David Mumford for modeling stochastic texture patterns such as grass, tree leaves, brick walls, and water waves. It is the maximum entropy distribution that reproduces the observed marginal histograms of responses from a bank of filters (such as Gabor filters or Gabor wavelets), where for each filter, tuned to a specific scale and orientation, the marginal histogram is pooled over all pixels in the image domain.

The FRAME model has also been proved equivalent to a micro-canonical ensemble,[3] named the Julesz ensemble. Texture images are synthesized by drawing samples from the FRAME model with a Gibbs sampler.[4]

The original FRAME model is homogeneous and intended for texture modeling. Xie et al. proposed the sparse FRAME model,[5][6] an inhomogeneous generalization for modeling object patterns such as animal bodies and faces. It is a non-stationary Markov random field model that reproduces the observed statistical properties of filter responses at a subset of selected locations, scales, and orientations, and can be regarded as a deformable template.
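Following the notation of the original paper, the FRAME distribution over an image I on domain D can be written as

```latex
p(\mathbf{I};\Lambda) \;=\; \frac{1}{Z(\Lambda)}\,
\exp\Big\{-\sum_{k=1}^{K}\sum_{x\in D}\lambda_k\big((F_k * \mathbf{I})(x)\big)\Big\},
```

where the F_k are the linear filters, the λ_k are learned one-dimensional potential functions (in practice, step functions over histogram bins), and Z(Λ) is the partition function. The λ_k are estimated so that the expected marginal histograms of the filter responses under p match the histograms observed in the training image.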

The deep FRAME model[7][8] is a deep generalization of the original FRAME model. Instead of using linear filters as in the original model, Lu et al. use the filters at a certain convolutional layer of a pre-learned ConvNet.[7] Rather than relying on pre-trained filters from an existing ConvNet, Xie et al. parameterize the energy function of the FRAME model by a ConvNet structure and learn all parameters from scratch.[8] The deep FRAME model is the first framework to integrate modern deep neural networks from deep learning with the Gibbs distribution from statistical physics. Deep FRAME models have been further generalized to model video patterns[9][10] and 3D volumetric shape patterns.[11]
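In deep FRAME models the energy is a ConvNet score f_θ(I), and samples are typically drawn by gradient-based MCMC such as Langevin dynamics. The toy sketch below is illustrative only: the score gradient `f_grad` is a hypothetical closed-form stand-in for the gradient of a ConvNet (which would be obtained by backpropagation), not the method of any specific paper.

```python
import numpy as np

def langevin_sample(f_grad, img_shape, n_steps=2000, step_size=0.05, seed=None):
    """Draw an approximate sample from p(I) ∝ exp(f(I)) via Langevin dynamics.

    f_grad: gradient of the score f with respect to the image. In a deep
    FRAME model f would be a ConvNet and f_grad its backpropagated gradient;
    here it is a user-supplied stand-in.
    """
    rng = np.random.default_rng(seed)
    img = rng.standard_normal(img_shape)  # initialize from white noise
    for _ in range(n_steps):
        noise = rng.standard_normal(img_shape)
        # Langevin update: gradient ascent on f plus Gaussian noise
        img = img + 0.5 * step_size * f_grad(img) + np.sqrt(step_size) * noise
    return img

# Toy stand-in score: f(I) = -||I - mu||^2 / 2, so p is Gaussian around mu.
mu = 0.5
f_grad = lambda img: -(img - mu)

sample = langevin_sample(f_grad, (16, 16), seed=0)
```

With the quadratic toy score, the chain converges to a Gaussian centered at `mu`, so the pixel average of `sample` approaches 0.5; with a ConvNet score, the same update synthesizes images from the learned model.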

References

  1. Zhu, Song-Chun; Wu, Ying Nian; Mumford, David (1998). "Filters, Random Fields and Maximum Entropy (FRAME): Towards a Unified Theory for Texture Modeling". International Journal of Computer Vision 27 (2): 107–126. 
  2. Zhu, Song Chun; Wu, Ying Nian; Mumford, David (November 1997). "Minimax Entropy Principle and Its Application to Texture Modeling". Neural Computation 9 (8): 1627–1660. doi:10.1162/neco.1997.9.8.1627. ISSN 0899-7667. 
  3. Ying Nian Wu; Song Chun Zhu; Xiuwen Liu (1999). "Equivalence of Julesz and Gibbs texture ensembles". Proceedings of the Seventh IEEE International Conference on Computer Vision. IEEE. pp. 1025–1032 vol.2. doi:10.1109/iccv.1999.790382. ISBN 0-7695-0164-8. 
  4. Smith, Grahame B. (1987). "Stuart Geman and Donald Geman, 'Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images'". Readings in Computer Vision. Elsevier. pp. 562–563. doi:10.1016/b978-0-08-051581-6.50056-8. ISBN 978-0-08-051581-6. 
  5. Xie, Jianwen; Hu, Wenze; Zhu, Song-Chun; Wu, Ying Nian (2014-10-02). "Learning Sparse FRAME Models for Natural Image Patterns". International Journal of Computer Vision 114 (2–3): 91–112. doi:10.1007/s11263-014-0757-x. ISSN 0920-5691. 
  6. Xie, Jianwen; Lu, Yang; Zhu, Song-Chun; Wu, Ying Nian (July 2016). "Inducing wavelets into random fields via generative boosting". Applied and Computational Harmonic Analysis 41 (1): 4–25. doi:10.1016/j.acha.2015.08.004. ISSN 1063-5203. 
  7. Lu, Yang; Zhu, Song-Chun; Wu, Ying Nian (2016). "Learning FRAME models using CNN filters". 30th AAAI Conference on Artificial Intelligence 30. doi:10.1609/aaai.v30i1.10238. 
  8. Xie, Jianwen; Lu, Yang; Zhu, Song-Chun; Wu, Ying Nian (2016). "A theory of generative ConvNet". International Conference on Machine Learning. 
  9. Xie, Jianwen; Zhu, Song-Chun; Wu, Ying Nian (July 2017). "Synthesizing Dynamic Patterns by Spatial-Temporal Generative ConvNet". 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE. pp. 1061–1069. doi:10.1109/cvpr.2017.119. ISBN 978-1-5386-0457-1. 
  10. Xie, Jianwen; Zhu, Song-Chun; Wu, Ying Nian (2019). "Learning energy-based spatial-temporal generative ConvNet for dynamic patterns". IEEE Transactions on Pattern Analysis and Machine Intelligence 43 (2): 516–531. doi:10.1109/TPAMI.2019.2934852. PMID 31425020. 
  11. Xie, Jianwen; Zheng, Zilong; Gao, Ruiqi; Wang, Wenguan; Zhu, Song-Chun; Wu, Ying Nian (June 2018). "Learning Descriptor Networks for 3D Shape Synthesis and Analysis". 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. IEEE. pp. 8629–8638. doi:10.1109/cvpr.2018.00900. ISBN 978-1-5386-6420-9.