# Energy-based model

An **energy-based model** (EBM) is a form of generative model (GM) imported directly from statistical physics into machine learning. GMs learn an underlying data distribution by analyzing a sample dataset. Once trained, a GM can produce other datasets that also match the data distribution.^{[1]} EBMs provide a unified framework for many probabilistic and non-probabilistic approaches to such learning, particularly for training graphical and other structured models.^{[2]}

An EBM learns the characteristics of a target dataset, including its latent variables, and generates new, possibly larger datasets with a similar distribution.^{[2]}

Target applications include natural language processing, robotics and computer vision.^{[2]}

## History

The term "energy-based models" was coined in a 2003 JMLR paper,^{[3]} in which the authors used EBMs to generalise independent component analysis to the overcomplete setting. Other early work on EBMs, also from 2003, proposed models that represented energy as a composition of latent and observable variables.^{[4]}

## Approach

EBMs capture dependencies by associating an unnormalized probability scalar (the *energy*) with each configuration of the combined observed and latent variables. Inference consists of finding the values of the latent variables that minimize the energy given the values of the observed variables. Learning, correspondingly, consists of finding an energy function that assigns low energies to correct values of the latent variables and higher energies to incorrect ones.^{[2]}
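Inference by energy minimization can be illustrated with a minimal sketch. The quadratic `energy` function, the weight `w`, and the grid of candidate values are all hypothetical choices made for illustration; they are not taken from any of the cited sources.

```python
import numpy as np

def energy(x, y, w=2.0):
    """Hypothetical energy: low when the latent value y is close to w * x."""
    return (y - w * x) ** 2

def infer(x, candidates):
    """Return the candidate latent value with the lowest energy given x."""
    energies = np.array([energy(x, y) for y in candidates])
    return candidates[int(np.argmin(energies))]

# Grid of possible latent values (an assumption; real models search
# a much larger space, usually with gradient-based methods).
candidates = np.linspace(-5.0, 5.0, 101)
y_star = infer(1.5, candidates)
```

Here `y_star` is the grid point closest to `w * x`, because that is where this toy energy is lowest. In a trained EBM the energy function would be learned rather than written by hand.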

Traditional EBMs rely on stochastic gradient descent (SGD) optimization methods, which are typically hard to apply to high-dimensional datasets. In 2019, OpenAI publicized a variant that instead used Langevin dynamics (LD). LD is an iterative optimization algorithm that adds noise to the estimator while learning an objective function. Because it produces samples from a posterior distribution, it can be used in Bayesian learning scenarios.^{[2]}
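The noisy gradient update behind Langevin dynamics can be sketched as follows. This is a toy illustration, not the OpenAI implementation: the energy `E(x) = x**2 / 2`, the step size, and the number of chains and steps are all assumptions. Each step moves downhill on the energy and then adds Gaussian noise, so the chains end up scattered around the low-energy region instead of collapsing onto the minimum.

```python
import numpy as np

def grad_energy(x):
    """Gradient of the assumed toy energy E(x) = x**2 / 2."""
    return x

def langevin_sample(n_chains=5000, n_steps=500, step=0.05, seed=0):
    """Run independent unadjusted Langevin chains and return their final states."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-3.0, 3.0, size=n_chains)  # arbitrary starting points
    for _ in range(n_steps):
        noise = rng.standard_normal(n_chains)
        # Gradient step on the energy plus injected Gaussian noise.
        x = x - 0.5 * step * grad_energy(x) + np.sqrt(step) * noise
    return x

samples = langevin_sample()
```

For this particular energy the target distribution is a standard normal, so the mean and variance of `samples` should be close to 0 and 1, up to a small bias caused by the finite step size.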

EBMs do not require that energies be normalized as probabilities; in other words, energies do not need to sum to 1. Because the normalization constant does not have to be estimated, as it does in probabilistic models, certain forms of inference and learning are more tractable and flexible with EBMs.^{[2]}
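One common convention for relating energies to probabilities, given here as background and not taken from the cited sources, is the Boltzmann (Gibbs) form. The symbols $E_\theta$ (energy function with parameters $\theta$) and $Z(\theta)$ (normalization constant) are notation introduced for this illustration:

$$
p_\theta(x) = \frac{\exp\bigl(-E_\theta(x)\bigr)}{Z(\theta)}, \qquad Z(\theta) = \int \exp\bigl(-E_\theta(x)\bigr)\,dx
$$

Under this convention, comparing two configurations requires only their energy difference, because the normalization constant cancels:

$$
\frac{p_\theta(x_1)}{p_\theta(x_2)} = \exp\bigl(E_\theta(x_2) - E_\theta(x_1)\bigr)
$$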

Samples are generated implicitly via a Markov chain Monte Carlo approach.^{[5]} A replay buffer of past images is used with LD to initialize the optimization module.^{[2]}
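A replay buffer combined with short Langevin runs might look like the sketch below. It is an illustration under stated assumptions, not the published training code: the toy energy `E(x) = x**2 / 2` stands in for a trained network, scalar samples stand in for images, and `batch_size`, `reuse_prob`, and the step counts are hypothetical settings.

```python
import numpy as np

rng = np.random.default_rng(0)

def grad_energy(x):
    """Gradient of an assumed toy energy E(x) = x**2 / 2 (not a trained model)."""
    return x

def short_langevin(x, n_steps=20, step=0.05):
    """Refine samples with a few noisy gradient steps on the energy."""
    for _ in range(n_steps):
        noise = rng.standard_normal(x.shape)
        x = x - 0.5 * step * grad_energy(x) + np.sqrt(step) * noise
    return x

buffer = rng.uniform(-3.0, 3.0, size=1000)  # replay buffer of past samples
batch_size, reuse_prob = 64, 0.95           # hypothetical settings

for _ in range(200):
    idx = rng.integers(0, buffer.size, size=batch_size)
    fresh = rng.uniform(-3.0, 3.0, size=batch_size)
    reuse = rng.random(batch_size) < reuse_prob
    # Most chains restart from stored samples; a few restart from noise.
    init = np.where(reuse, buffer[idx], fresh)
    # Write the refined samples back so later rounds continue from them.
    buffer[idx] = short_langevin(init)
```

Restarting from stored samples lets each chain accumulate many more refinement steps over time than a single short run provides, while the occasional fresh start keeps the buffer from becoming stale.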

## Characteristics

EBMs demonstrate useful properties:^{[2]}

- Simplicity and stability: The EBM is the only object that needs to be designed and trained. Separate networks need not be trained to ensure balance.
- Adaptive computation time: An EBM can generate sharp, diverse samples or (more quickly) coarse, less diverse samples. Given infinite time, this procedure produces true samples.^{[1]}
- Flexibility: In variational autoencoders (VAEs) and flow-based models, the generator learns a map from a continuous space to a (possibly) discontinuous space containing different data modes. EBMs can learn to assign low energies to disjoint regions (multiple modes).
- Adaptive generation: EBM generators are implicitly defined by the probability distribution, and automatically adapt as the distribution changes (without training), allowing EBMs to address domains where generator training is impractical, as well as minimizing mode collapse and avoiding spurious modes from out-of-distribution samples.^{[5]}
- Compositionality: Individual models are unnormalized probability distributions, allowing models to be combined through product of experts or other hierarchical techniques.
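
The compositionality property can be illustrated with a small sketch. Multiplying two unnormalized densities of the form `exp(-E)` is the same as adding their energies, so a product of experts needs no renormalization before it is used. The two quadratic energies below are hypothetical examples, not models from the cited work.

```python
import numpy as np

def energy_a(x):
    """Hypothetical expert A: prefers values near 1."""
    return 0.5 * (x - 1.0) ** 2

def energy_b(x):
    """Hypothetical expert B: prefers values near 3."""
    return 0.5 * (x - 3.0) ** 2

def product_energy(x):
    """Energy of the product of experts: exp(-Ea) * exp(-Eb) = exp(-(Ea + Eb))."""
    return energy_a(x) + energy_b(x)

grid = np.linspace(-2.0, 6.0, 801)
unnormalized = np.exp(-product_energy(grid))
# The combined model is most confident where both experts agree best.
mode = grid[int(np.argmax(unnormalized))]
```

With these two equally weighted experts, `mode` falls midway between their preferred values.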

## Experimental results

On image datasets such as CIFAR-10 and ImageNet 32x32, an EBM generated high-quality images relatively quickly. It supported combining features learned from one type of image to generate other types of images. It generalized to out-of-distribution datasets, outperforming flow-based and autoregressive models. The EBM was also relatively resistant to adversarial perturbations, behaving better than classification models explicitly trained against them.^{[2]}

## Alternatives

EBMs compete with techniques such as variational autoencoders (VAEs) and generative adversarial networks (GANs).^{[2]}

## References

1. "Implicit Generation and Generalization Methods for Energy-Based Models". 2019-03-21. https://openai.com/blog/energy-based-models/.
2. Rodriguez, Jesus (2019-04-01). "Generating Training Datasets Using Energy Based Models that Actually Scale". https://towardsdatascience.com/generating-training-datasets-using-energy-based-models-that-actually-scale-4e1f83bb9e00.
3. Teh, Yee Whye; Welling, Max; Osindero, Simon; Hinton, Geoffrey E. (December 2003). "Energy-Based Models for Sparse Overcomplete Representations". *JMLR*. https://www.jmlr.org/papers/v4/teh03a.html.
4. LeCun, Yann (September 2003). "CBLL, Research Projects, Computational and Biological Learning Lab, Courant Institute, NYU". https://cs.nyu.edu/~yann/research/ebm/.
5. Du, Yilun; Mordatch, Igor (2019-03-20). "Implicit Generation and Generalization in Energy-Based Models". arXiv:1903.08689 [cs.LG].

## External links

- "CIAR NCAP Summer School". http://www.cs.toronto.edu/~vnair/ciar/.
- Dayan, Peter; Hinton, Geoffrey; Neal, Radford; Zemel, Richard S. (1999), "Helmholtz Machine", *Unsupervised Learning* (The MIT Press), doi:10.7551/mitpress/7011.003.0017, ISBN 978-0-262-28803-3.
- Hinton, Geoffrey E. (August 2002). "Training Products of Experts by Minimizing Contrastive Divergence". *Neural Computation* **14** (8): 1771–1800. doi:10.1162/089976602760128018. ISSN 0899-7667. PMID 12180402.
- Salakhutdinov, Ruslan; Hinton, Geoffrey (2009-04-15). "Deep Boltzmann Machines". *Artificial Intelligence and Statistics*: 448–455. http://proceedings.mlr.press/v5/salakhutdinov09a.html.