# Restricted Boltzmann machine

__: Class of artificial neural network__

**Short description**Machine learning and data mining |
---|

A **restricted Boltzmann machine** (**RBM**) (also called a **restricted Sherrington–Kirkpatrick model with external field** or **restricted stochastic Ising–Lenz–Little model**) is a generative stochastic artificial neural network that can learn a probability distribution over its set of inputs.^{[1]}

RBMs were initially proposed under the name **Harmonium** by Paul Smolensky in 1986,^{[2]} and rose to prominence after Geoffrey Hinton and collaborators used fast learning algorithms for them in the mid-2000s. RBMs have found applications in dimensionality reduction,^{[3]} classification,^{[4]} collaborative filtering,^{[5]} feature learning,^{[6]} topic modelling^{[7]} and even many body quantum mechanics.^{[8]}^{[9]} They can be trained in either supervised or unsupervised ways, depending on the task.

As their name implies, RBMs are a variant of Boltzmann machines, with the restriction that their neurons must form a bipartite graph:
a pair of nodes from each of the two groups of units (commonly referred to as the "visible" and "hidden" units respectively) may have a symmetric connection between them; and there are no connections between nodes within a group. By contrast, "unrestricted" Boltzmann machines may have connections between hidden units. This restriction allows for more efficient training algorithms than are available for the general class of Boltzmann machines, in particular the gradient-based **contrastive divergence** algorithm.^{[10]}

Restricted Boltzmann machines can also be used in deep learning networks. In particular, deep belief networks can be formed by "stacking" RBMs and optionally fine-tuning the resulting deep network with gradient descent and backpropagation.^{[11]}

## Structure

The standard type of RBM has binary-valued (Boolean) hidden and visible units, and consists of a matrix of weights [math]\displaystyle{ W }[/math] of size [math]\displaystyle{ m\times n }[/math]. Each weight element [math]\displaystyle{ (w_{i,j}) }[/math] of the matrix is associated with the connection between the visible (input) unit [math]\displaystyle{ v_i }[/math] and the hidden unit [math]\displaystyle{ h_j }[/math]. In addition, there are bias weights (offsets) [math]\displaystyle{ a_i }[/math] for [math]\displaystyle{ v_i }[/math] and [math]\displaystyle{ b_j }[/math] for [math]\displaystyle{ h_j }[/math]. Given the weights and biases, the *energy* of a configuration (pair of boolean vectors) (*v*,*h*) is defined as

- [math]\displaystyle{ E(v,h) = -\sum_i a_i v_i - \sum_j b_j h_j -\sum_i \sum_j v_i w_{i,j} h_j }[/math]

or, in matrix notation,

- [math]\displaystyle{ E(v,h) = -a^{\mathrm{T}} v - b^{\mathrm{T}} h -v^{\mathrm{T}} W h. }[/math]

This energy function is analogous to that of a Hopfield network. As with general Boltzmann machines, the joint probability distribution for the visible and hidden vectors is defined in terms of the energy function as follows,^{[12]}

- [math]\displaystyle{ P(v,h) = \frac{1}{Z} e^{-E(v,h)} }[/math]

where [math]\displaystyle{ Z }[/math] is a partition function defined as the sum of [math]\displaystyle{ e^{-E(v,h)} }[/math] over all possible configurations, which can be interpreted as a normalizing constant to ensure that the probabilities sum to 1. The marginal probability of a visible vector is the sum of [math]\displaystyle{ P(v,h) }[/math] over all possible hidden layer configurations,^{[12]}

- [math]\displaystyle{ P(v) = \frac{1}{Z} \sum_{\{h\}} e^{-E(v,h)} }[/math],

and vice versa. Since the underlying graph structure of the RBM is bipartite (meaning there are no intra-layer connections), the hidden unit activations are mutually independent given the visible unit activations. Conversely, the visible unit activations are mutually independent given the hidden unit activations.^{[10]} That is, for *m* visible units and *n* hidden units, the conditional probability of a configuration of the visible units v, given a configuration of the hidden units h, is

- [math]\displaystyle{ P(v|h) = \prod_{i=1}^m P(v_i|h) }[/math].

Conversely, the conditional probability of h given v is

- [math]\displaystyle{ P(h|v) = \prod_{j=1}^n P(h_j|v) }[/math].

The individual activation probabilities are given by

- [math]\displaystyle{ P(h_j=1|v) = \sigma \left(b_j + \sum_{i=1}^m w_{i,j} v_i \right) }[/math] and [math]\displaystyle{ \,P(v_i=1|h) = \sigma \left(a_i + \sum_{j=1}^n w_{i,j} h_j \right) }[/math]

where [math]\displaystyle{ \sigma }[/math] denotes the logistic sigmoid.

The visible units of Restricted Boltzmann Machine can be multinomial, although the hidden units are Bernoulli.^{[clarification needed]} In this case, the logistic function for visible units is replaced by the softmax function

- [math]\displaystyle{ P(v_i^k = 1|h) = \frac{\exp(a_i^k + \Sigma_j W_{ij}^k h_j)} {\Sigma_{k'=1}^K \exp(a_i^{k'} + \Sigma_j W_{ij}^{k'} h_j)} }[/math]

where *K* is the number of discrete values that the visible values have. They are applied in topic modeling,^{[7]} and recommender systems.^{[5]}

### Relation to other models

Restricted Boltzmann machines are a special case of Boltzmann machines and Markov random fields.^{[13]}^{[14]}
Their graphical model corresponds to that of factor analysis.^{[15]}

## Training algorithm

Restricted Boltzmann machines are trained to maximize the product of probabilities assigned to some training set [math]\displaystyle{ V }[/math] (a matrix, each row of which is treated as a visible vector [math]\displaystyle{ v }[/math]),

- [math]\displaystyle{ \arg\max_W \prod_{v \in V} P(v) }[/math]

or equivalently, to maximize the expected log probability of a training sample [math]\displaystyle{ v }[/math] selected randomly from [math]\displaystyle{ V }[/math]:^{[13]}^{[14]}

- [math]\displaystyle{ \arg\max_W \mathbb{E} \left[ \log P(v)\right] }[/math]

The algorithm most often used to train RBMs, that is, to optimize the weight matrix [math]\displaystyle{ W }[/math], is the contrastive divergence (CD) algorithm due to Hinton, originally developed to train PoE (product of experts) models.^{[16]}^{[17]}
The algorithm performs Gibbs sampling and is used inside a gradient descent procedure (similar to the way backpropagation is used inside such a procedure when training feedforward neural nets) to compute weight update.

The basic, single-step contrastive divergence (CD-1) procedure for a single sample can be summarized as follows:

- Take a training sample v, compute the probabilities of the hidden units and sample a hidden activation vector h from this probability distribution.
- Compute the outer product of v and h and call this the
*positive gradient*. - From h, sample a reconstruction v' of the visible units, then resample the hidden activations h' from this. (Gibbs sampling step)
- Compute the outer product of v' and h' and call this the
*negative gradient*. - Let the update to the weight matrix [math]\displaystyle{ W }[/math] be the positive gradient minus the negative gradient, times some learning rate: [math]\displaystyle{ \Delta W = \epsilon (vh^\mathsf{T} - v'h'^\mathsf{T}) }[/math].
- Update the biases a and b analogously: [math]\displaystyle{ \Delta a = \epsilon (v - v') }[/math], [math]\displaystyle{ \Delta b = \epsilon (h - h') }[/math].

A Practical Guide to Training RBMs written by Hinton can be found on his homepage.^{[12]}

## Stacked Restricted Boltzmann Machine

- The difference between the Stacked Restricted Boltzmann Machines and RBM is that RBM has lateral connections within a layer that are prohibited to make analysis tractable. On the other hand, the Stacked Boltzmann consists of a combination of an unsupervised three-layer network with symmetric weights and a supervised fine-tuned top layer for recognizing three classes.
- The usage of Stacked Boltzmann is to understand Natural languages, retrieve documents, image generation, and classification. These functions are trained with unsupervised pre-training and/or supervised fine-tuning. Unlike the undirected symmetric top layer, with a two-way unsymmetric layer for connection for RBM. The restricted Boltzmann's connection is three-layers with asymmetric weights, and two networks are combined into one.
- Stacked Boltzmann does share similarities with RBM, the neuron for Stacked Boltzmann is a stochastic binary Hopfield neuron, which is the same as the Restricted Boltzmann Machine. The energy from both Restricted Boltzmann and RBM is given by Gibb's probability measure: [math]\displaystyle{ E = -\frac12\sum_{i,j}{w_{ij}{s_i}{s_j}}+\sum_i{\theta_i}{s_i} }[/math]. The training process of Restricted Boltzmann is similar to RBM. Restricted Boltzmann train one layer at a time and approximate equilibrium state with a 3-segment pass, not performing back propagation. Restricted Boltzmann uses both supervised and unsupervised on different RBM for pre-training for classification and recognition. The training uses contrastive divergence with Gibbs sampling: Δw
_{ij}= e*(p_{ij}- p'_{ij}) - The restricted Boltzmann's strength is it performs a non-linear transformation so it's easy to expand, and can give a hierarchical layer of features. The Weakness is that it has complicated calculations of integer and real-valued neurons. It does not follow the gradient of any function, so the approximation of Contrastive divergence to maximum likelihood is improvised.
^{[12]}

## Literature

- Fischer, Asja; Igel, Christian (2012), "An Introduction to Restricted Boltzmann Machines",
*Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications*, Lecture Notes in Computer Science (Berlin, Heidelberg: Springer Berlin Heidelberg)**7441**: pp. 14–36, doi:10.1007/978-3-642-33275-3_2, ISBN 978-3-642-33274-6

## See also

## References

- ↑ Sherrington, David; Kirkpatrick, Scott (1975), "Solvable Model of a Spin-Glass",
*Physical Review Letters***35**(35): 1792–1796, doi:10.1103/PhysRevLett.35.1792, Bibcode: 1975PhRvL..35.1792S - ↑ Smolensky, Paul (1986). "Chapter 6: Information Processing in Dynamical Systems: Foundations of Harmony Theory". in Rumelhart, David E.; McLelland, James L..
*Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Volume 1: Foundations*. MIT Press. pp. 194–281. ISBN 0-262-68053-X. https://stanford.edu/~jlmcc/papers/PDP/Volume%201/Chap6_PDP86.pdf. - ↑ Hinton, G. E.; Salakhutdinov, R. R. (2006). "Reducing the Dimensionality of Data with Neural Networks".
*Science***313**(5786): 504–507. doi:10.1126/science.1127647. PMID 16873662. Bibcode: 2006Sci...313..504H. http://www.cs.toronto.edu/~hinton/science.pdf. Retrieved 2015-12-02. - ↑ Larochelle, H.; Bengio, Y. (2008). "Classification using discriminative restricted Boltzmann machines". Proceedings of the 25th international conference on Machine learning - ICML '08. pp. 536. doi:10.1145/1390156.1390224. ISBN 9781605582054. http://machinelearning.org/archive/icml2008/papers/601.pdf.
- ↑
^{5.0}^{5.1}Salakhutdinov, R.; Mnih, A.; Hinton, G. (2007). "Restricted Boltzmann machines for collaborative filtering". Proceedings of the 24th international conference on Machine learning - ICML '07. pp. 791. doi:10.1145/1273496.1273596. ISBN 9781595937933. - ↑ Coates, Adam; Lee, Honglak; Ng, Andrew Y. (2011). "An analysis of single-layer networks in unsupervised feature learning". International Conference on Artificial Intelligence and Statistics (AISTATS). http://cs.stanford.edu/~acoates/papers/coatesleeng_aistats_2011.pdf. Retrieved 2014-12-19.
- ↑
^{7.0}^{7.1}Ruslan Salakhutdinov and Geoffrey Hinton (2010). Replicated softmax: an undirected topic model .*Neural Information Processing Systems***23**. - ↑ Carleo, Giuseppe; Troyer, Matthias (2017-02-10). "Solving the quantum many-body problem with artificial neural networks" (in en).
*Science***355**(6325): 602–606. doi:10.1126/science.aag2302. ISSN 0036-8075. PMID 28183973. Bibcode: 2017Sci...355..602C. - ↑ Melko, Roger G.; Carleo, Giuseppe; Carrasquilla, Juan; Cirac, J. Ignacio (September 2019). "Restricted Boltzmann machines in quantum physics" (in en).
*Nature Physics***15**(9): 887–892. doi:10.1038/s41567-019-0545-1. ISSN 1745-2481. Bibcode: 2019NatPh..15..887M. - ↑
^{10.0}^{10.1}Miguel Á. Carreira-Perpiñán and Geoffrey Hinton (2005). On contrastive divergence learning.*Artificial Intelligence and Statistics*. - ↑ Hinton, G. (2009). "Deep belief networks".
*Scholarpedia***4**(5): 5947. doi:10.4249/scholarpedia.5947. Bibcode: 2009SchpJ...4.5947H. - ↑
^{12.0}^{12.1}^{12.2}^{12.3}Geoffrey Hinton (2010).*A Practical Guide to Training Restricted Boltzmann Machines*. UTML TR 2010–003, University of Toronto. - ↑
^{13.0}^{13.1}Sutskever, Ilya; Tieleman, Tijmen (2010). "On the convergence properties of contrastive divergence".*Proc. 13th Int'l Conf. On AI and Statistics (AISTATS)*. http://machinelearning.wustl.edu/mlpapers/paper_files/AISTATS2010_SutskeverT10.pdf. - ↑
^{14.0}^{14.1}Asja Fischer and Christian Igel. Training Restricted Boltzmann Machines: An Introduction . Pattern Recognition 47, pp. 25-39, 2014 - ↑ María Angélica Cueto; Jason Morton; Bernd Sturmfels (2010). "Geometry of the restricted Boltzmann machine".
*Algebraic Methods in Statistics and Probability*(American Mathematical Society)**516**. Bibcode: 2009arXiv0908.4425A. - ↑ Geoffrey Hinton (1999). Products of Experts.
*ICANN 1999*. - ↑ Hinton, G. E. (2002). "Training Products of Experts by Minimizing Contrastive Divergence".
*Neural Computation***14**(8): 1771–1800. doi:10.1162/089976602760128018. PMID 12180402. http://www.cs.toronto.edu/~hinton/absps/tr00-004.pdf.

## External links

- Introduction to Restricted Boltzmann Machines. Edwin Chen's blog, July 18, 2011.
- "A Beginner's Guide to Restricted Boltzmann Machines". https://deeplearning4j.org/restrictedboltzmannmachine.html.. Deeplearning4j Documentation
- "Understanding RBMs". http://deeplearning4j.org/understandingRBMs.html.. Deeplearning4j Documentation
- Python implementation of Bernoulli RBM and tutorial
- SimpleRBM is a very small RBM code (24kB) useful for you to learn about how RBMs learn and work.
- Julia implementation of Restricted Boltzmann machines: https://github.com/cossio/RestrictedBoltzmannMachines.jl

Original source: https://en.wikipedia.org/wiki/Restricted Boltzmann machine.
Read more |