Neural networks

Neural networks (or, more precisely, artificial neural networks) are mathematical models inspired by the connections and the functioning of neurons in biological systems. NNs have given rise to a branch of research called neural computing, and are being used or tried out in many disciplines. NNs rest on two simple concepts: the topology of nodes and the connections between them, and the transfer functions that relate the input and output of each node. A node receives input data through its input connections, performs a very simple operation on them (a weighted sum followed by some kind of thresholding function), and passes the result on its output connection(s), as final output or for use in other nodes. Recent interest in this class of algorithms (which includes cellular automata as a subset) was stimulated by good results and excellent robustness on simple tasks (Hopfield86). Many classification and pattern recognition problems can be expressed in terms of NNs. For introductory reading, see Beale91 or Bishop95.
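
As a minimal sketch of the node operation just described, assuming a tanh thresholding function (all names and numbers are illustrative, not taken from the text):

```python
import numpy as np

def node_output(inputs, weights, bias, activation=np.tanh):
    """One NN node: weighted sum of the inputs, then a thresholding function."""
    return activation(np.dot(weights, inputs) + bias)

x = np.array([0.2, -1.0, 0.5])   # data arriving on the input connections
w = np.array([0.4, 0.1, -0.7])   # weights associated with those connections
print(node_output(x, w, bias=0.1))
```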

The inherent simplicity of NNs suggests that massive parallelism and possibly special, very simple hardware, e.g. semiconductors or optical elements, can be exploited in their implementation. More relevant than implementation questions, however, is an understanding of the virtues and pitfalls of NNs as algorithms. One of their important properties is that they can be trained: given training samples of events of different classes, learning algorithms of varying complexity adjust the weights associated with all input connections until some overall function is maximized which characterizes the quality of the decision mechanism. The optimization is often viewed in analogy with the minimization of a physical potential (Boltzmann machine); the function is then termed an "energy function". Impressive results can be achieved on small-size classification problems, where NNs can learn up to a good performance level without more input than training samples; a common example is character recognition.
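
As a hedged illustration of such a training procedure, the following sketch adjusts the weights of a single sigmoid node by gradient descent on a squared-error "energy" function; the data, learning rate, and number of passes are arbitrary placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))               # training samples: 100 events, 3 inputs
t = (X[:, 0] + X[:, 1] > 0).astype(float)   # two classes, labelled 0 and 1

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
w = rng.normal(scale=0.1, size=3)           # initial weights (trial choice)
b, eta = 0.0, 0.5                           # bias and learning rate

for epoch in range(200):
    y = sigmoid(X @ w + b)                  # network output for all samples
    grad = (y - t) * y * (1 - y)            # derivative of the squared-error "energy"
    w -= eta * X.T @ grad / len(X)          # adjust the weights ...
    b -= eta * grad.mean()                  # ... and the bias
```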

The choice of input data and of network topology is usually optimized by trial and error. A frequent suggestion is that input data should describe events exhaustively; in practice this rule of thumb translates into using as input all variables that can be considered to have problem-oriented relevance (and no more). Unnecessarily large and possibly inadequate neural networks can be avoided by pre-processing of data and/or (partial) feature extraction; in general, it is useful to reduce and transform the variables of the training sample into fewer or new variables, using whatever a priori information exists about them, before submitting them to a NN training algorithm. The variables should display translation and scale invariance with respect to the information to be extracted. Studies have shown that such variables are implicitly used ("found") by the training procedure if they are linear combinations of the input variables, but not in general. Indeed, if the thresholding function is a simple step function, a feedforward network of more than one layer performs multiple piecewise linear transformations; decision boundaries are then multiple hyperplanes. For more involved thresholding functions (also called transfer functions or activation functions), such as sigmoid functions or tanh, the interpretation is more complicated.
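
The variable transformation suggested above can be as simple as shifting and scaling each input; a minimal sketch (the function name is hypothetical):

```python
import numpy as np

def standardize(X):
    """Shift and scale each input variable to zero mean and unit variance,
    one simple way to remove the dependence on scale noted above."""
    mean, std = X.mean(axis=0), X.std(axis=0)
    return (X - mean) / std
```

The same shift and scale would then also be applied to any later data passed through the trained network.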

NNs are often used as a way of optimizing a classification (or pattern recognition) procedure; this optimization aspect places NNs close to other optimization tools (see Minimization), which likewise define an objective function to be optimized. NNs also usually have more input than output nodes; they may thus also be viewed as performing a dimensionality reduction on input data, in a way more general than principal component analysis. Another possible interpretation of network outputs is as probabilities; for a discussion, see Bishop95.
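
One common convention for reading the output-node values as probabilities (shown here only as an illustration, not necessarily the formulation discussed in Bishop95) is to normalize them so that they sum to one, e.g. with a softmax function:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())   # subtract the maximum for numerical stability
    return e / e.sum()        # normalized values sum to one

print(softmax(np.array([2.0, 1.0, 0.1])))   # roughly [0.66, 0.24, 0.10]
```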

The trial-and-error approach is usually also taken for the initial choice of weights needed to launch the learning process. Robustness is demonstrated by showing that different starting values converge to the same or similar results.

Once trained, neural networks are in many cases robust with respect to incomplete data. Training may also be a continuing process, in which the network weights are updated periodically with new training samples; this is indicated if the characteristics of the input data are subject to slow evolution, or if training samples are not available at the outset, i.e. the network has to learn on the data as they arrive.
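
A hedged sketch of such periodic updating, in the spirit of the gradient-descent example above: the current weights serve as the starting point and are adjusted on each new batch of training samples (all names and parameters are illustrative):

```python
import numpy as np

def update_with_new_batch(w, b, X_new, t_new, eta=0.1, passes=10):
    """Adjust existing weights on a fresh batch of training samples, so the
    network can follow slowly evolving input data."""
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    for _ in range(passes):
        y = sigmoid(X_new @ w + b)
        grad = (y - t_new) * y * (1 - y)
        w = w - eta * X_new.T @ grad / len(X_new)
        b = b - eta * grad.mean()
    return w, b
```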

Depending on the topology of interconnection and the time sequence of operations, networks can be classified (Humpert90) along a range from simple one-directional networks with few layers acting in step (feedforward networks), whose nodes or neurons are sometimes also called perceptrons, to fully connected networks (Hopfield networks).
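
A minimal sketch of the feedforward case, assuming a tanh activation and illustrative layer sizes; each layer acts in step on the output of the previous one:

```python
import numpy as np

def feedforward(x, layers, activation=np.tanh):
    """layers is a list of (weight_matrix, bias_vector) pairs, applied in order."""
    for W, b in layers:
        x = activation(W @ x + b)
    return x

rng = np.random.default_rng(0)
layers = [(rng.normal(size=(4, 3)), np.zeros(4)),   # 3 inputs -> 4 hidden nodes
          (rng.normal(size=(1, 4)), np.zeros(1))]   # 4 hidden -> 1 output node
print(feedforward(np.array([0.3, -0.2, 1.0]), layers))
```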

For practical applications, see e.g. Horn97.