Alternative explanations of the "grandmother" cell

From HandWiki

The presumed grandmother cell responds to a seemingly impossibly large number of stimuli. If the stimulus is a photo of your grandmother looking straight into the camera, the cell fires. If the stimulus is a painting of your grandmother in profile, the same cell fires. If the stimulus is a sketch of your grandmother walking away, drawn by a street artist, the same cell fires.

This neuron activates when a person identifies a specific entity, such as his or her grandmother. Such neurons would activate not only for a family member but also for celebrities such as Halle Berry. The term was coined by Jerry Lettvin around 1969.[1] A similar concept had been proposed a few years earlier by Jerzy Konorski, who called such cells gnostic units. Because grandmother cells are highly controversial, it was very surprising when Kreiman announced in 2001 that he had discovered one. Since then, several explanations of such findings have been proposed that do not involve grandmother cells: one is based on sparse coding, another on neural nets.

Biological implausibility

A neuron normally fires when excited by one very specific feature of a stimulus.[2] Logothetis and Pauls observed this in the spiking of a single macaque monkey neuron presented with a paper clip bent into a particular shape and shown at a variety of angles: at one specific angle the neuron fires rapidly; at other angles, or for other shapes, it does not. A grandmother cell, by contrast, would seem to fire at many different shapes and angles. It is also very difficult to imagine what kind of mechanism "will bind ... information from disparate sensory sources" in a single cell, which makes a grandmother cell "a tremendous connectivity puzzle".[3]

Discovery

In 2001, Kreiman announced that he had found a grandmother cell. This "neuron in the human amygdala, for example, fires to a caricature, a portrait, and a group photo of President Clinton, but not to 47 other pictures of famous men, presidents, and a variety of objects".[4][5] In 2004, Koch affirmed "such cells do exist".[6] In 2005, Quiroga and coworkers obtained similar results.[6] However, in 2007, Quiroga, Kreiman, Koch and Fried revised their initial interpretation, explaining their results in terms of sparse coding instead.[7]

Alternative explanations

Sparse coding

Sparse coding is the activation of a moderately small set of neurons in a small region of the brain, with each region drawing on a different subset of its available neurons to process related stimuli. This reduces the overlap between the representations of any two items: each item is represented by a different neuron or small set of neurons, which allows information to be represented more efficiently.
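As a toy illustration (not from the article), the reduced overlap of a sparse code can be contrasted with a dense one by counting how many units two items' representations share. The ten-unit "codes" below are invented placeholders:

```python
# Illustrative sketch: contrast a dense code, where every unit helps
# represent every item, with a sparse code, where each item activates
# only a small, largely distinct subset of units.

def overlap(a, b):
    """Count units active (nonzero) for both items."""
    return sum(1 for x, y in zip(a, b) if x and y)

n_units = 10

# Dense code: all 10 units participate in representing each item.
dense_item1 = [1] * n_units
dense_item2 = [1] * n_units

# Sparse code: each item is carried by ~2 of the 10 units,
# with different subsets for different items.
sparse_item1 = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
sparse_item2 = [0, 0, 1, 1, 0, 0, 0, 0, 0, 0]

print(overlap(dense_item1, dense_item2))    # 10 -> representations fully overlap
print(overlap(sparse_item1, sparse_item2))  # 0  -> representations kept distinct
```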

In a later article, Quiroga, Kreiman, Koch and Fried acknowledged that they had not in fact found grandmother cells; rather, they had found sparse coding.[7] The main question was how information is represented in the upper stages of the visual hierarchy while remaining accessible to perceptual, cognitive and mnemonic processes. This is why sparse coding became an alternative to grandmother cells. The idea of sparse coding is that very small numbers of neurons respond explicitly to specific features, objects or concepts.

Finding even one such cell would be a difficult task, and it seems implausible that grandmother cells have any extraordinary binding properties;[3] those are properties of sparse coding instead. Sparsely firing neurons are also difficult to detect: single-electrode recordings with moveable probes are used to detect firing, but firing cells are commonly missed.[7]

Waydo and Koch conducted a study in which a two-layer network was exposed to 40 different face images of ten individuals. Each output unit was constrained to fire for the smallest possible number of inputs, and each image was represented by the smallest possible number of units. They found that most of the units responded to a single individual, suggesting that a sparse neuronal representation could emerge in the medial temporal lobe (MTL) through unsupervised learning.[8]

Sparse coding often takes place in the hippocampus. The medial temporal lobe is critical for long-term memory, particularly episodic memories, and the hippocampus and the cortex may interact to produce sparse coding. This supports numerous tasks, including word, object and facial identification.[9]

Sparse representations are also associated with parallel distributed processing, which has had great success with rapid learning, though not with generalization. In contrast to grandmother cells, in sparse coding an individual neuron can participate in coding more than one thing, and more than one neuron codes for a given stimulus.[9]

Overall, visual information travels through the ventral pathway. Evidence from recordings of single-neuron activity in humans suggests that a subset of MTL neurons presents an invariant representation of perceived objects; that is, the neurons respond to abstract concepts rather than to more basic metric details. This evidence favors sparse coding over grandmother cells, because the neurons fire only to very few stimuli and are mostly silent except for their preferred stimuli.[7]

Neural nets and distributive explanation

Grandmother cells can be explained in a much simpler way. As Munevar proposes, grandmother cells are in fact the output cells of neural nets rather than cells acting independently. The existence of grandmother cells clashes with the model of the brain as a distributive system and is improbable because such neurons would need powers of representation across visual angles and contexts.[10] The distributive explanation argues that the grandmother cell is nothing but the single-neuron output stage of a neural network trained to recognize a person. The hypothetical grandmother cell does not have any extraordinary binding properties; those are properties of the neural network instead.

Although Kreiman's and Quiroga's results may seem to challenge the distributive model of the brain, the distributive theory offers a simple explanation of these phenomena.[4][6] Such cells serve as the output neurons for neural networks trained to recognize the famous persons in question.

Paul Churchland's accessible account of a network that distinguishes between sonar echoes of explosive underwater mines and those of rocks provides a better understanding of this idea.[11] Churchland's work is based on that of Gorman and Sejnowski.[12] The network is trained by delivering feedback to its hidden units: the network's error in deciding whether each echo in the training set belongs to a mine or a rock.

The important factor here is that the recognition is done not by the output cell ("mine") but by the middle (hidden) layer. When that middle layer recognizes a mine echo, the single-cell output layer "lights up", which is exactly the sort of thing we find in those networks that recognize Clinton, Aniston or Berry.[11]
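The mine/rock setup can be sketched in miniature. The two-feature "echoes" below are invented stand-ins for Gorman and Sejnowski's sonar data, and the network is far smaller than theirs; the point is only that error feedback trains the hidden layer to do the discrimination, while a single output cell merely "lights up":

```python
# Toy sketch of a mine-vs-rock network: a hidden layer learns the
# discrimination; one output cell lights up when a mine is recognized.
# The data are invented stand-ins, not real sonar recordings.

import math, random

random.seed(0)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Invented 2-feature "echoes": label 1 = mine, 0 = rock.
data = [([0.9, 0.1], 1), ([0.8, 0.2], 1),
        ([0.1, 0.9], 0), ([0.2, 0.8], 0)]

n_in, n_hidden = 2, 3
W1 = [[random.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
b1 = [0.0] * n_hidden
W2 = [random.uniform(-1, 1) for _ in range(n_hidden)]
b2 = 0.0
lr = 1.0

def forward(x):
    h = [sigmoid(sum(w * xi for w, xi in zip(W1[j], x)) + b1[j])
         for j in range(n_hidden)]
    o = sigmoid(sum(w * hj for w, hj in zip(W2, h)) + b2)
    return h, o

# Train by backpropagating the error signal the text describes.
for _ in range(5000):
    for x, t in data:
        h, o = forward(x)
        d_o = (o - t) * o * (1 - o)            # output error
        for j in range(n_hidden):              # feedback to hidden units
            d_h = d_o * W2[j] * h[j] * (1 - h[j])
            for i in range(n_in):
                W1[j][i] -= lr * d_h * x[i]
            b1[j] -= lr * d_h
            W2[j] -= lr * d_o * h[j]
        b2 -= lr * d_o

for x, t in data:
    _, o = forward(x)
    print(x, round(o))  # output cell fires (1) only for mine-like echoes
```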

This account extends beyond the discrimination between two alternatives (mines and rocks) to discrimination between different faces. The distributive brain may therefore have a single-cell output layer that fires whenever a specific person, e.g. your grandmother, is presented to the eye, although famous people such as Jennifer Aniston or Halle Berry may be of greater experimental interest. Neural networks thus explain very plausibly the otherwise paradoxical results of the recent grandmother cell experiments.[11] Neural nets are a form of parallel distributed processing.

Parallel distributed processing

In the late 1950s Frank Rosenblatt investigated a new approach to neural net technology, using a circuit consisting of an array of input units connected through a set of intermediate neurons. In Rosenblatt's trials, both input and output were binary words, and he called his invention the Perceptron. The Perceptron was the most basic and revolutionary type of neural net. In 1969, its limitations were laid out by Minsky and Papert.[13] Neural nets have since been extended by a variety of neuroscientists and psychologists. Churchland discusses neural nets in the context of facial coding, with the example of a bank teller who could not remember the face of a bank robber well enough to provide a description, yet, Churchland says, would likely recognize and discriminate the robber's face on seeing him again. This example involves not only the idea of neural nets but also that of vector coding.[11]
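Rosenblatt's learning rule is simple enough to sketch in a few lines. The binary data here (logical AND) are an invented stand-in for his trials; weights are nudged toward the target whenever the unit misclassifies an input:

```python
# Minimal sketch of the Perceptron learning rule on invented binary data.

def predict(w, b, x):
    """Fire (1) if the weighted sum of inputs clears the threshold."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

# Learn logical AND: output 1 only when both inputs are 1.
data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w, b, lr = [0.0, 0.0], 0.0, 0.1

for _ in range(20):                      # a few passes suffice for AND
    for x, t in data:
        err = t - predict(w, b, x)       # error signal drives the update
        w = [wi + lr * err * xi for wi, xi in zip(w, x)]
        b += lr * err

print([predict(w, b, x) for x, _ in data])  # → [0, 0, 0, 1]
```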

Cottrell, in research at the University of California, San Diego, developed an artificial neural network whose input was a 64 × 64 pixel grid with 256 activation levels per pixel, fed with recognizable representations of real faces.[14] Each input cell sent a radiating set of axonal end branches to every cell in the middle layer. This hidden middle layer of 80 cells projects its coding to eight output cells, which could discriminate faces from non-faces just as grandmother cells are thought to do. Cottrell did not know in advance how the synaptic connections should be configured, and overcame this by using a biologically realistic adjustment procedure.

The artificial neural network (ANN) was initially trained on 64 images but was eventually pushed outside its training boundary. In training, facial recognition performed at 100%, and when the ANN was tested on faces outside the training set, it still performed at 100%. Even when one-fifth of a face was covered with a black strip, recognition remained 100% accurate as long as the strip did not cover the forehead. The ANN thus demonstrates another way in which a small number of cells can appear to perform a large number of tasks that grandmother cells were previously supposed uniquely able to perform. The input layer's activation is coded by the second layer as a vector of 80 holistic features, and this vector code allows the third layer to identify a known individual correctly.[11] This is also the case in parallel distributed processing (PDP), which highlights the parallels between connectionist models and neural coding in the brain while dismissing localism.[9]
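The layer sizes the text attributes to Cottrell's network can be sketched at the level of shapes. The random weights and image below are placeholders, not his trained network; the point is only the compression of a 64 × 64 image into an 80-element vector code before the eight output cells:

```python
# Shape-level sketch of the described architecture:
# 64 x 64 input grid -> hidden layer of 80 cells -> 8 output cells.
# Weights and input are random placeholders, not a trained network.

import random
random.seed(1)

n_in, n_hidden, n_out = 64 * 64, 80, 8

W1 = [[random.uniform(-0.01, 0.01) for _ in range(n_in)] for _ in range(n_hidden)]
W2 = [[random.uniform(-0.01, 0.01) for _ in range(n_hidden)] for _ in range(n_out)]

image = [random.randint(0, 255) for _ in range(n_in)]  # stand-in 64x64 image

hidden = [sum(w * p for w, p in zip(row, image)) for row in W1]   # 80-element vector code
output = [sum(w * h for w, h in zip(row, hidden)) for row in W2]  # 8 output cells

print(len(image), len(hidden), len(output))  # 4096 80 8
```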

Churchland explains that neural nets contain three basic levels: the input layer, the hidden layer and the output layer. Munevar, following PDP theory, believes that output cells can explain the presumed grandmother cells. PDP involves performing hundreds of millions of individual computations simultaneously instead of in a lengthy sequence.[11] It takes about 10 milliseconds for a visual impulse to cross a single layer-to-layer transformation and reach the third layer, where previously buried information has been made explicit.

In PDP, grandmother cells are an object of ridicule: Finkel called them infamous grandmother cells, and Connor said no one wants to be accused of believing in grandmother cells.[9] Fifty years of studies involving PDP and grandmother cells have found no grandmother cells, but rather that the brain codes information with a form of distributed coding. Localist models ignore the neuroscience data taken to support distributed coding schemes, specifically single-cell electrophysiological recording studies, and are instead often supported by neuropsychological data. Biologically, the PDP model is superior and can be supported by many different methods and studies, including studies of PDP in the brain, as discussed by Bowers.

References

  1. Gross, CG (October 2002). "Genealogy of the "grandmother cell"". Neuroscientist 8 (5): 512–8. doi:10.1177/107385802237175. PMID 12374433. 
  2. Logothetis, N. K.; Pauls, J. (1995). "Psychophysical and Physiological Evidence for Viewer-Centered Object Representation in the Primate". Cerebral Cortex 5 (3): 270–288. doi:10.1093/cercor/5.3.270. 
  3. 3.0 3.1 Llinás, R. (2002). I of the Vortex: From Neurons to Self. MIT Press.
  4. 4.0 4.1 Kreiman, G. (2001). On the Neuronal Activity in the Human Brain During Visual Recognition, Imagery and Binocular Rivalry, Ph.D. thesis: California Institute of Technology.
  5. Kreiman, G.; Fried, I.; Koch, C. (2002). "Single-Neuron Correlates of Subjective Vision in the Human Medial Temporal Lobe". Proceedings of the National Academy of Sciences 99 (12): 8378–8383. doi:10.1073/pnas.072194099. PMID 12034865. 
  6. 6.0 6.1 6.2 Quiroga, R. Q.; Reddy, L.; Kreiman, G.; Koch, C.; Fried, I. (2005). "Invariant Visual Representation by Single Neurons in the Human Brain". Nature 435 (7045): 1102–1107. doi:10.1038/nature03687. 
  7. 7.0 7.1 7.2 7.3 Quiroga, R., Kreiman, G., Koch, C., and Fried, I. (2007). "Sparse but not 'Grandmother cell' coding in the medial temporal lobe." Elsevier.
  8. Waydo, S. and Koch, C (2008). Unsupervised Learning of Individuals and Categories from Images. Neural Comput. 20, 1-14.
  9. 9.0 9.1 9.2 9.3 Bowers, J. S. (2009). "On the biological plausibility of grandmother cells. Implications for neural network theories in psychology and neuroscience". Psychological Review 116 (1): 220–251. doi:10.1037/a0014462. PMID 19159155. 
  10. Munevar, G (2008). "A Distributive Explanation of "Grandmother" Cells". Proceedings of the XXII World Congress of Philosophy 34: 25–31. 
  11. 11.0 11.1 11.2 11.3 11.4 11.5 Churchland, P. M. (1996). The Engine of Reason, the Seat of the Soul. MIT Press.
  12. Gorman, R. P., and Sejnowski, T. J. (1988). "Analysis of Hidden Units in a Layered Network Trained to Classify Sonar Targets." Neural Networks, Vol. 1.
  13. Harth, E (1997). "From Brains to Neural Nets to Brains". Neural Networks 10: 1241–1255. doi:10.1016/s0893-6080(97)00048-8. 
  14. Cottrell, G., and Metcalfe, J. (1991). "EMPATH: Face, Emotion and Gender Recognition Using Holons," in Lippman, R., Moody, J., and Touretzky, D, eds., Advances in Neural Information Processing Systems, Vol. 3. Morgan Kaufmann.