Chou's invariance theorem


Chou's invariance theorem, named after Kuo-Chen Chou, addresses a problem in bioinformatics and cheminformatics related to multivariate statistics: a distance that would, in standard statistical theory, be defined as a Mahalanobis distance cannot be defined in this way because the relevant covariance matrix is singular. One effective way to solve this problem is to reduce the dimension of the multivariate space until the covariance matrix becomes invertible. This can be achieved simply by omitting one or more of the original components until the matrix concerned is no longer singular. Chou's invariance theorem states that it does not matter which of the components or coordinates is selected for removal: exactly the same final value is obtained.

Background

When the Mahalanobis distance or covariant discriminant[1] is used to measure the similarity of two proteins based on their amino acid compositions, the normalization condition imposed on the 20 constituent components makes the covariance matrix singular, causing a divergence problem. A dimension-reducing operation is therefore needed: one of the 20 components is left out so that the remaining 19 components are completely independent. But which of the 20 components should be removed, and does the result depend on that choice? The same questions arise when the calculation is based on the (20 + λ)-dimensional pseudo amino acid composition, where λ is an integer. Generally speaking, calculating the Mahalanobis distance or covariant discriminant between two vectors, each with Ω normalized components, requires this dimension-reducing operation, so these questions always occur. To address them, Chou's invariance theorem was developed in 1995.
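The setup above can be stated compactly. Using the standard definition of the squared Mahalanobis distance (restated here for context; the notation Ω follows the text, not a particular source):

```latex
D^{2}(\mathbf{x},\mathbf{y})
  = (\mathbf{x}-\mathbf{y})^{\mathsf{T}}\,\mathbf{C}^{-1}\,(\mathbf{x}-\mathbf{y}),
\qquad
\text{subject to}\quad \sum_{i=1}^{\Omega} x_i = \sum_{i=1}^{\Omega} y_i = 1 .
```

Because every composition vector satisfies the normalization constraint, the Ω components are linearly dependent, the covariance matrix C estimated from such vectors is singular, and C⁻¹ does not exist; this is the divergence problem that forces one component to be dropped.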

Essence

According to Chou's invariance theorem, the outcome of the Mahalanobis distance or covariant discriminant remains the same regardless of which component is left out. Accordingly, any one of the constituent normalized components can be removed to overcome the divergence problem without changing the final result.
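The invariance is easy to check numerically. The sketch below is illustrative only: it uses simulated normalized vectors (not real amino acid compositions), estimates the reduced covariance matrix after omitting each component in turn, and confirms that the resulting Mahalanobis distance is identical in every case.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated "compositions": each row is normalized so its components sum to 1,
# mimicking the constraint on amino acid composition vectors.
n, omega = 50, 5
raw = rng.random((n, omega))
X = raw / raw.sum(axis=1, keepdims=True)

# Two normalized vectors to compare, e.g. mean compositions of two groups.
u = X[:25].mean(axis=0)
v = X[25:].mean(axis=0)

def mahalanobis_dropping(j):
    """Squared Mahalanobis distance after omitting component j everywhere."""
    keep = [i for i in range(omega) if i != j]
    Xr = X[:, keep]                       # reduced (omega-1)-D data
    C = np.cov(Xr, rowvar=False)          # now invertible
    d = (u - v)[keep]
    return float(d @ np.linalg.solve(C, d))

distances = [mahalanobis_dropping(j) for j in range(omega)]

# All omega values agree (up to floating-point rounding), as the theorem asserts.
print(distances)
assert np.allclose(distances, distances[0])
```

The full-dimensional covariance matrix of `X` is singular because each row sums to 1; dropping any single component restores invertibility, and the theorem guarantees the choice of dropped component is immaterial.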

Proof

The rigorous mathematical proof of the theorem was given in the appendix of a paper by Chou,[2] and in Appendix E of a review paper by Chou and Zhang.[3]

Applications

The theorem has been used in predicting protein subcellular localization,[4] identifying apoptosis protein subcellular location,[5] predicting protein structural classes,[6][7] and identifying various other important attributes of proteins.

References

  1. "Protein subcellular location prediction". Protein Eng. 12 (2): 107–18. February 1999. doi:10.1093/protein/12.2.107. PMID 10195282. 
  2. Chou KC (April 1995). "A novel approach to predicting protein structural classes in a (20-1)-D amino acid composition space". Proteins 21 (4): 319–44. doi:10.1002/prot.340210406. PMID 7567954. 
  3. Chou KC, Zhang CT (1995). "Prediction of protein structural classes". Critical Reviews in Biochemistry and Molecular Biology 30: 275–349. doi:10.3109/10409239509083488.
  4. "Application of pseudo amino acid composition for predicting protein subcellular location: stochastic signal processing approach". J. Protein Chem. 22 (4): 395–402. May 2003. doi:10.1023/A:1025350409648. PMID 13678304. 
  5. "Subcellular location prediction of apoptosis proteins". Proteins 50 (1): 44–8. January 2003. doi:10.1002/prot.10251. PMID 12471598. 
  6. Zhou GP (November 1998). "An intriguing controversy over protein structural class prediction". J. Protein Chem. 17 (8): 729–38. doi:10.1023/A:1020713915365. PMID 9988519. 
  7. "Some insights into protein structural class prediction". Proteins 44 (1): 57–9. July 2001. doi:10.1002/prot.1071. PMID 11354006.