Biology:5-step rules


The 5-Steps Rule, or 5-Step Rules, for proteome or genome analysis was originally proposed by Kuo-Chen Chou in 2011 and is referred to by many scientists as “Chou’s 5-steps rule” or “Chou’s 5-step rules”. [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] It has been widely used for proteome and genome analyses as well as for predicting posttranslational modification (PTM) sites in protein, RNA, and DNA sequences. [14]

According to this rule, to develop a practically more useful statistical prediction method or predictor for genome or proteome analysis, one should observe the following five guidelines: (1) construct or select a valid benchmark dataset to train and test the predictor; (2) formulate the biological sequence samples with an effective mathematical expression that truly reflects their intrinsic correlation with the target to be predicted; (3) introduce or develop a powerful algorithm (or engine) to operate the prediction; (4) properly perform cross-validation tests to objectively evaluate the anticipated accuracy of the predictor; (5) establish a user-friendly web-server for the predictor that is accessible to the public.

Since its proposal, the 5-steps rule has been used by many scientists in developing predictors for proteome or genome analyses, particularly by those who formulate biological sequences with PseAAC (pseudo amino acid composition) or PseKNC (pseudo K-tuple nucleotide composition).

Papers that present a new sequence-analyzing method or statistical predictor by observing the guidelines of Chou’s 5-step rules have the following notable merits: (1) a clear logical development, (2) fully transparent operation, (3) reported results that other investigators can easily repeat, (4) high potential for stimulating other sequence-analyzing methods, and (5) convenience of use for the majority of experimental scientists.

Moreover, Chou’s 5-steps rule has been extended to materials science, where it was used to develop a method for detecting perovskite materials with higher Curie temperature.[15]
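As an illustration only, the first four guidelines can be sketched in a few lines of Python. Everything in this sketch is an assumption made for the example and is not taken from any of the cited predictors: the toy sequences and labels in `BENCHMARK` are hypothetical, plain amino acid composition stands in for PseAAC, a 1-nearest-neighbour rule stands in for the prediction engine, and a leave-one-out (jackknife) test stands in for the cross-validation step. The fifth guideline, a public web-server, is outside the scope of the sketch.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Step 1: benchmark dataset. These sequences and labels are hypothetical
# toy data (1 = positive sample, 0 = negative sample), not real annotations.
BENCHMARK = [
    ("KKRKRKAKKR", 1),
    ("RRKKAKRKKG", 1),
    ("KRKRRGKKAK", 1),
    ("DEEDDEAEDE", 0),
    ("EDDEEDGEED", 0),
    ("DDEEAEDEDG", 0),
]


def composition(seq):
    """Step 2: encode a sequence as a 20-dimensional amino acid
    composition vector (a simple stand-in for PseAAC)."""
    counts = Counter(seq)
    return [counts[aa] / len(seq) for aa in AMINO_ACIDS]


def predict_1nn(train, query):
    """Step 3: prediction engine. Return the label of the training
    sample whose feature vector is closest to the query (Euclidean)."""
    nearest = min(train, key=lambda item: math.dist(item[0], query))
    return nearest[1]


def jackknife_accuracy(dataset):
    """Step 4: leave-one-out cross-validation. Each sample is predicted
    by a model built from all the other samples."""
    samples = [(composition(seq), label) for seq, label in dataset]
    correct = 0
    for i, (features, label) in enumerate(samples):
        train = samples[:i] + samples[i + 1:]
        if predict_1nn(train, features) == label:
            correct += 1
    return correct / len(samples)


if __name__ == "__main__":
    print("jackknife accuracy:", jackknife_accuracy(BENCHMARK))
```

A real predictor would replace each stand-in with a curated dataset, a richer sequence encoding, and a stronger learning algorithm, while keeping the same separation between the steps.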

References

  1. "SPalmitoylC-PseAAC: A sequence-based model developed via Chou's 5-steps rule and general PseAAC for identifying S-palmitoylation sites in proteins." Anal. Biochem. 568: 14–23. Mar 2019. doi:10.1016/j.ab.2018.12.019. PMID 30593778.
  2. "iEnhancer-5Step: Identifying enhancers using hidden information of DNA sequences via Chou's 5-step rule and word embedding." Anal. Biochem. 57: 53–61. Feb 2019. doi:10.1016/j.ab.2019.02.017. PMID 30822398.
  3. "dForml(KNN)-PseAAC: Detecting formylation sites from protein sequences using K-nearest neighbor algorithm via Chou's 5-step rule and pseudo components." J. Theor. Biol. Mar 2019. doi:10.1016/j.jtbi.2019.03.011. PMID 30880183.
  4. "Identifying DNase I hypersensitive sites using multi-features fusion and F-score features selection via Chou's 5-steps rule." Biophysical Chemistry. Jul 2019. doi:10.1016/j.bpc.2019.106227. PMID 31325710.
  5. "Prediction of lysine formylation sites using the composition of k-spaced amino acid pairs via Chou's 5-steps rule and general pseudo components." Genomics. May 2019. doi:10.1016/j.ygeno.2019.05.027. PMID 31175975.
  6. "iPhosH-PseAAC: Identify phosphohistidine sites in proteins by blending statistical moments and position relative features according to the Chou's 5-step rule and general pseudo amino acid composition." IEEE/ACM Trans. Comput. Biol. Bioinform. Jun 2019. doi:10.1109/TCBB.2019.2919025. PMID 31144645.
  7. "iN6-methylat (5-step): identifying DNA N(6)-methyladenine sites in rice genome using continuous bag of nucleobases via Chou's 5-step rule." Mol. Genet. Genomics. May 2019. doi:10.1007/s00438-019-01570-y. PMID 31055655.
  8. "iMotor-CNN: Identifying molecular functions of cytoskeleton motor proteins using 2D convolutional neural network via Chou's 5-step rule." Anal. Biochem. Jun 2019. doi:10.1016/j.ab.2019.03.017. PMID 30930199.
  9. "dForml(KNN)-PseAAC: Detecting formylation sites from protein sequences using K-nearest neighbor algorithm via Chou's 5-step rule and pseudo components." J. Theor. Biol. Jun 2019. doi:10.1016/j.jtbi.2019.03.011. PMID 30880183.
  10. "iEnhancer-5Step: Identifying enhancers using hidden information of DNA sequences via Chou's 5-step rule and word embedding." Anal. Biochem. Apr 2019. doi:10.1016/j.ab.2019.02.017. PMID 30822398.
  11. "iNR-2L: A two-level sequence-based predictor developed via Chou's 5-steps rule and general PseAAC for identifying nuclear receptors and their families." Genomics. Feb 2019. doi:10.1016/j.ygeno.2019.02.006. PMID 30779939.
  12. "SPrenylC-PseAAC: A sequence-based model developed via Chou's 5-steps rule and general PseAAC for identifying S-prenylation sites in proteins." J. Theor. Biol. May 2019. doi:10.1016/j.jtbi.2019.02.007. PMID 30768975.
  13. "SPalmitoylC-PseAAC: A sequence-based model developed via Chou's 5-steps rule and general PseAAC for identifying S-palmitoylation sites in proteins." Anal. Biochem. Mar 2019. doi:10.1016/j.ab.2018.12.019. PMID 30593778.
  14. "iNR-2L: A two-level sequence-based predictor developed via Chou's 5-steps rule and general PseAAC for identifying nuclear receptors and their families." Genomics. Feb 2019. doi:10.1016/j.ygeno.2019.02.006. PMID 30779939.
  15. Zhan, X., Chen, M., Lu, W. (2018). Accelerated search for perovskite materials with higher Curie temperature based on the machine learning methods. Computational Materials Science 151, 41-48. https://doi.org/10.1016/j.commatsci.2018.04.031