Biology:Protein–DNA interaction site predictor


Structural and physical properties of DNA impose important constraints on the binding sites formed on the surfaces of DNA-binding proteins. The characteristics of such binding sites can be used to predict DNA-binding sites from the structural, and even the sequence, properties of unbound proteins. The same approach has been applied successfully to predicting protein–protein interfaces, and here it is adopted for predicting DNA-binding sites in DNA-binding proteins. The first attempts to use sequence and evolutionary features to predict DNA-binding sites in proteins were made by Ahmad et al. (2004)[8] and Ahmad and Sarai (2005).[1] Some methods use structural information and therefore require a three-dimensional structure of the protein, while others use only sequence information and can make a prediction without a protein structure.
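The idea of sequence-only prediction can be illustrated with a minimal sketch. It assumes a toy feature, the fraction of positively charged residues (lysine and arginine) in a sliding window around each position, motivated by the fact that the DNA backbone is negatively charged. The function names, the window size, and the threshold are hypothetical choices made for illustration; this is not the feature set or algorithm of any published predictor, which typically combine many features with trained machine-learning models.

```python
# Toy illustration only: a hypothetical single-feature scorer, not a published method.
POSITIVE = set("KR")  # lysine and arginine carry a positive charge


def window_features(sequence, half_width=3):
    """For each residue, return the fraction of K/R residues in a centred window.

    Windows are truncated at the sequence ends rather than padded.
    """
    features = []
    n = len(sequence)
    for i in range(n):
        start = max(0, i - half_width)
        end = min(n, i + half_width + 1)
        window = sequence[start:end]
        features.append(sum(aa in POSITIVE for aa in window) / len(window))
    return features


def predict_binding(sequence, half_width=3, threshold=0.4):
    """Label a residue as putatively DNA-binding if its window score meets the threshold.

    The threshold is an arbitrary assumption, not a calibrated value.
    """
    return [score >= threshold for score in window_features(sequence, half_width)]


if __name__ == "__main__":
    labels = predict_binding("AAAAKRKAAAA", half_width=1, threshold=0.6)
    print([i for i, flagged in enumerate(labels) if flagged])
```

A real predictor would replace the single hand-picked feature with learned weights over many features, such as evolutionary profiles, but the per-residue, window-based structure of the input is the same.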

Web servers

Structure- and sequence-based prediction of DNA-binding sites in DNA-binding proteins can be performed on the web servers listed below.

DISIS predicts DNA-binding sites directly from the amino acid sequence and is therefore applicable to all known proteins. It is based on the physicochemical properties of each residue and its environment, predicted structural features, and evolutionary data, and it uses machine learning algorithms.[2]

DISIS2 takes the raw amino acid sequence and generates all features from it, including secondary structure, solvent accessibility, disorder, B-value, protein–protein interaction, coiled coils, and evolutionary profiles. It uses many more predicted features than DISIS, the previous version, and predicts DNA-binding residues from the sequence of DNA-binding proteins.

DNABindR predicts DNA-binding sites from amino acid sequences using machine learning algorithms.[3]

DISPLAR makes a prediction based on properties of the protein structure. Knowledge of the protein structure is required.[4]

BindN makes a prediction based on chemical properties of the input protein sequence. Knowledge of the protein structure is not required.[5]

BindN+ is an upgraded version of BindN that applies support vector machines (SVMs) to sequence-based prediction of DNA- or RNA-binding residues from biochemical features and evolutionary information.[6]

DP-Bind combines multiple methods to make a consensus prediction based on the profile of evolutionary conservation and properties of the input protein sequence. The profile of evolutionary conservation is generated automatically by the web server. Knowledge of the protein structure is not required.[7]

DBS-PSSM[1] and DBS-Pred[8] predict DNA-binding sites in a protein from its sequence information.
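A consensus prediction of the kind mentioned for DP-Bind can be sketched as a simple majority vote over per-residue labels from several methods. This is an assumption made for illustration: the text above does not say how DP-Bind actually combines its methods, and the function name and input format here are hypothetical.

```python
def consensus(predictions):
    """Combine per-residue boolean predictions from several methods by strict majority vote.

    `predictions` is a list with one entry per method; each entry is a list of
    booleans, one per residue. A residue is labelled binding only if more than
    half of the methods say so (ties count as non-binding).
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all predictions must cover the same number of residues")
    n_methods = len(predictions)
    return [
        2 * sum(bool(p[i]) for p in predictions) > n_methods
        for i in range(length)
    ]


if __name__ == "__main__":
    method_a = [True, False, True]
    method_b = [True, True, False]
    method_c = [False, False, True]
    print(consensus([method_a, method_b, method_c]))
```

Majority voting is only one way to build a consensus; weighted schemes or a second-level classifier trained on the individual outputs are common alternatives.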

References

  1. Ahmad, S.; Sarai, A. (2005). "PSSM based prediction of DNA-binding sites in proteins". BMC Bioinformatics 6: 33. doi:10.1186/1471-2105-6-33. PMID 15720719. (This article also shows how prediction can be sped up significantly by generating alignments against limited data sets.)
  2. Ofran, Y.; Mysore, V.; Rost, B. (2007). "Prediction of DNA-binding residues from sequence". Bioinformatics 23 (13): i347–i353. doi:10.1093/bioinformatics/btm174. PMID 17646316.
  3. Yan, C.; Terribilini, M.; Wu, F.; Jernigan, R. L.; Dobbs, D.; Honavar, V. (2006). "Predicting DNA-binding sites of proteins from amino acid sequence". BMC Bioinformatics 7: 262.
  4. Tjong, H.; Zhou, H.-X. (2007). "DISPLAR: an accurate method for predicting DNA-binding sites on protein surfaces". Nucleic Acids Research 35: 1465–1477.
  5. Wang, L.; Brown, S. J. (2006). "BindN: a web-based tool for efficient prediction of DNA and RNA binding sites in amino acid sequences". Nucleic Acids Research 34 (Web Server issue): W243–W248. PMID 16845003.
  6. Wang, L.; Huang, C.; Yang, M. Q.; Yang, J. Y. (2010). "BindN+ for accurate prediction of DNA and RNA-binding residues from protein sequence features". BMC Systems Biology 4 (Suppl 1): S3. doi:10.1186/1752-0509-4-S1-S3.
  7. Hwang, S.; Gou, Z.; Kuznetsov, I. B. (2007). "DP-Bind: a web server for sequence-based prediction of DNA-binding residues in DNA-binding proteins". Bioinformatics 23 (5): 634–636. PMID 17237068.
  8. Ahmad, S.; Gromiha, M. M.; Sarai, A. (2004). "Analysis and prediction of DNA-binding proteins and their binding residues based on composition, sequence and structural information". Bioinformatics 20 (4): 477–486. doi:10.1093/bioinformatics/btg432. ISSN 1367-4803. PMID 14990443. (This article also uses amino acid composition analysis to predict DNA-binding proteins, and uses structure information to improve binding site prediction. The method is based on single sequences only, and thousands of proteins can be processed in less than an hour. A standalone version is also available.)