Software:BioPerl

From HandWiki
Short description: Collection of Perl modules for bioinformatics
BioPerl
Initial release: 11 June 2002
Written in: Perl
Type: Bioinformatics
License: Artistic License and GPL
Website: bioperl.org

BioPerl[1][2] is a collection of Perl modules that facilitate the development of Perl scripts for bioinformatics applications. It has played an integral role in the Human Genome Project.[3]

Background

BioPerl is an active open-source software project supported by the Open Bioinformatics Foundation. The first BioPerl code was written by Tim Hubbard and Jong Bhak[citation needed] at the MRC Centre in Cambridge, where Fred Sanger carried out the first genome sequencing. The MRC Centre was one of the hubs and birthplaces of modern bioinformatics, as it held a large quantity of DNA sequences and 3D protein structures. Hubbard was using the th_lib.pl Perl library, which contained many useful Perl subroutines for bioinformatics. Bhak, Hubbard's first PhD student, created jong_lib.pl and merged the two subroutine libraries into Bio.pl. The name BioPerl was coined jointly by Bhak and Steven Brenner at the Centre for Protein Engineering (CPE). In 1995, Brenner organized a BioPerl session at the Intelligent Systems for Molecular Biology conference, held in Cambridge. In the following months BioPerl gained users, including Georg Fuellen, who organized a training course in Germany. Fuellen's colleagues and students greatly extended BioPerl, and it was expanded further by others, including Steve Chervitz, who was actively developing Perl code for his yeast genome database. The major expansion came when Cambridge student Ewan Birney joined the development team.[citation needed]

The first stable release was on 11 June 2002; the most recent stable release (in terms of API) is 1.7.2, from 7 September 2017. Developer releases are also produced periodically. The 1.7.x series is considered the most stable (in terms of bugs) version of BioPerl and is recommended for everyday use.

To take advantage of BioPerl, the user needs a basic understanding of the Perl programming language, including how to use Perl references, modules, objects and methods.

Features and examples

BioPerl provides software modules for many of the typical tasks of bioinformatics programming. These include:

  • Accessing sequence data from local and remote databases

Example of accessing GenBank to retrieve a sequence:

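use strict;
use warnings;
use Bio::DB::GenBank;

my $db_obj = Bio::DB::GenBank->new;

# Replace ACCESSION_NUMBER with the GenBank accession number to retrieve
my $seq_obj = $db_obj->get_Seq_by_acc('ACCESSION_NUMBER');
print $seq_obj->seq, "\n";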
  • Transforming formats of database/file records

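Example code for transforming formats:

use strict;
use warnings;
use Bio::SeqIO;

# Usage: all2y.pl informat outfile outfileformat < infile
my $usage     = "Usage: all2y.pl informat outfile outfileformat\n";
my $informat  = shift or die $usage;
my $outfile   = shift or die $usage;
my $outformat = shift or die $usage;

# Read records from standard input in the input format ...
my $seqin  = Bio::SeqIO->new( -fh => \*STDIN, -format => $informat );
# ... and write each one to the output file in the output format
my $seqout = Bio::SeqIO->new( -file => ">$outfile", -format => $outformat );

while ( my $inseq = $seqin->next_seq ) {
    $seqout->write_seq($inseq);
}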
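The same next_seq/write_seq loop can be tried without any input or output files. The following is a minimal sketch, assuming BioPerl is installed: two made-up FASTA records are read from a string through a Perl in-memory filehandle and re-written in Bio::SeqIO's raw format, which is assumed here to emit one bare sequence per line.

```perl
use strict;
use warnings;
use Bio::SeqIO;

# Two made-up FASTA records held in a string, so no files are needed.
my $fasta = ">seq1 first record\nACGTACGT\n>seq2 second record\nGGCC\n";

# Perl in-memory filehandles stand in for STDIN and the output file.
open my $in_fh,  '<', \$fasta      or die "cannot open input: $!";
open my $out_fh, '>', \my $out_buf or die "cannot open output: $!";

my $seqin  = Bio::SeqIO->new( -fh => $in_fh,  -format => 'fasta' );
# 'raw' is assumed to write one bare sequence per line, with no header
my $seqout = Bio::SeqIO->new( -fh => $out_fh, -format => 'raw' );

my @ids;
while ( my $seq = $seqin->next_seq ) {
    push @ids, $seq->display_id;    # remember each record's identifier
    $seqout->write_seq($seq);       # re-emit the record in the output format
}
$seqout->close;                     # flush buffered output into $out_buf

print "Converted: @ids\n";
print $out_buf;
```

Only the -format values differ from all2y.pl; swapping in another format name changes the output side without touching the reading loop.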
  • Manipulating individual sequences

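Example of gathering statistics for a given sequence:

use strict;
use warnings;
use Bio::Seq;
use Bio::Tools::SeqStats;

# Any Bio::Seq object works here, e.g. one returned by Bio::SeqIO->next_seq
my $seqobj    = Bio::Seq->new( -seq => 'ATGGCCATTGTAATGGGCCGCTGA', -alphabet => 'dna' );
my $seq_stats = Bio::Tools::SeqStats->new($seqobj);

my $weight      = $seq_stats->get_mol_wt();      # array ref: [lower bound, upper bound]
my $monomer_ref = $seq_stats->count_monomers();  # hash ref: residue => count

# for nucleic acid sequences only
my $codon_ref = $seq_stats->count_codons();      # hash ref: codon => count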
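Manipulating a sequence does not require a file or database record. A minimal sketch, assuming BioPerl is installed, of building a Bio::Seq object directly and applying a few common operations; the identifier example_orf and the 24-base sequence are illustrative values, not real data. translate uses the standard genetic code unless told otherwise, and a stop codon appears as *.

```perl
use strict;
use warnings;
use Bio::Seq;

# Illustrative in-memory DNA sequence (hypothetical ID, made-up sequence)
my $seq = Bio::Seq->new(
    -display_id => 'example_orf',
    -seq        => 'ATGGCCATTGTAATGGGCCGCTGA',
    -alphabet   => 'dna',
);

my $length  = $seq->length;            # number of residues
my $first   = $seq->subseq( 1, 3 );    # 1-based, inclusive coordinates
my $revcom  = $seq->revcom->seq;       # reverse complement, as a string
my $protein = $seq->translate->seq;    # protein translation, as a string

print join( "\t", $seq->display_id, $length, $first, $revcom, $protein ), "\n";
```

Because revcom and translate return new sequence objects rather than modifying $seq, the original sequence stays available for further work, such as passing it to Bio::Tools::SeqStats.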

Usage

In addition to being used directly by end users,[4] BioPerl has provided the base for a wide variety of bioinformatics tools, including:

  • SynBrowse[5]
  • GeneComber[6]
  • TFBS[7]
  • MIMOX[8]
  • BioParser[9]
  • Degenerate primer design[10]
  • Querying the public databases[11]
  • Current Comparative Table[12]

New tools and algorithms from external developers are often integrated directly into BioPerl itself:

  • Dealing with phylogenetic trees and nested taxa[13]
  • FPC Web tools[14]

Advantages

BioPerl was one of the first repositories of biological modules, which increased its usability. Its modules are very easy to install and are distributed through a flexible global repository. BioPerl also ships with test modules covering a large variety of processes.

Disadvantages

BioPerl can be used in many ways, from simple scripting to complex object-oriented programming, which can make code written with it unclear and sometimes hard to understand. Given how many modules BioPerl contains, some do not always work as intended.[citation needed]

Related libraries in other programming languages

Several related bioinformatics libraries implemented in other programming languages exist as part of the Open Bioinformatics Foundation, including:

  • BioJava (Java)
  • Biopython (Python)
  • BioRuby (Ruby)

References

  1. Stajich, J. E.; Block, D.; Boulez, K.; Brenner, S.; Chervitz, S.; Dagdigian, C.; Fuellen, G.; Gilbert, J. et al. (2002). "The BioPerl Toolkit: Perl Modules for the Life Sciences". Genome Research 12 (10): 1611–1618. doi:10.1101/gr.361602. PMID 12368254. 
  2. "BioPerl publications - BioPerl". http://www.bioperl.org/wiki/BioPerl_publications. A complete, up-to-date list of BioPerl references.
  3. Lincoln Stein (1996). "How Perl saved the human genome project". The Perl Journal 1 (2). http://www.bioperl.org/wiki/How_Perl_saved_human_genome. Retrieved 2009-02-25. 
  4. "Methods for identifying and mapping recent segmental and gene duplications in eukaryotic genomes". Gene Mapping, Discovery, and Expression. Methods Mol Biol. 338. Totowa, N.J.: Humana Press. 2006. pp. 9–20. doi:10.1385/1-59745-097-9:9. ISBN 978-1-59745-097-3. https://archive.org/details/genemappingdisco00mino.
  5. Pan, X.; Stein, L.; Brendel, V. (2005). "SynBrowse: A synteny browser for comparative sequence analysis". Bioinformatics 21 (17): 3461–3468. doi:10.1093/bioinformatics/bti555. PMID 15994196. 
  6. Shah, S. P.; McVicker, G. P.; MacKworth, A. K.; Rogic, S.; Ouellette, B. F. F. (2003). "GeneComber: Combining outputs of gene prediction programs for improved results". Bioinformatics 19 (10): 1296–1297. doi:10.1093/bioinformatics/btg139. PMID 12835277. 
  7. Lenhard, B.; Wasserman, W. W. (2002). "TFBS: Computational framework for transcription factor binding site analysis". Bioinformatics 18 (8): 1135–1136. doi:10.1093/bioinformatics/18.8.1135. PMID 12176838. 
  8. Huang, J.; Gutteridge, A.; Honda, W.; Kanehisa, M. (2006). "MIMOX: A web tool for phage display based epitope mapping". BMC Bioinformatics 7: 451. doi:10.1186/1471-2105-7-451. PMID 17038191. 
  9. Catanho, M.; Mascarenhas, D.; Degrave, W.; De Miranda, A. B. L. (2006). "BioParser". Applied Bioinformatics 5 (1): 49–53. doi:10.2165/00822942-200605010-00007. PMID 16539538.
  10. Wei, X.; Kuhn, D. N.; Narasimhan, G. (2003). "Degenerate primer design via clustering". Proceedings. IEEE Computer Society Bioinformatics Conference 2: 75–83. PMID 16452781. 
  11. Croce, O.; Lamarre, M. L.; Christen, R. (2006). "Querying the public databases for sequences using complex keywords contained in the feature lines". BMC Bioinformatics 7: 45. doi:10.1186/1471-2105-7-45. PMID 16441875. 
  12. Landsteiner, B. R.; Olson, M. R.; Rutherford, R. (2005). "Current Comparative Table (CCT) automates customized searches of dynamic biological databases". Nucleic Acids Research 33 (Web Server issue): W770–W773. doi:10.1093/nar/gki432. PMID 15980582. 
  13. Llabrés, M.; Rocha, J.; Rosselló, F.; Valiente, G. (2006). "On the Ancestral Compatibility of Two Phylogenetic Trees with Nested Taxa". Journal of Mathematical Biology 53 (3): 340–364. doi:10.1007/s00285-006-0011-4. PMID 16823581. 
  14. Pampanwar, V.; Engler, F.; Hatfield, J.; Blundy, S.; Gupta, G.; Soderlund, C. (2005). "FPC Web Tools for Rice, Maize, and Distribution". Plant Physiology 138 (1): 116–126. doi:10.1104/pp.104.056291. PMID 15888684.