BioLinux

From HandWiki
Short description: Projects involved in bioinformatics software on Linux

BioLinux is a term used by a variety of projects that aim to make bioinformatics software easier to access on a Linux platform, using one or more of the following methods:

  • Provision of complete systems
  • Provision of bioinformatics software repositories
  • Addition of bioinformatics packages to standard distributions
  • Live DVDs/CDs with bioinformatics software added
  • Community building and support systems

There are now various projects with similar aims, on both Linux systems and other Unix-like systems, and a selection of these is given below. An overview in the Canadian Bioinformatics Helpdesk Newsletter[1] also details some of the Linux-based projects.

Package repositories

Apple/Mac

Many Linux packages are compatible with Mac OS X, and several projects attempt to make it easy to install selected Linux packages (including bioinformatics software) on a computer running Mac OS X.

BioArchLinux

The BioArchLinux repository contains more than 3,770 packages for Arch Linux and Arch Linux-based distributions.

Debian

Debian is another popular Linux distribution used in many academic institutions, and some bioinformaticians have made their own software packages available for it in the deb format.

Red Hat

Package repositories are generally specific to the Linux distribution the bioinformatician is using, and a number of Linux variants are prevalent in bioinformatics work. Red Hat is widely used in the corporate world because it offers commercial support and training packages. Fedora (originally called Fedora Core) is a freely distributed, community-supported distribution sponsored by Red Hat, and is popular with those who like Red Hat's system but do not require commercial support. Many users of bioinformatics applications have produced RPMs (Red Hat's package format) designed to work with Fedora, and these can potentially also be installed on Red Hat Enterprise Linux systems. Other distributions such as Mandriva and SUSE also use RPMs, so these packages may work on them as well.

Slackware

Slackware is one of the less widely used Linux distributions. It is popular with experienced Linux users who prefer the command line to the various graphical interfaces available. Packages are in the tgz or txz format. The most widely known live distribution based on Slackware is Slax, which has been used as a base for many of the bioinformatics distributions.

Live DVDs/CDs

Live DVDs or CDs are not an ideal way to provide bioinformatics computing, because they run from a CD/DVD drive. This makes them slower than a traditional hard disk installation, and they offer only limited configurability. However, they can be suitable for ad hoc use where no other Linux access is available, and may even be used as the basis for a Linux installation.

Standard distributions with good bioinformatics support

In general, Linux distributions offer a wide range of official packages, but these usually include relatively little scientific software. There are exceptions, such as those detailed below.

Gentoo Linux

Gentoo Linux provides over 156 bioinformatics applications (see the Gentoo sci-biology herd in the main tree) in the form of ebuilds, which build the applications from source code. An additional 315 packages are in the Gentoo science overlay (for testing).

Although Gentoo is a very flexible system with excellent community support, the requirement to install from source means that Gentoo systems are often slow to install and require considerable maintenance. Some of the compilation time can be saved by using a central server to generate binary packages. On the other hand, everything can be compiled to make the best use of the specific processor in use (for example, to take advantage of SSE, AVX and AVX2 instructions). Binary-based distributions, by contrast, usually compile their packages for a conservative baseline instruction set (on 32-bit x86, often only i686 or even i386) so that they run on as many processors as possible.
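Before tuning a source-based build for a particular processor, it helps to know which of these instruction sets the machine actually supports. On x86 Linux systems the kernel lists CPU features on the "flags" line of /proc/cpuinfo. The following is a minimal sketch, assuming that file layout; the names `SIMD_FLAGS` and `simd_support` are hypothetical helpers, not part of Gentoo or any other distribution's tooling.

```python
from pathlib import Path

# Hypothetical list of SIMD extensions mentioned above, spelled as they
# appear in the "flags" line of /proc/cpuinfo on x86 Linux.
SIMD_FLAGS = ("sse", "sse2", "avx", "avx2")


def simd_support(cpuinfo_text: str) -> dict[str, bool]:
    """Report which SIMD_FLAGS occur on the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        # Lines look like "flags\t\t: fpu vme ... sse sse2 ... avx avx2 ..."
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            present = set(value.split())
            return {flag: flag in present for flag in SIMD_FLAGS}
    # No "flags" line (for example on a non-x86 CPU): report nothing as supported.
    return {flag: False for flag in SIMD_FLAGS}


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        print(simd_support(cpuinfo.read_text()))
```

The function takes the file contents as a string rather than reading the file itself, so it can be checked against sample text on machines without /proc/cpuinfo.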

FreeBSD

FreeBSD is not a Linux distribution, but a Unix-like operating system that is very similar to it. Its ports are analogous to Gentoo's ebuilds. However, the project continuously builds pre-compiled binary packages for Tier-1 platforms such as x86 and ARM. Users can also choose to build and install any port from source in order to enable non-portable optimizations or other build options. The build-from-source option also allows the ports system to automate the installation of software whose license does not permit redistribution.

The ports collection contains over 31,000 ports, of which over 2,200 are in scientific categories and over 240 are biology-related. New ports and updates are listed on the FreshPorts[2] site.

pkgsrc

The pkgsrc package manager, originally forked from FreeBSD ports, is maintained by the NetBSD project but aims to support all POSIX-compatible operating systems. It is well tested on NetBSD, many Linux distributions, macOS and SunOS derivatives. As with FreeBSD ports, pre-compiled binary packages are maintained for some platforms. Packages can be built from source on any platform, or whenever additional optimizations or options are desired. The pkgsrc collection contains over 19,000 packages, of which nearly 800 are in scientific categories and over 60 are biology-related.

Debian

More than a hundred bioinformatics packages are provided as part of the standard Debian distribution. NEBC Bio-Linux[3] packages can also be installed on a standard Debian system, as long as the bio-linux-base package is also installed. This creates a /usr/local/bioinf directory into which the other Bio-Linux packages install their software. Debian packages may also work on Ubuntu Linux or other Debian-derived installations.

Community building and support systems

Providing support and documentation should be an important part of any BioLinux project, so that scientists who are not IT specialists may quickly find answers to their specific problems. Support forums or mailing lists are also useful to disseminate knowledge within the research community. Some of these resources are linked to here.


References