Software:Genozip
Original author(s) | Divon Lan |
---|---|
Initial release | 2020 |
Repository | https://github.com/divonlan/genozip |
Written in | C |
Platform | Linux, Mac, Windows, others |
Type | Bioinformatics |
License | Genozip non-commercial license |
Website | genozip |
Genozip[1][2][3] is a proprietary universal compressor for genomic files. It is optimized to compress FASTQ, SAM/BAM/CRAM, VCF/BCF, FASTA, GVF, PHYLIP and 23andMe files, but it can also compress any other file, including non-genomic files.
Genozip works by segmenting a source file into its individual data contexts, applying context-specific algorithms to exploit correlations between values within the same context or between contexts, and finally applying the appropriate compression codec to each context.[2][4]
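The three stages described above (segmenting into contexts, applying a context-specific transformation, and choosing a codec per context) can be illustrated with a minimal Python sketch. It is not Genozip's implementation: the three-column tab-separated record format, the context names `CHROM`, `POS` and `REF`, the use of delta encoding for positions, and every function name are assumptions made for illustration, and the codecs are simply those available in the Python standard library.

```python
import bz2
import lzma
import zlib

# Hypothetical codec table: name -> (compress, decompress).
CODECS = {
    "zlib": (zlib.compress, zlib.decompress),
    "bz2": (bz2.compress, bz2.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}


def segment(text):
    """Split toy 'CHROM<TAB>POS<TAB>REF' lines into one value list per context."""
    contexts = {"CHROM": [], "POS": [], "REF": []}
    for line in text.splitlines():
        chrom, pos, ref = line.split("\t")
        contexts["CHROM"].append(chrom)
        contexts["POS"].append(int(pos))
        contexts["REF"].append(ref)
    return contexts


def delta_encode(values):
    """Context-specific step: store each position as its distance from the previous one."""
    prev, out = 0, []
    for v in values:
        out.append(v - prev)
        prev = v
    return out


def delta_decode(deltas):
    """Inverse of delta_encode: a running sum restores the original positions."""
    total, out = 0, []
    for d in deltas:
        total += d
        out.append(total)
    return out


def best_codec(payload):
    """Try every codec on this context and keep the smallest result."""
    name = min(CODECS, key=lambda n: len(CODECS[n][0](payload)))
    return name, CODECS[name][0](payload)


def compress(text):
    """Segment, transform, then compress each context independently."""
    contexts = segment(text)
    contexts["POS"] = delta_encode(contexts["POS"])
    packed = {}
    for name, values in contexts.items():
        payload = "\n".join(str(v) for v in values).encode()
        packed[name] = best_codec(payload)
    return packed


def decompress(packed):
    """Reverse the pipeline (assumes at least one record and no trailing newline)."""
    cols = {}
    for name, (codec, blob) in packed.items():
        cols[name] = CODECS[codec][1](blob).decode().split("\n")
    positions = delta_decode([int(d) for d in cols["POS"]])
    rows = zip(cols["CHROM"], positions, cols["REF"])
    return "\n".join(f"{c}\t{p}\t{r}" for c, p, r in rows)
```

Calling `decompress(compress(text))` on a few such records returns the original text. Keeping each context separate is what lets a transformation such as delta encoding be applied only where it helps, and lets the codec choice differ from one context to the next.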
Genozip is designed to be extensible: it can be extended by adding new segmenters (to support compressing additional file formats), context-specific algorithms, or codecs.[1]
Genozip is the first universal compressor of genomic file formats, that is, one able to compress all common genomic file formats. As such, it is frequently cited and used as a benchmark in research papers on the compression of genomic data.[5][6][7][3]
References
1. Lan, D. et al. (2021) Genozip: a universal extensible genomic data compressor. Bioinformatics (Oxford University Press).
2. Abdullah, T. (2020) Genozip - a new compression tool for VCF files. Bioinformatics Review.
3. Genozip, an efficient compressor for genomic files (FASTQ, SAM/BAM/CRAM, VCF, GVF, FASTA, PHYLIP, 23andMe) (in Japanese).
4. Lan, D. et al. (2020) genozip: a fast and efficient compression tool for VCF files. Bioinformatics (Oxford University Press), 36, 4091–4092.
5. Deorowicz, S. and Danek, A. (2020) VCFShark: how to squeeze a VCF file. bioRxiv.
6. Lin, M.F. et al. (2020) Sparse project VCF: efficient encoding of population genotype matrices. Bioinformatics (Oxford University Press).
7. Shokrof, M. and Abouelhoda, M.I. (2020) IonCRAM: a reference-based compression tool for ion torrent sequence files. BMC Bioinformatics (Springer Nature).