BLOOM (language model)

Short description: Open-access multilingual language model


BigScience Large Open-science Open-access Multilingual Language Model (BLOOM[1]) is a transformer-based language model. It was created by over 1,000 AI researchers to provide a freely available large language model to anyone who wants to experiment with one. With 176 billion parameters, trained from March to July 2022, it is considered an alternative to OpenAI's 175-billion-parameter GPT-3. BLOOM uses a decoder-only transformer architecture modified from Megatron-LM GPT-2.
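
The released checkpoints are published on the Hugging Face Hub and can be used with the transformers library. The sketch below is an illustration, not part of the original article: it loads the smaller bigscience/bloom-560m checkpoint, which shares the decoder-only architecture of the full model, and generates a short continuation. The checkpoint choice, prompt, and generation settings are illustrative assumptions.

# A minimal sketch, assuming the Hugging Face transformers and PyTorch packages
# are installed; "bigscience/bloom-560m" is a smaller published BLOOM variant
# chosen here so the example runs on ordinary hardware.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom-560m")
model = AutoModelForCausalLM.from_pretrained("bigscience/bloom-560m")

# Decoder-only, left-to-right generation from a prompt.
inputs = tokenizer("BLOOM is a multilingual language model that", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))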

The BLOOM project[2] was started by a co-founder of Hugging Face. Six main groups were involved: Hugging Face's BigScience team, the Microsoft DeepSpeed team, the NVIDIA Megatron-LM team, the IDRIS/GENCI team, the PyTorch team, and the volunteers of the BigScience Engineering workgroup.

BLOOM was trained on data from 46 natural languages and 13 programming languages. In total, 1.6 terabytes of pre-processed text were converted into 350 billion unique tokens to form BLOOM's training dataset.[3]
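
As an illustration of the multilingual vocabulary this corpus implies, the sketch below is an assumption-based example, not part of the original article: it applies the BLOOM tokenizer to short samples in a few of the covered natural and programming languages and reports the token counts. The sample strings and the bloom-560m checkpoint are illustrative choices.

# A minimal sketch, assuming the Hugging Face transformers package is installed;
# the bloom-560m checkpoint ships the same multilingual byte-level BPE tokenizer
# as the full 176-billion-parameter model.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom-560m")

samples = {
    "English": "Language models learn statistical patterns from text.",
    "French": "Les modèles de langue apprennent des régularités statistiques du texte.",
    "Python": "def add(a, b):\n    return a + b",
}
for name, text in samples.items():
    ids = tokenizer(text)["input_ids"]
    print(f"{name}: {len(ids)} tokens")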

References

  1. "BigScience Large Open-science Open-access Multilingual Language Model". https://huggingface.co/bigscience/bloom. 
  2. "The Technology Behind BLOOM Training". https://huggingface.co/blog/bloom-megatron-deepspeed. 
  3. Le Scao, Teven; Wang, Thomas; Hesslow, Daniel; Saulnier, Lucile; Bekman, Stas; Bari, M. Saiful; Biderman, Stella; Elsahar, Hady; Muennighoff, Niklas; Phang, Jason; Press, Ofir; Raffel, Colin; Sanh, Victor; Shen, Sheng; Sutawika, Lintang; Tae, Jaesung; Yong, Zheng Xin; Launay, Julien; Beltagy, Iz (2022). "What Language Model to Train if You Have One Million GPU Hours?". arXiv:2210.15424.