Company:Mistral AI

Short description: French artificial intelligence company.
Mistral AI
Type: Private
Industry: Artificial intelligence
Founded: 28 April 2023
Founders: Arthur Mensch, Timothée Lacroix, Guillaume Lample
Products:
  • Mistral 7B
  • Mixtral 8x7B
  • Mistral Medium
Website: mistral.ai

Mistral AI is a French artificial intelligence company. It was founded in April 2023 by researchers previously employed by Meta and Google: Arthur Mensch, Timothée Lacroix and Guillaume Lample.[1] In October 2023 it raised 385 million euros (about US$415 million),[2] and in December 2023 it attained a valuation of more than US$2 billion.[3][4][5]

It produces open large language models,[6] citing the foundational importance of open-source software and positioning its models as a response to proprietary ones.[7]

As of December 2023, two models have been published and are available as open weights.[8] A third, the prototype Mistral Medium, is available via API only.[9]

History

Mistral AI was co-founded in April 2023 by Arthur Mensch, Guillaume Lample and Timothée Lacroix. Prior to co-founding Mistral AI, Arthur Mensch worked at DeepMind, Google's artificial intelligence laboratory, while Guillaume Lample and Timothée Lacroix worked at Meta.[10]

In June 2023, the start-up completed a first fundraising of €105 million (US$117 million), with investors including the American fund Lightspeed Venture Partners, Eric Schmidt, Xavier Niel and JCDecaux. The Financial Times estimated its valuation at the time at €240 million (US$267 million).

On September 27, 2023, the company made its language processing model "Mistral 7B" available under the free Apache 2.0 license. This model has 7 billion parameters, a relatively small size compared to its competitors' models.

On December 10, 2023, Mistral AI announced that it had raised €385 million (US$428 million) in its second fundraising round. This round of financing notably involved the Californian fund Andreessen Horowitz, BNP Paribas and the software publisher Salesforce.[11]

On December 11, 2023, the company released the "Mixtral 8x7B" model, which has 46.7 billion parameters but uses only 12.9 billion per token thanks to its mixture-of-experts architecture. The model handles five languages (French, Spanish, Italian, English and German) and, according to its developers' tests, outperforms Meta's "LLaMA 2 70B" model. A version fine-tuned to follow instructions, called "Mixtral 8x7B Instruct", is also offered.[12]

Models

Mistral 7B

Mistral 7B is a 7.3B-parameter language model using the transformer architecture. It was officially released on September 27, 2023, via a BitTorrent magnet link[13] and Hugging Face,[14] under the Apache 2.0 license. The release blog post claimed the model outperforms LLaMA 2 13B on all benchmarks tested and is on par with LLaMA 34B on many of them.[15]
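
For readers who want to try the open weights, below is a minimal sketch using the Hugging Face transformers library; the repository id "mistralai/Mistral-7B-v0.1", the prompt and the generation settings are illustrative assumptions, not details taken from the article.

```python
# A hedged sketch, not an official example: load the openly released weights
# from Hugging Face and generate a short continuation.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-v0.1"  # assumed repository id for the base model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("The mistral is a wind that", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)  # greedy decoding by default
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```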

Mistral 7B uses an architecture similar to LLaMA's, but with some changes to the attention mechanism. In particular, it uses grouped-query attention (GQA), intended for faster inference, and sliding window attention (SWA), intended to handle longer sequences.

Sliding window attention (SWA) reduces the computational cost and memory requirement of handling longer sequences. In sliding window attention, each token can only attend to a fixed window of 4096 tokens from the previous layer, while the total context length is 32768 tokens. At inference time, such long contexts would otherwise require a large key-value cache, increasing latency and reducing throughput; to alleviate this issue, Mistral 7B uses a rolling buffer cache whose size is bounded by the window length.
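
As an illustration of the mechanism rather than Mistral's actual code, the sketch below builds a sliding-window causal mask and shows the rolling-buffer indexing idea; the sequence length is a toy value, and 4096 is the window size cited above.

```python
# Illustrative sketch of sliding window attention: each position may attend only
# to itself and the previous (window - 1) positions, and the key-value cache slot
# for a new token wraps around modulo the window size.
import numpy as np

def sliding_window_mask(seq_len: int, window: int) -> np.ndarray:
    """Boolean mask where position i may attend to positions j with i - window < j <= i."""
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return (j <= i) & (j > i - window)

def rolling_buffer_slot(position: int, window: int) -> int:
    """Rolling buffer cache: token `position` overwrites slot position % window,
    so cache memory is bounded by the window rather than the full context."""
    return position % window

print(sliding_window_mask(seq_len=8, window=3).astype(int))  # each row has at most 3 ones
print(rolling_buffer_slot(4096, 4096))                       # token 4096 reuses slot 0
```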

Mistral 7B also uses grouped-query attention (GQA), a variant of the standard multi-head attention mechanism. Instead of giving every query head its own key and value projections, the query heads are divided into groups that share a single key-value head, which shrinks the key-value cache and speeds up inference.[16]
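
The sketch below illustrates the head-sharing idea behind GQA in plain NumPy; it omits causal masking and other details. The head counts, 32 query heads sharing 8 key-value heads, follow the figures published in the Mistral 7B paper, while the shapes and inputs are illustrative.

```python
# Illustrative sketch of grouped-query attention: several query heads share each
# key/value head, shrinking the key-value cache compared to full multi-head attention.
import numpy as np

def grouped_query_attention(q, k, v):
    """q: (n_q_heads, seq, d); k, v: (n_kv_heads, seq, d) with n_q_heads % n_kv_heads == 0."""
    n_q_heads, n_kv_heads = q.shape[0], k.shape[0]
    group = n_q_heads // n_kv_heads              # query heads per shared key/value head
    outputs = []
    for h in range(n_q_heads):
        kv = h // group                          # the key/value head this query head uses
        scores = q[h] @ k[kv].T / np.sqrt(q.shape[-1])
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)   # softmax over key positions
        outputs.append(weights @ v[kv])
    return np.stack(outputs)                     # (n_q_heads, seq, d)

q = np.random.randn(32, 4, 128)                  # 32 query heads
k = np.random.randn(8, 4, 128)                   # 8 shared key heads
v = np.random.randn(8, 4, 128)                   # 8 shared value heads
print(grouped_query_attention(q, k, v).shape)    # (32, 4, 128)
```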

Both a base model and an "instruct" model were released, with the latter receiving additional tuning to follow chat-style prompts. The fine-tuned model is intended for demonstration purposes only and does not have guardrails or moderation built in.[15]

Mixtral 8x7B

Much like Mistral's first model, Mixtral 8x7B was released via a BitTorrent link on December 9, 2023,[6] with a Hugging Face release and a blog post following two days later.[12]

Unlike the previous Mistral model, Mixtral 8x7B uses a sparse mixture-of-experts architecture. Each layer contains 8 distinct "experts", giving the model a total of 46.7B parameters.[17][18] Each token activates only 12.9B of these parameters, so the model runs at roughly the speed and cost of a 12.9B-parameter model.[12]
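
As a toy illustration of the routing idea rather than Mixtral's implementation, the sketch below scores 8 experts for a single token and evaluates only the top 2; the experts are reduced to single matrices for brevity.

```python
# Illustrative sketch of sparse mixture-of-experts routing: only the top-k experts
# selected by the router are evaluated for each token, so only a fraction of the
# layer's parameters is used per token.
import numpy as np

def moe_layer(x, experts, router, top_k=2):
    """x: (d,) one token's hidden state; experts: list of (d, d) matrices; router: (d, n_experts)."""
    logits = x @ router                                   # router score per expert
    chosen = np.argsort(logits)[-top_k:]                  # indices of the top-k experts
    gates = np.exp(logits[chosen])
    gates /= gates.sum()                                  # softmax over the selected experts
    return sum(g * (x @ experts[e]) for g, e in zip(gates, chosen))

d, n_experts = 16, 8
experts = [np.random.randn(d, d) for _ in range(n_experts)]
router = np.random.randn(d, n_experts)
token = np.random.randn(d)
print(moe_layer(token, experts, router).shape)            # (16,)
```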

Mistral AI's testing shows the model beats both LLaMA 2 70B and GPT-3.5 in most benchmarks.[19]

Mistral Medium

Unlike Mistral 7B and Mixtral 8x7B, Mistral Medium is a closed-source prototype available only through the Mistral API.[20] It is trained on data in various languages, including English, French, Italian, German and Spanish, as well as code, and scores 8.6 on MT-Bench.[21] It is Mistral's highest-performing large language model, ranked above Claude and below GPT-4 on the LMSys Chatbot Arena leaderboard.[22]

The number of parameters and the architecture of Mistral Medium are not known, as Mistral has not published this information.
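
Below is a minimal sketch of querying Mistral Medium through the hosted API, assuming the chat-completions endpoint and model name documented at docs.mistral.ai; error handling is omitted and the prompt is illustrative.

```python
# A hedged sketch, not official client code: call the hosted chat-completions
# endpoint with the API-only "mistral-medium" model.
import os
import requests

response = requests.post(
    "https://api.mistral.ai/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['MISTRAL_API_KEY']}"},
    json={
        "model": "mistral-medium",  # the prototype described above, available via API only
        "messages": [{"role": "user", "content": "Explain mixture-of-experts models briefly."}],
    },
)
print(response.json()["choices"][0]["message"]["content"])
```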

References

  1. "France's unicorn start-up Mistral AI embodies its artificial intelligence hopes" (in en). Le Monde.fr. 2023-12-12. https://www.lemonde.fr/en/economy/article/2023/12/12/french-unicorn-start-up-mistral-ai-embodies-its-artificial-intelligence-hopes_6337125_19.html. 
  2. "Mistral, French A.I. Start-Up, Is Valued at $2 Billion in Funding Round". https://www.nytimes.com/2023/12/10/technology/mistral-ai-funding.html. 
  3. Fink, Charlie. "This Week In XR: Epic Triumphs Over Google, Mistral AI Raises $415 Million, $56.5 Million For Essential AI" (in en). https://www.forbes.com/sites/charliefink/2023/12/14/this-week-in-xr-epic-triumphs-over-google-mistral-ai-raises-415-million-565-million-for-essential-ai/. 
  4. "A French AI start-up may have commenced an AI revolution, silently". December 12, 2023. https://www.hindustantimes.com/business/a-french-ai-start-up-may-have-commenced-an-ai-revolution-silently-101702370816617.html. 
  5. "French AI start-up Mistral secures €2bn valuation". https://www.ft.com/content/ea29ddf8-91cb-45e8-86a0-f501ab7ad9bb. 
  6. "Buzzy Startup Just Dumps AI Model That Beats GPT-3.5 Into a Torrent Link" (in en). 2023-12-12. https://gizmodo.com/mistral-artificial-intelligence-gpt-3-openai-1851091217. 
  7. "Bringing open AI models to the frontier" (in en-us). Mistral AI. 27 September 2023. https://mistral.ai/news/about-mistral-ai/. 
  8. "Open-weight models | Mistral AI Large Language Models" (in en). https://docs.mistral.ai/models/. 
  9. "Endpoints | Mistral AI Large Language Models" (in en). https://docs.mistral.ai/platform/endpoints/#medium. 
  10. "France's unicorn start-up Mistral AI embodies its artificial intelligence hopes" (in en). Le Monde.fr. 2023-12-12. https://www.lemonde.fr/en/economy/article/2023/12/12/french-unicorn-start-up-mistral-ai-embodies-its-artificial-intelligence-hopes_6337125_19.html. 
  11. https://www.lemondeinformatique.fr/actualites/lire-mistral-leve-385-meteuro-et-devient-une-licorne-francaise-92392.html
  12. "Mixtral of experts" (in en-us). 2023-12-11. https://mistral.ai/news/mixtral-of-experts/. 
  13. Goldman, Sharon (2023-12-08). "Mistral AI bucks release trend by dropping torrent link to new open source LLM" (in en-US). https://venturebeat.com/ai/mistral-ai-bucks-release-trend-by-dropping-torrent-link-to-new-open-source-llm/. 
  14. Coldewey, Devin (27 September 2023). "Mistral AI makes its first large language model free for everyone". https://techcrunch.com/2023/09/27/mistral-ai-makes-its-first-large-language-model-free-for-everyone/. 
  15. "Mistral 7B" (in en-us). Mistral AI. 27 September 2023. https://mistral.ai/news/announcing-mistral-7b/. 
  16. Jiang, Albert Q.; Sablayrolles, Alexandre; Mensch, Arthur; Bamford, Chris; Chaplot, Devendra Singh; Casas, Diego de las; Bressand, Florian; Lengyel, Gianna et al. (2023-10-10). "Mistral 7B" (in en). https://arxiv.org/abs/2310.06825v1. 
  17. "Mixture of Experts Explained". https://huggingface.co/blog/moe. 
  18. Marie, Benjamin (2023-12-15). "Mixtral-8x7B: Understanding and Running the Sparse Mixture of Experts" (in en). https://towardsdatascience.com/mixtral-8x7b-understanding-and-running-the-sparse-mixture-of-experts-0e3fc7fde818. 
  19. Franzen, Carl (2023-12-11). "Mistral shocks AI community as latest open source model eclipses GPT-3.5 performance" (in en-US). https://venturebeat.com/ai/mistral-shocks-ai-community-as-latest-open-source-model-eclipses-gpt-3-5-performance/. 
  20. "Pricing and rate limits | Mistral AI Large Language Models" (in en). https://docs.mistral.ai/platform/pricing/. 
  21. Mistral AI (2023-12-11). "La plateforme" (in en-us). https://mistral.ai/news/la-plateforme/. 
  22. "LMSys Chatbot Arena Leaderboard - a Hugging Face Space by lmsys". https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboard.