Organization:Foundation model

From HandWiki
Revision as of 00:10, 7 February 2024 by CodeMe (talk | contribs) (correction)
Short description: Artificial intelligence model paradigm

A foundation model is an AI model that is trained on broad data such that it can be applied across a wide range of use cases.[1] Foundation models have transformed AI, powering prominent chatbots and generative AI.[1] The Stanford Institute for Human-Centered Artificial Intelligence's (HAI) Center for Research on Foundation Models (CRFM) created and popularized the term.[2]

Foundation models are general-purpose technologies that can support a diverse range of use cases. Building foundation models is often highly resource-intensive, with the most expensive models costing hundreds of millions of dollars to pay for the underlying data and compute.[3] Early examples of foundation models were pre-trained language models (LMs) like Google's BERT[4] and OpenAI's "GPT-n" series. Beyond text, foundation models have been developed across a range of modalities—including DALL-E and Flamingo[5] for images, MusicGen[6] for music, and RT-2[7] for robotic control. Foundation models constitute a broad shift in AI development: foundation models are being built for astronomy,[8] radiology,[9] robotics,[10] genomics,[11] music,[12] coding,[13] and mathematics.[14]

Definitions

The Stanford Institute for Human-Centered Artificial Intelligence's (HAI) Center for Research on Foundation Models (CRFM) coined the term "foundation model" in August 2021 to mean "any model that is trained on broad data (generally using self-supervision at scale) that can be adapted (e.g., fine-tuned) to a wide range of downstream tasks".[15] This was based on their observation that preexisting terms, while overlapping, were not adequate, stating that "'(large) language model' was too narrow given [the] focus is not only language; 'self-supervised model' was too specific to the training objective; and 'pretrained model' suggested that the noteworthy action all happened after 'pretraining'."[16] After considering many terms, they settled on "foundation model" to emphasize the intended function (i.e., amenability to subsequent further development) rather than modality, architecture, or implementation. The term “foundation model” was chosen over “foundational model”[17] because “foundational” implies that these models provide fundamental principles in a way that “foundation” does not.[18]

As governments regulate foundation models, new legal definitions have emerged.

  • In the United States, the Executive Order on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence defines a foundation model as “an AI model that is trained on broad data; generally uses self-supervision; contains at least tens of billions of parameters; is applicable across a wide range of contexts”.
  • In the European Union, the European Parliament’s negotiated position on the E.U. AI Act defines a foundation model as an “AI model that is trained on broad data at scale, is designed for generality of output, and can be adapted to a wide range of distinctive tasks”.
  • In the United Kingdom, the Competition and Markets Authority’s AI Foundation Models: Initial Report [1] defines a foundation model as “a type of AI technology that are trained on vast amounts of data that can be adapted to a wide range of tasks and operations.”

Overall, while many of these definitions stay close to the original Stanford CRFM definition, they introduce some subtle distinctions. For example, the U.S. definition is the only one that refers to the size of a foundation model, while the E.U. definition mentions whether the model is designed for generality of output. Nonetheless, all of the definitions agree that foundation models are trained on a broad range of data and are potentially applicable in many domains.

Personalizing foundation models

Since foundation models are pre-trained on a massive dataset, they are not capable of handling specific "personal" concepts that a user may be interested in. A series of methods have been designed to augment a foundation model with personal, specific items without retraining the full model. For example, for few-shot image retrieval it was shown how to adapt a vision-language foundation model (CLIP) by adding a new concept to its vocabulary.[19] For text-to-image generation, an approach called textual inversion[20] can similarly be used to teach the system a new concept that can later be generated in conjunction with the concepts that the foundation model is already familiar with.
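The few-shot retrieval idea can be illustrated with a toy sketch: treat the frozen model's image encoder as a black box that produces unit-norm embeddings, form a new "vocabulary" vector for a personal concept by averaging a handful of example embeddings, and retrieve by cosine similarity. Everything below (the embedding dimension, the random vectors, the `normalize` helper) is a synthetic stand-in, not the actual CLIP or PerVL code; it only shows the shape of the technique.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 64  # hypothetical embedding dimension (real CLIP uses 512+)

def normalize(v):
    """L2-normalize along the last axis, as cosine scoring assumes."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

# Pretend these are frozen image embeddings of a few photos of a
# personal concept (e.g. a specific pet) -- synthetic stand-ins.
few_shot = normalize(rng.normal(size=(3, dim)) + 2.0)

# New vocabulary entry: the re-normalized mean of the examples.
concept = normalize(few_shot.mean(axis=0))

# Retrieval: score a gallery of image embeddings by cosine similarity.
gallery = normalize(rng.normal(size=(10, dim)))
# Plant one near-duplicate of the concept at index 10.
planted = normalize(few_shot.mean(axis=0) + 0.05 * rng.normal(size=dim))
gallery = np.vstack([gallery, planted])

scores = gallery @ concept  # dot products of unit vectors = cosines
best = int(np.argmax(scores))
print(best)  # the planted near-duplicate should rank first
```

Note that the frozen encoder is never updated: personalization lives entirely in the new concept vector, which is why such methods avoid retraining the full model.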

Opportunities and risks

A 2021 arXiv report listed foundation models' capabilities with regard to "language, vision, robotics, reasoning, and human interaction"; technical principles, such as "model architectures, training procedures, data, systems, security, evaluation, and theory"; their applications, for example in law, healthcare, and education; and their potential impact on society, including "inequity, misuse, economic and environmental impact, legal and ethical considerations".[15]

An article about foundation models in The Economist notes that "some worry that the technology's heedless spread will further concentrate economic and political power".[21]

References

  1. Competition and Markets Authority (2023). AI Foundation Models: Initial Report. Available at: https://assets.publishing.service.gov.uk/media/65081d3aa41cc300145612c0/Full_report_.pdf
  2. "Introducing the Center for Research on Foundation Models (CRFM)". Stanford HAI. 18 August 2021. https://hai.stanford.edu/news/introducing-center-research-foundation-models-crfm. 
  3. Nestor Maslej, Loredana Fattorini, Erik Brynjolfsson, John Etchemendy, Katrina Ligett, Terah Lyons, James Manyika, Helen Ngo, Juan Carlos Niebles, Vanessa Parli, Yoav Shoham, Russell Wald, Jack Clark, and Raymond Perrault, “The AI Index 2023 Annual Report,” AI Index Steering Committee, Institute for Human-Centered AI, Stanford University, Stanford, CA, April 2023.
  4. Rogers, Anna; Kovaleva, Olga; Rumshisky, Anna (2020). "A Primer in BERTology: What we know about how BERT works". arXiv:2002.12327 [cs.CL].
  5. Tackling multiple tasks with a single visual language model, 28 April 2022, https://www.deepmind.com/blog/tackling-multiple-tasks-with-a-single-visual-language-model, retrieved 13 June 2022 
  6. Copet, Jade; Kreuk, Felix; Gat, Itai; Remez, Tal; Kant, David; Synnaeve, Gabriel; Adi, Yossi; Défossez, Alexandre (2023-11-07). "Simple and Controllable Music Generation". arXiv:2306.05284 [cs.SD].
  7. "Speaking robot: Our new AI model translates vision and language into robotic actions" (in en-us). 2023-07-28. https://blog.google/technology/ai/google-deepmind-rt2-robotics-vla-model/. 
  8. Nguyen, Tuan Dung; Ting, Yuan-Sen; Ciucă, Ioana; O'Neill, Charlie; Sun, Ze-Chang; Jabłońska, Maja; Kruk, Sandor; Perkowski, Ernest; Miller, Jack (2023-09-12). "AstroLLaMA: Towards Specialized Foundation Models in Astronomy". arXiv:2309.06126 [astro-ph.IM].
  9. Tu, Tao; Azizi, Shekoofeh; Driess, Danny; Schaekermann, Mike; Amin, Mohamed; Chang, Pi-Chuan; Carroll, Andrew; Lau, Chuck; Tanno, Ryutaro (2023-07-26). "Towards Generalist Biomedical AI". arXiv:2307.14334 [cs.CL].
  10. Ahn, Michael; Brohan, Anthony; Brown, Noah; Chebotar, Yevgen; Cortes, Omar; David, Byron; Finn, Chelsea; Fu, Chuyuan; Gopalakrishnan, Keerthana (2022-08-16). "Do As I Can, Not As I Say: Grounding Language in Robotic Affordances". arXiv:2204.01691 [cs.RO].
  11. Zvyagin, Maxim; Brace, Alexander; Hippe, Kyle; Deng, Yuntian; Zhang, Bin; Bohorquez, Cindy Orozco; Clyde, Austin; Kale, Bharat; Perez-Rivera, Danilo (2022-10-11). "GenSLMs: Genome-scale language models reveal SARS-CoV-2 evolutionary dynamics". bioRxiv 10.1101/2022.10.10.511571.
  12. Engineering, Spotify (2023-10-13). "LLark: A Multimodal Foundation Model for Music" (in en-US). https://research.atspotify.com/2023/10/llark-a-multimodal-foundation-model-for-music/. 
  13. Li, Raymond; Allal, Loubna Ben; Zi, Yangtian; Muennighoff, Niklas; Kocetkov, Denis; Mou, Chenghao; Marone, Marc; Akiki, Christopher; Li, Jia (2023-05-09). "StarCoder: may the source be with you!". arXiv:2305.06161 [cs.CL].
  14. Azerbayev, Zhangir; Schoelkopf, Hailey; Paster, Keiran; Santos, Marco Dos; McAleer, Stephen; Jiang, Albert Q.; Deng, Jia; Biderman, Stella; Welleck, Sean (2023-11-30). "Llemma: An Open Language Model For Mathematics". arXiv:2310.10631 [cs.CL].
  15. Bommasani, Rishi; Hudson, Drew A.; Adeli, Ehsan; Altman, Russ; Arora, Simran; von Arx, Sydney; Bernstein, Michael S.; Bohg, Jeannette et al. (18 August 2021). On the Opportunities and Risks of Foundation Models (Report). 
  16. "Reflections on Foundation Models". 18 October 2021. https://hai.stanford.edu/news/reflections-foundation-models. 
  17. Bommasani, Rishi; Liang, Percy (2021-10-18). "Reflections on Foundation Models". https://crfm.stanford.edu/2021/10/18/reflections.html. 
  18. Marcus, Gary (2021-09-11). "Has AI found a new Foundation?" (in en). https://thegradient.pub/has-ai-found-a-new-foundation/. 
  19. Cohen, Niv; Gal, Rinon; Meirom, Eli A.; Chechik, Gal; Atzmon, Yuval (2022-10-23). ""This is My Unicorn, Fluffy": Personalizing Frozen Vision-Language Representations". Computer Vision – ECCV 2022. Lecture Notes in Computer Science. 13680. Berlin, Heidelberg: Springer-Verlag. pp. 558–577. doi:10.1007/978-3-031-20044-1_32. ISBN 978-3-031-20043-4. 
  20. Gal, Rinon; Alaluf, Yuval; Atzmon, Yuval; Patashnik, Or; Bermano, Amit H.; Chechik, Gal; Cohen-Or, Daniel (2022-08-02). "An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion". arXiv:2208.01618 [cs.CV].
  21. "Huge "foundation models" are turbo-charging AI progress". The Economist. ISSN 0013-0613. https://www.economist.com/interactive/briefing/2022/06/11/huge-foundation-models-are-turbo-charging-ai-progress.