Fine-tuning (deep learning)

Short description: Machine learning technique

In deep learning, fine-tuning is an approach to transfer learning in which the weights of a pre-trained model are trained further on new data.^[1] Fine-tuning can be applied to the entire neural network or to only a subset of its layers, in which case the layers that are not being fine-tuned are "frozen" (not updated during the backpropagation step).^[2] A model may also be augmented with "adapters" that consist of far fewer parameters than the original model, and fine-tuned in a parameter-efficient way by tuning the weights of the adapters while leaving the rest of the model's weights frozen.^[3]

For some architectures, such as convolutional neural networks, it is common to keep the earlier layers (those closest to the input layer) frozen because they capture lower-level features, while later layers often discern higher-level features that can be more closely related to the task the model is being trained on.^[2]^[4]
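
The effect of freezing can be sketched without any deep learning framework. The example below is a minimal illustration, not a reference implementation: it assumes a toy two-layer scalar network, and the names `forward`, `gradients`, `sgd_step`, `pretrained` and `frozen`, as well as the learning rate and data point, are hypothetical choices made for this sketch. The early layer `w1` is kept frozen while the later layer `w2` is updated by gradient descent.

```python
def forward(params, x):
    """Two-layer scalar network: hidden = w1 * x, output = w2 * hidden."""
    hidden = params["w1"] * x
    return params["w2"] * hidden, hidden


def gradients(params, x, target):
    """Gradients of the squared error 0.5 * (output - target) ** 2."""
    output, hidden = forward(params, x)
    error = output - target
    return {
        "w1": error * params["w2"] * x,  # gradient reaching the early layer
        "w2": error * hidden,            # gradient for the later layer
    }


def sgd_step(params, grads, frozen, lr=0.1):
    """Apply one gradient step, skipping every parameter named in `frozen`."""
    return {
        name: value if name in frozen else value - lr * grads[name]
        for name, value in params.items()
    }


pretrained = {"w1": 0.5, "w2": 1.0}  # stand-in for pre-trained weights
frozen = {"w1"}                      # keep the early layer fixed

params = dict(pretrained)
for _ in range(50):
    grads = gradients(params, x=2.0, target=3.0)
    params = sgd_step(params, grads, frozen)

print(params)
```

Gradients are still computed for the frozen weight, but they are never applied, so `w1` keeps its pre-trained value and only `w2` moves toward fitting the new data.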

Models that are pre-trained on large and general corpora are usually fine-tuned by reusing the model's parameters as a starting point and adding a task-specific layer trained from scratch.^[5] Fine-tuning the full model is common as well and often yields better results, but it is more computationally expensive.^[6]

Fine-tuning is typically accomplished with supervised learning, but there are also techniques to fine-tune a model using weak supervision.^[7] Fine-tuning can be combined with an objective based on reinforcement learning from human feedback to produce language models like ChatGPT (fine-tuned from a model in the GPT-3.5 series) and Sparrow.^[8]^[9]

Robustness

Fine-tuning can degrade a model's robustness to distribution shifts.^[10]^[11] One mitigation is to linearly interpolate a fine-tuned model's weights with the weights of the original model, which can greatly increase out-of-distribution performance while largely retaining the in-distribution performance of the fine-tuned model.^[12]
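
The interpolation described above can be sketched as follows. This is a minimal illustration under stated assumptions: models are represented as plain dictionaries mapping parameter names to flat lists of floats, and the function name `interpolate_weights`, the parameter names, and the mixing coefficient `alpha` are hypothetical. How `alpha` should be chosen is not specified here and is left to the reader.

```python
def interpolate_weights(original, fine_tuned, alpha):
    """Return (1 - alpha) * original + alpha * fine_tuned for every parameter."""
    if original.keys() != fine_tuned.keys():
        raise ValueError("both models must have the same parameter names")
    merged = {}
    for name in original:
        w0, w1 = original[name], fine_tuned[name]
        if len(w0) != len(w1):
            raise ValueError(f"shape mismatch for {name!r}")
        # Element-wise linear interpolation between the two weight vectors.
        merged[name] = [(1 - alpha) * a + alpha * b for a, b in zip(w0, w1)]
    return merged


# Toy weights standing in for the original and the fine-tuned model.
original = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}
fine_tuned = {"layer.weight": [3.0, 0.0], "layer.bias": [1.0]}

merged = interpolate_weights(original, fine_tuned, alpha=0.5)
print(merged)
```

With `alpha=0` the result equals the original model and with `alpha=1` it equals the fine-tuned model; values in between trade off the two.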

Variants

Low-rank adaptation

Low-rank adaptation (LoRA) is an adapter-based technique for efficiently fine-tuning models. The basic idea is to train a low-rank matrix, represented as the product of two much smaller matrices, that is then added to the original weight matrix.^[13] An "adapter" in this context is a collection of such low-rank matrices which, when added to a base model, produces a fine-tuned model. LoRA allows performance that approaches full-model fine-tuning with a smaller storage requirement. A language model with billions of parameters may be LoRA fine-tuned with only several million trainable parameters.
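
The following sketch illustrates the low-rank update and the resulting parameter savings. It is a simplified illustration, not the reference implementation: matrices are plain lists of rows, the helper names `matmul`, `lora_merge` and `parameter_counts` are hypothetical, the `scale` factor is an assumed simplification of the scaling used in practice, and the 4096 by 4096 layer size and rank 8 are illustrative values only.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_merge(weight, lora_b, lora_a, scale=1.0):
    """Return weight + scale * (lora_b @ lora_a), leaving `weight` untouched."""
    delta = matmul(lora_b, lora_a)  # low-rank update with the same shape as weight
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


def parameter_counts(d_out, d_in, rank):
    """Compare a full d_out x d_in update with a rank-`rank` adapter."""
    full = d_out * d_in
    adapter = rank * (d_out + d_in)  # B is d_out x rank, A is rank x d_in
    return full, adapter


# Frozen 3x3 base weight and a rank-1 adapter (B is 3x1, A is 1x3).
weight = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
lora_b = [[1.0], [0.0], [2.0]]
lora_a = [[0.5, 0.0, 1.0]]

adapted = lora_merge(weight, lora_b, lora_a)
full, adapter = parameter_counts(d_out=4096, d_in=4096, rank=8)
print(adapted)
print(full, adapter)
```

In this illustrative configuration the adapter holds 65,536 values against 16,777,216 for the full matrix, which is why only the two small factors need to be trained and stored.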

LoRA-based fine-tuning has become popular in the Stable Diffusion community.^[14] Support for LoRA is being integrated into the Diffusers library from Hugging Face.^[15] Support for LoRA and similar techniques is also available for a wide range of other models through Hugging Face's Parameter-Efficient Fine-Tuning (PEFT) package.^[16]

Applications

Natural language processing

Fine-tuning is common in natural language processing (NLP), especially in the domain of language modeling. Large language models like OpenAI's series of GPT foundation models can be fine-tuned on data for specific downstream NLP tasks (tasks that use a pre-trained model) to improve performance over the unmodified pre-trained model.^[6]

Commercial models

Commercially offered large language models can sometimes be fine-tuned if the provider offers a fine-tuning API. As of June 19, 2023, language model fine-tuning APIs are offered by OpenAI and Microsoft Azure's Azure OpenAI Service for a subset of their models, as well as by Google Cloud Platform for some of their PaLM models, and by others.^[17]^[18]^[19] Not all commercial models currently support fine-tuning.

References

[1] Quinn, Joanne (2020). Dive into deep learning: tools for engagement. Thousand Oaks, California. p. 551. ISBN 978-1-5443-6137-6. https://d2l.ai/chapter_computer-vision/fine-tuning.html#steps. Retrieved January 10, 2023.
[2] "CS231n Convolutional Neural Networks for Visual Recognition". https://cs231n.github.io/transfer-learning/.
[3] Liu, Haokun; Tam, Derek; Muqeeth, Mohammed; Mohta, Jay; Huang, Tenghao; Bansal, Mohit; Raffel, Colin A (2022). "Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning". In Koyejo, S.; Mohamed, S.; Agarwal, A. et al. Advances in Neural Information Processing Systems. 35. Curran Associates, Inc. pp. 1950–1965. https://proceedings.neurips.cc/paper_files/paper/2022/file/0cde695b83bd186c1fd456302888454c-Paper-Conference.pdf.
[4] Zeiler, Matthew D; Fergus, Rob (2013). "Visualizing and Understanding Convolutional Networks". ECCV.
[5] Dodge, Jesse; Ilharco, Gabriel; Schwartz, Roy; Farhadi, Ali; Hajishirzi, Hannaneh; Smith, Noah (2020). Fine-Tuning Pretrained Language Models: Weight Initializations, Data Orders, and Early Stopping.
[6] Dingliwal, Saket; Shenoy, Ashish; Bodapati, Sravan; Gandhe, Ankur; Gadde, Ravi Teja; Kirchhoff, Katrin (2021). "Prompt Tuning GPT-2 language model for parameter-efficient domain adaptation of ASR systems". InterSpeech.
[7] Yu, Yue; Zuo, Simiao; Jiang, Haoming; Ren, Wendi; Zhao, Tuo; Zhang, Chao (2020). Fine-Tuning Pre-trained Language Model with Weak Supervision: A Contrastive-Regularized Self-Training Approach.
[8] "Introducing ChatGPT". https://openai.com/blog/chatgpt.
[9] Glaese, Amelia; McAleese, Nat; Trębacz, Maja; Aslanides, John; Firoiu, Vlad; Ewalds, Timo; Rauh, Maribeth; Weidinger, Laura et al. (2022). Improving alignment of dialogue agents via targeted human judgements.
[10] Radford, Alec; Kim, Jong Wook; Hallacy, Chris; Ramesh, Aditya; Goh, Gabriel; Agarwal, Sandhini; Sastry, Girish; Askell, Amanda; Mishkin, Pamela; Clark, Jack; Krueger, Gretchen; Sutskever, Ilya (2021). "Learning Transferable Visual Models From Natural Language Supervision". arXiv:2103.00020 [cs.CV].
[11] Kumar, Ananya; Raghunathan, Aditi; Jones, Robbie; Ma, Tengyu; Liang, Percy (2022). "Fine-Tuning can Distort Pretrained Features and Underperform Out-of-Distribution". ICLR.
[12] Wortsman, Mitchell; Ilharco, Gabriel; Kim, Jong Wook; Li, Mike; Kornblith, Simon; Roelofs, Rebecca; Gontijo-Lopes, Raphael; Hajishirzi, Hannaneh; Farhadi, Ali; Namkoong, Hongseok; Schmidt, Ludwig (2022). "Robust fine-tuning of zero-shot models". arXiv:2109.01903 [cs.CV].
[13] Hu, Edward J.; Shen, Yelong; Wallis, Phillip; Allen-Zhu, Zeyuan; Li, Yuanzhi; Wang, Shean; Wang, Lu; Chen, Weizhu (2022-01-28). "LoRA: Low-Rank Adaptation of Large Language Models". ICLR. https://openreview.net/forum?id=nZeVKeeFYf9.
[14] Ryu, Simo (February 13, 2023). "Using Low-rank adaptation to quickly fine-tune diffusion models". https://github.com/cloneofsimo/lora.
[15] Cuenca, Pedro; Paul, Sayak (January 26, 2023). "Using LoRA for Efficient Stable Diffusion Fine-Tuning". https://huggingface.co/blog/lora.
[16] "Parameter-Efficient Fine-Tuning using 🤗 PEFT". https://huggingface.co/blog/peft.
[17] "Fine-tuning". OpenAI. https://platform.openai.com/docs/guides/fine-tuning.
[18] "Learn how to customize a model for your application". Microsoft. https://learn.microsoft.com/en-us/azure/cognitive-services/openai/how-to/fine-tuning.
[19] "Tune text foundation models". https://cloud.google.com/vertex-ai/docs/generative-ai/models/tune-models.

Original source: https://en.wikipedia.org/wiki/Fine-tuning (deep learning).
