Double descent

Figure: An example of the double descent phenomenon in a two-layer neural network: as the ratio of parameters to data points increases, the test error first falls, then rises, then falls again.[1] A vertical line at the "interpolation threshold" marks the boundary between the underparametrized region (more data points than parameters) and the overparameterized region (more parameters than data points).

In statistics and machine learning, double descent is the phenomenon in which a model's test-set error first decreases as the number of parameters grows, then peaks, and then decreases again.[2] The phenomenon has been considered surprising because it contradicts classical assumptions about overfitting.[3]

The peak usually occurs near the interpolation threshold, where the number of parameters equals the number of training data points, so the model is just large enough to fit the training data. More precisely, the interpolation threshold is the maximum number of samples on which the model and training procedure achieve approximately zero average training error.[4]
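As a rough numerical illustration (a minimal sketch with assumed toy data, not an experiment from the cited works), consider polynomial regression: with n training points, a polynomial with n coefficients, i.e. degree n − 1, is just large enough to interpolate the data, so the training error drops to approximately zero at the threshold.

    # Minimal sketch (assumed toy data): a degree-(n-1) polynomial has n
    # coefficients, so it sits exactly at the interpolation threshold for
    # n training points and fits them with ~0 training error.
    import numpy as np

    rng = np.random.default_rng(0)
    n = 10                                  # number of training points
    x = np.linspace(-1, 1, n)
    y = np.sin(3 * x) + 0.1 * rng.standard_normal(n)

    for degree in (2, 5, n - 1):            # degree n - 1 means n coefficients
        coeffs = np.polyfit(x, y, degree)   # least-squares polynomial fit
        train_mse = np.mean((np.polyval(coeffs, x) - y) ** 2)
        print(f"degree {degree:2d}: training MSE = {train_mse:.2e}")
    # At degree n - 1, parameters == data points and the training MSE is
    # numerically zero: the model exactly interpolates the training set.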

History

Early observations of what would later be called double descent in specific models date back to 1989.[5][6]

The term "double descent" was coined by Belkin et al.[7] in 2019,[3] when the phenomenon gained popularity as a broader concept exhibited by many models.[8][9] This development was prompted by a perceived contradiction between the conventional wisdom that too many parameters cause significant overfitting error (an extrapolation of the bias–variance tradeoff)[10] and the empirical observation in the 2010s that some modern machine learning techniques tend to perform better with larger models.[7][11]

Theoretical models

Double descent occurs in linear regression with isotropic Gaussian covariates and isotropic Gaussian noise.[12]
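This setting is simple enough to reproduce numerically. The sketch below (toy dimensions and noise level are assumed; the estimator is the minimum-norm least-squares solution computed with the pseudoinverse, a standard choice in this literature) fixes the training-set size n and sweeps the number of features p the learner may use; the test error typically spikes near the interpolation threshold p = n and then descends again in the overparameterized regime.

    # Minimal sketch of double descent in linear regression (assumed toy
    # sizes): isotropic Gaussian covariates, Gaussian noise, and the
    # minimum-norm least-squares fit via the pseudoinverse.
    import numpy as np

    rng = np.random.default_rng(0)
    n, d, noise = 40, 120, 0.5              # train size, total features, noise std
    w_true = rng.standard_normal(d) / np.sqrt(d)

    X_train = rng.standard_normal((n, d))   # isotropic Gaussian covariates
    y_train = X_train @ w_true + noise * rng.standard_normal(n)
    X_test = rng.standard_normal((2000, d))
    y_test = X_test @ w_true + noise * rng.standard_normal(2000)

    for p in (10, 20, 30, 40, 60, 120):     # p = 40 = n is the threshold
        # min-norm least squares using only the first p features
        w_hat = np.linalg.pinv(X_train[:, :p]) @ y_train
        test_mse = np.mean((X_test[:, :p] @ w_hat - y_test) ** 2)
        print(f"p = {p:3d}: test MSE = {test_mse:.3f}")
    # Expected shape: the error peaks sharply near p = n (the interpolation
    # threshold) and falls again as p grows past it -- the second descent.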

A model of double descent in the thermodynamic limit has been analyzed using the replica trick, and the result has been confirmed numerically.[13]

A number of works[14][15] have suggested that double descent can be explained using the concept of effective dimension: while a network may have a large number of parameters, in practice only a subset of those parameters is relevant for generalization performance, as measured by the local curvature of the loss Hessian. This explanation is formalized through PAC-Bayes compression-based generalization bounds,[16] which show that less complex models are expected to generalize better under a Solomonoff prior.
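One common formalization (used, for example, by Maddox et al.[14]) scores each Hessian eigenvalue λ_i against a curvature scale z, giving the effective dimension N_eff(H, z) = Σ_i λ_i / (λ_i + z): sharp directions count as roughly one relevant parameter each, flat directions as roughly zero. A minimal sketch with assumed toy eigenvalues:

    # Minimal sketch (toy eigenvalues are assumed): effective dimension from
    # the eigenvalues of the loss Hessian, N_eff(H, z) = sum_i lam_i / (lam_i + z).
    import numpy as np

    def effective_dimension(hessian_eigenvalues, z=1.0):
        """Eigenvalues well above z contribute ~1 (a direction that matters
        for the fit); eigenvalues well below z contribute ~0."""
        lam = np.asarray(hessian_eigenvalues, dtype=float)
        return float(np.sum(lam / (lam + z)))

    # A 1000-parameter model whose Hessian has only a few sharp directions
    # behaves, by this measure, like a much smaller model.
    lam = np.concatenate([np.full(10, 100.0),    # 10 high-curvature directions
                          np.full(990, 1e-3)])   # 990 nearly flat directions
    print(effective_dimension(lam))  # ~10.9, despite 1000 raw parameters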

See also

  • Grokking (machine learning)

References

  1. Rocks, Jason W.; Mehta, Pankaj (2022). "Memorizing without overfitting: Bias, variance, and interpolation in overparameterized models". Physical Review Research 4 (1): 013201. doi:10.1103/PhysRevResearch.4.013201. PMID 36713351. Bibcode: 2022PhRvR...4a3201R. 
  2. "Deep Double Descent" (in en). 2019-12-05. https://openai.com/blog/deep-double-descent/. 
  3. Schaeffer, Rylan; Khona, Mikail; Robertson, Zachary; Boopathy, Akhilan; Pistunova, Kateryna; Rocks, Jason W.; Fiete, Ila Rani; Koyejo, Oluwasanmi (2023-03-24). "Double Descent Demystified: Identifying, Interpreting & Ablating the Sources of a Deep Learning Puzzle". arXiv:2303.14151v1 [cs.LG].
  4. Nakkiran, Preetum; Kaplun, Gal; Bansal, Yamini; Yang, Tristan; Barak, Boaz; Sutskever, Ilya (2019-12-04). "Deep Double Descent: Where Bigger Models and More Data Hurt". arXiv:1912.02292 [cs.LG].
  5. Vallet, F.; Cailton, J.-G.; Refregier, Ph. (June 1989). "Linear and Nonlinear Extension of the Pseudo-Inverse Solution for Learning Boolean Functions". Europhysics Letters 9 (4): 315. doi:10.1209/0295-5075/9/4/003. ISSN 0295-5075. Bibcode: 1989EL......9..315V. 
  6. Loog, Marco; Viering, Tom; Mey, Alexander; Krijthe, Jesse H.; Tax, David M. J. (2020-05-19). "A brief prehistory of double descent". Proceedings of the National Academy of Sciences 117 (20): 10625–10626. doi:10.1073/pnas.2001875117. ISSN 0027-8424. PMID 32371495. Bibcode: 2020PNAS..11710625L. 
  7. Belkin, Mikhail; Hsu, Daniel; Ma, Siyuan; Mandal, Soumik (2019-08-06). "Reconciling modern machine learning practice and the bias-variance trade-off". Proceedings of the National Academy of Sciences 116 (32): 15849–15854. doi:10.1073/pnas.1903070116. ISSN 0027-8424. PMID 31341078. 
  8. Spigler, Stefano; Geiger, Mario; d'Ascoli, Stéphane; Sagun, Levent; Biroli, Giulio; Wyart, Matthieu (2019-11-22). "A jamming transition from under- to over-parametrization affects loss landscape and generalization". Journal of Physics A: Mathematical and Theoretical 52 (47): 474001. doi:10.1088/1751-8121/ab4c8b. ISSN 1751-8113. 
  9. Viering, Tom; Loog, Marco (2023-06-01). "The Shape of Learning Curves: A Review". IEEE Transactions on Pattern Analysis and Machine Intelligence 45 (6): 7799–7819. doi:10.1109/TPAMI.2022.3220744. ISSN 0162-8828. PMID 36350870. Bibcode: 2023ITPAM..45.7799V. 
  10. Geman, Stuart; Bienenstock, Élie; Doursat, René (1992). "Neural networks and the bias/variance dilemma". Neural Computation 4: 1–58. doi:10.1162/neco.1992.4.1.1. http://web.mit.edu/6.435/www/Geman92.pdf. 
  11. Nakkiran, Preetum; Kaplun, Gal; Bansal, Yamini; Yang, Tristan; Barak, Boaz; Sutskever, Ilya (29 December 2021). "Deep double descent: where bigger models and more data hurt". Journal of Statistical Mechanics: Theory and Experiment 2021 (12): 124003. doi:10.1088/1742-5468/ac3a74. Bibcode: 2021JSMTE2021l4003N. 
  12. Nakkiran, Preetum (2019-12-16). "More Data Can Hurt for Linear Regression: Sample-wise Double Descent". arXiv:1912.07242v1 [stat.ML].
  13. Advani, Madhu S.; Saxe, Andrew M.; Sompolinsky, Haim (2020-12-01). "High-dimensional dynamics of generalization error in neural networks". Neural Networks 132: 428–446. doi:10.1016/j.neunet.2020.08.022. ISSN 0893-6080. PMID 33022471. 
  14. Maddox, Wesley J.; Benton, Gregory W.; Wilson, Andrew Gordon (2020). "Rethinking Parameter Counting in Deep Models: Effective Dimensionality Revisited". arXiv:2003.02139 [cs.LG].
  15. Wilson, Andrew Gordon (2025). "Deep Learning is Not So Mysterious or Different". arXiv:2503.02113 [cs.LG].
  16. Lotfi, Sanae; Finzi, Marc; Kapoor, Sanyam; Potapczynski, Andres; Goldblum, Micah; Wilson, Andrew G. (2022). "PAC-Bayes Compression Bounds So Tight That They Can Explain Generalization". Advances in Neural Information Processing Systems. 35. pp. 31459–31473. https://proceedings.neurips.cc/paper_files/paper/2022/file/cbeec55c50c3367024bafab2438a021b-Paper-Conference.pdf. 
