GPT-J
| Developer(s) | EleutherAI |
|---|---|
| Initial release | June 9, 2021 |
| Type | Large language model |
| License | Apache License 2.0 |
GPT-J or GPT-J-6B is an open-source large language model (LLM) developed by EleutherAI in 2021.[1] As the name suggests, it is a generative pre-trained transformer model designed to produce human-like text that continues from a prompt. The "6B" in the name refers to its 6 billion parameters.[2] The model is available on GitHub, but development stopped in 2021 and the accompanying web interface no longer communicates with it.[3]
Architecture
GPT-J is a GPT-3-like model with 6 billion parameters.[4] Like GPT-3, it is an autoregressive, decoder-only transformer model designed to solve natural language processing (NLP) tasks by predicting how a piece of text will continue.[1]
Its architecture differs from that of GPT-3 in three main ways:[1]
- The attention and feedforward sublayers are computed in parallel rather than sequentially, allowing for greater efficiency during training.
- GPT-J uses rotary position embeddings, which have been found to match or outperform other methods of injecting positional information into transformers (illustrated in the sketch after this list).[5][6]
- GPT-J uses dense attention instead of the efficient sparse attention used in GPT-3.
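The rotary scheme can be summarised briefly: each pair of feature dimensions in a query or key vector is rotated by an angle proportional to the token's position, so attention scores end up depending on the relative offset between tokens. The following NumPy code is only an illustrative sketch of that idea, not the implementation used in Mesh Transformer JAX or Hugging Face; in the released model the rotation is applied to just the first 64 dimensions of each attention head.

```python
# Minimal sketch of rotary position embeddings (RoPE), GPT-J-style.
# Illustrative only; not the production implementation.
import numpy as np

def rotary_embedding(x, base=10000):
    """Rotate pairs of feature dimensions of x by position-dependent angles.

    x: array of shape (seq_len, dim) holding per-position query or key
       vectors (dim must be even). Returns an array of the same shape.
    """
    seq_len, dim = x.shape
    # One frequency per pair of dimensions: theta_i = base^(-2i/dim)
    inv_freq = 1.0 / (base ** (np.arange(0, dim, 2) / dim))
    # Rotation angle for position m and pair i is m * theta_i
    angles = np.outer(np.arange(seq_len), inv_freq)   # (seq_len, dim/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]                   # interleaved pairs
    rotated = np.empty_like(x)
    rotated[:, 0::2] = x1 * cos - x2 * sin            # standard 2-D rotation
    rotated[:, 1::2] = x1 * sin + x2 * cos
    return rotated

# Because the angle depends only on a token's position, the dot product of a
# rotated query and a rotated key depends only on their relative offset.
q = rotary_embedding(np.random.randn(2048, 64))
```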
Beyond these differences, the model has 28 transformer layers with 16 attention heads each. Its vocabulary contains 50,257 tokens, the same as GPT-2's,[2] and its context window is 2,048 tokens long.[7]
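These hyperparameters can be read directly from the published checkpoint's configuration, for example with the Hugging Face transformers library. The sketch below is a hedged illustration: the attribute names follow transformers' GPTJConfig, and running it requires network access to download the small configuration file.

```python
# Sketch: inspect GPT-J-6B's hyperparameters from its published configuration.
# Requires `pip install transformers`.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("EleutherAI/gpt-j-6b")

print(config.n_layer)      # 28 transformer layers
print(config.n_head)       # 16 attention heads per layer
print(config.n_positions)  # 2048-token context window
print(config.n_embd)       # 4096-dimensional hidden states
print(config.rotary_dim)   # 64 rotary dimensions per head
print(config.vocab_size)   # token embedding size (may be padded slightly
                           # beyond the tokenizer's 50,257 tokens)
```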
It was trained on the Pile dataset,[2][4] using the Mesh Transformer JAX library to handle the parallelization scheme.[2][8]
Performance
GPT-J was designed to generate English text from a prompt. It was not designed for translation, for generating text in other languages, or for use without first being fine-tuned for a specific task.[2] Nonetheless, GPT-J performs reasonably well without fine-tuning, including on translation (at least from English to French).[9]
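For illustration, prompting the publicly released checkpoint through the Hugging Face transformers library looks roughly like the following sketch. The model identifier and method names come from transformers, not from GPT-J itself, and the example assumes a GPU with enough memory for the half-precision weights.

```python
# Sketch: generate a continuation from a prompt with the released GPT-J-6B
# weights via Hugging Face transformers. Loading the checkpoint in float16
# needs roughly 16 GB of GPU memory.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6b")
model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/gpt-j-6b", torch_dtype=torch.float16
).to("cuda")

prompt = "The Eiffel Tower is located in"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

# Sample a continuation; GPT-J only predicts how the text is likely to go on.
output_ids = model.generate(
    **inputs, do_sample=True, temperature=0.9, max_new_tokens=50
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```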
When neither model is fine-tuned, GPT-J-6B performs almost as well as the 6.7-billion-parameter GPT-3 (Curie) on a variety of tasks.[4] It even outperforms the 175-billion-parameter GPT-3 (Davinci) on code generation tasks.[10] With fine-tuning, it outperforms an untuned GPT-3 (Davinci) on a number of tasks.[1]
Like all LLMs, it is not programmed to give factually accurate information, only to generate text based on probability.[2]
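As an illustration of what "based on probability" means in practice, the sketch below (reusing the model and tokenizer loaded in the earlier example; the prompt is an arbitrary assumption) inspects the distribution the model assigns to the next token rather than any notion of factual truth.

```python
# Sketch: GPT-J only produces a probability distribution over possible next
# tokens. Reuses `model` and `tokenizer` from the previous example.
import torch

inputs = tokenizer("The capital of France is", return_tensors="pt").to("cuda")
with torch.no_grad():
    logits = model(**inputs).logits[0, -1]   # scores for the next token
probs = torch.softmax(logits, dim=-1)        # convert scores to probabilities

top = torch.topk(probs, k=5)                 # five most likely continuations
for p, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(token_id.item())!r}: {p.item():.3f}")
```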
Applications
The untuned GPT-J is available on EleutherAI's website,[11] NVIDIA's Triton Inference Server,[12] and NLP Cloud's website.[13] Cerebras[1] and Amazon Web Services[14][15] offer services to fine-tune GPT-J for company-specific tasks. Graphcore offers fine-tuning and hosting services for the untuned GPT-J and will also host the fine-tuned models once they are produced.[16] CoreWeave offers hosting services for both the untuned GPT-J and fine-tuned variants.[17][18]
In March 2023, Databricks released Dolly, an Apache-licensed, instruction-following model created by fine-tuning GPT-J on the Stanford Alpaca dataset.[19] NovelAI's Sigurd[20] and Genji-JP 6B[21] models are both fine-tuned versions of GPT-J, and NovelAI also offers further fine-tuning services to produce and host custom models.[22]
EleutherAI has received praise from Cerebras,[1] GPT-3 Demo,[4] NLP Cloud,[13] and Databricks[19] for making the model open-source, and its open-source status is often cited as a major advantage when choosing which model to use.[10][16][23]
References
1. Vassilieva, Natalia (22 June 2022). "Cerebras Makes It Easy to Harness the Predictive Power of GPT-J". Cerebras. https://www.cerebras.net/blog/cerebras-makes-it-easy-to-harness-the-predictive-power-of-gpt-j.
2. "GPT-J 6B". Hugging Face. 3 May 2023. https://huggingface.co/EleutherAI/gpt-j-6b.
3. Wang, Ben (25 January 2025). "kingoflolz/mesh-transformer-jax". https://github.com/kingoflolz/mesh-transformer-jax/. Retrieved 27 January 2025.
4. "GPT-J". GPT-3 Demo. https://gpt3demo.com/apps/gpt-j-6b.
5. Biderman, Stella; Black, Sid; Foster, Charles; Gao, Leo; Hallahan, Eric; He, Horace; Wang, Ben; Wang, Phil (20 April 2021). "Rotary Embeddings: A Relative Revolution". EleutherAI. https://blog.eleuther.ai/rotary-embeddings/. "In general we have found that across a large suite of setups including regular, linear, and local self-attention, it either matches or surpasses all other methods currently available for injecting positional information into transformers."
6. Su, Jianlin; Lu, Yu; Pan, Shengfeng; Murtadha, Ahmed; Wen, Bo; Liu, Yunfeng (9 August 2022). "RoFormer: Enhanced Transformer with Rotary Position Embedding". arXiv:2104.09864 [cs.CL].
7. "GPT-J". Hugging Face. https://huggingface.co/docs/transformers/model_doc/gptj.
8. Wang, Ben; Komatsuzaki, Aran (May 2021). "Mesh Transformer JAX". https://github.com/kingoflolz/mesh-transformer-jax.
9. Forefront (14 October 2021). "GPT-J-6B: An Introduction to the Largest Open Source GPT Model". Forefront. https://forefrontai.medium.com/gpt-j-6b-an-introduction-to-the-largest-open-source-gpt-model-forefront-6962eccdfee1.
10. "GPT-J Reviews". Slashdot. https://slashdot.org/software/p/GPT-J/.
11. "Test the EAI models". EleutherAI. 2021. https://6b.eleuther.ai/.
12. Timonin, Denis; Hsueh, Bo Yang; Singal, Dhruv; Nguyen, Vinh (3 August 2022). "Deploying GPT-J and T5 with NVIDIA Triton Inference Server". NVIDIA. https://developer.nvidia.com/blog/deploying-gpt-j-and-t5-with-fastertransformer-and-triton-inference-server/.
13. Vettier, Pauline (16 September 2021). "NLP Cloud now supports GPT-J, the open-source GPT-3 alternative" (Press release). Grenoble, France: NLP Cloud. Retrieved 30 June 2023.
14. Awrahman, Zmnako; Tsitiridou, Anastasia Pachni; Patel, Dhawalkumar; Huilgol, Rahul; Bains, Roop; Stobieniecka, Wioletta (12 June 2023). "Fine-tune GPT-J using an Amazon SageMaker Hugging Face estimator and the model parallel library". Amazon Web Services. https://aws.amazon.com/blogs/machine-learning/fine-tune-gpt-j-using-an-amazon-sagemaker-hugging-face-estimator-and-the-model-parallel-library/.
15. Schmid, Philipp (11 January 2022). "Deploy GPT-J 6B for inference using Hugging Face Transformers and Amazon SageMaker". Hugging Face. https://huggingface.co/blog/gptj-sagemaker.
16. Liguori, Sofia (9 June 2023). "Fine-Tune GPT-J: A Cost-Effective GPT-4 Alternative for Many NLP Tasks". Graphcore. https://www.graphcore.ai/posts/fine-tuned-gpt-j-a-cost-effective-alternative-to-gpt-4-for-nlp-tasks.
17. "GPT-J-6B". CoreWeave. 23 June 2023. https://docs.coreweave.com/coreweave-machine-learning-and-ai/how-to-guides-and-tutorials/examples/one-click-model-guides/gpt-j-6b.
18. Hjelm, Max. "CoreWeave Powers a World of Possibility with GPT-J". CoreWeave. https://www.coreweave.com/blog/coreweave-powers-a-world-of-possibility-with-gpt-j.
19. Conover, Mike; Hayes, Matt; Mathur, Ankit; Meng, Xiangrui; Xie, Jianwei; Wan, Jun; Ghodsi, Ali; Wendell, Patrick et al. (24 March 2023). "Hello Dolly: Democratizing the magic of ChatGPT with open models". Databricks. https://www.databricks.com/blog/2023/03/24/hello-dolly-democratizing-magic-chatgpt-open-models.html.
20. NovelAI (9 May 2022). "The faces of NovelAI's AI Models: Part 1". https://blog.novelai.net/the-faces-of-novelais-ai-models-part-1-6c93576fa48b.
21. NovelAI (3 November 2021). "Data Efficient Language Transfer with GPT-J". https://blog.novelai.net/data-efficient-language-transfer-with-gpt-j-45daedaaf35a.
22. NovelAI (29 July 2021). "Introducing Custom AI Modules". https://blog.novelai.net/custom-ai-modules-dbc527d66081.
23. Shiraly, Karthik (26 February 2023). "See GPT-J vs. GPT-3 Go Head-to-Head on Popular Language Tasks". Width.ai. https://www.width.ai/post/gpt-j-vs-gpt-3.
