Software:SGLang

SGLang
Developer(s)	LMSYS
Initial release	January 17, 2024
Repository	github.com/sgl-project/sglang
Written in	Python, Rust, CUDA, C++
Type	Large language model inference engine
License	Apache License 2.0
Website	sglang.io

SGLang (short for Structured Generation Language) is an open-source framework for programming and serving large language models and multimodal models. It was introduced by researchers affiliated with LMSYS^[1] and other institutions as a system combining a Python-embedded language for structured generation with a runtime for high-throughput inference.^[2]^[3]^[4]

The project is designed for low-latency, high-throughput inference workloads. Its documentation describes support for structured outputs, speculative decoding, continuous batching, quantization, and compatibility with OpenAI-style APIs.^[5]

History

SGLang was publicly introduced in January 2024 by researchers affiliated with Stanford, UC Berkeley, Texas A&M, and Shanghai Jiao Tong University.^[2] Its academic description later appeared in the proceedings of NeurIPS 2024.^[3] In January 2026, TechCrunch reported that contributors associated with the project had formed the startup RadixArk to commercialize services around SGLang while continuing its open-source development.^[6]^[7]

Architecture

According to the NeurIPS paper, SGLang consists of two main components: a front-end language embedded in Python and a back-end runtime for executing language model programs efficiently.^[3] The front end provides primitives for generation, selection, and parallel control flow, while the runtime uses a set of optimizations intended to reduce repeated computation and improve throughput.^[3]
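
The three kinds of front-end primitives named above (generation, selection, and parallel control flow) can be illustrated with a small self-contained sketch. This is not SGLang's API: the `ProgramState` class, the `fake_model` stand-in, and the caller-supplied `score` function are all hypothetical names chosen for illustration, and a real runtime would rank choices using model probabilities rather than a user-provided function.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple


def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for a model call; returns a deterministic string."""
    return f"<completion for {len(prompt)} chars>"


class ProgramState:
    """Toy prompt state that accumulates text and named outputs (illustrative only)."""

    def __init__(self, model: Callable[[str], str], text: str = "") -> None:
        self.model = model
        self.text = text
        self.vars: Dict[str, str] = {}

    def append(self, fragment: str) -> "ProgramState":
        # Extend the prompt with fixed text.
        self.text += fragment
        return self

    def gen(self, name: str) -> str:
        # Generation: call the model on the current prompt and store the output.
        out = self.model(self.text)
        self.vars[name] = out
        self.text += out
        return out

    def select(self, name: str, choices: List[str],
               score: Callable[[str, str], float]) -> str:
        # Selection: pick the highest-scoring option from a fixed set.
        # Assumption: the scoring function is supplied by the caller.
        best = max(choices, key=lambda c: score(self.text, c))
        self.vars[name] = best
        self.text += best
        return best

    def fork(self, n: int) -> List["ProgramState"]:
        # Parallel control flow: create n independent copies sharing the same prefix.
        children = []
        for _ in range(n):
            child = ProgramState(self.model, self.text)
            child.vars = dict(self.vars)
            children.append(child)
        return children


def run_demo() -> Tuple[ProgramState, List[ProgramState]]:
    s = ProgramState(fake_model, "Q: Is water wet?\nA: ")
    s.select("answer", ["yes", "no"], score=lambda prompt, choice: len(choice))
    branches = s.fork(2)
    for i, branch in enumerate(branches):
        branch.append(f"\nReason {i + 1}: ")
    # Run the forked branches concurrently; each continues from the shared prefix.
    with ThreadPoolExecutor(max_workers=2) as pool:
        list(pool.map(lambda b: b.gen("reason"), branches))
    return s, branches


if __name__ == "__main__":
    state, forks = run_demo()
    print(state.vars["answer"], [b.vars["reason"] for b in forks])
```

Because every forked branch begins with the same prompt text, programs of this shape repeat a large amount of prefix computation, which is the kind of redundancy the runtime optimizations are intended to reduce.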

Among the techniques described by the project are RadixAttention for reusing key–value cache state across multiple generation calls, compressed finite-state machines for faster constrained decoding, and speculative execution for API-based models.^[3] The current documentation also describes support for serving both language models and multimodal models across a range of hardware back ends.^[5]
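
The idea of reusing cached state across calls that share a prompt prefix can be sketched with a toy prefix index. The code below is a simplification under stated assumptions, not the project's implementation: it uses a plain per-token trie rather than a compressed radix tree, it tracks only which token prefixes have been seen rather than storing any key–value tensors, and it omits eviction and memory management. The names `PrefixCache` and `serve` are hypothetical.

```python
from typing import Dict, List, Sequence


class _Node:
    """One trie node per token; a real cache would attach KV state here."""

    def __init__(self) -> None:
        self.children: Dict[int, "_Node"] = {}


class PrefixCache:
    """Toy index of previously processed token sequences (illustrative only)."""

    def __init__(self) -> None:
        self.root = _Node()

    def match_prefix(self, tokens: Sequence[int]) -> int:
        # Return how many leading tokens are already present in the cache.
        node = self.root
        matched = 0
        for token in tokens:
            child = node.children.get(token)
            if child is None:
                break
            node = child
            matched += 1
        return matched

    def insert(self, tokens: Sequence[int]) -> None:
        # Record the full sequence so later requests can reuse its prefix.
        node = self.root
        for token in tokens:
            node = node.children.setdefault(token, _Node())


def serve(cache: PrefixCache, requests: List[List[int]]) -> List[int]:
    """Return, per request, the number of tokens that need fresh computation."""
    computed = []
    for tokens in requests:
        reused = cache.match_prefix(tokens)
        computed.append(len(tokens) - reused)
        cache.insert(tokens)
    return computed


if __name__ == "__main__":
    system_prompt = [1, 2, 3, 4]  # shared prefix, as arbitrary token IDs
    requests = [
        system_prompt + [10, 11],
        system_prompt + [20],
        system_prompt + [10, 11, 12],
    ]
    print(serve(PrefixCache(), requests))  # → [6, 1, 1]
```

In this toy run, only the first request pays for the shared four-token prefix; later requests recompute just the tokens that diverge from what is already cached.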

References

[1] "LMSYS". GitHub, Inc. https://github.com/lm-sys.
[2] "Fast and Expressive LLM Inference with RadixAttention and SGLang". January 17, 2024. https://www.lmsys.org/blog/2024-01-17-sglang/.
[3] Zheng, Lianmin; Yin, Liangsheng; Xie, Zhiqiang; Sun, Chuyue; Huang, Jeff; Yu, Cody Hao; Cao, Shiyi; Kozyrakis, Christos et al. (2024). "SGLang: Efficient Execution of Structured Language Model Programs". Advances in Neural Information Processing Systems 37. https://proceedings.neurips.cc/paper_files/paper/2024/file/724be4472168f31ba1c9ac630f15dec8-Paper-Conference.pdf. Retrieved April 19, 2026.
[4] "SGLang". April 25, 2024. https://sky.cs.berkeley.edu/project/sglang/.
[5] "SGLang Documentation". https://docs.sglang.io/.
[6] Hu, Krystal (January 21, 2026). "Sources: Project SGLang spins out as RadixArk with $400M valuation as inference market explodes". https://techcrunch.com/2026/01/21/sources-project-sglang-spins-out-as-radixark-with-400m-valuation-as-inference-market-explodes/.
[7] R, Vignesh (January 23, 2026). "From Berkeley lab to $400M startup: SGLang becomes RadixArk". https://techfundingnews.com/radixark-sglang-spinoff-400m-valuation-ai-inference/.


Original source: https://en.wikipedia.org/wiki/SGLang.
