SGLang
| Developer(s) | LMSYS |
|---|---|
| Initial release | January 17, 2024 |
| Repository | github |
| Written in | Python, Rust, CUDA, C++ |
| Type | Large language model inference engine |
| License | Apache License 2.0 |
| Website | sglang |
SGLang (short for Structured Generation Language) is an open-source framework for programming and serving large language models and multimodal models. It was introduced by researchers affiliated with LMSYS[1] and other institutions as a system combining a Python-embedded language for structured generation with a runtime for high-throughput inference.[2][3][4]
The project targets low-latency, high-throughput inference workloads. Its documentation lists support for structured outputs, speculative decoding, continuous batching, quantization, and OpenAI-compatible APIs.[5]
History
SGLang was publicly introduced in January 2024 by researchers affiliated with Stanford, UC Berkeley, Texas A&M, and Shanghai Jiao Tong University.[2] Its academic description later appeared in the proceedings of NeurIPS 2024.[3] In January 2026, TechCrunch reported that contributors associated with the project had formed the startup RadixArk to commercialize services around SGLang while continuing its open-source development.[6][7]
Architecture
According to the NeurIPS paper, SGLang consists of two main components: a front-end language embedded in Python and a back-end runtime for executing language model programs efficiently.[3] The front end provides primitives for generation, selection, and parallel control flow, while the runtime uses a set of optimizations intended to reduce repeated computation and improve throughput.[3]
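The three kinds of front-end primitive named above can be illustrated with a toy sketch. It does not use SGLang's actual API: `State`, `toy_model`, `toy_score`, and `run_program` are hypothetical names introduced only for this illustration, and the "model" is a deterministic stand-in so the example runs without any backend.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Callable, Dict, List


def toy_model(prompt: str) -> str:
    """Hypothetical stand-in for a model: a deterministic fake completion."""
    return f"[completion after {len(prompt)} chars]"


def toy_score(prompt: str, choice: str) -> float:
    """Hypothetical scorer: how often the choice already occurs in the prompt."""
    return float(prompt.lower().count(choice.lower()))


@dataclass
class State:
    """Prompt state: accumulated text plus named outputs."""
    text: str = ""
    outputs: Dict[str, str] = field(default_factory=dict)

    def add(self, s: str) -> "State":
        self.text += s
        return self

    def gen(self, name: str, model: Callable[[str], str] = toy_model) -> "State":
        # Generation: continue the prompt and record the result under `name`.
        out = model(self.text)
        self.outputs[name] = out
        return self.add(out)

    def select(self, name: str, choices: List[str]) -> "State":
        # Selection: keep the best-scoring option from a closed list.
        best = max(choices, key=lambda c: toy_score(self.text, c))
        self.outputs[name] = best
        return self.add(best)

    def fork(self, n: int) -> List["State"]:
        # Parallel control flow: n independent copies sharing the same prefix.
        return [State(self.text, dict(self.outputs)) for _ in range(n)]


def run_program(question: str) -> Dict[str, str]:
    s = State().add(f"Q: {question}\n")
    s.add("Pick a tool, calculator or search engine: ")
    s.select("tool", ["calculator", "search engine"])

    # Two branches extend the same prefix and are generated concurrently.
    branches = s.fork(2)
    branches[0].add("\nShort answer: ")
    branches[1].add("\nExplanation: ")
    with ThreadPoolExecutor(max_workers=2) as pool:
        list(pool.map(lambda b: b.gen("reply"), branches))

    return {
        "tool": s.outputs["tool"],
        "short": branches[0].outputs["reply"],
        "long": branches[1].outputs["reply"],
    }
```

Calling `run_program("Can a calculator add 2 and 2?")` returns a dictionary with the selected tool and one reply per branch. Because both branches start from the same prompt prefix, a program shaped like this is the kind of workload where a runtime can avoid recomputing shared state.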
Techniques described by the project include RadixAttention, which reuses key–value (KV) cache state across multiple generation calls; compressed finite-state machines for faster constrained decoding; and speculative execution for API-based models.[3] The current documentation also describes support for serving both language models and multimodal models across a range of hardware back ends.[5]
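The idea behind reusing cache state across calls can be sketched with a small prefix cache. This is a simplified illustration, not SGLang's implementation: it uses a plain token-level trie rather than a compressed radix tree, stores no actual attention tensors, and omits eviction. The names `PrefixNode`, `PrefixCache`, and `serve` are hypothetical.

```python
from typing import Dict, List, Tuple


class PrefixNode:
    """One trie node; stands in for the cached KV entry of a single token."""

    def __init__(self) -> None:
        self.children: Dict[int, "PrefixNode"] = {}


class PrefixCache:
    """Toy token-level trie over previously processed sequences."""

    def __init__(self) -> None:
        self.root = PrefixNode()

    def match_prefix(self, tokens: List[int]) -> int:
        """Return how many leading tokens are already cached."""
        node, matched = self.root, 0
        for t in tokens:
            child = node.children.get(t)
            if child is None:
                break
            node, matched = child, matched + 1
        return matched

    def insert(self, tokens: List[int]) -> int:
        """Cache a sequence; return how many tokens were newly added."""
        node, computed = self.root, 0
        for t in tokens:
            if t not in node.children:
                node.children[t] = PrefixNode()
                computed += 1
            node = node.children[t]
        return computed


def serve(cache: PrefixCache, requests: List[List[int]]) -> List[Tuple[int, int]]:
    """For each request, report (tokens reused, tokens computed)."""
    stats = []
    for tokens in requests:
        reused = cache.match_prefix(tokens)
        computed = cache.insert(tokens)
        stats.append((reused, computed))
    return stats


# Three requests sharing the same four-token "system prompt" prefix.
shared = [1, 2, 3, 4]
requests = [shared + [10, 11], shared + [20], shared + [10, 11, 12]]
stats = serve(PrefixCache(), requests)
print(stats)  # [(0, 6), (4, 1), (6, 1)]
```

The first request computes all six tokens; the later ones only compute the tokens past the longest cached prefix. A production cache would also need a bounded memory budget and an eviction policy, which this sketch leaves out.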
See also
- Lists of open-source artificial intelligence software
- List of software developed at universities
- llama.cpp
- OpenVINO
- Open Neural Network Exchange
- TensorRT-LLM
- vLLM
- Comparison of deep learning software
- Comparison of machine learning software
References
1. "LMSYS". GitHub, Inc. https://github.com/lm-sys.
2. "Fast and Expressive LLM Inference with RadixAttention and SGLang". January 17, 2024. https://www.lmsys.org/blog/2024-01-17-sglang/.
3. Zheng, Lianmin; Yin, Liangsheng; Xie, Zhiqiang; Sun, Chuyue; Huang, Jeff; Yu, Cody Hao; Cao, Shiyi; Kozyrakis, Christos et al. (2024). "SGLang: Efficient Execution of Structured Language Model Programs". Advances in Neural Information Processing Systems 37. https://proceedings.neurips.cc/paper_files/paper/2024/file/724be4472168f31ba1c9ac630f15dec8-Paper-Conference.pdf. Retrieved April 19, 2026.
4. "SGLang". April 25, 2024. https://sky.cs.berkeley.edu/project/sglang/.
5. "SGLang Documentation". https://docs.sglang.io/.
6. Hu, Krystal (January 21, 2026). "Sources: Project SGLang spins out as RadixArk with $400M valuation as inference market explodes". https://techcrunch.com/2026/01/21/sources-project-sglang-spins-out-as-radixark-with-400m-valuation-as-inference-market-explodes/.
7. R, Vignesh (January 23, 2026). "From Berkeley lab to $400M startup: SGLang becomes RadixArk". https://techfundingnews.com/radixark-sglang-spinoff-400m-valuation-ai-inference/.
