Lyra (codec)
Filename extension | .lyra |
---|---|
Developed by | Google |
Initial release | 2021 |
Latest release | 1.3.2 (December 20, 2022) |
Type of format | speech codec |
Open format? | Yes (Apache-2.0) |
Lyra is a lossy audio codec developed by Google that is designed for compressing speech at very low bitrates. Unlike most other audio formats, it compresses data using a machine learning-based algorithm.
Features
The Lyra codec is designed to transmit speech in real time when bandwidth is severely restricted, such as over slow or unreliable network connections.[1] It runs at fixed bitrates of 3.2, 6, and 9.2 kbit/s and is intended to provide better quality than codecs that use traditional waveform-based algorithms at similar bitrates.[2][3] Rather than coding the waveform directly, Lyra uses a machine learning algorithm that encodes the input through feature extraction and then reconstructs an approximation of the original with a generative model.[1] This model was trained on thousands of hours of speech recorded in over 70 languages so that it works across a variety of speakers.[2] Because generative models are more computationally complex than traditional codecs, a simple model that processes different frequency ranges in parallel is used to obtain acceptable performance.[4] Lyra imposes 20 ms of latency due to its frame size.[3] Google's reference implementation is available for Android and Linux.[4]
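To give a sense of how tight these bitrates are, the following minimal sketch computes how many bits are available for each 20 ms frame at two of the listed rates. It assumes the whole bitrate is spent on frame payload (no packet or container overhead), and the function name `bits_per_frame` is purely illustrative, not part of Lyra's API.

```python
def bits_per_frame(bitrate_bps: int, frame_ms: int = 20) -> float:
    """Payload bits available for one frame at a constant bitrate.

    Assumes no header or packet overhead (an illustrative simplification).
    """
    return bitrate_bps * frame_ms / 1000


for rate in (3200, 6000):
    bits = bits_per_frame(rate)
    print(f"{rate / 1000:g} kbit/s: {bits:g} bits ({bits / 8:g} bytes) per 20 ms frame")
```

Under these assumptions, a 3.2 kbit/s stream leaves 64 bits (8 bytes) to describe each 20 ms of speech, which is why the decoder has to regenerate much of the signal rather than copy it.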
Quality
Lyra's initial version performed significantly better than traditional codecs at similar bitrates.[1][4][5] Ian Buckley of MakeUseOf wrote, "It succeeds in creating almost eerie levels of audio reproduction with bitrates as low as 3 kbps."[1] Google claims that it reproduces natural-sounding speech, and that Lyra at 3 kbit/s beats Opus at 8 kbit/s.[2] Tsahi Levent-Levi wrote that Satin, Microsoft's AI-based codec, outperforms it at higher bitrates.[5]
History
In December 2017, Google researchers published a preprint paper on replacing the Codec 2 decoder with a WaveNet neural network. They found that a neural network is able to extrapolate features of the voice not described in the Codec 2 bitstream and give better audio quality, and that the use of conventional features makes the neural network calculation simpler compared to a purely waveform-based network. Lyra version 1 would reuse this overall framework of feature extraction, quantization, and neural synthesis.[6]
Lyra was first announced in February 2021,[2] and in April, Google released the source code of its reference implementation.[1] The initial version had a fixed bitrate of 3 kbit/s and around 90 ms latency.[1][2] The encoder calculates a log mel spectrogram and applies vector quantization to encode the spectrogram as a compact data stream. The decoder is a WaveNet neural network that takes the spectrogram and reconstructs the input audio.[2]
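The vector quantization step can be illustrated with a small sketch: each feature vector is replaced by the index of its nearest codebook entry, and only the indices are transmitted. This is a generic illustration, not Lyra's actual quantizer; the codebook values, the 3-dimensional vectors, and the helper names are all made up for the example, whereas a real codec would use a trained codebook over spectrogram features.

```python
import math


def nearest_index(vector, codebook):
    """Index of the codebook entry closest to `vector` (squared Euclidean distance)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: sq_dist(vector, codebook[i]))


# Hypothetical 4-entry codebook of 3-dimensional feature vectors.
CODEBOOK = [
    [0.0, 0.0, 0.0],
    [1.0, 1.0, 1.0],
    [-1.0, 0.5, 2.0],
    [2.0, -1.0, 0.0],
]


def encode(frames, codebook):
    """Map each feature vector to the index of its nearest codebook entry."""
    return [nearest_index(frame, codebook) for frame in frames]


def decode(indices, codebook):
    """Look the indices up again; the result only approximates the input."""
    return [codebook[i] for i in indices]


frames = [[0.9, 1.2, 0.8], [-0.8, 0.4, 1.9]]
indices = encode(frames, CODEBOOK)
bits_per_index = math.ceil(math.log2(len(CODEBOOK)))
print(indices, bits_per_index)
```

With a 4-entry codebook each vector costs 2 bits regardless of its dimension, which is where the compression comes from; the price is that the decoder only recovers the nearest codebook entry, not the original vector.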
A second version (v2/1.2.0), released in September 2022, improved sound quality, latency, and performance, and permitted multiple bitrates. V2 uses a "SoundStream" architecture, a kind of autoencoder in which both the encoder and decoder are neural networks. A residual vector quantizer is used to turn the feature values into transferable data.[3]
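As a rough sketch of how residual vector quantization works in general, the example below quantizes a vector in stages: the first codebook gives a coarse approximation, and each later codebook quantizes whatever error the previous stages left behind. The two tiny codebooks and all function names are hypothetical and chosen for readability; the actual codec uses learned codebooks whose sizes and number of stages are not given here.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_index(vector, codebook):
    """Index of the codebook entry closest to `vector`."""
    return min(range(len(codebook)), key=lambda i: sq_dist(vector, codebook[i]))


def rvq_encode(vector, codebooks):
    """Return one index per stage; each stage quantizes the remaining residual."""
    residual = list(vector)
    indices = []
    for codebook in codebooks:
        i = nearest_index(residual, codebook)
        indices.append(i)
        residual = [r - c for r, c in zip(residual, codebook[i])]
    return indices


def rvq_decode(indices, codebooks):
    """Sum the selected entry from each stage to rebuild the approximation."""
    out = [0.0] * len(codebooks[0][0])
    for i, codebook in zip(indices, codebooks):
        out = [o + c for o, c in zip(out, codebook[i])]
    return out


# Hypothetical codebooks: a coarse first stage and a finer second stage.
CODEBOOKS = [
    [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0], [1.0, -1.0]],
    [[0.0, 0.0], [0.25, 0.0], [0.0, 0.25], [-0.25, -0.25]],
]

x = [1.2, 0.9]
idx = rvq_encode(x, CODEBOOKS)
approx = rvq_decode(idx, CODEBOOKS)
print(idx, approx)
```

A general property of this scheme is that decoding with only the first few stages still gives a usable, coarser approximation, so the number of stages trades bits per frame against fidelity.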
Support
Implementations
Google's implementation is available on GitHub under the Apache License.[1][7] Written in C++, it is optimized for 64-bit ARM but also runs on x86, on either Android or Linux.[4]
Applications
Google Duo uses Lyra to transmit sound for video chats when bandwidth is limited.[1][5]
References
- [1] Buckley, Ian (2021-04-08). "Google Makes Its Lyra Low Bitrate Speech Codec Public". https://www.makeuseof.com/google-lyra-speech-codec-public/.
- [2] "Lyra: A New Very Low-Bitrate Codec for Speech Compression". 2021-02-25. http://ai.googleblog.com/2021/02/lyra-new-very-low-bitrate-codec-for.html.
- [3] "Lyra V2 - a better, faster, and more versatile speech codec". https://opensource.googleblog.com/2022/09/lyra-v2-a-better-faster-and-more-versatile-speech-codec.html.
- [4] "Google Duo uses a new codec for better call quality over poor connections". 2021-04-09. https://www.xda-developers.com/google-duo-lyra-codec-better-call-quality/.
- [5] Levent-Levi, Tsahi (2021-04-19). "Lyra, Satin and the future of voice codecs in WebRTC". https://bloggeek.me/lyra-satin-webrtc-voice-codecs/.
- [6] Kleijn, W. B.; Lim, F. S.; Luebs, A.; Skoglund, J.; Stimberg, F.; Wang, Q.; Walters, T. C. (April 2018). "Wavenet based low rate speech coding". 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE. pp. 676–680. https://arxiv.org/abs/1712.01120.
- [7] Google (2021). "Lyra: A Very Low-Bitrate Codec for Speech Compression". https://github.com/google/lyra.
External links
- "Lyra: A New Very Low-Bitrate Codec for Speech Compression", a Google blog post with a demonstration comparing codecs
See also
- Satin (codec), an AI-based codec developed by Microsoft
- Comparison of audio coding formats
- Speech coding
- Videotelephony
Original source: https://en.wikipedia.org/wiki/Lyra (codec).