Benchmarking Language Modeling for Lossless Compression of Full-Fidelity Audio
1️⃣ One-Sentence Summary
This study evaluates language-model-based lossless audio compression and finds that it outperforms traditional compression formats (such as FLAC) on 8-bit and 16-bit audio. It also proposes a new coding scheme named Trilobyte that makes lossless compression of 24-bit high-fidelity audio tractable for the first time, though compression gains diminish as bit depth increases.
Autoregressive "language" models (LMs) trained on raw waveforms can be repurposed for lossless audio compression, but prior work is limited to 8-bit audio, leaving open whether such approaches work in practical settings (16/24-bit) and can compete with existing codecs. We benchmark LM-based compression on full-fidelity audio across diverse domains (music, speech, bioacoustics), sampling rates (16kHz-48kHz), and bit depths (8-, 16-, and 24-bit). Standard sample-level tokenization becomes intractable at higher bit depths due to vocabulary size (65K for 16-bit; 16.7M for 24-bit). We propose Trilobyte, a byte-level tokenization schema for full-resolution audio, improving vocabulary scaling from $O(2^{b})$ to $O(1)$ and enabling the first tractable 24-bit LM-based lossless compression. While LMs consistently outperform FLAC and yield state-of-the-art compression at 8-bit and 16-bit, we observe that compression gains become more modest as bit depth increases beyond 8-bit.
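The abstract does not spell out the Trilobyte scheme, but the core idea of byte-level tokenization can be sketched as follows: instead of treating each $b$-bit sample as one token (vocabulary $2^{b}$), each sample is decomposed into fixed-size byte tokens, so the vocabulary stays at 256 symbols regardless of bit depth. The function names and the big-endian byte order below are illustrative assumptions, not the paper's actual implementation.

```python
# Hedged sketch of byte-level tokenization for 24-bit PCM audio.
# Assumption: each unsigned 24-bit sample is split into 3 byte tokens
# (big-endian), so the token vocabulary is 256 = O(1) rather than
# 2^24 ≈ 16.7M = O(2^b). The actual Trilobyte scheme may differ.

def tokenize_24bit(samples):
    """Map each 24-bit sample to three byte tokens."""
    tokens = []
    for s in samples:
        assert 0 <= s < 1 << 24, "expected an unsigned 24-bit sample"
        tokens.extend([(s >> 16) & 0xFF, (s >> 8) & 0xFF, s & 0xFF])
    return tokens

def detokenize_24bit(tokens):
    """Inverse mapping: regroup byte triples into 24-bit samples."""
    return [
        (tokens[i] << 16) | (tokens[i + 1] << 8) | tokens[i + 2]
        for i in range(0, len(tokens), 3)
    ]

samples = [0x000000, 0x123456, 0xFFFFFF]
toks = tokenize_24bit(samples)
assert detokenize_24bit(toks) == samples  # lossless round trip
assert max(toks) < 256                    # vocabulary is 256, not 2^24
```

The round-trip check matters because the tokenization must be exactly invertible for the overall compression pipeline (LM probabilities plus entropy coding) to remain lossless.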
Source: arXiv: 2603.08683