MOSS-Audio-Tokenizer-Nano-ONNX

This repository provides the ONNX exports of MOSS-Audio-Tokenizer-Nano, the lightweight audio tokenizer used by MOSS-TTS-Nano. It is intended for torch-free deployment with ONNX Runtime and ONNX Runtime Web.

Overview

The Nano variant is a lightweight tokenizer with about 20M parameters, designed to reduce deployment cost while preserving strong perceptual quality.

MOSS-Audio-Tokenizer-Nano supports:

  • 48 kHz, stereo audio
  • 12.5 Hz token rate
  • 16 RVQ codebooks
  • high-fidelity reconstruction across variable bitrates
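
The stated rates fix how audio length maps to token count. The sketch below works this out in plain Python. It assumes one frame covers exactly 48000 / 12.5 = 3840 samples per channel and that a trailing partial frame is zero-padded. The padding policy and the helper names are illustrative assumptions, not documented behavior of the exported graphs.

```python
import math

SAMPLE_RATE = 48_000   # Hz, from the model card
TOKEN_RATE = 12.5      # token frames per second, from the model card
NUM_CODEBOOKS = 16     # RVQ codebooks per frame, from the model card

# Samples per channel covered by one token frame: 48000 / 12.5 = 3840 (80 ms).
SAMPLES_PER_FRAME = int(SAMPLE_RATE / TOKEN_RATE)


def num_frames(num_samples: int) -> int:
    """Frames needed to cover num_samples (assumes a partial last frame is padded)."""
    return math.ceil(num_samples / SAMPLES_PER_FRAME)


def num_tokens(num_samples: int, codebooks: int = NUM_CODEBOOKS) -> int:
    """Total discrete codes when every frame keeps `codebooks` RVQ levels."""
    return num_frames(num_samples) * codebooks


def pad_to_frame(channel: list[float]) -> list[float]:
    """Zero-pad one channel to a whole number of frames (hypothetical policy)."""
    target = num_frames(len(channel)) * SAMPLES_PER_FRAME
    return channel + [0.0] * (target - len(channel))


if __name__ == "__main__":
    ten_seconds = 10 * SAMPLE_RATE
    print(SAMPLES_PER_FRAME)        # 3840
    print(num_frames(ten_seconds))  # 125
    print(num_tokens(ten_seconds))  # 2000
```

At full depth this gives 16 × 12.5 = 200 codes per second. Keeping fewer RVQ levels lowers that count proportionally, which is the usual way RVQ codecs trade quality for bitrate. The count assumes a single code sequence for the stereo signal, because this card does not say how the two channels map to codes.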

This ONNX repository is designed for lightweight inference pipelines such as:

  • local CPU deployment with onnxruntime
  • browser deployment with onnxruntime-web
  • companion audio encoding/decoding for MOSS-TTS-Nano-100M-ONNX

Supported Backends

  • ONNX Runtime (CPU), via onnxruntime: local CPU inference
  • ONNX Runtime Web, via onnxruntime-web: browser-based deployment
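
On CPU, the graphs load like any other ONNX model in Python's onnxruntime. The sketch below opens a graph on the CPU execution provider and lists its input and output tensors. This card does not document the graphs' tensor names or shapes, so reading them from the session is safer than hard-coding them. The helper names are hypothetical. The path in the usage comment assumes the Quick Start download location.

```python
from typing import Any


def open_cpu_session(model_path: str) -> Any:
    """Load an ONNX graph with ONNX Runtime on the CPU execution provider."""
    import onnxruntime as ort  # imported lazily so describe_io works without it

    # External weights (.data files) are resolved relative to the .onnx file,
    # so they must stay in the same directory as the graph.
    return ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])


def describe_io(session: Any) -> dict[str, list[tuple[str, Any, str]]]:
    """Collect (name, shape, element type) for every graph input and output."""
    return {
        "inputs": [(arg.name, arg.shape, arg.type) for arg in session.get_inputs()],
        "outputs": [(arg.name, arg.shape, arg.type) for arg in session.get_outputs()],
    }


if __name__ == "__main__":
    import sys

    # Usage (path assumed from the Quick Start --local-dir):
    #   python inspect_graph.py weights/MOSS-Audio-Tokenizer-Nano-ONNX/moss_audio_tokenizer_encode.onnx
    if len(sys.argv) > 1:
        info = describe_io(open_cpu_session(sys.argv[1]))
        for kind, args in info.items():
            for name, shape, dtype in args:
                print(f"{kind}: {name} {shape} {dtype}")
```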

Repository Contents

  • moss_audio_tokenizer_encode.onnx: encoder graph (waveform -> discrete audio codes)
  • moss_audio_tokenizer_encode.data: external weights for the encoder graph
  • moss_audio_tokenizer_decode_full.onnx: full decoder graph (audio codes -> waveform)
  • moss_audio_tokenizer_decode_step.onnx: streaming decoder-step graph for incremental decoding
  • moss_audio_tokenizer_decode_shared.data: external weights shared by both decoder graphs
  • codec_browser_onnx_meta.json: metadata for browser and ONNX Runtime integration
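
The two decoder graphs divide the same work differently. The full graph converts a whole code sequence to a waveform in one call, while the step graph is meant for incremental decoding. The sketch below covers only the frame bookkeeping of incremental decoding, under these assumptions:

  • codes are held as a list of frames, with 16 codes per frame;
  • `decode_step` is a hypothetical stand-in for one call to the step graph;
  • any recurrent or cache state the real graph carries is not modeled.

Inspect the actual inputs and outputs of moss_audio_tokenizer_decode_step.onnx before adapting it.

```python
from typing import Callable, Iterable, Iterator

SAMPLE_RATE = 48_000
TOKEN_RATE = 12.5
SAMPLES_PER_FRAME = int(SAMPLE_RATE / TOKEN_RATE)  # 3840 samples (80 ms) per frame

Frame = list[int]  # one code per RVQ codebook (16 for the Nano tokenizer)


def chunk_frames(frames: Iterable[Frame], frames_per_chunk: int) -> Iterator[list[Frame]]:
    """Group incoming code frames into fixed-size chunks; the last chunk may be short."""
    if frames_per_chunk < 1:
        raise ValueError("frames_per_chunk must be >= 1")
    chunk: list[Frame] = []
    for frame in frames:
        chunk.append(frame)
        if len(chunk) == frames_per_chunk:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


def stream_decode(
    frames: Iterable[Frame],
    decode_step: Callable[[list[Frame]], list[float]],  # hypothetical step-graph wrapper
    frames_per_chunk: int = 2,
) -> Iterator[list[float]]:
    """Yield audio for each chunk as soon as its frames are available."""
    for chunk in chunk_frames(frames, frames_per_chunk):
        yield decode_step(chunk)


if __name__ == "__main__":
    # Dummy decoder: returns silence of the right length for one channel.
    fake_step = lambda chunk: [0.0] * (len(chunk) * SAMPLES_PER_FRAME)
    codes = [[0] * 16 for _ in range(5)]
    print([len(a) for a in stream_decode(codes, fake_step)])  # [7680, 7680, 3840]
```

Smaller chunks reduce the wait for the first audio (one frame is 80 ms at 12.5 Hz), at the cost of more graph calls per second of audio.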

Quick Start

```bash
huggingface-cli download OpenMOSS-Team/MOSS-Audio-Tokenizer-Nano-ONNX \
    --local-dir weights/MOSS-Audio-Tokenizer-Nano-ONNX
```
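
After downloading, a quick check can confirm that every file listed under Repository Contents is present. This matters because the `.onnx` graphs cannot load without their external `.data` weight files. The sketch below also parses codec_browser_onnx_meta.json with the standard json module. Its schema is not documented in this card, so the sketch reports only the top-level keys. The function name is hypothetical, and the directory defaults to the --local-dir used above.

```python
import json
from pathlib import Path

# File names as listed under "Repository Contents".
EXPECTED_FILES = [
    "moss_audio_tokenizer_encode.onnx",
    "moss_audio_tokenizer_encode.data",
    "moss_audio_tokenizer_decode_full.onnx",
    "moss_audio_tokenizer_decode_step.onnx",
    "moss_audio_tokenizer_decode_shared.data",
    "codec_browser_onnx_meta.json",
]


def check_download(local_dir: str = "weights/MOSS-Audio-Tokenizer-Nano-ONNX") -> dict:
    """Report missing files and the metadata file's top-level keys (if present)."""
    root = Path(local_dir)
    missing = [name for name in EXPECTED_FILES if not (root / name).is_file()]
    meta_keys: list[str] = []
    meta_path = root / "codec_browser_onnx_meta.json"
    if meta_path.is_file():
        meta = json.loads(meta_path.read_text(encoding="utf-8"))
        if isinstance(meta, dict):
            meta_keys = sorted(meta)
    return {"missing": missing, "meta_keys": meta_keys}


if __name__ == "__main__":
    report = check_download()
    print("missing:", report["missing"] or "none")
    print("metadata keys:", report["meta_keys"])
```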

This repository is typically used together with OpenMOSS-Team/MOSS-TTS-Nano-100M-ONNX for fully torch-free MOSS-TTS-Nano deployment.

Main Repositories

  • OpenMOSS/MOSS-TTS-Nano: MOSS-TTS-Nano source code and inference pipeline
  • OpenMOSS-Team/MOSS-TTS-Nano: PyTorch MOSS-TTS-Nano weights
  • OpenMOSS-Team/MOSS-Audio-Tokenizer-Nano: PyTorch MOSS-Audio-Tokenizer-Nano weights
  • OpenMOSS-Team/MOSS-TTS-Nano-100M-ONNX: companion ONNX TTS weights

About MOSS-Audio-Tokenizer-Nano

MOSS-Audio-Tokenizer-Nano serves as the lightweight codec backbone for MOSS-TTS-Nano. It keeps the same unified audio-token interface used across the MOSS-TTS family while reducing inference cost for CPU and browser deployment scenarios.

For the original PyTorch implementation, setup instructions, and more background, see the PyTorch repositories listed under Main Repositories.

Citation

If you use the MOSS-TTS work in your research or product, please cite:

@misc{openmoss2026mossttsnano,
  title={MOSS-TTS-Nano},
  author={OpenMOSS Team},
  year={2026},
  howpublished={GitHub repository},
  url={https://github.com/OpenMOSS/MOSS-TTS-Nano}
}
@misc{gong2026mossttstechnicalreport,
  title={MOSS-TTS Technical Report},
  author={Yitian Gong and Botian Jiang and Yiwei Zhao and Yucheng Yuan and Kuangwei Chen and Yaozhou Jiang and Cheng Chang and Dong Hong and Mingshu Chen and Ruixiao Li and Yiyang Zhang and Yang Gao and Hanfu Chen and Ke Chen and Songlin Wang and Xiaogui Yang and Yuqian Zhang and Kexin Huang and ZhengYuan Lin and Kang Yu and Ziqi Chen and Jin Wang and Zhaoye Fei and Qinyuan Cheng and Shimin Li and Xipeng Qiu},
  year={2026},
  eprint={2603.18090},
  archivePrefix={arXiv},
  primaryClass={cs.SD},
  url={https://arxiv.org/abs/2603.18090}
}
@misc{gong2026mossaudiotokenizerscalingaudiotokenizers,
  title={MOSS-Audio-Tokenizer: Scaling Audio Tokenizers for Future Audio Foundation Models},
  author={Yitian Gong and Kuangwei Chen and Zhaoye Fei and Xiaogui Yang and Ke Chen and Yang Wang and Kexin Huang and Mingshu Chen and Ruixiao Li and Qingyuan Cheng and Shimin Li and Xipeng Qiu},
  year={2026},
  eprint={2602.10934},
  archivePrefix={arXiv},
  primaryClass={cs.SD},
  url={https://arxiv.org/abs/2602.10934}
}