**TL;DR** This repo is the ONNX runtime asset bundle for GPA v1.5. It intentionally contains runtime assets only; use it together with the open-source runtime code in `GPA_1.5/onnx_runtime`.
## What This Repo Contains
This asset bundle is prepared for local CLI inference, FastAPI service deployment, browser UI testing, voice registration, and runtime validation.
Expected top-level structure:
```text
GPA-v1.5-onnx-runtime/
├── build/
├── genai_fp16_qwen/
├── genai_int4_qwen/
├── model/
└── voice/
    └── spark_tokenizer_model/
```
Important files and directories:
- `model/runtime_manifest.json`
- `model/reference/default_global_tokens.npy`
- `genai_fp16_qwen/model.onnx`
- `genai_int4_qwen/model.onnx`
- `voice/spark_tokenizer_model/config.yaml`
- `voice/spark_tokenizer_model/model.safetensors`
## Code and Asset Mapping
| Need | Location |
|---|---|
| Native GPA v1.5 checkpoint | `AutoArk-AI/GPA-v1.5` |
| ONNX runtime assets | This repo: `AutoArk-AI/GPA-v1.5-onnx-runtime` |
| Runtime code | `GPA_1.5/onnx_runtime` |
| Runtime guide | `GPA_1.5/onnx_runtime/README.md` |
## Recommended Local Layout
Download this repo into a sibling asset path:
```text
GPA-v1.5/                      # runtime code checkout
GPA-v1.5-HF/
└── GPA-v1.5-onnx-runtime/     # this asset bundle
```
The runtime code automatically looks for `GPA-v1.5-HF/GPA-v1.5-onnx-runtime`.
If your assets live elsewhere, point the runtime to them explicitly:
```bash
export ARK_AUDIO_RUNTIME_ASSET_ROOT=/absolute/path/to/GPA-v1.5-onnx-runtime
```
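The lookup order described above can be sketched in a few lines of Python. This is an illustrative sketch of the env-var-first, sibling-path-fallback behavior, not the runtime's actual implementation; the function name `resolve_asset_root` is made up for this example:

```python
import os
from pathlib import Path


def resolve_asset_root(default_parent: str = "GPA-v1.5-HF") -> Path:
    """Mimic the runtime's lookup order: an explicit env var wins,
    otherwise fall back to the conventional sibling directory."""
    override = os.environ.get("ARK_AUDIO_RUNTIME_ASSET_ROOT")
    if override:
        return Path(override)
    return Path(default_parent) / "GPA-v1.5-onnx-runtime"
```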
## Download

```bash
git clone https://github.com/AutoArk/GPA.git GPA-v1.5
mkdir -p GPA-v1.5-HF
huggingface-cli download AutoArk-AI/GPA-v1.5-onnx-runtime \
  --local-dir GPA-v1.5-HF/GPA-v1.5-onnx-runtime
```
## Environment Setup

Using a dedicated virtual environment is recommended:

```bash
python3 -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
pip install -r GPA-v1.5/GPA_1.5/onnx_runtime/requirements.runtime.txt
```
If your checkout directory is named differently, update the path accordingly.
## Quick Validation

After downloading, sanity-check the local layout:

```bash
test -d GPA-v1.5-HF/GPA-v1.5-onnx-runtime/model
test -d GPA-v1.5-HF/GPA-v1.5-onnx-runtime/build
test -d GPA-v1.5-HF/GPA-v1.5-onnx-runtime/voice/spark_tokenizer_model
test -f GPA-v1.5-HF/GPA-v1.5-onnx-runtime/model/runtime_manifest.json
test -f GPA-v1.5-HF/GPA-v1.5-onnx-runtime/voice/spark_tokenizer_model/model.safetensors
```
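The same checks can be run from Python, which is handy in CI or setup scripts. This is a small illustrative helper, not part of the runtime; `REQUIRED_PATHS` simply mirrors the `test` commands above:

```python
from pathlib import Path

# Paths the validation commands above check, relative to the bundle root.
REQUIRED_PATHS = [
    "model",
    "build",
    "voice/spark_tokenizer_model",
    "model/runtime_manifest.json",
    "voice/spark_tokenizer_model/model.safetensors",
]


def missing_assets(root: str) -> list[str]:
    """Return the required paths that are absent under `root`."""
    base = Path(root)
    return [p for p in REQUIRED_PATHS if not (base / p).exists()]
```

An empty return value means the layout matches the expected structure.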
## CLI Smoke Tests

Generate speech with the int4 main model and int8 decoder:

```bash
python GPA-v1.5/GPA_1.5/onnx_runtime/infer_ark_audio_onnx.py \
  --runtime-root GPA-v1.5-HF/GPA-v1.5-onnx-runtime \
  --task tts \
  --tts_text "Hello, this is a short GPA speech synthesis check." \
  --tts_out_wav tmp_docs/onnx_smoke/readme_tts_int4_int8.wav \
  --main-model-precision int4 \
  --decoder-precision int8
```
Transcribe the generated audio:
```bash
python GPA-v1.5/GPA_1.5/onnx_runtime/infer_ark_audio_onnx.py \
  --runtime-root GPA-v1.5-HF/GPA-v1.5-onnx-runtime \
  --task asr \
  --asr_audio tmp_docs/onnx_smoke/readme_tts_int4_int8.wav \
  --main-model-precision int4
```
For higher-fidelity testing on suitable hardware, pass `--main-model-precision fp16` instead.
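The two smoke tests above form a natural round trip: synthesize a clip, then transcribe it. If you want to script that loop, the command lines can be assembled programmatically. The helpers below are illustrative conveniences, not part of the runtime; they only reproduce the flags shown above:

```python
import subprocess

RUNTIME_ROOT = "GPA-v1.5-HF/GPA-v1.5-onnx-runtime"
SCRIPT = "GPA-v1.5/GPA_1.5/onnx_runtime/infer_ark_audio_onnx.py"


def tts_command(text: str, out_wav: str, precision: str = "int4") -> list[str]:
    """Build the TTS invocation shown above (int8 decoder)."""
    return [
        "python", SCRIPT,
        "--runtime-root", RUNTIME_ROOT,
        "--task", "tts",
        "--tts_text", text,
        "--tts_out_wav", out_wav,
        "--main-model-precision", precision,
        "--decoder-precision", "int8",
    ]


def asr_command(wav_path: str, precision: str = "int4") -> list[str]:
    """Build the ASR invocation shown above."""
    return [
        "python", SCRIPT,
        "--runtime-root", RUNTIME_ROOT,
        "--task", "asr",
        "--asr_audio", wav_path,
        "--main-model-precision", precision,
    ]


# Round trip: synthesize, then transcribe the result.
# wav = "tmp_docs/onnx_smoke/roundtrip.wav"
# subprocess.run(tts_command("A short round-trip check.", wav), check=True)
# subprocess.run(asr_command(wav), check=True)
```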
## FastAPI Service and Browser UI

The GitHub runtime code includes a FastAPI service with a built-in browser UI:

```bash
cd GPA-v1.5/GPA_1.5/onnx_runtime
uvicorn service:app --host 127.0.0.1 --port 8024
```
Open `http://127.0.0.1:8024/` in your browser. The UI supports TTS, ASR, default voice checks, voice registration, and runtime memory monitoring.
## API Endpoints
The FastAPI service exposes:
- UI: `GET /`
- Observability: `GET /api/health`, `GET /api/memory`, `GET /api/voices`
- Core inference: `POST /api/tts`, `POST /api/asr`
- Voice management: `POST /api/voices/register-path`, `POST /api/voices/register-upload`
- OpenAI-compatible routes: `GET /v1/models`, `GET /v1/audio/voices`, `POST /v1/audio/speech`, `POST /v1/audio/transcriptions`
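As a sketch of how a client might call the OpenAI-compatible speech route, the snippet below builds a `POST /v1/audio/speech` request with the standard library. The field names (`model`, `input`, `voice`) follow the OpenAI audio API convention and the model ID is a placeholder; the service's exact schema may differ, so check `GET /v1/models` and the runtime guide before relying on them:

```python
import json
from urllib import request


def build_speech_request(base_url: str, text: str, voice: str = "default"):
    """Build an OpenAI-style /v1/audio/speech request.

    Field names follow the OpenAI audio API convention; "gpa-v1.5" is a
    placeholder model ID (list real IDs via GET /v1/models)."""
    payload = {"model": "gpa-v1.5", "input": text, "voice": voice}
    return request.Request(
        f"{base_url}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# With the service running, send it and save the audio response:
# with request.urlopen(build_speech_request("http://127.0.0.1:8024", "Hello")) as resp:
#     open("speech.wav", "wb").write(resp.read())
```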
## Voice Registration
Voice registration uses the bundled tokenizer assets under `GPA-v1.5-HF/GPA-v1.5-onnx-runtime/voice/spark_tokenizer_model`.
If you want to replace those assets, set:

```bash
export ARK_AUDIO_TOKENIZER_MODEL_DIR=/absolute/path/to/spark_tokenizer_model
```
Successful registration writes metadata under the runtime code directory `GPA_1.5/onnx_runtime/voices/`.
Each registered voice stores its TTS control tokens at `GPA_1.5/onnx_runtime/voices/items/<voice_id>/global_tokens.npy`.
That file can be passed to the CLI with `--voice-global-token`.
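Given that layout, a quick way to see which voices are registered is to scan the `voices/items/` directory for saved token files. This is a small illustrative helper, not part of the runtime:

```python
from pathlib import Path


def registered_voices(runtime_dir: str) -> list[str]:
    """List voice IDs that have saved TTS control tokens, based on the
    voices/items/<voice_id>/global_tokens.npy layout described above."""
    items = Path(runtime_dir) / "voices" / "items"
    if not items.is_dir():
        return []
    return sorted(
        d.name for d in items.iterdir()
        if (d / "global_tokens.npy").is_file()
    )
```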
## Notes
- The asset bundle exposes `fp16` and `int4` main-model variants.
- The default TTS text tokenizer mode is `han_char`.
- The default packaged layout includes tokenizer assets for voice registration.
- The repo is designed to stay free of service code, export scripts, test outputs, and local deployment residue.
## License
This asset bundle is released under the Apache 2.0 license.
## Citation
If you find GPA useful for your research or projects, please cite us:
```bibtex
@misc{cai2026unifyingspeechrecognitionsynthesis,
  title={Unifying Speech Recognition, Synthesis and Conversion with Autoregressive Transformers},
  author={Runyuan Cai and Yu Lin and Yiming Wang and Chunlin Fu and Xiaodong Zeng},
  year={2026},
  eprint={2601.10770},
  archivePrefix={arXiv},
  primaryClass={cs.SD},
  url={https://arxiv.org/abs/2601.10770},
}
```