**TL;DR** This repo is the ONNX runtime asset bundle for GPA v1.5. It intentionally contains runtime assets only; use it together with the open-source runtime code in `GPA_1.5/onnx_runtime`.
## What This Repo Contains
This asset bundle is prepared for local CLI inference, FastAPI service deployment, browser UI testing, voice registration, and runtime validation.
Expected top-level structure:
```text
GPA-v1.5-onnx-runtime/
├── build/
├── genai_fp16_qwen/
├── genai_int4_qwen/
├── model/
└── voice/
    └── spark_tokenizer_model/
```
Important files and directories:
- `model/runtime_manifest.json`
- `model/reference/default_global_tokens.npy`
- `genai_fp16_qwen/model.onnx`
- `genai_int4_qwen/model.onnx`
- `voice/spark_tokenizer_model/config.yaml`
- `voice/spark_tokenizer_model/model.safetensors`
## Code and Asset Mapping
| Need | Location |
|---|---|
| Native GPA v1.5 checkpoint | `AutoArk-AI/GPA-v1.5` |
| ONNX runtime assets | This repo: `AutoArk-AI/GPA-v1.5-onnx-runtime` |
| Runtime code | `GPA_1.5/onnx_runtime` |
| Runtime guide | `GPA_1.5/onnx_runtime/README.md` |
## Recommended Local Layout
Download this repo into a sibling asset path:
```text
GPA-v1.5/                      # runtime code checkout
GPA-v1.5-HF/
└── GPA-v1.5-onnx-runtime/     # this asset bundle
```
The runtime code automatically looks for `GPA-v1.5-HF/GPA-v1.5-onnx-runtime`.
If your assets live elsewhere, point the runtime to them explicitly:
```bash
export ARK_AUDIO_RUNTIME_ASSET_ROOT=/absolute/path/to/GPA-v1.5-onnx-runtime
```
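The lookup order described above can be sketched in a few lines of Python. This is an illustrative sketch of the env-var-first, sibling-path-fallback behavior, not the runtime's actual implementation; the function name `resolve_asset_root` is made up for this example:

```python
import os
from pathlib import Path


def resolve_asset_root(default_parent: str = "GPA-v1.5-HF") -> Path:
    """Mimic the runtime's lookup order: an explicit env var wins,
    otherwise fall back to the conventional sibling directory."""
    override = os.environ.get("ARK_AUDIO_RUNTIME_ASSET_ROOT")
    if override:
        return Path(override)
    return Path(default_parent) / "GPA-v1.5-onnx-runtime"
```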
## Download

```bash
git clone https://github.com/AutoArk/GPA.git GPA-v1.5
mkdir -p GPA-v1.5-HF
huggingface-cli download AutoArk-AI/GPA-v1.5-onnx-runtime \
  --local-dir GPA-v1.5-HF/GPA-v1.5-onnx-runtime
```
## Environment Setup

Using a dedicated virtual environment is recommended:

```bash
python3 -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
pip install -r GPA-v1.5/GPA_1.5/onnx_runtime/requirements.runtime.txt
```
If your checkout directory is named differently, update the path accordingly.
## Quick Validation

After downloading, sanity-check the local layout:

```bash
test -d GPA-v1.5-HF/GPA-v1.5-onnx-runtime/model
test -d GPA-v1.5-HF/GPA-v1.5-onnx-runtime/build
test -d GPA-v1.5-HF/GPA-v1.5-onnx-runtime/voice/spark_tokenizer_model
test -f GPA-v1.5-HF/GPA-v1.5-onnx-runtime/model/runtime_manifest.json
test -f GPA-v1.5-HF/GPA-v1.5-onnx-runtime/voice/spark_tokenizer_model/model.safetensors
```
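The same checks can be run from Python, which is handy in CI or setup scripts. This is a small illustrative helper, not part of the runtime; `REQUIRED_PATHS` simply mirrors the `test` commands above:

```python
from pathlib import Path

# Paths the validation commands above check, relative to the bundle root.
REQUIRED_PATHS = [
    "model",
    "build",
    "voice/spark_tokenizer_model",
    "model/runtime_manifest.json",
    "voice/spark_tokenizer_model/model.safetensors",
]


def missing_assets(root: str) -> list[str]:
    """Return the required paths that are absent under `root`."""
    base = Path(root)
    return [p for p in REQUIRED_PATHS if not (base / p).exists()]
```

An empty return value means the layout matches the expected structure.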
## CLI Smoke Tests

Generate speech with the int4 main model and int8 decoder:

```bash
python GPA-v1.5/GPA_1.5/onnx_runtime/infer_ark_audio_onnx.py \
  --runtime-root GPA-v1.5-HF/GPA-v1.5-onnx-runtime \
  --task tts \
  --tts_text "Hello, this is a short GPA speech synthesis check." \
  --tts_out_wav tmp_docs/onnx_smoke/readme_tts_int4_int8.wav \
  --main-model-precision int4 \
  --decoder-precision int8
```
Transcribe the generated audio:
```bash
python GPA-v1.5/GPA_1.5/onnx_runtime/infer_ark_audio_onnx.py \
  --runtime-root GPA-v1.5-HF/GPA-v1.5-onnx-runtime \
  --task asr \
  --asr_audio tmp_docs/onnx_smoke/readme_tts_int4_int8.wav \
  --main-model-precision int4
```
For higher-fidelity testing on suitable hardware, pass `--main-model-precision fp16` instead.
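The two smoke tests above form a natural round trip: synthesize a clip, then transcribe it. If you want to script that loop, the command lines can be assembled programmatically. The helpers below are illustrative conveniences, not part of the runtime; they only reproduce the flags shown above:

```python
import subprocess

RUNTIME_ROOT = "GPA-v1.5-HF/GPA-v1.5-onnx-runtime"
SCRIPT = "GPA-v1.5/GPA_1.5/onnx_runtime/infer_ark_audio_onnx.py"


def tts_command(text: str, out_wav: str, precision: str = "int4") -> list[str]:
    """Build the TTS invocation shown above (int8 decoder)."""
    return [
        "python", SCRIPT,
        "--runtime-root", RUNTIME_ROOT,
        "--task", "tts",
        "--tts_text", text,
        "--tts_out_wav", out_wav,
        "--main-model-precision", precision,
        "--decoder-precision", "int8",
    ]


def asr_command(wav_path: str, precision: str = "int4") -> list[str]:
    """Build the ASR invocation shown above."""
    return [
        "python", SCRIPT,
        "--runtime-root", RUNTIME_ROOT,
        "--task", "asr",
        "--asr_audio", wav_path,
        "--main-model-precision", precision,
    ]


# Round trip: synthesize, then transcribe the result.
# wav = "tmp_docs/onnx_smoke/roundtrip.wav"
# subprocess.run(tts_command("A short round-trip check.", wav), check=True)
# subprocess.run(asr_command(wav), check=True)
```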
## FastAPI Service and Browser UI

The GitHub runtime code includes a FastAPI service with a built-in browser UI:

```bash
cd GPA-v1.5/GPA_1.5/onnx_runtime
uvicorn service:app --host 127.0.0.1 --port 8024
```
Open `http://127.0.0.1:8024/` in your browser. The UI supports TTS, ASR, default voice checks, voice registration, and runtime memory monitoring.
## API Endpoints
The FastAPI service exposes:
- UI: `GET /`
- Observability: `GET /api/health`, `GET /api/memory`, `GET /api/voices`
- Core inference: `POST /api/tts`, `POST /api/asr`
- Voice management: `POST /api/voices/register-path`, `POST /api/voices/register-upload`
- OpenAI-compatible routes: `GET /v1/models`, `GET /v1/audio/voices`, `POST /v1/audio/speech`, `POST /v1/audio/transcriptions`
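As a sketch of how a client might call the OpenAI-compatible speech route, the snippet below builds a `POST /v1/audio/speech` request with the standard library. The field names (`model`, `input`, `voice`) follow the OpenAI audio API convention and the model ID is a placeholder; the service's exact schema may differ, so check `GET /v1/models` and the runtime guide before relying on them:

```python
import json
from urllib import request


def build_speech_request(base_url: str, text: str, voice: str = "default"):
    """Build an OpenAI-style /v1/audio/speech request.

    Field names follow the OpenAI audio API convention; "gpa-v1.5" is a
    placeholder model ID (list real IDs via GET /v1/models)."""
    payload = {"model": "gpa-v1.5", "input": text, "voice": voice}
    return request.Request(
        f"{base_url}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# With the service running, send it and save the audio response:
# with request.urlopen(build_speech_request("http://127.0.0.1:8024", "Hello")) as resp:
#     open("speech.wav", "wb").write(resp.read())
```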
## Voice Registration
Voice registration uses the bundled tokenizer assets under `GPA-v1.5-HF/GPA-v1.5-onnx-runtime/voice/spark_tokenizer_model`.
If you want to replace those assets, set:

```bash
export ARK_AUDIO_TOKENIZER_MODEL_DIR=/absolute/path/to/spark_tokenizer_model
```
Successful registration writes metadata under the runtime code directory `GPA_1.5/onnx_runtime/voices/`.
Each registered voice stores its TTS control tokens at `GPA_1.5/onnx_runtime/voices/items/<voice_id>/global_tokens.npy`.
That file can be passed to the CLI with `--voice-global-token`.
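Given that layout, a quick way to see which voices are registered is to scan the `voices/items/` directory for saved token files. This is a small illustrative helper, not part of the runtime:

```python
from pathlib import Path


def registered_voices(runtime_dir: str) -> list[str]:
    """List voice IDs that have saved TTS control tokens, based on the
    voices/items/<voice_id>/global_tokens.npy layout described above."""
    items = Path(runtime_dir) / "voices" / "items"
    if not items.is_dir():
        return []
    return sorted(
        d.name for d in items.iterdir()
        if (d / "global_tokens.npy").is_file()
    )
```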
## Notes
- The asset bundle exposes `fp16` and `int4` main-model variants.
- The default TTS text tokenizer mode is `han_char`.
- The default packaged layout includes tokenizer assets for voice registration.
- The repo is designed to stay free of service code, export scripts, test outputs, and local deployment residue.
## License
This asset bundle is released under the Apache 2.0 license.
## Citation
If you find GPA useful for your research or projects, please cite us:
```bibtex
@misc{cai2026unifyingspeechrecognitionsynthesis,
  title={Unifying Speech Recognition, Synthesis and Conversion with Autoregressive Transformers},
  author={Runyuan Cai and Yu Lin and Yiming Wang and Chunlin Fu and Xiaodong Zeng},
  year={2026},
  eprint={2601.10770},
  archivePrefix={arXiv},
  primaryClass={cs.SD},
  url={https://arxiv.org/abs/2601.10770},
}
```