How to use reiscook/pocket-tts-mlx with MLX:
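```bash
# Download the model from the Hub
# (the extra is quoted so that zsh, the default macOS shell, does not treat the brackets as a glob)
pip install "huggingface_hub[hf_xet]"
huggingface-cli download --local-dir pocket-tts-mlx reiscook/pocket-tts-mlx
```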
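The command above fetches the whole repository, including every weight variant. To pull only one variant, a sketch along these lines may help. It assumes each variant's files sit under the directory named in the Weight Variants table and that the voice files sit under `voice/` at the repo root; neither layout detail is confirmed beyond the listing on this page. `variant_patterns` and `download_variant` are hypothetical helper names, while `snapshot_download` and its `allow_patterns` argument come from `huggingface_hub`.

```python
from typing import List

REPO_ID = "reiscook/pocket-tts-mlx"
VARIANTS = ("bf16", "int8", "int4")  # directory names from the Weight Variants table


def variant_patterns(variant: str, include_voices: bool = True) -> List[str]:
    """Build glob patterns that select one weight variant (hypothetical helper)."""
    if variant not in VARIANTS:
        raise ValueError(f"unknown variant {variant!r}; expected one of {VARIANTS}")
    patterns = [f"{variant}/*"]
    if include_voices:
        patterns.append("voice/*")  # assumed location of the voice embeddings
    return patterns


def download_variant(variant: str, local_dir: str = "pocket-tts-mlx") -> str:
    """Download only the selected variant and return the local directory path."""
    # Imported lazily so the pattern helper works without the package or a network.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        allow_patterns=variant_patterns(variant),
        local_dir=local_dir,
    )
```

Calling `download_variant("int8")` would then skip the `bf16/` and `int4/` directories, provided the assumed layout matches the repository.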
How to use reiscook/pocket-tts-mlx with Pocket-TTS:
```python
from pocket_tts import TTSModel
import scipy.io.wavfile

tts_model = TTSModel.load_model("reiscook/pocket-tts-mlx")
voice_state = tts_model.get_state_for_audio_prompt(
    "hf://kyutai/tts-voices/alba-mackenna/casual.wav"
)
audio = tts_model.generate_audio(voice_state, "Hello world, this is a test.")
# Audio is a 1D torch tensor containing PCM data.
scipy.io.wavfile.write("output.wav", tts_model.sample_rate, audio.numpy())
```
Pocket TTS - MLX Weights
MLX-converted weights for Kyutai's Pocket TTS model. Optimized for Apple Silicon inference via MLX (Python) and MLX-Swift.
Weight Variants
| Directory | Size | Description | RTF (M2 Air) |
|---|---|---|---|
| bf16/ | 224MB | bfloat16 baseline | ~3x realtime |
| int8/ | 148MB | 8-bit quantized FlowLM, bf16 Mimi | ~5x realtime |
| int4/ | 107MB | 4-bit quantized FlowLM, bf16 Mimi | ~6x realtime |
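To make the trade-off in the table concrete, the arithmetic can be sketched as below. The sizes and speed figures are copied from the table. Reading "~Nx realtime" as "N seconds of audio generated per second of compute" is an assumption, and the helper names are illustrative only.

```python
# Figures copied from the Weight Variants table (speeds are approximate, M2 Air).
SIZES_MB = {"bf16": 224, "int8": 148, "int4": 107}
SPEED_X = {"bf16": 3.0, "int8": 5.0, "int4": 6.0}


def size_reduction(variant: str, baseline: str = "bf16") -> float:
    """Fraction of disk space saved relative to the baseline variant."""
    return 1.0 - SIZES_MB[variant] / SIZES_MB[baseline]


def synthesis_seconds(variant: str, audio_seconds: float) -> float:
    """Estimated compute time, assuming 'Nx realtime' means N s of audio per second."""
    return audio_seconds / SPEED_X[variant]


for name in SIZES_MB:
    print(
        f"{name}: {size_reduction(name):.0%} smaller than bf16, "
        f"~{synthesis_seconds(name, 10.0):.1f} s to generate 10 s of audio"
    )
```

Under that reading, int8 is about 34% smaller than bf16 and int4 about 52% smaller, while ten seconds of speech would take roughly 3.3 s, 2.0 s, and 1.7 s respectively.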
All variants include FlowLM + Mimi decoder in a single unified mlx_model.safetensors file.
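Because both components share one file, it can be useful to see what a given `mlx_model.safetensors` contains before loading it. The sketch below reads only the safetensors header (an 8-byte little-endian length followed by a JSON table) using the standard library, so it does not need MLX. The function names are hypothetical, and the tensor names used for FlowLM and Mimi are not documented on this page, so no key prefixes are assumed.

```python
import json
import struct
from typing import Dict


def read_safetensors_header(path: str) -> Dict[str, dict]:
    """Return per-tensor metadata (dtype, shape, offsets) without loading weights."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # little-endian u64 header size
        header = json.loads(f.read(header_len).decode("utf-8"))
    header.pop("__metadata__", None)  # optional free-form metadata entry
    return header


def count_by_dtype(header: Dict[str, dict]) -> Dict[str, int]:
    """Count tensors per dtype, e.g. to see which tensors are stored as bf16."""
    counts: Dict[str, int] = {}
    for info in header.values():
        counts[info["dtype"]] = counts.get(info["dtype"], 0) + 1
    return counts
```

For example, `count_by_dtype(read_safetensors_header("pocket-tts-mlx/int8/mlx_model.safetensors"))` would summarize the int8 variant, assuming the file was downloaded to that (hypothetical) path.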
Voice Embeddings
8 pre-extracted voice embeddings from the Kyutai release:
- voice/alba.safetensors
- voice/azelma.safetensors
- voice/cosette.safetensors
- voice/eponine.safetensors
- voice/fantine.safetensors
- voice/javert.safetensors
- voice/jean.safetensors
- voice/marius.safetensors
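A single voice file can be fetched on its own rather than with the whole repository. The following is a minimal sketch, assuming the paths listed above are relative to the repo root. `voice_filename` and `fetch_voice` are hypothetical helpers, and `hf_hub_download` is the `huggingface_hub` function for downloading one file.

```python
VOICES = ("alba", "azelma", "cosette", "eponine", "fantine", "javert", "jean", "marius")


def voice_filename(name: str) -> str:
    """Map a voice name to its path in the repo (path layout is an assumption)."""
    key = name.strip().lower()
    if key not in VOICES:
        raise ValueError(f"unknown voice {name!r}; expected one of {VOICES}")
    return f"voice/{key}.safetensors"


def fetch_voice(name: str, repo_id: str = "reiscook/pocket-tts-mlx") -> str:
    """Download one voice embedding and return its local cache path."""
    # Imported lazily so name validation works without the package or a network.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(repo_id=repo_id, filename=voice_filename(name))
```

Validating the name before the download keeps a typo from turning into a less obvious "file not found" error from the Hub.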
Full source code will be available in a GitHub repo (coming soon).
Source
Converted from the official kyutai/pocket-tts weights.