---
license: apache-2.0
language:
- en
- de
base_model: Qwen/Qwen3-4B
tags:
- mimi
- tool-calling
- function-calling
- agent
- gguf
- fine-tuned
- wllama
- browser-inference
- on-device-ai
- local-ai
- privacy-first
model-index:
- name: MIMI Pro
  results:
  - task:
      type: function-calling
      name: Tool Calling
    dataset:
      type: gorilla-llm/Berkeley-Function-Calling-Leaderboard
      name: BFCL V4
    metrics:
    - type: accuracy
      value: 60.8
      name: Simple Function Calling (Python)
      verified: false
    - type: accuracy
      value: 57.5
      name: Multiple Sequential Calls
      verified: false
    - type: accuracy
      value: 90
      name: Irrelevance Detection
      verified: false
pipeline_tag: text-generation
---

# MIMI Pro

MIMI Pro is a 4-billion-parameter AI agent model optimized for structured tool calling and autonomous task execution, designed to run entirely on-device, in the browser, with zero cloud dependencies.

Part of the MIMI Model Family by [Mimi Tech AI](https://mimitechai.com).

> **🧪 V1: Experimental Release.** This model is fine-tuned for the MIMI Agent's custom tool-calling format. For standard tool calling, the base [Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B) may perform equally well or better with native `<tool_call>` prompting. V2 with official BFCL scores and Qwen3-native format support is in development.

## Performance

### BFCL V4 Benchmark (Partial: Single-Turn Only)

Base Qwen3-4B was evaluated on 20 samples per category; MIMI Pro V1 sample counts are listed per row.

| Category | MIMI Pro V1 | Base Qwen3-4B | Notes |
|---|---|---|---|
| Simple Python | 60.8% (400 tests) | **80.0%** (20 tests) | Base outperforms |
| Simple Java | 21.0% (100 tests) | **60.0%** (20 tests) | Base outperforms |
| Multiple (Sequential) | 57.5% (200 tests) | **75.0%** (20 tests) | Base outperforms |
| Parallel | 2.0% (200 tests) | **75.0%** (20 tests) | Fine-tune degraded |
| Irrelevance | 90% (20 tests) | **100%** (20 tests) | Both strong |
| Live Simple | n/a | **90.0%** (20 tests) | Base only |

> ⚠️ **Important Context:** The previously reported "97.7% accuracy" was a **training validation metric** (token-level accuracy on the training/eval split), not a standardized benchmark score. The table above shows actual BFCL V4 results. We are working on a full official evaluation.

### Training Metrics (Internal)

| Metric | Value |
|---|---|
| Training Token Accuracy | 97.66% |
| Eval Token Accuracy | 97.29% |
| Training Loss | 0.084 |
| Parameters | 4.02 Billion |
| Quantized Size | 2.3 GB (Q4_K_M) |

## Architecture

MIMI Pro is built on [Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B), fine-tuned with LoRA (rank=64, alpha=128) on 1,610 curated tool-calling examples using [Unsloth](https://github.com/unslothai/unsloth) on NVIDIA DGX Spark.

**Key Design Decisions:**
- Custom tool-calling format optimized for the MIMI Agent browser environment
- 19 tool types covering web search, code execution, file operations, browser automation
- Trained on NVIDIA DGX Spark (Grace Blackwell GB10, 128 GB unified memory)

**Known Limitations of V1:**
- Fine-tuning with aggressive hyperparameters (LoRA r=64, 3 epochs, LR 2e-4) caused some capability degradation vs. the base model, particularly for parallel tool calling
- The custom `{"tool": ..., "parameters": ...}` format diverges from Qwen3's native `<tool_call>` format
- V2 will address these issues with conservative fine-tuning and Qwen3-native format support
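
For interoperability experiments, a V1-format call can be mechanically re-wrapped into Qwen3's native envelope, which places a `{"name": ..., "arguments": ...}` object inside `<tool_call>` tags. A minimal sketch (the `to_qwen3_tool_call` helper is illustrative, not part of any MIMI tooling):

```python
import json

def to_qwen3_tool_call(mimi_call: str) -> str:
    """Re-wrap a MIMI V1 tool call ({"tool": ..., "parameters": ...})
    into Qwen3's native <tool_call> envelope."""
    obj = json.loads(mimi_call)
    native = {"name": obj["tool"], "arguments": obj["parameters"]}
    return "<tool_call>\n" + json.dumps(native) + "\n</tool_call>"
```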

## Supported Tools

| Category | Tools |
|---|---|
| 🌐 Web | web_search, browse_url, browser_action |
| 💻 Code | execute_python, create_file, edit_file |
| 🔬 Research | deep_research, generate_document |
| 📁 System | read_file, list_directory, run_terminal |
| 🧠 Reasoning | Multi-step orchestration |
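
On the host side, each tool name in the table above resolves to a handler function. A minimal dispatch sketch, assuming parsed parameters arrive as a dict; the handlers here are placeholder stubs, not the MIMI Agent's actual implementations:

```python
# Placeholder handlers keyed by tool name; real implementations would call
# a search API, a Python sandbox, the filesystem, and so on.
TOOL_HANDLERS = {
    "web_search": lambda query, limit=5: f"searched {query!r} (top {limit})",
    "read_file": lambda path: open(path).read(),
}

def dispatch(tool: str, parameters: dict) -> str:
    """Route a parsed tool call to its handler, expanding parameters as kwargs."""
    handler = TOOL_HANDLERS.get(tool)
    if handler is None:
        raise ValueError(f"unknown tool: {tool}")
    return handler(**parameters)
```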

## Quick Start

### Browser (wllama/WebAssembly)

```javascript
import { Wllama } from '@wllama/wllama';

const wllama = new Wllama(wasmPaths);
await wllama.loadModelFromUrl(
  'https://huggingface.co/MimiTechAI/mimi-pro/resolve/main/mimi-qwen3-4b-q4km.gguf',
  { n_ctx: 4096 }
);

const response = await wllama.createChatCompletion([
  { role: 'system', content: 'You are MIMI, an AI agent with tool access.' },
  { role: 'user', content: 'Search for the latest AI news and summarize it' }
]);
```

### llama.cpp

```bash
./llama-cli -m mimi-qwen3-4b-q4km.gguf \
  -p "<|im_start|>system\nYou are MIMI, an AI agent with tool access.<|im_end|>\n<|im_start|>user\nSearch for the latest AI news<|im_end|>\n<|im_start|>assistant\n" \
  -n 512 --temp 0.6
```

### Python

```python
from llama_cpp import Llama

llm = Llama(model_path="mimi-qwen3-4b-q4km.gguf", n_ctx=4096)
output = llm.create_chat_completion(messages=[
    {"role": "system", "content": "You are MIMI, an AI agent with tool access."},
    {"role": "user", "content": "Search for the latest AI news"}
])
```

## Output Format

MIMI Pro V1 uses a custom format (V2 will support the Qwen3-native `<tool_call>` format):

```json
{"tool": "web_search", "parameters": {"query": "latest AI news March 2026", "limit": 5}}
```
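
A host application has to pull that JSON object out of the raw completion, which may also contain surrounding prose. A minimal parsing sketch, assuming the call is emitted as a single `{"tool": ...}` object (the `parse_tool_call` helper is ours, not part of an SDK):

```python
import json

def parse_tool_call(text: str):
    """Extract the first MIMI-format tool call from model output.
    Returns a (tool, parameters) tuple, or None if no valid call is found."""
    start = text.find('{"tool"')
    if start == -1:
        return None
    try:
        # raw_decode parses one JSON object and ignores trailing text
        obj, _ = json.JSONDecoder().raw_decode(text[start:])
    except json.JSONDecodeError:
        return None
    if "tool" in obj and "parameters" in obj:
        return obj["tool"], obj["parameters"]
    return None
```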

## The MIMI Model Family

| Model | Parameters | Size | Target Device | Status |
|---|---|---|---|---|
| MIMI Nano | 0.6B | ~400 MB | Any device, IoT | 🔜 Coming |
| MIMI Small | 1.7B | ~1.0 GB | Mobile & tablets | 🔜 Coming |
| **MIMI Pro** | **4.02B** | **2.3 GB** | **Desktop & laptop** | **✅ Available** |
| MIMI Max | 8B | ~4.5 GB | Workstations | 🔜 Coming |

All models share the same tool-calling format, are quantized to GGUF Q4_K_M, and run in the browser via WebAssembly.

## Training Details

```yaml
method: LoRA (PEFT) via Unsloth
base_model: Qwen/Qwen3-4B
lora_rank: 64
lora_alpha: 128
lora_dropout: 0.05
target_modules: [q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj]
learning_rate: 2.0e-04
epochs: 3
effective_batch_size: 8
max_seq_length: 2048
optimizer: adamw_8bit
precision: bf16
gradient_checkpointing: true
packing: true
dataset: 1,610 curated tool-calling examples (178K tokens)
hardware: NVIDIA DGX Spark (GB10 Grace Blackwell, 128 GB unified memory)
```

## Why MIMI?

- 🔒 **Privacy First** – Your data never leaves your device. Period.
- 💰 **Zero Cost** – No API keys, no subscriptions, no per-token billing.
- ⚡ **Fast** – Runs at native speed via WebAssembly, no server round-trips.
- 📴 **Works Offline** – Once downloaded, no internet required.
- 🔧 **Tool Native** – Purpose-built for autonomous tool calling.

## Limitations

- V1 uses a custom tool-calling format (not Qwen3-native `<tool_call>`)
- Parallel tool calling (multiple simultaneous calls) is degraded vs. the base model
- Context window: 4,096 tokens (training config); the base architecture supports 32K
- Requires ~3 GB RAM for inference in the browser
- Q4_K_M quantization trades minimal quality for a 3.5x size reduction

## Roadmap

- [x] **V1** – Custom format, 19 tools, browser-optimized (current release)
- [ ] **V2** – Qwen3-native `<tool_call>` format, official BFCL V4 scores, conservative fine-tuning
- [ ] **Model Family** – Nano (0.6B), Small (1.7B), Max (8B) releases
- [ ] **Multi-Turn** – Agentic conversation chains with tool result feedback

## About Mimi Tech AI

[Mimi Tech AI](https://mimitechai.com) builds on-device AI: no cloud, no data leaks, full user control.

- 🌐 [mimitechai.com](https://mimitechai.com)
- 🐙 [GitHub](https://github.com/MimiTechAi)
- 💼 [LinkedIn](https://linkedin.com/company/mimitechai)
- 🏢 [NVIDIA Connect Program](https://www.nvidia.com/en-us/industries/nvidia-connect-program/) Member

## License

Apache 2.0 – free for commercial and personal use.

## Citation

```bibtex
@misc{mimitechai2026mimi,
  title={MIMI Pro: On-Device AI Agent Model for Browser-Based Tool Calling},
  author={Bemler, Michael and Soppa, Michael},
  year={2026},
  publisher={Mimi Tech AI},
  url={https://huggingface.co/MimiTechAI/mimi-pro}
}
```