Rio 2.5 Open

Rio 2.5 Open is a frontier-class reasoning model developed by IplanRIO, the municipal IT company of Rio de Janeiro's city government. It was built by distilling reasoning traces from our forthcoming Rio 2.5 model into Qwen3-30B-A3B-Thinking-2507, and achieves state-of-the-art results across mathematics, STEM, and code benchmarks, surpassing its base model by significant margins and competing with models far larger than itself.

Rio 2.5 Open features SwiReasoning, a training-free inference framework based on Shi et al. (2025) that dynamically switches between explicit chain-of-thought and latent-space reasoning, guided by entropy-based confidence signals. This yields both higher accuracy and dramatically improved token efficiency. Although SwiReasoning itself requires no training, Rio 2.5 Open was explicitly trained to maximize the efficiency gains from latent reasoning.

Key Features

  • 30B total / 3B active parameters (Mixture-of-Experts)
  • 262,144 token context window
  • SwiReasoning integration — dynamic explicit/latent reasoning switching for Pareto-superior accuracy and efficiency
  • Distilled from Qwen3-30B-A3B-Thinking-2507 with traces from Rio 2.5
  • Multilingual — strong performance in Portuguese, English, Chinese, and dozens of other languages
  • MIT License — fully open for commercial and research use

Benchmark Results

Mathematics & STEM

| Model | GPQA Diamond | LiveCodeBench | Composite Math* | AIME 2025 | AIME 2026 I | HMMT 2025 I | HMMT 2025 II | BRUMO 2025 | CMIMC 2025 | SMT 2025 |
|---|---|---|---|---|---|---|---|---|---|---|
| Rio 2.5 Open | 77.20% | 69.60% | 87.53% | 93.33% | 89.17% | 83.33% | 90.83% | 88.33% | 83.75% | 83.96% |
| Rio 2.5 Open (w/o latent) | 75.80% | 69.40% | 82.17% | 90.00% | 83.33% | 76.67% | 84.17% | 85.83% | 75.00% | 80.19% |
| Qwen3-30B-A3B-2507 (base) | 73.40% | 66.00% | 76.08% | 82.50% | 76.67% | 70.83% | 75.83% | 85.00% | 66.25% | 75.47% |
| Qwen3-235B-A22B-2507 | 81.10% | 74.10% | 86.83% | 91.67% | 87.50% | 83.33% | 89.17% | 87.50% | 83.75% | 84.91% |
| Kimi K2.5 Thinking | 87.60% | 85.00% | 93.12% | 95.83% | 93.33% | 93.33% | 89.17% | 98.33% | 91.25% | 90.57% |
| DeepSeek V3.2 | 82.40% | 83.30% | 90.93% | 94.17% | 91.67% | 92.50% | 90.00% | 96.67% | 83.75% | 87.74% |
| GLM 4.6 | 81.00% | 82.80% | 91.69% | 91.67% | 91.67% | 93.33% | 91.67% | 94.17% | 88.75% | 87.74% |
| GPT OSS 120B | 80.10% | 77.97% | 89.17% | 90.00% | 89.17% | 90.00% | 90.00% | 91.67% | 85.62% | 87.74% |
| GPT OSS 20B | 71.50% | 70.26% | 82.34% | 89.17% | 85.00% | 76.67% | 83.33% | 86.67% | 72.50% | 83.02% |

*Composite Math is the average across all other mathematics benchmarks in this table.
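The composite figure can be verified directly from the table. A quick sanity check of the Rio 2.5 Open row (scores copied from the table above):

```python
# Composite Math = plain average of the seven math benchmarks.
# Scores are the Rio 2.5 Open row from the benchmark table.
scores = {
    "AIME 2025": 93.33,
    "AIME 2026 I": 89.17,
    "HMMT 2025 I": 83.33,
    "HMMT 2025 II": 90.83,
    "BRUMO 2025": 88.33,
    "CMIMC 2025": 83.75,
    "SMT 2025": 83.96,
}
composite = sum(scores.values()) / len(scores)
print(round(composite, 2))  # 87.53 — matches the Composite Math column
```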

Rio Model Family Comparison

| Model | GPQA Diamond | LiveCodeBench | Composite Math* | AIME 2025 |
|---|---|---|---|---|
| Rio 3.0 Open | 85.10% | 76.00% | 91.78% | 96.67% |
| Rio 2.5 Open | 77.20% | 69.60% | 87.53% | 93.33% |
| Rio 3.0 Open Mini | 71.90% | 63.50% | 78.11% | 89.17% |

Gains Over Base Model (Qwen3-30B-A3B-Thinking-2507)

| Benchmark | Base Model | Rio 2.5 Open | Δ |
|---|---|---|---|
| GPQA Diamond | 73.40% | 77.20% | +3.80% |
| LiveCodeBench | 66.00% | 69.60% | +3.60% |
| Composite Math | 76.08% | 87.53% | +11.45% |
| AIME 2025 | 82.50% | 93.33% | +10.83% |
| AIME 2026 I | 76.67% | 89.17% | +12.50% |
| HMMT 2025 I | 70.83% | 83.33% | +12.50% |
| BRUMO 2025 | 85.00% | 88.33% | +3.33% |
| CMIMC 2025 | 66.25% | 83.75% | +17.50% |
| SMT 2025 | 75.47% | 83.96% | +8.49% |

SwiReasoning: Latent/Explicit Reasoning

Rio 2.5 Open integrates SwiReasoning (Shi et al., 2025), a training-free inference framework that dynamically alternates between two reasoning modes:

  • Explicit reasoning — standard chain-of-thought in natural language, where the model commits tokens to a single reasoning path
  • Latent reasoning — continuous reasoning in hidden space, where the model explores multiple implicit paths simultaneously without emitting tokens

The switching is governed by block-wise confidence estimated from entropy trends in the next-token distribution. When confidence is low (entropy trending upward), the model enters latent mode to explore alternatives. When confidence recovers, it switches back to explicit mode to commit to a solution.
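The switching rule can be illustrated with a toy sketch. This is not the paper's exact algorithm, just an illustration of the idea: compute Shannon entropy over the next-token distribution, and treat a rising entropy trend over a recent block as a falling-confidence signal that triggers latent mode.

```python
import math

def entropy(probs):
    """Shannon entropy of a next-token distribution (nats)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def choose_mode(entropy_history, window=4, margin=0.05):
    """Toy block-wise switching rule (illustrative only): if entropy is
    trending upward over the last `window` steps, confidence is falling,
    so explore in latent mode; otherwise stay explicit and commit tokens."""
    if len(entropy_history) < window:
        return "explicit"
    recent = entropy_history[-window:]
    trend = recent[-1] - recent[0]
    return "latent" if trend > margin else "explicit"

print(round(entropy([0.25] * 4), 3))  # 1.386 — maximum uncertainty over 4 tokens

falling = [1.2, 1.0, 0.8, 0.6]  # distribution sharpening -> commit
rising = [0.6, 0.8, 1.0, 1.2]   # distribution flattening -> explore
print(choose_mode(falling))  # explicit
print(choose_mode(rising))   # latent
```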

This approach achieves a Pareto-superior trade-off: higher accuracy at unlimited budgets and dramatically better token efficiency under constrained budgets.

The benchmark table above includes a "(w/o latent)" row showing performance with standard explicit-only reasoning; the gains from SwiReasoning are consistent across all benchmarks.

How to Use

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "prefeitura-rio/Rio-2.5-Open"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

prompt = "Write a poem about Rio de Janeiro."

messages = [
    {"role": "user", "content": prompt}
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=81920,
    temperature=0.6,
    top_p=0.95,
)

response = tokenizer.decode(outputs[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True)
print(response)
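Since the model inherits Qwen3-Thinking-style output, the decoded response may begin with a reasoning segment closed by a `</think>` tag. A small helper to separate the two (this convention is an assumption based on the base model, not stated elsewhere in this card):

```python
def split_thinking(response: str):
    """Split a decoded response into (reasoning, answer).
    Assumes the Qwen3-Thinking convention where the reasoning segment
    ends at `</think>`; if the tag is absent, everything is the answer."""
    marker = "</think>"
    if marker in response:
        reasoning, answer = response.split(marker, 1)
        return reasoning.strip(), answer.strip()
    return "", response.strip()

reasoning, answer = split_thinking(
    "plan the rhyme scheme</think>Golden sands of Ipanema..."
)
print(answer)  # Golden sands of Ipanema...
```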

Using with vLLM

vllm serve prefeitura-rio/Rio-2.5-Open \
    --tensor-parallel-size 4 \
    --max-model-len 262144 \
    --trust-remote-code
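Once running, the vLLM server exposes an OpenAI-compatible `/v1/chat/completions` endpoint (on port 8000 by default). A minimal client sketch using only the standard library; the sampling settings mirror the transformers example above:

```python
import json

# Request payload for vLLM's OpenAI-compatible chat completions endpoint.
payload = {
    "model": "prefeitura-rio/Rio-2.5-Open",
    "messages": [{"role": "user", "content": "Write a poem about Rio de Janeiro."}],
    "temperature": 0.6,
    "top_p": 0.95,
    "max_tokens": 81920,
}
body = json.dumps(payload).encode("utf-8")
print(json.loads(body)["model"])  # prefeitura-rio/Rio-2.5-Open

# To send it (requires the server launched by the command above):
# import urllib.request
# req = urllib.request.Request(
#     "http://localhost:8000/v1/chat/completions",
#     data=body, headers={"Content-Type": "application/json"},
# )
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["choices"][0]["message"]["content"])
```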

Using with SGLang

python -m sglang.launch_server \
    --model-path prefeitura-rio/Rio-2.5-Open \
    --tp 4 \
    --context-length 262144 \
    --trust-remote-code

Model Details

Developer IplanRIO — Empresa Municipal de Informática e Planejamento S.A.
Base Model Qwen3-30B-A3B-Thinking-2507
Architecture Mixture-of-Experts (MoE) Transformer
Total Parameters ~30B
Active Parameters ~3B
Context Length 262,144 tokens
Default Max Output Length 81,920 tokens
Training Method Distillation
Inference Enhancement SwiReasoning (latent/explicit switching)
License MIT
Languages Multilingual (en, pt, zh, ja, ko, fr, de, es, ar, and more)

Citation

If you use SwiReasoning, please also cite:

@misc{shi2025swireasoning,
    title={SwiReasoning: Switch-Thinking in Latent and Explicit for Pareto-Superior Reasoning LLMs},
    author={Shi, Dachuan and others},
    year={2025},
    eprint={2510.05069},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Acknowledgments

Rio 2.5 Open is built upon the exceptional work of the Qwen Team and their Qwen3 model family. We also acknowledge the authors of SwiReasoning for their innovative inference framework.

Developed in Rio de Janeiro 🇧🇷 by IplanRIO.
