Qwen3 From Scratch Distillation Checkpoints

This repository contains chapter 8 distillation checkpoints for the rasbt/qwen3-from-scratch model from Build a Reasoning Model (From Scratch).

These files are raw PyTorch state_dict checkpoints intended for use with the reasoning_from_scratch package.
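To make the state_dict format concrete, here is a minimal, self-contained sketch using a tiny stand-in model (an nn.Linear, not the Qwen3 model itself): a state_dict is simply a mapping from parameter names to tensors, saved and reloaded with torch.save and torch.load.

```python
import torch
import torch.nn as nn

# Tiny stand-in model; the real checkpoints store Qwen3Model weights instead.
tiny = nn.Linear(4, 2)

# A state_dict maps parameter names to tensors.
torch.save(tiny.state_dict(), "tiny_checkpoint.pth")

state_dict = torch.load("tiny_checkpoint.pth", map_location="cpu")
print(sorted(state_dict.keys()))  # ['bias', 'weight']
print(sum(p.numel() for p in state_dict.values()))  # 10 parameters (4*2 + 2)
```

Loading one of the real checkpoints works the same way, except the keys correspond to the Qwen3Model architecture.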


Available Checkpoints

  • deepseek_r1 (distilled from DeepSeek-R1): saved steps 06682, 13364, 20046
  • qwen3_235b_a22b (distilled from Qwen3 235B A22B): saved steps 05746, 11492, 17238


Usage Example

The snippets below show how to download and run these checkpoints with the reasoning_from_scratch package.

To download a DeepSeek-R1 distillation checkpoint, run:

from reasoning_from_scratch.qwen3 import download_qwen3_distill_checkpoints

download_qwen3_distill_checkpoints(
    distill_type="deepseek_r1",
    step="06682",
    out_dir="qwen3",
)

For a Qwen3 235B A22B distillation checkpoint, use:

from reasoning_from_scratch.qwen3 import download_qwen3_distill_checkpoints

download_qwen3_distill_checkpoints(
    distill_type="qwen3_235b_a22b",
    step="05746",
    out_dir="qwen3",
)

Once downloaded, you can load a checkpoint and stream text as follows:

from pathlib import Path
import torch

from reasoning_from_scratch.ch02 import (
    get_device,
    generate_text_basic_stream_cache,
)
from reasoning_from_scratch.qwen3 import (
    download_qwen3_distill_checkpoints,
    download_qwen3_small,
    Qwen3Model,
    Qwen3Tokenizer,
    QWEN_CONFIG_06_B,
)

device = get_device()
local_dir = Path("qwen3")

checkpoint_path = download_qwen3_distill_checkpoints(
    distill_type="deepseek_r1",
    step="06682",
    out_dir=local_dir,
)
download_qwen3_small(kind="reasoning", tokenizer_only=True, out_dir=local_dir)

tokenizer = Qwen3Tokenizer(
    tokenizer_file_path=local_dir / "tokenizer-reasoning.json",
    apply_chat_template=True,
    add_generation_prompt=True,
    add_thinking=True,
)
model = Qwen3Model(QWEN_CONFIG_06_B)
state_dict = torch.load(checkpoint_path, map_location=device)
model.load_state_dict(state_dict)
model.to(device)
model.eval()

prompt = "Solve: If x + 7 = 19, what is x?"
input_ids = torch.tensor(tokenizer.encode(prompt), device=device).unsqueeze(0)

for token in generate_text_basic_stream_cache(
    model=model,
    token_ids=input_ids,
    max_new_tokens=256,
    eos_token_id=tokenizer.eos_token_id,
):
    token_id = token.squeeze(0).item()
    print(tokenizer.decode([token_id]), end="", flush=True)
print()
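Qwen3 reasoning models typically emit their chain of thought between <think> and </think> tags before the final answer. Assuming that format, a small helper (hypothetical, not part of reasoning_from_scratch) can separate the reasoning from the answer once the streamed tokens have been collected into a single string:

```python
def split_thinking(text: str) -> tuple[str, str]:
    """Split generated text into (reasoning, answer) on the </think> tag.

    Assumes the model wraps its chain of thought in <think>...</think>;
    if no closing tag is present, the whole text is treated as the answer.
    """
    marker = "</think>"
    if marker not in text:
        return "", text.strip()
    thinking, answer = text.split(marker, 1)
    return thinking.replace("<think>", "").strip(), answer.strip()

# Example with a mock generation:
reasoning, answer = split_thinking("<think>19 - 7 = 12</think> x = 12")
print(reasoning)  # 19 - 7 = 12
print(answer)     # x = 12
```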


Notes

  • These are the exact epoch checkpoints used in the chapter 8 results table in ch08_main.ipynb.
  • Both checkpoint families should be used with the reasoning tokenizer for consistency with chapter 8.


Download Helper Reference

These are the supported distill_type values for download_qwen3_distill_checkpoints(...):

  • DeepSeek-R1 distillation data:
download_qwen3_distill_checkpoints(
    distill_type="deepseek_r1",
    step="06682",
    out_dir="qwen3",
)

Available DeepSeek-R1 saved steps: 06682, 13364, 20046.

  • Qwen3 235B A22B distillation data:
download_qwen3_distill_checkpoints(
    distill_type="qwen3_235b_a22b",
    step="05746",
    out_dir="qwen3",
)

Available Qwen3 235B A22B saved steps: 05746, 11492, 17238.
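The step tables above can be encoded in a small validation helper (illustrative only, not part of reasoning_from_scratch) to catch a typo in distill_type or step before calling the download function:

```python
# Saved steps documented above for each distill_type.
AVAILABLE_STEPS = {
    "deepseek_r1": ("06682", "13364", "20046"),
    "qwen3_235b_a22b": ("05746", "11492", "17238"),
}

def validate_checkpoint_request(distill_type: str, step: str) -> None:
    """Raise ValueError if the (distill_type, step) pair is not documented."""
    if distill_type not in AVAILABLE_STEPS:
        raise ValueError(
            f"Unknown distill_type {distill_type!r}; "
            f"expected one of {sorted(AVAILABLE_STEPS)}"
        )
    if step not in AVAILABLE_STEPS[distill_type]:
        raise ValueError(
            f"Step {step!r} not available for {distill_type!r}; "
            f"expected one of {AVAILABLE_STEPS[distill_type]}"
        )

validate_checkpoint_request("deepseek_r1", "06682")  # passes silently
```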
