opus-4b-cube-py-step175-2026-05-01

LoRA adapter (rank 32) trained with RL on a custom Opus-Magnum-style motion-planning task using the python answer representation with cube (3-tuple, x+y+z=0) coordinates. Snapshot at effective training step 175.

Source training run

tinker checkpoint: tinker://3f960bea-2c1d-50c4-9b86-b3cebf7da240:train:0/sampler_weights/000050
effective step: 175 (resumed from step 125 of the original 4B-cube run 7a73d1ec; this is step 50 of the resume session)
distances: 1, 2, 3
task types: move, transmute, move2 (no bond)
learning rate: 1e-5
group size: 8, groups per batch: 16
renderer: qwen3_5_disable_thinking
coord mode: cube
representation: python

Curriculum progression

checkpoints	distances	task types
b0–b60	1,2,3,4	move, transmute, bond
b60–b125	1,2,3,4	move, transmute (bond dropped — 0% solve)
b125–b175	1,2,3	move, transmute, move2 (d4 dropped, move2 added)

Usage

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "Qwen/Qwen3.5-4B"
adapter = "GoodStartLabs/opus-4b-cube-py-step175-2026-05-01"
tok = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, device_map="auto")
model = PeftModel.from_pretrained(model, adapter)

Downloads last month: 71

Video Preview

Reinforcement Learning

Model tree for GoodStartLabs/opus-4b-cube-py-step175-2026-05-01

Base model

Qwen/Qwen3.5-4B-Base

Finetuned

Qwen/Qwen3.5-4B

Adapter

(227)

this model