opus-4b-cube-py-step175-2026-05-01

LoRA adapter (rank 32) trained with RL on a custom Opus-Magnum-style motion-planning task using the python answer representation with cube (3-tuple, x+y+z=0) coordinates. Snapshot at effective training step 175.

training curves

Source training run

  • tinker checkpoint: tinker://3f960bea-2c1d-50c4-9b86-b3cebf7da240:train:0/sampler_weights/000050
  • effective step: 175 (resumed from step 125 of the original 4B-cube run 7a73d1ec; this is step 50 of the resume session)
  • distances: 1, 2, 3
  • task types: move, transmute, move2 (no bond)
  • learning rate: 1e-5
  • group size: 8, groups per batch: 16
  • renderer: qwen3_5_disable_thinking
  • coord mode: cube
  • representation: python

Curriculum progression

checkpoints distances task types
b0–b60 1,2,3,4 move, transmute, bond
b60–b125 1,2,3,4 move, transmute (bond dropped — 0% solve)
b125–b175 1,2,3 move, transmute, move2 (d4 dropped, move2 added)

Usage

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "Qwen/Qwen3.5-4B"
adapter = "GoodStartLabs/opus-4b-cube-py-step175-2026-05-01"
tok = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, device_map="auto")
model = PeftModel.from_pretrained(model, adapter)
Downloads last month
71
Video Preview
loading

Model tree for GoodStartLabs/opus-4b-cube-py-step175-2026-05-01

Finetuned
Qwen/Qwen3.5-4B
Adapter
(227)
this model