---
library_name: transformers
license: apache-2.0
base_model: Qwen/Qwen3-4B
tags:
- axolotl
- generated_from_trainer
datasets:
- joeyzero/OpenThought-144k-Backfill-0.2
- joeyzero/dolphin-r1-backfill-0.0.2
model-index:
- name: Qwen3-4B-Reasoning-Backfill-V0.1
  results: []
thumbnail: "https://cdn-uploads.huggingface.co/production/uploads/6334832f86c3fdcdc7acbe8e/DcjT8q4QvTjfCJWEa8kC7.png"
---
This model reconstructs a plausible reasoning process that leads from a given INSTRUCTION to a fixed SOLUTION while preserving the original answer. It works only on the route and never rewrites the destination, producing stepwise “thinking” traces that align with the target output. The goal is to enable reasoning backfill for legacy or non-reasoning datasets, such as older chat logs or instruction corpora, where collecting thought processes directly is impractical. These traces can help bootstrap process-supervision signals, support teacher-style models, and make output behavior easier to audit.
I would love to try this with larger or more stylized models to see how feasible it is to produce stylized or limited-domain reasoning traces. If you’re a compute partner interested in scaling this line of work to larger backfill models, let’s talk.
adamw_bnb_8bit · lr 2.5e-5

The model expects the instruction and the solution to be wrapped in <|instruction_start|> ... <|instruction_end|> and <|solution_start|> ... <|solution_end|> tokens respectively. The model should emit its reasoning between <|thinking_start|> and <|thinking_end|> tokens, which helps with parsing and with identifying malformed outputs.
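Because the trace is delimited by the thinking tokens, a completion can be checked mechanically. The following is a minimal sketch of such a check, not part of the released tooling: the helper name `extract_thinking` and the `prefilled` flag are hypothetical, and it assumes the closing token appears in the output as `<|thinking_end|>`. Text after the closing token is ignored.

```python
from typing import Optional

THINK_START = "<|thinking_start|>"
THINK_END = "<|thinking_end|>"


def extract_thinking(completion: str, prefilled: bool = True) -> Optional[str]:
    """Return the trace between the thinking tokens, or None if malformed.

    Hypothetical helper. `prefilled` means the prompt already ended with the
    opening token, so the completion itself is expected to lack it.
    """
    text = completion.strip()
    # Restore the opening token when it was supplied as part of the prompt.
    if prefilled and not text.startswith(THINK_START):
        text = THINK_START + text
    # A well-formed output has exactly one opening and one closing token.
    if text.count(THINK_START) != 1 or text.count(THINK_END) != 1:
        return None
    start = text.index(THINK_START) + len(THINK_START)
    end = text.index(THINK_END)
    # The closing token must come after the opening token.
    if end < start:
        return None
    trace = text[start:end].strip()
    # An empty trace is treated as malformed.
    return trace or None
```

Outputs for which this returns `None` (missing, duplicated, or misordered tokens, or an empty trace) are candidates to discard or regenerate.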
<|im_start|>system
Your role as an assistant involves thoroughly reconstructing a plausible reasoning process that leads from a user-provided INSTRUCTION to a user-provided SOLUTION. You must not alter the SOLUTION. Each step should include concrete decisions and validations, such as: interpreting the INSTRUCTION, extracting constraints, selecting an approach, justifying key choices, verifying that intermediate results remain consistent with the provided SOLUTION, refining any errors, and a final consistency check noting any residual ambiguities. Use domain-appropriate specifics and avoid filler. Do not introduce new facts that would change the SOLUTION. Now, given an INSTRUCTION and a SOLUTION, reconstruct the Thought and present the Solution per the above guidelines. Your reasoning should begin and end with '<|thinking_start|>' and '<|thinking_end>' respectively.
<|im_end|>
<|im_start|>user
<|instruction_start|>
What are some tips for reducing stress at work? Your response should contain at least 4 bullet points. Use markdown bullets like "* point". Include the keyword "mindfulness" twice.
<|instruction_end|>
<|solution_start|>
‐ Practice mindfulness during breaks ...
‐ Prioritize tasks and set boundaries ...
‐ Incorporate mindfulness into routine activities ...
‐ Stay physically active ...
<|solution_end|>
<|im_end|>
<|im_start|>assistant
<|thinking_start|>
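A prompt in the layout above can be assembled programmatically. This is a sketch under stated assumptions: `build_backfill_prompt` is a hypothetical helper, the system prompt is passed in by the caller rather than hard-coded, and the exact newline placement between tokens is a guess that may differ from what was used in training.

```python
def build_backfill_prompt(system_prompt: str, instruction: str, solution: str) -> str:
    """Assemble a ChatML-style prompt that ends with the opening thinking token.

    Hypothetical helper; whitespace between tokens is an assumption.
    """
    # Wrap the instruction and the fixed solution in their delimiter tokens.
    user_turn = (
        f"<|instruction_start|>\n{instruction.strip()}\n<|instruction_end|>\n"
        f"<|solution_start|>\n{solution.strip()}\n<|solution_end|>"
    )
    # Leave the assistant turn open so the model continues the trace.
    return (
        f"<|im_start|>system\n{system_prompt.strip()}\n<|im_end|>\n"
        f"<|im_start|>user\n{user_turn}\n<|im_end|>\n"
        f"<|im_start|>assistant\n<|thinking_start|>"
    )


if __name__ == "__main__":
    # Placeholder inputs for illustration only.
    print(build_backfill_prompt("SYSTEM PROMPT HERE", "What is 2 + 2?", "4"))
```

Because the prompt already ends with the opening thinking token, the completion is expected to start directly with the trace.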
temperature: 0.7–1.0
top_p: 0.9
min_p: 0.05
max_tokens: as needed for trace length
stop: ["<|im_end|>"]
For tighter adherence, drop temperature toward 0.5–0.7.
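The settings above can be collected into a single request payload. This is a sketch only: `sampling_settings` and its `strict` flag are hypothetical, the temperatures 0.8 and 0.6 are arbitrary picks from inside the stated ranges, and whether your inference backend accepts these key names unchanged (in particular `min_p`) is an assumption to verify.

```python
from typing import Any, Dict


def sampling_settings(max_tokens: int, strict: bool = False) -> Dict[str, Any]:
    """Return sampling parameters as a plain dict.

    Hypothetical helper. `strict=True` selects a temperature from the lower
    0.5-0.7 band suggested for tighter adherence.
    """
    return {
        # Arbitrary points inside the recommended ranges.
        "temperature": 0.6 if strict else 0.8,
        "top_p": 0.9,
        "min_p": 0.05,
        # Size this to the expected trace length.
        "max_tokens": max_tokens,
        "stop": ["<|im_end|>"],
    }
```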