Question about the origin of alamios/DeepSeek-R1-DRAFT-Qwen2.5-Coder-0.5B

#1 by s1ngledoge

Hi,

Thank you for sharing alamios/DeepSeek-R1-DRAFT-Qwen2.5-Coder-0.5B.

I am looking into using this model and wanted to better understand how it relates to Qwen/Qwen2.5-Coder-0.5B before I proceed.

Could you clarify whether alamios/DeepSeek-R1-DRAFT-Qwen2.5-Coder-0.5B was built directly on top of Qwen/Qwen2.5-Coder-0.5B in a single, straightforward fine-tuning step, or whether any intermediate checkpoints, additional training rounds, merges, distillation steps, or other released models were involved along the way?

From a practical standpoint, I mainly want to know whether to treat it as a direct derivative of Qwen/Qwen2.5-Coder-0.5B, or as a model that went through extra modification stages beyond a simple fine-tuning path.

This would help me make better compatibility assumptions before building on top of it.
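
In case it helps frame the question: the sanity check I had in mind is just comparing the two configs, on the assumption that a plain fine-tune leaves the architecture-defining fields untouched. The field list below is my own guess at which entries matter:

```python
from transformers import AutoConfig

base = AutoConfig.from_pretrained("Qwen/Qwen2.5-Coder-0.5B")
draft = AutoConfig.from_pretrained("alamios/DeepSeek-R1-DRAFT-Qwen2.5-Coder-0.5B")

# A direct fine-tune should leave these architecture-defining fields unchanged;
# a merge or surgery step would usually show up as a mismatch here.
for field in ("architectures", "hidden_size", "num_hidden_layers",
              "num_attention_heads", "num_key_value_heads", "vocab_size"):
    print(f"{field}: base={getattr(base, field, None)}  draft={getattr(draft, field, None)}")
```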

Thank you very much for your time. I would really appreciate any clarification you can provide.

Best,
Qu

Hello,

This is a simple fine-tune of Qwen2.5-Coder-0.5B on DeepSeek-R1-Distill outputs to be used for speculative decoding, as stated in the model card.
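
For anyone landing here later, a minimal sketch of how a draft model like this is typically plugged in via transformers' assisted generation. The target checkpoint below is only an example; any DeepSeek-R1-Distill-Qwen size whose tokenizer matches the draft's should work the same way:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Target checkpoint chosen as an example, not prescribed by the model card.
target_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-32B"
draft_id = "alamios/DeepSeek-R1-DRAFT-Qwen2.5-Coder-0.5B"

tokenizer = AutoTokenizer.from_pretrained(target_id)
target = AutoModelForCausalLM.from_pretrained(
    target_id, torch_dtype=torch.bfloat16, device_map="auto"
)
draft = AutoModelForCausalLM.from_pretrained(
    draft_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Write a quicksort in Python."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(target.device)

# Passing assistant_model switches generate() into assisted (speculative)
# decoding: the draft proposes candidate tokens and the target verifies them.
out = target.generate(inputs, assistant_model=draft, max_new_tokens=256)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

Because the target verifies every proposed token, the output matches what the target model would produce on its own; the draft only affects speed.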
