K1 is a model post-trained on datasets such as iannicity/KIMI-K2.5-1000000x, iannicity/Hunter-Alpha-SFT, and stepfun-ai/Step-3.5-Flash-SFT.
It then received GRPO-based reinforcement learning on Chinese logical problems, which yielded more consistent reasoning and stronger performance in complex scenarios.
I have also observed that GRPO brings side effects such as improved numerical computation, likely related to its influence on components such as the MLP layers.
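The GRPO stage above optimizes against group-relative rewards. As a minimal sketch of that core idea (illustrative names only, not the actual training code), each prompt gets a group of sampled responses, and each response's reward is normalized against the group's mean and standard deviation:

```python
# Illustrative sketch of GRPO's group-relative advantage computation.
# Assumes scalar rewards for a group of responses sampled from one prompt;
# the function name and structure are hypothetical, not from this model's code.

def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against its group's mean and std (GRPO-style)."""
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = var ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]

# Example: rewards for four sampled answers to one logic problem.
# Advantages sum to zero, so above-average answers are reinforced
# and below-average ones are penalized.
advs = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
```

Because the baseline comes from the group itself, no separate value network is needed, which is part of what makes GRPO attractive for this kind of post-training.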
If you like my work, you are welcome to support me by buying me a coffee on Ko-fi.
Every bit of your support directly helps me continue creating and lets me spend more time producing better work.
Model tree for win10/K1-31B-v5
- Base model: google/gemma-4-31B-it