K1 is a model post-trained on datasets such as iannicity/KIMI-K2.5-1000000x, iannicity/Hunter-Alpha-SFT, and stepfun-ai/Step-3.5-Flash-SFT.
It then received GRPO-based reinforcement learning on Chinese logical problems, which yielded more consistent reasoning and stronger performance in complex scenarios.
I have also observed that GRPO brings side effects such as improved numerical computation, likely related to its influence on components such as the MLP layers.
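The GRPO stage above optimizes against group-relative rewards. As a minimal sketch of that core idea (illustrative names only, not the actual training code), each prompt gets a group of sampled responses, and each response's reward is normalized against the group's mean and standard deviation:

```python
# Illustrative sketch of GRPO's group-relative advantage computation.
# Assumes scalar rewards for a group of responses sampled from one prompt;
# the function name and structure are hypothetical, not from this model's code.

def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against its group's mean and std (GRPO-style)."""
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = var ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]

# Example: rewards for four sampled answers to one logic problem.
# Advantages sum to zero, so above-average answers are reinforced
# and below-average ones are penalized.
advs = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
```

Because the baseline comes from the group itself, no separate value network is needed, which is part of what makes GRPO attractive for this kind of post-training.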
If you like my work, you are welcome to support me by buying me a coffee on Ko-fi.
Every bit of your support directly helps me continue creating and lets me spend more time producing better work.
Model tree for win10/K1-31B-v5
- Base model: google/gemma-4-31B-it