---
license: gemma
library_name: transformers
pipeline_tag: image-text-to-text
base_model: google/gemma-3-4b-it
---

# Gemma 3 4b unslop experiment v3

An unslop finetune of [google/gemma-3-4b-it](https://huggingface.co/google/gemma-3-4b-it)

Next version is here: [gemma-3-4b-it-unslop-GSPO](https://huggingface.co/electroglyph/gemma-3-4b-it-unslop-GSPO)

### Changes from my previous test

- Temperature during training was set to 1.0 this time around, so the model is a lot less weird.
- Rewards changed a little bit. I allowed a small number of sentences with 4+ commas instead of penalizing them all, which has cut down on parenthetical phrases without completely eliminating them.
- The lexical diversity score is a bit fancier this time. First I calculated MTLD for 600+ books I have and took the mean score. It came out to almost exactly 100.0, so that's the baseline I aimed for. MTLD scores of 80-120 all receive full points (to avoid too much GRPO chaos), while larger deviations are increasingly penalized. A rough sketch of both reward tweaks follows this list.
- I've uploaded a UD-Q4_K_XL GGUF with settings that I grabbed from Unsloth's quant using my lil utility: [quant_clone](https://github.com/electroglyph/quant_clone)
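
The two reward tweaks above (the comma rule and the MTLD band) are easy to sketch in Python. This is a hypothetical illustration, not the trained rewards: the sentence splitter, the allowance of two comma-heavy sentences, the 0.25 penalty step, and the 40-point falloff are all guesses. See [train.py](./train.py) for the real reward functions.

```python
import re


def comma_heavy_penalty(text: str, allowance: int = 2, step: float = 0.25) -> float:
    """Penalize comma-heavy sentences (4+ commas) only beyond a small allowance.

    `allowance` and `step` are illustrative guesses, not the trained values.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text)
    heavy = sum(1 for s in sentences if s.count(",") >= 4)
    return -step * max(0, heavy - allowance)


def mtld_band_reward(mtld: float, low: float = 80.0, high: float = 120.0, falloff: float = 40.0) -> float:
    """Full reward inside the 80-120 MTLD band; reward decays as the score drifts outside it."""
    if low <= mtld <= high:
        return 1.0
    deviation = (low - mtld) if mtld < low else (mtld - high)
    # Linear decay is a placeholder for the actual "increasingly penalized" curve.
    return max(0.0, 1.0 - deviation / falloff)
```

The MTLD score itself can come from any off-the-shelf lexical diversity implementation; only the banding around 100.0 is specific to this experiment, and even its exact shape above is an assumption.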

### Training technique

Basically the same as last time, plus the minor changes above.

Training code: [train.py](./train.py)