---
license: gemma
library_name: transformers
pipeline_tag: image-text-to-text
base_model: google/gemma-3-4b-it
---

# Gemma 3 4b unslop experiment v3

An unslop finetune of [google/gemma-3-4b-it](https://huggingface.co/google/gemma-3-4b-it)

Next version is here: [gemma-3-4b-it-unslop-GSPO](https://huggingface.co/electroglyph/gemma-3-4b-it-unslop-GSPO)

### Changes from my previous test

- Temperature during training was set to 1.0 this time around, so the model is a lot less weird.
- Rewards changed a little bit. I allowed a small number of sentences with 4+ commas instead of penalizing them all, which has cut down on parenthetical phrases without completely eliminating them.
- The lexical diversity score is a bit fancier this time. First I calculated MTLD for 600+ books I have and took the mean score. It came out to almost exactly 100.0, so that's the baseline I aimed for. MTLD scores of 80-120 all receive full points (to avoid too much GRPO chaos), while larger deviations are increasingly penalized. A rough sketch of both reward tweaks follows this list.
- I've uploaded a UD-Q4_K_XL GGUF with settings that I grabbed from Unsloth's quant using my lil utility: [quant_clone](https://github.com/electroglyph/quant_clone)
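
The two reward tweaks above (the comma rule and the MTLD band) are easy to sketch in Python. This is a hypothetical illustration, not the trained rewards: the sentence splitter, the allowance of two comma-heavy sentences, the 0.25 penalty step, and the 40-point falloff are all guesses. See [train.py](./train.py) for the real reward functions.

```python
import re


def comma_heavy_penalty(text: str, allowance: int = 2, step: float = 0.25) -> float:
    """Penalize comma-heavy sentences (4+ commas) only beyond a small allowance.

    `allowance` and `step` are illustrative guesses, not the trained values.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text)
    heavy = sum(1 for s in sentences if s.count(",") >= 4)
    return -step * max(0, heavy - allowance)


def mtld_band_reward(mtld: float, low: float = 80.0, high: float = 120.0, falloff: float = 40.0) -> float:
    """Full reward inside the 80-120 MTLD band; reward decays as the score drifts outside it."""
    if low <= mtld <= high:
        return 1.0
    deviation = (low - mtld) if mtld < low else (mtld - high)
    # Linear decay is a placeholder for the actual "increasingly penalized" curve.
    return max(0.0, 1.0 - deviation / falloff)
```

The MTLD score itself can come from any off-the-shelf lexical diversity implementation; only the banding around 100.0 is specific to this experiment, and even its exact shape above is an assumption.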

### Training technique

Basically the same as last time, plus the minor changes above.

Training code: [train.py](./train.py)