---
base_model: unsloth/Qwen2.5-3B-Instruct
tags:
- text-generation-inference
- transformers
- unsloth
- qwen2
- trl
- grpo
license: apache-2.0
language:
- en
---

# 🦥 Uploaded Model

| **Field** | **Value** |
|-----------------------|--------------------------------------------|
| **Developed by** | **MasterControlAIML** |
| **License** | Apache 2.0 |
| **Finetuned from** | `unsloth/Qwen2.5-3B-Instruct` |
| **Training Framework**| [Unsloth](https://github.com/unslothai/unsloth) × Hugging Face TRL |

[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="190"/>](https://github.com/unslothai/unsloth)

---

## 🚀 What's New?
> *The protein-shake sequel to **MasterControlAIML/DeepSeek-R1-Qwen2.5-1.5b-SFT-R1-JSON-Unstructured-To-Structured**: now with more neurons, zero SFT, and a league of reward functions.*

| Upgrade | Explanation |
|--------------------|------------------------------------------------------------------------------|
| **Bigger Backbone**| 1.5 B → **3 B** Qwen 2.5 for bigger reasoning muscles. |
| **Pure RL** | No supervised fine-tuning; the policy was learned *only* from reward signals (GRPO). |
| **LM-as-Judge** | A separate LLM rates each candidate for correctness, JSON validity, style, and more (see the sketch below). |
| **2× Faster Train**| Unsloth's flash-attention & fused ops = less VRAM, more speed. |
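
The exact reward functions used in training are not published here, but a minimal sketch in plain Python shows the general shape: blend a 0-to-1 score from a judge LLM with a hard JSON-validity check. The names (`json_validity_reward`, `judged_reward`) and the 50/50 weighting are illustrative assumptions, not the training code.

```python
# Illustrative sketch only: combine a hypothetical LM-judge score with a
# hard JSON-validity check into a single scalar reward for GRPO.
import json
from typing import Callable

def json_validity_reward(completion: str) -> float:
    """1.0 if the `final answer[ json object: ... ]` payload parses, else 0.0."""
    start = completion.find("final answer[")
    if start == -1:
        return 0.0
    payload = completion[start:].split("json object:", 1)[-1].strip(" []\n")
    try:
        json.loads(payload)
        return 1.0
    except json.JSONDecodeError:
        return 0.0

def judged_reward(completion: str, judge: Callable[[str], float]) -> float:
    """Blend the judge's 0-1 score with JSON validity (weights are made up)."""
    return 0.5 * judge(completion) + 0.5 * json_validity_reward(completion)
```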

---

## 🛠️ Intended Use
* Convert messy prose, logs, or audit notes into a pristine JSON document that follows a complex, nested schema.
* Drop-in replacement for any pipeline using the older DeepSeek-R1 1.5 B structurer: just swap the checkpoint and enjoy the headroom (see the sketch below).
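
A minimal swap, assuming your pipeline loads the model by checkpoint name (the old repo id below is this card's sibling model):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Old 1.5 B structurer -> new 3 B checkpoint; the rest of the pipeline is unchanged.
OLD = "MasterControlAIML/DeepSeek-R1-Qwen2.5-1.5b-SFT-R1-JSON-Unstructured-To-Structured"
NEW = "MasterControlAIML/DeepSeek-R1-Qwen2.5-3b-LLM-Judge-Reward-JSON-Unstructured-To-Structured-Lora"

tok = AutoTokenizer.from_pretrained(NEW)
model = AutoModelForCausalLM.from_pretrained(NEW, device_map="auto")
```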

---

## 🧠 How to Use (Reasoning + JSON)
The snippet below:

1. **Primes** the model with the *exact* Pydantic schema, so it outputs the right keys.
2. Makes the model **think step-by-step** (reasoning) but still wraps the final JSON in an easy-to-parse container.
3. Uses the correct repo name: `MasterControlAIML/DeepSeek-R1-Qwen2.5-3b-LLM-Judge-Reward-JSON-Unstructured-To-Structured-Lora`.

```python
# ─────────────────────────────────────────────────────────────────────────────
# QUICK-START
# Structured-data extraction with reasoning + JSON output
# ─────────────────────────────────────────────────────────────────────────────
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
import torch, json, textwrap, inspect
from pydantic import BaseModel
from typing import List, Optional

MODEL = "MasterControlAIML/DeepSeek-R1-Qwen2.5-3b-LLM-Judge-Reward-JSON-Unstructured-To-Structured-Lora"

# 1️⃣ Inline schema (keeps the LLM on-rails) ─────────────────────────────────
class Table(BaseModel):          # minimal assumed definition so the snippet runs
    headers: List[str]
    rows: List[List[str]]

class Checkbox(BaseModel):       # minimal assumed definition so the snippet runs
    label: str
    checked: bool

class MultipleChoice(BaseModel):
    question: str
    options: List[str]
    selected: str

class FormField(BaseModel):
    fieldName: str
    value: str
    notes: Optional[str] = ""

class Calculation(BaseModel):
    formula: str
    result: str
    notes: Optional[str] = ""

class Metadata(BaseModel):
    reportDate: str
    auditorId: Optional[str] = None
    comments: Optional[str] = None

class Content(BaseModel):
    paragraphs: List[str]
    tables: List[Table]
    checkboxes: List[Checkbox]
    multipleChoice: List[MultipleChoice]
    formFields: List[FormField]
    calculations: List[Calculation]
    metadata: Optional[Metadata] = Metadata(reportDate="")

class Section(BaseModel):
    id: str
    title: str
    content: Content

class Document(BaseModel):
    documentTitle: str
    documentDate: str
    sections: List[Section]

# Give the model the *whole* schema, not just the top-level class.
SCHEMA_TEXT = "\n".join(
    inspect.getsource(cls)
    for cls in (Table, Checkbox, MultipleChoice, FormField,
                Calculation, Metadata, Content, Section, Document)
)

# 2️⃣ Build prompts ──────────────────────────────────────────────────────────
SYSTEM_PROMPT = textwrap.dedent(f"""
You are an expert **data-extraction assistant**.
Extract structured info from unstructured text **exactly** following the Pydantic schema.

── Schema ──
{SCHEMA_TEXT}
─────────────

Rules:
1. Follow the schema for keys & nesting.
2. Copy values verbatim when possible.
3. If a field is missing, return null.
4. Output your step-by-step reasoning first.
5. Then return ONLY the JSON inside this wrapper:
   final answer[ json object: {{ ... }} ]

Format:
<reasoning>…</reasoning>
<answer>
final answer[ json object: {{ … }} ]
</answer>
""").strip()

UNSTRUCTURED_TEXT = """
12 April 2025 – Onsite audit performed by Jane Smith.
Observations: Two fire extinguishers past expiry; emergency lights functional.
Calculations: Total extinguishers = 8, expired = 2 → 25 % overdue.
"""

USER_PROMPT = textwrap.dedent(f"""
### Task
Convert the following unstructured text to the schema.

### Text
{UNSTRUCTURED_TEXT}
""").strip()

# 3️⃣ Generate ───────────────────────────────────────────────────────────────
tok = AutoTokenizer.from_pretrained(MODEL, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL,
    device_map="auto",
    torch_dtype=torch.bfloat16
)
gen = pipeline("text-generation", model=model, tokenizer=tok,
               max_new_tokens=512, do_sample=False,
               return_full_text=False)  # return only the completion, not the prompt

# Use the tokenizer's chat template (Qwen expects ChatML-style tags).
prompt = tok.apply_chat_template(
    [{"role": "system", "content": SYSTEM_PROMPT},
     {"role": "user", "content": USER_PROMPT}],
    tokenize=False, add_generation_prompt=True
)
raw_out = gen(prompt)[0]["generated_text"]

# 4️⃣ Slice out the JSON ─────────────────────────────────────────────────────
start = raw_out.find("final answer[")
end = raw_out.rfind("]") + 1
json_text = raw_out[start:end].split("json object:", 1)[-1].strip(" []\n")
data = json.loads(json_text)  # ✅ raises if malformed

print(raw_out)                        # reasoning + JSON
print("\n✅ Parsed object:\n", data)
```

### Why it Works 🧠

* **Schema-priming** ensures key-level fidelity: no "creative" field names.
* **Chain-of-thought** improves factual extraction (it was rewarded during GRPO).
* The `final answer[…]` wrapper makes downstream parsing a one-liner, as sketched below.
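
For instance, a small helper (a sketch; `extract_json` is not shipped with this repo) that turns the wrapper into a parsed dict:

```python
import json, re

def extract_json(raw_out: str) -> dict:
    """Pull the JSON payload out of `final answer[ json object: {...} ]`."""
    match = re.search(r"final answer\[\s*json object:\s*(\{.*\})\s*\]",
                      raw_out, re.DOTALL)
    if match is None:
        raise ValueError("no `final answer[...]` wrapper found")
    return json.loads(match.group(1))
```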

---

## 🏗️ Training Recipe (Condensed)

| Setting | Value |
| -------------- | ------------------------------------------------------------------- |
| **Algorithm** | GRPO; reward LM: `Qwen2.5-7B` with a JSON-validator head |
| **Epochs** | 3 (effective) |
| **Batch** | Grad-accum 8, bfloat16 |
| **Optimizer** | Fused AdamW |
| **Throughput** | ≈ 45 k tokens/s on 8×A100 |
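
For orientation, this is roughly how such a run is wired up with TRL's `GRPOTrainer`. A minimal sketch, not the actual training script: the dataset, output dir, and reward function (the toy `json_validity_reward` from the sketch above) are placeholders.

```python
# Minimal GRPO wiring sketch with Hugging Face TRL (values are placeholders).
from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer

def reward_valid_json(completions, **kwargs):
    # TRL calls reward functions with the batch of sampled completions.
    return [json_validity_reward(c) for c in completions]

train_ds = Dataset.from_dict({"prompt": ["Convert this audit note to JSON: ..."]})

trainer = GRPOTrainer(
    model="unsloth/Qwen2.5-3B-Instruct",
    reward_funcs=reward_valid_json,
    args=GRPOConfig(output_dir="grpo-json",
                    gradient_accumulation_steps=8, bf16=True),
    train_dataset=train_ds,
)
trainer.train()
```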

---

## 📊 Evaluation (WIP)

| Metric | Status |
| ------------------------- | ------ |
| Exact-Match JSON Accuracy | 🔜 |
| Structural F1 | 🔜 |
| Valid-JSON Rate | 🔜 |
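
Until the official numbers land, valid-JSON rate is easy to measure yourself. A sketch, reusing the hypothetical `extract_json` helper from above:

```python
import json

def valid_json_rate(outputs):
    """Fraction of outputs whose `final answer[...]` payload parses as JSON."""
    ok = 0
    for out in outputs:
        try:
            extract_json(out)  # helper sketched in "Why it Works"
            ok += 1
        except (ValueError, json.JSONDecodeError):
            pass
    return ok / len(outputs) if outputs else 0.0
```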

Stay tuned: numbers landing faster than you can say "schema validation." 🛰️

---

## 🤗 Citation

```bibtex
@misc{bhaviktheslider_2025_unsloth_qwen2.5_3b_grpo,
  title        = {An Unsloth-accelerated GRPO-trained Qwen 2.5-3B for JSON structuring},
  author       = {MasterControlAIML},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/MasterControlAIML/DeepSeek-R1-Qwen2.5-3b-LLM-Judge-Reward-JSON-Unstructured-To-Structured-Lora}}
}
```

*May your JSON always parse and your losses always converge!* 🎉