---
base_model: unsloth/Qwen2.5-3B-Instruct
tags:
- text-generation-inference
- transformers
- unsloth
- qwen2
- trl
- grpo
license: apache-2.0
language:
- en
---

# 🦥 Uploaded Model

| **Field** | **Value** |
|-----------------------|--------------------------------------------|
| **Developed by** | **MasterControlAIML** |
| **License** | Apache 2.0 |
| **Finetuned from** | `unsloth/Qwen2.5-3B-Instruct` |
| **Training Framework**| [Unsloth](https://github.com/unslothai/unsloth) × Hugging Face TRL |

[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="190"/>](https://github.com/unslothai/unsloth)

---

## 🚀 What's New?
> *The protein-shake sequel to **MasterControlAIML/DeepSeek-R1-Qwen2.5-1.5b-SFT-R1-JSON-Unstructured-To-Structured**: now with more neurons, zero SFT, and a league of reward functions.*

| Upgrade | Explanation |
|--------------------|------------------------------------------------------------------------------|
| **Bigger Backbone**| 1.5 B → **3 B** Qwen 2.5 for bigger reasoning muscles. |
| **Pure RL** | No supervised fine-tuning; the policy was learned *only* from reward signals (GRPO). |
| **LM-as-Judge** | A separate LLM rates each candidate for correctness, JSON validity, style, and more (see the sketch below). |
| **2× Faster Train**| Unsloth's flash-attention & fused ops = less VRAM, more speed. |
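
The exact reward functions used in training are not published here, but a minimal sketch in plain Python shows the general shape: blend a 0-to-1 score from a judge LLM with a hard JSON-validity check. The names (`json_validity_reward`, `judged_reward`) and the 50/50 weighting are illustrative assumptions, not the training code.

```python
# Illustrative sketch only: combine a hypothetical LM-judge score with a
# hard JSON-validity check into a single scalar reward for GRPO.
import json
from typing import Callable

def json_validity_reward(completion: str) -> float:
    """1.0 if the `final answer[ json object: ... ]` payload parses, else 0.0."""
    start = completion.find("final answer[")
    if start == -1:
        return 0.0
    payload = completion[start:].split("json object:", 1)[-1].strip(" []\n")
    try:
        json.loads(payload)
        return 1.0
    except json.JSONDecodeError:
        return 0.0

def judged_reward(completion: str, judge: Callable[[str], float]) -> float:
    """Blend the judge's 0-1 score with JSON validity (weights are made up)."""
    return 0.5 * judge(completion) + 0.5 * json_validity_reward(completion)
```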

---

## 🛠️ Intended Use
* Convert messy prose, logs, or audit notes into a pristine JSON document that follows a complex, nested schema.
* Drop-in replacement for any pipeline using the older DeepSeek-R1 1.5 B structurer: just swap the checkpoint and enjoy the headroom (see the sketch below).
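
A minimal swap, assuming your pipeline loads the model by checkpoint name (the old repo id below is this card's sibling model):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Old 1.5 B structurer -> new 3 B checkpoint; the rest of the pipeline is unchanged.
OLD = "MasterControlAIML/DeepSeek-R1-Qwen2.5-1.5b-SFT-R1-JSON-Unstructured-To-Structured"
NEW = "MasterControlAIML/DeepSeek-R1-Qwen2.5-3b-LLM-Judge-Reward-JSON-Unstructured-To-Structured-Lora"

tok = AutoTokenizer.from_pretrained(NEW)
model = AutoModelForCausalLM.from_pretrained(NEW, device_map="auto")
```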

---

## 🧠 How to Use (Reasoning + JSON)
The snippet below:

1. **Primes** the model with the *exact* Pydantic schema, so it outputs the right keys.
2. Makes the model **think step-by-step** (reasoning) but still wraps the final JSON in an easy-to-parse container.
3. Uses the correct repo name: `MasterControlAIML/DeepSeek-R1-Qwen2.5-3b-LLM-Judge-Reward-JSON-Unstructured-To-Structured-Lora`.

```python
# ─────────────────────────────────────────────────────────────────────────────
# QUICK-START
# Structured-data extraction with reasoning + JSON output
# ─────────────────────────────────────────────────────────────────────────────
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
import torch, json, textwrap, inspect
from pydantic import BaseModel
from typing import List, Optional

MODEL = "MasterControlAIML/DeepSeek-R1-Qwen2.5-3b-LLM-Judge-Reward-JSON-Unstructured-To-Structured-Lora"

# 1️⃣ Inline schema (keeps the LLM on-rails) ─────────────────────────────────
class Table(BaseModel):          # minimal assumed definition so the snippet runs
    headers: List[str]
    rows: List[List[str]]

class Checkbox(BaseModel):       # minimal assumed definition so the snippet runs
    label: str
    checked: bool

class MultipleChoice(BaseModel):
    question: str
    options: List[str]
    selected: str

class FormField(BaseModel):
    fieldName: str
    value: str
    notes: Optional[str] = ""

class Calculation(BaseModel):
    formula: str
    result: str
    notes: Optional[str] = ""

class Metadata(BaseModel):
    reportDate: str
    auditorId: Optional[str] = None
    comments: Optional[str] = None

class Content(BaseModel):
    paragraphs: List[str]
    tables: List[Table]
    checkboxes: List[Checkbox]
    multipleChoice: List[MultipleChoice]
    formFields: List[FormField]
    calculations: List[Calculation]
    metadata: Optional[Metadata] = Metadata(reportDate="")

class Section(BaseModel):
    id: str
    title: str
    content: Content

class Document(BaseModel):
    documentTitle: str
    documentDate: str
    sections: List[Section]

# Give the model the *whole* schema, not just the top-level class.
SCHEMA_TEXT = "\n".join(
    inspect.getsource(cls)
    for cls in (Table, Checkbox, MultipleChoice, FormField,
                Calculation, Metadata, Content, Section, Document)
)

# 2️⃣ Build prompts ──────────────────────────────────────────────────────────
SYSTEM_PROMPT = textwrap.dedent(f"""
You are an expert **data-extraction assistant**.
Extract structured info from unstructured text **exactly** following the Pydantic schema.

── Schema ──
{SCHEMA_TEXT}
─────────────

Rules:
1. Follow the schema for keys & nesting.
2. Copy values verbatim when possible.
3. If a field is missing, return null.
4. Output your step-by-step reasoning first.
5. Then return ONLY the JSON inside this wrapper:
   final answer[ json object: {{ ... }} ]

Format:
<reasoning>…</reasoning>
<answer>
final answer[ json object: {{ … }} ]
</answer>
""").strip()

UNSTRUCTURED_TEXT = """
12 April 2025 – Onsite audit performed by Jane Smith.
Observations: Two fire extinguishers past expiry; emergency lights functional.
Calculations: Total extinguishers = 8, expired = 2 → 25 % overdue.
"""

USER_PROMPT = textwrap.dedent(f"""
### Task
Convert the following unstructured text to the schema.

### Text
{UNSTRUCTURED_TEXT}
""").strip()

# 3️⃣ Generate ───────────────────────────────────────────────────────────────
tok = AutoTokenizer.from_pretrained(MODEL, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL,
    device_map="auto",
    torch_dtype=torch.bfloat16
)
gen = pipeline("text-generation", model=model, tokenizer=tok,
               max_new_tokens=512, do_sample=False,
               return_full_text=False)  # return only the completion, not the prompt

# Use the tokenizer's chat template (Qwen expects ChatML-style tags).
prompt = tok.apply_chat_template(
    [{"role": "system", "content": SYSTEM_PROMPT},
     {"role": "user", "content": USER_PROMPT}],
    tokenize=False, add_generation_prompt=True
)
raw_out = gen(prompt)[0]["generated_text"]

# 4️⃣ Slice out the JSON ─────────────────────────────────────────────────────
start = raw_out.find("final answer[")
end = raw_out.rfind("]") + 1
json_text = raw_out[start:end].split("json object:", 1)[-1].strip(" []\n")
data = json.loads(json_text)  # ✅ raises if malformed

print(raw_out)                        # reasoning + JSON
print("\n✅ Parsed object:\n", data)
```

### Why it Works 🧠

* **Schema-priming** ensures key-level fidelity: no "creative" field names.
* **Chain-of-thought** improves factual extraction (it was rewarded during GRPO).
* The `final answer[…]` wrapper makes downstream parsing a one-liner, as sketched below.
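
For instance, a small helper (a sketch; `extract_json` is not shipped with this repo) that turns the wrapper into a parsed dict:

```python
import json, re

def extract_json(raw_out: str) -> dict:
    """Pull the JSON payload out of `final answer[ json object: {...} ]`."""
    match = re.search(r"final answer\[\s*json object:\s*(\{.*\})\s*\]",
                      raw_out, re.DOTALL)
    if match is None:
        raise ValueError("no `final answer[...]` wrapper found")
    return json.loads(match.group(1))
```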

---

## 🏗️ Training Recipe (Condensed)

| Setting | Value |
| -------------- | ------------------------------------------------------------------- |
| **Algorithm** | GRPO; reward LM: `Qwen2.5-7B` with a JSON-validator head |
| **Epochs** | 3 (effective) |
| **Batch** | Grad-accum 8, bfloat16 |
| **Optimizer** | Fused AdamW |
| **Throughput** | ≈ 45 k tokens/s on 8×A100 |
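
For orientation, this is roughly how such a run is wired up with TRL's `GRPOTrainer`. A minimal sketch, not the actual training script: the dataset, output dir, and reward function (the toy `json_validity_reward` from the sketch above) are placeholders.

```python
# Minimal GRPO wiring sketch with Hugging Face TRL (values are placeholders).
from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer

def reward_valid_json(completions, **kwargs):
    # TRL calls reward functions with the batch of sampled completions.
    return [json_validity_reward(c) for c in completions]

train_ds = Dataset.from_dict({"prompt": ["Convert this audit note to JSON: ..."]})

trainer = GRPOTrainer(
    model="unsloth/Qwen2.5-3B-Instruct",
    reward_funcs=reward_valid_json,
    args=GRPOConfig(output_dir="grpo-json",
                    gradient_accumulation_steps=8, bf16=True),
    train_dataset=train_ds,
)
trainer.train()
```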

---

## 📊 Evaluation (WIP)

| Metric | Status |
| ------------------------- | ------ |
| Exact-Match JSON Accuracy | 🔜 |
| Structural F1 | 🔜 |
| Valid-JSON Rate | 🔜 |
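
Until the official numbers land, valid-JSON rate is easy to measure yourself. A sketch, reusing the hypothetical `extract_json` helper from above:

```python
import json

def valid_json_rate(outputs):
    """Fraction of outputs whose `final answer[...]` payload parses as JSON."""
    ok = 0
    for out in outputs:
        try:
            extract_json(out)  # helper sketched in "Why it Works"
            ok += 1
        except (ValueError, json.JSONDecodeError):
            pass
    return ok / len(outputs) if outputs else 0.0
```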

Stay tuned: numbers landing faster than you can say "schema validation." 🛰️

---

## 🤗 Citation

```bibtex
@misc{bhaviktheslider_2025_unsloth_qwen2.5_3b_grpo,
  title        = {An Unsloth-accelerated GRPO-trained Qwen 2.5-3B for JSON structuring},
  author       = {MasterControlAIML},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/MasterControlAIML/DeepSeek-R1-Qwen2.5-3b-LLM-Judge-Reward-JSON-Unstructured-To-Structured-Lora}}
}
```

*May your JSON always parse and your losses always converge!* 🎉