---
base_model:
- Qwen/Qwen2.5-14B
library_name: peft
model_name: FactualDPO_qwen25_delta100_last
tags:
- base_model:adapter:unsloth/qwen2.5-14b-instruct-unsloth-bnb-4bit
- dpo
- lora
- transformers
- trl
- unsloth
pipeline_tag: text-generation
license: apache-2.0
---

# Factuality-Alignment-Qwen2.5-14B

A factuality-aligned Large Language Model fine-tuned using **Factuality-Aware Direct Preference Optimization (Factual-DPO)** to reduce hallucinations while preserving preference alignment.
Website: Project Page | Paper: arXiv | Dataset: Hugging Face | Code: GitHub
---

## 🧭 Background & Motivation

Large Language Models optimized via preference learning (e.g., DPO, RLHF) often **over-prefer fluent but hallucinated responses**, especially when factual correctness is not explicitly supervised.

**Factuality-Alignment-Qwen2.5-14B** addresses this limitation by applying **Factual-DPO**, a factuality-aware extension of Direct Preference Optimization that:

- Integrates **explicit binary factuality supervision**
- Penalizes preferences that favor hallucinated responses
- Introduces **margin-based factual penalties (Δ)** for controllable hallucination suppression

This model is fine-tuned from **Qwen2.5-14B-Instruct** using a large-scale, balanced, synthetic factuality-aware preference dataset derived from Skywork Reward-Preference-80K.

---

## 🧠 What Is Factual-DPO?

Standard DPO optimizes preference alignment without distinguishing whether the preferred response is factual. **Factual-DPO modifies the DPO objective by introducing factuality indicators**:

- Each preference pair includes factuality labels `(h_w, h_l)`
- A margin penalty `Δ` is applied when the preferred response is less factual than the rejected one
- Optimization pressure shifts toward **factually correct preferences**

➡️ **Result**: lower hallucination rates **without sacrificing preference win-rate or fluency**. An illustrative loss sketch is provided in the appendix at the end of this card.

---

## ✨ Key Contributions

- 🔍 **Binary factuality supervision** integrated into preference learning
- 🧪 **Synthetic hallucination inversion** to balance factual vs. hallucinated pairs
- 📐 **Δ-margin factual penalties** for controllable hallucination suppression
- ⚙️ **Config-driven, reproducible training and evaluation pipelines**
- 📊 **Multi-model × multi-Δ benchmarking at scale**

---

## 🧪 Training Overview

- **Base model**: Qwen2.5-14B-Instruct
- **Training method**: Factuality-Aware DPO (QLoRA, 4-bit NF4)
- **Frameworks**: TRL, Unsloth, Accelerate
- **Hardware**: A100 / A40 GPUs
- **Objective**: Reduce hallucinations while maintaining preference alignment

Each Δ value produces a **separate fine-tuned checkpoint**, enabling controlled factuality–preference trade-offs.

---

## 📊 Evaluation

Evaluation is performed using **GPT-4o-mini as an LLM-as-a-Judge**. A minimal metric-aggregation sketch is included in the appendix.

**Metrics**

| Metric      | Description                                  |
|-------------|----------------------------------------------|
| factuality  | Mean factuality score                        |
| halluc_rate | % of outputs below the factuality threshold  |
| win_rate    | Preference win-rate vs. the baseline         |
| count       | Number of evaluated prompts                  |

The Factual-DPO variants consistently show:

- ↓ hallucination rate
- ↑ factuality score
- Comparable or improved preference win-rate

---

## 🚀 Usage Example

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "vector-institute/Factuality-Alignment-Qwen2.5-14B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

prompt = "What are the causes of Type 1 diabetes?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=256,
        temperature=0.7,
        do_sample=True
    )

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

---

## 📖 Citation

If you use this model, please cite:

```bibtex
@article{FactualAlignment2026,
  title={Reducing Hallucinations in LLMs via Factuality-Aware Preference Learning},
  author={Sindhuja Chaduvula and Ahmed Radwan and Azib Farooq and Yani Ioannou and Shaina Raza},
  journal={arXiv preprint arXiv:2601.03027},
  year={2026}
}
```
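---

## 📎 Appendix A: Illustrative Factual-DPO Loss

The exact objective is defined in the paper; the snippet below is only a minimal sketch of the idea described above, assuming the margin `Δ` is subtracted from the standard DPO logit whenever the chosen (preferred) response is less factual than the rejected one. The function and argument names are illustrative, not the project's training code.

```python
import torch.nn.functional as F

def factual_dpo_loss(policy_chosen_logps, policy_rejected_logps,
                     ref_chosen_logps, ref_rejected_logps,
                     h_chosen, h_rejected, beta=0.1, delta=1.0):
    """Sketch of a factuality-aware DPO loss.

    h_chosen / h_rejected: binary factuality labels (1 = factual, 0 = hallucinated).
    delta: margin penalty applied when the preferred response is less factual.
    """
    # Standard DPO logit: implicit reward gap between chosen and rejected responses
    pi_logratios = policy_chosen_logps - policy_rejected_logps
    ref_logratios = ref_chosen_logps - ref_rejected_logps
    logits = beta * (pi_logratios - ref_logratios)

    # Factuality margin: penalize pairs whose preferred response is hallucinated
    # while the rejected response is factual (h_w < h_l)
    penalty = delta * (h_chosen < h_rejected).float()

    return -F.logsigmoid(logits - penalty).mean()
```

With `delta = 0` this reduces to standard DPO; larger `Δ` values apply stronger pressure against hallucinated preferences, which is why each Δ corresponds to a separate checkpoint.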
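---

## 📎 Appendix B: QLoRA Setup Sketch

The training overview mentions QLoRA with 4-bit NF4 quantization via TRL/Unsloth. The snippet below is a generic `transformers`/`peft` sketch of that setup, not the exact configuration used for this model; the LoRA rank, alpha, and target modules are assumptions for illustration, and the factuality-aware margin from Appendix A is applied on top of a DPO trainer (not shown).

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig

base_model = "Qwen/Qwen2.5-14B-Instruct"

# Load the base model quantized to 4-bit NF4 (QLoRA-style)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=bnb_config,
    device_map="auto",
)

# LoRA adapters are trained on top of the frozen 4-bit base
# (illustrative hyperparameters, not the exact run settings)
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
```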
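---

## 📎 Appendix C: Metric Aggregation Sketch

The evaluation table above lists four summary metrics. The helper below is a hypothetical aggregation of per-prompt GPT-4o-mini judge outputs into those metrics; the actual judging prompts, score scale, and hallucination threshold are defined in the project code, not here.

```python
from statistics import mean

def summarize_judge_scores(factual_scores, wins, threshold=0.5):
    """Aggregate per-prompt LLM-as-a-Judge outputs.

    factual_scores: factuality score per prompt (assumed in [0, 1])
    wins: True where this model's response is preferred over the baseline's
    threshold: score below which an output counts as hallucinated
    """
    return {
        "factuality": mean(factual_scores),                           # mean factuality score
        "halluc_rate": mean(s < threshold for s in factual_scores),   # fraction below threshold
        "win_rate": mean(wins),                                       # preference win-rate vs. baseline
        "count": len(factual_scores),                                 # number of evaluated prompts
    }
```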