---
base_model:
- Qwen/Qwen2.5-14B
library_name: peft
model_name: FactualDPO_qwen25_delta100_last
tags:
- base_model:adapter:unsloth/qwen2.5-14b-instruct-unsloth-bnb-4bit
- dpo
- lora
- transformers
- trl
- unsloth
pipeline_tag: text-generation
license: apache-2.0
---

# Factuality-Alignment-Qwen2.5-14B

A factuality-aligned Large Language Model fine-tuned using **Factuality-Aware Direct Preference Optimization (Factual-DPO)** to reduce hallucinations while preserving preference alignment.
Website: Project Page | Paper: arXiv | Dataset: Hugging Face | Code: GitHub
---

## 🧭 Background & Motivation

Large Language Models optimized via preference learning (e.g., DPO, RLHF) often **over-prefer fluent but hallucinated responses**, especially when factual correctness is not explicitly supervised.

**Factuality-Alignment-Qwen2.5-14B** addresses this limitation by applying **Factual-DPO**, a factuality-aware extension of Direct Preference Optimization that:

- Integrates **explicit binary factuality supervision**
- Penalizes preferences that favor hallucinated responses
- Introduces **margin-based factual penalties (Δ)** for controllable hallucination suppression

This model is fine-tuned from **Qwen2.5-14B-Instruct** using a large-scale, balanced, synthetic factuality-aware preference dataset derived from Skywork Reward-Preference-80K.

---

## 🧠 What Is Factual-DPO?

Standard DPO optimizes preference alignment without distinguishing whether the preferred response is factual. **Factual-DPO modifies the DPO objective by introducing factuality indicators**:

- Each preference pair includes factuality labels `(h_w, h_l)`
- A margin penalty `Δ` is applied when the preferred response is less factual than the rejected one
- Optimization pressure shifts toward **factually correct preferences**

➡️ **Result**: lower hallucination rates **without sacrificing preference win-rate or fluency**. An illustrative loss sketch is provided in the appendix at the end of this card.

---

## ✨ Key Contributions

- 🔍 **Binary factuality supervision** integrated into preference learning
- 🧪 **Synthetic hallucination inversion** to balance factual vs. hallucinated pairs
- 📐 **Δ-margin factual penalties** for controllable hallucination suppression
- ⚙️ **Config-driven, reproducible training and evaluation pipelines**
- 📊 **Multi-model × multi-Δ benchmarking at scale**

---

## 🧪 Training Overview

- **Base model**: Qwen2.5-14B-Instruct
- **Training method**: Factuality-Aware DPO (QLoRA, 4-bit NF4)
- **Frameworks**: TRL, Unsloth, Accelerate
- **Hardware**: A100 / A40 GPUs
- **Objective**: Reduce hallucinations while maintaining preference alignment

Each Δ value produces a **separate fine-tuned checkpoint**, enabling controlled factuality–preference trade-offs.

---

## 📊 Evaluation

Evaluation is performed using **GPT-4o-mini as an LLM-as-a-Judge**. A minimal metric-aggregation sketch is included in the appendix.

**Metrics**

| Metric      | Description                                  |
|-------------|----------------------------------------------|
| factuality  | Mean factuality score                        |
| halluc_rate | % of outputs below the factuality threshold  |
| win_rate    | Preference win-rate vs. the baseline         |
| count       | Number of evaluated prompts                  |

The Factual-DPO variants consistently show:

- ↓ hallucination rate
- ↑ factuality score
- Comparable or improved preference win-rate

---

## 🚀 Usage Example

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "vector-institute/Factuality-Alignment-Qwen2.5-14B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

prompt = "What are the causes of Type 1 diabetes?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=256,
        temperature=0.7,
        do_sample=True
    )

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

---

## 📖 Citation

If you use this model, please cite:

```bibtex
@article{FactualAlignment2026,
  title={Reducing Hallucinations in LLMs via Factuality-Aware Preference Learning},
  author={Sindhuja Chaduvula and Ahmed Radwan and Azib Farooq and Yani Ioannou and Shaina Raza},
  journal={arXiv preprint arXiv:2601.03027},
  year={2026}
}
```
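---

## 📎 Appendix A: Illustrative Factual-DPO Loss

The exact objective is defined in the paper; the snippet below is only a minimal sketch of the idea described above, assuming the margin `Δ` is subtracted from the standard DPO logit whenever the chosen (preferred) response is less factual than the rejected one. The function and argument names are illustrative, not the project's training code.

```python
import torch.nn.functional as F

def factual_dpo_loss(policy_chosen_logps, policy_rejected_logps,
                     ref_chosen_logps, ref_rejected_logps,
                     h_chosen, h_rejected, beta=0.1, delta=1.0):
    """Sketch of a factuality-aware DPO loss.

    h_chosen / h_rejected: binary factuality labels (1 = factual, 0 = hallucinated).
    delta: margin penalty applied when the preferred response is less factual.
    """
    # Standard DPO logit: implicit reward gap between chosen and rejected responses
    pi_logratios = policy_chosen_logps - policy_rejected_logps
    ref_logratios = ref_chosen_logps - ref_rejected_logps
    logits = beta * (pi_logratios - ref_logratios)

    # Factuality margin: penalize pairs whose preferred response is hallucinated
    # while the rejected response is factual (h_w < h_l)
    penalty = delta * (h_chosen < h_rejected).float()

    return -F.logsigmoid(logits - penalty).mean()
```

With `delta = 0` this reduces to standard DPO; larger `Δ` values apply stronger pressure against hallucinated preferences, which is why each Δ corresponds to a separate checkpoint.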
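---

## 📎 Appendix B: QLoRA Setup Sketch

The training overview mentions QLoRA with 4-bit NF4 quantization via TRL/Unsloth. The snippet below is a generic `transformers`/`peft` sketch of that setup, not the exact configuration used for this model; the LoRA rank, alpha, and target modules are assumptions for illustration, and the factuality-aware margin from Appendix A is applied on top of a DPO trainer (not shown).

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig

base_model = "Qwen/Qwen2.5-14B-Instruct"

# Load the base model quantized to 4-bit NF4 (QLoRA-style)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=bnb_config,
    device_map="auto",
)

# LoRA adapters are trained on top of the frozen 4-bit base
# (illustrative hyperparameters, not the exact run settings)
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
```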
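---

## 📎 Appendix C: Metric Aggregation Sketch

The evaluation table above lists four summary metrics. The helper below is a hypothetical aggregation of per-prompt GPT-4o-mini judge outputs into those metrics; the actual judging prompts, score scale, and hallucination threshold are defined in the project code, not here.

```python
from statistics import mean

def summarize_judge_scores(factual_scores, wins, threshold=0.5):
    """Aggregate per-prompt LLM-as-a-Judge outputs.

    factual_scores: factuality score per prompt (assumed in [0, 1])
    wins: True where this model's response is preferred over the baseline's
    threshold: score below which an output counts as hallucinated
    """
    return {
        "factuality": mean(factual_scores),                           # mean factuality score
        "halluc_rate": mean(s < threshold for s in factual_scores),   # fraction below threshold
        "win_rate": mean(wins),                                       # preference win-rate vs. baseline
        "count": len(factual_scores),                                 # number of evaluated prompts
    }
```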