---
language:
  - en
license: apache-2.0
base_model: mistralai/Mistral-7B-v0.1
tags:
  - text-generation
  - dpo
  - customer-support
  - mistral
  - gguf
  - ollama
library_name: transformers
---

# 🎯 Customer Support Model (DPO Fine-tuned, Q8_0)

Mistral-7B fine-tuned with Direct Preference Optimization (DPO) for professional customer support responses.

Developed by Pattabhi Amperayani

## 🚀 Quick Start with Ollama

### 1. Download the model

```bash
wget https://huggingface.co/pattabhia/customer-support/resolve/main/customer_support_dpo.q8_0.gguf
```
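
Alternatively, the same file can be fetched from Python with the `huggingface_hub` client (a minimal sketch; the filename matches the wget URL above):

```python
from huggingface_hub import hf_hub_download

# Downloads the GGUF into the local Hugging Face cache and returns its path on disk.
gguf_path = hf_hub_download(
    repo_id="pattabhia/customer-support",
    filename="customer_support_dpo.q8_0.gguf",
)
print(gguf_path)
```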

### 2. Create Modelfile

```bash
cat > Modelfile << 'EOF'
FROM ./customer_support_dpo.q8_0.gguf

SYSTEM """You are a professional customer support agent.

Guidelines:
• Concise responses (1-4 sentences unless troubleshooting)
• Friendly but professional tone
• No generic greetings ("Dear Customer")
• No placeholders ([Name], [Order #])
• Chat-style communication (not email format)
• Clear, actionable steps when needed
• Stay calm with frustrated customers
• Empathetic but avoid excessive apologies
• Ask clarifying questions if unclear

Response Style:
• Direct and helpful
• Simple language
• Confident tone
• Zero fluff

Special Cases:
• Urgent → Immediate actionable steps
• Technical → Step-by-step breakdown
• Angry → Acknowledge + solve
• Vague → Polite clarification
"""

PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER top_k 40
PARAMETER num_ctx 2048
PARAMETER repeat_penalty 1.1
PARAMETER num_predict 256
EOF
```

### 3. Create and run with Ollama

```bash
ollama create customer-support -f Modelfile
ollama run customer-support "What payment methods do you accept?"
```

## 💬 Example Interaction

User: What payment methods do you accept?

Assistant: We currently accept PayPal, all major credit cards (Visa, MasterCard, American Express), and bank transfers. All payments are processed through our secure payment gateway.

User: What are your business hours?

Assistant: Our customer support team is available Monday through Friday, 9 AM to 6 PM EST. For urgent inquiries outside these hours, you can submit a ticket through our website, and we'll respond within 24 hours.

## 📊 Performance Metrics

Compared to base Mistral-7B on customer support tasks:

| Metric | Improvement |
|---|---|
| Helpfulness | +45% |
| Professionalism | +60% |
| Specificity | +53% |
| Overall Quality | +52% |

*Evaluated with the RAGAS framework on 200 test queries.*
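
The exact RAGAS setup is not reproduced here; as a rough illustration of the idea behind these scores, the sketch below uses the local Ollama endpoint as an LLM judge to rate a base-model reply against a DPO reply on one criterion. The judge model name, prompt, replies, and criterion are illustrative, not the evaluation actually used.

```python
import requests

JUDGE_PROMPT = """Rate the following customer support reply for {criterion} on a scale of 1-10.
Answer with the number only.

Customer query: {query}
Support reply: {reply}"""

def judge(criterion, query, reply, judge_model="mistral"):
    """Score one reply on one criterion using a local judge model served by Ollama."""
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": judge_model,  # assumed judge model; use whichever model you have pulled
            "prompt": JUDGE_PROMPT.format(criterion=criterion, query=query, reply=reply),
            "stream": False,
        },
    )
    # Assumes the judge answers with a bare number, e.g. "8".
    return int(resp.json()["response"].strip().split()[0])

query = "What payment methods do you accept?"
base_reply = "Dear Customer, thank you for contacting us. We will get back to you shortly."
dpo_reply = "We accept PayPal, major credit cards, and bank transfers."
print("helpfulness:", judge("helpfulness", query, base_reply), "vs", judge("helpfulness", query, dpo_reply))
```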

## 🔧 Technical Details

  • Base Model: mistralai/Mistral-7B-v0.1
  • Training Method: DPO (Direct Preference Optimization); a configuration sketch is shown after this list
  • Dataset: 1,000 preference pairs (chosen vs. rejected responses)
  • Quantization: Q8_0 (8-bit, ~7.2GB)
  • LoRA Config: r=16, alpha=32, dropout=0.05
  • Training Framework: HuggingFace TRL + LLaMA Factory
  • Conversion: llama.cpp GGUF conversion and quantization tools
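
For reference, a minimal sketch of how such a DPO run could be set up with HuggingFace TRL and PEFT. Only the LoRA values (r=16, alpha=32, dropout=0.05) come from this card; the dataset path, beta, batch size, and other hyperparameters are placeholders, and argument names differ slightly across TRL releases (e.g. `processing_class` was previously `tokenizer`).

```python
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_name = "mistralai/Mistral-7B-v0.1"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Preference pairs with "prompt", "chosen", and "rejected" columns (path is illustrative).
dataset = load_dataset("json", data_files="preference_pairs.jsonl", split="train")

# LoRA settings matching the card: r=16, alpha=32, dropout=0.05.
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

training_args = DPOConfig(
    output_dir="dpo-customer-support",
    beta=0.1,                        # DPO preference strength (assumed value)
    per_device_train_batch_size=2,   # placeholder
    num_train_epochs=1,              # placeholder
)

trainer = DPOTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    processing_class=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```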

## 🎯 Use Cases

  • E-commerce: Product inquiries, order status, refunds
  • SaaS: Feature questions, troubleshooting, onboarding
  • Service Desk: Ticket routing, FAQ automation
  • Technical Support: Initial triage, common issues
  • Multi-lingual: Extensible to other languages via fine-tuning

## 📈 Training Pipeline

  1. Base Model: Mistral-7B-v0.1
  2. SFT Phase: Supervised fine-tuning on customer support dialogues
  3. DPO Phase: Preference optimization (1000 examples)
  4. Merge: LoRA adapters merged with base weights (see the sketch after this list)
  5. Quantization: GGUF Q8_0 for optimal quality/size balance
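
A minimal sketch of the merge step (step 4) using PEFT's `merge_and_unload`; the adapter and output directory names are illustrative:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model and apply the trained DPO LoRA adapter (adapter path is illustrative).
base = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")
merged = PeftModel.from_pretrained(base, "dpo-customer-support").merge_and_unload()

# Save a plain HF checkpoint that llama.cpp can then convert to GGUF and quantize to Q8_0.
merged.save_pretrained("customer-support-merged")
AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1").save_pretrained("customer-support-merged")
```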

πŸ—οΈ Model Architecture

  • Parameters: 7.24B
  • Quantization: 8-bit (Q8_0)
  • Context Length: 2048 tokens (configurable)
  • Vocab Size: 32,000
  • Architecture: Mistral (Grouped-Query Attention)

## 💻 System Requirements

  • Minimum RAM: 12GB
  • Recommended RAM: 16GB+
  • VRAM (GPU): 8GB+ (optional, runs on CPU)
  • Disk Space: 8GB

### Python with requests

```python
import requests

# Non-streaming request: the full reply is returned in a single JSON object.
response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "customer-support",
        "prompt": "How do I reset my password?",
        "stream": False
    }
)
print(response.json()["response"])
```
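
The same endpoint streams newline-delimited JSON chunks when `stream` is left on, which is handy for showing the reply as it is generated (a small sketch):

```python
import json
import requests

with requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "customer-support", "prompt": "How do I reset my password?", "stream": True},
    stream=True,
) as resp:
    for line in resp.iter_lines():
        if not line:
            continue
        chunk = json.loads(line)              # each line is one JSON object with a "response" fragment
        print(chunk.get("response", ""), end="", flush=True)
        if chunk.get("done"):                 # the final chunk carries "done": true
            break
print()
```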

### LangChain

```python
# `langchain.llms.Ollama` is deprecated; recent LangChain versions ship it in langchain-community.
from langchain_community.llms import Ollama

llm = Ollama(model="customer-support")
response = llm.invoke("What payment methods do you accept?")
print(response)
```

## 🔄 Continuous Learning (RL-VR)

This model supports Reinforcement Learning with Verifiable Rewards (RL-VR):

  1. Log all customer interactions to JSONL (see the logging sketch below)
  2. Weekly batch training with new preference pairs
  3. RAGAS evaluation for quality verification
  4. Incremental model updates
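
A minimal sketch of step 1; the log path and the optional rating field are illustrative, and the resulting JSONL can later be turned into new preference pairs:

```python
import json
import time

LOG_PATH = "interactions.jsonl"  # illustrative location

def log_interaction(prompt, response, rating=None):
    """Append one customer interaction (plus an optional human rating) as a JSON line."""
    record = {
        "timestamp": time.time(),
        "prompt": prompt,
        "response": response,
        "rating": rating,  # e.g. thumbs up/down collected in the support UI
    }
    with open(LOG_PATH, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")

log_interaction(
    "What payment methods do you accept?",
    "We accept PayPal, major credit cards, and bank transfers.",
    rating=1,
)
```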