---
language:
  - en
license: apache-2.0
base_model: mistralai/Mistral-7B-v0.1
tags:
  - text-generation
  - dpo
  - customer-support
  - mistral
  - gguf
  - ollama
library_name: transformers
---

# 🎯 Customer Support Model (DPO Fine-tuned, Q8_0)

Mistral-7B fine-tuned with Direct Preference Optimization (DPO) for professional customer support responses.

Developed by Pattabhi Amperayani

## 🚀 Quick Start with Ollama

### 1. Download the model

```bash
wget https://huggingface.co/pattabhia/customer-support/resolve/main/customer_support_dpo.q8_0.gguf
```
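
Alternatively, the same file can be fetched from Python with the `huggingface_hub` client (a minimal sketch; the filename matches the wget URL above):

```python
from huggingface_hub import hf_hub_download

# Downloads the GGUF into the local Hugging Face cache and returns its path on disk.
gguf_path = hf_hub_download(
    repo_id="pattabhia/customer-support",
    filename="customer_support_dpo.q8_0.gguf",
)
print(gguf_path)
```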

### 2. Create Modelfile

```bash
cat > Modelfile << 'EOF'
FROM ./customer_support_dpo.q8_0.gguf

SYSTEM """You are a professional customer support agent.

Guidelines:
• Concise responses (1-4 sentences unless troubleshooting)
• Friendly but professional tone
• No generic greetings ("Dear Customer")
• No placeholders ([Name], [Order #])
• Chat-style communication (not email format)
• Clear, actionable steps when needed
• Stay calm with frustrated customers
• Empathetic but avoid excessive apologies
• Ask clarifying questions if unclear

Response Style:
• Direct and helpful
• Simple language
• Confident tone
• Zero fluff

Special Cases:
• Urgent → Immediate actionable steps
• Technical → Step-by-step breakdown
• Angry → Acknowledge + solve
• Vague → Polite clarification
"""

PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER top_k 40
PARAMETER num_ctx 2048
PARAMETER repeat_penalty 1.1
PARAMETER num_predict 256
EOF
```

### 3. Create and run with Ollama

```bash
ollama create customer-support -f Modelfile
ollama run customer-support "What payment methods do you accept?"
```

## 💬 Example Interaction

User: What payment methods do you accept?

Assistant: We currently accept PayPal, all major credit cards (Visa, MasterCard, American Express), and bank transfers. All payments are processed through our secure payment gateway.

User: What are your business hours?

Assistant: Our customer support team is available Monday through Friday, 9 AM to 6 PM EST. For urgent inquiries outside these hours, you can submit a ticket through our website, and we'll respond within 24 hours.

## 📊 Performance Metrics

Compared to base Mistral-7B on customer support tasks:

| Metric | Improvement |
|---|---|
| Helpfulness | +45% |
| Professionalism | +60% |
| Specificity | +53% |
| Overall Quality | +52% |

*Evaluated with the RAGAS framework on 200 test queries.*
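
The exact RAGAS setup is not reproduced here; as a rough illustration of the idea behind these scores, the sketch below uses the local Ollama endpoint as an LLM judge to rate a base-model reply against a DPO reply on one criterion. The judge model name, prompt, replies, and criterion are illustrative, not the evaluation actually used.

```python
import requests

JUDGE_PROMPT = """Rate the following customer support reply for {criterion} on a scale of 1-10.
Answer with the number only.

Customer query: {query}
Support reply: {reply}"""

def judge(criterion, query, reply, judge_model="mistral"):
    """Score one reply on one criterion using a local judge model served by Ollama."""
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": judge_model,  # assumed judge model; use whichever model you have pulled
            "prompt": JUDGE_PROMPT.format(criterion=criterion, query=query, reply=reply),
            "stream": False,
        },
    )
    # Assumes the judge answers with a bare number, e.g. "8".
    return int(resp.json()["response"].strip().split()[0])

query = "What payment methods do you accept?"
base_reply = "Dear Customer, thank you for contacting us. We will get back to you shortly."
dpo_reply = "We accept PayPal, major credit cards, and bank transfers."
print("helpfulness:", judge("helpfulness", query, base_reply), "vs", judge("helpfulness", query, dpo_reply))
```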

## 🔧 Technical Details

  • Base Model: mistralai/Mistral-7B-v0.1
  • Training Method: DPO (Direct Preference Optimization); a configuration sketch is shown after this list
  • Dataset: 1,000 preference pairs (chosen vs. rejected responses)
  • Quantization: Q8_0 (8-bit, ~7.2GB)
  • LoRA Config: r=16, alpha=32, dropout=0.05
  • Training Framework: HuggingFace TRL + LLaMA Factory
  • Conversion: llama.cpp GGUF conversion and quantization tools
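
For reference, a minimal sketch of how such a DPO run could be set up with HuggingFace TRL and PEFT. Only the LoRA values (r=16, alpha=32, dropout=0.05) come from this card; the dataset path, beta, batch size, and other hyperparameters are placeholders, and argument names differ slightly across TRL releases (e.g. `processing_class` was previously `tokenizer`).

```python
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_name = "mistralai/Mistral-7B-v0.1"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Preference pairs with "prompt", "chosen", and "rejected" columns (path is illustrative).
dataset = load_dataset("json", data_files="preference_pairs.jsonl", split="train")

# LoRA settings matching the card: r=16, alpha=32, dropout=0.05.
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

training_args = DPOConfig(
    output_dir="dpo-customer-support",
    beta=0.1,                        # DPO preference strength (assumed value)
    per_device_train_batch_size=2,   # placeholder
    num_train_epochs=1,              # placeholder
)

trainer = DPOTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    processing_class=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```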

## 🎯 Use Cases

  • E-commerce: Product inquiries, order status, refunds
  • SaaS: Feature questions, troubleshooting, onboarding
  • Service Desk: Ticket routing, FAQ automation
  • Technical Support: Initial triage, common issues
  • Multi-lingual: Extensible to other languages via fine-tuning

## 📈 Training Pipeline

  1. Base Model: Mistral-7B-v0.1
  2. SFT Phase: Supervised fine-tuning on customer support dialogues
  3. DPO Phase: Preference optimization (1000 examples)
  4. Merge: LoRA adapters merged with base weights (see the sketch after this list)
  5. Quantization: GGUF Q8_0 for optimal quality/size balance
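
A minimal sketch of the merge step (step 4) using PEFT's `merge_and_unload`; the adapter and output directory names are illustrative:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model and apply the trained DPO LoRA adapter (adapter path is illustrative).
base = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")
merged = PeftModel.from_pretrained(base, "dpo-customer-support").merge_and_unload()

# Save a plain HF checkpoint that llama.cpp can then convert to GGUF and quantize to Q8_0.
merged.save_pretrained("customer-support-merged")
AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1").save_pretrained("customer-support-merged")
```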

πŸ—οΈ Model Architecture

  • Parameters: 7.24B
  • Quantization: 8-bit (Q8_0)
  • Context Length: 2048 tokens (configurable)
  • Vocab Size: 32,000
  • Architecture: Mistral (Grouped-Query Attention)

## 💻 System Requirements

  • Minimum RAM: 12GB
  • Recommended RAM: 16GB+
  • VRAM (GPU): 8GB+ (optional, runs on CPU)
  • Disk Space: 8GB

### Python with requests

```python
import requests

# Non-streaming request: the full reply is returned in a single JSON object.
response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "customer-support",
        "prompt": "How do I reset my password?",
        "stream": False
    }
)
print(response.json()["response"])
```
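
The same endpoint streams newline-delimited JSON chunks when `stream` is left on, which is handy for showing the reply as it is generated (a small sketch):

```python
import json
import requests

with requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "customer-support", "prompt": "How do I reset my password?", "stream": True},
    stream=True,
) as resp:
    for line in resp.iter_lines():
        if not line:
            continue
        chunk = json.loads(line)              # each line is one JSON object with a "response" fragment
        print(chunk.get("response", ""), end="", flush=True)
        if chunk.get("done"):                 # the final chunk carries "done": true
            break
print()
```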

### LangChain

```python
# `langchain.llms.Ollama` is deprecated; recent LangChain versions ship it in langchain-community.
from langchain_community.llms import Ollama

llm = Ollama(model="customer-support")
response = llm.invoke("What payment methods do you accept?")
print(response)
```

## 🔄 Continuous Learning (RL-VR)

This model supports Reinforcement Learning with Verifiable Rewards (RL-VR):

  1. Log all customer interactions to JSONL (see the logging sketch below)
  2. Weekly batch training with new preference pairs
  3. RAGAS evaluation for quality verification
  4. Incremental model updates
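
A minimal sketch of step 1; the log path and the optional rating field are illustrative, and the resulting JSONL can later be turned into new preference pairs:

```python
import json
import time

LOG_PATH = "interactions.jsonl"  # illustrative location

def log_interaction(prompt, response, rating=None):
    """Append one customer interaction (plus an optional human rating) as a JSON line."""
    record = {
        "timestamp": time.time(),
        "prompt": prompt,
        "response": response,
        "rating": rating,  # e.g. thumbs up/down collected in the support UI
    }
    with open(LOG_PATH, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")

log_interaction(
    "What payment methods do you accept?",
    "We accept PayPal, major credit cards, and bank transfers.",
    rating=1,
)
```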