
Quick Fix for Hugging Face Deployment Error

The Error You're Seeing

[WARN] Failed to cache embedding model: Can't load the model for 'sentence-transformers/all-MiniLM-L6-v2'
[WARN] Embedding model not available. RAG disabled.
[INFO] Tax Optimizer disabled (requires RAG)

Root Cause

The model downloads successfully during build but fails to load at runtime because:

  • Build-time cache location ≠ runtime cache location
  • The model is cached in /root/.cache/ during the build
  • The runtime process looks in a different location (user/permission mismatch)
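
To confirm where the libraries will actually look, you can run a quick diagnostic inside the container (a sketch; huggingface_hub exposes its resolved cache path as a constant):

# Print the cache-related env vars and the path huggingface_hub resolves.
import os

for var in ("HF_HOME", "TRANSFORMERS_CACHE", "SENTENCE_TRANSFORMERS_HOME"):
    print(f"{var} = {os.environ.get(var, '<unset>')}")

from huggingface_hub.constants import HF_HUB_CACHE  # resolved at import time

print("Resolved hub cache:", HF_HUB_CACHE)

If the build and runtime environments print different paths, you have found the mismatch.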

The Fix (Already Applied ✅)

Set explicit cache directories so build-time and runtime use the same location.

What Changed

1. Dockerfile - Set cache environment variables:

ENV TRANSFORMERS_CACHE=/app/.cache/huggingface \
    HF_HOME=/app/.cache/huggingface \
    SENTENCE_TRANSFORMERS_HOME=/app/.cache/sentence-transformers

# Create cache directories with proper permissions
RUN mkdir -p /app/.cache/huggingface /app/.cache/sentence-transformers \
    && chmod -R 777 /app/.cache

# Pre-download model to /app/.cache
RUN python -c "from sentence_transformers import SentenceTransformer; \
    SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')"

2. orchestrator.py - Use the same cache at runtime:

# Set cache directories for model loading (must match Dockerfile)
os.environ.setdefault('TRANSFORMERS_CACHE', '/app/.cache/huggingface')
os.environ.setdefault('HF_HOME', '/app/.cache/huggingface')
os.environ.setdefault('SENTENCE_TRANSFORMERS_HOME', '/app/.cache/sentence-transformers')
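
One subtlety: huggingface_hub resolves its cache path when it is first imported, so these setdefault calls must run before any model-library import. A minimal ordering sketch:

# Set cache locations before sentence_transformers (and its
# huggingface_hub dependency) are imported and resolve their defaults.
import os

os.environ.setdefault('TRANSFORMERS_CACHE', '/app/.cache/huggingface')
os.environ.setdefault('HF_HOME', '/app/.cache/huggingface')
os.environ.setdefault('SENTENCE_TRANSFORMERS_HOME', '/app/.cache/sentence-transformers')

from sentence_transformers import SentenceTransformer  # imported after env setup

# Loads from /app/.cache instead of re-downloading, matching the Dockerfile.
model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')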

Deploy Now

Step 1: Push to Hugging Face

git add Dockerfile orchestrator.py
git commit -m "Fix: Set explicit cache directories for embedding model"
git push

Step 2: Set Environment Variables

In Hugging Face Space Settings → Variables and secrets:

  • GROQ_API_KEY = your_groq_api_key (Secret)
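
A fail-fast check at startup makes a missing secret obvious in the logs (a sketch; how orchestrator.py actually reads the key may differ):

# Abort early with a clear message if the secret was not configured.
import os

GROQ_API_KEY = os.environ.get("GROQ_API_KEY")
if not GROQ_API_KEY:
    raise RuntimeError(
        "GROQ_API_KEY is not set. Add it under "
        "Settings → Variables and secrets in your Space."
    )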

Step 3: Wait for Build

  • Build takes ~5-10 minutes
  • Watch for "Model cached successfully" in build logs

Step 4: Verify

curl https://YOUR-SPACE-URL.hf.space/health

Should return:

{"status": "ok", "rag_ready": true}

If It Still Fails

Option 1: Use Smaller Model

Edit orchestrator.py line 35:

EMBED_MODEL = "sentence-transformers/paraphrase-MiniLM-L3-v2"  # ~61MB, smaller than all-MiniLM-L6-v2

Option 2: Disable RAG

Set environment variable:

DISABLE_RAG=true
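
How such a flag is typically honored at startup (a hypothetical sketch; the actual guard in orchestrator.py may differ):

# Skip loading the embedding model entirely when RAG is disabled.
import os

DISABLE_RAG = os.environ.get("DISABLE_RAG", "").lower() in ("1", "true", "yes")

if DISABLE_RAG:
    rag_pipeline = None          # service runs without RAG features
else:
    rag_pipeline = init_rag()    # hypothetical initializer that loads the model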

Trade-off:

  • ✅ Service works immediately
  • ❌ No tax optimization
  • ❌ No Q&A features

Expected Logs After Fix

Build Logs (During Docker Build)

Downloading embedding model...
Model cached successfully

Runtime Logs (When Service Starts)

[INFO] Pre-downloading embedding model: sentence-transformers/all-MiniLM-L6-v2
[INFO] Embedding model cached successfully
[INFO] RAG pipeline initialized successfully
[INFO] Tax Optimizer initialized successfully
INFO:     Application startup complete.

Test Commands

Health Check

curl https://YOUR-SPACE-URL.hf.space/

Tax Optimization

curl -X POST https://YOUR-SPACE-URL.hf.space/v1/optimize \
  -H "Content-Type: application/json" \
  -d '{
    "user_id": "test",
    "transactions": [
      {
        "type": "credit",
        "amount": 500000,
        "narration": "SALARY",
        "date": "2025-01-31"
      }
    ],
    "tax_year": 2025
  }'
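
The same request from Python, using only the standard library (mirrors the curl above):

# POST the sample transaction to /v1/optimize and print the response.
import json
import urllib.request

payload = {
    "user_id": "test",
    "transactions": [
        {
            "type": "credit",
            "amount": 500000,
            "narration": "SALARY",
            "date": "2025-01-31",
        }
    ],
    "tax_year": 2025,
}

req = urllib.request.Request(
    "https://YOUR-SPACE-URL.hf.space/v1/optimize",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req, timeout=30) as resp:
    print(json.load(resp))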

Summary

✅ Fixed: Dockerfile now pre-downloads the model
🚀 Action: Push to Hugging Face and wait for the build
⏱️ Time: 5-10 minutes build time
✅ Result: RAG and Tax Optimizer will work


Need help? Check DEPLOY_TO_HF.md for detailed troubleshooting.