# Quick Fix for Hugging Face Deployment Error

## The Error You're Seeing

```
[WARN] Failed to cache embedding model: Can't load the model for 'sentence-transformers/all-MiniLM-L6-v2'
[WARN] Embedding model not available. RAG disabled.
[INFO] Tax Optimizer disabled (requires RAG)
```
## Root Cause

The model downloads successfully during the build but fails to load at runtime because:

- The build-time cache location ≠ the runtime cache location.
- The model is cached under `/root/.cache/` during the build, but the runtime looks in a different location (a permission/user mismatch).
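The mismatch can be made concrete with a small sketch (the fallback path here illustrates the Hugging Face default, it is not pulled from this project's code):

```python
import os

def effective_cache(var: str, fallback: str) -> str:
    """Cache directory a library ends up using: the env var if set, else a
    home-relative fallback that depends on who is running the process."""
    return os.environ.get(var) or os.path.expanduser(fallback)

# During `docker build`, commands run as root, so the fallback expands under
# /root/.cache; at runtime a different user (and $HOME) can make the very same
# fallback expand somewhere else entirely, so the downloaded files are "lost".
print(effective_cache("HF_HOME", "~/.cache/huggingface"))
```

Setting the env var pins both phases to one directory, which is exactly what the fix below does.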
## The Fix (Already Applied ✅)

Set explicit cache directories so build time and runtime use the same location.

### What Changed

**1. `Dockerfile`** - Set the cache environment variables:

```dockerfile
ENV TRANSFORMERS_CACHE=/app/.cache/huggingface \
    HF_HOME=/app/.cache/huggingface \
    SENTENCE_TRANSFORMERS_HOME=/app/.cache/sentence-transformers

# Create cache directories with permissions any runtime user can use
RUN mkdir -p /app/.cache/huggingface /app/.cache/sentence-transformers \
    && chmod -R 777 /app/.cache

# Pre-download the model into /app/.cache
RUN python -c "from sentence_transformers import SentenceTransformer; \
    SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')"
```
**2. `orchestrator.py`** - Use the same cache at runtime:

```python
# Set cache directories for model loading (must match the Dockerfile,
# and must run before sentence_transformers is imported)
os.environ.setdefault('TRANSFORMERS_CACHE', '/app/.cache/huggingface')
os.environ.setdefault('HF_HOME', '/app/.cache/huggingface')
os.environ.setdefault('SENTENCE_TRANSFORMERS_HOME', '/app/.cache/sentence-transformers')
```
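The warnings in the logs suggest the loader degrades gracefully when the model is missing; a minimal sketch of that pattern (the `load_embedder` helper is hypothetical, not this project's actual function):

```python
import os

# Must be set before sentence_transformers / huggingface_hub are imported:
# the libraries resolve their cache paths when they are first loaded.
os.environ.setdefault('TRANSFORMERS_CACHE', '/app/.cache/huggingface')
os.environ.setdefault('HF_HOME', '/app/.cache/huggingface')
os.environ.setdefault('SENTENCE_TRANSFORMERS_HOME', '/app/.cache/sentence-transformers')

def load_embedder(model_name: str = "sentence-transformers/all-MiniLM-L6-v2"):
    """Return a SentenceTransformer, or None (RAG disabled) if loading fails."""
    try:
        from sentence_transformers import SentenceTransformer
        return SentenceTransformer(model_name)
    except Exception as exc:  # missing cache, no network, bad permissions, ...
        print(f"[WARN] Failed to cache embedding model: {exc}")
        print("[WARN] Embedding model not available. RAG disabled.")
        return None
```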
## Deploy Now

### Step 1: Push to Hugging Face

```bash
git add Dockerfile orchestrator.py
git commit -m "Fix: Set explicit cache directories for embedding model"
git push
```

### Step 2: Set Environment Variables

In your Hugging Face Space under Settings → Variables and secrets:

```
GROQ_API_KEY=your_groq_api_key   # Secret
```
### Step 3: Wait for the Build

- The build takes ~5-10 minutes.
- Watch for "Model cached successfully" in the build logs.

### Step 4: Verify

```bash
curl https://YOUR-SPACE-URL.hf.space/health
```

It should return:

```json
{"status": "ok", "rag_ready": true}
```
## If It Still Fails

### Option 1: Use a Smaller Model

Edit `orchestrator.py` line 35:

```python
EMBED_MODEL = "sentence-transformers/paraphrase-MiniLM-L3-v2"  # ~61 MB, smaller and faster than all-MiniLM-L6-v2
```
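To avoid editing code for each experiment, the constant could also be read from an environment variable — a hypothetical tweak, since `orchestrator.py` hard-codes the name:

```python
import os

# Hypothetical: allow overriding the model from the Space settings without a
# code change; falls back to the smaller model suggested above.
EMBED_MODEL = os.environ.get(
    "EMBED_MODEL", "sentence-transformers/paraphrase-MiniLM-L3-v2"
)
```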
### Option 2: Disable RAG

Set the environment variable:

```
DISABLE_RAG=true
```

Trade-offs:

- ✅ The service works immediately
- ❌ No tax optimization
- ❌ No Q&A features
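A flag like this is usually read once at startup; a minimal sketch of how the guard might look (the helper name is illustrative, not from this project's code):

```python
import os

def rag_enabled() -> bool:
    """RAG stays on unless DISABLE_RAG is explicitly truthy (any casing)."""
    return os.environ.get("DISABLE_RAG", "").strip().lower() not in {"true", "1", "yes"}
```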
## Expected Logs After the Fix

### Build Logs (During the Docker Build)

```
Downloading embedding model...
Model cached successfully
```

### Runtime Logs (When the Service Starts)

```
[INFO] Pre-downloading embedding model: sentence-transformers/all-MiniLM-L6-v2
[INFO] Embedding model cached successfully
[INFO] RAG pipeline initialized successfully
[INFO] Tax Optimizer initialized successfully
INFO: Application startup complete.
```
## Test Commands

### Health Check

```bash
curl https://YOUR-SPACE-URL.hf.space/
```

### Tax Optimization

```bash
curl -X POST https://YOUR-SPACE-URL.hf.space/v1/optimize \
  -H "Content-Type: application/json" \
  -d '{
    "user_id": "test",
    "transactions": [
      {
        "type": "credit",
        "amount": 500000,
        "narration": "SALARY",
        "date": "2025-01-31"
      }
    ],
    "tax_year": 2025
  }'
```
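The same request can be sent from Python with only the stdlib (the payload mirrors the curl body exactly; the URL is a placeholder):

```python
import json
import urllib.request

payload = {
    "user_id": "test",
    "transactions": [
        {"type": "credit", "amount": 500000, "narration": "SALARY", "date": "2025-01-31"}
    ],
    "tax_year": 2025,
}

req = urllib.request.Request(
    "https://YOUR-SPACE-URL.hf.space/v1/optimize",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

# Uncomment once the Space is live:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```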
## Summary

- ✅ **Fixed:** The Dockerfile now pre-downloads the model into a shared cache
- 🚀 **Action:** Push to Hugging Face and wait for the build
- ⏱️ **Time:** ~5-10 minutes of build time
- ✅ **Result:** RAG and the Tax Optimizer will work

Need help? See `DEPLOY_TO_HF.md` for detailed troubleshooting.