Kaanta / EMBEDDING_MODEL_FIX.md
Oluwaferanmi

Embedding Model Issue - Solutions

Problem

[WARN] RAG not initialized: Can't load the model for 'sentence-transformers/all-MiniLM-L6-v2'

This happens because:

  1. Hugging Face Spaces has limited disk space (~50GB)
  2. The embedding model must be downloaded at startup (~90MB for all-MiniLM-L6-v2)
  3. First-time downloads can fail due to network/space issues

Solutions

Option 1: Disable RAG (Recommended for Now)

Set environment variable in Hugging Face Space settings:

DISABLE_RAG=true

Result:

  • ✅ Service starts immediately
  • ✅ Tax calculations work perfectly
  • ❌ Tax optimization unavailable (requires RAG)
  • ❌ Tax Q&A unavailable (requires RAG)

Use this if: You only need tax calculations, not optimization recommendations.
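
As a minimal sketch of how such a flag might be read at startup: the helper name `rag_disabled` and the set of accepted spellings are assumptions for illustration, not taken from the service code.

```python
import os

# Hypothetical helper: the real service may parse DISABLE_RAG differently.
_TRUTHY = {"1", "true", "yes", "on"}


def rag_disabled(env=None):
    """Return True when DISABLE_RAG is set to a truthy value."""
    env = os.environ if env is None else env
    return env.get("DISABLE_RAG", "").strip().lower() in _TRUTHY
```

Accepting a few spellings and ignoring case makes the setting harder to get wrong in the Space settings UI.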

Option 2: Wait for Model Download

The download may succeed on a later restart. It can take 2-3 attempts.

Steps:

  1. Don't set DISABLE_RAG
  2. Restart the Space multiple times
  3. Check logs for: [INFO] Embedding model cached successfully

Use this if: You need full tax optimization features.
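
Instead of restarting the Space by hand, the retries could also happen inside the process. The sketch below is one possible shape, not the service's actual logic: `load_with_retries` and its parameters are hypothetical, and it assumes load failures surface as `OSError`.

```python
import time


def load_with_retries(loader, attempts=3, delay_seconds=5.0, sleep=time.sleep):
    """Call loader() until it succeeds or the attempts run out.

    loader is any zero-argument callable that returns the model,
    for example: lambda: SentenceTransformer(EMBED_MODEL)
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return loader()
        except OSError as exc:  # assumption: download/cache failures raise OSError
            last_error = exc
            print(f"[WARN] Embedding model load failed (attempt {attempt}/{attempts}): {exc}")
            if attempt < attempts:
                sleep(delay_seconds * attempt)  # wait a little longer each time
    raise RuntimeError("Embedding model could not be loaded") from last_error
```

The `sleep` parameter exists only so the delay can be replaced in tests.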

Option 3: Use Smaller Embedding Model

Change in orchestrator.py:

# Instead of:
EMBED_MODEL = "sentence-transformers/all-MiniLM-L6-v2"

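# Use (3 transformer layers instead of 6: smaller and faster, at some cost in accuracy).
# Note: all-MiniLM-L12-v2 has 12 layers and is larger than L6, so it does not help here.
# Rebuild any existing vector index after switching models.
EMBED_MODEL = "sentence-transformers/paraphrase-MiniLM-L3-v2"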

Option 4: Pre-build Docker Image with Model

Add to Dockerfile:

# After RUN pip install...
RUN python -c "from sentence_transformers import SentenceTransformer; SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')"

This downloads the model at build time, so it is already cached in the image when the Space starts.

Current Status

The code now:

  1. ✅ Tries to pre-download the model
  2. ✅ Provides clear error messages
  3. ✅ Continues without RAG if model fails
  4. ✅ Supports DISABLE_RAG environment variable
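
The behavior listed above could look roughly like the following. This is a sketch under assumptions, not the code in orchestrator.py: `init_rag`, `load_model`, and the `disabled` argument are hypothetical names.

```python
def init_rag(load_model, disabled=False):
    """Return (model, rag_ready) without ever raising on a load failure.

    load_model is a hypothetical zero-argument callable that returns
    the embedding model; disabled mirrors the DISABLE_RAG setting.
    """
    if disabled:
        print("[INFO] RAG disabled via DISABLE_RAG")
        return None, False
    try:
        model = load_model()
    except Exception as exc:
        # Log a clear message and keep the service running without RAG.
        print(f"[WARN] RAG not initialized: {exc}")
        return None, False
    return model, True
```

The returned `rag_ready` value is the kind of flag a health endpoint could report.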

What Works Without RAG

| Feature | Status |
|---|---|
| Tax Calculations (PIT, CIT, VAT) | ✅ Works |
| Tax Rules Engine | ✅ Works |
| /v1/query endpoint (calculations) | ✅ Works |
| /v1/query endpoint (Q&A) | ❌ Requires RAG |
| /v1/optimize endpoint | ❌ Requires RAG |
| Transaction classification | ⚠️ Basic patterns only |

Recommendation for Production

For auth-backend integration:

Since you mainly need transaction classification and tax calculations (not Q&A), you have two options:

Option A: Disable RAG, Use Basic Classification

# In HF Space settings
DISABLE_RAG=true

Transaction classification will use pattern matching (salary, pension, rent, etc.) without an LLM.
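
To make "pattern matching" concrete, here is a small illustrative classifier. The category names come from the examples above, but the regular expressions, the `classify` function, and the `"unclassified"` fallback are assumptions; the service's actual rules may differ.

```python
import re

# Hypothetical keyword table, checked in order; first match wins.
PATTERNS = [
    ("salary", re.compile(r"\b(salary|payroll|wages?)\b", re.IGNORECASE)),
    ("pension", re.compile(r"\bpension\b", re.IGNORECASE)),
    ("rent", re.compile(r"\b(rent|lease)\b", re.IGNORECASE)),
]


def classify(description):
    """Return the first matching category, or "unclassified"."""
    for category, pattern in PATTERNS:
        if pattern.search(description):
            return category
    return "unclassified"
```

Word boundaries (`\b`) keep "rent" from matching inside unrelated words such as "parent".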

Option B: Wait for Model Download

Restart the Space until the model downloads successfully. This gives you full optimization features.

Testing After Fix

Test 1: Health Check

curl https://your-space.hf.space/health

Expected with RAG disabled:

{
  "status": "ok",
  "rag_ready": false
}

Test 2: Tax Calculation (Should Work)

curl -X POST https://your-space.hf.space/v1/query \
  -H "Content-Type: application/json" \
  -d '{
    "question": "Calculate PIT for 5000000 annual income",
    "tax_type": "PIT"
  }'

Test 3: Tax Optimization (Requires RAG)

curl -X POST https://your-space.hf.space/v1/optimize \
  -H "Content-Type: application/json" \
  -d '{
    "user_id": "test",
    "transactions": [...],
    "tax_year": 2025
  }'

Returns 503 if RAG is disabled.
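
On the server side, a 503 like this usually comes from a guard at the top of the handler. The sketch below is framework-agnostic and hypothetical: `optimize_handler`, the `rag_ready` argument, and both response bodies are invented for illustration and do not describe the service's real responses.

```python
import json


def optimize_handler(payload, rag_ready):
    """Return (status_code, body) for a hypothetical /v1/optimize handler."""
    if not rag_ready:
        # 503 Service Unavailable: the feature exists but cannot run right now.
        return 503, json.dumps({"detail": "RAG is not available"})
    # Placeholder for the real optimization logic.
    return 200, json.dumps({"user_id": payload.get("user_id"), "recommendations": []})
```

Checking readiness first keeps the failure explicit, so clients can tell "RAG is off" apart from a malformed request.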

Next Steps

  1. For immediate use: Set DISABLE_RAG=true
  2. For full features: Wait for model download or use Option 4 (pre-build)
  3. For production: Consider upgrading to Hugging Face Spaces Pro for more resources