# Deploy to Hugging Face Spaces - Complete Guide

## Problem Fixed ✅

**Issue:** Embedding model download fails at runtime in Hugging Face Spaces

```
[WARN] Failed to cache embedding model: Can't load the model for 'sentence-transformers/all-MiniLM-L6-v2'
```

**Solution:** Pre-download the model during the Docker build (not at runtime)

## What Changed

### Updated `Dockerfile`

Added a model pre-download step:

```dockerfile
# Pre-download embedding model to avoid runtime errors in HF Spaces
RUN python -c "from sentence_transformers import SentenceTransformer; \
    print('Downloading embedding model...'); \
    SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2'); \
    print('Model cached successfully')"
```

This ensures the model is **cached in the Docker image** before deployment.

## Deployment Steps

### 1. Commit and Push Changes

```bash
cd AIML111

# Stage the updated Dockerfile
git add Dockerfile

# Commit
git commit -m "Fix: Pre-download embedding model in Docker build for HF Spaces"

# Push to Hugging Face
git push
```

### 2. Set Environment Variables in Hugging Face Space

Go to your Space settings and add:

| Variable | Value | Type |
|----------|-------|------|
| `GROQ_API_KEY` | `your_groq_api_key` | Secret |
| `VECTOR_STORE_DIR` | `/tmp/vector_store` | Variable |

**Optional:** If you want to disable RAG entirely:

| Variable | Value | Type |
|----------|-------|------|
| `DISABLE_RAG` | `true` | Variable |

### 3. Wait for Build

The Space will rebuild automatically. This takes **5-10 minutes** because:

- The Docker image is being built
- The embedding model is being downloaded (~90 MB)
- Dependencies are being installed

### 4. Verify Deployment

#### Check Build Logs

Look for these messages in the build logs:

```
Downloading embedding model...
Model cached successfully
```

#### Check Runtime Logs

After deployment, you should see:

```
[INFO] Pre-downloading embedding model: sentence-transformers/all-MiniLM-L6-v2
[INFO] Embedding model cached successfully
[INFO] RAG pipeline initialized successfully
[INFO] Tax Optimizer initialized successfully
INFO: Application startup complete.
```

#### Test Health Endpoint

```bash
curl https://YOUR-USERNAME-aiml111.hf.space/health
```

Expected response:

```json
{
  "status": "ok",
  "rag_ready": true
}
```

#### Test Root Endpoint

```bash
curl https://YOUR-USERNAME-aiml111.hf.space/
```

Expected response:

```json
{
  "service": "Kaanta Tax Assistant",
  "version": "0.2.0",
  "rag_ready": true,
  "calculator_ready": true,
  "optimizer_ready": true,
  "docs_url": "/docs"
}
```
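#### Script the Health Check (Optional)

Cold starts can make the first request fail, so a short retry loop helps when verifying a fresh deployment. Below is a minimal polling sketch using `requests`; the `SPACE_URL` placeholder and the `wait_for_rag` helper are illustrative, and the response shape follows the `/health` example above.

```python
import time

import requests

SPACE_URL = "https://YOUR-USERNAME-aiml111.hf.space"  # placeholder: your Space URL


def wait_for_rag(timeout_s: int = 300, interval_s: int = 10) -> bool:
    """Poll /health until rag_ready is true or the timeout expires."""
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        try:
            resp = requests.get(f"{SPACE_URL}/health", timeout=30)
            body = resp.json()
            if resp.ok and body.get("rag_ready"):
                print("RAG is ready:", body)
                return True
            print("Not ready yet:", body)
        except requests.RequestException as exc:
            # The Space may still be building or cold-starting
            print(f"Health check failed, retrying: {exc}")
        time.sleep(interval_s)
    return False


if __name__ == "__main__":
    wait_for_rag()
```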
### 5. Test Optimization Endpoint

```bash
curl -X POST https://YOUR-USERNAME-aiml111.hf.space/v1/optimize \
  -H "Content-Type: application/json" \
  -d '{
    "user_id": "test_user",
    "transactions": [
      {
        "type": "credit",
        "amount": 500000,
        "narration": "SALARY PAYMENT FROM ABC LTD",
        "date": "2025-01-31",
        "balance": 750000
      },
      {
        "type": "debit",
        "amount": 40000,
        "narration": "PENSION CONTRIBUTION TO XYZ PFA",
        "date": "2025-01-31",
        "balance": 710000
      }
    ],
    "tax_year": 2025
  }'
```

## Troubleshooting

### Build Still Fails

**Check 1: Build Timeout**
- The Hugging Face free tier has build time limits
- Consider upgrading to Pro for longer builds

**Check 2: Disk Space**
- The free tier has ~50GB of disk space
- The model plus dependencies need ~2GB
- This should fit comfortably on the free tier

**Check 3: Network Issues**
- Hugging Face may have temporary network issues
- Try rebuilding (Settings → Factory Reboot)

### Runtime Errors Persist

**Option A: Use a Smaller Model**

Edit `orchestrator.py` line 35:

```python
# Change from:
EMBED_MODEL = "sentence-transformers/all-MiniLM-L6-v2"

# To:
EMBED_MODEL = "sentence-transformers/paraphrase-MiniLM-L3-v2"  # Smaller (61MB)
```

**Option B: Disable RAG Temporarily**

Set the environment variable:

```bash
DISABLE_RAG=true
```

With RAG disabled:

- ✅ Tax calculations work
- ❌ Tax optimization unavailable
- ❌ Q&A unavailable

### Logs Show "Model cached successfully" but RAG Still Fails

Check whether PDF files are present:

```bash
# In Space logs, look for:
[WARN] No PDFs found under data. RAG disabled.
```

**Solution:** Ensure the `data/` folder contains PDF files and is committed to git.

## Performance Optimization

### Cold Start

- First request after deployment: ~10-30 seconds
- Subsequent requests: ~1-3 seconds

### Persistent Storage

For production, consider:

- Hugging Face Spaces Pro (persistent storage)
- An external vector database (Pinecone, Weaviate)
- A pre-built vector store in the Docker image

## Integration with Backend

Once deployed, update your backend `.env`:

```bash
# Replace with your actual Space URL
KAANTA_AI_BASE_URL=https://YOUR-USERNAME-aiml111.hf.space
```

Test from the backend:

```bash
curl -X POST http://localhost:5000/api/tax/optimize \
  -H "Authorization: Bearer YOUR_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"tax_year": 2025}'
```

## Expected Results After Fix

### ✅ What Should Work

- [x] Service starts without errors
- [x] Embedding model loads successfully
- [x] RAG pipeline initializes
- [x] Tax Optimizer available
- [x] `/v1/optimize` endpoint works
- [x] `/v1/query` endpoint works
- [x] Transaction classification with LLM
- [x] Tax strategy extraction from PDFs

### ❌ What Won't Work (If RAG Disabled)

- [ ] Tax optimization recommendations
- [ ] Q&A from tax documents
- [ ] LLM-based transaction classification

## Next Steps

1. **Push the updated Dockerfile** to Hugging Face
2. **Wait for the build to complete** (~5-10 minutes)
3. **Check logs** for "Model cached successfully"
4. **Test endpoints** using the curl commands above
5. **Update backend** with the Space URL
6. **Test end-to-end** from your frontend

## Support

If issues persist after following this guide:

1. **Check Space Logs**: Settings → Logs
2. **Factory Reboot**: Settings → Factory Reboot
3. **Verify Environment Variables**: Settings → Variables and secrets
4. **Test Locally First**: Run with the same Dockerfile locally (see the sketch after this list)
5. **Check PDF Files**: Ensure the `data/` folder has PDFs
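### Local Smoke Test

For step 4 above, the same smoke test can run against a local container or the deployed Space. The following is a minimal sketch with `requests`; the `BASE_URL` value, the port, and the `smoke_test` helper are illustrative (the payload mirrors the `/v1/optimize` example earlier in this guide).

```python
import requests

# Local container URL; adjust the port to your Dockerfile's exposed port,
# or swap in your Space URL after deployment
BASE_URL = "http://localhost:7860"


def smoke_test() -> None:
    """Hit /health, then /v1/optimize with the sample payload from this guide."""
    health = requests.get(f"{BASE_URL}/health", timeout=30)
    print("health:", health.json())

    payload = {
        "user_id": "test_user",
        "transactions": [
            {
                "type": "credit",
                "amount": 500000,
                "narration": "SALARY PAYMENT FROM ABC LTD",
                "date": "2025-01-31",
                "balance": 750000,
            },
        ],
        "tax_year": 2025,
    }
    resp = requests.post(f"{BASE_URL}/v1/optimize", json=payload, timeout=120)
    resp.raise_for_status()
    print("optimize:", resp.json())


if __name__ == "__main__":
    smoke_test()
```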
## Alternative: Use Hugging Face Inference API

If the Docker build continues to fail, consider using the Hugging Face Inference API for embeddings instead of a locally stored model. Note that LangChain's `HuggingFaceEndpoint` is a text-generation wrapper; for embeddings over the Inference API, `HuggingFaceEndpointEmbeddings` is the matching class:

```python
# In orchestrator.py
import os

from langchain_huggingface import HuggingFaceEndpointEmbeddings

# Instead of a local model, call the Inference API
embedding_model = HuggingFaceEndpointEmbeddings(
    model="sentence-transformers/all-MiniLM-L6-v2",
    huggingfacehub_api_token=os.getenv("HF_TOKEN"),
)
```

This requires the `HF_TOKEN` environment variable but avoids local model storage.
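If you want a single code path that prefers the locally cached model and falls back to the Inference API, a hedged sketch follows. It assumes the `langchain-huggingface` package is installed; the helper name `build_embeddings` is illustrative, not part of the existing codebase.

```python
import os

from langchain_huggingface import HuggingFaceEmbeddings, HuggingFaceEndpointEmbeddings

EMBED_MODEL = "sentence-transformers/all-MiniLM-L6-v2"


def build_embeddings():
    """Prefer the locally cached model; fall back to the Inference API."""
    try:
        # Works when the model was pre-downloaded during the Docker build
        return HuggingFaceEmbeddings(model_name=EMBED_MODEL)
    except Exception as exc:  # e.g. model download blocked at runtime
        print(f"[WARN] Local embedding model unavailable ({exc}); using Inference API")
        return HuggingFaceEndpointEmbeddings(
            model=EMBED_MODEL,
            huggingfacehub_api_token=os.getenv("HF_TOKEN"),
        )
```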