# Deploy to Hugging Face Spaces - Complete Guide

## Problem Fixed ✅

**Issue:** Embedding model download fails at runtime in Hugging Face Spaces

```
[WARN] Failed to cache embedding model: Can't load the model for 'sentence-transformers/all-MiniLM-L6-v2'
```

**Solution:** Pre-download the model during the Docker build (not at runtime)

## What Changed

### Updated `Dockerfile`

Added a model pre-download step:

```dockerfile
# Pre-download embedding model to avoid runtime errors in HF Spaces
RUN python -c "from sentence_transformers import SentenceTransformer; \
    print('Downloading embedding model...'); \
    SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2'); \
    print('Model cached successfully')"
```

This ensures the model is **cached in the Docker image** before deployment.

## Deployment Steps

### 1. Commit and Push Changes

```bash
cd AIML111

# Stage the updated Dockerfile
git add Dockerfile

# Commit
git commit -m "Fix: Pre-download embedding model in Docker build for HF Spaces"

# Push to Hugging Face
git push
```

### 2. Set Environment Variables in Hugging Face Space

Go to your Space settings and add:

| Variable | Value | Type |
|----------|-------|------|
| `GROQ_API_KEY` | `your_groq_api_key` | Secret |
| `VECTOR_STORE_DIR` | `/tmp/vector_store` | Variable |

**Optional:** If you want to disable RAG entirely:

| Variable | Value | Type |
|----------|-------|------|
| `DISABLE_RAG` | `true` | Variable |

### 3. Wait for Build

The Space will rebuild automatically. This takes **5-10 minutes** because:

- The Docker image is being built
- The embedding model is being downloaded (~90 MB)
- Dependencies are being installed

### 4. Verify Deployment

#### Check Build Logs

Look for these messages in the build logs:

```
Downloading embedding model...
Model cached successfully
```

#### Check Runtime Logs

After deployment, you should see:

```
[INFO] Pre-downloading embedding model: sentence-transformers/all-MiniLM-L6-v2
[INFO] Embedding model cached successfully
[INFO] RAG pipeline initialized successfully
[INFO] Tax Optimizer initialized successfully
INFO: Application startup complete.
```

#### Test Health Endpoint

```bash
curl https://YOUR-USERNAME-aiml111.hf.space/health
```

Expected response:

```json
{
  "status": "ok",
  "rag_ready": true
}
```

#### Test Root Endpoint

```bash
curl https://YOUR-USERNAME-aiml111.hf.space/
```

Expected response:

```json
{
  "service": "Kaanta Tax Assistant",
  "version": "0.2.0",
  "rag_ready": true,
  "calculator_ready": true,
  "optimizer_ready": true,
  "docs_url": "/docs"
}
```
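#### Script the Health Check (Optional)

Cold starts can make the first request fail, so a short retry loop helps when verifying a fresh deployment. Below is a minimal polling sketch using `requests`; the `SPACE_URL` placeholder and the `wait_for_rag` helper are illustrative, and the response shape follows the `/health` example above.

```python
import time

import requests

SPACE_URL = "https://YOUR-USERNAME-aiml111.hf.space"  # placeholder: your Space URL


def wait_for_rag(timeout_s: int = 300, interval_s: int = 10) -> bool:
    """Poll /health until rag_ready is true or the timeout expires."""
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        try:
            resp = requests.get(f"{SPACE_URL}/health", timeout=30)
            body = resp.json()
            if resp.ok and body.get("rag_ready"):
                print("RAG is ready:", body)
                return True
            print("Not ready yet:", body)
        except requests.RequestException as exc:
            # The Space may still be building or cold-starting
            print(f"Health check failed, retrying: {exc}")
        time.sleep(interval_s)
    return False


if __name__ == "__main__":
    wait_for_rag()
```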
### 5. Test Optimization Endpoint

```bash
curl -X POST https://YOUR-USERNAME-aiml111.hf.space/v1/optimize \
  -H "Content-Type: application/json" \
  -d '{
    "user_id": "test_user",
    "transactions": [
      {
        "type": "credit",
        "amount": 500000,
        "narration": "SALARY PAYMENT FROM ABC LTD",
        "date": "2025-01-31",
        "balance": 750000
      },
      {
        "type": "debit",
        "amount": 40000,
        "narration": "PENSION CONTRIBUTION TO XYZ PFA",
        "date": "2025-01-31",
        "balance": 710000
      }
    ],
    "tax_year": 2025
  }'
```

## Troubleshooting

### Build Still Fails

**Check 1: Build Timeout**
- The Hugging Face free tier has build time limits
- Consider upgrading to Pro for longer builds

**Check 2: Disk Space**
- The free tier has ~50GB of disk space
- The model plus dependencies need ~2GB
- This should fit comfortably on the free tier

**Check 3: Network Issues**
- Hugging Face may have temporary network issues
- Try rebuilding (Settings → Factory Reboot)

### Runtime Errors Persist

**Option A: Use a Smaller Model**

Edit `orchestrator.py` line 35:

```python
# Change from:
EMBED_MODEL = "sentence-transformers/all-MiniLM-L6-v2"

# To:
EMBED_MODEL = "sentence-transformers/paraphrase-MiniLM-L3-v2"  # Smaller (61MB)
```

**Option B: Disable RAG Temporarily**

Set the environment variable:

```bash
DISABLE_RAG=true
```

With RAG disabled:

- ✅ Tax calculations work
- ❌ Tax optimization unavailable
- ❌ Q&A unavailable

### Logs Show "Model cached successfully" but RAG Still Fails

Check whether PDF files are present:

```bash
# In Space logs, look for:
[WARN] No PDFs found under data. RAG disabled.
```

**Solution:** Ensure the `data/` folder contains PDF files and is committed to git.

## Performance Optimization

### Cold Start

- First request after deployment: ~10-30 seconds
- Subsequent requests: ~1-3 seconds

### Persistent Storage

For production, consider:

- Hugging Face Spaces Pro (persistent storage)
- An external vector database (Pinecone, Weaviate)
- A pre-built vector store in the Docker image

## Integration with Backend

Once deployed, update your backend `.env`:

```bash
# Replace with your actual Space URL
KAANTA_AI_BASE_URL=https://YOUR-USERNAME-aiml111.hf.space
```

Test from the backend:

```bash
curl -X POST http://localhost:5000/api/tax/optimize \
  -H "Authorization: Bearer YOUR_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"tax_year": 2025}'
```

## Expected Results After Fix

### ✅ What Should Work

- [x] Service starts without errors
- [x] Embedding model loads successfully
- [x] RAG pipeline initializes
- [x] Tax Optimizer available
- [x] `/v1/optimize` endpoint works
- [x] `/v1/query` endpoint works
- [x] Transaction classification with LLM
- [x] Tax strategy extraction from PDFs

### ❌ What Won't Work (If RAG Disabled)

- [ ] Tax optimization recommendations
- [ ] Q&A from tax documents
- [ ] LLM-based transaction classification

## Next Steps

1. **Push the updated Dockerfile** to Hugging Face
2. **Wait for the build to complete** (~5-10 minutes)
3. **Check logs** for "Model cached successfully"
4. **Test endpoints** using the curl commands above
5. **Update backend** with the Space URL
6. **Test end-to-end** from your frontend

## Support

If issues persist after following this guide:

1. **Check Space Logs**: Settings → Logs
2. **Factory Reboot**: Settings → Factory Reboot
3. **Verify Environment Variables**: Settings → Variables and secrets
4. **Test Locally First**: Run with the same Dockerfile locally (see the sketch after this list)
5. **Check PDF Files**: Ensure the `data/` folder has PDFs
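### Local Smoke Test

For step 4 above, the same smoke test can run against a local container or the deployed Space. The following is a minimal sketch with `requests`; the `BASE_URL` value, the port, and the `smoke_test` helper are illustrative (the payload mirrors the `/v1/optimize` example earlier in this guide).

```python
import requests

# Local container URL; adjust the port to your Dockerfile's exposed port,
# or swap in your Space URL after deployment
BASE_URL = "http://localhost:7860"


def smoke_test() -> None:
    """Hit /health, then /v1/optimize with the sample payload from this guide."""
    health = requests.get(f"{BASE_URL}/health", timeout=30)
    print("health:", health.json())

    payload = {
        "user_id": "test_user",
        "transactions": [
            {
                "type": "credit",
                "amount": 500000,
                "narration": "SALARY PAYMENT FROM ABC LTD",
                "date": "2025-01-31",
                "balance": 750000,
            },
        ],
        "tax_year": 2025,
    }
    resp = requests.post(f"{BASE_URL}/v1/optimize", json=payload, timeout=120)
    resp.raise_for_status()
    print("optimize:", resp.json())


if __name__ == "__main__":
    smoke_test()
```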
## Alternative: Use Hugging Face Inference API

If the Docker build continues to fail, consider using the Hugging Face Inference API for embeddings instead of a locally stored model. Note that LangChain's `HuggingFaceEndpoint` is a text-generation wrapper; for embeddings over the Inference API, `HuggingFaceEndpointEmbeddings` is the matching class:

```python
# In orchestrator.py
import os

from langchain_huggingface import HuggingFaceEndpointEmbeddings

# Instead of a local model, call the Inference API
embedding_model = HuggingFaceEndpointEmbeddings(
    model="sentence-transformers/all-MiniLM-L6-v2",
    huggingfacehub_api_token=os.getenv("HF_TOKEN"),
)
```

This requires the `HF_TOKEN` environment variable but avoids local model storage.
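If you want a single code path that prefers the locally cached model and falls back to the Inference API, a hedged sketch follows. It assumes the `langchain-huggingface` package is installed; the helper name `build_embeddings` is illustrative, not part of the existing codebase.

```python
import os

from langchain_huggingface import HuggingFaceEmbeddings, HuggingFaceEndpointEmbeddings

EMBED_MODEL = "sentence-transformers/all-MiniLM-L6-v2"


def build_embeddings():
    """Prefer the locally cached model; fall back to the Inference API."""
    try:
        # Works when the model was pre-downloaded during the Docker build
        return HuggingFaceEmbeddings(model_name=EMBED_MODEL)
    except Exception as exc:  # e.g. model download blocked at runtime
        print(f"[WARN] Local embedding model unavailable ({exc}); using Inference API")
        return HuggingFaceEndpointEmbeddings(
            model=EMBED_MODEL,
            huggingfacehub_api_token=os.getenv("HF_TOKEN"),
        )
```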