# Cache Location Mismatch - The Real Problem
## What You Observed
### Build Logs (✅ Success)
```
--> RUN python -c "from sentence_transformers import SentenceTransformer; ..."
Downloading embedding model...
Model cached successfully
DONE 5.5s
```
### Runtime Logs (❌ Failure)
```
[INFO] Pre-downloading embedding model: sentence-transformers/all-MiniLM-L6-v2
No sentence-transformers model found with name sentence-transformers/all-MiniLM-L6-v2
[WARN] Failed to cache embedding model: Can't load the model...
```
## The Problem
The model **downloads successfully during build** but **can't be found at runtime**. This is a **cache location mismatch**.
### What Happens
1. **During Docker Build (as root user):**
- Model downloads to: `/root/.cache/huggingface/`
- Build succeeds ✅
2. **During Runtime (as a different user):**
- App looks for model in: `/root/.cache/huggingface/`
- Permission denied (different user can't access `/root/`)
- Falls back to downloading from Hugging Face
- Download fails (network/space constraints)
- RAG disabled ❌
### Why This Happens
Hugging Face Spaces runs containers with different users for build vs runtime:
- **Build time**: root user
- **Runtime**: non-root user (security)
Default cache locations:
- `sentence-transformers`: `~/.cache/torch/sentence_transformers/`
- `transformers`: `~/.cache/huggingface/`
- `~` resolves to a different home directory for each user, so build and runtime look in different places (as the snippet below demonstrates)
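A minimal illustration of the mismatch (the output paths in the comments assume the default home directories):
```python
import os

# "~" expands against the current user's home directory, so the same
# default cache path points somewhere different for each user
print(os.path.expanduser("~/.cache/huggingface"))
# as root (build):   /root/.cache/huggingface
# as runtime user:   /home/user/.cache/huggingface
```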
## The Solution
### Set Explicit Cache Directories
Use `/app/.cache/`, which is accessible to both the build and runtime users.
### Implementation
#### 1. Dockerfile Changes
```dockerfile
# Set cache environment variables (accessible location)
ENV TRANSFORMERS_CACHE=/app/.cache/huggingface \
    HF_HOME=/app/.cache/huggingface \
    SENTENCE_TRANSFORMERS_HOME=/app/.cache/sentence-transformers

# Create directories with proper permissions
RUN mkdir -p /app/.cache/huggingface /app/.cache/sentence-transformers \
    && chmod -R 777 /app/.cache

# Download model to /app/.cache (not /root/.cache)
RUN python -c "from sentence_transformers import SentenceTransformer; \
    SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')"
```
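To confirm the model actually landed under `/app/.cache` during the build, an optional sanity check could be added as one more `RUN python` step (an illustrative sketch, not part of the current Dockerfile):
```python
import os
import sys

# Fail the build early if the cache directory came out empty
cache = "/app/.cache/sentence-transformers"
if not os.path.isdir(cache) or not os.listdir(cache):
    sys.exit(f"expected cached model files under {cache}, found none")
print(f"cache OK: {os.listdir(cache)}")
```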
#### 2. orchestrator.py Changes
```python
import os

# Ensure runtime uses the same cache directories as the build
os.environ.setdefault('TRANSFORMERS_CACHE', '/app/.cache/huggingface')
os.environ.setdefault('HF_HOME', '/app/.cache/huggingface')
os.environ.setdefault('SENTENCE_TRANSFORMERS_HOME', '/app/.cache/sentence-transformers')
```
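One caveat: these `setdefault` calls only help if they run before `transformers`/`sentence_transformers` are imported, since the underlying libraries read these variables when first loaded. A minimal sketch of the safe ordering:
```python
import os

# Set cache paths BEFORE importing the libraries that read them
os.environ.setdefault('HF_HOME', '/app/.cache/huggingface')
os.environ.setdefault('SENTENCE_TRANSFORMERS_HOME', '/app/.cache/sentence-transformers')

from sentence_transformers import SentenceTransformer  # noqa: E402

# Loads from /app/.cache; no network access needed if the build cached it
model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')
```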
## How This Fixes It
### Before (❌ Broken)
```
Build: /root/.cache/huggingface/ ← Model downloaded here
Runtime: /home/user/.cache/huggingface/ ← Looking here (empty!)
Result: Model not found, download fails
```
### After (✅ Fixed)
```
Build: /app/.cache/huggingface/ ← Model downloaded here
Runtime: /app/.cache/huggingface/ ← Looking here (found!)
Result: Model loaded successfully
```
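To verify which of the two situations you are in, a quick debugging snippet (hypothetical, run inside the container) lists what the cache actually holds:
```python
import pathlib

# Print every model artifact under the shared cache; if nothing prints,
# the build step did not write to /app/.cache
for path in pathlib.Path("/app/.cache").rglob("*"):
    if path.suffix in {".safetensors", ".bin", ".json"}:
        print(path)
```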
## Expected Logs After Fix
### Build Logs
```
--> RUN python -c "from sentence_transformers import SentenceTransformer; ..."
Downloading embedding model...
Downloading (…)ce_transformers_config.json: 100%|██████████| 116/116
Downloading (…)_Pooling/config.json: 100%|██████████| 190/190
Downloading (…)b52ce780/config.json: 100%|██████████| 612/612
Downloading model.safetensors: 100%|██████████| 90.9M/90.9M
Model cached successfully
DONE 5.5s
```
### Runtime Logs
```
[INFO] Pre-downloading embedding model: sentence-transformers/all-MiniLM-L6-v2
[INFO] Embedding model cached successfully ← No download, uses cached model
[INFO] RAG pipeline initialized successfully
[INFO] Tax Optimizer initialized successfully
INFO: Application startup complete.
```
## Why Previous Attempt Failed
Your first fix downloaded the model during the build but didn't set the cache location (a reconstruction follows the list below):
- ✅ Model downloaded
- ❌ Downloaded to `/root/.cache/`
- ❌ Runtime couldn't access it
- ❌ Tried to re-download, failed
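For contrast, the earlier build step effectively amounted to the following (a reconstruction for illustration, not the exact original code). With no cache variables set, the download lands in the invoking user's home directory:
```python
# No cache env vars are set here, so the model is written to ~/.cache
# of the *build* user, i.e. /root/.cache during a root build
from sentence_transformers import SentenceTransformer

SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')
```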
## Deploy the Fix
```bash
# Stage changes
git add Dockerfile orchestrator.py CACHE_FIX_EXPLAINED.md QUICK_FIX.md
# Commit
git commit -m "Fix: Set explicit cache directories for embedding model
- Set TRANSFORMERS_CACHE, HF_HOME, SENTENCE_TRANSFORMERS_HOME to /app/.cache
- Create cache directories with proper permissions
- Ensure build and runtime use same cache location
- Fixes model not found error at runtime"
# Push to Hugging Face
git push
```
## Verification
After deployment, check logs for:
### ✅ Success Indicators
```
[INFO] Embedding model cached successfully
[INFO] RAG pipeline initialized successfully
[INFO] Tax Optimizer initialized successfully
```
### ❌ Failure Indicators (if still broken)
```
No sentence-transformers model found with name...
[WARN] Failed to cache embedding model...
[WARN] RAG not initialized...
```
## Alternative Solutions (If This Still Fails)
### Option 1: Use Hugging Face Hub API
Instead of local model, use Hugging Face Inference API:
```python
import os

from langchain_huggingface import HuggingFaceEndpointEmbeddings

embeddings = HuggingFaceEndpointEmbeddings(
    model="sentence-transformers/all-MiniLM-L6-v2",
    huggingfacehub_api_token=os.getenv("HF_TOKEN"),
)
```
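A brief usage sketch (assumes a valid `HF_TOKEN` is set; the query text is arbitrary):
```python
# Embeddings are computed remotely, so nothing is cached in the container
vector = embeddings.embed_query("How are capital gains taxed?")
print(len(vector))  # 384 dimensions for all-MiniLM-L6-v2
```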
### Option 2: Use Smaller Model
```python
EMBED_MODEL = "sentence-transformers/paraphrase-MiniLM-L3-v2"  # ~61MB vs ~90MB for all-MiniLM-L6-v2
```
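The smaller model is a drop-in replacement behind the same API; a minimal sketch:
```python
from sentence_transformers import SentenceTransformer

# Same loading code, smaller download; the output dimension is also 384
model = SentenceTransformer("sentence-transformers/paraphrase-MiniLM-L3-v2")
print(model.encode("test sentence").shape)  # (384,)
```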
### Option 3: Disable RAG
```bash
# In HF Space settings
DISABLE_RAG=true
```
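On the app side, the flag would be consumed roughly like this (an illustrative sketch; it assumes `orchestrator.py` checks the variable, and `init_rag` is a hypothetical initializer):
```python
import os

# Skip RAG setup entirely when the Space sets DISABLE_RAG=true
if os.getenv("DISABLE_RAG", "").lower() == "true":
    rag_pipeline = None  # app runs without retrieval
else:
    rag_pipeline = init_rag()  # hypothetical initializer
```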
## Technical Details
### Environment Variables Used
| Variable | Purpose | Value |
|----------|---------|-------|
| `TRANSFORMERS_CACHE` | Transformers library cache | `/app/.cache/huggingface` |
| `HF_HOME` | Hugging Face Hub cache | `/app/.cache/huggingface` |
| `SENTENCE_TRANSFORMERS_HOME` | Sentence Transformers cache | `/app/.cache/sentence-transformers` |
### Why `/app/.cache/`?
- `/app/` is the WORKDIR in Dockerfile
- Accessible to all users in container
- Persists across build and runtime
- Can set permissions explicitly
### Why `chmod -R 777`?
- Ensures all users can read/write
- Necessary for the non-root runtime user
- Acceptable in a single-tenant container (no other users share the filesystem)
- Alternative: use `chown` to assign the cache to a specific user
## Summary
- **Problem**: Model cached in a user-specific directory during build, inaccessible at runtime
- **Solution**: Use the shared `/app/.cache/` directory for both build and runtime
- **Result**: Model loads from the local cache at runtime; no re-download needed

This is a common issue in Docker deployments with multi-stage builds or user switching.