# Cache Location Mismatch - The Real Problem
## What You Observed
### Build Logs (✅ Success)
```
--> RUN python -c "from sentence_transformers import SentenceTransformer; ..."
Downloading embedding model...
Model cached successfully
DONE 5.5s
```
### Runtime Logs (❌ Failure)
```
[INFO] Pre-downloading embedding model: sentence-transformers/all-MiniLM-L6-v2
No sentence-transformers model found with name sentence-transformers/all-MiniLM-L6-v2
[WARN] Failed to cache embedding model: Can't load the model...
```
## The Problem
The model **downloads successfully during build** but **can't be found at runtime**. This is a **cache location mismatch**.
### What Happens
1. **During Docker build (as the root user):**
   - Model downloads to: `/root/.cache/huggingface/`
   - Build succeeds ✅
2. **During runtime (as a different user):**
   - App looks for the model in: `/root/.cache/huggingface/`
   - Permission denied (a non-root user can't access `/root/`)
   - Falls back to downloading from Hugging Face
   - Download fails (network/space constraints)
   - RAG disabled ❌
### Why This Happens
Hugging Face Spaces runs containers with different users for build vs runtime:
- **Build time**: root user
- **Runtime**: non-root user (for security)
Default cache locations:
- `sentence-transformers`: `~/.cache/torch/sentence_transformers/`
- `transformers`: `~/.cache/huggingface/`
- `~` resolves to a different path for each user
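The user-dependence of `~` is the crux, and can be sketched in a few lines. Here `default_hf_cache` is a hypothetical helper mirroring how these libraries derive their default cache path from the home directory (not actual library code):

```python
import os

# The same "~/.cache/huggingface" template resolves to different absolute
# paths depending on whose home directory is in effect.
def default_hf_cache(home_dir: str) -> str:
    return os.path.join(home_dir, ".cache", "huggingface")

print(default_hf_cache("/root"))       # build user  -> /root/.cache/huggingface
print(default_hf_cache("/home/user"))  # runtime user -> /home/user/.cache/huggingface
```

The build populates the first path; the runtime searches the second, which is empty.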
## The Solution
### Set Explicit Cache Directories
Use `/app/.cache/`, which is accessible to both the build and runtime users.
### Implementation
#### 1. Dockerfile Changes
```dockerfile
# Set cache environment variables (accessible location)
ENV TRANSFORMERS_CACHE=/app/.cache/huggingface \
    HF_HOME=/app/.cache/huggingface \
    SENTENCE_TRANSFORMERS_HOME=/app/.cache/sentence-transformers

# Create directories with permissions any user can use
RUN mkdir -p /app/.cache/huggingface /app/.cache/sentence-transformers \
    && chmod -R 777 /app/.cache

# Download model to /app/.cache (not /root/.cache)
RUN python -c "from sentence_transformers import SentenceTransformer; \
    SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')"
```
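As an optional guard, the build can fail fast if the download didn't land where expected. A minimal sketch, assuming the paths above; `cache_populated` is a hypothetical helper that could be invoked as a final build step:

```python
import os

def cache_populated(cache_dir: str) -> bool:
    """Return True if cache_dir exists and holds at least one file."""
    if not os.path.isdir(cache_dir):
        return False
    for _root, _dirs, files in os.walk(cache_dir):
        if files:
            return True
    return False
```

Calling `cache_populated('/app/.cache/sentence-transformers')` after the download and raising on `False` turns a silent runtime failure into a visible build failure.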
#### 2. orchestrator.py Changes
```python
import os

# Ensure runtime uses the same cache directories as the build.
# Must run before transformers/sentence_transformers are imported.
os.environ.setdefault('TRANSFORMERS_CACHE', '/app/.cache/huggingface')
os.environ.setdefault('HF_HOME', '/app/.cache/huggingface')
os.environ.setdefault('SENTENCE_TRANSFORMERS_HOME', '/app/.cache/sentence-transformers')
```
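A note on `setdefault`: it only fills in a value when the variable is absent, so the Dockerfile's `ENV` values always take precedence over these in-code fallbacks. The sketch below demonstrates that behavior with simulated environments:

```python
import os

# Case 1: variable not set (e.g. running outside the container)
os.environ.pop("TRANSFORMERS_CACHE", None)
os.environ.setdefault("TRANSFORMERS_CACHE", "/app/.cache/huggingface")
print(os.environ["TRANSFORMERS_CACHE"])  # fallback applied

# Case 2: variable already set (e.g. by the Dockerfile's ENV line)
os.environ["HF_HOME"] = "/custom/cache"
os.environ.setdefault("HF_HOME", "/app/.cache/huggingface")
print(os.environ["HF_HOME"])             # existing value preserved
```

Ordering matters: set these before importing the embedding libraries, since they typically read the variables at import time.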
## How This Fixes It
### Before (❌ Broken)
```
Build:   /root/.cache/huggingface/      ← Model downloaded here
Runtime: /home/user/.cache/huggingface/ ← Looking here (empty!)
Result:  Model not found, download fails
```
### After (✅ Fixed)
```
Build:   /app/.cache/huggingface/ ← Model downloaded here
Runtime: /app/.cache/huggingface/ ← Looking here (found!)
Result:  Model loaded successfully
```
## Expected Logs After Fix
### Build Logs
```
--> RUN python -c "from sentence_transformers import SentenceTransformer; ..."
Downloading embedding model...
Downloading (…)ce_transformers_config.json: 100%|██████████| 116/116
Downloading (…)_Pooling/config.json: 100%|██████████| 190/190
Downloading (…)b52ce780/config.json: 100%|██████████| 612/612
Downloading model.safetensors: 100%|██████████| 90.9M/90.9M
Model cached successfully
DONE 5.5s
```
### Runtime Logs
```
[INFO] Pre-downloading embedding model: sentence-transformers/all-MiniLM-L6-v2
[INFO] Embedding model cached successfully ← No download, uses cached model
[INFO] RAG pipeline initialized successfully
[INFO] Tax Optimizer initialized successfully
INFO: Application startup complete.
```
## Why the Previous Attempt Failed
The first fix downloaded the model during build but didn't set the cache location:
- ✅ Model downloaded
- ❌ Downloaded to `/root/.cache/`
- ❌ Runtime couldn't access it
- ❌ Tried to re-download, failed
## Deploy the Fix
```bash
# Stage changes
git add Dockerfile orchestrator.py CACHE_FIX_EXPLAINED.md QUICK_FIX.md

# Commit
git commit -m "Fix: Set explicit cache directories for embedding model

- Set TRANSFORMERS_CACHE, HF_HOME, SENTENCE_TRANSFORMERS_HOME to /app/.cache
- Create cache directories with proper permissions
- Ensure build and runtime use same cache location
- Fixes model not found error at runtime"

# Push to Hugging Face
git push
```
## Verification
After deployment, check the logs for:
### ✅ Success Indicators
```
[INFO] Embedding model cached successfully
[INFO] RAG pipeline initialized successfully
[INFO] Tax Optimizer initialized successfully
```
### ❌ Failure Indicators (if still broken)
```
No sentence-transformers model found with name...
[WARN] Failed to cache embedding model...
[WARN] RAG not initialized...
```
## Alternative Solutions (If This Still Fails)
### Option 1: Use the Hugging Face Inference API
Instead of a local model, compute embeddings remotely via the Hugging Face Inference API:
```python
import os
from langchain_huggingface import HuggingFaceEndpointEmbeddings

embeddings = HuggingFaceEndpointEmbeddings(
    model="sentence-transformers/all-MiniLM-L6-v2",
    huggingfacehub_api_token=os.getenv("HF_TOKEN")
)
```
### Option 2: Use a Smaller Model
```python
EMBED_MODEL = "sentence-transformers/paraphrase-MiniLM-L3-v2"  # ~61MB vs ~91MB for all-MiniLM-L6-v2
```
### Option 3: Disable RAG
```bash
# In HF Space settings
DISABLE_RAG=true
```
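How the app interprets `DISABLE_RAG` depends on the orchestrator code; a minimal sketch of a flag check (the `rag_enabled` helper and the accepted truthy spellings are assumptions, not the app's actual logic):

```python
import os

def rag_enabled() -> bool:
    # Hypothetical check: treat common truthy spellings of DISABLE_RAG
    # as a request to disable the RAG pipeline.
    return os.getenv("DISABLE_RAG", "").strip().lower() not in ("1", "true", "yes")

os.environ["DISABLE_RAG"] = "true"
print(rag_enabled())  # -> False
```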
## Technical Details
### Environment Variables Used
| Variable | Purpose | Value |
|----------|---------|-------|
| `TRANSFORMERS_CACHE` | Transformers library cache | `/app/.cache/huggingface` |
| `HF_HOME` | Hugging Face Hub cache | `/app/.cache/huggingface` |
| `SENTENCE_TRANSFORMERS_HOME` | Sentence Transformers cache | `/app/.cache/sentence-transformers` |
### Why `/app/.cache/`?
- `/app/` is the `WORKDIR` in the Dockerfile
- Accessible to all users in the container
- Persists from build into runtime
- Permissions can be set explicitly
### Why `chmod -R 777`?
- Ensures all users can read and write
- Necessary for the non-root runtime user
- Acceptable in a single-purpose container
- Alternative: use `chown` to assign a specific user
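If permissions remain a suspect, a startup probe can confirm the runtime user can actually write to the cache directory before RAG initialization. A sketch; `cache_writable` is a hypothetical helper, not part of the current code:

```python
import os
import tempfile

def cache_writable(path: str) -> bool:
    """Probe whether the current user can create files under `path`."""
    try:
        os.makedirs(path, exist_ok=True)
        # Creating (and immediately deleting) a temp file is a direct test
        # of write permission, unlike merely checking mode bits.
        with tempfile.NamedTemporaryFile(dir=path):
            pass
        return True
    except OSError:
        return False
```

Logging the result of `cache_writable('/app/.cache')` at startup distinguishes a permissions problem from a missing download.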
## Summary
**Problem**: Model cached in a user-specific directory during build, inaccessible at runtime
**Solution**: Use a shared `/app/.cache/` directory for both build and runtime
**Result**: Model loads instantly at runtime, no re-download needed
This is a common issue in Docker deployments with multi-stage builds or user switching.