Cache Location Mismatch - The Real Problem
What You Observed
Build Logs (✅ Success)
```
--> RUN python -c "from sentence_transformers import SentenceTransformer; ..."
Downloading embedding model...
Model cached successfully
DONE 5.5s
```
Runtime Logs (❌ Failure)
```
[INFO] Pre-downloading embedding model: sentence-transformers/all-MiniLM-L6-v2
No sentence-transformers model found with name sentence-transformers/all-MiniLM-L6-v2
[WARN] Failed to cache embedding model: Can't load the model...
```
The Problem
The model downloads successfully during build but can't be found at runtime. This is a cache location mismatch.
What Happens
During Docker Build (as root user):
- Model downloads to `/root/.cache/huggingface/`
- Build succeeds ✅

During Runtime (as a different user):
- App looks for the model in `/root/.cache/huggingface/`
- Permission denied (the non-root user can't access `/root/`)
- Falls back to downloading from Hugging Face
- Download fails (network/space constraints)
- RAG disabled ❌
Why This Happens
Hugging Face Spaces runs containers with different users for build vs runtime:
- Build time: root user
- Runtime: non-root user (security)
Default cache locations:
- sentence-transformers: `~/.cache/torch/sentence_transformers/`
- transformers: `~/.cache/huggingface/`
- `~` resolves to a different path for each user
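A quick way to see the mismatch is to print the resolved paths as each user; `~` expands through the process owner's home directory, so build and runtime resolve different locations. A minimal sketch:

```python
# Minimal sketch: "~" resolves through the current user's home directory,
# so root (build) and the runtime user see different default caches.
import os
from pathlib import Path

print("home:", Path.home())  # /root at build time, e.g. /home/user at runtime
print("sentence-transformers default:", Path.home() / ".cache" / "torch" / "sentence_transformers")
print("huggingface default:", os.environ.get("HF_HOME", str(Path.home() / ".cache" / "huggingface")))
```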
The Solution
Set Explicit Cache Directories
Use `/app/.cache/`, which is accessible to both the build and runtime users.
Implementation
1. Dockerfile Changes
```dockerfile
# Set cache environment variables (accessible location)
ENV TRANSFORMERS_CACHE=/app/.cache/huggingface \
    HF_HOME=/app/.cache/huggingface \
    SENTENCE_TRANSFORMERS_HOME=/app/.cache/sentence-transformers

# Create directories with proper permissions
RUN mkdir -p /app/.cache/huggingface /app/.cache/sentence-transformers \
    && chmod -R 777 /app/.cache

# Download model to /app/.cache (not /root/.cache)
RUN python -c "from sentence_transformers import SentenceTransformer; \
    SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')"
```
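To catch this class of regression at build time, an extra verification step can assert that the weights actually landed under `/app/.cache`. A sketch (exact filenames vary across sentence-transformers versions, so it just checks for any weight file):

```python
# Hypothetical build-time check (run as an extra step after the download):
# fail fast if the model weights did not land under /app/.cache.
import pathlib

cache = pathlib.Path("/app/.cache")
weights = list(cache.rglob("*.safetensors")) + list(cache.rglob("pytorch_model.bin"))
assert weights, f"no model weights found under {cache}"
print("cached weights:", *weights, sep="\n  ")
```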
2. orchestrator.py Changes
```python
# Ensure runtime uses the same cache directories.
# Note: set these BEFORE importing transformers/sentence_transformers,
# which read the cache variables at import time.
import os

os.environ.setdefault('TRANSFORMERS_CACHE', '/app/.cache/huggingface')
os.environ.setdefault('HF_HOME', '/app/.cache/huggingface')
os.environ.setdefault('SENTENCE_TRANSFORMERS_HOME', '/app/.cache/sentence-transformers')
```
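Beyond the ordering caveat above, a defensive loader (a sketch; the function name is illustrative, not from orchestrator.py) turns a cache miss into "RAG disabled" instead of a startup crash:

```python
# Illustrative sketch: degrade gracefully to "RAG disabled" when the
# cached model can't be loaded, mirroring the [WARN] path in the logs.
import logging
from sentence_transformers import SentenceTransformer

def load_embedder(name: str = "sentence-transformers/all-MiniLM-L6-v2"):
    try:
        return SentenceTransformer(name)  # hits /app/.cache when the env vars are set
    except Exception as exc:  # cache miss and the fallback download failed
        logging.warning("Failed to cache embedding model: %s - RAG disabled", exc)
        return None
```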
How This Fixes It
Before (❌ Broken)
```
Build:   /root/.cache/huggingface/      ← Model downloaded here
Runtime: /home/user/.cache/huggingface/ ← Looking here (empty!)
```
Result: Model not found, download fails
After (✅ Fixed)
```
Build:   /app/.cache/huggingface/ ← Model downloaded here
Runtime: /app/.cache/huggingface/ ← Looking here (found!)
```
Result: Model loaded successfully
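To prove the runtime load is really served from the cache, you can forbid network access via huggingface_hub's offline switch; if the model still loads, it came from `/app/.cache`. A sketch:

```python
# Sketch: force offline mode so a successful load can only come from the cache.
import os
os.environ["HF_HUB_OFFLINE"] = "1"  # huggingface_hub will refuse to download

from sentence_transformers import SentenceTransformer
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")  # cache hit or exception
print("loaded from cache:", model.get_sentence_embedding_dimension(), "dims")
```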
Expected Logs After Fix
Build Logs
```
--> RUN python -c "from sentence_transformers import SentenceTransformer; ..."
Downloading embedding model...
Downloading (…)ce_transformers_config.json: 100%|██████████| 116/116
Downloading (…)_Pooling/config.json: 100%|██████████| 190/190
Downloading (…)b52ce780/config.json: 100%|██████████| 612/612
Downloading model.safetensors: 100%|██████████| 90.9M/90.9M
Model cached successfully
DONE 5.5s
```
Runtime Logs
```
[INFO] Pre-downloading embedding model: sentence-transformers/all-MiniLM-L6-v2
[INFO] Embedding model cached successfully   ← No download, uses cached model
[INFO] RAG pipeline initialized successfully
[INFO] Tax Optimizer initialized successfully
INFO: Application startup complete.
```
Why Previous Attempt Failed
Your first fix downloaded the model during build but didn't set the cache location:
- ✅ Model downloaded
- ❌ Downloaded to `/root/.cache/`
- ❌ Runtime couldn't access it
- ❌ Tried to re-download, failed
Deploy the Fix
```bash
# Stage changes
git add Dockerfile orchestrator.py CACHE_FIX_EXPLAINED.md QUICK_FIX.md

# Commit
git commit -m "Fix: Set explicit cache directories for embedding model

- Set TRANSFORMERS_CACHE, HF_HOME, SENTENCE_TRANSFORMERS_HOME to /app/.cache
- Create cache directories with proper permissions
- Ensure build and runtime use same cache location
- Fixes model not found error at runtime"

# Push to Hugging Face
git push
```
Verification
After deployment, check logs for:
✅ Success Indicators
```
[INFO] Embedding model cached successfully
[INFO] RAG pipeline initialized successfully
[INFO] Tax Optimizer initialized successfully
```
❌ Failure Indicators (if still broken)
```
No sentence-transformers model found with name...
[WARN] Failed to cache embedding model...
[WARN] RAG not initialized...
```
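If the failure indicators persist, a small diagnostic run inside the Space shows which paths the process actually resolves (all three variables should point into `/app/.cache`):

```python
# Diagnostic sketch: dump the cache-related environment the runtime user sees.
import os

for var in ("TRANSFORMERS_CACHE", "HF_HOME", "SENTENCE_TRANSFORMERS_HOME"):
    print(f"{var} = {os.environ.get(var, '<unset>')}")
print("expanduser('~') =", os.path.expanduser("~"))  # reveals the runtime user's home
```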
Alternative Solutions (If This Still Fails)
Option 1: Use Hugging Face Hub API
Instead of a local model, use the Hugging Face Inference API:
```python
import os
from langchain_huggingface import HuggingFaceEndpointEmbeddings

embeddings = HuggingFaceEndpointEmbeddings(
    model="sentence-transformers/all-MiniLM-L6-v2",
    huggingfacehub_api_token=os.getenv("HF_TOKEN"),
)
```
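This requires an `HF_TOKEN` secret in the Space settings; embeddings are then computed remotely, so no local cache is involved at all. For example:

```python
vector = embeddings.embed_query("What is the standard deduction?")  # list[float] from the hosted API
```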
Option 2: Use Smaller Model
```python
EMBED_MODEL = "sentence-transformers/paraphrase-MiniLM-L3-v2"  # ~61 MB vs ~90 MB for all-MiniLM-L6-v2
```
Option 3: Disable RAG
```
# In HF Space settings
DISABLE_RAG=true
```
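The application side then needs to honor the flag. A sketch of what that guard might look like (the exact wiring in orchestrator.py may differ, and `init_rag` is a hypothetical initializer):

```python
# Sketch: skip RAG setup entirely when the DISABLE_RAG flag is set.
import os

if os.getenv("DISABLE_RAG", "").lower() in ("1", "true", "yes"):
    rag_pipeline = None  # app runs without retrieval
else:
    rag_pipeline = init_rag()  # hypothetical initializer
```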
Technical Details
Environment Variables Used
| Variable | Purpose | Value |
|---|---|---|
| `TRANSFORMERS_CACHE` | Transformers library cache | `/app/.cache/huggingface` |
| `HF_HOME` | Hugging Face Hub cache | `/app/.cache/huggingface` |
| `SENTENCE_TRANSFORMERS_HOME` | Sentence Transformers cache | `/app/.cache/sentence-transformers` |
Why /app/.cache/?
- `/app/` is the WORKDIR in the Dockerfile
- Accessible to all users in the container
- Persists across build and runtime
- Permissions can be set explicitly
Why chmod -R 777?
- Ensures all users can read/write
- Necessary for the non-root runtime user
- Safe in a container environment
- Alternative: use `chown` to grant a specific user ownership (see the writability check below)
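Either way, the property that matters is that the runtime user can write to the shared cache. A quick writability check (a sketch):

```python
# Sketch: raises PermissionError at startup if the shared cache isn't writable.
import os
import tempfile

with tempfile.NamedTemporaryFile(dir="/app/.cache"):
    pass  # file created and deleted; success means the cache is writable
print("uid", os.getuid(), "can write to /app/.cache")
```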
Summary
Problem: Model cached in a user-specific directory during build, inaccessible at runtime
Solution: Use a shared `/app/.cache/` directory for both build and runtime
Result: Model loads instantly at runtime, no re-download needed
This is a common issue in Docker deployments with multi-stage builds or user switching.