
Cache Location Mismatch - The Real Problem

What You Observed

Build Logs (✅ Success)

--> RUN python -c "from sentence_transformers import SentenceTransformer; ..."
Downloading embedding model...
Model cached successfully
DONE 5.5s

Runtime Logs (❌ Failure)

[INFO] Pre-downloading embedding model: sentence-transformers/all-MiniLM-L6-v2
No sentence-transformers model found with name sentence-transformers/all-MiniLM-L6-v2
[WARN] Failed to cache embedding model: Can't load the model...

The Problem

The model downloads successfully during build but can't be found at runtime. This is a cache location mismatch.

What Happens

  1. During the Docker build (as the root user):

    • Model downloads to: /root/.cache/huggingface/
    • Build succeeds ✅
  2. At runtime (as a different, non-root user):

    • App looks for the model in its own cache (e.g. /home/user/.cache/huggingface/), which is empty
    • /root/.cache/ is unreadable anyway (permission denied)
    • Falls back to downloading from Hugging Face
    • Download fails (network/space constraints)
    • RAG disabled ❌

Why This Happens

Hugging Face Spaces runs the container as different users at build time and at runtime:

  • Build time: root user
  • Runtime: non-root user (for security)

Default cache locations:

  • sentence-transformers: ~/.cache/torch/sentence_transformers/
  • transformers: ~/.cache/huggingface/
  • ~ resolves to a different path for each user (see the sketch below)
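
A quick way to see the mismatch (a standalone sketch, not part of the fix): ~ expands against the current user's home directory, so the same default setting points at different locations during build and at runtime.

import os

# As root this prints /root/.cache/huggingface;
# as a non-root user it prints e.g. /home/user/.cache/huggingface
print(os.path.expanduser("~/.cache/huggingface"))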

The Solution

Set Explicit Cache Directories

Use /app/.cache/, which is accessible to both the build user and the runtime user.

Implementation

1. Dockerfile Changes

# Set cache environment variables (accessible location)
ENV TRANSFORMERS_CACHE=/app/.cache/huggingface \
    HF_HOME=/app/.cache/huggingface \
    SENTENCE_TRANSFORMERS_HOME=/app/.cache/sentence-transformers

# Create directories with proper permissions
RUN mkdir -p /app/.cache/huggingface /app/.cache/sentence-transformers \
    && chmod -R 777 /app/.cache

# Download model to /app/.cache (not /root/.cache)
RUN python -c "from sentence_transformers import SentenceTransformer; \
    SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')"
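
Optionally, a short script run at the end of the build can fail fast if the weights didn't land under /app/.cache. This is a sketch, not part of the current Dockerfile, and check_cache.py is a hypothetical name (it would be invoked with RUN python check_cache.py):

# check_cache.py (hypothetical) - fail the build if the cache is empty
import pathlib
import sys

roots = [
    pathlib.Path("/app/.cache/sentence-transformers"),
    pathlib.Path("/app/.cache/huggingface"),
]
weights = [p for root in roots for p in root.rglob("*.safetensors")]
if not weights:
    sys.exit("No model weights under /app/.cache - build-time download failed")
print(f"Found {len(weights)} weight file(s) in /app/.cache")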

2. orchestrator.py Changes

# Ensure runtime uses the same cache directories as the build.
# These must be set before importing transformers / sentence_transformers,
# which read the variables at import time.
import os

os.environ.setdefault('TRANSFORMERS_CACHE', '/app/.cache/huggingface')
os.environ.setdefault('HF_HOME', '/app/.cache/huggingface')
os.environ.setdefault('SENTENCE_TRANSFORMERS_HOME', '/app/.cache/sentence-transformers')
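
Ordering matters here: if the libraries are imported first, they have already resolved their cache paths. A minimal sketch of the pre-download step these log messages come from (a reconstruction, not the actual orchestrator.py code):

import logging
import os

# Set before the import below - the library reads this at import time
os.environ.setdefault('SENTENCE_TRANSFORMERS_HOME', '/app/.cache/sentence-transformers')

from sentence_transformers import SentenceTransformer  # noqa: E402

EMBED_MODEL = "sentence-transformers/all-MiniLM-L6-v2"
log = logging.getLogger(__name__)

def preload_embedding_model() -> bool:
    log.info("Pre-downloading embedding model: %s", EMBED_MODEL)
    try:
        SentenceTransformer(EMBED_MODEL)  # resolves from /app/.cache, no network needed
        log.info("Embedding model cached successfully")
        return True
    except Exception as exc:
        log.warning("Failed to cache embedding model: %s", exc)
        return False  # caller disables RAG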

How This Fixes It

Before (❌ Broken)

Build:   /root/.cache/huggingface/  ← Model downloaded here
Runtime: /home/user/.cache/huggingface/  ← Looking here (empty!)
Result:  Model not found, download fails

After (✅ Fixed)

Build:   /app/.cache/huggingface/  ← Model downloaded here
Runtime: /app/.cache/huggingface/  ← Looking here (found!)
Result:  Model loaded successfully

Expected Logs After Fix

Build Logs

--> RUN python -c "from sentence_transformers import SentenceTransformer; ..."
Downloading embedding model...
Downloading (…)ce_transformers_config.json: 100%|██████████| 116/116
Downloading (…)_Pooling/config.json: 100%|██████████| 190/190
Downloading (…)b52ce780/config.json: 100%|██████████| 612/612
Downloading model.safetensors: 100%|██████████| 90.9M/90.9M
Model cached successfully
DONE 5.5s

Runtime Logs

[INFO] Pre-downloading embedding model: sentence-transformers/all-MiniLM-L6-v2
[INFO] Embedding model cached successfully  ← No download, uses cached model
[INFO] RAG pipeline initialized successfully
[INFO] Tax Optimizer initialized successfully
INFO:     Application startup complete.

Why Previous Attempt Failed

Your first fix downloaded the model during the build but didn't set the cache location:

  • ✅ Model downloaded
  • ❌ Downloaded to /root/.cache/
  • ❌ Runtime couldn't access it
  • ❌ Tried to re-download, failed

Deploy the Fix

# Stage changes
git add Dockerfile orchestrator.py CACHE_FIX_EXPLAINED.md QUICK_FIX.md

# Commit
git commit -m "Fix: Set explicit cache directories for embedding model

- Set TRANSFORMERS_CACHE, HF_HOME, SENTENCE_TRANSFORMERS_HOME to /app/.cache
- Create cache directories with proper permissions
- Ensure build and runtime use same cache location
- Fixes model not found error at runtime"

# Push to Hugging Face
git push

Verification

After deployment, check logs for:

✅ Success Indicators

[INFO] Embedding model cached successfully
[INFO] RAG pipeline initialized successfully
[INFO] Tax Optimizer initialized successfully

❌ Failure Indicators (if still broken)

No sentence-transformers model found with name...
[WARN] Failed to cache embedding model...
[WARN] RAG not initialized...
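
To prove locally that the cache is self-contained, force offline mode and load the model; HF_HUB_OFFLINE is the Hub's standard offline switch, and the load will raise instead of silently re-downloading (a local sanity check, not part of the deployed app):

import os

# Forbid network access to the Hub - the load must come from the local cache
os.environ["HF_HUB_OFFLINE"] = "1"

from sentence_transformers import SentenceTransformer  # noqa: E402

# Raises if the model is not fully present in the cache
SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
print("Model loaded from local cache - no download attempted")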

Alternative Solutions (If This Still Fails)

Option 1: Use Hugging Face Hub API

Instead of local model, use Hugging Face Inference API:

import os

from langchain_huggingface import HuggingFaceEndpointEmbeddings

embeddings = HuggingFaceEndpointEmbeddings(
    model="sentence-transformers/all-MiniLM-L6-v2",
    huggingfacehub_api_token=os.getenv("HF_TOKEN"),
)
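
A quick smoke test of the endpoint (assuming HF_TOKEN is set in the Space secrets):

# Embeddings are computed on Hugging Face's servers - nothing is cached locally
vector = embeddings.embed_query("What is the capital gains tax rate?")
print(len(vector))  # 384 dimensions for all-MiniLM-L6-v2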

Option 2: Use Smaller Model

EMBED_MODEL = "sentence-transformers/paraphrase-MiniLM-L3-v2"  # ~61MB vs ~91MB for all-MiniLM-L6-v2
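
Assuming EMBED_MODEL is the name orchestrator.py hands to SentenceTransformer (the surrounding code here is illustrative), the swap is a one-line change:

from sentence_transformers import SentenceTransformer

EMBED_MODEL = "sentence-transformers/paraphrase-MiniLM-L3-v2"
model = SentenceTransformer(EMBED_MODEL)  # smaller download, slightly lower retrieval quality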

Option 3: Disable RAG

# In HF Space settings
DISABLE_RAG=true
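
For the flag to take effect, orchestrator.py has to check it before touching any model. A minimal sketch, assuming a guard like this at startup (init_rag_pipeline is a hypothetical initializer):

import os

# Skip all embedding/RAG setup when the flag is set in the Space settings
if os.getenv("DISABLE_RAG", "").lower() in ("1", "true", "yes"):
    rag_enabled = False
else:
    rag_enabled = init_rag_pipeline()  # hypothetical initializer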

Technical Details

Environment Variables Used

Variable                     Purpose                       Value
TRANSFORMERS_CACHE           Transformers library cache    /app/.cache/huggingface
HF_HOME                      Hugging Face Hub cache        /app/.cache/huggingface
SENTENCE_TRANSFORMERS_HOME   Sentence Transformers cache   /app/.cache/sentence-transformers

Why /app/.cache/?

  • /app/ is the WORKDIR in Dockerfile
  • Accessible to all users in container
  • Persists across build and runtime
  • Can set permissions explicitly

Why chmod -R 777?

  • Ensures all users can read and write the cache
  • Necessary for the non-root runtime user
  • Acceptable in a single-purpose container environment
  • Alternative: use chown to assign the directory to a specific user

Summary

Problem: Model cached in user-specific directory during build, inaccessible at runtime
Solution: Use shared /app/.cache/ directory for both build and runtime
Result: Model loads from the local cache at runtime, no re-download needed

This is a common issue in Docker deployments with multi-stage builds or user switching.