# Cache Location Mismatch - The Real Problem
## What You Observed
### Build Logs (✅ Success)
```
--> RUN python -c "from sentence_transformers import SentenceTransformer; ..."
Downloading embedding model...
Model cached successfully
DONE 5.5s
```
### Runtime Logs (❌ Failure)
```
[INFO] Pre-downloading embedding model: sentence-transformers/all-MiniLM-L6-v2
No sentence-transformers model found with name sentence-transformers/all-MiniLM-L6-v2
[WARN] Failed to cache embedding model: Can't load the model...
```
## The Problem
The model **downloads successfully during build** but **can't be found at runtime**. This is a **cache location mismatch**.
### What Happens
1. **During Docker Build (as root user):**
- Model downloads to: `/root/.cache/huggingface/`
- Build succeeds ✅
2. **During Runtime (as a different user):**
- App looks for model in: `/root/.cache/huggingface/`
- Permission denied (different user can't access `/root/`)
- Falls back to downloading from Hugging Face
- Download fails (network/space constraints)
- RAG disabled ❌
### Why This Happens
Hugging Face Spaces runs containers with different users for build vs runtime:
- **Build time**: root user
- **Runtime**: non-root user (security)
Default cache locations:
- `sentence-transformers`: `~/.cache/torch/sentence_transformers/`
- `transformers`: `~/.cache/huggingface/`
- `~` resolves to a different home directory for each user, so build and runtime look in different places (as the snippet below demonstrates)
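A minimal illustration of the mismatch (the output paths in the comments assume the default home directories):
```python
import os

# "~" expands against the current user's home directory, so the same
# default cache path points somewhere different for each user
print(os.path.expanduser("~/.cache/huggingface"))
# as root (build):   /root/.cache/huggingface
# as runtime user:   /home/user/.cache/huggingface
```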
## The Solution
### Set Explicit Cache Directories
Use `/app/.cache/`, which is accessible to both the build and runtime users.
### Implementation
#### 1. Dockerfile Changes
```dockerfile
# Set cache environment variables (accessible location)
ENV TRANSFORMERS_CACHE=/app/.cache/huggingface \
    HF_HOME=/app/.cache/huggingface \
    SENTENCE_TRANSFORMERS_HOME=/app/.cache/sentence-transformers

# Create directories with proper permissions
RUN mkdir -p /app/.cache/huggingface /app/.cache/sentence-transformers \
    && chmod -R 777 /app/.cache

# Download model to /app/.cache (not /root/.cache)
RUN python -c "from sentence_transformers import SentenceTransformer; \
    SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')"
```
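To confirm the model actually landed under `/app/.cache` during the build, an optional sanity check could be added as one more `RUN python` step (an illustrative sketch, not part of the current Dockerfile):
```python
import os
import sys

# Fail the build early if the cache directory came out empty
cache = "/app/.cache/sentence-transformers"
if not os.path.isdir(cache) or not os.listdir(cache):
    sys.exit(f"expected cached model files under {cache}, found none")
print(f"cache OK: {os.listdir(cache)}")
```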
#### 2. orchestrator.py Changes
```python
import os

# Ensure runtime uses the same cache directories as the build
os.environ.setdefault('TRANSFORMERS_CACHE', '/app/.cache/huggingface')
os.environ.setdefault('HF_HOME', '/app/.cache/huggingface')
os.environ.setdefault('SENTENCE_TRANSFORMERS_HOME', '/app/.cache/sentence-transformers')
```
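One caveat: these `setdefault` calls only help if they run before `transformers`/`sentence_transformers` are imported, since the underlying libraries read these variables when first loaded. A minimal sketch of the safe ordering:
```python
import os

# Set cache paths BEFORE importing the libraries that read them
os.environ.setdefault('HF_HOME', '/app/.cache/huggingface')
os.environ.setdefault('SENTENCE_TRANSFORMERS_HOME', '/app/.cache/sentence-transformers')

from sentence_transformers import SentenceTransformer  # noqa: E402

# Loads from /app/.cache; no network access needed if the build cached it
model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')
```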
## How This Fixes It
### Before (❌ Broken)
```
Build: /root/.cache/huggingface/ ← Model downloaded here
Runtime: /home/user/.cache/huggingface/ ← Looking here (empty!)
Result: Model not found, download fails
```
### After (✅ Fixed)
```
Build: /app/.cache/huggingface/ ← Model downloaded here
Runtime: /app/.cache/huggingface/ ← Looking here (found!)
Result: Model loaded successfully
```
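To verify which of the two situations you are in, a quick debugging snippet (hypothetical, run inside the container) lists what the cache actually holds:
```python
import pathlib

# Print every model artifact under the shared cache; if nothing prints,
# the build step did not write to /app/.cache
for path in pathlib.Path("/app/.cache").rglob("*"):
    if path.suffix in {".safetensors", ".bin", ".json"}:
        print(path)
```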
## Expected Logs After Fix
### Build Logs
```
--> RUN python -c "from sentence_transformers import SentenceTransformer; ..."
Downloading embedding model...
Downloading (…)ce_transformers_config.json: 100%|██████████| 116/116
Downloading (…)_Pooling/config.json: 100%|██████████| 190/190
Downloading (…)b52ce780/config.json: 100%|██████████| 612/612
Downloading model.safetensors: 100%|██████████| 90.9M/90.9M
Model cached successfully
DONE 5.5s
```
### Runtime Logs
```
[INFO] Pre-downloading embedding model: sentence-transformers/all-MiniLM-L6-v2
[INFO] Embedding model cached successfully ← No download, uses cached model
[INFO] RAG pipeline initialized successfully
[INFO] Tax Optimizer initialized successfully
INFO: Application startup complete.
```
## Why Previous Attempt Failed
Your first fix downloaded the model during the build but didn't set the cache location (a reconstruction follows the list below):
- ✅ Model downloaded
- ❌ Downloaded to `/root/.cache/`
- ❌ Runtime couldn't access it
- ❌ Tried to re-download, failed
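For contrast, the earlier build step effectively amounted to the following (a reconstruction for illustration, not the exact original code). With no cache variables set, the download lands in the invoking user's home directory:
```python
# No cache env vars are set here, so the model is written to ~/.cache
# of the *build* user, i.e. /root/.cache during a root build
from sentence_transformers import SentenceTransformer

SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')
```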
## Deploy the Fix
```bash
# Stage changes
git add Dockerfile orchestrator.py CACHE_FIX_EXPLAINED.md QUICK_FIX.md
# Commit
git commit -m "Fix: Set explicit cache directories for embedding model
- Set TRANSFORMERS_CACHE, HF_HOME, SENTENCE_TRANSFORMERS_HOME to /app/.cache
- Create cache directories with proper permissions
- Ensure build and runtime use same cache location
- Fixes model not found error at runtime"
# Push to Hugging Face
git push
```
## Verification
After deployment, check logs for:
### ✅ Success Indicators
```
[INFO] Embedding model cached successfully
[INFO] RAG pipeline initialized successfully
[INFO] Tax Optimizer initialized successfully
```
### ❌ Failure Indicators (if still broken)
```
No sentence-transformers model found with name...
[WARN] Failed to cache embedding model...
[WARN] RAG not initialized...
```
## Alternative Solutions (If This Still Fails)
### Option 1: Use Hugging Face Hub API
Instead of local model, use Hugging Face Inference API:
```python
import os

from langchain_huggingface import HuggingFaceEndpointEmbeddings

embeddings = HuggingFaceEndpointEmbeddings(
    model="sentence-transformers/all-MiniLM-L6-v2",
    huggingfacehub_api_token=os.getenv("HF_TOKEN"),
)
```
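A brief usage sketch (assumes a valid `HF_TOKEN` is set; the query text is arbitrary):
```python
# Embeddings are computed remotely, so nothing is cached in the container
vector = embeddings.embed_query("How are capital gains taxed?")
print(len(vector))  # 384 dimensions for all-MiniLM-L6-v2
```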
### Option 2: Use Smaller Model
```python
EMBED_MODEL = "sentence-transformers/paraphrase-MiniLM-L3-v2"  # ~61MB vs ~90MB for all-MiniLM-L6-v2
```
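The smaller model is a drop-in replacement behind the same API; a minimal sketch:
```python
from sentence_transformers import SentenceTransformer

# Same loading code, smaller download; the output dimension is also 384
model = SentenceTransformer("sentence-transformers/paraphrase-MiniLM-L3-v2")
print(model.encode("test sentence").shape)  # (384,)
```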
### Option 3: Disable RAG
```bash
# In HF Space settings
DISABLE_RAG=true
```
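On the app side, the flag would be consumed roughly like this (an illustrative sketch; it assumes `orchestrator.py` checks the variable, and `init_rag` is a hypothetical initializer):
```python
import os

# Skip RAG setup entirely when the Space sets DISABLE_RAG=true
if os.getenv("DISABLE_RAG", "").lower() == "true":
    rag_pipeline = None  # app runs without retrieval
else:
    rag_pipeline = init_rag()  # hypothetical initializer
```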
## Technical Details
### Environment Variables Used
| Variable | Purpose | Value |
|----------|---------|-------|
| `TRANSFORMERS_CACHE` | Transformers library cache | `/app/.cache/huggingface` |
| `HF_HOME` | Hugging Face Hub cache | `/app/.cache/huggingface` |
| `SENTENCE_TRANSFORMERS_HOME` | Sentence Transformers cache | `/app/.cache/sentence-transformers` |
### Why `/app/.cache/`?
- `/app/` is the WORKDIR in Dockerfile
- Accessible to all users in container
- Persists across build and runtime
- Can set permissions explicitly
### Why `chmod -R 777`?
- Ensures all users can read/write
- Necessary for the non-root runtime user
- Acceptable in a single-tenant container (no other users share the filesystem)
- Alternative: use `chown` to assign the cache to a specific user
## Summary
- **Problem**: Model cached in a user-specific directory during build, inaccessible at runtime
- **Solution**: Use the shared `/app/.cache/` directory for both build and runtime
- **Result**: Model loads from the local cache at runtime; no re-download needed

This is a common issue in Docker deployments with multi-stage builds or user switching.