Oluwaferanmi committed
Commit 143273c · 1 Parent(s): 61a314c

This is the latest change

Files changed (2)
  1. EMBEDDING_MODEL_FIX.md +143 -0
  2. orchestrator.py +51 -24
EMBEDDING_MODEL_FIX.md ADDED
@@ -0,0 +1,143 @@
+ # Embedding Model Issue - Solutions
+
+ ## Problem
+ ```
+ [WARN] RAG not initialized: Can't load the model for 'sentence-transformers/all-MiniLM-L6-v2'
+ ```
+
+ This happens because:
+ 1. Hugging Face Spaces has limited disk space (~50GB)
+ 2. The embedding model must be downloaded on first start (~90MB for all-MiniLM-L6-v2)
+ 3. First-time downloads can fail due to network or disk-space issues
+
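+ To see which of these is actually the problem on a given Space, a quick check can be run from the Space terminal or a startup script. This is only a diagnostic sketch using the Python standard library, and it assumes the default Hugging Face cache location under `~/.cache/huggingface/hub`:
+
+ ```python
+ import shutil
+ from pathlib import Path
+
+ # How much disk is left? Spaces storage is limited and shared with the image.
+ total, used, free = shutil.disk_usage("/")
+ print(f"Free disk space: {free / 1e9:.1f} GB")
+
+ # Is the embedding model already in the local Hugging Face cache?
+ cache_dir = Path.home() / ".cache" / "huggingface" / "hub"
+ hits = list(cache_dir.glob("models--sentence-transformers--all-MiniLM-L6-v2*"))
+ print("Model cached:", bool(hits))
+ ```
+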
+ ## Solutions
+
+ ### Option 1: Disable RAG (Recommended for Now)
+
+ Set this environment variable in the Hugging Face Space settings:
+
+ ```bash
+ DISABLE_RAG=true
+ ```
+
+ **Result:**
+ - ✅ Service starts immediately
+ - ✅ Tax calculations work perfectly
+ - ❌ Tax optimization unavailable (requires RAG)
+ - ❌ Tax Q&A unavailable (requires RAG)
+
+ **Use this if:** You only need tax calculations, not optimization recommendations.
+
+ ### Option 2: Wait for Model Download
+
+ The model will eventually download on a subsequent restart; it may take 2-3 tries.
+
+ **Steps:**
+ 1. Don't set `DISABLE_RAG`
+ 2. Restart the Space multiple times
+ 3. Check logs for: `[INFO] Embedding model cached successfully`
+
+ **Use this if:** You need full tax optimization features.
+
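+ Restarting by hand can also be scripted. The snippet below is a sketch, not part of the service: it assumes `sentence-transformers` is already installed (it is a dependency of the RAG pipeline) and simply retries the download a few times with a pause in between:
+
+ ```python
+ import sys
+ import time
+
+ from sentence_transformers import SentenceTransformer
+
+ MODEL = "sentence-transformers/all-MiniLM-L6-v2"
+
+ for attempt in range(1, 4):
+     try:
+         SentenceTransformer(MODEL)  # populates the local HF cache on success
+         print("[INFO] Embedding model cached successfully")
+         break
+     except Exception as e:
+         print(f"[WARN] Attempt {attempt} failed: {e}", file=sys.stderr)
+         time.sleep(30)  # transient network errors often clear up after a short wait
+ else:
+     print("[WARN] Could not cache the model; consider DISABLE_RAG=true", file=sys.stderr)
+ ```
+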
+ ### Option 3: Use a Smaller Embedding Model
+
+ Change in `orchestrator.py`:
+
+ ```python
+ # Instead of:
+ EMBED_MODEL = "sentence-transformers/all-MiniLM-L6-v2"
+
+ # Use:
+ EMBED_MODEL = "sentence-transformers/paraphrase-MiniLM-L3-v2"  # 3-layer model, smaller and faster
+ ```
+
+ Note that `all-MiniLM-L12-v2` is larger than the L6 variant, so it will not reduce the footprint; a genuinely smaller model is needed. Switching models also means rebuilding the vector store, because embeddings from different models are not interchangeable.
+
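+ Whichever replacement you pick, it is worth a quick local sanity check before editing `orchestrator.py`. A minimal sketch (the model name is just the example above, not a requirement):
+
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Candidate model to evaluate; swap in whatever you are considering.
+ candidate = "sentence-transformers/paraphrase-MiniLM-L3-v2"
+
+ model = SentenceTransformer(candidate)
+ print("Embedding dimension:", model.get_sentence_embedding_dimension())
+
+ vec = model.encode("Calculate PIT for 5000000 annual income")
+ print("Encoded one sentence, vector shape:", vec.shape)
+ ```
+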
+ ### Option 4: Pre-build Docker Image with Model
+
+ Add to `Dockerfile`:
+
+ ```dockerfile
+ # After RUN pip install...
+ RUN python -c "from sentence_transformers import SentenceTransformer; SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')"
+ ```
+
+ This downloads the model at build time, so it is already cached when the Space starts.
+
+ ## Current Status
+
+ The code now:
+ 1. ✅ Tries to pre-download the model
+ 2. ✅ Provides clear error messages
+ 3. ✅ Continues without RAG if the model fails to load
+ 4. ✅ Supports the `DISABLE_RAG` environment variable
+
+ ## What Works Without RAG
+
+ | Feature | Status |
+ |---------|--------|
+ | Tax Calculations (PIT, CIT, VAT) | ✅ Works |
+ | Tax Rules Engine | ✅ Works |
+ | `/v1/query` endpoint (calculations) | ✅ Works |
+ | `/v1/query` endpoint (Q&A) | ❌ Requires RAG |
+ | `/v1/optimize` endpoint | ❌ Requires RAG |
+ | Transaction classification | ⚠️ Basic patterns only |
+
+ ## Recommendation for Production
+
+ **For auth-backend integration:**
+
+ Since you mainly need transaction classification and tax calculations (not Q&A), you have two options:
+
+ ### Option A: Disable RAG, Use Basic Classification
+ ```bash
+ # In HF Space settings
+ DISABLE_RAG=true
+ ```
+
+ Transaction classification will use pattern matching (salary, pension, rent, etc.) without the LLM.
+
+ ### Option B: Wait for Model Download
+ Keep restarting the Space until the model downloads successfully. This gives you the full optimization features.
+
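+ Whichever option you pick, callers such as the auth-backend can guard against the RAG-dependent endpoints being unavailable by checking `/health` first. A client-side sketch (the base URL is the same placeholder used throughout this document, the `requests` library is assumed on the caller's side, and the empty `transactions` list is just a stand-in for the real payload):
+
+ ```python
+ import requests
+
+ BASE_URL = "https://your-space.hf.space"  # placeholder, as elsewhere in this doc
+
+ health = requests.get(f"{BASE_URL}/health", timeout=10).json()
+
+ if health.get("rag_ready"):
+     # RAG is up: optimization and Q&A endpoints are usable.
+     resp = requests.post(
+         f"{BASE_URL}/v1/optimize",
+         json={"user_id": "test", "transactions": [], "tax_year": 2025},
+         timeout=60,
+     )
+     print("optimize:", resp.status_code)
+ else:
+     # RAG disabled or still warming up: stick to /v1/query calculations
+     # and expect 503 from /v1/optimize.
+     print("RAG not ready; skipping /v1/optimize")
+ ```
+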
+ ## Testing After Fix
+
+ ### Test 1: Health Check
+ ```bash
+ curl https://your-space.hf.space/health
+ ```
+
+ Expected with RAG disabled:
+ ```json
+ {
+   "status": "ok",
+   "rag_ready": false
+ }
+ ```
+
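+ If you are waiting for the model to arrive (Option 2), the same endpoint can be polled until `rag_ready` flips to `true`. A small sketch, with the same placeholder URL and `requests` assumption as above:
+
+ ```python
+ import time
+
+ import requests
+
+ BASE_URL = "https://your-space.hf.space"  # placeholder
+
+ deadline = time.time() + 15 * 60  # give up after 15 minutes
+ while time.time() < deadline:
+     try:
+         if requests.get(f"{BASE_URL}/health", timeout=10).json().get("rag_ready"):
+             print("RAG is ready")
+             break
+     except Exception as e:
+         print(f"Health check failed: {e}")
+     time.sleep(30)
+ else:
+     print("Timed out waiting for rag_ready=true")
+ ```
+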
+ ### Test 2: Tax Calculation (Should Work)
+ ```bash
+ curl -X POST https://your-space.hf.space/v1/query \
+   -H "Content-Type: application/json" \
+   -d '{
+     "question": "Calculate PIT for 5000000 annual income",
+     "tax_type": "PIT"
+   }'
+ ```
+
+ ### Test 3: Tax Optimization (Requires RAG)
+ ```bash
+ curl -X POST https://your-space.hf.space/v1/optimize \
+   -H "Content-Type: application/json" \
+   -d '{
+     "user_id": "test",
+     "transactions": [...],
+     "tax_year": 2025
+   }'
+ ```
+
+ This returns 503 if RAG is disabled.
+
+ ## Next Steps
+
+ 1. **For immediate use:** Set `DISABLE_RAG=true`
+ 2. **For full features:** Wait for the model download or use Option 4 (pre-build)
+ 3. **For production:** Consider upgrading to Hugging Face Spaces Pro for more resources
orchestrator.py CHANGED
@@ -38,6 +38,24 @@ GROQ_MODEL = "llama-3.1-8b-instant"
  # Use /tmp for vector store in Hugging Face Spaces (writable directory)
  VECTOR_STORE_DIR = os.getenv('VECTOR_STORE_DIR', '/tmp/vector_store')

+ # Allow disabling RAG entirely for resource-constrained environments
+ DISABLE_RAG = os.getenv('DISABLE_RAG', 'false').lower() in ('true', '1', 'yes')
+
+ # Pre-download embedding model to cache
+ def _ensure_embedding_model_cached(model_name: str) -> bool:
+     """Pre-download embedding model to avoid runtime errors"""
+     try:
+         from sentence_transformers import SentenceTransformer
+         print(f"[INFO] Pre-downloading embedding model: {model_name}", file=sys.stderr)
+         model = SentenceTransformer(model_name)
+         print(f"[INFO] Embedding model cached successfully", file=sys.stderr)
+         return True
+     except Exception as e:
+         print(f"[WARN] Failed to cache embedding model: {e}", file=sys.stderr)
+         print(f"[INFO] This is common in Hugging Face Spaces with limited disk space", file=sys.stderr)
+         print(f"[INFO] Set DISABLE_RAG=true to skip RAG initialization", file=sys.stderr)
+         return False
+
  CALC_KEYWORDS = {
      "compute", "calculate", "calc", "how much tax", "tax due", "paye", "cit", "vat to pay",
      "what will i pay", "liability", "estimate", "breakdown", "net pay", "withholding"
@@ -264,31 +282,40 @@ class Orchestrator:

          # RAG
          rag = None
-         try:
-             src = Path(PDF_SOURCE)
-             # Use writable directory for Hugging Face Spaces
-             vector_store_path = Path(VECTOR_STORE_DIR)
-
-             # Create directory if it doesn't exist and is writable
+
+         if DISABLE_RAG:
+             print(f"[INFO] RAG disabled via DISABLE_RAG environment variable", file=sys.stderr)
+         else:
              try:
-                 vector_store_path.mkdir(parents=True, exist_ok=True)
-             except (PermissionError, OSError) as mkdir_err:
-                 print(f"[WARN] Cannot create vector_store directory: {mkdir_err}", file=sys.stderr)
-                 print(f"[INFO] RAG will be disabled. Tax calculations will still work.", file=sys.stderr)
-                 raise
-
-             ds = DocumentStore(persist_dir=vector_store_path, embedding_model=EMBED_MODEL)
-             pdfs = ds.discover_pdfs(src)
-             if not pdfs:
-                 print(f"[WARN] No PDFs found under {src}. RAG disabled.", file=sys.stderr)
-                 raise FileNotFoundError(f"No PDFs found under {src}")
-             ds.build_vector_store(pdfs, force_rebuild=False)
-             # RAGPipeline reads GROQ_API_KEY from env via langchain_groq; ensure .env loaded
-             rag = RAGPipeline(doc_store=ds, model=GROQ_MODEL, temperature=0.1)
-             print("[INFO] RAG pipeline initialized successfully", file=sys.stderr)
-         except Exception as e:
-             print(f"[WARN] RAG not initialized: {e}", file=sys.stderr)
-             print(f"[INFO] Service will continue without RAG. Tax calculations available.", file=sys.stderr)
+                 # Pre-download embedding model
+                 if not _ensure_embedding_model_cached(EMBED_MODEL):
+                     print(f"[WARN] Embedding model not available. RAG disabled.", file=sys.stderr)
+                     raise RuntimeError("Embedding model unavailable")
+
+                 src = Path(PDF_SOURCE)
+                 # Use writable directory for Hugging Face Spaces
+                 vector_store_path = Path(VECTOR_STORE_DIR)
+
+                 # Create directory if it doesn't exist and is writable
+                 try:
+                     vector_store_path.mkdir(parents=True, exist_ok=True)
+                 except (PermissionError, OSError) as mkdir_err:
+                     print(f"[WARN] Cannot create vector_store directory: {mkdir_err}", file=sys.stderr)
+                     print(f"[INFO] RAG will be disabled. Tax calculations will still work.", file=sys.stderr)
+                     raise
+
+                 ds = DocumentStore(persist_dir=vector_store_path, embedding_model=EMBED_MODEL)
+                 pdfs = ds.discover_pdfs(src)
+                 if not pdfs:
+                     print(f"[WARN] No PDFs found under {src}. RAG disabled.", file=sys.stderr)
+                     raise FileNotFoundError(f"No PDFs found under {src}")
+                 ds.build_vector_store(pdfs, force_rebuild=False)
+                 # RAGPipeline reads GROQ_API_KEY from env via langchain_groq; ensure .env loaded
+                 rag = RAGPipeline(doc_store=ds, model=GROQ_MODEL, temperature=0.1)
+                 print("[INFO] RAG pipeline initialized successfully", file=sys.stderr)
+             except Exception as e:
+                 print(f"[WARN] RAG not initialized: {e}", file=sys.stderr)
+                 print(f"[INFO] Service will continue without RAG. Tax calculations available.", file=sys.stderr)

          # Tax Optimizer
          optimizer = None