S143 committed on
Commit 880952d · verified · 1 Parent(s): 2e0db40

Upload 3 files

Files changed (3)
  1. README.md +183 -17
  2. gitattributes.txt +38 -0
  3. requirements.txt +8 -0
README.md CHANGED
@@ -1,17 +1,183 @@
- ---
- title: SamaviyaInsurance
- emoji: 💬
- colorFrom: yellow
- colorTo: purple
- sdk: gradio
- sdk_version: 5.42.0
- app_file: app.py
- pinned: false
- hf_oauth: true
- hf_oauth_scopes:
- - inference-api
- license: mit
- short_description: This is an insurance bot that helps us answer questions.
- ---
-
- An example chatbot using [Gradio](https://gradio.app), [`huggingface_hub`](https://huggingface.co/docs/huggingface_hub/v0.22.2/en/index), and the [Hugging Face Inference API](https://huggingface.co/docs/api-inference/index).
+ DDS Insurance Q&A — RAG Assistant (Pinecone + OpenAI + Gradio)
+
+ Summary: A beginner-friendly, document-grounded insurance bot that you can replicate and deploy on Hugging Face Spaces. It answers only from your uploaded insurance documents using LlamaIndex + Pinecone (serverless) + OpenAI with a simple, polite system prompt.
+
+ What You’ll Get
+
+ Deployed Space URL you can share.
+
+ Grounded answers (no docs → the bot politely says it can’t find it).
+
+ Simple UI with an FAQ dropdown + free-text question box.
+
+ Clean structure designed for easy replication.
+
+ Features
+
+ Answers strictly from your data/ documents (RAG).
+
+ Pinecone serverless index (AWS us-east-1, cosine, 1536-dim).
+
+ OpenAI for embeddings (text-embedding-3-small) and LLM (gpt-4o-mini).
+
+ Gradio interface with a centered required logo (data/dds_logo.png).
+
+ Beginner-friendly defaults and error messages.
+
+ Repository Structure
+ .
+ ├─ data/ # Your insurance docs + required logo
+ │ └─ dds_logo.png # REQUIRED (shown in header)
+ ├─ app.py # Main app: indexing + query + Gradio UI
+ ├─ requirements.txt # Dependencies
+ └─ README.md # This file
+
+ Configuration (in app.py)
+ EMBED_MODEL = "text-embedding-3-small" # 1536-dim
+ LLM_MODEL = "gpt-4o-mini"
+ TOP_K = 4 # retrieval depth
+
+
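Editor's sketch (not part of the commit): the `TOP_K` setting controls how many of the most similar document chunks are retrieved for each question. The toy example below illustrates top-k selection by cosine similarity in plain Python. In the deployed app the search is presumably delegated to Pinecone; the `cosine_similarity` and `top_k_chunks` helpers and the 3-dimensional vectors are hypothetical stand-ins for illustration only.

```python
import math

TOP_K = 4  # retrieval depth, mirroring the configuration value


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_chunks(query_vec, chunks, k=TOP_K):
    """Return the texts of the k chunks most similar to the query.

    `chunks` is a list of (text, vector) pairs; this layout is an
    assumption made for the example, not the app's data model.
    """
    ranked = sorted(
        chunks,
        key=lambda chunk: cosine_similarity(query_vec, chunk[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


# Toy 3-dim vectors stand in for real 1536-dim embeddings.
toy_chunks = [
    ("claims process", [1.0, 0.0, 0.0]),
    ("exclusions", [0.0, 1.0, 0.0]),
    ("co-pay rules", [0.9, 0.1, 0.0]),
]
print(top_k_chunks([1.0, 0.0, 0.0], toy_chunks, k=2))
```

A larger `k` passes more context to the LLM at the cost of longer prompts; a smaller `k` keeps prompts short but may miss relevant passages.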
+ System Prompt (keeps answers grounded + polite):
+
+ SYSTEM_PROMPT = """You are Aisha, a polite and professional Insurance assistant.
+ Answer ONLY using the information found in the indexed insurance document(s).
+ If the answer is not in the document(s), say: "I couldn’t find that in the document."
+ Keep responses concise, helpful, and courteous.
+ """
+
+
+ FAQ List (editable):
+
+ FAQS = [
+ "",
+ "What benefits are covered under the policy?",
+ "How do I file a claim and what documents are required?",
+ "What are the exclusions and limitations?",
+ "Is pre-authorization needed for hospitalization?",
+ "What is the reimbursement timeline?",
+ "How are outpatient vs inpatient services handled?",
+ "How can I check my network hospitals/clinics?",
+ "What is the co-pay or deductible policy?",
+ ]
+
+ Deploy to Hugging Face Spaces (Beginner-Friendly)
+ 1) Create a Space
+
+ Go to Hugging Face → Spaces → New Space
+
+ SDK: Gradio
+
+ Visibility/licensing: your choice
+
+ 2) Add Project Files
+
+ Upload these into your Space:
+
+ app.py
+
+ requirements.txt
+
+ README.md
+
+ Create folder data/ and upload:
+
+ Your insurance documents (PDF/TXT/MD…)
+
+ dds_logo.png (mandatory; exact filename)
+
+ Tip: Your Space file tree should match the Repository Structure above.
+
+ 3) Set Secrets (Environment Variables)
+
+ In Space → Settings → Variables and secrets, add:
+
+ OPENAI_API_KEY → your OpenAI key
+
+ PINECONE_API_KEY → your Pinecone key
+
+ No legacy Pinecone environment URL needed. This app uses pinecone-client ≥ 5 with serverless.
+
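Editor's sketch (not part of the commit): Space secrets are exposed to the app as environment variables, so a startup check along the following lines could report which of the two keys is missing. The `missing_secrets` helper is hypothetical; the commit does not show how app.py actually performs this check.

```python
import os

# The two secret names listed in step 3.
REQUIRED_SECRETS = ("OPENAI_API_KEY", "PINECONE_API_KEY")


def missing_secrets(env=None):
    """Return the names of required secrets that are unset or blank.

    `env` defaults to the process environment; passing a plain dict
    makes the check easy to exercise without touching real secrets.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_SECRETS if not env.get(name, "").strip()]


missing = missing_secrets()
if missing:
    print("Missing secrets: " + ", ".join(missing))
```

Failing fast with the names of the missing variables tends to be easier to debug than an authentication error raised later by the OpenAI or Pinecone client.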
+ 4) Build & Run
+
+ Spaces auto-install from requirements.txt.
+
+ Default CPU hardware is fine.
+
+ Entry point auto-detected from app.py.
+
+ On first start, the app will:
+
+ Ensure a Pinecone serverless index:
+ dds-insurance-index · cosine · 1536-dim · aws/us-east-1
+
+ Read and index documents from data/
+
+ Launch the Gradio UI
+
+ Your deployed link is simply the Space URL once its status is Running.
+
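Editor's sketch (not part of the commit): "ensuring" the index usually means creating it only when it does not already exist. The hypothetical `ensure_index` helper below shows that idea against any client object exposing `list_indexes().names()` and `create_index(...)`, which is the shape of the Pinecone Python client in recent versions. The index name, metric, and dimension come from the README; the actual code in app.py is not shown in this commit and may differ.

```python
INDEX_NAME = "dds-insurance-index"
DIMENSION = 1536  # must match the embedding model (text-embedding-3-small)
METRIC = "cosine"


def ensure_index(pc, spec, name=INDEX_NAME, dimension=DIMENSION, metric=METRIC):
    """Create the index if it is absent. Returns True if it was created.

    `pc` is a Pinecone-style client and `spec` the deployment spec to
    pass through; both are injected so the logic can run without network.
    """
    if name in pc.list_indexes().names():
        return False
    pc.create_index(name=name, dimension=dimension, metric=metric, spec=spec)
    return True


# Assumed usage with the real client (needs a valid PINECONE_API_KEY):
#   import os
#   from pinecone import Pinecone, ServerlessSpec
#   pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
#   ensure_index(pc, ServerlessSpec(cloud="aws", region="us-east-1"))
```

Because the check is idempotent, restarting the Space reuses the existing index instead of failing on a duplicate name.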
+ 5) Updating Documents Later
+
+ Upload/change files in data/
+
+ Click Restart on the Space so it re-indexes your documents
+
+ Troubleshooting (Common Issues)
+
+ “Missing PINECONE_API_KEY or OPENAI_API_KEY”
+ Add both secrets in Space → Settings → Variables and secrets.
+
+ Pinecone 401 / “Malformed domain”
+
+ Ensure you’re on pinecone-client>=5.0.1 (already in requirements.txt).
+
+ Use a valid Pinecone API key; no environment URL needed for serverless.
+
+ “Logo not found: data/dds_logo.png”
+ Upload an image named exactly dds_logo.png into the data/ folder.
+
+ “No documents found in data/”
+ Upload at least one doc (PDF/TXT/MD) into data/, then Restart the Space.
+
+ OpenAI authorization/rate-limit errors
+ Confirm key validity and model access; reduce usage if rate-limited.
+
+ Slow first load
+ First run installs dependencies and builds the index; later runs are faster.
+
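Editor's sketch (not part of the commit): the two data-folder errors above can be reproduced locally before uploading. The hypothetical `check_data_dir` helper below looks for the required logo and at least one document. The set of accepted suffixes is an assumption based on the "PDF/TXT/MD" wording; the app may accept more formats.

```python
from pathlib import Path

DOC_SUFFIXES = {".pdf", ".txt", ".md"}  # assumed from the README's "PDF/TXT/MD"
LOGO_NAME = "dds_logo.png"  # exact filename required by the README


def check_data_dir(data_dir="data"):
    """Return a list of human-readable problems found in the data folder."""
    root = Path(data_dir)
    problems = []
    if not (root / LOGO_NAME).is_file():
        problems.append(f"Logo not found: {data_dir}/{LOGO_NAME}")
    has_docs = root.is_dir() and any(
        p.is_file() and p.suffix.lower() in DOC_SUFFIXES for p in root.iterdir()
    )
    if not has_docs:
        problems.append(f"No documents found in {data_dir}/")
    return problems


for problem in check_data_dir("data"):
    print(problem)
```

An empty list means both checks passed; the logo itself is a .png, so it never counts as a document.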
+ Manual Test Checklist
+
+ Ask a question clearly answered in your docs → response should quote that knowledge.
+
+ Ask something not in your docs → bot should say it can’t find it.
+
+ Adjust TOP_K in app.py to see how answer completeness changes.
+
+ Requirements (from requirements.txt)
+ gradio>=4.44.0
+ pinecone-client>=5.0.1
+ openai>=1.51.0
+ llama-index>=0.11.0
+ llama-index-vector-stores-pinecone>=0.3.0
+ llama-index-embeddings-openai>=0.3.0
+ llama-index-llms-openai>=0.2.0
+ tiktoken>=0.7.0
+
+ Customization Ideas
+
+ Swap LLMs by editing LLM_MODEL.
+
+ Add a file uploader to refresh docs from the UI.
+
+ Add metadata filters (e.g., policy type).
+
+ Log queries to refine the FAQ list.
+
+ License
+
+ Add your chosen license (e.g., MIT) as LICENSE.
+
+ Acknowledgments
+
+ Thanks to LlamaIndex, Pinecone, OpenAI, and Gradio for the tooling that makes this simple and reproducible.
gitattributes.txt ADDED
@@ -0,0 +1,38 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
+ insurance.pdf filter=lfs diff=lfs merge=lfs -text
+ data/insurance.pdf filter=lfs diff=lfs merge=lfs -text
+ data/dds_logo.png filter=lfs diff=lfs merge=lfs -text
requirements.txt ADDED
@@ -0,0 +1,8 @@
+ gradio>=4.44.0
+ pinecone-client>=5.0.1
+ openai>=1.51.0
+ llama-index>=0.11.0
+ llama-index-vector-stores-pinecone>=0.3.0
+ llama-index-embeddings-openai>=0.3.0
+ llama-index-llms-openai>=0.2.0
+ tiktoken>=0.7.0