---
title: PDF Chatbot
emoji: 📘
colorFrom: blue
colorTo: purple
sdk: streamlit
app_port: 7860
pinned: false
license: mit
---

# 📘 PDF RAG Chatbot (Groq + LangChain)

A **Retrieval-Augmented Generation (RAG)** application that runs fully on **CPU (Hugging Face Free Tier)** and allows users to:

- Upload a **PDF**
- Ask questions that are answered **only from the PDF content**
- Get answers grounded in the document and generated by **Groq LLMs**

---

## 🚀 Features

- 📄 PDF upload & processing
- ✂️ Intelligent text chunking
- 🔍 Semantic search using embeddings
- 🧠 Context-aware LLM responses
- 🧹 Endpoints for clearing memory and checking service health
- ⚡ Fast inference via **Groq API**

---

## 🧱 Tech Stack

- **Frontend**: Streamlit
- **Backend**: FastAPI
- **LLM**: Groq (`llama-3.1-8b-instant`)
- **Embeddings**: `all-MiniLM-L6-v2`
- **Vector DB**: Chroma (in-memory)
- **Framework**: LangChain
- **Deployment**: Docker + Hugging Face Spaces

---

## 🧪 How It Works (RAG Pipeline)

1. Upload PDF
2. Split text into chunks
3. Generate embeddings
4. Store in vector database
5. Retrieve relevant chunks
6. Generate answer using Groq LLM
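
The retrieval half of the pipeline (steps 2 to 5) can be illustrated with a small standard-library sketch. It is not the app's implementation: the word-count `embed` function is a toy stand-in for `all-MiniLM-L6-v2`, the plain Python list stands in for Chroma, and all function names and default sizes are hypothetical. Step 6, the call to the Groq LLM, is omitted.

```python
import math
import re
from collections import Counter


def split_into_chunks(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Step 2: split text into overlapping character windows."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


def embed(text: str) -> Counter:
    """Step 3 (toy version): represent text as lowercase word counts.

    Assumption: a real deployment would call a sentence-embedding model here.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Steps 3-5: embed the chunks, keep them in an in-memory index,
    and return the top_k chunks most similar to the question."""
    index = [(chunk, embed(chunk)) for chunk in chunks]  # stand-in for a vector DB
    query_vector = embed(question)
    ranked = sorted(
        index,
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [chunk for chunk, _ in ranked[:top_k]]
```

In a full pipeline, the chunks returned by `retrieve` would be passed to the LLM as context together with the user's question.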

---

## 🖥️ Usage

1. Upload a PDF file
2. Ask questions related to the document
3. If the answer is not in the PDF, the assistant will reply:
   > **"I cannot find this in the PDF."**
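
One common way to get this fallback behavior is to state it explicitly in the prompt. The sketch below shows the idea; the prompt wording and the names `FALLBACK_REPLY` and `build_prompt` are assumptions for illustration, not the app's actual prompt.

```python
FALLBACK_REPLY = "I cannot find this in the PDF."


def build_prompt(context_chunks: list[str], question: str) -> str:
    """Assemble a prompt that restricts the model to the retrieved context."""
    context = "\n\n".join(context_chunks)
    return (
        "Answer the question using only the context below.\n"
        f'If the answer is not in the context, reply exactly: "{FALLBACK_REPLY}"\n\n'
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )
```

The resulting string would be sent to the LLM as the user or system message, with `context_chunks` coming from the retrieval step.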

---

## 🔐 Environment Variables

The following secret **must** be added in Hugging Face Spaces:

| Variable | Description |
|--------|------------|
| `GROQ_API_KEY` | Groq API key |

> ⚠️ Do NOT commit `.env` files to the repository.
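
Secrets configured in a Space are exposed to the running app as environment variables. A minimal sketch of reading the key and failing early when it is missing is shown below; the helper name `load_groq_api_key` and the error message are hypothetical.

```python
import os


def load_groq_api_key() -> str:
    """Return GROQ_API_KEY from the environment or raise a clear error."""
    api_key = os.environ.get("GROQ_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError(
            "GROQ_API_KEY is not set. Add it as a secret in the Space settings."
        )
    return api_key
```

Checking for the key at startup surfaces a missing secret immediately, rather than on the first question a user asks.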

---

## ❤️ Notes

- Runs on **CPU only** (no GPU required)
- Free-tier friendly
- First load may take a few minutes
- Space may sleep when idle

---

## 👨‍💻 Author

**Abhishek Saxena**  
M.Tech Data Science, IIT Roorkee  

---

## ⭐ If you like this project

Give it a ⭐ on Hugging Face and feel free to fork!