---
title: PDF Chatbot
emoji: 📘
colorFrom: blue
colorTo: purple
sdk: streamlit
app_port: 7860
pinned: false
license: mit
---

# 📘 PDF RAG Chatbot (Groq + LangChain)

A **Retrieval-Augmented Generation (RAG)** application that allows users to:

- Upload a **PDF**
- Ask questions based **only on the PDF content**
- Get accurate answers powered by **Groq LLMs**

The app runs fully on **CPU (Hugging Face Free Tier)**.

---

## 🚀 Features

- 📄 PDF upload & processing
- ✂️ Intelligent text chunking
- 🔍 Semantic search using embeddings
- 🧠 Context-aware LLM responses
- 🧹 Memory clear & health endpoints
- ⚡ Fast inference via **Groq API**

---

## 🧱 Tech Stack

- **Frontend**: Streamlit
- **Backend**: FastAPI
- **LLM**: Groq (`llama-3.1-8b-instant`)
- **Embeddings**: `all-MiniLM-L6-v2`
- **Vector DB**: Chroma (in-memory)
- **Frameworks**: LangChain
- **Deployment**: Docker + Hugging Face Spaces

---

## 🧪 How It Works (RAG Pipeline)

1. Upload PDF
2. Split text into chunks
3. Generate embeddings
4. Store in vector database
5. Retrieve relevant chunks
6. Generate answer using Groq LLM

---

## 🖥️ Usage

1. Upload a PDF file
2. Ask questions related to the document
3. If the answer is not in the PDF, the assistant replies:

   > **"I cannot find this in the PDF."**

---

## 🔐 Environment Variables

The following secret **must** be added in Hugging Face Spaces:

| Variable | Description |
|----------|-------------|
| `GROQ_API_KEY` | Groq API key |

> ⚠️ Do NOT commit `.env` files to the repository.

---

## ❤️ Notes

- Runs on **CPU only** (no GPU required)
- Free-tier friendly
- First load may take a few minutes
- Space may sleep when idle

---

## 👨‍💻 Author

**Abhishek Saxena**
M.Tech Data Science, IIT Roorkee

---

## ⭐ If you like this project

Give it a ⭐ on Hugging Face and feel free to fork!
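
## 🧩 Pipeline Sketch (Illustrative)

The six steps listed under "How It Works" can be illustrated without any external services. The sketch below is not this app's code. It swaps the `all-MiniLM-L6-v2` embeddings for a toy word-count vector, replaces the Chroma store with a plain Python list, and stops at building the prompt instead of calling Groq. The function names (`split_into_chunks`, `embed`, `retrieve`, `build_prompt`), the chunk sizes, and the prompt wording are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Fallback sentence quoted in the Usage section of this README.
FALLBACK = "I cannot find this in the PDF."


def split_into_chunks(text, chunk_size=200, overlap=50):
    """Step 2: split text into overlapping character windows.

    chunk_size and overlap are illustrative defaults, not the app's values.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
        start += step
    return chunks


def embed(text):
    """Step 3 (toy stand-in): lowercase word counts instead of a real model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, chunks, k=2):
    """Steps 4-5: rank stored chunks by similarity to the question, keep top k."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(question, context_chunks):
    """Step 6 (prompt only): restrict the model to the retrieved context."""
    context = "\n---\n".join(context_chunks)
    return (
        "Answer using only the context below. "
        f'If the answer is not in the context, reply exactly: "{FALLBACK}"\n\n'
        f"Context:\n{context}\n\nQuestion: {question}"
    )


if __name__ == "__main__":
    docs = [
        "Groq serves the llama model.",
        "Chroma stores the vectors in memory.",
        "Streamlit renders the upload form.",
    ]
    question = "Where are vectors stored?"
    print(build_prompt(question, retrieve(question, docs, k=1)))
```

In a real deployment the prompt string would be sent to the Groq model and the word-count vectors replaced by sentence embeddings. The ranking and prompt-assembly logic keeps the same shape either way.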