QUICKSTART.md
# 🚀 Quick Start Guide

Get the Multilingual QA System up and running in **5 minutes**!

---

## ⚡ Fast Track

```bash
# 1. Clone and enter the directory
git clone https://github.com/Praanshull/multilingual-qa-system.git
cd multilingual-qa-system

# 2. Install dependencies
pip install -r requirements.txt

# 3. Run the setup script (first time only)
python setup_project.py

# 4. Launch the application
python app.py
```

Then open **http://localhost:7860** in your browser!
---

## 📋 Detailed Steps

### Step 1: Prerequisites

Make sure you have:
- ✅ Python 3.8 or higher
- ✅ pip (the Python package manager)
- ✅ Git
- ✅ (Optional) A CUDA-capable GPU

Check your Python version:
```bash
python --version
```

### Step 2: Clone the Repository

```bash
git clone https://github.com/Praanshull/multilingual-qa-system.git
cd multilingual-qa-system
```

### Step 3: Create a Virtual Environment (Recommended)

**Windows:**
```bash
python -m venv venv
venv\Scripts\activate
```

**Mac/Linux:**
```bash
python -m venv venv
source venv/bin/activate
```

### Step 4: Install Dependencies

```bash
pip install -r requirements.txt
```

This installs:
- PyTorch
- Transformers
- Gradio
- PEFT
- and the other required packages

**Estimated time:** 2-5 minutes

### Step 5: Set Up the Project Structure

```bash
python setup_project.py
```

This script will:
1. Create the necessary directories
2. Move model files to the correct locations
3. Create configuration files
4. Verify everything is set up correctly

**Note:** If you haven't downloaded the model yet, you can either:
- download it from Google Drive (if shared), or
- let it be downloaded automatically on first run
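The directory-setup part of that script can be pictured with a short sketch. This is hypothetical: the directory names follow the project layout described in README.md, and the real `setup_project.py` also moves model files and writes config files.

```python
# Hypothetical sketch of the directory-setup step performed by setup_project.py.
# Directory names are assumed from the project layout; the real script does more.
from pathlib import Path

REQUIRED_DIRS = ["models/multilingual_model", "checkpoints", "logs"]

def create_layout(root: str = ".") -> list[str]:
    """Create any missing project directories; safe to run repeatedly."""
    created = []
    for d in REQUIRED_DIRS:
        path = Path(root) / d
        if not path.exists():
            path.mkdir(parents=True)
            created.append(d)
    return created
```

Because it only creates what is missing, re-running the setup script is harmless.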
### Step 6: Test the Model (Optional)

```bash
python test_model.py
```

This runs quick tests to verify that everything works.

### Step 7: Launch the Application

```bash
python app.py
```

You should see:
```
================================================================================
🚀 LAUNCHING APPLICATION
================================================================================
✅ Application launched successfully!
📱 Access the interface at: http://localhost:7860
```

### Step 8: Open in Browser

Open your web browser and go to:
```
http://localhost:7860
```

---

## 🎯 Using the Interface

### Ask Questions Tab

1. **Select Language:** Choose English 🇬🇧 or German 🇩🇪
2. **Enter Question:** Type your question
3. **Provide Context:** Paste the passage containing the answer
4. **Click "Get Answer":** The model will extract the answer

**Tips:**
- Keep the context under 300 words for best results
- Make sure the answer is explicitly stated in the context
- Use clear, direct questions

### Try Examples

1. Click on the "Try Examples" section
2. Select an example type (General Knowledge, Historical, Scientific)
3. Click "Load Example"
4. The question and context will be filled in automatically
5. Click "Get Answer"
---

## 🔧 Troubleshooting

### Model Not Found Error

**Problem:** `❌ Failed to load model: Model not found`

**Solution:**
```bash
# In app.py, make sure the model path points at the saved model:
#   MODEL_PATH = "models/multilingual_model"

# Or download the model:
python download_model.py
```

### CUDA Out of Memory

**Problem:** `RuntimeError: CUDA out of memory`

**Solution:** the model automatically falls back to CPU when GPU memory runs out. If you run inference in batches, reduce the batch size in the config.

### Port Already in Use

**Problem:** `OSError: [Errno 48] Address already in use`

**Solution:**
```bash
# Use a different port
python app.py --port 7861
```

Or kill the process using port 7860:
```bash
# Mac/Linux
lsof -ti:7860 | xargs kill -9

# Windows
netstat -ano | findstr :7860
taskkill /PID <PID> /F
```
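If you would rather check before launching, a small stdlib-only snippet (not part of the project, just a convenience) can tell you whether anything is already listening on the port:

```python
# Convenience snippet (stdlib only): check whether anything is already
# listening on the Gradio port before launching the app.
import socket

def port_free(port: int, host: str = "127.0.0.1") -> bool:
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        return s.connect_ex((host, port)) != 0  # non-zero: nothing is listening

if port_free(7860):
    print("Port 7860 is free")
```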
### Import Errors

**Problem:** `ModuleNotFoundError: No module named 'xxx'`

**Solution:**
```bash
# Reinstall dependencies
pip install -r requirements.txt --force-reinstall
```

---

## 🚀 Deploy to Cloud

### Deploy to Hugging Face Spaces (Free)

```bash
# Install Gradio
pip install gradio

# Deploy (from the project directory)
gradio deploy
```

### Deploy to Railway/Render

1. Create an account on Railway or Render
2. Connect your GitHub repository
3. Set the start command: `python app.py`
4. Deploy!

---

## 📚 Next Steps

Now that you have the app running:

1. ✅ Read the full [README.md](README.md) for detailed documentation
2. ✅ Check out [notebook/main.ipynb](notebook/main.ipynb) to see the training process
3. ✅ Explore the code in the `app/` directory
4. ✅ Try modifying the examples in `app/utils.py`
5. ✅ Add your own test cases in `test_model.py`

---

## 💡 Pro Tips

### For Development

```bash
# Enable debug mode
python app.py --debug

# Share publicly (generates a public URL)
python app.py --share

# Run on a specific port
python app.py --port 8080
```
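These flags suggest `app.py` parses CLI arguments and forwards them to Gradio's `launch()`. A hedged sketch of how that wiring could look (flag names inferred from the commands above, not read from `app.py`):

```python
# Hedged sketch of how app.py could wire CLI flags to Gradio's launch();
# the flag names here are assumptions inferred from the commands above.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--port", type=int, default=7860)
parser.add_argument("--share", action="store_true")
parser.add_argument("--debug", action="store_true")
args = parser.parse_args([])  # app.py would call parser.parse_args()

# In app.py these would be forwarded to Gradio:
# demo.launch(server_port=args.port, share=args.share, debug=args.debug)
print(args.port)  # 7860
```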
### For Production

```bash
# Use gunicorn for better performance
gunicorn app:app --workers 4 --bind 0.0.0.0:7860
```

---

## ❓ Need Help?

- 📖 Check [README.md](README.md) for detailed docs
- 🐛 Report issues on [GitHub Issues](https://github.com/Praanshull/multilingual-qa-system/issues)
- 💬 Ask questions in Discussions

---

<div align="center">

**Happy Question Answering! 🎉**

[⬆️ Back to Top](#-quick-start-guide)

</div>
README.md
# 🌍 Multilingual Question Answering System

A multilingual question answering system supporting **English 🇬🇧** and **German 🇩🇪**, built on **mBART-large-50** fine-tuned with **LoRA** (Low-Rank Adaptation).




---

## 📋 Table of Contents

- [Overview](#overview)
- [Key Features](#key-features)
- [Performance](#performance)
- [Installation](#installation)
- [Project Structure](#project-structure)
- [Usage](#usage)
- [Model Details](#model-details)
- [Training](#training)
- [Limitations](#limitations)
- [Future Improvements](#future-improvements)
- [Citation](#citation)
- [License](#license)

---

## 🎯 Overview

This project implements a **bilingual extractive question answering system** that can:
- Extract answers from English contexts
- Extract answers from German contexts
- Achieve **high accuracy** with minimal training data through transfer learning
- Run efficiently using **Parameter-Efficient Fine-Tuning (LoRA)**

### What is Extractive QA?
The model reads a passage (the context) and a question, then extracts the exact answer span from the context.

**Example:**
- **Question:** "What is the capital of France?"
- **Context:** "Paris is the capital and most populous city of France."
- **Answer:** "Paris"
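Because mBART is a sequence-to-sequence model, this task is framed as text-to-text: the question and context are packed into one input string and the model generates the answer span. A minimal sketch of one plausible framing (the exact prompt template used in training lives in `notebook/main.ipynb`):

```python
# Hypothetical text-to-text framing for extractive QA with a seq2seq model;
# the real prompt template may differ (see notebook/main.ipynb).
def format_example(question: str, context: str) -> str:
    # The model is trained to generate the answer span given this prompt.
    return f"question: {question} context: {context}"

prompt = format_example(
    "What is the capital of France?",
    "Paris is the capital and most populous city of France.",
)
print(prompt)
```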
---

## ✨ Key Features

✅ **Bilingual Support** - English and German
✅ **Fast Inference** - under 1 second per query on GPU
✅ **Memory Efficient** - uses LoRA (only 0.29% trainable parameters)
✅ **High Accuracy** - ~65% F1 score across both languages
✅ **Easy Deployment** - Gradio web interface included
✅ **Well Documented** - comprehensive code comments and README

---

## 📊 Performance

### Model Metrics

| Metric | English (SQuAD) | German (XQuAD) | Improvement |
|--------|----------------|----------------|-------------|
| **BLEU** | 37.79 | **43.12** | +5.33 |
| **ROUGE-L** | 0.6272 | **0.6622** | +0.035 |
| **Exact Match** | 43.60% | **48.74%** | +5.14% |
| **F1 Score** | 0.6329 | **0.6580** | +0.025 |
| **Avg (EM+F1)** | 0.5344 | **0.5727** | +0.038 |

### Key Insights
- 🏆 **German achieves 107.2% of English performance** despite having only ~5% of the training data
- 🌍 Strong **transfer learning** from English to German
- 💪 Better German scores demonstrate effective **cross-lingual adaptation**
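For reference, the Exact Match and F1 numbers above follow the standard SQuAD-style definitions, which can be sketched in a few lines (whitespace tokenisation only here; the official evaluation also lowercases and strips articles and punctuation before comparing):

```python
# Minimal sketch of SQuAD-style Exact Match and token-level F1.
# Simplification: whitespace tokenisation, no lowercasing/normalisation.
from collections import Counter

def exact_match(pred: str, gold: str) -> float:
    return float(pred.strip() == gold.strip())

def f1_score(pred: str, gold: str) -> float:
    p, g = pred.split(), gold.split()
    common = sum((Counter(p) & Counter(g)).values())  # overlapping tokens
    if common == 0:
        return 0.0
    precision, recall = common / len(p), common / len(g)
    return 2 * precision * recall / (precision + recall)

print(f1_score("the capital Paris", "Paris"))  # 0.5
```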
---

## 🚀 Installation

### Prerequisites
- Python 3.8+
- CUDA-capable GPU (recommended, 8GB+ VRAM)
- 16GB+ RAM

### Setup

1. **Clone the repository**
```bash
git clone https://github.com/Praanshull/multilingual-qa-system.git
cd multilingual-qa-system
```

2. **Create a virtual environment**
```bash
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

3. **Install dependencies**
```bash
pip install -r requirements.txt
```

4. **Download the model**
```bash
# Option 1: Download from your Google Drive
# (Replace with your actual model path)

# Option 2: Use Hugging Face (if uploaded)
# Will be downloaded automatically on first run
```

---

## 📁 Project Structure

```
Multilingual-QA-System/
├── app/
│   ├── __init__.py          # Package initialization
│   ├── model_loader.py      # Model loading logic
│   ├── inference.py         # Inference/prediction engine
│   ├── interface.py         # Gradio UI components
│   └── utils.py             # Utility functions
│
├── models/
│   └── multilingual_model/  # Saved model files
│       ├── adapter_config.json
│       ├── adapter_model.bin
│       ├── tokenizer_config.json
│       └── ...
│
├── checkpoints/             # Training checkpoints
│   ├── checkpoint-500/
│   ├── checkpoint-1000/
│   └── ...
│
├── logs/                    # Training logs
│   └── training.log
│
├── notebook/                # Original Jupyter notebook
│   └── main.ipynb
│
├── app.py                   # Main application entry point
├── requirements.txt         # Python dependencies
├── README.md                # This file
├── .gitignore               # Git ignore rules
└── LICENSE                  # MIT License
```
---

## 💻 Usage

### 1. Launch the Web Interface

```bash
python app.py
```

Then open your browser to **http://localhost:7860**

### 2. Programmatic Usage

```python
from app.model_loader import ModelLoader
from app.inference import QAInference

# Load the model
loader = ModelLoader(model_path="models/multilingual_model")
model, tokenizer = loader.load()

# Create the inference engine
qa = QAInference(model, tokenizer, loader.device)

# English example
answer, info = qa.answer_question(
    question="What is the capital of France?",
    context="Paris is the capital and most populous city of France.",
    language="English"
)
print(f"Answer: {answer}")

# German example
answer_de, info_de = qa.answer_question(
    question="Was ist die Hauptstadt von Deutschland?",
    context="Berlin ist die Hauptstadt von Deutschland.",
    language="German"
)
print(f"Antwort: {answer_de}")
```
### 3. API Server (Coming Soon)

```bash
# Launch the FastAPI server
python -m app.api --host 0.0.0.0 --port 8000
```

---

## 🧠 Model Details

### Architecture
- **Base Model:** `facebook/mbart-large-50-many-to-many-mmt`
  - 610M total parameters
  - Pre-trained on 50 languages
  - Sequence-to-sequence architecture

- **Fine-tuning Method:** LoRA (Low-Rank Adaptation)
  - Rank (r): 8
  - Alpha: 32
  - Target modules: `q_proj`, `k_proj`, `v_proj`
  - Only **1.77M trainable parameters** (0.29% of total)
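The 1.77M figure is consistent with the configuration above. A quick sanity computation, assuming the standard mBART-large dimensions (`d_model` = 1024, 12 encoder + 12 decoder layers, with q/k/v adapted in encoder self-attention, decoder self-attention and decoder cross-attention):

```python
# Back-of-the-envelope check of the LoRA parameter count; mBART-large
# dimensions are assumptions (d_model = 1024, 12 encoder + 12 decoder layers).
d_model = 1024
rank = 8
num_matrices = (12 + 12 + 12) * 3                 # three attention blocks x q/k/v
lora_params = num_matrices * 2 * rank * d_model   # each matrix gains A and B
print(lora_params)                    # 1769472 ~= 1.77M
print(round(lora_params / 610e6, 4))  # 0.0029  ~= 0.29% of 610M
```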
### Training Data
- **English:** SQuAD v1.1
  - 20,000 samples (of 87,599 available)
  - Balanced sampling across topics

- **German:** XQuAD (German)
  - ~950 samples (80% of the 1,190 available)
  - Cross-lingual evaluation dataset

### Hyperparameters
```python
{
    "learning_rate": 3e-4,
    "batch_size": 16,  # 2 per device * 8 gradient-accumulation steps
    "epochs": 3,
    "max_source_length": 256,
    "max_target_length": 64,
    "fp16": True,
    "optimizer": "AdamW",
    "weight_decay": 0.01,
}
```

---

## 🔧 Training

### Train from Scratch

```bash
# See notebook/main.ipynb for the full training pipeline
jupyter notebook notebook/main.ipynb
```

### Key Training Steps

1. **Data Preparation**
   - Load the SQuAD and XQuAD datasets
   - Convert to text-to-text format
   - Tokenize with the mBART tokenizer

2. **Model Setup**
   - Load the base mBART model
   - Apply the LoRA configuration
   - Configure language tokens

3. **Training**
   - English: 3 epochs (~2 hours on a T4 GPU)
   - German: 3 epochs (~30 minutes on a T4 GPU)
   - Total: ~2.5 hours

4. **Evaluation**
   - BLEU, ROUGE, Exact Match, F1
   - Cross-lingual performance analysis
---

## ⚠️ Limitations

### Current Constraints
1. **Long Context** - performance degrades on passages over 500 words
2. **Complex Questions** - multi-hop reasoning is not supported
3. **Answer Presence** - the answer must be explicitly stated in the context
4. **Languages** - only English and German are supported
5. **Training Data** - limited to 20K English + 1K German samples

### Why These Exist
- ✂️ **Context truncation** due to GPU memory constraints
- 🧮 **Simple architecture** optimized for extractive QA only
- ⚡ **Fast training** prioritized over maximum performance

---

## 🎯 Future Improvements

- [ ] Increase the context window to 512 tokens
- [ ] Add more languages (French, Spanish, Chinese)
- [ ] Implement answer confidence scoring
- [ ] Add data augmentation techniques
- [ ] Deploy as a REST API with FastAPI
- [ ] Create a Docker container for easy deployment
- [ ] Add an answer verification layer
- [ ] Support generative (non-extractive) answers

---

## 📚 Citation

If you use this project in your research or work, please cite:

```bibtex
@software{verma2025multilingual_qa,
  author    = {Verma, Praanshull},
  title     = {Multilingual Question Answering System with mBART and LoRA},
  year      = {2025},
  publisher = {GitHub},
  url       = {https://github.com/Praanshull/multilingual-qa-system}
}
```

---

## 📄 License

This project is licensed under the **MIT License** - see the [LICENSE](LICENSE) file for details.

---

## 👨‍💻 Author

**Praanshull Verma**
- GitHub: [@Praanshull](https://github.com/Praanshull)
- LinkedIn: [Your LinkedIn]

---

## 🙏 Acknowledgments

- **Hugging Face** - for the Transformers library and model hosting
- **Facebook AI** - for the mBART pre-trained model
- **Stanford NLP** - for the SQuAD dataset
- **Google Research** - for the XQuAD dataset
- **PEFT Team** - for the LoRA implementation

---

## 📞 Support

If you encounter any issues or have questions:

1. Check the existing [Issues](https://github.com/Praanshull/multilingual-qa-system/issues)
2. Create a new issue with a detailed description
3. Reach out on LinkedIn

---

<div align="center">

**Built with ❤️ using PyTorch, Transformers, and Gradio**

⭐ Star this repo if you find it helpful!

</div>