# 🌍 Multilingual Question Answering System
A bilingual question answering system supporting English 🇬🇧 and German 🇩🇪, built on mBART-large-50 fine-tuned with LoRA (Low-Rank Adaptation).
## 📋 Table of Contents
- Overview
- Key Features
- Performance
- Installation
- Project Structure
- Usage
- Model Details
- Training
- Limitations
- Future Improvements
- Citation
- License
## 🎯 Overview
This project implements a bilingual extractive question answering system that can:
- Extract answers from English contexts
- Extract answers from German contexts
- Achieve high accuracy with minimal training data through transfer learning
- Run efficiently using Parameter-Efficient Fine-Tuning (LoRA)
### What is Extractive QA?
The model reads a passage (context) and a question, then extracts the exact answer span from the context.
Example:
- Question: "What is the capital of France?"
- Context: "Paris is the capital and most populous city of France."
- Answer: "Paris"
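Because the underlying model is sequence-to-sequence, extraction is framed as text generation. A minimal sketch of this text-to-text framing (the exact prompt template is an assumption, not taken from the repo):

```python
# Hypothetical text-to-text framing for seq2seq extractive QA.
question = "What is the capital of France?"
context = "Paris is the capital and most populous city of France."

# The source sequence packs question and context into one string...
source = f"question: {question} context: {context}"
# ...and the model is trained to generate the answer span verbatim.
target = "Paris"
```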
## ✨ Key Features
- ✅ **Bilingual Support** - English and German
- ✅ **Fast Inference** - <1 second per query on GPU
- ✅ **Memory Efficient** - Uses LoRA (only 0.29% trainable parameters)
- ✅ **High Accuracy** - F1 scores of 0.63 (English) and 0.66 (German)
- ✅ **Easy Deployment** - Gradio web interface included
- ✅ **Well Documented** - Comprehensive code comments and README
## 📊 Performance
### Model Metrics
| Metric | English (SQuAD) | German (XQuAD) | Improvement (DE − EN) |
|---|---|---|---|
| BLEU | 37.79 | 43.12 | +5.33 |
| ROUGE-L | 0.6272 | 0.6622 | +0.035 |
| Exact Match | 43.60% | 48.74% | +5.14% |
| F1 Score | 0.6329 | 0.6580 | +0.025 |
| Avg (EM+F1) | 0.5344 | 0.5727 | +0.038 |
### Key Insights
- German reaches 107.2% of English performance despite being fine-tuned on only ~5% as much data
- Strong transfer learning from English to German
- The higher German scores demonstrate effective cross-lingual adaptation
## 🚀 Installation

### Prerequisites
- Python 3.8+
- CUDA-capable GPU (recommended, 8GB+ VRAM)
- 16GB+ RAM
### Setup

**1. Clone the repository**

```bash
git clone https://github.com/Praanshull/multilingual-qa-system.git
cd multilingual-qa-system
```

**2. Create a virtual environment**

```bash
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

**3. Install dependencies**

```bash
pip install -r requirements.txt
```

**4. Download the model**

```bash
# Option 1: Download from your Google Drive
# (Replace with your actual model path)

# Option 2: Use Hugging Face (if uploaded)
# Will be automatically downloaded on first run
```
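If the adapter files are published to the Hugging Face Hub or already sit in `models/multilingual_model`, loading them with PEFT could look like the following sketch (the tokenizer path is an assumption):

```python
# Sketch: attach the LoRA adapter to the base mBART model with PEFT.
from transformers import MBartForConditionalGeneration, MBart50TokenizerFast
from peft import PeftModel

# Base weights come from the Hub; the small adapter is loaded on top.
base = MBartForConditionalGeneration.from_pretrained(
    "facebook/mbart-large-50-many-to-many-mmt"
)
model = PeftModel.from_pretrained(base, "models/multilingual_model")
tokenizer = MBart50TokenizerFast.from_pretrained("models/multilingual_model")
```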
## 📁 Project Structure

```
Multilingual-QA-System/
├── app/
│   ├── __init__.py          # Package initialization
│   ├── model_loader.py      # Model loading logic
│   ├── inference.py         # Inference/prediction engine
│   ├── interface.py         # Gradio UI components
│   └── utils.py             # Utility functions
│
├── models/
│   └── multilingual_model/  # Saved model files
│       ├── adapter_config.json
│       ├── adapter_model.bin
│       ├── tokenizer_config.json
│       └── ...
│
├── checkpoints/             # Training checkpoints
│   ├── checkpoint-500/
│   ├── checkpoint-1000/
│   └── ...
│
├── logs/                    # Training logs
│   └── training.log
│
├── notebook/                # Original Jupyter notebook
│   └── main.ipynb
│
├── app.py                   # Main application entry point
├── requirements.txt         # Python dependencies
├── README.md                # This file
├── .gitignore               # Git ignore rules
└── LICENSE                  # MIT License
```
## 💻 Usage

### 1. Launch the Web Interface

```bash
python app.py
```

Then open your browser to http://localhost:7860.
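For reference, the Gradio wiring in `app/interface.py` boils down to something like this sketch (the widget layout is an assumption; `ModelLoader` and `QAInference` are the classes shown in the next section):

```python
# Minimal sketch of the Gradio interface (layout details assumed).
import gradio as gr
from app.model_loader import ModelLoader
from app.inference import QAInference

loader = ModelLoader(model_path="models/multilingual_model")
model, tokenizer = loader.load()
qa = QAInference(model, tokenizer, loader.device)

def answer(question, context, language):
    result, _info = qa.answer_question(
        question=question, context=context, language=language
    )
    return result

demo = gr.Interface(
    fn=answer,
    inputs=[
        gr.Textbox(label="Question"),
        gr.Textbox(label="Context", lines=6),
        gr.Radio(["English", "German"], label="Language"),
    ],
    outputs=gr.Textbox(label="Answer"),
)
demo.launch(server_port=7860)
```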
### 2. Programmatic Usage

```python
from app.model_loader import ModelLoader
from app.inference import QAInference

# Load model
loader = ModelLoader(model_path="models/multilingual_model")
model, tokenizer = loader.load()

# Create inference engine
qa = QAInference(model, tokenizer, loader.device)

# English example
answer, info = qa.answer_question(
    question="What is the capital of France?",
    context="Paris is the capital and most populous city of France.",
    language="English"
)
print(f"Answer: {answer}")

# German example
answer_de, info_de = qa.answer_question(
    question="Was ist die Hauptstadt von Deutschland?",
    context="Berlin ist die Hauptstadt von Deutschland.",
    language="German"
)
print(f"Antwort: {answer_de}")
```
### 3. API Server (Coming Soon)

```bash
# Launch FastAPI server
python -m app.api --host 0.0.0.0 --port 8000
```
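Nothing is implemented yet, but a minimal `app/api.py` could look like this sketch (every name here is hypothetical):

```python
# Hypothetical app/api.py: FastAPI wrapper around the inference engine.
from fastapi import FastAPI
from pydantic import BaseModel
from app.model_loader import ModelLoader
from app.inference import QAInference

app = FastAPI()
loader = ModelLoader(model_path="models/multilingual_model")
model, tokenizer = loader.load()
qa = QAInference(model, tokenizer, loader.device)

class QARequest(BaseModel):
    question: str
    context: str
    language: str = "English"

@app.post("/answer")
def answer(req: QARequest):
    result, info = qa.answer_question(
        question=req.question, context=req.context, language=req.language
    )
    return {"answer": result}
```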
## 🧠 Model Details

### Architecture

**Base Model:** `facebook/mbart-large-50-many-to-many-mmt`
- 610M total parameters
- Pre-trained on 50 languages
- Sequence-to-sequence architecture

**Fine-tuning Method:** LoRA (Low-Rank Adaptation)
- Rank (r): 8
- Alpha: 32
- Target modules: `q_proj`, `k_proj`, `v_proj`
- Only 1.77M trainable parameters (0.29% of total)
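In PEFT terms, this configuration corresponds roughly to the following sketch (the task type and use of `get_peft_model` are assumptions; dropout is left at its default):

```python
# Sketch: LoRA configuration matching the numbers above.
from transformers import MBartForConditionalGeneration
from peft import LoraConfig, TaskType, get_peft_model

base = MBartForConditionalGeneration.from_pretrained(
    "facebook/mbart-large-50-many-to-many-mmt"
)
lora_config = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    r=8,                                            # rank
    lora_alpha=32,                                  # scaling factor
    target_modules=["q_proj", "k_proj", "v_proj"],  # attention projections
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # should report ~1.77M trainable (~0.29%)
```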
### Training Data

**English:** SQuAD v1.1
- 20,000 samples (from 87,599 available)
- Balanced sampling across topics

**German:** XQuAD (German)
- ~950 samples (80% of the 1,190 available)
- Cross-lingual evaluation dataset
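Both datasets are available through the Hugging Face `datasets` library; a loading sketch matching the sample counts above (the shuffle seed is an assumption):

```python
# Sketch: pull SQuAD v1.1 and the German XQuAD split.
from datasets import load_dataset

# 20,000 of the 87,599 SQuAD training examples
squad = load_dataset("squad", split="train").shuffle(seed=42).select(range(20_000))

# XQuAD ships a single 1,190-example split; take ~80% for fine-tuning
xquad_de = load_dataset("xquad", "xquad.de", split="validation")
xquad_train = xquad_de.select(range(int(0.8 * len(xquad_de))))  # ~950 examples
```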
### Hyperparameters

```python
{
    "learning_rate": 3e-4,
    "batch_size": 16,  # 2 per device × 8 gradient accumulation steps
    "epochs": 3,
    "max_source_length": 256,
    "max_target_length": 64,
    "fp16": True,
    "optimizer": "AdamW",
    "weight_decay": 0.01,
}
```
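Mapped onto `Seq2SeqTrainingArguments`, that dictionary would look roughly like this (output and logging settings are assumptions):

```python
# Sketch: the hyperparameters above as Hugging Face training arguments.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="checkpoints",
    learning_rate=3e-4,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,   # effective batch size of 16
    num_train_epochs=3,
    weight_decay=0.01,
    fp16=True,
    predict_with_generate=True,      # generate answers during evaluation
)
```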
## 🔧 Training

### Train from Scratch

```bash
# See notebook/main.ipynb for the full training pipeline
jupyter notebook notebook/main.ipynb
```

### Key Training Steps
**1. Data Preparation**
- Load the SQuAD and XQuAD datasets
- Convert them to text-to-text format
- Tokenize with the mBART tokenizer (see the sketch below)
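A tokenization sketch, assuming the `en_XX`/`de_DE` language codes that mBART-50 uses and the simple prompt template from the Overview:

```python
# Sketch: tokenize one example for seq2seq training.
from transformers import MBart50TokenizerFast

tokenizer = MBart50TokenizerFast.from_pretrained(
    "facebook/mbart-large-50-many-to-many-mmt"
)
tokenizer.src_lang = "en_XX"  # use "de_DE" for German examples

source = ("question: What is the capital of France? "
          "context: Paris is the capital and most populous city of France.")
model_inputs = tokenizer(source, max_length=256, truncation=True)
model_inputs["labels"] = tokenizer(
    text_target="Paris", max_length=64, truncation=True
).input_ids
```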
**2. Model Setup**
- Load the base mBART model
- Apply the LoRA configuration
- Configure language tokens
**3. Training**
- English: 3 epochs (~2 hours on a T4 GPU)
- German: 3 epochs (~30 minutes on a T4 GPU)
- Total: ~2.5 hours
**4. Evaluation**
- BLEU, ROUGE-L, Exact Match, and F1 (a simplified EM/F1 sketch follows)
- Cross-lingual performance analysis
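For reference, the two SQuAD-style metrics reduce to a token-overlap computation; a simplified sketch (the official script additionally normalizes articles and punctuation):

```python
# Sketch: exact match and token-level F1 for a single prediction.
from collections import Counter

def exact_match(prediction: str, reference: str) -> float:
    return float(prediction.strip().lower() == reference.strip().lower())

def f1_score(prediction: str, reference: str) -> float:
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    common = Counter(pred_tokens) & Counter(ref_tokens)  # per-token overlap
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

print(f1_score("the capital Paris", "Paris"))  # 0.5
```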
## ⚠️ Limitations

### Current Constraints
- Long Context - Performance degrades with passages >500 words
- Complex Questions - Multi-hop reasoning not supported
- Answer Presence - Answer must be explicitly stated in context
- Languages - Only English and German supported
- Training Data - Limited to 20K English + 1K German samples
### Why These Exist
- ⚖️ Context truncation due to GPU memory constraints
- 🧮 Simple architecture optimized for extractive QA only
- ⚡ Fast training prioritized over maximum performance
## 🎯 Future Improvements
- Increase context window to 512 tokens
- Add more languages (French, Spanish, Chinese)
- Implement answer confidence scoring
- Add data augmentation techniques
- Deploy as REST API with FastAPI
- Create Docker container for easy deployment
- Add answer verification layer
- Support generative (non-extractive) answers
## 📚 Citation
If you use this project in your research or work, please cite:
```bibtex
@software{verma2025multilingual_qa,
  author    = {Verma, Praanshull},
  title     = {Multilingual Question Answering System with mBART and LoRA},
  year      = {2025},
  publisher = {GitHub},
  url       = {https://github.com/Praanshull/multilingual-qa-system}
}
```
## 📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
## 👨‍💻 Author
Praanshull Verma
- GitHub: @Praanshull
- LinkedIn: [Your LinkedIn]
## 🙏 Acknowledgments
- Hugging Face - For Transformers library and model hosting
- Facebook AI - For mBART pre-trained model
- Stanford NLP - For SQuAD dataset
- Google Research - For XQuAD dataset
- PEFT Team - For LoRA implementation
## 📞 Support
If you encounter any issues or have questions:
- Check the existing Issues
- Create a new issue with a detailed description
- Reach out on LinkedIn
Built with ❤️ using PyTorch, Transformers, and Gradio

⭐ Star this repo if you find it helpful!