---
title: Agentic Health Coach Medgemma
emoji: 💬
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.33.1
app_file: app.py
pinned: true
tags:
  - agent-demo-track
license: mit
short_description: Agentic MedGemma health coach with vLLM.
---

[YouTube explainer (7 min)](https://youtu.be/NwTKnTHfZAg)

N.B. The Modal backend has been turned off since the hackathon ended. To host your own Modal LLM endpoint, refer to the `.py` files in this repository.

# MedGemma Agent: AI-Powered Medical Assistant

## 🏥 Overview

MedGemma Agent is an AI-powered medical assistant that provides accessible, accurate medical information to patients and non-medical professionals. Built on Google's MedGemma model, the application combines state-of-the-art medical language understanding with multimodal capabilities to deliver clear, concise, and reliable medical insights.

## ✨ Key Features

- **Multimodal Understanding**: Process both text queries and medical images
- **Real-time Responses**: Stream responses for an interactive experience
- **Wikipedia Integration**: Supplementary medical context from Wikipedia
- **User-friendly Interface**: Clean, modern UI with example queries
- **Secure API**: Protected endpoints with API key authentication

## 🚀 Technical Implementation

### Backend Architecture

The application is built using:

- **Modal**: For serverless deployment and GPU acceleration
- **FastAPI**: For robust API endpoints
- **vLLM**: For efficient model inference
- **MedGemma-4B**: Fine-tuned medical language model
- **Wikipedia API**: For additional medical context
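
As a rough sketch of how these pieces could fit together, the snippet below wires up a Modal app with a GPU-backed function. The app name, GPU type, image contents, and function body are illustrative assumptions, not the Space's actual code.

```python
# Hypothetical Modal wiring -- names and parameters are illustrative.
def build_modal_app():
    import modal  # deferred import so the sketch is readable without Modal installed

    # Container image with the inference dependencies pre-installed
    image = modal.Image.debian_slim(python_version="3.11").pip_install(
        "vllm", "fastapi"
    )
    app = modal.App("medgemma-agent", image=image)

    @app.function(gpu="A10G", timeout=600)
    def generate(prompt: str) -> str:
        # Real code would run vLLM inference here and return the completion
        raise NotImplementedError

    return app
```

Deploying a file structured like this would then be a matter of running `modal deploy` on it.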

### Key Components

1. **Model Deployment**
   - Utilizes Modal's GPU-accelerated containers
   - Implements efficient model loading with vLLM
   - Supports bfloat16 precision for optimal performance

2. **API Layer**
   - Streaming responses for real-time interaction
   - Secure API key authentication
   - Base64 image processing for multimodal inputs

3. **Frontend Interface**
   - Built with Gradio for seamless user interaction
   - Custom CSS theming for professional appearance
   - Example queries for common medical scenarios
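
To illustrate the streaming behaviour described under the API layer, here is a stdlib-only generator that formats tokens as server-sent-event chunks; in the real backend something like this would be wrapped in FastAPI's `StreamingResponse`. The chunk schema is an assumption, not the Space's actual wire format.

```python
import json

def stream_tokens(tokens):
    """Yield server-sent-event chunks so a client can render text incrementally."""
    for tok in tokens:
        # Each chunk carries one token delta, JSON-encoded on a "data:" line
        yield f"data: {json.dumps({'delta': tok})}\n\n"
    # Sentinel chunk tells the client the stream is finished
    yield "data: [DONE]\n\n"

chunks = list(stream_tokens(["Strokes ", "need ", "urgent care."]))
```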

## 🛠️ Usage

1. **Text Queries**
   - Ask medical questions in natural language
   - Get clear, patient-friendly explanations
   - Example: "What are the symptoms of a stroke?"

2. **Image Analysis**
   - Upload medical images for analysis
   - Get AI-powered insights about the image
   - Supports common medical image formats
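
To make the request flow concrete, here is a stdlib-only sketch of how a client might package a question plus an optional image. The field names (`prompt`, `image_b64`) are illustrative assumptions, not the Space's actual schema.

```python
import base64
import json

def build_payload(question, image_bytes=None):
    """Build a JSON request body; attach the image as base64 when present."""
    payload = {"prompt": question}
    if image_bytes is not None:
        # Base64 keeps binary image data safe inside a JSON body
        payload["image_b64"] = base64.b64encode(image_bytes).decode("ascii")
    return json.dumps(payload)

body = build_payload("What are the symptoms of a stroke?", b"\x89PNG fake bytes")
```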

## 🔒 Security

- API key authentication for all requests
- Secure image processing
- Protected model endpoints
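
A minimal sketch of the API-key check, assuming the key arrives with each request; `hmac.compare_digest` makes the comparison constant-time. The names and placeholder key are illustrative.

```python
import hmac

EXPECTED_API_KEY = "example-key"  # placeholder -- the real key would live in a secret

def is_authorized(provided_key: str) -> bool:
    # Constant-time comparison avoids leaking the key via timing differences
    return hmac.compare_digest(provided_key, EXPECTED_API_KEY)
```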

## 🏗️ Technical Stack

- **Backend**: Modal, FastAPI, vLLM
- **Frontend**: Gradio
- **Model**: MedGemma-4B (unsloth/medgemma-4b-it-unsloth-bnb-4bit)
- **Additional Tools**: Wikipedia API for medical context

## 🎯 Performance

- Optimized for low-latency responses
- GPU-accelerated inference
- Efficient memory utilization with 4-bit quantization
- Maximum context length of 8192 tokens
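
These settings could map onto vLLM engine arguments roughly as follows. Treat this as a sketch of plausible values (the `dtype` and `quantization` flags are assumptions inferred from the checkpoint named in the stack), not the Space's exact configuration.

```python
# Plausible vLLM settings matching the figures above; values are assumptions.
ENGINE_ARGS = {
    "model": "unsloth/medgemma-4b-it-unsloth-bnb-4bit",
    "dtype": "bfloat16",             # compute precision noted under Model Deployment
    "quantization": "bitsandbytes",  # 4-bit bnb checkpoint for lower memory use
    "max_model_len": 8192,           # maximum context length stated above
}

def load_engine():
    from vllm import LLM  # deferred: needs a GPU environment with vLLM installed
    return LLM(**ENGINE_ARGS)
```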

## 🤝 Contributing

We welcome contributions! Please feel free to submit issues and pull requests.

## 📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

---

Built with ❤️ for the Hugging Face Spaces Hackathon.