Space: Krishna346/Youtube-summarizer-api

Commit dfbb2da by bskrishna2006 (committed 6 days ago, 0 parents)

Initial backend deployment

Files changed (12)

.gitignore +28 -0
DEPLOY.md +160 -0
Dockerfile +49 -0
README.md +43 -0
app.py +459 -0
config.py +169 -0
requirements.txt +48 -0
services/__init__.py +1 -0
services/speech_to_text.py +303 -0
services/summarizer.py +141 -0
services/transcript.py +241 -0
services/translation.py +330 -0

.gitignore ADDED

	@@ -0,0 +1,28 @@

+# Python
+__pycache__/
+*.py[cod]
+*$py.class
+*.so
+.Python
+env/
+venv/
+.venv/
+# Environment
+.env
+.env.local
+# IDE
+.vscode/
+.idea/
+*.swp
+# Logs
+*.log
+# Temp files
+temp/
+*.tmp
+# Models cache (will be in container)
+.cache/

DEPLOY.md ADDED

	@@ -0,0 +1,160 @@

+# 🚀 Deploying to Hugging Face Spaces
+This guide will help you deploy the YouTube Summarizer API to Hugging Face Spaces for FREE cloud hosting.
+## Prerequisites
+1. A [Hugging Face account](https://huggingface.co/join)
+2. Git installed on your system
+3. Your Groq API key (from https://console.groq.com)
+---
+## Step 1: Create a Hugging Face Space
+1. Go to [huggingface.co/new-space](https://huggingface.co/new-space)
+2. Fill in the form:
+   - **Owner**: Select your username
+   - **Space name**: `youtube-summarizer-api`
+   - **License**: MIT
+   - **SDK**: Select **Docker**
+   - **Hardware**: CPU basic (Free)
+   - Leave other options as default
+3. Click **"Create Space"**
+---
+## Step 2: Clone and Push
+Open PowerShell/Terminal and run:
+```powershell
+# Navigate to the deploy folder
+cd "c:\Users\Krishna\Desktop\Updated Yt summarizer\backend\deploy"
+# Initialize git repository
+git init
+# Add Hugging Face as remote (replace YOUR_USERNAME with your HF username)
+git remote add origin https://huggingface.co/spaces/YOUR_USERNAME/youtube-summarizer-api
+# Add all files
+git add .
+# Commit
+git commit -m "Initial deployment"
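+# Make sure the branch is named main (older Git versions default to master)
+git branch -M main
+# Push to Hugging Face (you'll be prompted for credentials).
+# If the push is rejected because the new Space already has an initial commit
+# (an auto-generated README.md), overwrite it with: git push --force -u origin main
+git push -u origin main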
+```
+**For authentication**, you'll need to use:
+- Username: Your Hugging Face username
+- Password: A Hugging Face Access Token with **write** access (create one at Settings → Access Tokens)
+---
+## Step 3: Add Your Groq API Key
+1. Go to your Space: `https://huggingface.co/spaces/YOUR_USERNAME/youtube-summarizer-api`
+2. Click **Settings** (gear icon)
+3. Scroll to **Variables and secrets**
+4. Click **"New secret"** and add:
+   - **Name**: `GROQ_API_KEY`
+   - **Value**: Your Groq API key
+5. Click **Save**
+---
+## Step 4: Wait for Build
+The first build takes **10-15 minutes** because it:
+1. Builds the Docker image
+2. Installs all dependencies
+3. Sets up the environment
+You can watch the build progress in the "Logs" tab of your Space.
+---
+## Step 5: Test Your API
+Once the status shows **"Running"**, your API is live!
+### Test health check:
+```bash
+curl https://YOUR_USERNAME-youtube-summarizer-api.hf.space/api/health
+```
+### Test full pipeline:
+```bash
+curl -X POST https://YOUR_USERNAME-youtube-summarizer-api.hf.space/api/process \
+  -H "Content-Type: application/json" \
+  -d '{"url": "https://www.youtube.com/watch?v=jNQXAC9IVRw", "summary_type": "general"}'
+```
+---
+## Step 6: Update Your Frontend
+Update your frontend `.env` file:
+```env
+VITE_API_URL=https://YOUR_USERNAME-youtube-summarizer-api.hf.space
+```
+Then restart your frontend dev server.
+---
+## Troubleshooting
+### Build Failed?
+- Check the "Logs" tab for error messages
+- Make sure all files are properly committed
+### API Not Responding?
+- The Space may be sleeping (wakes up on first request, takes ~30s)
+- Check if GROQ_API_KEY secret is set
+### Out of Memory?
+- The free tier has 16GB RAM, which should be enough
+- Consider upgrading to paid tier if needed
+---
+## API Endpoints Summary
+| Endpoint | Method | Description |
+|----------|--------|-------------|
+| `/` | GET | Health check |
+| `/api/health` | GET | Detailed status |
+| `/api/languages` | GET | Supported languages |
+| `/api/transcript` | POST | Extract transcript |
+| `/api/translate` | POST | Translate text |
+| `/api/summarize` | POST | Generate summary |
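+| `/api/process` | POST | Full pipeline |
+| `/api/detect-language` | POST | Detect text language |
+| `/api/warmup` | POST | Pre-load ML models |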
+---
+## Cost
+| Tier | Cost | RAM | GPU |
+|------|------|-----|-----|
+| **Free** | $0 | 16GB | CPU only |
+| Upgraded | $0.60/hr | 16GB | GPU |
+The free tier is sufficient for this application!
+---
+## Need Help?
+- Hugging Face Docs: https://huggingface.co/docs/hub/spaces
+- Docker Spaces: https://huggingface.co/docs/hub/spaces-sdks-docker
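The Step 5 checks can also be run from Python. This is a minimal sketch using only the standard library; the helper names `build_process_request` and `process_video` are hypothetical (not part of this repository), and the default 600-second timeout is an assumption chosen to match the gunicorn `--timeout` in the Dockerfile, since the first request may wait for model downloads.

```python
import json
import urllib.request
from typing import Optional


def build_process_request(base_url: str, video_url: str,
                          summary_type: str = "general",
                          target_language: Optional[str] = None) -> urllib.request.Request:
    """Build a POST request for the /api/process endpoint (hypothetical helper)."""
    payload = {"url": video_url, "summary_type": summary_type}
    if target_language:
        # Optional: e.g. "hin" asks the API to translate the summary into Hindi
        payload["target_language"] = target_language
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/process",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def process_video(base_url: str, video_url: str, timeout: float = 600, **kwargs) -> dict:
    """Send the request and decode the JSON response.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses, so API
    errors (for example a missing URL) surface as exceptions.
    """
    request = build_process_request(base_url, video_url, **kwargs)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `process_video("https://YOUR_USERNAME-youtube-summarizer-api.hf.space", "https://www.youtube.com/watch?v=jNQXAC9IVRw")["summary"]` returns the summary text, and passing `target_language="hin"` requests a Hindi summary.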

Dockerfile ADDED

	@@ -0,0 +1,49 @@

+# Use Python 3.10 slim image for smaller size
+FROM python:3.10-slim
+# Set environment variables
+ENV PYTHONDONTWRITEBYTECODE=1
+ENV PYTHONUNBUFFERED=1
+ENV TRANSFORMERS_CACHE=/app/.cache
+ENV HF_HOME=/app/.cache
+# Install system dependencies
+RUN apt-get update && apt-get install -y --no-install-recommends \
+    ffmpeg \
+    git \
+    && rm -rf /var/lib/apt/lists/*
+# Create non-root user for security
+RUN useradd -m -u 1000 appuser
+# Set working directory
+WORKDIR /app
+# Copy requirements first (for Docker layer caching)
+COPY requirements.txt .
+# Install Python dependencies
+RUN pip install --no-cache-dir --upgrade pip && \
+    pip install --no-cache-dir -r requirements.txt
+# Copy application code
+COPY . .
+# Create cache directory with proper permissions
+RUN mkdir -p /app/.cache && chown -R appuser:appuser /app
+# Switch to non-root user
+USER appuser
+# Expose port (Hugging Face Spaces uses 7860)
+EXPOSE 7860
+# Health check
+HEALTHCHECK --interval=30s --timeout=10s --start-period=60s --retries=3 \
+    CMD python -c "import requests; requests.get('http://localhost:7860/api/health', timeout=5).raise_for_status()" || exit 1
+# Run with gunicorn for production
+# - timeout 600s for long model loading times
+# - workers 1 to save memory (models are heavy)
+# - threads 4 for concurrent requests
+CMD ["gunicorn", "--bind", "0.0.0.0:7860", "--timeout", "600", "--workers", "1", "--threads", "4", "app:app"]
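`requests.get` does not raise on an HTTP error status by itself, so a health probe has to check the status code explicitly to catch a server that answers with 5xx. One stdlib-only variant, which also avoids importing `requests` inside the probe, is sketched below as a hypothetical `healthcheck.py` (not part of this commit); the 5-second timeout is an assumption.

```python
"""Hypothetical healthcheck.py: report whether the API answers with a 2xx status."""
import urllib.request

HEALTH_URL = "http://localhost:7860/api/health"  # port taken from EXPOSE 7860


def is_healthy(url: str = HEALTH_URL, timeout: float = 5.0) -> bool:
    """Return True only for a 2xx response received within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (OSError, ValueError):
        # URLError and HTTPError (raised for 4xx/5xx) and socket timeouts all
        # subclass OSError; ValueError covers a malformed URL.
        return False
```

Assuming the file is copied into `/app` with the rest of the code, the Dockerfile probe could then be written as `CMD python -c "import sys, healthcheck; sys.exit(0 if healthcheck.is_healthy() else 1)"`.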

README.md ADDED

	@@ -0,0 +1,43 @@

+---
+title: YouTube Summarizer API
+emoji: 🎬
+colorFrom: purple
+colorTo: blue
+sdk: docker
+app_port: 7860
+---
+# YouTube Video Summarizer API
+A multilingual Flask API for summarizing YouTube videos using AI.
+## Features
+- 🎤 **Speech-to-Text**: Whisper for videos without subtitles
+- 🌐 **11 Languages**: English + 10 Indian languages
+- 🔄 **Translation**: NLLB-200 for multilingual support
+- 🤖 **AI Summarization**: Groq LLaMA 3.1
+## API Endpoints
+| Method | Endpoint | Description |
+|--------|----------|-------------|
+| GET | `/` | Health check |
+| GET | `/api/health` | API status |
+| GET | `/api/languages` | Supported languages |
+| POST | `/api/transcript` | Extract transcript |
+| POST | `/api/translate` | Translate text |
+| POST | `/api/summarize` | Generate summary |
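+| POST | `/api/process` | Full pipeline |
+| POST | `/api/detect-language` | Detect text language |
+| POST | `/api/warmup` | Pre-load ML models |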
+## Usage
+```bash
+curl -X POST https://YOUR-SPACE.hf.space/api/process \
+  -H "Content-Type: application/json" \
+  -d '{"url": "https://youtube.com/watch?v=VIDEO_ID", "summary_type": "bullet_points"}'
+```
+## Models Used
+- **Whisper**: openai/whisper-small (~500MB)
+- **NLLB-200**: facebook/nllb-200-distilled-600M (~2.4GB)
+- **Summarization**: Groq API (LLaMA 3.1)
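The `url` field takes a YouTube link, and `app.py` delegates parsing to `transcript_service.extract_video_id`, whose implementation in `services/transcript.py` is not shown here; a `ValueError` raised there is returned as an HTTP 400. As an illustrative sketch only, a client could pre-validate links with the standard library before calling the API. The helper name, the accepted URL shapes, and the 11-character ID pattern are all assumptions, not the service's actual rules.

```python
import re
from urllib.parse import parse_qs, urlparse

# Assumed ID format: 11 characters of letters, digits, "-" and "_".
_VIDEO_ID = re.compile(r"[A-Za-z0-9_-]{11}")


def extract_video_id(url: str) -> str:
    """Hypothetical client-side parser for common YouTube URL shapes."""
    parsed = urlparse(url.strip())
    host = (parsed.hostname or "").lower()
    # Treat www. and mobile (m.) hosts like the bare domain
    if host.startswith("www."):
        host = host[4:]
    elif host.startswith("m."):
        host = host[2:]

    candidate = None
    if host == "youtu.be":
        # Short links: https://youtu.be/<id>
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host == "youtube.com":
        if parsed.path == "/watch":
            # Standard links: https://youtube.com/watch?v=<id>
            candidate = parse_qs(parsed.query).get("v", [None])[0]
        else:
            # Path-style links: /shorts/<id>, /embed/<id>, /live/<id>
            parts = parsed.path.strip("/").split("/")
            if len(parts) >= 2 and parts[0] in ("shorts", "embed", "live"):
                candidate = parts[1]

    if candidate and _VIDEO_ID.fullmatch(candidate):
        return candidate
    raise ValueError(f"Could not find a YouTube video ID in: {url!r}")
```

Note that the `VIDEO_ID` placeholder in the curl example above is rejected by this sketch, since it is not 11 characters long.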

app.py ADDED

	@@ -0,0 +1,459 @@

+"""
+YouTube Video Summarizer API - Hugging Face Spaces Edition
+Flask backend deployed on Hugging Face Spaces.
+Provides multilingual YouTube video summarization using:
+- Whisper (speech-to-text)
+- NLLB-200 (translation)
+- Groq API (summarization)
+Whisper and NLLB-200 are free, open models that run inside the Space container; summarization calls the hosted Groq API.
+"""
+from flask import Flask, request, jsonify
+from flask_cors import CORS
+from dotenv import load_dotenv
+import os
+import logging
+from services.transcript import TranscriptService
+from services.summarizer import SummarizerService
+from config import (
+    SUPPORTED_LANGUAGES,
+    get_language_name,
+    is_english,
+)
+# Load environment variables
+load_dotenv()
+# Configure logging
+logging.basicConfig(level=logging.INFO)
+logger = logging.getLogger(__name__)
+app = Flask(__name__)
+# Enable CORS for all origins (allow frontend from any domain)
+CORS(app, resources={
+    r"/*": {
+        "origins": "*",
+        "methods": ["GET", "POST", "OPTIONS"],
+        "allow_headers": ["Content-Type", "Authorization"]
+    }
+})
+# Initialize services (lazy-loaded for heavy models)
+transcript_service = TranscriptService()
+summarizer_service = SummarizerService()
+# Translation service is lazy-loaded to avoid loading 2.4GB model on startup
+_translation_service = None
+def get_translation_service():
+    """Lazy-load the translation service."""
+    global _translation_service
+    if _translation_service is None:
+        from services.translation import TranslationService
+        _translation_service = TranslationService()
+    return _translation_service
+# =============================================================================
+# ROOT & HEALTH ENDPOINTS
+# =============================================================================
+@app.route('/', methods=['GET'])
+def root():
+    """Root endpoint - serves as health check for HF Spaces"""
+    return jsonify({
+        'status': 'healthy',
+        'service': 'YouTube Summarizer API',
+        'version': '2.0.0',
+        'docs': '/api/health for detailed status'
+    }), 200
+@app.route('/api/health', methods=['GET'])
+def health_check():
+    """Detailed health check endpoint"""
+    return jsonify({
+        'status': 'healthy',
+        'message': 'YouTube Summarizer API is running on Hugging Face Spaces',
+        'version': '2.0.0',
+        'features': ['multilingual', 'whisper', 'translation'],
+        'models': {
+            'whisper': 'openai/whisper-small',
+            'translation': 'facebook/nllb-200-distilled-600M',
+            'summarization': 'groq/llama-3.1-8b-instant'
+        }
+    }), 200
+@app.route('/api/languages', methods=['GET'])
+def get_languages():
+    """Get list of supported languages"""
+    return jsonify({
+        'success': True,
+        'languages': SUPPORTED_LANGUAGES
+    }), 200
+@app.route('/api/warmup', methods=['POST'])
+def warmup_models():
+    """
+    Pre-load ML models to avoid delay on first request.
+    This can take 2-5 minutes on first run (downloading models).
+    """
+    try:
+        results = {}
+        data = request.get_json(silent=True) or {}
+        if data.get('translation', False):
+            logger.info("Warming up translation model...")
+            translation_service = get_translation_service()
+            translation_service.warmup()
+            results['translation'] = 'loaded'
+        if data.get('whisper', False):
+            logger.info("Warming up Whisper model...")
+            from services.speech_to_text import SpeechToTextService
+            stt = SpeechToTextService()
+            stt.warmup()
+            results['whisper'] = 'loaded'
+        return jsonify({
+            'success': True,
+            'message': 'Models warmed up successfully',
+            'models': results
+        }), 200
+    except Exception as e:
+        logger.error(f"Warmup failed: {e}")
+        return jsonify({
+            'error': 'Warmup failed',
+            'message': str(e)
+        }), 500
+# =============================================================================
+# TRANSCRIPT ENDPOINTS
+# =============================================================================
+@app.route('/api/transcript', methods=['POST'])
+def get_transcript():
+    """
+    Extract transcript from YouTube video (multilingual).
+    Request: { "url": "youtube_url", "use_whisper": true }
+    Response: { "success": true, "transcript": "...", "language": "tam", ... }
+    """
+    try:
+        data = request.get_json(silent=True)
+        if not data or 'url' not in data:
+            return jsonify({
+                'error': 'Missing YouTube URL',
+                'message': 'Please provide a valid YouTube URL'
+            }), 400
+        url = data['url']
+        use_whisper = data.get('use_whisper', True)
+        video_id = transcript_service.extract_video_id(url)
+        result = transcript_service.get_video_transcript(url, use_whisper_fallback=use_whisper)
+        return jsonify({
+            'success': True,
+            'video_id': video_id,
+            'transcript': result['transcript'],
+            'language': result['language'],
+            'language_name': get_language_name(result['language']),
+            'source': result['source'],
+            'word_count': result['word_count']
+        }), 200
+    except ValueError as e:
+        return jsonify({'error': 'Invalid URL', 'message': str(e)}), 400
+    except Exception as e:
+        logger.error(f"Transcript extraction failed: {e}")
+        return jsonify({'error': 'Transcript extraction failed', 'message': str(e)}), 500
+# =============================================================================
+# TRANSLATION ENDPOINTS
+# =============================================================================
+@app.route('/api/translate', methods=['POST'])
+def translate_text():
+    """
+    Translate text between languages.
+    Request: { "text": "Hello", "source_lang": "eng", "target_lang": "hin" }
+    Response: { "success": true, "translated_text": "नमस्ते", ... }
+    """
+    try:
+        data = request.get_json(silent=True)
+        if not data or 'text' not in data:
+            return jsonify({
+                'error': 'Missing text',
+                'message': 'Please provide text to translate'
+            }), 400
+        text = data['text']
+        source_lang = data.get('source_lang', 'eng')
+        target_lang = data.get('target_lang', 'hin')
+        translation_service = get_translation_service()
+        translated = translation_service.translate(text, source_lang, target_lang)
+        return jsonify({
+            'success': True,
+            'translated_text': translated,
+            'source_lang': source_lang,
+            'source_lang_name': get_language_name(source_lang),
+            'target_lang': target_lang,
+            'target_lang_name': get_language_name(target_lang)
+        }), 200
+    except ValueError as e:
+        return jsonify({'error': 'Invalid language', 'message': str(e)}), 400
+    except Exception as e:
+        logger.error(f"Translation failed: {e}")
+        return jsonify({'error': 'Translation failed', 'message': str(e)}), 500
+@app.route('/api/detect-language', methods=['POST'])
+def detect_language():
+    """Detect the language of given text."""
+    try:
+        data = request.get_json(silent=True)
+        if not data or 'text' not in data:
+            return jsonify({
+                'error': 'Missing text',
+                'message': 'Please provide text for language detection'
+            }), 400
+        translation_service = get_translation_service()
+        result = translation_service.detect_language(data['text'])
+        return jsonify({
+            'success': True,
+            'language': result['code'],
+            'language_name': result['name']
+        }), 200
+    except Exception as e:
+        logger.error(f"Language detection failed: {e}")
+        return jsonify({'error': 'Language detection failed', 'message': str(e)}), 500
+# =============================================================================
+# SUMMARIZATION ENDPOINTS
+# =============================================================================
+@app.route('/api/summarize', methods=['POST'])
+def summarize():
+    """
+    Generate summary from transcript.
+    Request: { "transcript": "...", "summary_type": "general" }
+    Response: { "success": true, "summary": "...", "statistics": {...} }
+    """
+    try:
+        data = request.get_json(silent=True)
+        if not data or 'transcript' not in data:
+            return jsonify({
+                'error': 'Missing transcript',
+                'message': 'Please provide transcript text'
+            }), 400
+        transcript = data['transcript']
+        summary_type = data.get('summary_type', 'general')
+        chunk_size = data.get('chunk_size', 2500)
+        max_tokens = data.get('max_tokens', 500)
+        valid_types = ['general', 'detailed', 'bullet_points', 'key_takeaways']
+        if summary_type not in valid_types:
+            return jsonify({
+                'error': 'Invalid summary type',
+                'message': f'Must be one of: {", ".join(valid_types)}'
+            }), 400
+        summary = summarizer_service.summarize(
+            text=transcript,
+            summary_type=summary_type,
+            chunk_size=chunk_size,
+            max_tokens=max_tokens
+        )
+        summary_word_count = len(summary.split())
+        original_word_count = len(transcript.split())
+        compression_ratio = (summary_word_count / original_word_count) * 100 if original_word_count > 0 else 0
+        return jsonify({
+            'success': True,
+            'summary': summary,
+            'statistics': {
+                'original_word_count': original_word_count,
+                'summary_word_count': summary_word_count,
+                'compression_ratio': round(compression_ratio, 1),
+                'reading_time_minutes': max(1, summary_word_count // 200)
+            }
+        }), 200
+    except Exception as e:
+        logger.error(f"Summarization failed: {e}")
+        return jsonify({'error': 'Summarization failed', 'message': str(e)}), 500
+# =============================================================================
+# FULL PIPELINE ENDPOINT
+# =============================================================================
+@app.route('/api/process', methods=['POST'])
+def process_video():
+    """
+    Full multilingual pipeline: Transcript → Translation → Summary → Translation
+    Request: {
+        "url": "youtube_url",
+        "summary_type": "general",
+        "target_language": "hin" (optional)
+    }
+    """
+    try:
+        data = request.get_json(silent=True)
+        if not data or 'url' not in data:
+            return jsonify({
+                'error': 'Missing YouTube URL',
+                'message': 'Please provide a valid YouTube URL'
+            }), 400
+        url = data['url']
+        summary_type = data.get('summary_type', 'general')
+        target_language = data.get('target_language', 'eng')
+        chunk_size = data.get('chunk_size', 2500)
+        max_tokens = data.get('max_tokens', 500)
+        # Extract video ID (used in logs and the response)
+        video_id = transcript_service.extract_video_id(url)
+        logger.info(f"Processing video: {video_id}")
+        # Step 1/4: Get transcript and its detected language
+        logger.info("Step 1/4: Extracting transcript...")
+        transcript_result = transcript_service.get_video_transcript(url, use_whisper_fallback=True)
+        original_transcript = transcript_result['transcript']
+        original_language = transcript_result['language']
+        original_word_count = transcript_result['word_count']
+        # Step 2/4: Translate to English if needed
+        english_transcript = original_transcript
+        if not is_english(original_language):
+            logger.info("Step 2/4: Translating to English...")
+            translation_service = get_translation_service()
+            english_transcript = translation_service.translate_to_english(
+                original_transcript,
+                original_language
+            )
+        else:
+            logger.info("Step 2/4: Skipped (already English)")
+        # Step 3/4: Summarize in English
+        logger.info("Step 3/4: Generating summary...")
+        summary = summarizer_service.summarize(
+            text=english_transcript,
+            summary_type=summary_type,
+            chunk_size=chunk_size,
+            max_tokens=max_tokens
+        )
+        # Step 4/4: Translate summary to target language
+        final_summary = summary
+        summary_language = "eng"
+        if not is_english(target_language):
+            logger.info(f"Step 4/4: Translating summary to {target_language}...")
+            translation_service = get_translation_service()
+            final_summary = translation_service.translate_from_english(summary, target_language)
+            summary_language = target_language
+        else:
+            logger.info("Step 4/4: Skipped (English output)")
+        # Calculate statistics
+        summary_word_count = len(final_summary.split())
+        compression_ratio = (summary_word_count / original_word_count) * 100 if original_word_count > 0 else 0
+        response = {
+            'success': True,
+            'video_id': video_id,
+            'original_language': original_language,
+            'original_language_name': get_language_name(original_language),
+            'transcript': original_transcript,
+            'transcript_source': transcript_result['source'],
+            'summary': final_summary,
+            'summary_language': summary_language,
+            'summary_language_name': get_language_name(summary_language),
+            'statistics': {
+                'original_word_count': original_word_count,
+                'summary_word_count': summary_word_count,
+                'compression_ratio': round(compression_ratio, 1),
+                'reading_time_minutes': max(1, summary_word_count // 200)
+            }
+        }
+        if not is_english(original_language):
+            response['english_transcript'] = english_transcript
+        if not is_english(target_language):
+            response['english_summary'] = summary
+        logger.info("Processing complete!")
+        return jsonify(response), 200
+    except ValueError as e:
+        # ValueError can come from an invalid URL or an unsupported language code
+        return jsonify({'error': 'Invalid request', 'message': str(e)}), 400
+    except Exception as e:
+        logger.error(f"Processing failed: {e}")
+        return jsonify({'error': 'Processing failed', 'message': str(e)}), 500
+# =============================================================================
+# ERROR HANDLERS
+# =============================================================================
+@app.errorhandler(404)
+def not_found(error):
+    return jsonify({
+        'error': 'Not found',
+        'message': 'The requested endpoint does not exist'
+    }), 404
+@app.errorhandler(500)
+def internal_error(error):
+    return jsonify({
+        'error': 'Internal server error',
+        'message': 'An unexpected error occurred'
+    }), 500
+# =============================================================================
+# MAIN (for local testing only - gunicorn is used in production)
+# =============================================================================
+if __name__ == '__main__':
+    port = int(os.environ.get('PORT', 7860))
+    if not os.getenv('GROQ_API_KEY'):
+        print("⚠️  Warning: GROQ_API_KEY not found")
+        print("Set it in a local .env file, or on HF Spaces under Settings → Variables and secrets")
+    print("🚀 Starting YouTube Summarizer API...")
+    print(f"📡 API available at: http://localhost:{port}")
+    app.run(debug=False, host='0.0.0.0', port=port)
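`get_translation_service()` checks and assigns a module-level global without a lock. With gunicorn's `--threads 4`, two concurrent requests could both see `None` and each construct the ~2.4GB NLLB model. A minimal sketch of a double-checked-locking alternative follows; `LazyService` is a hypothetical wrapper, not part of this codebase, and the wiring shown in comments is only one possible way to use it.

```python
import threading
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class LazyService(Generic[T]):
    """Build an expensive object at most once, even under concurrent threads."""

    def __init__(self, factory: Callable[[], T]) -> None:
        self._factory = factory
        self._instance: Optional[T] = None
        self._lock = threading.Lock()

    def get(self) -> T:
        # Fast path: no locking once the instance exists.
        instance = self._instance
        if instance is not None:
            return instance
        with self._lock:
            # Re-check inside the lock: another thread may have built the
            # instance while this one was waiting.
            if self._instance is None:
                self._instance = self._factory()
            return self._instance


# Hypothetical wiring for app.py (the heavy import stays inside the factory):
# def _make_translation_service():
#     from services.translation import TranslationService
#     return TranslationService()
# get_translation_service = LazyService(_make_translation_service).get
```

The lock is only taken until the first instance exists, so later requests pay no synchronization cost.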

config.py ADDED

	@@ -0,0 +1,169 @@

+"""
+Configuration module for multilingual YouTube summarizer.
+Contains model names, language mappings, and settings.
+The Whisper and NLLB-200 models are FREE and run LOCALLY; summarization uses the hosted Groq API.
+"""
+import os
+# =============================================================================
+# MODEL CONFIGURATION
+# =============================================================================
+# Whisper model for speech-to-text (runs locally)
+# Options: "openai/whisper-tiny", "openai/whisper-small", "openai/whisper-medium"
+# Smaller = faster but less accurate, larger = slower but more accurate
+WHISPER_MODEL = "openai/whisper-small"
+# NLLB-200 model for translation (runs locally)
+# Using distilled version for lower RAM usage (~2.4GB)
+NLLB_MODEL = "facebook/nllb-200-distilled-600M"
+# Groq model for summarization (free API)
+GROQ_MODEL = "llama-3.1-8b-instant"
+# =============================================================================
+# LANGUAGE CONFIGURATION
+# =============================================================================
+# Mapping from simple language codes to NLLB-200 language codes
+# NLLB uses format: language_Script (e.g., hin_Deva for Hindi in Devanagari)
+LANGUAGE_MAP = {
+    # English (including regional variants)
+    "eng": {"nllb": "eng_Latn", "name": "English", "script": "Latin"},
+    "en": {"nllb": "eng_Latn", "name": "English", "script": "Latin"},
+    "en-in": {"nllb": "eng_Latn", "name": "English", "script": "Latin"},
+    "en-us": {"nllb": "eng_Latn", "name": "English", "script": "Latin"},
+    "en-gb": {"nllb": "eng_Latn", "name": "English", "script": "Latin"},
+    "en-au": {"nllb": "eng_Latn", "name": "English", "script": "Latin"},
+    "english": {"nllb": "eng_Latn", "name": "English", "script": "Latin"},
+    # Hindi (including regional variants)
+    "hin": {"nllb": "hin_Deva", "name": "Hindi", "script": "Devanagari"},
+    "hi": {"nllb": "hin_Deva", "name": "Hindi", "script": "Devanagari"},
+    "hi-in": {"nllb": "hin_Deva", "name": "Hindi", "script": "Devanagari"},
+    # Tamil
+    "tam": {"nllb": "tam_Taml", "name": "Tamil", "script": "Tamil"},
+    "ta": {"nllb": "tam_Taml", "name": "Tamil", "script": "Tamil"},
+    "ta-in": {"nllb": "tam_Taml", "name": "Tamil", "script": "Tamil"},
+    # Telugu
+    "tel": {"nllb": "tel_Telu", "name": "Telugu", "script": "Telugu"},
+    "te": {"nllb": "tel_Telu", "name": "Telugu", "script": "Telugu"},
+    "te-in": {"nllb": "tel_Telu", "name": "Telugu", "script": "Telugu"},
+    # Kannada
+    "kan": {"nllb": "kan_Knda", "name": "Kannada", "script": "Kannada"},
+    "kn": {"nllb": "kan_Knda", "name": "Kannada", "script": "Kannada"},
+    "kn-in": {"nllb": "kan_Knda", "name": "Kannada", "script": "Kannada"},
+    # Malayalam
+    "mal": {"nllb": "mal_Mlym", "name": "Malayalam", "script": "Malayalam"},
+    "ml": {"nllb": "mal_Mlym", "name": "Malayalam", "script": "Malayalam"},
+    "ml-in": {"nllb": "mal_Mlym", "name": "Malayalam", "script": "Malayalam"},
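+    # Gujarati
+    "guj": {"nllb": "guj_Gujr", "name": "Gujarati", "script": "Gujarati"},
+    "gu": {"nllb": "guj_Gujr", "name": "Gujarati", "script": "Gujarati"},
+    "gu-in": {"nllb": "guj_Gujr", "name": "Gujarati", "script": "Gujarati"},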
+    # Bengali
+    "ben": {"nllb": "ben_Beng", "name": "Bengali", "script": "Bengali"},
+    "bn": {"nllb": "ben_Beng", "name": "Bengali", "script": "Bengali"},
+    "bn-in": {"nllb": "ben_Beng", "name": "Bengali", "script": "Bengali"},
+    "bn-bd": {"nllb": "ben_Beng", "name": "Bengali", "script": "Bengali"},
+    # Marathi
+    "mar": {"nllb": "mar_Deva", "name": "Marathi", "script": "Devanagari"},
+    "mr": {"nllb": "mar_Deva", "name": "Marathi", "script": "Devanagari"},
+    "mr-in": {"nllb": "mar_Deva", "name": "Marathi", "script": "Devanagari"},
+    # Punjabi
+    "pan": {"nllb": "pan_Guru", "name": "Punjabi", "script": "Gurmukhi"},
+    "pa": {"nllb": "pan_Guru", "name": "Punjabi", "script": "Gurmukhi"},
+    "pa-in": {"nllb": "pan_Guru", "name": "Punjabi", "script": "Gurmukhi"},
+    # Urdu
+    "urd": {"nllb": "urd_Arab", "name": "Urdu", "script": "Arabic"},
+    "ur": {"nllb": "urd_Arab", "name": "Urdu", "script": "Arabic"},
+    "ur-pk": {"nllb": "urd_Arab", "name": "Urdu", "script": "Arabic"},
+    "ur-in": {"nllb": "urd_Arab", "name": "Urdu", "script": "Arabic"},
+}
+# List of supported languages for API responses
+SUPPORTED_LANGUAGES = [
+    {"code": "eng", "name": "English", "nllb_code": "eng_Latn"},
+    {"code": "hin", "name": "Hindi", "nllb_code": "hin_Deva"},
+    {"code": "tam", "name": "Tamil", "nllb_code": "tam_Taml"},
+    {"code": "tel", "name": "Telugu", "nllb_code": "tel_Telu"},
+    {"code": "kan", "name": "Kannada", "nllb_code": "kan_Knda"},
+    {"code": "mal", "name": "Malayalam", "nllb_code": "mal_Mlym"},
+    {"code": "guj", "name": "Gujarati", "nllb_code": "guj_Gujr"},
+    {"code": "ben", "name": "Bengali", "nllb_code": "ben_Beng"},
+    {"code": "mar", "name": "Marathi", "nllb_code": "mar_Deva"},
+    {"code": "pan", "name": "Punjabi", "nllb_code": "pan_Guru"},
+    {"code": "urd", "name": "Urdu", "nllb_code": "urd_Arab"},
+]
+# Whisper language code to our language code mapping
+# Whisper returns ISO 639-1 codes, we normalize to our codes
+WHISPER_LANG_MAP = {
+    "en": "eng",
+    "hi": "hin",
+    "ta": "tam",
+    "te": "tel",
+    "kn": "kan",
+    "ml": "mal",
+    "gu": "guj",
+    "bn": "ben",
+    "mr": "mar",
+    "pa": "pan",
+    "ur": "urd",
+}
+# =============================================================================
+# RUNTIME SETTINGS
+# =============================================================================
+# Model loading settings
+# Set to True to load models on startup (slower startup, faster first request)
+# Set to False for lazy loading (faster startup, slower first request)
+PRELOAD_MODELS = False
+# Maximum text length for translation (to avoid OOM errors)
+MAX_TRANSLATION_LENGTH = 5000  # characters
+# Audio extraction settings
+AUDIO_FORMAT = "wav"
+AUDIO_SAMPLE_RATE = 16000  # Whisper expects 16kHz
+# Temporary file settings
+TEMP_DIR = os.path.join(os.path.dirname(__file__), "temp")
+# =============================================================================
+# HELPER FUNCTIONS
+# =============================================================================
+def get_nllb_code(lang_code: str) -> str:
+    """Convert a language code to NLLB-200 format."""
+    lang_code = lang_code.lower().strip()
+    if lang_code in LANGUAGE_MAP:
+        return LANGUAGE_MAP[lang_code]["nllb"]
+    raise ValueError(f"Unsupported language code: {lang_code}")
+def get_language_name(lang_code: str) -> str:
+    """Get the full name of a language from its code."""
+    lang_code = lang_code.lower().strip()
+    if lang_code in LANGUAGE_MAP:
+        return LANGUAGE_MAP[lang_code]["name"]
+    return lang_code
+def normalize_whisper_lang(whisper_code: str) -> str:
+    """Convert Whisper's language code to our format."""
+    whisper_code = whisper_code.lower().strip()
+    return WHISPER_LANG_MAP.get(whisper_code, whisper_code)
+def is_english(lang_code: str) -> bool:
+    """Check if a language code represents English."""
+    lang_code = lang_code.lower().strip()
+    return lang_code in ["en", "eng", "english", "en-in", "en-us", "en-gb", "en-au"]
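`SUPPORTED_LANGUAGES` and `WHISPER_LANG_MAP` are maintained by hand alongside `LANGUAGE_MAP`, and `get_nllb_code` raises `ValueError` for any code missing from `LANGUAGE_MAP`. A small check like the one below could run as a unit test to catch a language that is advertised but cannot be translated. The function name `find_unmapped_codes` is hypothetical, and the sample dictionaries are an illustrative subset standing in for the real tables.

```python
from typing import Dict, Iterable, List


def find_unmapped_codes(language_map: Dict[str, dict],
                        supported: Iterable[dict],
                        whisper_map: Dict[str, str]) -> List[str]:
    """Return codes that are advertised or Whisper-normalized but missing from language_map."""
    advertised = {entry["code"] for entry in supported}
    produced = set(whisper_map.values())  # codes normalize_whisper_lang can return
    return sorted(code for code in advertised | produced if code not in language_map)


# Illustrative subset shaped like config.py's tables (not the full maps):
sample_language_map = {
    "eng": {"nllb": "eng_Latn", "name": "English", "script": "Latin"},
    "hin": {"nllb": "hin_Deva", "name": "Hindi", "script": "Devanagari"},
}
sample_supported = [
    {"code": "eng", "name": "English", "nllb_code": "eng_Latn"},
    {"code": "hin", "name": "Hindi", "nllb_code": "hin_Deva"},
    {"code": "guj", "name": "Gujarati", "nllb_code": "guj_Gujr"},
]
sample_whisper_map = {"en": "eng", "hi": "hin", "gu": "guj"}

print(find_unmapped_codes(sample_language_map, sample_supported, sample_whisper_map))
# → ['guj']
```

Against the real tables the call would be `find_unmapped_codes(config.LANGUAGE_MAP, config.SUPPORTED_LANGUAGES, config.WHISPER_LANG_MAP)`; an empty list means every listed code resolves through `get_nllb_code`.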

requirements.txt ADDED

	@@ -0,0 +1,48 @@

+# =============================================================================
+# Core Flask Dependencies
+# =============================================================================
+Flask==3.0.0
+flask-cors==4.0.0
+gunicorn==21.2.0
+python-dotenv==1.0.0
+Werkzeug==3.0.1
+# =============================================================================
+# HTTP Clients
+# =============================================================================
+requests>=2.31.0
+httpx>=0.24.0,<0.26.0
+# =============================================================================
+# YouTube Download
+# =============================================================================
+yt-dlp>=2024.1.1
+# =============================================================================
+# Groq API for Summarization (FREE)
+# =============================================================================
+groq==0.4.1
+# =============================================================================
+# ML Models (All FREE, run locally)
+# =============================================================================
+# PyTorch - CPU version for HF Spaces free tier
+--extra-index-url https://download.pytorch.org/whl/cpu
+torch>=2.0.0
+torchaudio>=2.0.0
+# Hugging Face Transformers
+transformers>=4.36.0
+# Tokenization for NLLB
+sentencepiece>=0.1.99
+# Audio processing
+soundfile>=0.12.0
+librosa>=0.10.0
+# =============================================================================
+# Language Detection
+# =============================================================================
+langdetect>=1.0.9
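When FFmpeg is not on the system PATH, `services/speech_to_text.py` falls back to the optional `static-ffmpeg` and `imageio-ffmpeg` packages. Neither package is listed in this file. The sketch below is a small startup check that reports which sources are available, without importing those packages or downloading anything. The function name `ffmpeg_sources` is hypothetical.

```python
import importlib.util
import shutil


def ffmpeg_sources() -> dict:
    """Report which FFmpeg sources get_ffmpeg_path() could use."""
    return {
        # Binaries already on PATH (e.g. installed as a system package)
        "system_ffmpeg": shutil.which("ffmpeg") is not None,
        "system_ffprobe": shutil.which("ffprobe") is not None,
        # Optional pip packages; find_spec locates them without importing
        "static_ffmpeg": importlib.util.find_spec("static_ffmpeg") is not None,
        "imageio_ffmpeg": importlib.util.find_spec("imageio_ffmpeg") is not None,
    }


if __name__ == "__main__":
    for source, available in ffmpeg_sources().items():
        print(f"{source:15} {'yes' if available else 'no'}")
```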

services/__init__.py ADDED

	@@ -0,0 +1 @@


+# Services package for YouTube Summarizer API

services/speech_to_text.py ADDED

	@@ -0,0 +1,303 @@

+"""
+Speech-to-Text Service using OpenAI Whisper (Local Model)
+This service provides LOCAL speech-to-text transcription using Whisper.
+NO API CALLS - everything runs on your machine for FREE!
+Features:
+- Extracts audio from YouTube videos using yt-dlp
+- Transcribes audio using Whisper (small model by default)
+- Lets Whisper auto-detect the spoken language while decoding
+- Returns the transcript and a language code (currently defaults to English)
+Requirements:
+- FFmpeg must be installed on the system
+- Sufficient RAM (~2GB for whisper-small)
+"""
+import os
+import tempfile
+import logging
+from typing import Optional, Tuple
+import torch
+from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline
+import yt_dlp
+from config import (
+    WHISPER_MODEL,
+    AUDIO_FORMAT,
+    AUDIO_SAMPLE_RATE,
+    normalize_whisper_lang,
+)
+# Configure logging
+logging.basicConfig(level=logging.INFO)
+logger = logging.getLogger(__name__)
+def get_ffmpeg_path() -> Optional[str]:
+    """
+    Get the directory containing the FFmpeg executables.
+    Checks, in order: the system PATH (which must provide both ffmpeg and
+    ffprobe), static-ffmpeg (provides both), then imageio-ffmpeg (ffmpeg only).
+    Returns None if no FFmpeg can be found.
+    """
+    import shutil
+    # Check if ffmpeg AND ffprobe are in system PATH
+    ffmpeg_path = shutil.which("ffmpeg")
+    ffprobe_path = shutil.which("ffprobe")
+    if ffmpeg_path and ffprobe_path:
+        logger.info(f"Using system FFmpeg: {ffmpeg_path}")
+        return os.path.dirname(ffmpeg_path)
+    # Try static-ffmpeg (provides both ffmpeg and ffprobe)
+    try:
+        from static_ffmpeg import run
+        # This downloads ffmpeg/ffprobe if not already present
+        ffmpeg_path, ffprobe_path = run.get_or_fetch_platform_executables_else_raise()
+        if ffmpeg_path and os.path.exists(ffmpeg_path):
+            ffmpeg_dir = os.path.dirname(ffmpeg_path)
+            logger.info(f"Using static-ffmpeg: {ffmpeg_dir}")
+            return ffmpeg_dir
+    except ImportError:
+        logger.warning("static-ffmpeg not installed")
+    except Exception as e:
+        logger.warning(f"static-ffmpeg error: {e}")
+    # Fall back to imageio-ffmpeg (only has ffmpeg, not ffprobe)
+    try:
+        import imageio_ffmpeg
+        ffmpeg_path = imageio_ffmpeg.get_ffmpeg_exe()
+        if ffmpeg_path and os.path.exists(ffmpeg_path):
+            logger.warning("Using imageio-ffmpeg (may not have ffprobe)")
+            return os.path.dirname(ffmpeg_path)
+    except ImportError:
+        pass
+    return None
+class SpeechToTextService:
+    """
+    Service for converting speech to text using local Whisper model.
+    The model is lazily loaded on first use to save memory during startup.
+    All processing happens locally - no API costs!
+    """
+    def __init__(self, model_name: str = WHISPER_MODEL):
+        """
+        Initialize the speech-to-text service.
+        Args:
+            model_name: Hugging Face model identifier for Whisper
+        """
+        self.model_name = model_name
+        self._pipe = None  # Lazy-loaded pipeline
+        self._device = "cuda" if torch.cuda.is_available() else "cpu"
+        self._torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32
+        logger.info(f"SpeechToTextService initialized (device: {self._device})")
+    def _load_model(self):
+        """
+        Load the Whisper model and processor.
+        Called lazily on first transcription request.
+        """
+        if self._pipe is not None:
+            return
+        logger.info(f"Loading Whisper model: {self.model_name}")
+        logger.info("This may take a few minutes on first run (downloading model)...")
+        try:
+            # Load model with optimizations for CPU/GPU
+            model = AutoModelForSpeechSeq2Seq.from_pretrained(
+                self.model_name,
+                torch_dtype=self._torch_dtype,
+                low_cpu_mem_usage=True,
+                use_safetensors=True
+            )
+            model.to(self._device)
+            # Load processor
+            processor = AutoProcessor.from_pretrained(self.model_name)
+            # Create pipeline for easy inference
+            self._pipe = pipeline(
+                "automatic-speech-recognition",
+                model=model,
+                tokenizer=processor.tokenizer,
+                feature_extractor=processor.feature_extractor,
+                torch_dtype=self._torch_dtype,
+                device=self._device,
+                return_timestamps=False
+            )
+            logger.info("Whisper model loaded successfully!")
+        except Exception as e:
+            logger.error(f"Failed to load Whisper model: {e}")
+            raise Exception(f"Could not load Whisper model: {str(e)}")
+    def extract_audio_from_youtube(self, url: str) -> str:
+        """
+        Extract audio from a YouTube video.
+        Args:
+            url: YouTube video URL
+        Returns:
+            Path to the extracted audio file (WAV format)
+        Raises:
+            Exception: If audio extraction fails
+        """
+        logger.info(f"Extracting audio from: {url}")
+        # Get FFmpeg directory (system PATH, static-ffmpeg, or imageio-ffmpeg)
+        ffmpeg_path = get_ffmpeg_path()
+        if not ffmpeg_path:
+            raise Exception("FFmpeg not found. Please install FFmpeg (including ffprobe) or run: pip install static-ffmpeg")
+        logger.info(f"Using FFmpeg: {ffmpeg_path}")
+        # Create temporary directory for audio file
+        temp_dir = tempfile.mkdtemp()
+        output_template = os.path.join(temp_dir, "audio.%(ext)s")
+        ydl_opts = {
+            "format": "bestaudio/best",
+            "outtmpl": output_template,
+            "postprocessors": [{
+                "key": "FFmpegExtractAudio",
+                "preferredcodec": AUDIO_FORMAT,
+                "preferredquality": "192",
+            }],
+            "ffmpeg_location": ffmpeg_path,  # yt-dlp needs the directory containing ffmpeg and ffprobe
+            "quiet": True,
+            "no_warnings": True,
+        }
+        try:
+            with yt_dlp.YoutubeDL(ydl_opts) as ydl:
+                ydl.download([url])
+            # Find the extracted audio file
+            audio_path = os.path.join(temp_dir, f"audio.{AUDIO_FORMAT}")
+            if not os.path.exists(audio_path):
+                raise Exception("Audio file was not created")
+            logger.info(f"Audio extracted to: {audio_path}")
+            return audio_path
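+        except Exception as e:
+            logger.error(f"Audio extraction failed: {e}")
+            # Remove the temp directory here; the caller never receives a path to clean up
+            import shutil
+            shutil.rmtree(temp_dir, ignore_errors=True)
+            raise Exception(f"Could not extract audio: {str(e)}")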
+    def transcribe_audio(self, audio_path: str) -> dict:
+        """
+        Transcribe an audio file using Whisper.
+        Args:
+            audio_path: Path to the audio file
+        Returns:
+            Dictionary with:
+                - text: The transcribed text
+                - language: Detected language code (normalized)
+                - raw_language: Original Whisper language code
+        """
+        # Ensure model is loaded
+        self._load_model()
+        logger.info(f"Transcribing audio: {audio_path}")
+        try:
+            # Run transcription
+            result = self._pipe(
+                audio_path,
+                generate_kwargs={
+                    "task": "transcribe",
+                    "language": None,  # Auto-detect language
+                }
+            )
+            # Extract text
+            text = result.get("text", "").strip()
+            if not text:
+                raise Exception("Transcription produced empty text")
+            # The pipeline output used here contains only the text, not the
+            # language Whisper detected, so the language defaults to English
+            raw_language = "en"
+            # Normalize the language code
+            language = normalize_whisper_lang(raw_language)
+            logger.info(f"Transcription complete. Language: {language}")
+            return {
+                "text": text,
+                "language": language,
+                "raw_language": raw_language
+            }
+        except Exception as e:
+            logger.error(f"Transcription failed: {e}")
+            raise Exception(f"Could not transcribe audio: {str(e)}")
+    def transcribe_youtube_video(self, url: str) -> dict:
+        """
+        Full pipeline: Extract audio from YouTube and transcribe it.
+        Args:
+            url: YouTube video URL
+        Returns:
+            Dictionary with:
+                - text: The transcribed text
+                - language: Detected language code
+                - word_count: Number of words in transcript
+        """
+        audio_path = None
+        try:
+            # Step 1: Extract audio
+            audio_path = self.extract_audio_from_youtube(url)
+            # Step 2: Transcribe
+            result = self.transcribe_audio(audio_path)
+            # Add word count
+            result["word_count"] = len(result["text"].split())
+            return result
+        finally:
+            # Cleanup: Remove temporary audio file
+            if audio_path and os.path.exists(audio_path):
+                try:
+                    os.remove(audio_path)
+                    # Also remove the parent temp directory
+                    temp_dir = os.path.dirname(audio_path)
+                    if os.path.exists(temp_dir):
+                        os.rmdir(temp_dir)
+                except OSError:
+                    pass  # Ignore cleanup errors
+    def is_model_loaded(self) -> bool:
+        """Check if the Whisper model is currently loaded."""
+        return self._pipe is not None
+    def warmup(self):
+        """
+        Pre-load the model to avoid delay on first request.
+        Call this during application startup if desired.
+        """
+        logger.info("Warming up SpeechToTextService...")
+        self._load_model()
+        logger.info("SpeechToTextService warmup complete!")
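`_load_model` checks `self._pipe is not None` without a lock. If the app serves requests from several threads, two first requests could therefore both start loading Whisper. That threading setup is an assumption, since the gunicorn worker settings are not part of this file. The sketch below uses double-checked locking to prevent the duplicate load. `LazyLoader` is a hypothetical helper and is not part of the project or of `transformers`.

```python
import threading
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class LazyLoader(Generic[T]):
    """Create an expensive object once, even if first requests arrive together.

    The factory must not return None, since None marks "not loaded yet".
    """

    def __init__(self, factory: Callable[[], T]) -> None:
        self._factory = factory
        self._value: Optional[T] = None
        self._lock = threading.Lock()

    def get(self) -> T:
        # Fast path: once loaded, no locking is needed
        if self._value is None:
            with self._lock:
                # Re-check under the lock: another thread may have finished
                # loading while this one was waiting
                if self._value is None:
                    self._value = self._factory()
        return self._value

    def is_loaded(self) -> bool:
        return self._value is not None
```

In the service, `self._pipe` could hold `LazyLoader(self._build_pipeline)`. Here `_build_pipeline` is hypothetical: it would contain the current body of `_load_model` and return the pipeline.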

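`extract_audio_from_youtube` creates its directory with `tempfile.mkdtemp()`, and `transcribe_youtube_video` removes it later in a `finally` block. That cleanup is skipped when extraction raises, because `audio_path` is still `None` at that point. The sketch below does the same download-then-process flow inside `tempfile.TemporaryDirectory`, which removes the directory and its contents on every exit path. `with_temp_audio`, `download`, and `process` are hypothetical names.

```python
import tempfile
from typing import Callable, TypeVar

T = TypeVar("T")


def with_temp_audio(
    download: Callable[[str], str],
    process: Callable[[str], T],
) -> T:
    """Run download -> process inside a private temp dir that is always removed.

    download(temp_dir) must write the audio file into temp_dir and return its
    path; process(path) receives that path. TemporaryDirectory deletes the
    directory and everything in it when the block exits, even if either
    step raises.
    """
    with tempfile.TemporaryDirectory(prefix="yt-audio-") as temp_dir:
        audio_path = download(temp_dir)
        return process(audio_path)
```

In the service, `download` would wrap the yt-dlp call (with `outtmpl` pointing into `temp_dir`) and `process` would be `self.transcribe_audio`. That would make the manual `finally` cleanup unnecessary.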
services/summarizer.py ADDED

	@@ -0,0 +1,141 @@

+import os
+from groq import Groq
+from dotenv import load_dotenv
+load_dotenv()
+class SummarizerService:
+    """Service for generating AI-powered summaries using Groq LLaMA"""
+    def __init__(self):
+        api_key = os.getenv("GROQ_API_KEY")
+        if not api_key:
+            raise Exception("GROQ_API_KEY not found in environment variables")
+        self.client = Groq(api_key=api_key.strip())
+    def chunk_text(self, text: str, max_chars: int = 2500) -> list:
+        """
+        Split text into smaller chunks to avoid token limits
+        Args:
+            text: Text to chunk
+            max_chars: Maximum characters per chunk
+        Returns:
+            List of text chunks
+        """
+        words = text.split()
+        chunks = []
+        current_chunk = []
+        current_length = 0
+        for word in words:
+            word_length = len(word) + 1  # +1 for space
+            if current_length + word_length > max_chars and current_chunk:
+                chunks.append(" ".join(current_chunk))
+                current_chunk = [word]
+                current_length = word_length
+            else:
+                current_chunk.append(word)
+                current_length += word_length
+        if current_chunk:
+            chunks.append(" ".join(current_chunk))
+        return chunks
+    def summarize(
+        self,
+        text: str,
+        summary_type: str = "general",
+        chunk_size: int = 2500,
+        max_tokens: int = 500
+    ) -> str:
+        """
+        Summarize text using Groq's LLaMA model with chunking for large texts
+        Args:
+            text: Text to summarize
+            summary_type: Type of summary (general, detailed, bullet_points, key_takeaways)
+            chunk_size: Maximum characters per chunk
+            max_tokens: Maximum tokens for summary generation
+        Returns:
+            Generated summary text
+        """
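+        # Reject unknown summary types up front; otherwise the long-text path
+        # would silently fall back to the raw chunk summaries
+        valid_types = ("general", "detailed", "bullet_points", "key_takeaways")
+        if summary_type not in valid_types:
+            raise ValueError(f"Unknown summary_type '{summary_type}'. Expected one of: {', '.join(valid_types)}")
+        # Check if text is too long and needs chunking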
+        if len(text) > 3000:
+            chunks = self.chunk_text(text, max_chars=chunk_size)
+            chunk_summaries = []
+            for i, chunk in enumerate(chunks):
+                try:
+                    # Summarize each chunk
+                    prompt = f"Please provide a concise summary of this part of a video transcript:\n\n{chunk}"
+                    response = self.client.chat.completions.create(
+                        model="llama-3.1-8b-instant",
+                        messages=[
+                            {"role": "user", "content": prompt}
+                        ],
+                        max_tokens=min(300, max_tokens // 2),
+                        temperature=0.1
+                    )
+                    chunk_summaries.append(response.choices[0].message.content)
+                except Exception as e:
+                    raise Exception(f"Error summarizing chunk {i+1}: {str(e)}")
+            # Combine all chunk summaries
+            combined_summary = "\n\n".join(chunk_summaries)
+            # Create final summary from combined chunks
+            final_prompts = {
+                "general": f"Please create a cohesive summary from these section summaries of a video:\n\n{combined_summary}",
+                "detailed": f"Please create a detailed, well-structured summary from these section summaries:\n\n{combined_summary}",
+                "bullet_points": f"Please organize these section summaries into clear bullet points:\n\n{combined_summary}",
+                "key_takeaways": f"Please extract the main insights and key takeaways from these summaries:\n\n{combined_summary}"
+            }
+            try:
+                final_response = self.client.chat.completions.create(
+                    model="llama-3.1-8b-instant",
+                    messages=[
+                        {"role": "user", "content": final_prompts[summary_type]}
+                    ],
+                    max_tokens=max_tokens,
+                    temperature=0.1
+                )
+                return final_response.choices[0].message.content
+            except Exception as e:
+                # If final summary fails, return the combined chunk summaries
+                return combined_summary
+        else:
+            # Shorter texts fit in a single request, so summarize directly
+            prompts = {
+                "general": f"Please provide a clear and concise summary of the following video transcript:\n\n{text}",
+                "detailed": f"Please provide a detailed summary with key points and main topics from the following video transcript:\n\n{text}",
+                "bullet_points": f"Please summarize the following video transcript in bullet points, highlighting the main topics:\n\n{text}",
+                "key_takeaways": f"Please extract the key takeaways and main insights from the following video transcript:\n\n{text}"
+            }
+            try:
+                response = self.client.chat.completions.create(
+                    model="llama-3.1-8b-instant",
+                    messages=[
+                        {"role": "user", "content": prompts[summary_type]}
+                    ],
+                    max_tokens=max_tokens,
+                    temperature=0.1
+                )
+                return response.choices[0].message.content
+            except Exception as e:
+                raise Exception(f"Error generating summary: {str(e)}")
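`summarize` follows a map-reduce pattern. Each chunk is summarized on its own, and the partial summaries are then condensed in a final request. The sketch below separates that control flow from Groq by taking a `complete(prompt) -> str` callable, so it can run without an API key. `chunk_words`, `map_reduce_summary`, `complete`, and the shortened prompts are all hypothetical. Unlike the service, the sketch decides whether to chunk based on the chunk count rather than the fixed 3,000-character threshold.

```python
from typing import Callable, List


def chunk_words(text: str, max_chars: int) -> List[str]:
    """Greedy word packing, the same idea as SummarizerService.chunk_text."""
    chunks: List[str] = []
    current: List[str] = []
    length = 0
    for word in text.split():
        # +1 accounts for the joining space
        if current and length + len(word) + 1 > max_chars:
            chunks.append(" ".join(current))
            current, length = [], 0
        current.append(word)
        length += len(word) + 1
    if current:
        chunks.append(" ".join(current))
    return chunks


def map_reduce_summary(
    text: str,
    complete: Callable[[str], str],
    max_chars: int = 2500,
) -> str:
    """Summarize each chunk ("map"), then summarize the summaries ("reduce")."""
    chunks = chunk_words(text, max_chars)
    if len(chunks) <= 1:
        # Short input: a single request is enough
        return complete(f"Summarize this video transcript:\n\n{text}")
    partials = [
        complete(f"Summarize this part of a video transcript:\n\n{chunk}")
        for chunk in chunks
    ]
    combined = "\n\n".join(partials)
    return complete(f"Combine these section summaries into one summary:\n\n{combined}")
```

With the real client, `complete` could be a thin wrapper. It would call `self.client.chat.completions.create(model="llama-3.1-8b-instant", messages=[{"role": "user", "content": prompt}], ...)` and return `response.choices[0].message.content`.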

services/transcript.py ADDED

	@@ -0,0 +1,241 @@

+"""
+Transcript Service for YouTube Videos
+This service extracts transcripts from YouTube videos using multiple methods:
+1. First, try to get existing subtitles/captions (fastest, no model needed)
+2. If no subtitles available, fallback to audio extraction + Whisper transcription
+The fallback uses the SpeechToTextService for local Whisper transcription.
+"""
+import re
+import os
+import tempfile
+import logging
+from typing import Optional, Tuple
+import yt_dlp
+# Configure logging
+logging.basicConfig(level=logging.INFO)
+logger = logging.getLogger(__name__)
+class TranscriptService:
+    """
+    Service for extracting transcripts from YouTube videos.
+    Supports two methods:
+    1. Subtitle extraction (fast, no ML models)
+    2. Audio transcription via Whisper (slower, requires SpeechToTextService)
+    """
+    def __init__(self):
+        """Initialize the transcript service."""
+        self._speech_to_text = None  # Lazy-loaded
+    def _get_speech_to_text_service(self):
+        """Lazy-load the SpeechToTextService to avoid loading Whisper unless needed."""
+        if self._speech_to_text is None:
+            from services.speech_to_text import SpeechToTextService
+            self._speech_to_text = SpeechToTextService()
+        return self._speech_to_text
+    def extract_video_id(self, url: str) -> str:
+        """
+        Extract video ID from YouTube URL.
+        Args:
+            url: YouTube URL in various formats
+        Returns:
+            11-character video ID
+        Raises:
+            ValueError: If URL is invalid
+        """
+        regex = r"(?:v=|\/|youtu\.be\/)([0-9A-Za-z_-]{11}).*"
+        match = re.search(regex, url)
+        if match:
+            return match.group(1)
+        raise ValueError("Invalid YouTube URL")
+    def clean_autogen_transcript(self, text: str) -> str:
+        """
+        Clean auto-generated YouTube captions.
+        Removes:
+        - <c>...</c> tags
+        - Timestamps like <00:00:06.480>
+        - Multiple spaces
+        Args:
+            text: Raw VTT subtitle text
+        Returns:
+            Cleaned transcript text
+        """
+        # Remove <c>...</c> tags
+        text = re.sub(r"</?c>", "", text)
+        # Remove timestamps like <00:00:06.480>
+        text = re.sub(r"<\d{2}:\d{2}:\d{2}\.\d{3}>", "", text)
+        # Collapse multiple spaces
+        text = re.sub(r"\s+", " ", text).strip()
+        return text
+    def get_subtitles(self, url: str, lang: str = "en") -> Optional[dict]:
+        """
+        Try to get existing subtitles from YouTube.
+        Args:
+            url: YouTube video URL
+            lang: Preferred language code (default: "en"); currently not used
+        Returns:
+            Dictionary with transcript and language, or None if no subtitles
+        """
+        with tempfile.TemporaryDirectory() as temp_dir:
+            ydl_opts = {
+                "skip_download": True,
+                "writesubtitles": True,
+                "writeautomaticsub": True,
+                "subtitlesformat": "vtt",
+                "outtmpl": os.path.join(temp_dir, "%(id)s.%(ext)s"),
+                "quiet": True,
+            }
+            try:
+                with yt_dlp.YoutubeDL(ydl_opts) as ydl:
+                    # With skip_download=True this fetches the metadata and
+                    # writes only the subtitle files, in a single pass
+                    info = ydl.extract_info(url, download=True)
+                    # Find subtitle file
+                    video_id = info["id"]
+                    sub_file = None
+                    detected_lang = "eng"
+                    for file in os.listdir(temp_dir):
+                        if file.startswith(video_id) and file.endswith(".vtt"):
+                            sub_file = os.path.join(temp_dir, file)
+                            # Try to extract language from filename
+                            # Format: videoId.lang.vtt
+                            parts = file.split(".")
+                            if len(parts) >= 3:
+                                detected_lang = parts[-2]
+                            break
+                    if not sub_file:
+                        logger.info("No subtitle file found")
+                        return None
+                    # Read and clean VTT file
+                    lines = []
+                    with open(sub_file, "r", encoding="utf-8") as f:
+                        for line in f:
+                            line = line.strip()
+                            if not line:
+                                continue
+                            # Skip the WEBVTT header and its metadata lines
+                            if line.startswith(("WEBVTT", "Kind:", "Language:")):
+                                continue
+                            if "-->" in line:
+                                continue
+                            if re.match(r"^\d+$", line):
+                                continue
+                            lines.append(line)
+                    raw_text = " ".join(lines)
+                    clean_text = self.clean_autogen_transcript(raw_text)
+                    if not clean_text or len(clean_text.strip()) < 50:
+                        logger.info("Extracted subtitles too short")
+                        return None
+                    # Map common language codes
+                    lang_map = {
+                        "en": "eng", "en-US": "eng", "en-GB": "eng",
+                        "hi": "hin", "hi-IN": "hin",
+                        "ta": "tam", "ta-IN": "tam",
+                        "te": "tel", "te-IN": "tel",
+                        "kn": "kan", "kn-IN": "kan",
+                        "ml": "mal", "ml-IN": "mal",
+                        "gu": "guj", "gu-IN": "guj",
+                        "bn": "ben", "bn-IN": "ben",
+                        "mr": "mar", "mr-IN": "mar",
+                        "pa": "pan", "pa-IN": "pan",
+                        "ur": "urd", "ur-PK": "urd",
+                    }
+                    normalized_lang = lang_map.get(detected_lang, detected_lang)
+                    logger.info(f"Subtitles extracted successfully (language: {normalized_lang})")
+                    return {
+                        "transcript": clean_text,
+                        "language": normalized_lang,
+                        "source": "subtitles",
+                        "word_count": len(clean_text.split())
+                    }
+            except Exception as e:
+                logger.warning(f"Subtitle extraction failed: {e}")
+                return None
+    def get_video_transcript(self, url: str, use_whisper_fallback: bool = True) -> dict:
+        """
+        Get transcript from a YouTube video.
+        First tries to get subtitles. If unavailable and use_whisper_fallback is True,
+        falls back to audio extraction and Whisper transcription.
+        Args:
+            url: YouTube video URL
+            use_whisper_fallback: Whether to use Whisper if no subtitles (default: True)
+        Returns:
+            Dictionary with:
+                - transcript: The transcript text
+                - language: Detected/extracted language code
+                - source: "subtitles" or "whisper"
+                - word_count: Number of words
+        Raises:
+            Exception: If transcript cannot be obtained
+        """
+        # Try subtitles first (faster, no model needed)
+        logger.info("Attempting to get subtitles...")
+        result = self.get_subtitles(url)
+        if result:
+            return result
+        # Fallback to Whisper transcription
+        if use_whisper_fallback:
+            logger.info("No subtitles found. Falling back to Whisper transcription...")
+            try:
+                stt_service = self._get_speech_to_text_service()
+                whisper_result = stt_service.transcribe_youtube_video(url)
+                return {
+                    "transcript": whisper_result["text"],
+                    "language": whisper_result["language"],
+                    "source": "whisper",
+                    "word_count": whisper_result["word_count"]
+                }
+            except Exception as e:
+                logger.error(f"Whisper transcription failed: {e}")
+                raise Exception(f"Could not retrieve transcript: {str(e)}")
+        raise Exception("No subtitles available and Whisper fallback is disabled")
+    def get_video_transcript_legacy(self, url: str, lang: str = "en") -> str:
+        """
+        Legacy method for backward compatibility.
+        Returns only the transcript text (no language info).
+        """
+        result = self.get_video_transcript(url, use_whisper_fallback=True)
+        return result["transcript"]
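The regular expression in `extract_video_id` accepts any 11 allowed characters that follow `v=` or a `/`, on any host. As a result, `https://example.com/abcdefghijk` is accepted and returns `abcdefghijk`. The stricter sketch below parses the URL with `urllib.parse` and accepts only known YouTube hosts and path shapes. The function name and the host and path lists are assumptions, and the lists do not cover every YouTube URL format.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse

VIDEO_ID = re.compile(r"[0-9A-Za-z_-]{11}")
# Assumed host list; extend as needed
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "music.youtube.com"}


def extract_video_id_strict(url: str) -> Optional[str]:
    """Return the 11-character ID for known YouTube URL shapes, else None."""
    parsed = urlparse(url.strip())
    host = (parsed.hostname or "").lower()
    candidate = None
    if host == "youtu.be":
        # https://youtu.be/<id>
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host in YOUTUBE_HOSTS:
        if parsed.path == "/watch":
            # https://www.youtube.com/watch?v=<id>
            candidate = parse_qs(parsed.query).get("v", [None])[0]
        else:
            # /shorts/<id>, /embed/<id>, /live/<id>
            parts = parsed.path.strip("/").split("/")
            if len(parts) == 2 and parts[0] in {"shorts", "embed", "live"}:
                candidate = parts[1]
    if candidate and VIDEO_ID.fullmatch(candidate):
        return candidate
    return None
```

Unlike the regex version, this sketch requires a scheme. A caller would need to prepend `https://` to inputs such as `youtu.be/dQw4w9WgXcQ`.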

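YouTube's auto-generated captions typically use a rolling layout, in which each line reappears in the following cue. The loop in `get_subtitles` keeps every copy, so the transcript can contain many repeated phrases. The minimal sketch below cleans each caption line, drops consecutive repeats, and also skips the `Kind:` and `Language:` metadata lines that such files usually carry. `vtt_to_text` is a hypothetical helper. The repeat pattern is an assumption about typical auto-caption files, not something the WebVTT format guarantees.

```python
import re
from typing import Iterable, List

# Inline word timings such as <00:00:06.480>
_TIMING_TAG = re.compile(r"<\d{2}:\d{2}:\d{2}\.\d{3}>")
# <c>, </c> and styled variants such as <c.colorE5E5E5>
_CUE_TAG = re.compile(r"</?c[^>]*>")
_HEADER_PREFIXES = ("WEBVTT", "Kind:", "Language:")


def vtt_to_text(lines: Iterable[str]) -> str:
    """Flatten WebVTT captions to plain text, dropping rolling-caption repeats."""
    kept: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith(_HEADER_PREFIXES):
            continue
        if "-->" in line or line.isdigit():
            continue  # cue timing line or numeric cue identifier
        line = _CUE_TAG.sub("", _TIMING_TAG.sub("", line))
        line = re.sub(r"\s+", " ", line).strip()
        # Rolling captions repeat the previous line at the start of each cue
        if line and (not kept or kept[-1] != line):
            kept.append(line)
    return " ".join(kept)
```

Because it works line by line, `vtt_to_text` could be fed the open subtitle file directly, for example `vtt_to_text(open(sub_file, encoding="utf-8"))`.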
services/translation.py ADDED

	@@ -0,0 +1,330 @@

+"""
+Translation Service using NLLB-200 (Local Model)
+This service provides LOCAL translation between English and Indian languages.
+NO API CALLS - everything runs on your machine for FREE!
+Supported Languages:
+- English (eng)
+- Hindi (hin)
+- Tamil (tam)
+- Telugu (tel)
+- Kannada (kan)
+- Malayalam (mal)
+- Gujarati (guj)
+- Bengali (ben)
+- Marathi (mar)
+- Punjabi (pan)
+- Urdu (urd)
+Model Used: facebook/nllb-200-distilled-600M (~2.4GB)
+This is the smallest NLLB model, optimized for lower RAM usage.
+"""
+import logging
+from typing import Optional
+import torch
+from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
+from langdetect import detect, LangDetectException
+from config import (
+    NLLB_MODEL,
+    LANGUAGE_MAP,
+    SUPPORTED_LANGUAGES,
+    MAX_TRANSLATION_LENGTH,
+    get_nllb_code,
+    get_language_name,
+    is_english,
+)
+# Configure logging
+logging.basicConfig(level=logging.INFO)
+logger = logging.getLogger(__name__)
+class TranslationService:
+    """
+    Service for translating text between languages using NLLB-200.
+    The model is lazily loaded on first use to save memory during startup.
+    All processing happens locally - no API costs!
+    """
+    def __init__(self, model_name: str = NLLB_MODEL):
+        """
+        Initialize the translation service.
+        Args:
+            model_name: Hugging Face model identifier for NLLB-200
+        """
+        self.model_name = model_name
+        self._model = None
+        self._tokenizer = None
+        self._device = "cuda" if torch.cuda.is_available() else "cpu"
+        logger.info(f"TranslationService initialized (device: {self._device})")
+    def _load_model(self):
+        """
+        Load the NLLB-200 model and tokenizer.
+        Called lazily on first translation request.
+        """
+        if self._model is not None:
+            return
+        logger.info(f"Loading NLLB-200 model: {self.model_name}")
+        logger.info("This may take a few minutes on first run (downloading ~2.4GB model)...")
+        try:
+            # Load tokenizer
+            self._tokenizer = AutoTokenizer.from_pretrained(self.model_name)
+            # Load model with memory optimizations
+            self._model = AutoModelForSeq2SeqLM.from_pretrained(
+                self.model_name,
+                torch_dtype=torch.float32,  # Use float32 for CPU compatibility
+                low_cpu_mem_usage=True
+            )
+            self._model.to(self._device)
+            logger.info("NLLB-200 model loaded successfully!")
+        except Exception as e:
+            logger.error(f"Failed to load NLLB-200 model: {e}")
+            raise Exception(f"Could not load translation model: {str(e)}")
+    def detect_language(self, text: str) -> dict:
+        """
+        Detect the language of the given text.
+        Args:
+            text: Text to detect language for
+        Returns:
+            Dictionary with:
+                - code: Normalized language code (e.g., "hin")
+                - name: Language name (e.g., "Hindi")
+                - raw_code: Code returned by langdetect (e.g., "hi")
+        """
+        try:
+            # Use langdetect library
+            detected = detect(text)
+            # Map to our language codes
+            lang_mapping = {
+                "en": "eng",
+                "hi": "hin",
+                "ta": "tam",
+                "te": "tel",
+                "kn": "kan",
+                "ml": "mal",
+                "gu": "guj",
+                "bn": "ben",
+                "mr": "mar",
+                "pa": "pan",
+                "ur": "urd",
+            }
+            code = lang_mapping.get(detected, detected)
+            name = get_language_name(code)
+            logger.info(f"Detected language: {name} ({code})")
+            return {
+                "code": code,
+                "name": name,
+                "raw_code": detected
+            }
+        except LangDetectException as e:
+            logger.warning(f"Language detection failed: {e}")
+            # Default to English if detection fails
+            return {
+                "code": "eng",
+                "name": "English",
+                "raw_code": "en"
+            }
+    def translate(
+        self,
+        text: str,
+        source_lang: str,
+        target_lang: str,
+        max_length: int = 1024
+    ) -> str:
+        """
+        Translate text from source language to target language.
+        Args:
+            text: Text to translate
+            source_lang: Source language code (e.g., "hin", "eng")
+            target_lang: Target language code (e.g., "eng", "tam")
+            max_length: Maximum output length
+        Returns:
+            Translated text
+        Raises:
+            ValueError: If language codes are invalid
+            Exception: If translation fails
+        """
+        # Ensure model is loaded
+        self._load_model()
+        # Validate and get NLLB codes
+        try:
+            source_nllb = get_nllb_code(source_lang)
+            target_nllb = get_nllb_code(target_lang)
+        except ValueError as e:
+            raise ValueError(str(e))
+        logger.info(f"Translating from {source_lang} to {target_lang}")
+        # Handle long texts by chunking
+        if len(text) > MAX_TRANSLATION_LENGTH:
+            logger.info(f"Text too long ({len(text)} chars), chunking...")
+            return self._translate_long_text(text, source_lang, target_lang, max_length)
+        try:
+            # Set source language for tokenizer
+            self._tokenizer.src_lang = source_nllb
+            # Tokenize input
+            inputs = self._tokenizer(
+                text,
+                return_tensors="pt",
+                padding=True,
+                truncation=True,
+                max_length=max_length
+            )
+            inputs = {k: v.to(self._device) for k, v in inputs.items()}
+            # Get target language token ID
+            forced_bos_token_id = self._tokenizer.convert_tokens_to_ids(target_nllb)
+            # Generate translation
+            with torch.no_grad():
+                outputs = self._model.generate(
+                    **inputs,
+                    forced_bos_token_id=forced_bos_token_id,
+                    max_length=max_length,
+                    num_beams=5,
+                    length_penalty=1.0,
+                    early_stopping=True
+                )
+            # Decode output
+            translated = self._tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]
+            logger.info(f"Translation complete ({len(translated)} chars)")
+            return translated.strip()
+        except Exception as e:
+            logger.error(f"Translation failed: {e}")
+            raise Exception(f"Could not translate text: {str(e)}")
+    def _translate_long_text(
+        self,
+        text: str,
+        source_lang: str,
+        target_lang: str,
+        max_length: int = 1024
+    ) -> str:
+        """
+        Translate long text by splitting into chunks.
+        Args:
+            text: Long text to translate
+            source_lang: Source language code
+            target_lang: Target language code
+            max_length: Maximum output length per chunk
+        Returns:
+            Concatenated translated text
+        """
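+        import textwrap  # stdlib; used below to split over-long sentences
+        # Split text into sentences (rough approximation; the danda "।" and
+        # double danda "॥" are treated as full stops)
+        sentences = text.replace("।", ".").replace("॥", ".").split(".")
+        chunks = []
+        current_chunk = ""
+        for sentence in sentences:
+            sentence = sentence.strip()
+            if not sentence:
+                continue
+            # A single sentence longer than the limit would make translate()
+            # call back into this method forever, so wrap it into pieces that
+            # each fit (breaking on whitespace, or mid-word if needed)
+            if len(sentence) > MAX_TRANSLATION_LENGTH:
+                if current_chunk:
+                    chunks.append(current_chunk)
+                    current_chunk = ""
+                chunks.extend(textwrap.wrap(sentence, MAX_TRANSLATION_LENGTH))
+                continue
+            # Check if adding this sentence would exceed limit
+            if len(current_chunk) + len(sentence) + 2 > MAX_TRANSLATION_LENGTH:
+                if current_chunk:
+                    chunks.append(current_chunk)
+                current_chunk = sentence
+            else:
+                current_chunk = current_chunk + ". " + sentence if current_chunk else sentence
+        if current_chunk:
+            chunks.append(current_chunk)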
+        # Translate each chunk
+        translated_chunks = []
+        for i, chunk in enumerate(chunks):
+            logger.info(f"Translating chunk {i+1}/{len(chunks)}")
+            translated = self.translate(chunk, source_lang, target_lang, max_length)
+            translated_chunks.append(translated)
+        return " ".join(translated_chunks)
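`_translate_long_text` maps "।" and "॥" to "." before splitting, so the original Devanagari terminators do not survive into the chunks. As a minimal sketch of an alternative (not the project's code), the function below splits after ".", "।" or "॥" and keeps each sentence's own terminator, then greedily packs whole sentences into chunks. It assumes whitespace follows sentence-ending punctuation; `chunk_sentences` and `max_chars` are hypothetical names, with `max_chars` playing the role of `MAX_TRANSLATION_LENGTH`.

```python
import re

# Split points: whitespace that follows ".", "।" (danda) or "॥" (double danda).
# Requiring whitespace avoids splitting decimals like "3.14", but abbreviations
# such as "e.g. " are still split.
_SENTENCE_END = re.compile(r"(?<=[.।॥])\s+")


def chunk_sentences(text: str, max_chars: int) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars becomes its own chunk;
    it is not split any further.
    """
    sentences = [s.strip() for s in _SENTENCE_END.split(text) if s.strip()]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # +1 accounts for the space used to join consecutive sentences
        if current and len(current) + 1 + len(sentence) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}" if current else sentence
    if current:
        chunks.append(current)
    return chunks
```

For example, `chunk_sentences("One. Two. Three.", 9)` yields `["One. Two.", "Three."]`. Each chunk could then be passed to `translate` and the results joined with a space, as the method above does.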
+    def translate_to_english(self, text: str, source_lang: str) -> str:
+        """
+        Convenience method to translate text to English.
+        Args:
+            text: Text to translate
+            source_lang: Source language code
+        Returns:
+            English translation
+        """
+        if is_english(source_lang):
+            return text  # Already English
+        return self.translate(text, source_lang, "eng")
+    def translate_from_english(self, text: str, target_lang: str) -> str:
+        """
+        Convenience method to translate English text to another language.
+        Args:
+            text: English text to translate
+            target_lang: Target language code
+        Returns:
+            Translated text in target language
+        """
+        if is_english(target_lang):
+            return text  # Target is English; nothing to translate
+        return self.translate(text, "eng", target_lang)
+    def get_supported_languages(self) -> list:
+        """
+        Get list of supported languages.
+        Returns:
+            List of language dictionaries with code, name, and nllb_code
+        """
+        # Copy each dict as well, so callers cannot mutate the shared config entries
+        return [dict(lang) for lang in SUPPORTED_LANGUAGES]
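`get_supported_languages` returns dictionaries with `code`, `name`, and `nllb_code` keys, and `translate` works with NLLB codes (`source_nllb`, `target_nllb`). A minimal sketch of resolving a short code to its NLLB code from such a list is shown below. `resolve_nllb_code` and `SAMPLE_LANGUAGES` are hypothetical; the sample entries assume three-letter short codes (as in the `"eng"` passed by `translate_to_english`) and use FLORES-200 style NLLB codes. The real list lives in the project's config.

```python
from typing import Optional

# Hypothetical sample entries shaped like the dicts the docstring describes.
SAMPLE_LANGUAGES = [
    {"code": "eng", "name": "English", "nllb_code": "eng_Latn"},
    {"code": "hin", "name": "Hindi", "nllb_code": "hin_Deva"},
    {"code": "tam", "name": "Tamil", "nllb_code": "tam_Taml"},
]


def resolve_nllb_code(code: str, languages: list[dict]) -> Optional[str]:
    """Return the NLLB code for a short language code, or None if unsupported."""
    normalized = code.strip().lower()
    for lang in languages:
        if lang["code"].lower() == normalized:
            return lang["nllb_code"]
    return None
```

Returning `None` instead of raising leaves it to the caller to decide whether an unsupported language is an error or a reason to fall back to English.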
+    def is_model_loaded(self) -> bool:
+        """Check if the NLLB model is currently loaded."""
+        return self._model is not None
+    def warmup(self):
+        """
+        Pre-load the model to avoid delay on first request.
+        Call this during application startup if desired.
+        """
+        logger.info("Warming up TranslationService...")
+        self._load_model()
+        logger.info("TranslationService warmup complete!")
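`warmup` calls `_load_model`, whose body is not shown in this section, and `is_model_loaded` checks `self._model is not None`. If the API handles requests concurrently and the loader is not already guarded, two first requests could both start loading the model. A minimal sketch of lazy, load-once initialization with double-checked locking follows; `LazyResource` is a hypothetical helper, not part of the project.

```python
import threading
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class LazyResource(Generic[T]):
    """Load an expensive object once, on first use, even under concurrent calls."""

    def __init__(self, loader: Callable[[], T]) -> None:
        self._loader = loader
        self._value: Optional[T] = None
        self._lock = threading.Lock()

    def is_loaded(self) -> bool:
        return self._value is not None

    def get(self) -> T:
        # Fast path: once loaded, no locking is needed.
        if self._value is None:
            with self._lock:
                # Re-check inside the lock: another thread may have loaded it first.
                if self._value is None:
                    self._value = self._loader()
        return self._value

    def warmup(self) -> None:
        """Trigger loading eagerly, e.g. at application startup."""
        self.get()
```

With this pattern, `warmup` at startup and the first real request share the same guarded path, so the loader runs at most once.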