bskrishna2006 committed
Commit dfbb2da · 0 Parent(s)

Initial backend deployment
.gitignore ADDED
@@ -0,0 +1,28 @@
+ # Python
+ __pycache__/
+ *.py[cod]
+ *$py.class
+ *.so
+ .Python
+ env/
+ venv/
+ .venv/
+
+ # Environment
+ .env
+ .env.local
+
+ # IDE
+ .vscode/
+ .idea/
+ *.swp
+
+ # Logs
+ *.log
+
+ # Temp files
+ temp/
+ *.tmp
+
+ # Models cache (will be in container)
+ .cache/
DEPLOY.md ADDED
@@ -0,0 +1,160 @@
+ # 🚀 Deploying to Hugging Face Spaces
+
+ This guide walks you through deploying the YouTube Summarizer API to Hugging Face Spaces for FREE cloud hosting.
+
+ ## Prerequisites
+
+ 1. A [Hugging Face account](https://huggingface.co/join)
+ 2. Git installed on your system
+ 3. Your Groq API key (from https://console.groq.com)
+
+ ---
+
+ ## Step 1: Create a Hugging Face Space
+
+ 1. Go to [huggingface.co/new-space](https://huggingface.co/new-space)
+
+ 2. Fill in the form:
+    - **Owner**: Select your username
+    - **Space name**: `youtube-summarizer-api`
+    - **License**: MIT
+    - **SDK**: Select **Docker**
+    - **Hardware**: CPU basic (Free)
+    - Leave other options as default
+
+ 3. Click **"Create Space"**
+
+ ---
+
+ ## Step 2: Clone and Push
+
+ Open PowerShell/Terminal and run:
+
+ ```powershell
+ # Navigate to the deploy folder
+ cd "c:\Users\Krishna\Desktop\Updated Yt summarizer\backend\deploy"
+
+ # Initialize git repository and make sure the branch is named main
+ # (Hugging Face Spaces expects pushes to main)
+ git init
+ git branch -M main
+
+ # Add Hugging Face as remote (replace YOUR_USERNAME with your HF username)
+ git remote add origin https://huggingface.co/spaces/YOUR_USERNAME/youtube-summarizer-api
+
+ # Add all files
+ git add .
+
+ # Commit
+ git commit -m "Initial deployment"
+
+ # Push to Hugging Face (you'll be prompted for credentials)
+ git push -u origin main
+ ```
+
+ **For authentication**, you'll need to use:
+ - Username: Your Hugging Face username
+ - Password: Your Hugging Face Access Token (create one at Settings → Access Tokens)
+
+ ---
+
+ ## Step 3: Add Your Groq API Key
+
+ 1. Go to your Space: `https://huggingface.co/spaces/YOUR_USERNAME/youtube-summarizer-api`
+
+ 2. Click **Settings** (gear icon)
+
+ 3. Scroll to **Variables and secrets**
+
+ 4. Click **"New secret"** and add:
+    - **Name**: `GROQ_API_KEY`
+    - **Value**: Your Groq API key
+
+ 5. Click **Save**
+
+ ---
+
+ ## Step 4: Wait for Build
+
+ The first build takes **10-15 minutes** because it:
+ 1. Builds the Docker image
+ 2. Installs all dependencies
+ 3. Sets up the environment
+
+ You can watch the build progress in the "Logs" tab of your Space.
+
+ ---
+
+ ## Step 5: Test Your API
+
+ Once the status shows **"Running"**, your API is live!
+
+ ### Test health check:
+ ```bash
+ curl https://YOUR_USERNAME-youtube-summarizer-api.hf.space/api/health
+ ```
+
+ ### Test full pipeline:
+ ```bash
+ curl -X POST https://YOUR_USERNAME-youtube-summarizer-api.hf.space/api/process \
+   -H "Content-Type: application/json" \
+   -d '{"url": "https://www.youtube.com/watch?v=jNQXAC9IVRw", "summary_type": "general"}'
+ ```
+
+ ---
+
+ ## Step 6: Update Your Frontend
+
+ Update your frontend `.env` file:
+
+ ```env
+ VITE_API_URL=https://YOUR_USERNAME-youtube-summarizer-api.hf.space
+ ```
+
+ Then restart your frontend dev server.
+
+ ---
+
+ ## Troubleshooting
+
+ ### Build Failed?
+ - Check the "Logs" tab for error messages
+ - Make sure all files are properly committed
+
+ ### API Not Responding?
+ - The Space may be sleeping (it wakes up on the first request, which takes ~30s)
+ - Check that the GROQ_API_KEY secret is set
+
+ ### Out of Memory?
+ - The free tier has 16GB RAM, which should be enough
+ - Consider upgrading to a paid tier if needed
+
+ ---
+
+ ## API Endpoints Summary
+
+ | Endpoint | Method | Description |
+ |----------|--------|-------------|
+ | `/` | GET | Health check |
+ | `/api/health` | GET | Detailed status |
+ | `/api/languages` | GET | Supported languages |
+ | `/api/transcript` | POST | Extract transcript |
+ | `/api/translate` | POST | Translate text |
+ | `/api/summarize` | POST | Generate summary |
+ | `/api/process` | POST | Full pipeline |
+
+ ---
+
+ ## Cost
+
+ | Tier | Cost | RAM | GPU |
+ |------|------|-----|-----|
+ | **Free** | $0 | 16GB | CPU only |
+ | Upgraded | $0.60/hr | 16GB | GPU |
+
+ The free tier is sufficient for this application!
+
+ ---
+
+ ## Need Help?
+
+ - Hugging Face Docs: https://huggingface.co/docs/hub/spaces
+ - Docker Spaces: https://huggingface.co/docs/hub/spaces-sdks-docker
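
For reference, the same two checks from Step 5 as a minimal Python sketch, assuming the `requests` package is installed and your actual Space URL replaces the placeholder:

```python
import requests

BASE = "https://YOUR_USERNAME-youtube-summarizer-api.hf.space"  # placeholder Space URL

# Health check - mirrors: curl .../api/health
print(requests.get(f"{BASE}/api/health", timeout=30).json())

# Full pipeline - mirrors the curl POST above; a cold Space can take minutes
resp = requests.post(
    f"{BASE}/api/process",
    json={"url": "https://www.youtube.com/watch?v=jNQXAC9IVRw", "summary_type": "general"},
    timeout=600,
)
resp.raise_for_status()
print(resp.json()["summary"])
```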
Dockerfile ADDED
@@ -0,0 +1,49 @@
+ # Use Python 3.10 slim image for smaller size
+ FROM python:3.10-slim
+
+ # Set environment variables
+ ENV PYTHONDONTWRITEBYTECODE=1
+ ENV PYTHONUNBUFFERED=1
+ ENV TRANSFORMERS_CACHE=/app/.cache
+ ENV HF_HOME=/app/.cache
+
+ # Install system dependencies
+ RUN apt-get update && apt-get install -y --no-install-recommends \
+     ffmpeg \
+     git \
+     && rm -rf /var/lib/apt/lists/*
+
+ # Create non-root user for security
+ RUN useradd -m -u 1000 appuser
+
+ # Set working directory
+ WORKDIR /app
+
+ # Copy requirements first (for Docker layer caching)
+ COPY requirements.txt .
+
+ # Install Python dependencies
+ RUN pip install --no-cache-dir --upgrade pip && \
+     pip install --no-cache-dir -r requirements.txt
+
+ # Copy application code
+ COPY . .
+
+ # Create cache directory with proper permissions
+ RUN mkdir -p /app/.cache && chown -R appuser:appuser /app
+
+ # Switch to non-root user
+ USER appuser
+
+ # Expose port (Hugging Face Spaces uses 7860)
+ EXPOSE 7860
+
+ # Health check (raise_for_status makes non-200 responses fail the check too)
+ HEALTHCHECK --interval=30s --timeout=10s --start-period=60s --retries=3 \
+     CMD python -c "import requests; requests.get('http://localhost:7860/api/health').raise_for_status()" || exit 1
+
+ # Run with gunicorn for production
+ # - timeout 600s for long model loading times
+ # - workers 1 to save memory (models are heavy)
+ # - threads 4 for concurrent requests
+ CMD ["gunicorn", "--bind", "0.0.0.0:7860", "--timeout", "600", "--workers", "1", "--threads", "4", "app:app"]
README.md ADDED
@@ -0,0 +1,43 @@
+ ---
+ title: YouTube Summarizer API
+ emoji: 🎬
+ colorFrom: purple
+ colorTo: blue
+ sdk: docker
+ app_port: 7860
+ ---
+
+ # YouTube Video Summarizer API
+
+ A multilingual Flask API for summarizing YouTube videos using AI.
+
+ ## Features
+ - 🎤 **Speech-to-Text**: Whisper for videos without subtitles
+ - 🌐 **11 Languages**: English + 10 Indian languages
+ - 🔄 **Translation**: NLLB-200 for multilingual support
+ - 🤖 **AI Summarization**: Groq LLaMA 3.1
+
+ ## API Endpoints
+
+ | Method | Endpoint | Description |
+ |--------|----------|-------------|
+ | GET | `/` | Health check |
+ | GET | `/api/health` | API status |
+ | GET | `/api/languages` | Supported languages |
+ | POST | `/api/transcript` | Extract transcript |
+ | POST | `/api/translate` | Translate text |
+ | POST | `/api/summarize` | Generate summary |
+ | POST | `/api/process` | Full pipeline |
+
+ ## Usage
+
+ ```bash
+ curl -X POST https://YOUR-SPACE.hf.space/api/process \
+   -H "Content-Type: application/json" \
+   -d '{"url": "https://youtube.com/watch?v=VIDEO_ID", "summary_type": "bullet_points"}'
+ ```
+
+ ## Models Used
+ - **Whisper**: openai/whisper-small (~500MB)
+ - **NLLB-200**: facebook/nllb-200-distilled-600M (~2.4GB)
+ - **Summarization**: Groq API (LLaMA 3.1)
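
As a counterpart to the single curl call above, a hedged Python sketch of the two-step flow (`/api/transcript` then `/api/summarize`), assuming the `requests` package and a deployed Space URL in place of the placeholders:

```python
import requests

BASE = "https://YOUR-SPACE.hf.space"  # placeholder Space URL

# Step 1: extract the transcript (subtitles first, Whisper fallback)
t = requests.post(
    f"{BASE}/api/transcript",
    json={"url": "https://youtube.com/watch?v=VIDEO_ID"},
    timeout=600,
).json()
print(t["language_name"], t["word_count"], "words")

# Step 2: summarize the transcript text
s = requests.post(
    f"{BASE}/api/summarize",
    json={"transcript": t["transcript"], "summary_type": "key_takeaways"},
    timeout=300,
).json()
print(s["summary"])
```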
app.py ADDED
@@ -0,0 +1,459 @@
+ """
+ YouTube Video Summarizer API - Hugging Face Spaces Edition
+
+ Flask backend deployed on Hugging Face Spaces.
+ Provides multilingual YouTube video summarization using:
+ - Whisper (speech-to-text)
+ - NLLB-200 (translation)
+ - Groq API (summarization)
+
+ All ML models are FREE and run locally on HF Spaces infrastructure.
+ """
+
+ from flask import Flask, request, jsonify
+ from flask_cors import CORS
+ from dotenv import load_dotenv
+ import os
+ import logging
+
+ from services.transcript import TranscriptService
+ from services.summarizer import SummarizerService
+ from config import (
+     SUPPORTED_LANGUAGES,
+     get_language_name,
+     is_english,
+ )
+
+ # Load environment variables
+ load_dotenv()
+
+ # Configure logging
+ logging.basicConfig(level=logging.INFO)
+ logger = logging.getLogger(__name__)
+
+ app = Flask(__name__)
+
+ # Enable CORS for all origins (allow frontend from any domain)
+ CORS(app, resources={
+     r"/*": {
+         "origins": "*",
+         "methods": ["GET", "POST", "OPTIONS"],
+         "allow_headers": ["Content-Type", "Authorization"]
+     }
+ })
+
+ # Initialize lightweight services (their heavy ML models are lazy-loaded internally)
+ transcript_service = TranscriptService()
+ summarizer_service = SummarizerService()
+
+ # Translation service is lazy-loaded to avoid loading the 2.4GB model on startup
+ _translation_service = None
+
+ def get_translation_service():
+     """Lazy-load the translation service."""
+     global _translation_service
+     if _translation_service is None:
+         from services.translation import TranslationService
+         _translation_service = TranslationService()
+     return _translation_service
+
+
+ # =============================================================================
+ # ROOT & HEALTH ENDPOINTS
+ # =============================================================================
+
+ @app.route('/', methods=['GET'])
+ def root():
+     """Root endpoint - serves as health check for HF Spaces"""
+     return jsonify({
+         'status': 'healthy',
+         'service': 'YouTube Summarizer API',
+         'version': '2.0.0',
+         'docs': '/api/health for detailed status'
+     }), 200
+
+
+ @app.route('/api/health', methods=['GET'])
+ def health_check():
+     """Detailed health check endpoint"""
+     return jsonify({
+         'status': 'healthy',
+         'message': 'YouTube Summarizer API is running on Hugging Face Spaces',
+         'version': '2.0.0',
+         'features': ['multilingual', 'whisper', 'translation'],
+         'models': {
+             'whisper': 'openai/whisper-small',
+             'translation': 'facebook/nllb-200-distilled-600M',
+             'summarization': 'groq/llama-3.1-8b-instant'
+         }
+     }), 200
+
+
+ @app.route('/api/languages', methods=['GET'])
+ def get_languages():
+     """Get list of supported languages"""
+     return jsonify({
+         'success': True,
+         'languages': SUPPORTED_LANGUAGES
+     }), 200
+
+
+ @app.route('/api/warmup', methods=['POST'])
+ def warmup_models():
+     """
+     Pre-load ML models to avoid delay on first request.
+     This can take 2-5 minutes on first run (downloading models).
+     """
+     try:
+         results = {}
+         data = request.get_json() or {}
+
+         if data.get('translation', False):
+             logger.info("Warming up translation model...")
+             translation_service = get_translation_service()
+             translation_service.warmup()
+             results['translation'] = 'loaded'
+
+         if data.get('whisper', False):
+             logger.info("Warming up Whisper model...")
+             from services.speech_to_text import SpeechToTextService
+             stt = SpeechToTextService()
+             stt.warmup()
+             results['whisper'] = 'loaded'
+
+         return jsonify({
+             'success': True,
+             'message': 'Models warmed up successfully',
+             'models': results
+         }), 200
+
+     except Exception as e:
+         logger.error(f"Warmup failed: {e}")
+         return jsonify({
+             'error': 'Warmup failed',
+             'message': str(e)
+         }), 500
+
+
+ # =============================================================================
+ # TRANSCRIPT ENDPOINTS
+ # =============================================================================
+
+ @app.route('/api/transcript', methods=['POST'])
+ def get_transcript():
+     """
+     Extract transcript from YouTube video (multilingual).
+
+     Request: { "url": "youtube_url", "use_whisper": true }
+     Response: { "success": true, "transcript": "...", "language": "tam", ... }
+     """
+     try:
+         data = request.get_json()
+
+         if not data or 'url' not in data:
+             return jsonify({
+                 'error': 'Missing YouTube URL',
+                 'message': 'Please provide a valid YouTube URL'
+             }), 400
+
+         url = data['url']
+         use_whisper = data.get('use_whisper', True)
+
+         video_id = transcript_service.extract_video_id(url)
+         result = transcript_service.get_video_transcript(url, use_whisper_fallback=use_whisper)
+
+         return jsonify({
+             'success': True,
+             'video_id': video_id,
+             'transcript': result['transcript'],
+             'language': result['language'],
+             'language_name': get_language_name(result['language']),
+             'source': result['source'],
+             'word_count': result['word_count']
+         }), 200
+
+     except ValueError as e:
+         return jsonify({'error': 'Invalid URL', 'message': str(e)}), 400
+     except Exception as e:
+         logger.error(f"Transcript extraction failed: {e}")
+         return jsonify({'error': 'Transcript extraction failed', 'message': str(e)}), 500
+
+
+ # =============================================================================
+ # TRANSLATION ENDPOINTS
+ # =============================================================================
+
+ @app.route('/api/translate', methods=['POST'])
+ def translate_text():
+     """
+     Translate text between languages.
+
+     Request: { "text": "Hello", "source_lang": "eng", "target_lang": "hin" }
+     Response: { "success": true, "translated_text": "नमस्ते", ... }
+     """
+     try:
+         data = request.get_json()
+
+         if not data or 'text' not in data:
+             return jsonify({
+                 'error': 'Missing text',
+                 'message': 'Please provide text to translate'
+             }), 400
+
+         text = data['text']
+         source_lang = data.get('source_lang', 'eng')
+         target_lang = data.get('target_lang', 'hin')
+
+         translation_service = get_translation_service()
+         translated = translation_service.translate(text, source_lang, target_lang)
+
+         return jsonify({
+             'success': True,
+             'translated_text': translated,
+             'source_lang': source_lang,
+             'source_lang_name': get_language_name(source_lang),
+             'target_lang': target_lang,
+             'target_lang_name': get_language_name(target_lang)
+         }), 200
+
+     except ValueError as e:
+         return jsonify({'error': 'Invalid language', 'message': str(e)}), 400
+     except Exception as e:
+         logger.error(f"Translation failed: {e}")
+         return jsonify({'error': 'Translation failed', 'message': str(e)}), 500
+
+
+ @app.route('/api/detect-language', methods=['POST'])
+ def detect_language():
+     """Detect the language of given text."""
+     try:
+         data = request.get_json()
+
+         if not data or 'text' not in data:
+             return jsonify({
+                 'error': 'Missing text',
+                 'message': 'Please provide text for language detection'
+             }), 400
+
+         translation_service = get_translation_service()
+         result = translation_service.detect_language(data['text'])
+
+         return jsonify({
+             'success': True,
+             'language': result['code'],
+             'language_name': result['name']
+         }), 200
+
+     except Exception as e:
+         logger.error(f"Language detection failed: {e}")
+         return jsonify({'error': 'Language detection failed', 'message': str(e)}), 500
+
+
+ # =============================================================================
+ # SUMMARIZATION ENDPOINTS
+ # =============================================================================
+
+ @app.route('/api/summarize', methods=['POST'])
+ def summarize():
+     """
+     Generate summary from transcript.
+
+     Request: { "transcript": "...", "summary_type": "general" }
+     Response: { "success": true, "summary": "...", "statistics": {...} }
+     """
+     try:
+         data = request.get_json()
+
+         if not data or 'transcript' not in data:
+             return jsonify({
+                 'error': 'Missing transcript',
+                 'message': 'Please provide transcript text'
+             }), 400
+
+         transcript = data['transcript']
+         summary_type = data.get('summary_type', 'general')
+         chunk_size = data.get('chunk_size', 2500)
+         max_tokens = data.get('max_tokens', 500)
+
+         valid_types = ['general', 'detailed', 'bullet_points', 'key_takeaways']
+         if summary_type not in valid_types:
+             return jsonify({
+                 'error': 'Invalid summary type',
+                 'message': f'Must be one of: {", ".join(valid_types)}'
+             }), 400
+
+         summary = summarizer_service.summarize(
+             text=transcript,
+             summary_type=summary_type,
+             chunk_size=chunk_size,
+             max_tokens=max_tokens
+         )
+
+         summary_word_count = len(summary.split())
+         original_word_count = len(transcript.split())
+         compression_ratio = (summary_word_count / original_word_count) * 100 if original_word_count > 0 else 0
+
+         return jsonify({
+             'success': True,
+             'summary': summary,
+             'statistics': {
+                 'original_word_count': original_word_count,
+                 'summary_word_count': summary_word_count,
+                 'compression_ratio': round(compression_ratio, 1),
+                 'reading_time_minutes': max(1, summary_word_count // 200)
+             }
+         }), 200
+
+     except Exception as e:
+         logger.error(f"Summarization failed: {e}")
+         return jsonify({'error': 'Summarization failed', 'message': str(e)}), 500
+
+
+ # =============================================================================
+ # FULL PIPELINE ENDPOINT
+ # =============================================================================
+
+ @app.route('/api/process', methods=['POST'])
+ def process_video():
+     """
+     Full multilingual pipeline: Transcript → Translation → Summary → Translation
+
+     Request: {
+         "url": "youtube_url",
+         "summary_type": "general",
+         "target_language": "hin" (optional)
+     }
+     """
+     try:
+         data = request.get_json()
+
+         if not data or 'url' not in data:
+             return jsonify({
+                 'error': 'Missing YouTube URL',
+                 'message': 'Please provide a valid YouTube URL'
+             }), 400
+
+         url = data['url']
+         summary_type = data.get('summary_type', 'general')
+         target_language = data.get('target_language', 'eng')
+         chunk_size = data.get('chunk_size', 2500)
+         max_tokens = data.get('max_tokens', 500)
+
+         # Step 1: Extract video ID
+         video_id = transcript_service.extract_video_id(url)
+         logger.info(f"Processing video: {video_id}")
+
+         # Step 2: Get transcript with language
+         logger.info("Step 1/4: Extracting transcript...")
+         transcript_result = transcript_service.get_video_transcript(url, use_whisper_fallback=True)
+
+         original_transcript = transcript_result['transcript']
+         original_language = transcript_result['language']
+         original_word_count = transcript_result['word_count']
+
+         # Step 3: Translate to English if needed
+         english_transcript = original_transcript
+
+         if not is_english(original_language):
+             logger.info("Step 2/4: Translating to English...")
+             translation_service = get_translation_service()
+             english_transcript = translation_service.translate_to_english(
+                 original_transcript,
+                 original_language
+             )
+         else:
+             logger.info("Step 2/4: Skipped (already English)")
+
+         # Step 4: Summarize in English
+         logger.info("Step 3/4: Generating summary...")
+         summary = summarizer_service.summarize(
+             text=english_transcript,
+             summary_type=summary_type,
+             chunk_size=chunk_size,
+             max_tokens=max_tokens
+         )
+
+         # Step 5: Translate summary to target language
+         final_summary = summary
+         summary_language = "eng"
+
+         if not is_english(target_language):
+             logger.info(f"Step 4/4: Translating summary to {target_language}...")
+             translation_service = get_translation_service()
+             final_summary = translation_service.translate_from_english(summary, target_language)
+             summary_language = target_language
+         else:
+             logger.info("Step 4/4: Skipped (English output)")
+
+         # Calculate statistics
+         summary_word_count = len(final_summary.split())
+         compression_ratio = (summary_word_count / original_word_count) * 100 if original_word_count > 0 else 0
+
+         response = {
+             'success': True,
+             'video_id': video_id,
+             'original_language': original_language,
+             'original_language_name': get_language_name(original_language),
+             'transcript': original_transcript,
+             'transcript_source': transcript_result['source'],
+             'summary': final_summary,
+             'summary_language': summary_language,
+             'summary_language_name': get_language_name(summary_language),
+             'statistics': {
+                 'original_word_count': original_word_count,
+                 'summary_word_count': summary_word_count,
+                 'compression_ratio': round(compression_ratio, 1),
+                 'reading_time_minutes': max(1, summary_word_count // 200)
+             }
+         }
+
+         if not is_english(original_language):
+             response['english_transcript'] = english_transcript
+         if not is_english(target_language):
+             response['english_summary'] = summary
+
+         logger.info("Processing complete!")
+         return jsonify(response), 200
+
+     except ValueError as e:
+         return jsonify({'error': 'Invalid URL', 'message': str(e)}), 400
+     except Exception as e:
+         logger.error(f"Processing failed: {e}")
+         return jsonify({'error': 'Processing failed', 'message': str(e)}), 500
+
+
+ # =============================================================================
+ # ERROR HANDLERS
+ # =============================================================================
+
+ @app.errorhandler(404)
+ def not_found(error):
+     return jsonify({
+         'error': 'Not found',
+         'message': 'The requested endpoint does not exist'
+     }), 404
+
+
+ @app.errorhandler(500)
+ def internal_error(error):
+     return jsonify({
+         'error': 'Internal server error',
+         'message': 'An unexpected error occurred'
+     }), 500
+
+
+ # =============================================================================
+ # MAIN (for local testing only - gunicorn is used in production)
+ # =============================================================================
+
+ if __name__ == '__main__':
+     port = int(os.environ.get('PORT', 7860))
+
+     if not os.getenv('GROQ_API_KEY'):
+         print("⚠️ Warning: GROQ_API_KEY not found")
+         print("Set it in HF Spaces Settings → Secrets")
+
+     print("🚀 Starting YouTube Summarizer API...")
+     print(f"📡 API available at: http://localhost:{port}")
+
+     app.run(debug=False, host='0.0.0.0', port=port)
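
Because the heavy models are lazy-loaded, the first real request pays the download/initialization cost. A hedged sketch of hitting the `/api/warmup` endpoint once after deployment to pre-load them (assumes `requests` and a placeholder Space URL):

```python
import requests

BASE = "https://YOUR_USERNAME-youtube-summarizer-api.hf.space"  # placeholder Space URL

# Pre-load the translation and Whisper models once so later requests are fast.
r = requests.post(
    f"{BASE}/api/warmup",
    json={"translation": True, "whisper": True},
    timeout=600,  # model downloads can take several minutes on first run
)
print(r.json())  # e.g. {"success": true, "models": {"translation": "loaded", "whisper": "loaded"}, ...}
```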
config.py ADDED
@@ -0,0 +1,169 @@
+ """
+ Configuration module for multilingual YouTube summarizer.
+ Contains model names, language mappings, and settings.
+
+ All models used are FREE and run LOCALLY - no API costs!
+ """
+
+ import os
+
+ # =============================================================================
+ # MODEL CONFIGURATION
+ # =============================================================================
+
+ # Whisper model for speech-to-text (runs locally)
+ # Options: "openai/whisper-tiny", "openai/whisper-small", "openai/whisper-medium"
+ # Smaller = faster but less accurate; larger = slower but more accurate
+ WHISPER_MODEL = "openai/whisper-small"
+
+ # NLLB-200 model for translation (runs locally)
+ # Using the distilled version for lower RAM usage (~2.4GB)
+ NLLB_MODEL = "facebook/nllb-200-distilled-600M"
+
+ # Groq model for summarization (free API)
+ GROQ_MODEL = "llama-3.1-8b-instant"
+
+ # =============================================================================
+ # LANGUAGE CONFIGURATION
+ # =============================================================================
+
+ # Mapping from simple language codes to NLLB-200 language codes
+ # NLLB uses the format language_Script (e.g., hin_Deva for Hindi in Devanagari)
+ LANGUAGE_MAP = {
+     # English (including regional variants)
+     "eng": {"nllb": "eng_Latn", "name": "English", "script": "Latin"},
+     "en": {"nllb": "eng_Latn", "name": "English", "script": "Latin"},
+     "en-in": {"nllb": "eng_Latn", "name": "English", "script": "Latin"},
+     "en-us": {"nllb": "eng_Latn", "name": "English", "script": "Latin"},
+     "en-gb": {"nllb": "eng_Latn", "name": "English", "script": "Latin"},
+     "en-au": {"nllb": "eng_Latn", "name": "English", "script": "Latin"},
+     "english": {"nllb": "eng_Latn", "name": "English", "script": "Latin"},
+
+     # Hindi (including regional variants)
+     "hin": {"nllb": "hin_Deva", "name": "Hindi", "script": "Devanagari"},
+     "hi": {"nllb": "hin_Deva", "name": "Hindi", "script": "Devanagari"},
+     "hi-in": {"nllb": "hin_Deva", "name": "Hindi", "script": "Devanagari"},
+
+     # Tamil
+     "tam": {"nllb": "tam_Taml", "name": "Tamil", "script": "Tamil"},
+     "ta": {"nllb": "tam_Taml", "name": "Tamil", "script": "Tamil"},
+     "ta-in": {"nllb": "tam_Taml", "name": "Tamil", "script": "Tamil"},
+
+     # Telugu
+     "tel": {"nllb": "tel_Telu", "name": "Telugu", "script": "Telugu"},
+     "te": {"nllb": "tel_Telu", "name": "Telugu", "script": "Telugu"},
+     "te-in": {"nllb": "tel_Telu", "name": "Telugu", "script": "Telugu"},
+
+     # Kannada
+     "kan": {"nllb": "kan_Knda", "name": "Kannada", "script": "Kannada"},
+     "kn": {"nllb": "kan_Knda", "name": "Kannada", "script": "Kannada"},
+     "kn-in": {"nllb": "kan_Knda", "name": "Kannada", "script": "Kannada"},
+
+     # Malayalam
+     "mal": {"nllb": "mal_Mlym", "name": "Malayalam", "script": "Malayalam"},
+     "ml": {"nllb": "mal_Mlym", "name": "Malayalam", "script": "Malayalam"},
+     "ml-in": {"nllb": "mal_Mlym", "name": "Malayalam", "script": "Malayalam"},
+
+     # Gujarati
+     "guj": {"nllb": "guj_Gujr", "name": "Gujarati", "script": "Gujarati"},
+     "gu": {"nllb": "guj_Gujr", "name": "Gujarati", "script": "Gujarati"},
+     "gu-in": {"nllb": "guj_Gujr", "name": "Gujarati", "script": "Gujarati"},
+
+     # Bengali
+     "ben": {"nllb": "ben_Beng", "name": "Bengali", "script": "Bengali"},
+     "bn": {"nllb": "ben_Beng", "name": "Bengali", "script": "Bengali"},
+     "bn-in": {"nllb": "ben_Beng", "name": "Bengali", "script": "Bengali"},
+     "bn-bd": {"nllb": "ben_Beng", "name": "Bengali", "script": "Bengali"},
+
+     # Marathi
+     "mar": {"nllb": "mar_Deva", "name": "Marathi", "script": "Devanagari"},
+     "mr": {"nllb": "mar_Deva", "name": "Marathi", "script": "Devanagari"},
+     "mr-in": {"nllb": "mar_Deva", "name": "Marathi", "script": "Devanagari"},
+
+     # Punjabi
+     "pan": {"nllb": "pan_Guru", "name": "Punjabi", "script": "Gurmukhi"},
+     "pa": {"nllb": "pan_Guru", "name": "Punjabi", "script": "Gurmukhi"},
+     "pa-in": {"nllb": "pan_Guru", "name": "Punjabi", "script": "Gurmukhi"},
+
+     # Urdu
+     "urd": {"nllb": "urd_Arab", "name": "Urdu", "script": "Arabic"},
+     "ur": {"nllb": "urd_Arab", "name": "Urdu", "script": "Arabic"},
+     "ur-pk": {"nllb": "urd_Arab", "name": "Urdu", "script": "Arabic"},
+     "ur-in": {"nllb": "urd_Arab", "name": "Urdu", "script": "Arabic"},
+ }
+
+ # List of supported languages for API responses
+ SUPPORTED_LANGUAGES = [
+     {"code": "eng", "name": "English", "nllb_code": "eng_Latn"},
+     {"code": "hin", "name": "Hindi", "nllb_code": "hin_Deva"},
+     {"code": "tam", "name": "Tamil", "nllb_code": "tam_Taml"},
+     {"code": "tel", "name": "Telugu", "nllb_code": "tel_Telu"},
+     {"code": "kan", "name": "Kannada", "nllb_code": "kan_Knda"},
+     {"code": "mal", "name": "Malayalam", "nllb_code": "mal_Mlym"},
+     {"code": "guj", "name": "Gujarati", "nllb_code": "guj_Gujr"},
+     {"code": "ben", "name": "Bengali", "nllb_code": "ben_Beng"},
+     {"code": "mar", "name": "Marathi", "nllb_code": "mar_Deva"},
+     {"code": "pan", "name": "Punjabi", "nllb_code": "pan_Guru"},
+     {"code": "urd", "name": "Urdu", "nllb_code": "urd_Arab"},
+ ]
+
+ # Whisper language code to our language code mapping
+ # Whisper returns ISO 639-1 codes; we normalize them to our codes
+ WHISPER_LANG_MAP = {
+     "en": "eng",
+     "hi": "hin",
+     "ta": "tam",
+     "te": "tel",
+     "kn": "kan",
+     "ml": "mal",
+     "gu": "guj",
+     "bn": "ben",
+     "mr": "mar",
+     "pa": "pan",
+     "ur": "urd",
+ }
+
+ # =============================================================================
+ # RUNTIME SETTINGS
+ # =============================================================================
+
+ # Model loading settings
+ # Set to True to load models on startup (slower startup, faster first request)
+ # Set to False for lazy loading (faster startup, slower first request)
+ PRELOAD_MODELS = False
+
+ # Maximum text length for translation (to avoid OOM errors)
+ MAX_TRANSLATION_LENGTH = 5000  # characters
+
+ # Audio extraction settings
+ AUDIO_FORMAT = "wav"
+ AUDIO_SAMPLE_RATE = 16000  # Whisper expects 16kHz
+
+ # Temporary file settings
+ TEMP_DIR = os.path.join(os.path.dirname(__file__), "temp")
+
+ # =============================================================================
+ # HELPER FUNCTIONS
+ # =============================================================================
+
+ def get_nllb_code(lang_code: str) -> str:
+     """Convert a language code to NLLB-200 format."""
+     lang_code = lang_code.lower().strip()
+     if lang_code in LANGUAGE_MAP:
+         return LANGUAGE_MAP[lang_code]["nllb"]
+     raise ValueError(f"Unsupported language code: {lang_code}")
+
+
+ def get_language_name(lang_code: str) -> str:
+     """Get the full name of a language from its code."""
+     lang_code = lang_code.lower().strip()
+     if lang_code in LANGUAGE_MAP:
+         return LANGUAGE_MAP[lang_code]["name"]
+     return lang_code
+
+
+ def normalize_whisper_lang(whisper_code: str) -> str:
+     """Convert Whisper's language code to our format."""
+     whisper_code = whisper_code.lower().strip()
+     return WHISPER_LANG_MAP.get(whisper_code, whisper_code)
+
+
+ def is_english(lang_code: str) -> bool:
+     """Check if a language code represents English."""
+     lang_code = lang_code.lower().strip()
+     return lang_code in ["en", "eng", "english", "en-in", "en-us", "en-gb", "en-au"]
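
A small sanity sketch of how the helpers above compose (assumes this module is importable as `config`; expected outputs follow from the tables in the file):

```python
from config import get_nllb_code, get_language_name, normalize_whisper_lang, is_english

print(get_nllb_code("hi-in"))         # hin_Deva
print(get_language_name("ta"))        # Tamil
print(normalize_whisper_lang("BN"))   # ben (lookups are case-insensitive)
print(is_english("en-GB"))            # True

# Unknown codes raise for the NLLB lookup but pass through elsewhere:
print(normalize_whisper_lang("fr"))   # fr (unmapped codes pass through unchanged)
```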
requirements.txt ADDED
@@ -0,0 +1,48 @@
+ # =============================================================================
+ # Core Flask Dependencies
+ # =============================================================================
+ Flask==3.0.0
+ flask-cors==4.0.0
+ gunicorn==21.2.0
+ python-dotenv==1.0.0
+ Werkzeug==3.0.1
+
+ # =============================================================================
+ # HTTP Clients
+ # =============================================================================
+ requests>=2.31.0
+ httpx>=0.24.0,<0.26.0
+
+ # =============================================================================
+ # YouTube Download
+ # =============================================================================
+ yt-dlp>=2024.1.1
+
+ # =============================================================================
+ # Groq API for Summarization (FREE)
+ # =============================================================================
+ groq==0.4.1
+
+ # =============================================================================
+ # ML Models (All FREE, run locally)
+ # =============================================================================
+
+ # PyTorch - CPU version for HF Spaces free tier
+ --extra-index-url https://download.pytorch.org/whl/cpu
+ torch>=2.0.0
+ torchaudio>=2.0.0
+
+ # Hugging Face Transformers
+ transformers>=4.36.0
+
+ # Tokenization for NLLB
+ sentencepiece>=0.1.99
+
+ # Audio processing
+ soundfile>=0.12.0
+ librosa>=0.10.0
+
+ # =============================================================================
+ # Language Detection
+ # =============================================================================
+ langdetect>=1.0.9
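
After installation, a quick sanity sketch that the CPU-only PyTorch wheel was picked up (outputs depend on the machine; versions shown are illustrative):

```python
import torch
import transformers

print(torch.__version__)            # typically ends in "+cpu" when the CPU wheel is installed
print(torch.cuda.is_available())    # False on the free CPU tier
print(transformers.__version__)     # >= 4.36.0 per the pin above
```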
services/__init__.py ADDED
@@ -0,0 +1 @@
+ # Services package for YouTube Summarizer API
services/speech_to_text.py ADDED
@@ -0,0 +1,303 @@
+ """
+ Speech-to-Text Service using OpenAI Whisper (Local Model)
+
+ This service provides LOCAL speech-to-text transcription using Whisper.
+ NO API CALLS - everything runs on your machine for FREE!
+
+ Features:
+ - Extracts audio from YouTube videos using yt-dlp
+ - Transcribes audio using Whisper (small model by default)
+ - Detects the language of the audio automatically
+ - Returns both transcript and detected language
+
+ Requirements:
+ - FFmpeg must be installed on the system
+ - Sufficient RAM (~2GB for whisper-small)
+ """
+
+ import os
+ import tempfile
+ import logging
+ from typing import Optional, Tuple
+
+ import torch
+ from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline
+ import yt_dlp
+
+ from config import (
+     WHISPER_MODEL,
+     AUDIO_FORMAT,
+     AUDIO_SAMPLE_RATE,
+     normalize_whisper_lang,
+ )
+
+ # Configure logging
+ logging.basicConfig(level=logging.INFO)
+ logger = logging.getLogger(__name__)
+
+
+ def get_ffmpeg_path() -> Optional[str]:
+     """
+     Get the path to the directory containing the FFmpeg executables.
+     Checks the system PATH first, then falls back to static-ffmpeg
+     (which provides both ffmpeg and ffprobe) or imageio-ffmpeg.
+     """
+     import shutil
+
+     # Check if ffmpeg AND ffprobe are in system PATH
+     ffmpeg_path = shutil.which("ffmpeg")
+     ffprobe_path = shutil.which("ffprobe")
+     if ffmpeg_path and ffprobe_path:
+         logger.info(f"Using system FFmpeg: {ffmpeg_path}")
+         return os.path.dirname(ffmpeg_path)
+
+     # Try static-ffmpeg (provides both ffmpeg and ffprobe)
+     try:
+         import static_ffmpeg
+         # This downloads ffmpeg/ffprobe if not already present
+         ffmpeg_path, ffprobe_path = static_ffmpeg.run.get_or_fetch_platform_executables_else_raise()
+         if ffmpeg_path and os.path.exists(ffmpeg_path):
+             ffmpeg_dir = os.path.dirname(ffmpeg_path)
+             logger.info(f"Using static-ffmpeg: {ffmpeg_dir}")
+             return ffmpeg_dir
+     except ImportError:
+         logger.warning("static-ffmpeg not installed")
+     except Exception as e:
+         logger.warning(f"static-ffmpeg error: {e}")
+
+     # Fall back to imageio-ffmpeg (only has ffmpeg, not ffprobe)
+     try:
+         import imageio_ffmpeg
+         ffmpeg_path = imageio_ffmpeg.get_ffmpeg_exe()
+         if ffmpeg_path and os.path.exists(ffmpeg_path):
+             logger.warning("Using imageio-ffmpeg (may not have ffprobe)")
+             return os.path.dirname(ffmpeg_path)
+     except ImportError:
+         pass
+
+     return None
+
+
+ class SpeechToTextService:
+     """
+     Service for converting speech to text using local Whisper model.
+
+     The model is lazily loaded on first use to save memory during startup.
+     All processing happens locally - no API costs!
+     """
+
+     def __init__(self, model_name: str = WHISPER_MODEL):
+         """
+         Initialize the speech-to-text service.
+
+         Args:
+             model_name: Hugging Face model identifier for Whisper
+         """
+         self.model_name = model_name
+         self._pipe = None  # Lazy-loaded pipeline
+         self._device = "cuda" if torch.cuda.is_available() else "cpu"
+         self._torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32
+
+         logger.info(f"SpeechToTextService initialized (device: {self._device})")
+
+     def _load_model(self):
+         """
+         Load the Whisper model and processor.
+         Called lazily on first transcription request.
+         """
+         if self._pipe is not None:
+             return
+
+         logger.info(f"Loading Whisper model: {self.model_name}")
+         logger.info("This may take a few minutes on first run (downloading model)...")
+
+         try:
+             # Load model with optimizations for CPU/GPU
+             model = AutoModelForSpeechSeq2Seq.from_pretrained(
+                 self.model_name,
+                 torch_dtype=self._torch_dtype,
+                 low_cpu_mem_usage=True,
+                 use_safetensors=True
+             )
+             model.to(self._device)
+
+             # Load processor
+             processor = AutoProcessor.from_pretrained(self.model_name)
+
+             # Create pipeline for easy inference
+             self._pipe = pipeline(
+                 "automatic-speech-recognition",
+                 model=model,
+                 tokenizer=processor.tokenizer,
+                 feature_extractor=processor.feature_extractor,
+                 torch_dtype=self._torch_dtype,
+                 device=self._device,
+                 return_timestamps=False
+             )
+
+             logger.info("Whisper model loaded successfully!")
+
+         except Exception as e:
+             logger.error(f"Failed to load Whisper model: {e}")
+             raise Exception(f"Could not load Whisper model: {str(e)}")
+
+     def extract_audio_from_youtube(self, url: str) -> str:
+         """
+         Extract audio from a YouTube video.
+
+         Args:
+             url: YouTube video URL
+
+         Returns:
+             Path to the extracted audio file (WAV format)
+
+         Raises:
+             Exception: If audio extraction fails
+         """
+         logger.info(f"Extracting audio from: {url}")
+
+         # Get FFmpeg path (system, static-ffmpeg, or imageio-ffmpeg)
+         ffmpeg_path = get_ffmpeg_path()
+         if not ffmpeg_path:
+             raise Exception("FFmpeg not found. Please install FFmpeg or run: pip install imageio-ffmpeg")
+
+         logger.info(f"Using FFmpeg: {ffmpeg_path}")
+
+         # Create temporary directory for audio file
+         temp_dir = tempfile.mkdtemp()
+         output_template = os.path.join(temp_dir, "audio.%(ext)s")
+
+         ydl_opts = {
+             "format": "bestaudio/best",
+             "outtmpl": output_template,
+             "postprocessors": [{
+                 "key": "FFmpegExtractAudio",
+                 "preferredcodec": AUDIO_FORMAT,
+                 "preferredquality": "192",
+             }],
+             "ffmpeg_location": ffmpeg_path,  # yt-dlp needs the directory containing ffmpeg and ffprobe
+             "quiet": True,
+             "no_warnings": True,
+         }
+
+         try:
+             with yt_dlp.YoutubeDL(ydl_opts) as ydl:
+                 ydl.download([url])
+
+             # Find the extracted audio file
+             audio_path = os.path.join(temp_dir, f"audio.{AUDIO_FORMAT}")
+
+             if not os.path.exists(audio_path):
+                 raise Exception("Audio file was not created")
+
+             logger.info(f"Audio extracted to: {audio_path}")
+             return audio_path
+
+         except Exception as e:
+             logger.error(f"Audio extraction failed: {e}")
+             raise Exception(f"Could not extract audio: {str(e)}")
+
+     def transcribe_audio(self, audio_path: str) -> dict:
+         """
+         Transcribe an audio file using Whisper.
+
+         Args:
+             audio_path: Path to the audio file
+
+         Returns:
+             Dictionary with:
+             - text: The transcribed text
+             - language: Detected language code (normalized)
+             - raw_language: Original Whisper language code
+         """
+         # Ensure model is loaded
+         self._load_model()
+
+         logger.info(f"Transcribing audio: {audio_path}")
+
+         try:
+             # Run transcription
+             result = self._pipe(
+                 audio_path,
+                 generate_kwargs={
+                     "task": "transcribe",
+                     "language": None,  # Auto-detect language
+                 }
+             )
+
+             # Extract text
+             text = result.get("text", "").strip()
+
+             if not text:
+                 raise Exception("Transcription produced empty text")
+
+             # Try to get detected language from the model
+             # Note: the Whisper pipeline may not always return language info
+             raw_language = "en"  # Default to English
+
+             # Normalize the language code
+             language = normalize_whisper_lang(raw_language)
+
+             logger.info(f"Transcription complete. Language: {language}")
+
+             return {
+                 "text": text,
+                 "language": language,
+                 "raw_language": raw_language
+             }
+
+         except Exception as e:
+             logger.error(f"Transcription failed: {e}")
+             raise Exception(f"Could not transcribe audio: {str(e)}")
+
+     def transcribe_youtube_video(self, url: str) -> dict:
+         """
+         Full pipeline: Extract audio from YouTube and transcribe it.
+
+         Args:
+             url: YouTube video URL
+
+         Returns:
+             Dictionary with:
+             - text: The transcribed text
+             - language: Detected language code
+             - word_count: Number of words in transcript
+         """
+         audio_path = None
+
+         try:
+             # Step 1: Extract audio
+             audio_path = self.extract_audio_from_youtube(url)
+
+             # Step 2: Transcribe
+             result = self.transcribe_audio(audio_path)
+
+             # Add word count
+             result["word_count"] = len(result["text"].split())
+
+             return result
+
+         finally:
+             # Cleanup: Remove temporary audio file
+             if audio_path and os.path.exists(audio_path):
+                 try:
+                     os.remove(audio_path)
+                     # Also remove the parent temp directory
+                     temp_dir = os.path.dirname(audio_path)
+                     if os.path.exists(temp_dir):
+                         os.rmdir(temp_dir)
+                 except OSError:
+                     pass  # Ignore cleanup errors
+
+     def is_model_loaded(self) -> bool:
+         """Check if the Whisper model is currently loaded."""
+         return self._pipe is not None
+
+     def warmup(self):
+         """
+         Pre-load the model to avoid delay on first request.
+         Call this during application startup if desired.
+         """
+         logger.info("Warming up SpeechToTextService...")
+         self._load_model()
+         logger.info("SpeechToTextService warmup complete!")
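
A minimal local usage sketch of this service (assumes the dependencies are installed and FFmpeg is available; no model loads until `warmup()` or the first transcription):

```python
from services.speech_to_text import SpeechToTextService

stt = SpeechToTextService()   # lazy: the Whisper model is NOT loaded yet
stt.warmup()                  # optional: force the download/load now

result = stt.transcribe_youtube_video("https://www.youtube.com/watch?v=jNQXAC9IVRw")
print(result["language"], result["word_count"])
print(result["text"][:200])   # first 200 characters of the transcript
```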
services/summarizer.py ADDED
@@ -0,0 +1,141 @@
+ import os
+ from groq import Groq
+ from dotenv import load_dotenv
+
+ from config import GROQ_MODEL
+
+ load_dotenv()
+
+
+ class SummarizerService:
+     """Service for generating AI-powered summaries using Groq LLaMA"""
+
+     def __init__(self):
+         api_key = os.getenv("GROQ_API_KEY")
+         if not api_key:
+             raise Exception("GROQ_API_KEY not found in environment variables")
+
+         self.client = Groq(api_key=api_key.strip())
+
+     def chunk_text(self, text: str, max_chars: int = 2500) -> list:
+         """
+         Split text into smaller chunks to avoid token limits
+
+         Args:
+             text: Text to chunk
+             max_chars: Maximum characters per chunk
+
+         Returns:
+             List of text chunks
+         """
+         words = text.split()
+         chunks = []
+         current_chunk = []
+         current_length = 0
+
+         for word in words:
+             word_length = len(word) + 1  # +1 for space
+             if current_length + word_length > max_chars and current_chunk:
+                 chunks.append(" ".join(current_chunk))
+                 current_chunk = [word]
+                 current_length = word_length
+             else:
+                 current_chunk.append(word)
+                 current_length += word_length
+
+         if current_chunk:
+             chunks.append(" ".join(current_chunk))
+
+         return chunks
+
+     def summarize(
+         self,
+         text: str,
+         summary_type: str = "general",
+         chunk_size: int = 2500,
+         max_tokens: int = 500
+     ) -> str:
+         """
+         Summarize text using Groq's LLaMA model with chunking for large texts
+
+         Args:
+             text: Text to summarize
+             summary_type: Type of summary (general, detailed, bullet_points, key_takeaways)
+             chunk_size: Maximum characters per chunk
+             max_tokens: Maximum tokens for summary generation
+
+         Returns:
+             Generated summary text
+         """
+         # Check if text is too long and needs chunking
+         if len(text) > 3000:
+             chunks = self.chunk_text(text, max_chars=chunk_size)
+             chunk_summaries = []
+
+             for i, chunk in enumerate(chunks):
+                 try:
+                     # Summarize each chunk
+                     prompt = f"Please provide a concise summary of this part of a video transcript:\n\n{chunk}"
+
+                     response = self.client.chat.completions.create(
+                         model=GROQ_MODEL,
+                         messages=[
+                             {"role": "user", "content": prompt}
+                         ],
+                         max_tokens=min(300, max_tokens // 2),
+                         temperature=0.1
+                     )
+
+                     chunk_summaries.append(response.choices[0].message.content)
+
+                 except Exception as e:
+                     raise Exception(f"Error summarizing chunk {i+1}: {str(e)}")
+
+             # Combine all chunk summaries
+             combined_summary = "\n\n".join(chunk_summaries)
+
+             # Create final summary from combined chunks
+             final_prompts = {
+                 "general": f"Please create a cohesive summary from these section summaries of a video:\n\n{combined_summary}",
+                 "detailed": f"Please create a detailed, well-structured summary from these section summaries:\n\n{combined_summary}",
+                 "bullet_points": f"Please organize these section summaries into clear bullet points:\n\n{combined_summary}",
+                 "key_takeaways": f"Please extract the main insights and key takeaways from these summaries:\n\n{combined_summary}"
+             }
+
+             try:
+                 final_response = self.client.chat.completions.create(
+                     model=GROQ_MODEL,
+                     messages=[
+                         {"role": "user", "content": final_prompts[summary_type]}
+                     ],
+                     max_tokens=max_tokens,
+                     temperature=0.1
+                 )
+
+                 return final_response.choices[0].message.content
+
+             except Exception:
+                 # If the final summary fails, return the combined chunk summaries
+                 return combined_summary
+
+         else:
+             # Original logic for shorter texts
+             prompts = {
+                 "general": f"Please provide a clear and concise summary of the following video transcript:\n\n{text}",
+                 "detailed": f"Please provide a detailed summary with key points and main topics from the following video transcript:\n\n{text}",
+                 "bullet_points": f"Please summarize the following video transcript in bullet points, highlighting the main topics:\n\n{text}",
+                 "key_takeaways": f"Please extract the key takeaways and main insights from the following video transcript:\n\n{text}"
+             }
+
+             try:
+                 response = self.client.chat.completions.create(
+                     model=GROQ_MODEL,
+                     messages=[
+                         {"role": "user", "content": prompts[summary_type]}
+                     ],
+                     max_tokens=max_tokens,
+                     temperature=0.1
+                 )
+
+                 return response.choices[0].message.content
+
+             except Exception as e:
+                 raise Exception(f"Error generating summary: {str(e)}")
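
A hedged usage sketch of this service (requires GROQ_API_KEY in the environment or a .env file; the transcript string is a stand-in). The word-boundary chunker can also be exercised directly without any API call:

```python
from services.summarizer import SummarizerService

svc = SummarizerService()  # raises if GROQ_API_KEY is not set

transcript = "The speaker walks through the project architecture and deployment steps..."
print(svc.summarize(transcript, summary_type="bullet_points", max_tokens=400))

# chunk_text splits on word boundaries; no chunk exceeds max_chars:
print(svc.chunk_text("one two three four five", max_chars=10))
# -> ['one two', 'three', 'four five']
```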
services/transcript.py ADDED
@@ -0,0 +1,241 @@
1
+ """
2
+ Transcript Service for YouTube Videos
3
+
4
+ This service extracts transcripts from YouTube videos using multiple methods:
5
+ 1. First, try to get existing subtitles/captions (fastest, no model needed)
6
+ 2. If no subtitles available, fallback to audio extraction + Whisper transcription
7
+
8
+ The fallback uses the SpeechToTextService for local Whisper transcription.
9
+ """
10
+
11
+ import re
12
+ import os
13
+ import tempfile
14
+ import logging
15
+ from typing import Optional, Tuple
16
+
17
+ import yt_dlp
18
+
19
+ # Configure logging
20
+ logging.basicConfig(level=logging.INFO)
21
+ logger = logging.getLogger(__name__)
22
+
23
+
24
+ class TranscriptService:
25
+ """
26
+ Service for extracting transcripts from YouTube videos.
27
+
28
+ Supports two methods:
29
+ 1. Subtitle extraction (fast, no ML models)
30
+ 2. Audio transcription via Whisper (slower, requires SpeechToTextService)
31
+ """
32
+
33
+ def __init__(self):
34
+ """Initialize the transcript service."""
35
+ self._speech_to_text = None # Lazy-loaded
36
+
37
+ def _get_speech_to_text_service(self):
38
+ """Lazy-load the SpeechToTextService to avoid loading Whisper unless needed."""
39
+ if self._speech_to_text is None:
40
+ from services.speech_to_text import SpeechToTextService
41
+ self._speech_to_text = SpeechToTextService()
42
+ return self._speech_to_text
43
+
44
+ def extract_video_id(self, url: str) -> str:
45
+ """
46
+ Extract video ID from YouTube URL.
47
+
48
+ Args:
49
+ url: YouTube URL in various formats
50
+
51
+ Returns:
52
+ 11-character video ID
53
+
54
+ Raises:
55
+ ValueError: If URL is invalid
56
+ """
57
+ regex = r"(?:v=|\/|youtu\.be\/)([0-9A-Za-z_-]{11}).*"
58
+ match = re.search(regex, url)
59
+ if match:
60
+ return match.group(1)
61
+ raise ValueError("Invalid YouTube URL")
62
+
63
+ def clean_autogen_transcript(self, text: str) -> str:
64
+ """
65
+ Clean auto-generated YouTube captions.
66
+
67
+ Removes:
68
+ - <c>...</c> tags
69
+ - Timestamps like <00:00:06.480>
70
+ - Multiple spaces
71
+
72
+ Args:
73
+ text: Raw VTT subtitle text
74
+
75
+ Returns:
76
+ Cleaned transcript text
77
+ """
78
+ # Remove <c>...</c> tags
79
+ text = re.sub(r"</?c>", "", text)
80
+
81
+ # Remove timestamps like <00:00:06.480>
82
+ text = re.sub(r"<\d{2}:\d{2}:\d{2}\.\d{3}>", "", text)
83
+
84
+ # Collapse multiple spaces
85
+ text = re.sub(r"\s+", " ", text).strip()
86
+
87
+ return text
88
+
89
+ def get_subtitles(self, url: str, lang: str = "en") -> Optional[dict]:
90
+ """
91
+ Try to get existing subtitles from YouTube.
92
+
93
+ Args:
94
+ url: YouTube video URL
95
+ lang: Preferred language code (default: "en")
96
+
97
+ Returns:
98
+ Dictionary with transcript and language, or None if no subtitles
99
+ """
100
+ with tempfile.TemporaryDirectory() as temp_dir:
101
+ ydl_opts = {
102
+ "skip_download": True,
103
+ "writesubtitles": True,
104
+ "writeautomaticsub": True,
105
+ "subtitlesformat": "vtt",
106
+ "outtmpl": os.path.join(temp_dir, "%(id)s.%(ext)s"),
107
+ "quiet": True,
108
+ }
109
+
110
+ try:
111
+ with yt_dlp.YoutubeDL(ydl_opts) as ydl:
112
+ info = ydl.extract_info(url, download=False)
113
+ ydl.download([url])
114
+
115
+ # Find subtitle file
116
+ video_id = info["id"]
117
+ sub_file = None
118
+ detected_lang = "eng"
119
+
120
+ for file in os.listdir(temp_dir):
121
+ if file.startswith(video_id) and file.endswith(".vtt"):
122
+ sub_file = os.path.join(temp_dir, file)
123
+ # Try to extract language from filename
124
+ # Format: videoId.lang.vtt
125
+ parts = file.split(".")
126
+ if len(parts) >= 3:
127
+ detected_lang = parts[-2]
128
+ break
129
+
130
+ if not sub_file:
131
+ logger.info("No subtitle file found")
132
+ return None
133
+
134
+ # Read and clean VTT file
135
+ lines = []
136
+ with open(sub_file, "r", encoding="utf-8") as f:
137
+ for line in f:
138
+ line = line.strip()
139
+ if not line:
140
+ continue
141
+ if line.startswith("WEBVTT"):
142
+ continue
143
+ if "-->" in line:
144
+ continue
145
+ if re.match(r"^\d+$", line):
146
+ continue
147
+ lines.append(line)
148
+
149
+ raw_text = " ".join(lines)
150
+ clean_text = self.clean_autogen_transcript(raw_text)
151
+
152
+ if not clean_text or len(clean_text.strip()) < 50:
153
+ logger.info("Extracted subtitles too short")
154
+ return None
155
+
156
+ # Map common language codes
157
+ lang_map = {
158
+ "en": "eng", "en-US": "eng", "en-GB": "eng",
159
+ "hi": "hin", "hi-IN": "hin",
160
+ "ta": "tam", "ta-IN": "tam",
161
+ "te": "tel", "te-IN": "tel",
162
+ "kn": "kan", "kn-IN": "kan",
163
+ "ml": "mal", "ml-IN": "mal",
164
+ "gu": "guj", "gu-IN": "guj",
165
+ "bn": "ben", "bn-IN": "ben",
166
+ "mr": "mar", "mr-IN": "mar",
167
+ "pa": "pan", "pa-IN": "pan",
168
+ "ur": "urd", "ur-PK": "urd",
169
+ }
170
+
171
+ normalized_lang = lang_map.get(detected_lang, detected_lang)
172
+
173
+ logger.info(f"Subtitles extracted successfully (language: {normalized_lang})")
174
+
175
+ return {
176
+ "transcript": clean_text,
177
+ "language": normalized_lang,
178
+ "source": "subtitles",
179
+ "word_count": len(clean_text.split())
180
+ }
181
+
182
+ except Exception as e:
183
+ logger.warning(f"Subtitle extraction failed: {e}")
184
+ return None
185
+
186
+     def get_video_transcript(self, url: str, use_whisper_fallback: bool = True) -> dict:
+         """
+         Get transcript from a YouTube video.
+
+         First tries to get subtitles. If unavailable and use_whisper_fallback is True,
+         falls back to audio extraction and Whisper transcription.
+
+         Args:
+             url: YouTube video URL
+             use_whisper_fallback: Whether to use Whisper if no subtitles (default: True)
+
+         Returns:
+             Dictionary with:
+             - transcript: The transcript text
+             - language: Detected/extracted language code
+             - source: "subtitles" or "whisper"
+             - word_count: Number of words
+
+         Raises:
+             Exception: If transcript cannot be obtained
+         """
+         # Try subtitles first (faster, no model needed)
+         logger.info("Attempting to get subtitles...")
+         result = self.get_subtitles(url)
+
+         if result:
+             return result
+
+         # Fallback to Whisper transcription
+         if use_whisper_fallback:
+             logger.info("No subtitles found. Falling back to Whisper transcription...")
+
+             try:
+                 stt_service = self._get_speech_to_text_service()
+                 whisper_result = stt_service.transcribe_youtube_video(url)
+
+                 return {
+                     "transcript": whisper_result["text"],
+                     "language": whisper_result["language"],
+                     "source": "whisper",
+                     "word_count": whisper_result["word_count"]
+                 }
+
+             except Exception as e:
+                 logger.error(f"Whisper transcription failed: {e}")
+                 raise Exception(f"Could not retrieve transcript: {str(e)}")
+
+         raise Exception("No subtitles available and Whisper fallback is disabled")
+
+     def get_video_transcript_legacy(self, url: str, lang: str = "en") -> str:
+         """
+         Legacy method for backward compatibility.
+         Returns only the transcript text (no language info).
+         """
+         result = self.get_video_transcript(url, use_whisper_fallback=True)
+         return result["transcript"]
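
For context, a minimal usage sketch of the transcript flow defined above. The enclosing class name (`YouTubeService`) and its import path are assumptions, since this diff shows only the method bodies:

```python
# Hypothetical usage sketch; the class name YouTubeService is an assumption,
# as the class definition itself appears earlier in the file.
from services.youtube import YouTubeService

service = YouTubeService()

# Subtitles are tried first (fast, no model load); Whisper is the fallback.
result = service.get_video_transcript(
    "https://www.youtube.com/watch?v=VIDEO_ID",
    use_whisper_fallback=True,
)

print(result["source"])      # "subtitles" or "whisper"
print(result["language"])    # normalized code, e.g. "hin"
print(result["word_count"])  # number of words in the transcript
```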
services/translation.py ADDED
@@ -0,0 +1,330 @@
+ """
2
+ Translation Service using NLLB-200 (Local Model)
3
+
4
+ This service provides LOCAL translation between English and Indian languages.
5
+ NO API CALLS - everything runs on your machine for FREE!
6
+
7
+ Supported Languages:
8
+ - English (eng)
9
+ - Hindi (hin)
10
+ - Tamil (tam)
11
+ - Telugu (tel)
12
+ - Kannada (kan)
13
+ - Malayalam (mal)
14
+ - Gujarati (guj)
15
+ - Bengali (ben)
16
+ - Marathi (mar)
17
+ - Punjabi (pan)
18
+ - Urdu (urd)
19
+
20
+ Model Used: facebook/nllb-200-distilled-600M (~2.4GB)
21
+ This is the smallest NLLB model, optimized for lower RAM usage.
22
+ """
23
+
24
+ import logging
25
+ from typing import Optional
26
+
27
+ import torch
28
+ from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
29
+ from langdetect import detect, LangDetectException
30
+
31
+ from config import (
32
+ NLLB_MODEL,
33
+ LANGUAGE_MAP,
34
+ SUPPORTED_LANGUAGES,
35
+ MAX_TRANSLATION_LENGTH,
36
+ get_nllb_code,
37
+ get_language_name,
38
+ is_english,
39
+ )
40
+
41
+ # Configure logging
42
+ logging.basicConfig(level=logging.INFO)
43
+ logger = logging.getLogger(__name__)
44
+
45
+
46
+ class TranslationService:
+     """
+     Service for translating text between languages using NLLB-200.
+
+     The model is lazily loaded on first use to save memory during startup.
+     All processing happens locally - no API costs!
+     """
+
+     def __init__(self, model_name: str = NLLB_MODEL):
+         """
+         Initialize the translation service.
+
+         Args:
+             model_name: Hugging Face model identifier for NLLB-200
+         """
+         self.model_name = model_name
+         self._model = None
+         self._tokenizer = None
+         self._device = "cuda" if torch.cuda.is_available() else "cpu"
+
+         logger.info(f"TranslationService initialized (device: {self._device})")
+
+     def _load_model(self):
+         """
+         Load the NLLB-200 model and tokenizer.
+         Called lazily on first translation request.
+         """
+         if self._model is not None:
+             return
+
+         logger.info(f"Loading NLLB-200 model: {self.model_name}")
+         logger.info("This may take a few minutes on first run (downloading ~2.4GB model)...")
+
+         try:
+             # Load tokenizer
+             self._tokenizer = AutoTokenizer.from_pretrained(self.model_name)
+
+             # Load model with memory optimizations
+             self._model = AutoModelForSeq2SeqLM.from_pretrained(
+                 self.model_name,
+                 torch_dtype=torch.float32,  # Use float32 for CPU compatibility
+                 low_cpu_mem_usage=True
+             )
+             self._model.to(self._device)
+
+             logger.info("NLLB-200 model loaded successfully!")
+
+         except Exception as e:
+             logger.error(f"Failed to load NLLB-200 model: {e}")
+             raise Exception(f"Could not load translation model: {str(e)}")
+
+     def detect_language(self, text: str) -> dict:
+         """
+         Detect the language of the given text.
+
+         Args:
+             text: Text to detect language for
+
+         Returns:
+             Dictionary with:
+             - code: Normalized language code (e.g., "hin")
+             - name: Language name (e.g., "Hindi")
+             - raw_code: Raw code reported by langdetect (e.g., "hi")
+         """
+         try:
+             # Use langdetect library
+             detected = detect(text)
+
+             # Map langdetect's ISO 639-1 codes to our language codes
+             lang_mapping = {
+                 "en": "eng",
+                 "hi": "hin",
+                 "ta": "tam",
+                 "te": "tel",
+                 "kn": "kan",
+                 "ml": "mal",
+                 "gu": "guj",
+                 "bn": "ben",
+                 "mr": "mar",
+                 "pa": "pan",
+                 "ur": "urd",
+             }
+
+             code = lang_mapping.get(detected, detected)
+             name = get_language_name(code)
+
+             logger.info(f"Detected language: {name} ({code})")
+
+             return {
+                 "code": code,
+                 "name": name,
+                 "raw_code": detected
+             }
+
+         except LangDetectException as e:
+             logger.warning(f"Language detection failed: {e}")
+             # Default to English if detection fails
+             return {
+                 "code": "eng",
+                 "name": "English",
+                 "raw_code": "en"
+             }
+
+     def translate(
+         self,
+         text: str,
+         source_lang: str,
+         target_lang: str,
+         max_length: int = 1024
+     ) -> str:
+         """
+         Translate text from source language to target language.
+
+         Args:
+             text: Text to translate
+             source_lang: Source language code (e.g., "hin", "eng")
+             target_lang: Target language code (e.g., "eng", "tam")
+             max_length: Maximum output length
+
+         Returns:
+             Translated text
+
+         Raises:
+             ValueError: If language codes are invalid
+             Exception: If translation fails
+         """
+         # Ensure model is loaded
+         self._load_model()
+
+         # Validate and get NLLB codes (get_nllb_code raises ValueError
+         # for unsupported codes)
+         source_nllb = get_nllb_code(source_lang)
+         target_nllb = get_nllb_code(target_lang)
+
+         logger.info(f"Translating from {source_lang} to {target_lang}")
+
+         # Handle long texts by chunking
+         if len(text) > MAX_TRANSLATION_LENGTH:
+             logger.info(f"Text too long ({len(text)} chars), chunking...")
+             return self._translate_long_text(text, source_lang, target_lang, max_length)
+
+         try:
+             # Set source language for the tokenizer
+             self._tokenizer.src_lang = source_nllb
+
+             # Tokenize input
+             inputs = self._tokenizer(
+                 text,
+                 return_tensors="pt",
+                 padding=True,
+                 truncation=True,
+                 max_length=max_length
+             )
+             inputs = {k: v.to(self._device) for k, v in inputs.items()}
+
+             # Get the target language token ID to force as the first generated token
+             forced_bos_token_id = self._tokenizer.convert_tokens_to_ids(target_nllb)
+
+             # Generate translation with beam search
+             with torch.no_grad():
+                 outputs = self._model.generate(
+                     **inputs,
+                     forced_bos_token_id=forced_bos_token_id,
+                     max_length=max_length,
+                     num_beams=5,
+                     length_penalty=1.0,
+                     early_stopping=True
+                 )
+
+             # Decode output
+             translated = self._tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]
+
+             logger.info(f"Translation complete ({len(translated)} chars)")
+
+             return translated.strip()
+
+         except Exception as e:
+             logger.error(f"Translation failed: {e}")
+             raise Exception(f"Could not translate text: {str(e)}")
+
+     def _translate_long_text(
+         self,
+         text: str,
+         source_lang: str,
+         target_lang: str,
+         max_length: int = 1024
+     ) -> str:
+         """
+         Translate long text by splitting into chunks.
+
+         Args:
+             text: Long text to translate
+             source_lang: Source language code
+             target_lang: Target language code
+             max_length: Maximum output length per chunk
+
+         Returns:
+             Concatenated translated text
+         """
+         # Split text into sentences (rough approximation; "।" and "॥" are
+         # Indic sentence terminators)
+         sentences = text.replace("।", ".").replace("॥", ".").split(".")
+
+         chunks = []
+         current_chunk = ""
+
+         for sentence in sentences:
+             sentence = sentence.strip()
+             if not sentence:
+                 continue
+
+             # A single sentence longer than the limit would make translate()
+             # re-enter this method forever, so hard-split it instead
+             if len(sentence) > MAX_TRANSLATION_LENGTH:
+                 if current_chunk:
+                     chunks.append(current_chunk)
+                     current_chunk = ""
+                 for i in range(0, len(sentence), MAX_TRANSLATION_LENGTH):
+                     chunks.append(sentence[i:i + MAX_TRANSLATION_LENGTH])
+                 continue
+
+             # Check if adding this sentence would exceed the limit
+             if len(current_chunk) + len(sentence) + 2 > MAX_TRANSLATION_LENGTH:
+                 if current_chunk:
+                     chunks.append(current_chunk)
+                 current_chunk = sentence
+             else:
+                 current_chunk = current_chunk + ". " + sentence if current_chunk else sentence
+
+         if current_chunk:
+             chunks.append(current_chunk)
+
+         # Translate each chunk and join the results
+         translated_chunks = []
+         for i, chunk in enumerate(chunks):
+             logger.info(f"Translating chunk {i+1}/{len(chunks)}")
+             translated = self.translate(chunk, source_lang, target_lang, max_length)
+             translated_chunks.append(translated)
+
+         return " ".join(translated_chunks)
+
+     def translate_to_english(self, text: str, source_lang: str) -> str:
+         """
+         Convenience method to translate text to English.
+
+         Args:
+             text: Text to translate
+             source_lang: Source language code
+
+         Returns:
+             English translation
+         """
+         if is_english(source_lang):
+             return text  # Already English
+
+         return self.translate(text, source_lang, "eng")
+
+     def translate_from_english(self, text: str, target_lang: str) -> str:
+         """
+         Convenience method to translate English text to another language.
+
+         Args:
+             text: English text to translate
+             target_lang: Target language code
+
+         Returns:
+             Translated text in target language
+         """
+         if is_english(target_lang):
+             return text  # Target is already English
+
+         return self.translate(text, "eng", target_lang)
+
+     def get_supported_languages(self) -> list:
+         """
+         Get list of supported languages.
+
+         Returns:
+             List of language dictionaries with code, name, and nllb_code
+         """
+         return SUPPORTED_LANGUAGES.copy()
+
+     def is_model_loaded(self) -> bool:
+         """Check if the NLLB model is currently loaded."""
+         return self._model is not None
+
+     def warmup(self):
+         """
+         Pre-load the model to avoid delay on first request.
+         Call this during application startup if desired.
+         """
+         logger.info("Warming up TranslationService...")
+         self._load_model()
+         logger.info("TranslationService warmup complete!")