Why does a tiny silence at the start of my audio change Whisper’s transcription?
#187
by dylanewbie - opened
Hi everyone, I’m using OpenAI’s Whisper for speech recognition.
My audio says “ABC1234,” but sometimes the model outputs “AVC1234.” If I prepend a short silence (e.g., 10 ms), the output switches to the correct “ABC1234,” but as I increase the silence (20 ms, 30 ms, 40 ms, etc.), it flips back and forth between “ABC1234” and “AVC1234.”
Even replacing the silence with white noise has the same effect.
Has anyone else run into this issue? Why does adding a tiny bit of audio cause such unpredictable changes in the transcription?
Any insights or suggestions would be really helpful!
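To make the experiment above easier to reproduce, here is a minimal sketch of the padding step. It assumes the audio is already loaded as a mono NumPy array at 16 kHz (the rate Whisper's feature extractor expects). The helper names `prepend_padding` and `sweep_padding`, the `noise_std` value, and the `transcribe` callable are all hypothetical and chosen for illustration; this is not necessarily how the results in this post were produced.

```python
import numpy as np

SAMPLE_RATE = 16_000  # assumption: audio is mono and sampled at 16 kHz


def ms_to_samples(ms, sample_rate=SAMPLE_RATE):
    """Convert a duration in milliseconds to a whole number of samples."""
    return int(round(ms * sample_rate / 1000))


def prepend_padding(audio, pad_ms, kind="silence", noise_std=0.01,
                    sample_rate=SAMPLE_RATE, seed=0):
    """Return a copy of `audio` with `pad_ms` of silence or white noise in front.

    `noise_std` and `seed` are illustrative defaults, not values from the post.
    """
    n = ms_to_samples(pad_ms, sample_rate)
    if kind == "silence":
        pad = np.zeros(n, dtype=np.float32)
    elif kind == "noise":
        rng = np.random.default_rng(seed)
        pad = rng.normal(0.0, noise_std, n).astype(np.float32)
    else:
        raise ValueError(f"unknown padding kind: {kind!r}")
    return np.concatenate([pad, np.asarray(audio, dtype=np.float32)])


def sweep_padding(audio, transcribe, pad_ms_values=(0, 10, 20, 30, 40),
                  kind="silence"):
    """Run a caller-supplied `transcribe(array) -> str` for each padding length.

    `transcribe` is a placeholder for whatever recognizer call is in use.
    """
    return {ms: transcribe(prepend_padding(audio, ms, kind))
            for ms in pad_ms_values}
```

With Transformers, `transcribe` could be something like `lambda a: pipe({"raw": a, "sampling_rate": 16000})["text"]`, where `pipe` is an `automatic-speech-recognition` pipeline. Comparing the returned dictionary across `kind="silence"` and `kind="noise"` shows at a glance which padding lengths flip the output.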