Whisper only transcribes the English words and ignores the rest of the Spanish audio
#220, opened by davera-017
I’m having an issue transcribing a Spanish audio clip (a speech by Octavio Paz). The audio is mostly in Spanish, but the speaker says a couple of words in English. However, Whisper only transcribes that short English fragment and ignores the rest of the Spanish speech.
Here is the code I’m using (Transformers 4.57.3):
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor
import torch

model_id = "openai/whisper-medium"
# `device` was not defined in the original snippet; pick GPU when available
device = "cuda" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id,
    dtype=torch_dtype,
    low_cpu_mem_usage=True,
    cache_dir="../data/models",
).to(device)
processor = AutoProcessor.from_pretrained(
    model_id,
    cache_dir="../data/models",
)

# Disable forced_decoder_ids to avoid automatic translation
model.config.forced_decoder_ids = None

gen_kwargs = {
    "language": "es",
    "task": "transcribe",
    "compression_ratio_threshold": 1.35,
    "return_timestamps": True,
}

# waveform_tensor: batch of 1-D audio tensors at 16 kHz (loaded elsewhere)
inputs = processor(
    [c.numpy() for c in waveform_tensor],
    sampling_rate=16000,
    return_tensors="pt",
    padding="longest",
)
# Match the model's dtype so float16 weights do not receive float32 features
input_features = inputs.input_features.to(device, dtype=torch_dtype)

generated_ids = model.generate(input_features, **gen_kwargs)
batch_texts = processor.batch_decode(generated_ids, skip_special_tokens=True)
I'm attaching the audio chunk that reproduces the problem.
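A note on the `compression_ratio_threshold` value in the snippet above: the following is a minimal sketch of the text-based compression ratio used in the reference openai/whisper code, where the default threshold is 2.4. The helper names `compression_ratio` and `exceeds_threshold` are hypothetical, and Transformers may compute the ratio over token ids rather than decoded text, so the numbers are only illustrative. It shows why repetitive output scores high, and gives a way to check how a given transcript compares against a threshold as low as 1.35.

```python
import zlib


def compression_ratio(text: str) -> float:
    """Raw UTF-8 size divided by zlib-compressed size; repetitive text scores higher."""
    raw = text.encode("utf-8")
    return len(raw) / len(zlib.compress(raw))


def exceeds_threshold(text: str, threshold: float = 1.35) -> bool:
    """True when the text counts as 'too repetitive' under the given threshold."""
    return compression_ratio(text) > threshold


# Degenerate, looping output of the kind the threshold is meant to catch
looped = "gracias por ver el video " * 20
# A short sentence with almost no repetition
short = "The quick brown fox jumps over the lazy dog"

print(f"looped: {compression_ratio(looped):.2f}")
print(f"short:  {compression_ratio(short):.2f}")
```

Running the same helper on the decoded segments of the problematic chunk could indicate whether valid Spanish segments are landing above 1.35 and being treated as failed.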