numpy
torch
torchaudio
transformers>=4.40
sentencepiece
tiktoken
funasr>=1.3.3
huggingface_hub
modelscope
pydub
ffmpeg-python
librosa