NanoFlow: Scalable Normalizing Flows with Sublinear Parameter Complexity Paper • 2006.06280 • Published Jun 11, 2020
Low Frame-rate Speech Codec: a Codec Designed for Fast High-quality Speech LLM Training and Inference Paper • 2409.12117 • Published Sep 18, 2024
Edit-A-Video: Single Video Editing with Object-Aware Consistency Paper • 2303.07945 • Published Mar 14, 2023
VoiceTailor: Lightweight Plug-In Adapter for Diffusion-Based Personalized Text-to-Speech Paper • 2408.14739 • Published Aug 27, 2024
Audio Flamingo 3: Advancing Audio Intelligence with Fully Open Large Audio Language Models Paper • 2507.08128 • Published Jul 10, 2025
Music Flamingo: Scaling Music Understanding in Audio Language Models Paper • 2511.10289 • Published Nov 13, 2025
UALM: Unified Audio Language Model for Understanding, Generation and Reasoning Paper • 2510.12000 • Published Oct 13, 2025
ETTA: Elucidating the Design Space of Text-to-Audio Models Paper • 2412.19351 • Published Dec 26, 2024
PriorGrad: Improving Conditional Denoising Diffusion Models with Data-Dependent Adaptive Prior Paper • 2106.06406 • Published Jun 11, 2021
BigVGAN: A Universal Neural Vocoder with Large-Scale Training Paper • 2206.04658 • Published Jun 9, 2022