Next-Embedding Prediction Makes Strong Vision Learners Paper • 2512.16922 • Published 10 days ago • 80
JavisDiT: Joint Audio-Video Diffusion Transformer with Hierarchical Spatio-Temporal Prior Synchronization Paper • 2503.23377 • Published Mar 30 • 57