LongLive-2.0: An NVFP4 Parallel Infrastructure for Long Video Generation • Paper • 2605.18739 • Published 7 days ago • 108
SANA-WM: Efficient Minute-Scale World Modeling with Hybrid Linear Diffusion Transformer • Paper • 2605.15178 • Published 11 days ago • 80
MLS-Bench: A Holistic and Rigorous Assessment of AI Systems on Building Better AI • Paper • 2605.08678 • Published 16 days ago • 8
STARFlow2: Bridging Language Models and Normalizing Flows for Unified Multimodal Generation • Paper • 2605.08029 • Published 17 days ago • 12
MolmoAct2: Action Reasoning Models for Real-world Deployment • Paper • 2605.02881 • Published 21 days ago • 336
Nano-World-Model Collection 🌍 A minimalist repository for training video world models based on diffusion-forcing. • 20 items • Updated 8 days ago • 7
Tuna-2: Pixel Embeddings Beat Vision Encoders for Multimodal Understanding and Generation • Paper • 2604.24763 • Published 28 days ago • 71
Unified 4D World Action Modeling from Video Priors with Asynchronous Denoising • Paper • 2604.26694 • Published 26 days ago • 6
Spatial-TTT: Streaming Visual-based Spatial Intelligence with Test-Time Training • Paper • 2603.12255 • Published Mar 12 • 91
Visual Generation Unlocks Human-Like Reasoning through Multimodal World Models • Paper • 2601.19834 • Published Jan 27 • 25
Generative Neural Video Compression via Video Diffusion Prior • Paper • 2512.05016 • Published Dec 4, 2025 • 10
What about gravity in video generation? Post-Training Newton's Laws with Verifiable Rewards • Paper • 2512.00425 • Published Nov 29, 2025 • 53
First Frame Is the Place to Go for Video Content Customization • Paper • 2511.15700 • Published Nov 19, 2025 • 54
WorldGen: From Text to Traversable and Interactive 3D Worlds • Paper • 2511.16825 • Published Nov 20, 2025 • 24
RynnVLA-002: A Unified Vision-Language-Action and World Model • Paper • 2511.17502 • Published Nov 21, 2025 • 28