Where to Look: Can Foundation Models Reach a Target Viewpoint Through Active Exploration? Paper • 2606.01247 • Published 4 days ago • 25
Which Pretraining Paradigm Better Serves Spatial Intelligence? An Empirical Comparison of Vision-Language and Video Generation Models Paper • 2605.28132 • Published 8 days ago • 20
On the Scaling of PEFT: Towards Million Personal Models of Trillion Parameters Paper • 2606.02437 • Published 3 days ago • 135
NV-Generate Synthetic Medical Imaging 🧠 • Running on Zero • Agents • 13 • Synthetic 3D CT and MR generation with NVIDIA NV-Generate.
LTX 2.3 Studio 🎬 • Running on Zero • Agents • Featured • 197 • Generate videos from text, images, audio, or video clips
Omni-Video-Factory-API-iframe 🐠 • Running • Agents • 92 • Access video creation tools via an embedded interface
Learning A Unified Risk Map for Autonomous Driving in Partially Observable Environments Paper • 2605.22189 • Published 14 days ago • 6
WorldMemArena: Evaluating Multimodal Agent Memory Through Action-World Interaction Paper • 2605.29341 • Published 7 days ago • 14
Why Far Looks Up: Probing Spatial Representation in Vision-Language Models Paper • 2605.30161 • Published 7 days ago • 57
Qwen-VLA: Unifying Vision-Language-Action Modeling across Tasks, Environments, and Robot Embodiments Paper • 2605.30280 • Published 7 days ago • 134
OmniRetrieval: Unified Retrieval across Heterogeneous Knowledge Sources Paper • 2605.29250 • Published 7 days ago • 74
VGGT-Omega Demo 🌀 • Running on Zero • Agents • Featured • 52 • 3D reconstruction from images/video with VGGT-Omega
WorldKV: Efficient World Memory with World Retrieval and Compression Paper • 2605.22718 • Published 14 days ago • 41