SimWorld: An Open-ended Realistic Simulator for Autonomous Agents in Physical and Social Worlds Paper • 2512.01078 • Published Nov 30, 2025 • 33
Next-Embedding Prediction Makes Strong Vision Learners Paper • 2512.16922 • Published 14 days ago • 82
Vision-Language Models Are Not Pragmatically Competent in Referring Expression Generation Paper • 2504.16060 • Published Apr 22, 2025
4D-LRM: Large Space-Time Reconstruction Model From and To Any View at Any Time Paper • 2506.18890 • Published Jun 23, 2025 • 6
AimBot: A Simple Auxiliary Visual Cue to Enhance Spatial Awareness of Visuomotor Policies Paper • 2508.08113 • Published Aug 11, 2025 • 11
From Behavioral Performance to Internal Competence: Interpreting Vision-Language Models with VLM-Lens Paper • 2510.02292 • Published Oct 2, 2025 • 1
Communication and Verification in LLM Agents towards Collaboration under Information Asymmetry Paper • 2510.25595 • Published Oct 29, 2025
ROVER: Benchmarking Reciprocal Cross-Modal Reasoning for Omnimodal Generation Paper • 2511.01163 • Published Nov 3, 2025 • 31
Do Vision-Language Models Have Internal World Models? Towards an Atomic Evaluation Paper • 2506.21876 • Published Jun 27, 2025 • 28
Can Vision Language Models Infer Human Gaze Direction? A Controlled Study Paper • 2506.05412 • Published Jun 4, 2025 • 4