Masking Teacher and Reinforcing Student for Distilling Vision-Language Models Paper • 2512.22238 • Published 11 days ago
InsertAnywhere: Bridging 4D Scene Geometry and Diffusion Models for Realistic Video Object Insertion Paper • 2512.17504 • Published 15 days ago
EgoX: Egocentric Video Generation from a Single Exocentric Video Paper • 2512.08269 • Published 25 days ago
Exploring MLLM-Diffusion Information Transfer with MetaCanvas Paper • 2512.11464 • Published 22 days ago
VQRAE: Representation Quantization Autoencoders for Multimodal Understanding, Generation and Reconstruction Paper • 2511.23386 • Published Nov 28, 2025
Long-horizon Reasoning Agent for Olympiad-Level Mathematical Problem Solving Paper • 2512.10739 • Published 23 days ago
Script: Graph-Structured and Query-Conditioned Semantic Token Pruning for Multimodal Large Language Models Paper • 2512.01949 • Published Dec 1, 2025
Revisiting the Necessity of Lengthy Chain-of-Thought in Vision-centric Reasoning Generalization Paper • 2511.22586 • Published Nov 27, 2025
Monet: Reasoning in Latent Visual Space Beyond Images and Language Paper • 2511.21395 • Published Nov 26, 2025
Chain-of-Visual-Thought: Teaching VLMs to See and Think Better with Continuous Visual Tokens Paper • 2511.19418 • Published Nov 24, 2025
OmniZip: Audio-Guided Dynamic Token Compression for Fast Omnimodal Large Language Models Paper • 2511.14582 • Published Nov 18, 2025
MMaDA-Parallel: Multimodal Large Diffusion Language Models for Thinking-Aware Editing and Generation Paper • 2511.09611 • Published Nov 12, 2025
P1: Mastering Physics Olympiads with Reinforcement Learning Paper • 2511.13612 • Published Nov 17, 2025
Uni-MoE-2.0-Omni: Scaling Language-Centric Omnimodal Large Model with Advanced MoE, Training and Data Paper • 2511.12609 • Published Nov 16, 2025