video LM
StreamChat: Chatting with Streaming Video
Paper
• 2412.08646
• Published
• 18
Divot: Diffusion Powers Video Tokenizer for Comprehension and Generation
Paper
• 2412.04432
• Published
• 16
VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by Video Spatiotemporal Augmentation
Paper
• 2412.00927
• Published
• 29
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
Paper
• 2412.09596
• Published
• 97
Apollo: An Exploration of Video Understanding in Large Multimodal Models
Paper
• 2412.10360
• Published
• 147
VidTok: A Versatile and Open-Source Video Tokenizer
Paper
• 2412.13061
• Published
• 8
Video-Panda: Parameter-efficient Alignment for Encoder-free Video-Language Models
Paper
• 2412.18609
• Published
• 17
Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction
Paper
• 2501.03218
• Published
• 35
LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token
Paper
• 2501.03895
• Published
• 52
MotionBench: Benchmarking and Improving Fine-grained Video Motion Understanding for Vision Language Models
Paper
• 2501.02955
• Published
• 44