CVPR Demo Track

non-profit

http://cvpr2022.thecvf.com/

AI & ML interests

CVPR Demo Track @ CVPR 2022

Recent Activity

elischwartz authored a paper 10 days ago

FETA: Towards Specializing Foundation Models for Expert Task Applications

elischwartz authored a paper 10 days ago

Teaching Structured Vision&Language Concepts to Vision&Language Models

elischwartz authored a paper 10 days ago

Teaching VLMs to Localize Specific Objects from In-context Examples

elischwartz authored 12 papers 10 days ago

FETA: Towards Specializing Foundation Models for Expert Task Applications

Paper • 2209.03648 • Published Sep 8, 2022

Teaching Structured Vision&Language Concepts to Vision&Language Models

Paper • 2211.11733 • Published Nov 21, 2022

Teaching VLMs to Localize Specific Objects from In-context Examples

Paper • 2411.13317 • Published Nov 20, 2024

Granite Vision: a lightweight, open-source multimodal model for enterprise Intelligence

Paper • 2502.09927 • Published Feb 14, 2025

REAL-MM-RAG: A Real-World Multi-Modal Retrieval Benchmark

Paper • 2502.12342 • Published Feb 17, 2025 • 7

DocReRank: Single-Page Hard Negative Query Generation for Training Multi-Modal RAG Rerankers

Paper • 2505.22584 • Published May 28, 2025

Advancing Speech Understanding in Speech-Aware Language Models with GRPO

Paper • 2509.16990 • Published Sep 21, 2025 • 22

ChartGen: Scaling Chart Understanding Via Code-Guided Synthetic Chart Generation

Paper • 2507.19492 • Published May 31, 2025 • 1

CLIMP: Contrastive Language-Image Mamba Pretraining

Paper • 2601.06891 • Published Jan 11 • 3

CARES: Context-Aware Resolution Selector for VLMs

Paper • 2510.19496 • Published Oct 22, 2025 • 9

NumeroLogic: Number Encoding for Enhanced LLMs' Numerical Reasoning

Paper • 2404.00459 • Published Mar 30, 2024

WAVECLIP: Wavelet Tokenization for Adaptive-Resolution CLIP

Paper • 2509.21153 • Published Sep 25, 2025 • 1

authored 5 papers 24 days ago

Helios: Real Real-Time Long Video Generation Model

Paper • 2603.04379 • Published 25 days ago • 177

Adaptive 1D Video Diffusion Autoencoder

Paper • 2602.04220 • Published Feb 4 • 5

FSVideo: Fast Speed Video Diffusion Model in a Highly-Compressed Latent Space

Paper • 2602.02092 • Published Feb 2 • 18

Focal Guidance: Unlocking Controllability from Semantic-Weak Layers in Video Diffusion Models

Paper • 2601.07287 • Published Jan 12 • 5

Bringing Characters to New Stories: Training-Free Theme-Specific Image Generation via Dynamic Visual Prompting

Paper • 2501.15641 • Published Jan 26, 2025 • 1

authored a paper 4 months ago

RynnVLA-002: A Unified Vision-Language-Action and World Model

Paper • 2511.17502 • Published Nov 21, 2025 • 28

authored 2 papers 6 months ago

TTT3R: 3D Reconstruction as Test-Time Training

Paper • 2509.26645 • Published Sep 30, 2025 • 15

Human3R: Everyone Everywhere All at Once

Paper • 2510.06219 • Published Oct 7, 2025 • 11