38 63 37

Qinghong (Kevin) Lin

KevinQHLin

http://qhlin.me/

AI & ML interests

Vision-Language Model, Video Understanding, Human-AI Interaction

Recent Activity

upvoted a paper 11 days ago

DataFlow: An LLM-Driven Framework for Unified Data Preparation and Workflow Automation in the Era of Data-Centric AI

authored a paper 17 days ago

Video Reality Test: Can AI-Generated ASMR Videos fool VLMs and Humans?

liked a dataset 17 days ago

kolerk/Video_Reality_Test

View all activity

Organizations

authored a paper 17 days ago

Video Reality Test: Can AI-Generated ASMR Videos fool VLMs and Humans?

Paper • 2512.13281 • Published 19 days ago • 63

submitted a paper to Daily Papers 17 days ago

Video Reality Test: Can AI-Generated ASMR Videos fool VLMs and Humans?

Paper • 2512.13281 • Published 19 days ago • 63

authored 2 papers 17 days ago

Show-o: One Single Transformer to Unify Multimodal Understanding and Generation

Paper • 2408.12528 • Published Aug 22, 2024 • 51

UI-Vision: A Desktop-centric GUI Benchmark for Visual Perception and Interaction

Paper • 2503.15661 • Published Mar 19, 2025 • 2

authored a paper about 1 month ago

Computer-Use Agents as Judges for Generative User Interface

Paper • 2511.15567 • Published Nov 19, 2025 • 52

authored 2 papers about 2 months ago

Grounding Computer Use Agents on Human Demonstrations

Paper • 2511.07332 • Published Nov 10, 2025 • 105

VCode: a Multimodal Coding Benchmark with SVG as Symbolic Visual Representation

Paper • 2511.02778 • Published Nov 4, 2025 • 101

authored 2 papers 3 months ago

Paper2Video: Automatic Video Generation from Scientific Papers

Paper • 2510.05096 • Published Oct 6, 2025 • 118

Code2Video: A Code-centric Paradigm for Educational Video Generation

Paper • 2510.01174 • Published Oct 1, 2025 • 33

authored a paper 5 months ago

Reinforcement Learning in Vision: A Survey

Paper • 2508.08189 • Published Aug 11, 2025 • 29

authored 2 papers 7 months ago

Paper2Poster: Towards Multimodal Poster Automation from Scientific Papers

Paper • 2505.21497 • Published May 27, 2025 • 109

Think or Not? Selective Reasoning via Reinforcement Learning for Vision-Language Models

Paper • 2505.16854 • Published May 22, 2025 • 11

authored 2 papers 10 months ago

VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning

Paper • 2503.13444 • Published Mar 17, 2025 • 17

VLog: Video-Language Models by Generative Retrieval of Narration Vocabulary

Paper • 2503.09402 • Published Mar 12, 2025 • 8

authored 2 papers about 1 year ago

ROICtrl: Boosting Instance Control for Visual Generation

Paper • 2411.17949 • Published Nov 27, 2024 • 87

ShowUI: One Vision-Language-Action Model for GUI Visual Agent

Paper • 2411.17465 • Published Nov 26, 2024 • 89

authored 2 papers over 1 year ago

VideoLLM-online: Online Video Large Language Model for Streaming Video

Paper • 2406.11816 • Published Jun 17, 2024 • 26

VideoGUI: A Benchmark for GUI Automation from Instructional Videos

Paper • 2406.10227 • Published Jun 14, 2024 • 9

authored 2 papers almost 2 years ago

COSMO: COntrastive Streamlined MultimOdal Model with Interleaved Pre-Training

Paper • 2401.00849 • Published Jan 1, 2024 • 17

Too Large; Data Reduction for Vision-Language Pre-Training

Paper • 2305.20087 • Published May 31, 2023

Qinghong (Kevin) Lin

AI & ML interests

Recent Activity

Organizations

KevinQHLin's activity