WorldMM: Dynamic Multimodal Memory Agent for Long Video Reasoning Paper • 2512.02425 • Published 25 days ago • 23
Korean TTS Arena (한국어 TTS 아레나) 🎤 • 17 • Compare and evaluate 17 Korean TTS models in a blind test!
RefineBench: Evaluating Refinement Capability of Language Models via Checklists Paper • 2511.22173 • Published 30 days ago • 13
Adaptive Multi-Agent Response Refinement in Conversational Systems Paper • 2511.08319 • Published Nov 11 • 41
Simulating Environments with Reasoning Models for Agent Training Paper • 2511.01824 • Published Nov 3 • 2
AgentFold: Long-Horizon Web Agents with Proactive Context Management Paper • 2510.24699 • Published Oct 28 • 68 • 4
ParallelBench: Understanding the Trade-offs of Parallel Decoding in Diffusion LLMs Paper • 2510.04767 • Published Oct 6 • 27
Multimodal Prompt Optimization: Why Not Leverage Multiple Modalities for MLLMs Paper • 2510.09201 • Published Oct 10 • 49
When Thoughts Meet Facts: Reusable Reasoning for Long-Context LMs Paper • 2510.07499 • Published Oct 8 • 48
StockBench: Can LLM Agents Trade Stocks Profitably In Real-world Markets? Paper • 2510.02209 • Published Oct 2 • 53
ACON: Optimizing Context Compression for Long-horizon LLM Agents Paper • 2510.00615 • Published Oct 1 • 32
MCPMark: A Benchmark for Stress-Testing Realistic and Comprehensive MCP Use Paper • 2509.24002 • Published Sep 28 • 173
Rethinking Reward Models for Multi-Domain Test-Time Scaling Paper • 2510.00492 • Published Oct 1 • 27
agent-distillation/Qwen2.5-32B-Instruct_cot_trajectories_2k Dataset • Updated Jun 9 • 3k • 61 • 1