Dancing in Chains: Strategic Persuasion in Academic Rebuttal via Theory of Mind Paper • 2601.15715 • Published 12 days ago • 13
NAACL: Noise-AwAre Verbal Confidence Calibration for LLMs in RAG Systems Paper • 2601.11004 • Published 18 days ago • 30
NAACL: Noise-AwAre Verbal Confidence Calibration for LLMs in RAG Systems Paper • 2601.11004 • Published 18 days ago • 30
NAACL: Noise-AwAre Verbal Confidence Calibration for LLMs in RAG Systems Paper • 2601.11004 • Published 18 days ago • 30
The Confidence Dichotomy: Analyzing and Mitigating Miscalibration in Tool-Use Agents Paper • 2601.07264 • Published 22 days ago • 24
CritiCal: Can Critique Help LLM Uncertainty or Confidence Calibration? Paper • 2510.24505 • Published Oct 28, 2025 • 4
CritiCal: Can Critique Help LLM Uncertainty or Confidence Calibration? Paper • 2510.24505 • Published Oct 28, 2025 • 4
CritiCal: Can Critique Help LLM Uncertainty or Confidence Calibration? Paper • 2510.24505 • Published Oct 28, 2025 • 4 • 2
CostBench: Evaluating Multi-Turn Cost-Optimal Planning and Adaptation in Dynamic Environments for LLM Tool-Use Agents Paper • 2511.02734 • Published Nov 4, 2025 • 22 • 2
CostBench: Evaluating Multi-Turn Cost-Optimal Planning and Adaptation in Dynamic Environments for LLM Tool-Use Agents Paper • 2511.02734 • Published Nov 4, 2025 • 22
Mathematical Proof as a Litmus Test: Revealing Failure Modes of Advanced Large Reasoning Models Paper • 2506.17114 • Published Jun 20, 2025
CostBench: Evaluating Multi-Turn Cost-Optimal Planning and Adaptation in Dynamic Environments for LLM Tool-Use Agents Paper • 2511.02734 • Published Nov 4, 2025 • 22
UserRL: Training Interactive User-Centric Agent via Reinforcement Learning Paper • 2509.19736 • Published Sep 24, 2025 • 12
UserBench: An Interactive Gym Environment for User-Centric Agents Paper • 2507.22034 • Published Jul 29, 2025 • 30
Diversity-Enhanced Reasoning for Subjective Questions Paper • 2507.20187 • Published Jul 27, 2025 • 26