SleepWalk: A Three-Tier Benchmark for Stress-Testing Instruction-Guided Vision-Language Navigation Paper • 2605.10376 • Published 10 days ago • 1
Prediction Bottlenecks Don't Discover Causal Structure (But Here's What They Actually Do) Paper • 2605.09169 • Published 12 days ago
Moral Sensitivity in LLMs: A Tiered Evaluation of Contextual Bias via Behavioral Profiling and Mechanistic Interpretability Paper • 2605.03217 • Published 17 days ago • 1
VISTA: Video Interaction Spatio-Temporal Analysis Benchmark Paper • 2605.01391 • Published 19 days ago
PermaFrost-Attack: Stealth Pretraining Seeding(SPS) for planting Logic Landmines During LLM Training Paper • 2604.22117 • Published 23 days ago
Personality Shapes Gender Bias in Persona-Conditioned LLM Narratives Across English and Hindi: An Empirical Investigation Paper • 2604.23600 • Published 25 days ago • 2
CONSCIENTIA: Can LLM Agents Learn to Strategize? Emergent Deception and Trust in a Multi-Agent NYC Simulation Paper • 2604.09746 • Published Apr 10 • 1
Reasoning or Rhetoric? An Empirical Analysis of Moral Reasoning Explanations in Large Language Models Paper • 2603.21854 • Published Mar 23 • 3
Quantized LLama-based Models Collection Llama-based models quantized to various precisions • 6 items • Updated Mar 18