Schoenfeld's Anatomy of Mathematical Reasoning by Language Models Paper • 2512.19995 • Published 14 days ago • 14
Can LLMs Estimate Student Struggles? Human-AI Difficulty Alignment with Proficiency Simulation for Item Difficulty Prediction Paper • 2512.18880 • Published 15 days ago • 24
V-REX: Benchmarking Exploratory Visual Reasoning via Chain-of-Questions Paper • 2512.11995 • Published 24 days ago • 9
CostBench: Evaluating Multi-Turn Cost-Optimal Planning and Adaptation in Dynamic Environments for LLM Tool-Use Agents Paper • 2511.02734 • Published Nov 4, 2025 • 20
Adapting Vehicle Detectors for Aerial Imagery to Unseen Domains with Weak Supervision Paper • 2507.20976 • Published Jul 28, 2025 • 10
Adversarial Paraphrasing: A Universal Attack for Humanizing AI-Generated Text Paper • 2506.07001 • Published Jun 8, 2025 • 4
MA-LoT: Multi-Agent Lean-based Long Chain-of-Thought Reasoning enhances Formal Theorem Proving Paper • 2503.03205 • Published Mar 5, 2025 • 4
Diversity-Enhanced Reasoning for Subjective Questions Paper • 2507.20187 • Published Jul 27, 2025 • 25
Where to find Grokking in LLM Pretraining? Monitor Memorization-to-Generalization without Test Paper • 2506.21551 • Published Jun 26, 2025 • 28
DyePack: Provably Flagging Test Set Contamination in LLMs Using Backdoors Paper • 2505.23001 • Published May 29, 2025 • 8 • 2
Attacking by Aligning: Clean-Label Backdoor Attacks on Object Detection Paper • 2307.10487 • Published Jul 19, 2023
ColorBench: Can VLMs See and Understand the Colorful World? A Comprehensive Benchmark for Color Perception, Reasoning, and Robustness Paper • 2504.10514 • Published Apr 10, 2025 • 48
Missing Premise exacerbates Overthinking: Are Reasoning Models losing Critical Thinking Skill? Paper • 2504.06514 • Published Apr 9, 2025 • 39