The Illusion of Diminishing Returns: Measuring Long Horizon Execution in LLMs Paper • 2509.09677 • Published Sep 11, 2025 • 35
Can Language Models Falsify? Evaluating Algorithmic Reasoning with Counterexample Creation Paper • 2502.19414 • Published Feb 26, 2025 • 20