OS-Sentinel: Towards Safety-Enhanced Mobile GUI Agents via Hybrid Validation in Realistic Workflows Paper • 2510.24411 • Published Oct 28, 2025 • 71
InteractComp: Evaluating Search Agents With Ambiguous Queries Paper • 2510.24668 • Published Oct 28, 2025 • 97
ReCode: Unify Plan and Action for Universal Granularity Control Paper • 2510.23564 • Published Oct 27, 2025 • 121
Sparser Block-Sparse Attention via Token Permutation Paper • 2510.21270 • Published Oct 24, 2025 • 24
BIRD-INTERACT: Re-imagining Text-to-SQL Evaluation for Large Language Models via Lens of Dynamic Interactions Paper • 2510.05318 • Published Oct 6, 2025 • 21
EditScore: Unlocking Online RL for Image Editing via High-Fidelity Reward Modeling Paper • 2509.23909 • Published Sep 28, 2025 • 32
WebSailor-V2: Bridging the Chasm to Proprietary Agents via Synthetic Data and Scalable Reinforcement Learning Paper • 2509.13305 • Published Sep 16, 2025 • 91
ScaleCUA: Scaling Open-Source Computer Use Agents with Cross-Platform Data Paper • 2509.15221 • Published Sep 18, 2025 • 111
WideSearch: Benchmarking Agentic Broad Info-Seeking Paper • 2508.07999 • Published Aug 11, 2025 • 110
SwS: Self-aware Weakness-driven Problem Synthesis in Reinforcement Learning for LLM Reasoning Paper • 2506.08989 • Published Jun 10, 2025 • 14
ARIA: Training Language Agents with Intention-Driven Reward Aggregation Paper • 2506.00539 • Published May 31, 2025 • 30
Satori-SWE: Evolutionary Test-Time Scaling for Sample-Efficient Software Engineering Paper • 2505.23604 • Published May 29, 2025 • 23
ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows Paper • 2505.19897 • Published May 26, 2025 • 104
Learn to Reason Efficiently with Adaptive Length-based Reward Shaping Paper • 2505.15612 • Published May 21, 2025 • 34
Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems Paper • 2504.01990 • Published Mar 31, 2025 • 301
Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems Paper • 2504.01990 • Published Mar 31, 2025 • 301
ETVA: Evaluation of Text-to-Video Alignment via Fine-grained Question Generation and Answering Paper • 2503.16867 • Published Mar 21, 2025 • 11
Atom of Thoughts for Markov LLM Test-Time Scaling Paper • 2502.12018 • Published Feb 17, 2025 • 17 • 4