Guided Self-Evolving LLMs with Minimal Human Supervision • Paper • 2512.02472 • Published Dec 2, 2025 • 51
Learning to Optimize Multi-Objective Alignment Through Dynamic Reward Weighting • Paper • 2509.11452 • Published Sep 14, 2025 • 13
ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models • Paper • 2505.24864 • Published May 30, 2025 • 143
Optimizing Decomposition for Optimal Claim Verification • Paper • 2503.15354 • Published Mar 19, 2025 • 18