RL - a goforit123 Collection

updated Nov 28, 2025

Reinforcement Learning for Reasoning in Large Language Models with One Training Example

Paper • 2504.20571 • Published Apr 29, 2025 • 98
One RL to See Them All: Visual Triple Unified Reinforcement Learning

Paper • 2505.18129 • Published May 23, 2025 • 61
Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't

Paper • 2503.16219 • Published Mar 20, 2025 • 52
Performance Trade-offs of Optimizing Small Language Models for E-Commerce

Paper • 2510.21970 • Published Oct 24, 2025 • 2
Supervised Reinforcement Learning: From Expert Trajectories to Step-wise Reasoning

Paper • 2510.25992 • Published Oct 29, 2025 • 45
Generalizing Test-time Compute-optimal Scaling as an Optimizable Graph

Paper • 2511.00086 • Published Oct 29, 2025 • 41
OpenSIR: Open-Ended Self-Improving Reasoner

Paper • 2511.00602 • Published Nov 1, 2025 • 20
Data-Efficient RLVR via Off-Policy Influence Guidance

Paper • 2510.26491 • Published Oct 30, 2025 • 10
RLoop: An Self-Improving Framework for Reinforcement Learning with Iterative Policy Initialization

Paper • 2511.04285 • Published Nov 6, 2025 • 7
RLVE: Scaling Up Reinforcement Learning for Language Models with Adaptive Verifiable Environments

Paper • 2511.07317 • Published Nov 10, 2025 • 15
Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B

Paper • 2511.06221 • Published Nov 9, 2025 • 132
Adaptive Multi-Agent Response Refinement in Conversational Systems

Paper • 2511.08319 • Published Nov 11, 2025 • 41
MarsRL: Advancing Multi-Agent Reasoning System via Reinforcement Learning with Agentic Pipeline Parallelism

Paper • 2511.11373 • Published Nov 14, 2025 • 12
P1: Mastering Physics Olympiads with Reinforcement Learning

Paper • 2511.13612 • Published Nov 17, 2025 • 134
Agent-R1: Training Powerful LLM Agents with End-to-End Reinforcement Learning

Paper • 2511.14460 • Published Nov 18, 2025 • 20
Soft Adaptive Policy Optimization

Paper • 2511.20347 • Published Nov 25, 2025 • 41