goforit123's Collections
Reinforcement Learning for Reasoning in Large Language Models with One Training Example
Paper • 2504.20571 • Published • 98
One RL to See Them All: Visual Triple Unified Reinforcement Learning
Paper • 2505.18129 • Published • 61
Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't
Paper • 2503.16219 • Published • 52
Performance Trade-offs of Optimizing Small Language Models for E-Commerce
Paper • 2510.21970 • Published • 2
Supervised Reinforcement Learning: From Expert Trajectories to Step-wise Reasoning
Paper • 2510.25992 • Published • 45
Generalizing Test-time Compute-optimal Scaling as an Optimizable Graph
Paper • 2511.00086 • Published • 41
OpenSIR: Open-Ended Self-Improving Reasoner
Paper • 2511.00602 • Published • 20
Data-Efficient RLVR via Off-Policy Influence Guidance
Paper • 2510.26491 • Published • 10
RLoop: An Self-Improving Framework for Reinforcement Learning with Iterative Policy Initialization
Paper • 2511.04285 • Published • 7
RLVE: Scaling Up Reinforcement Learning for Language Models with Adaptive Verifiable Environments
Paper • 2511.07317 • Published • 15
Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B
Paper • 2511.06221 • Published • 132
Adaptive Multi-Agent Response Refinement in Conversational Systems
Paper • 2511.08319 • Published • 41
MarsRL: Advancing Multi-Agent Reasoning System via Reinforcement Learning with Agentic Pipeline Parallelism
Paper • 2511.11373 • Published • 12
P1: Mastering Physics Olympiads with Reinforcement Learning
Paper • 2511.13612 • Published • 134
Agent-R1: Training Powerful LLM Agents with End-to-End Reinforcement Learning
Paper • 2511.14460 • Published • 20
Soft Adaptive Policy Optimization
Paper • 2511.20347 • Published • 41