Chasing the Tail: Effective Rubric-based Reward Modeling for Large Language Model Post-Training Paper • 2509.21500 • Published Sep 25, 2025
Deductive Closure Training of Language Models for Coherence, Accuracy, and Updatability Paper • 2401.08574 • Published Jan 16, 2024
RL4F: Generating Natural Language Feedback with Reinforcement Learning for Repairing Model Outputs Paper • 2305.08844 • Published May 15, 2023