Collections for the paper "Language Models Can Learn from Verbal Feedback Without Scalar Rewards" (https://arxiv.org/pdf/2509.22638)
- Renjie-Ranger/FCP_big_math_pro_C-plus_no_concise (185k rows)
- Renjie-Ranger/FCP_general_reasoner_pro_C-plus_no_concise (133k rows)
- Renjie-Ranger/FCP_general_reasoner_pro_SFT (272k rows)
- Renjie-Ranger/FCP_big_math_pro_SFT (384k rows)