AnIdealRing

SmartDazi

AI & ML interests

None yet

Recent Activity

upvoted a paper about 4 hours ago

How Far Can Unsupervised RLVR Scale LLM Training?

upvoted a paper 25 days ago

Learning beyond Teacher: Generalized On-Policy Distillation with Reward Extrapolation

upvoted a paper 28 days ago

Outcome Accuracy is Not Enough: Aligning the Reasoning Process of Reward Models

Organizations

SmartDazi's datasets

None public yet