Article • Red Teaming with RL: Exploiting Tinker API for Harmful RL on 235B Model • 4 days ago • 14
LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL Paper • 2503.07536 • Published Mar 10, 2025 • 88