F-GRPO: Don't Let Your Policy Learn the Obvious and Forget the Rare
Paper • 2602.06717 • Published • 67
Scientific research; Natural language processing: speech analytics, search engines, dialogue systems; A family of LLMs; Speech technologies; Fraud prevention technologies; Computer vision; Recommender systems; Time series analysis
F-GRPO: Don't Let Your Policy Learn the Obvious and Forget the Rare
T-pro 2.0: An Efficient Russian Hybrid-Reasoning Model and Playground