Yihua Zhang
NormalUhr
published an article 3 months ago
published an article 5 months ago
Article
Re-understanding KL Approximation from an RL-for-LLM Lens: Notes on “Approximating KL Divergence”
• 4
published an article 5 months ago
Article
From GRPO to DAPO and GSPO: What, Why, and How
• 71
published an article 7 months ago
Article
Decorators in Machine Learning
published an article 10 months ago
Article
DualPipe Explained: A Comprehensive Guide to DualPipe That Anyone Can Understand—Even Without a Distributed Training Background
• 14
published an article 11 months ago
Article
Navigating the RLHF Landscape: From Policy Gradients to PPO, GAE, and DPO for LLM Alignment
• 94
published an article 11 months ago
Article
DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge
• 263
published an article 11 months ago
Article
A Review on the Evolvement of Load Balancing Strategy in MoE LLMs: Pitfalls and Lessons
• 28
published an article 11 months ago
Article
From Zero to Reasoning Hero: How DeepSeek-R1 Leverages Reinforcement Learning to Master Complex Reasoning
• 16
published an article 11 months ago
Article
MLA: Redefining KV-Cache Through Low-Rank Projections and On-Demand Decompression
• 18