You are viewing version v5.4.0. A newer version, v5.8.1, is available.
TRL
TRL is a post-training framework for foundation models. It includes methods like SFT, GRPO, and DPO. Each method has a dedicated trainer that builds on the Trainer class and scales from a single GPU to multi-node clusters.
from datasets import load_dataset
from trl import GRPOTrainer
from trl.rewards import accuracy_reward

dataset = load_dataset("trl-lib/DeepMath-103K", split="train")

trainer = GRPOTrainer(
    model="Qwen/Qwen2-0.5B-Instruct",
    reward_funcs=accuracy_reward,
    train_dataset=dataset,
)
trainer.train()

Transformers integration
TRL extends Transformers APIs and adds method-specific settings.
TRL trainers build on Trainer. Method-specific trainers like GRPOTrainer add generation, reward scoring, and loss computation. Config classes extend TrainingArguments with method-specific fields.
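Reward scoring in GRPOTrainer is driven by the functions passed as reward_funcs. As a minimal sketch, a custom reward can be a plain Python function that receives the batch of completions and returns one float per completion. The names brevity_reward and _completion_text and the 200 and 400 character thresholds are hypothetical, chosen only for illustration. The sketch assumes completions arrive either as strings or as lists of chat messages with a "content" key.

```python
def _completion_text(completion):
    """Return plain text for a completion in either assumed format."""
    if isinstance(completion, str):
        return completion
    # Assumed conversational format: a list of {"role": ..., "content": ...} messages.
    return "".join(message["content"] for message in completion)


def brevity_reward(completions, **kwargs):
    """Hypothetical reward favoring short completions.

    Returns 1.0 for completions of at most 200 characters, decreasing
    linearly to 0.0 at 400 characters or more. Extra keyword arguments
    (for example the prompts) are accepted and ignored.
    """
    rewards = []
    for completion in completions:
        length = len(_completion_text(completion))
        score = (400 - length) / 200
        # Clamp the score to the range [0.0, 1.0].
        rewards.append(max(0.0, min(1.0, score)))
    return rewards
```

Under these assumptions, the function could be passed to the trainer as reward_funcs=brevity_reward, or in a list alongside accuracy_reward.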
Model loading first reads the model configuration with AutoConfig.from_pretrained(), then instantiates the model class named by that config by calling the class's own from_pretrained().
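The two-step loading described above can be sketched as follows. This is an illustration, not TRL's actual code: the helper names resolve_model_class and load_model are hypothetical, and the sketch assumes the model class is the first entry of config.architectures, looked up by name in the transformers namespace.

```python
def resolve_model_class(config, namespace):
    """Look up the class named by config.architectures[0] in namespace.

    Assumption: the first listed architecture is the one to instantiate.
    """
    architectures = getattr(config, "architectures", None)
    if not architectures:
        raise ValueError("config does not name an architecture")
    return getattr(namespace, architectures[0])


def load_model(model_id, **model_kwargs):
    """Read the config first, then call the resolved class's from_pretrained."""
    # Imported lazily so resolve_model_class stays dependency-free.
    import transformers

    # Step 1: read only the configuration for the checkpoint.
    config = transformers.AutoConfig.from_pretrained(model_id)
    # Step 2: resolve the concrete model class and load the weights with it.
    model_class = resolve_model_class(config, transformers)
    return model_class.from_pretrained(model_id, **model_kwargs)
```

With Transformers installed and access to the checkpoint, load_model("Qwen/Qwen2-0.5B-Instruct") would resolve the class from the checkpoint's config before loading its weights.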
Resources
- TRL docs
- Fine Tuning with TRL talk