— Awesome RL datasets 📈 —
ScaleAI/SWE-bench_Pro • Viewer • Updated Sep 25, 2025 • 731 • 11.5k • 46
agentica-org/DeepScaleR-Preview-Dataset • Viewer • Updated Feb 10, 2025 • 40.3k • 7.59k • 187
open-r1/DAPO-Math-17k-Processed • Viewer • Updated Nov 10, 2025 • 34.8k • 5.81k • 54

Awesome RLHF
A curated collection of datasets, models, Spaces, and papers on Reinforcement Learning from Human Feedback (RLHF).
MT Bench 📊 • Running • 202 • Compare AI model responses side-by-side
garage-bAInd/Open-Platypus • Viewer • Updated Jan 24, 2024 • 24.9k • 4.24k • 414
meta-llama/Llama-2-7b-chat-hf • Text Generation • 7B • Updated Apr 17, 2024 • 457k • 4.69k
meta-llama/Llama-2-70b-chat-hf • Text Generation • 69B • Updated Apr 17, 2024 • 17.7k • 2.2k

— Long-context post-training 🧶 —
Resources for post-training LLMs with long-context samples
zai-org/LongAlign-10k • Viewer • Updated Feb 22, 2024 • 9.89k • 706 • 82
HuggingFaceTB/smoltalk2 • Viewer • Updated Oct 31, 2025 • 8.61M • 4.23k • 137
zai-org/LongReward-10k • Viewer • Updated Oct 29, 2024 • 30k • 874 • 6
Tongyi-Zhiwen/DocQA-RL-1.6K • Viewer • Updated May 23, 2025 • 3.6k • 178 • 41

Mistral 7B + UltraChat + Arithmo checkpoints
A collection of Mistral 7B fine-tunes on UltraChat and Arithmo to boost the math capabilities of chat models. See https://x.com/_lewtun/status/1715652
lewtun/mistral-7b-sft-ultrachat-arithmo-full • Text Generation • Updated Oct 21, 2023 • 4 • 1
lewtun/mistral-7b-sft-ultrachat-arithmo-50 • Text Generation • Updated Oct 21, 2023 • 2 • 1
lewtun/mistral-7b-sft-ultrachat-arithmo-25 • Text Generation • Updated Oct 21, 2023 • 1
openbmb/UltraChat • Viewer • Updated Feb 22, 2024 • 949k • 3.81k • 469

Gemma RLAIF
lewtun/gemma-7b-sft-full-ultrachat-v0 • Text Generation • 9B • Updated Feb 29, 2024 • 5 • 1
lewtun/gemma-7b-sft-full-dolly-v3 • Text Generation • 9B • Updated Feb 29, 2024 • 4
lewtun/gemma-7b-sft-full-deita-10k-v0 • Text Generation • 9B • Updated Feb 29, 2024 • 3
lewtun/gemma-7b-dpo-full-ultrafeedback-v0 • Text Generation • Updated Feb 29, 2024 • 2