Article Shipping a Trillion Parameters With a Hub Bucket: Delta Weight Sync in TRL +6 aminediroHF, qgallouedec, kashif, lewtun, edbeeching, albertvillanova, lvwerra, sergiopaniego • 6 days ago • 36
Nemotron-Labs-Diffusion Collection Set of models of internal diffusion models • 7 items • Updated 4 days ago • 46
CohereLabs/command-a-plus-05-2026-w4a4 Image-Text-to-Text • 126B • Updated 6 days ago • 16.1k • 216
Article Fine-Tuning NVIDIA Cosmos Predict 2.5 with LoRA/DoRA for Robot Video Generation nvidia • 15 days ago • 21
Article Unlocking asynchronicity in continuous batching +1 ror, pcuenq, ariG23498 • 19 days ago • 56
Article Building Blocks for Foundation Model Training and Inference on AWS amazon • 21 days ago • 23
Article CyberSecQwen-4B: Why Defensive Cyber Needs Small, Specialized, Locally-Runnable Models lablab-ai-amd-developer-hackathon • 25 days ago • 10
Mistral Medium 3.5 Collection Our first flagship models handling instruction-following, reasoning, and coding in a single set of open weights. • 2 items • Updated Apr 29 • 17