Article The Open Evaluation Standard: Benchmarking NVIDIA Nemotron 3 Nano with NeMo Evaluator 8 days ago • 32
Post 2442 Instead of an architectural upgrade, each major model drop nowadays perfects a regional innovation. What Kimi brought into the spotlight this time is quantization-aware training (QAT). I wrote an article to explain it and why it matters to reasoning models: https://huggingface.co/blog/onekq/qat-bonsai If you are interested in this kind of post, I will introduce the Muon optimizers, another technology behind Kimi's success.
Reply Thank you for an interesting study. Do you have plans to create encoder-decoder models from these?
Unsloth Dynamic 2.0 Quants Collection New 2.0 version of our Dynamic GGUF + Quants. Dynamic 2.0 achieves superior accuracy & SOTA quantization performance. • 66 items • Updated about 23 hours ago • 279