Quartet II: Accurate LLM Pre-Training in NVFP4 by Improved Unbiased Gradient Estimation Paper • 2601.22813 • Published 3 days ago • 40
WUSH: Near-Optimal Adaptive Transforms for LLM Quantization Paper • 2512.00956 • Published Nov 30, 2025 • 23
TiDAR: Think in Diffusion, Talk in Autoregression Paper • 2511.08923 • Published Nov 12, 2025 • 126
view article Article A Review on the Evolvement of Load Balancing Strategy in MoE LLMs: Pitfalls and Lessons Feb 4, 2025 • 28
Running on CPU Upgrade Featured 2.95k The Smol Training Playbook 📚 2.95k The secrets to building world-class LLMs
ISTA-DASLab/Qwen3-30B-A3B-Instruct-2507-W4A4-mxfp4-gptq-hadamard-transform 17B • Updated Nov 5, 2025 • 11
ISTA-DASLab/Qwen3-30B-A3B-Instruct-2507-W4A4-mxfp4-gptq-identity-transform 17B • Updated Nov 5, 2025 • 2