DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models Paper • 2512.02556 • Published 28 days ago • 241
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning Paper • 2507.01006 • Published Jul 1 • 246
PAROAttention: Pattern-Aware ReOrdering for Efficient Sparse and Quantized Attention in Visual Generation Models Paper • 2506.16054 • Published Jun 19 • 60
MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention Paper • 2506.13585 • Published Jun 16 • 273
A Survey on Inference Engines for Large Language Models: Perspectives on Optimization and Efficiency Paper • 2505.01658 • Published May 3 • 39
SageAttention2++: A More Efficient Implementation of SageAttention2 Paper • 2505.21136 • Published May 27 • 45
Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures Paper • 2505.09343 • Published May 14 • 73
Quartet: Native FP4 Training Can Be Optimal for Large Language Models Paper • 2505.14669 • Published May 20 • 78
SageAttention3: Microscaling FP4 Attention for Inference and An Exploration of 8-Bit Training Paper • 2505.11594 • Published May 16 • 75
MiniMax-Speech: Intrinsic Zero-Shot Text-to-Speech with a Learnable Speaker Encoder Paper • 2505.07916 • Published May 12 • 134