Untied Ulysses: Memory-Efficient Context Parallelism via Headwise Chunking Paper • 2602.21196 • Published 3 days ago • 3
VESPO: Variational Sequence-Level Soft Policy Optimization for Stable Off-Policy LLM Training Paper • 2602.10693 • Published 16 days ago • 185
PUSA V1.0: Surpassing Wan-I2V with $500 Training Cost by Vectorized Timestep Adaptation Paper • 2507.16116 • Published Jul 22, 2025 • 13
DuoGen: Towards General Purpose Interleaved Multimodal Generation Paper • 2602.00508 • Published 27 days ago • 4
DDiT: Dynamic Patch Scheduling for Efficient Diffusion Transformers Paper • 2602.16968 • Published 9 days ago • 11
SpargeAttention2: Trainable Sparse Attention via Hybrid Top-k+Top-p Masking and Distillation Fine-Tuning Paper • 2602.13515 • Published 14 days ago • 43
Optimizing Few-Step Generation with Adaptive Matching Distillation Paper • 2602.07345 • Published 20 days ago • 9
Geometry-Aware Rotary Position Embedding for Consistent Video World Model Paper • 2602.07854 • Published 19 days ago • 10
BitDance: Scaling Autoregressive Generative Models with Binary Tokens Paper • 2602.14041 • Published 12 days ago • 50
DICE: Diffusion Large Language Models Excel at Generating CUDA Kernels Paper • 2602.11715 • Published 15 days ago • 5
Zooming without Zooming: Region-to-Image Distillation for Fine-Grained Multimodal Perception Paper • 2602.11858 • Published 15 days ago • 58
T3D: Few-Step Diffusion Language Models via Trajectory Self-Distillation with Direct Discriminative Optimization Paper • 2602.12262 • Published 15 days ago • 8
PISCO: Precise Video Instance Insertion with Sparse Control Paper • 2602.08277 • Published 18 days ago • 11