PhyGDPO: Physics-Aware Groupwise Direct Preference Optimization for Physically Consistent Text-to-Video Generation Paper • 2512.24551 • Published 4 days ago • 17
TurboDiffusion: Accelerating Video Diffusion Models by 100-200 Times Paper • 2512.16093 • Published 17 days ago • 90
DreaMontage: Arbitrary Frame-Guided One-Shot Video Generation Paper • 2512.21252 • Published 10 days ago • 32
StoryMem: Multi-shot Long Video Storytelling with Memory Paper • 2512.19539 • Published 12 days ago • 17
LoGoPlanner: Localization Grounded Navigation Policy with Metric-aware Visual Geometry Paper • 2512.19629 • Published 12 days ago • 25
An Anatomy of Vision-Language-Action Models: From Modules to Milestones and Challenges Paper • 2512.11362 • Published 23 days ago • 21
Seed-Prover 1.5: Mastering Undergraduate-Level Theorem Proving via Learning from Experience Paper • 2512.17260 • Published 16 days ago • 48
PhysBrain: Human Egocentric Data as a Bridge from Vision Language Models to Physical Intelligence Paper • 2512.16793 • Published 16 days ago • 72
Exploration v.s. Exploitation: Rethinking RLVR through Clipping, Entropy, and Spurious Reward Paper • 2512.16912 • Published 16 days ago • 10
The World is Your Canvas: Painting Promptable Events with Reference Images, Trajectories, and Text Paper • 2512.16924 • Published 16 days ago • 25
Next-Embedding Prediction Makes Strong Vision Learners Paper • 2512.16922 • Published 16 days ago • 82
LLaDA2.0: Scaling Up Diffusion Language Models to 100B Paper • 2512.15745 • Published 25 days ago • 78
Qwen-Image-Layered: Towards Inherent Editability via Layer Decomposition Paper • 2512.15603 • Published 17 days ago • 58
Nemotron-Cascade: Scaling Cascaded Reinforcement Learning for General-Purpose Reasoning Models Paper • 2512.13607 • Published 19 days ago • 27