Making Avatars Interact: Towards Text-Driven Human-Object Interaction for Controllable Talking Avatars Paper • 2602.01538 • Published 23 days ago • 15
FantasyVLN: Unified Multimodal Chain-of-Thought Reasoning for Vision-Language Navigation Paper • 2601.13976 • Published Jan 20 • 21
MVInverse: Feed-forward Multi-view Inverse Rendering in Seconds Paper • 2512.21003 • Published Dec 24, 2025 • 2
Sharp Monocular View Synthesis in Less Than a Second Paper • 2512.10685 • Published Dec 11, 2025 • 28
SpeakerVid-5M: A Large-Scale High-Quality Dataset for Audio-Visual Dyadic Interactive Human Generation Paper • 2507.09862 • Published Jul 14, 2025 • 51