Zhongang Cai

caizhongang

http://caizhongang.com/

AI & ML interests

Multimodal, Video Reasoning, Spatial Intelligence, Virtual Humans.

Recent Activity

liked a dataset 5 days ago

Video-Reason/VBVR-Bench-Data

Viewer • Updated 15 days ago • 500 • 1.01k • 9

liked a model 6 days ago

sensenova/SenseNova-SI-1.5-InternVL3-8B

Image-Text-to-Text • 8B • Updated 14 days ago • 244 • 2

upvoted a paper 8 days ago

Video-MME-v2: Towards the Next Stage in Benchmarks for Comprehensive Video Understanding

Paper • 2604.05015 • Published 10 days ago • 232

upvoted a paper 9 days ago

FileGram: Grounding Agent Personalization in File-System Behavioral Traces

Paper • 2604.04901 • Published 10 days ago • 40

liked a model 18 days ago

sensenova/SenseNova-SI-1.4-InternVL3-8B

Image-Text-to-Text • 8B • Updated 20 days ago • 1.74k • 3

authored a paper 23 days ago

Bridging Semantic and Kinematic Conditions with Diffusion-based Discrete Motion Tokenizer

Paper • 2603.19227 • Published 27 days ago • 42

upvoted 2 papers 27 days ago

Bridging Semantic and Kinematic Conditions with Diffusion-based Discrete Motion Tokenizer

Paper • 2603.19227 • Published 27 days ago • 42

MonoArt: Progressive Structural Reasoning for Monocular Articulated 3D Reconstruction

Paper • 2603.19231 • Published 27 days ago • 36

upvoted a paper 29 days ago

Kinema4D: Kinematic 4D World Modeling for Spatiotemporal Embodied Simulation

Paper • 2603.16669 • Published 29 days ago • 70

authored a paper 29 days ago

Demystifing Video Reasoning

Paper • 2603.16870 • Published 29 days ago • 369

upvoted a paper 29 days ago

Demystifing Video Reasoning

Paper • 2603.16870 • Published 29 days ago • 369

upvoted a paper 30 days ago

HSImul3R: Physics-in-the-Loop Reconstruction of Simulation-Ready Human-Scene Interactions

Paper • 2603.15612 • Published about 1 month ago • 152

upvoted 2 papers about 1 month ago

ArtHOI: Articulated Human-Object Interaction Synthesis by 4D Reconstruction from Video Priors

Paper • 2603.04338 • Published Mar 4 • 24

UniG2U-Bench: Do Unified Models Advance Multimodal Understanding?

Paper • 2603.03241 • Published Mar 3 • 87

liked a dataset about 2 months ago

Video-Reason/VBVR-Dataset

Viewer • Updated 15 days ago • 1M • 1.56k • 50

liked a Space about 2 months ago

VBVR Bench Leaderboard

🥇

Leaderboard for VBVR-Bench

authored 3 papers about 2 months ago

liked a model about 2 months ago

Video-Reason/VBVR-Wan2.2

Image-to-Video • Updated about 19 hours ago • 204 • 127
