Thanks! I really love your phrasing of 'refusing to hallucinate success'. That's exactly the mindset we aimed for. Glad the philosophy resonates!
Xiangpeng Yang PRO
XiangpengYang
AI & ML interests
diffusion models, video generation, video editing
Recent Activity
replied to
their
post
10 days ago
Introducing VideoCoF: Unified Video Editing with a Temporal Reasoner (Chain-of-Frames)!
We're excited to introduce VideoCoF, a unified framework for instruction-based video editing that enables temporal reasoning and ~4× video length extrapolation, trained with only 50k video pairs.
What makes VideoCoF different?
Chain-of-Frames reasoning: mimics the human thinking process of Seeing → Reasoning → Editing to apply edits accurately over time without external masks, for physically plausible results.
Strong length generalization: trained on 33-frame clips, yet supports multi-shot editing and long-video extrapolation (~4×).
Unified fine-grained editing: Object Removal, Addition, Swap, and Local Style Transfer, with instance-level & part-level, spatial-aware control.
Fast inference update
H100: ~20s / video with 4-step inference, making high-quality video editing far more practical for real-world use.
Links
Paper: https://arxiv.org/abs/2512.07469
Code: https://github.com/knightyxp/VideoCoF
Demo: https://huggingface.co/spaces/XiangpengYang/VideoCoF
Models: https://huggingface.co/XiangpengYang/VideoCoF
Project Page: https://videocof.github.io/
#VideoEditing #DiffusionModels #GenerativeAI #ComputerVision #AI
updated
a Space
10 days ago
XiangpengYang/VideoCoF