On Vacation 🏝️

Ji Xie PRO

sanaka87

https://horizonwind2004.github.io/

AI & ML interests

Generative Model

Recent Activity

liked a dataset 9 days ago

Marlo-Z/SegLLM_dataset

reacted to their post with 🔥 13 days ago

posted an update 14 days ago

🚀 Introducing VideoCoF: Unified Video Editing with a Temporal Reasoner (Chain-of-Frames)!

We’re excited to introduce VideoCoF, a unified framework for instruction-based video editing that enables temporal reasoning and ~4× video length extrapolation, trained with only 50k video pairs. 🔥

🔍 What makes VideoCoF different?

🧠 Chain-of-Frames reasoning: mimics the human thinking process (Seeing → Reasoning → Editing) to apply edits accurately over time without external masks, ensuring physically plausible results.
📈 Strong length generalization: trained on 33-frame clips, yet supports multi-shot editing and long-video extrapolation (~4×).
🎯 Unified fine-grained editing: Object Removal, Addition, Swap, and Local Style Transfer, with instance-level and part-level, spatially aware control.

⚡ Fast inference update
🚀 H100: ~20s / video with 4-step inference, making high-quality video editing far more practical for real-world use.

🔗 Links
📄 Paper: https://arxiv.org/abs/2512.07469
💻 Code: https://github.com/knightyxp/VideoCoF
🤗 Demo: https://huggingface.co/spaces/XiangpengYang/VideoCoF
🧩 Models: https://huggingface.co/XiangpengYang/VideoCoF
🌐 Project Page: https://videocof.github.io/

#VideoEditing #DiffusionModels #GenerativeAI #ComputerVision #AI

Organizations

None yet

sanaka87's models (9)

sanaka87/3DIS

Text-to-Image • Updated 24 days ago • 65 • 7

sanaka87/Show-o-RecA

Text-to-Image • Updated Nov 13 • 15 • 3

sanaka87/Show-o-512x512-RecA

Any-to-Any • Updated Nov 13 • 13 • 2

sanaka87/BAGEL-RecA

Any-to-Any • Updated Nov 13 • 70 • 26

sanaka87/Harmon-0.5B-RecA

Text-to-Image • Updated Nov 13 • 15 • 4

sanaka87/Harmon-1.5B-RecA

Any-to-Any • Updated Nov 13 • 13 • 2

sanaka87/Harmon-1.5B-RecA-plus

Text-to-Image • Updated Nov 13 • 18 • 3

sanaka87/OpenUni-RecA

Any-to-Any • Updated Sep 11 • 22 • 1

sanaka87/ICEdit-MoE-LoRA

Image-to-Image • Updated May 2 • 328 • 118