University of Science and Technology of China

community

https://en.ustc.edu.cn/

Activity Feed

AI & ML interests

None defined yet.

Recent Activity

nielsr submitted a paper 17 days ago

VidEoMT: Your ViT is Secretly Also a Video Segmentation Model

nielsr submitted a paper 22 days ago

Causal-JEPA: Learning World Models through Object-Level Latent Interventions

nielsr submitted a paper about 1 month ago

UPLiFT: Efficient Pixel-Dense Feature Upsampling with Local Attenders

Papers

UniCorn: Towards Self-Improving Unified Multimodal Models through Self-Generated Supervision

MeshSplat: Generalizable Sparse-View Surface Reconstruction via Gaussian Splatting

nielsr

submitted a paper to Daily Papers 17 days ago

VidEoMT: Your ViT is Secretly Also a Video Segmentation Model

Paper • 2602.17807 • Published 20 days ago • 6

nielsr

submitted a paper to Daily Papers 22 days ago

Causal-JEPA: Learning World Models through Object-Level Latent Interventions

Paper • 2602.11389 • Published 28 days ago • 5

nielsr

submitted a paper to Daily Papers about 1 month ago

UPLiFT: Efficient Pixel-Dense Feature Upsampling with Local Attenders

Paper • 2601.17950 • Published Jan 25 • 4

nielsr

submitted a paper to Daily Papers about 2 months ago

TCAndon-Router: Adaptive Reasoning Router for Multi-Agent Collaboration

Paper • 2601.04544 • Published Jan 8 • 6

nielsr

submitted a paper to Daily Papers 3 months ago

CASA: Cross-Attention via Self-Attention for Efficient Vision-Language Fusion

Paper • 2512.19535 • Published Dec 22, 2025 • 12

ariG23498

authored a paper 5 months ago

FineVision: Open Data Is All You Need

Paper • 2510.17269 • Published Oct 20, 2025 • 76

merve

posted an update 5 months ago

Post

9603

deepseek-ai/DeepSeek-OCR is out! 🔥 my take ⤵️
> pretty insane it can parse and re-render charts in HTML
> it uses CLIP and SAM features concatenated, so better grounding
> very efficient per vision tokens/performance ratio
> covers 100 languages

4 replies

merve

posted an update 6 months ago

Post

6929

large AI labs open-sourced a ton of models last week 🔥
here's a few picks, find even more here merve/sep-16-releases-68d13ea4c547f02f95842f05 🤝
> IBM released a new Docling model with 258M params based on Granite (Apache 2.0) 📝 ibm-granite/granite-docling-258M
> Xiaomi released 7B audio LM with base and instruct variants (MIT) XiaomiMiMo/mimo-audio-68cc7202692c27dae881cce0
> DecartAI released Lucy Edit, open Nano Banana 🍌 (NC) decart-ai/Lucy-Edit-Dev
> OpenGVLab released a family of agentic computer use models (3B/7B/32B) with the dataset 💻 OpenGVLab/scalecua-68c912cf56f7ff4c8e034003
> Meituan Longcat released thinking version of LongCat-Flash 💭 meituan-longcat/LongCat-Flash-Thinking

2 replies

merve

posted an update 6 months ago

Post

3483

IBM just released a small swiss army knife for document models: granite-docling-258M on Hugging Face 🔥

> not only a document converter but also can do document question answering, understand multiple languages 🤯
> best part: released with Apache 2.0 license 👏 use it with your commercial projects!
> it supports transformers, vLLM and MLX from the get-go! 🤗
> built on SigLIP2 & granite-165M

model: ibm-granite/granite-docling-258M
demo: ibm-granite/granite-docling-258m-demo 💗

merve

posted an update 6 months ago

Post

1248

a ton of image/video generation models and LLMs from big labs 🔥

> Meta released facebook/mobilellm-r1-68c4597b104fac45f28f448e, smol LLMs for on-device use 💬
> Tencent released tencent/SRPO, high res image generation model and tencent/POINTS-Reader, cutting edge OCR 📝
> ByteDance released bytedance-research/HuMo, video generation from any input ⏯️

find more models, datasets, demos here merve/sep-11-releases-68c7dbfa26bea8cd921fa0ac

merve

posted an update 6 months ago

Post

1046

fan-favorite vision LM Florence-2 is now officially supported in transformers 🤗

find all the models in florence-community org 🫡

ariG23498

posted an update 6 months ago

Post

1870

New post is live!

This time we cover some major updates to transformers.

🤗

2 replies

merve

posted an update 6 months ago

Post

1849

past week was great for open LLMs 🔥 merve/sep-1-releases-68bede0e729c12597eefd050

> Google released google/embeddinggemma-300m, new embedding model with 300M params
> new update to Kimi-K2 just landed moonshotai/Kimi-K2-Instruct-0905 😍
> OpenBMB released a new version to MiniCPM with 8B params openbmb/MiniCPM4.1-8B

also soooo many Qwen-Image & Kontext LoRAs dropped!

merve

posted an update 6 months ago

Post

3735

upgrade your transformers 🔥
it comes with insanely capable models like merve/sam2-66ac9deac6fca3bc5482fe30, microsoft/kosmos-2.5, and more 🫡
I built a notebook you can run with free Colab T4 to walk through the API for new models 🙋🏻‍♀️ merve/smol-vision

fine-tuning will follow up soon!

merve

posted an update 6 months ago

Post

6305

large AI labs have dropped so many open models last week 🔥 don't miss out on them

→ Apple released on-device vision LMs apple/fastvlm-68ac97b9cd5cacefdd04872e & apple/mobileclip2-68ac947dcb035c54bcd20c47
→ OpenGVLab released InternVL3.5, 32 new vision LMs with one based on gpt-oss! (OS) OpenGVLab/internvl35-68ac87bd52ebe953485927fb
→ MSFT released a killer small TTS model (OS) microsoft/VibeVoice-1.5B

find more here https://huggingface.co/collections/merve/august-29-releases-68b5a3754cfb8abf59e2b486

1 reply

merve

posted an update 7 months ago

Post

6091

first vision language model built off openai/gpt-oss-20b just dropped! 🔥

InternVL3.5 comes with 32 models 🤯 pre-trained, fine-tuned, aligned in various sizes OpenGVLab/internvl35-68ac87bd52ebe953485927fb
comes with gpt-oss or Qwen3 for LLM part ⤵️

1 reply

qubvel-hf

in ustc-community/dfine-nano-coco 7 months ago

Fix weights

#1 opened 7 months ago by

qubvel-hf

merve

posted an update 7 months ago

Post

3338

GPT-4.1-mini level model right in your iPhone 🤯

openbmb/MiniCPM-V-4 is only 4B while surpassing GPT-4.1-mini in vision benchmarks 🔥

allows commercial use as well!

merve

posted an update 7 months ago

Post

1200

we're all sleeping on this OCR model rednote-hilab/dots.ocr 🔥

dots.ocr is a new 3B model with sota performance, support for 100 languages & allowing commercial use! 🤯

single e2e model to extract image, convert tables, formula, and more into markdown 📝
try it MohamedRashad/Dots-OCR

merve

posted an update 7 months ago

Post

713

massive releases and tons of FLUX.1 Krea LoRAs past week!
here are some of the picks, find more models in collection 🫡 merve/releases-august-2-6890c14248203522b7d0267f

LLMs 💬
> Tencent dropped tencent/Hunyuan-7B-Instruct
> Qwen released Qwen/Qwen3-Coder-30B-A3B-Instruct, 30B MoE with 3B active params for coding (OS)

vision/multimodal
> RedNote released rednote-hilab/dots.ocr - 3B OCR model (OS)
> Cohere released CohereLabs/command-a-vision-07-2025 - 112B (dense!) VLM for 6 languages
> StepFun-AI shipped stepfun-ai/step3 - 321B MoE VLM (OS)
> Skywork shipped Skywork/Skywork-UniPic-1.5B - new any-to-any model (image+text → image+text) (OS)

Team members 5
