Building on HF

Sergio Paniego PRO

sergiopaniego

https://sergiopaniego.github.io/

sergio-paniego-blanco

AI & ML interests

None yet

Recent Activity

updated a collection about 5 hours ago

😎 Awesome vision Spaces

liked a Space about 5 hours ago

pcuenq/gemma-4-object-detection

upvoted an article about 5 hours ago

The PR you would have opened yourself

Organizations

sergiopaniego's collections (9)

Bringing Autonomous Driving RL to OpenEnv and TRL resources

Blog: https://huggingface.co/blog/sergiopaniego/bringing-carla-to-openenv-trl/

Running on T4

RL

CARLA Environment Server

🚗

Control a CARLA driving simulation with custom actions
Running on T4

RL

CARLA Environment Server

🚗

Control a CARLA driving simulator with custom actions
Sleeping

Agents

Carla Grpo Trolley

🚀

Visualize your program’s I/O activity in real time
sergiopaniego/Qwen3-0.6B-carla-trolley-escape

0.8B • Updated Feb 26 • 16

Amazing design resources

Running

111

HFBA

🤗

A collection of Huggies!
Running

14

HF Thumbnail Crafter

🎨

Create custom thumbnails for your videos

GUI Grounding datasets

rootsautomation/ScreenSpot

Viewer • Updated Apr 10, 2024 • 1.27k • 2.05k • 46
OS-Copilot/OS-Atlas-data

Updated Dec 4, 2024 • 2.64k • 43

👁 Vision comparison ftw

Spaces to compare vision models — there’s no single best model, only the best one for your specific use case.

Sleeping

Agents

41

comparevlms

🏃

Compare Vision Language Models
Running on Zero

Agents

66

OCR Time Machine

📚

Extract text from images and XML files using OCR models
Running

Agents

26

Compare Docvqa Models

🦀

Compare different visual question answering
Running on CPU Upgrade

Agents

23

Compare Clip Siglip

🏃

Compare strong zero-shot image classification models

Vision Language Models: 2025 Update

This collection includes all the models, datasets and Spaces mentioned in the blog Vision Language Models: 2025 Update

Qwen/Qwen2.5-Omni-7B

Any-to-Any • Updated Apr 30, 2025 • 474k • 1.89k
Running

Agents

Featured

371

Qwen2.5 Omni 7B Demo

🏆

Chat with AI using text, audio, images, and video
Qwen2.5-Omni Technical Report

Paper • 2503.20215 • Published Mar 26, 2025 • 172
openbmb/MiniCPM-o-2_6

Any-to-Any • 9B • Updated Oct 5, 2025 • 116k • 1.29k

📝 Research & Long-Form Blog Posts

In-depth technical articles and research pieces published by Hugging Face

Running

3.79k

The Ultra-Scale Playbook

🌌

The ultimate guide to training LLM on large GPU Clusters
Running on CPU Upgrade

Featured

3.11k

The Smol Training Playbook

📚

The secrets to building world-class LLMs
Running

302

Evaluation Guidebook

📝

Explore LLM benchmark trends over time
Running

221

FineVision: Open Data is All You Need

📝

A new open-source dataset for training VLMs

Vision reasoning datasets

deepcs233/Visual-CoT

Preview • Updated Mar 11, 2025 • 2.27k • 56
lmms-lab/multimodal-open-r1-8k-verified

Viewer • Updated Jan 27, 2025 • 7.69k • 3.2k • 74
leonardPKU/GEOQA_R1V_Train_8K

Viewer • Updated Feb 11, 2025 • 8.03k • 178 • 14
leonardPKU/clevr_cogen_a_train

Viewer • Updated Feb 2, 2025 • 70k • 328 • 40

My vision Spaces

Vision Spaces created by me

Running on Zero

Agents

Featured

114

VLM Object Understanding

🦀

Explore object detection, visual grounding, keypoint Detecti
Running on Zero

Agents

4

VQA Autonomous Driving SmolVLM2

🌖

Visual Question Answering - Autonomous Driving - SmolVLM2

😎 Awesome vision Spaces

Spaces where I've collaborated or that I consider unique!

Sleeping

Agents

41

comparevlms

🏃

Compare Vision Language Models
Runtime error

Agents

4

Gemma3 License Plate Detection

📈

Gemma 3 for license plate detection
Running on Zero

Agents

Featured

142

Gemma 3n E4B It

⚡

Chat with a multimodal assistant using text, images, audio, or video
Running on Zero

Agents

Featured

38

Moondream3

🏢

Image and video tasks with moondream3.
