Muratcan Laloğlu
muratcanlaloglu
AI & ML interests
None yet
Recent Activity
reacted to Hellohal2064's post with 🔥 about 21 hours ago
🚀 Excited to share: The vLLM container for NVIDIA DGX Spark!
I've been working on getting vLLM to run natively on the new DGX Spark with its GB10 Blackwell GPU (SM121 architecture). The results? Up to 2.5x faster inference compared to llama.cpp!
📊 Performance Highlights:
• Qwen3-Coder-30B: 44 tok/s (vs 21 tok/s with llama.cpp)
• Qwen3-Next-80B: 45 tok/s (vs 18 tok/s with llama.cpp)
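As a quick sanity check on the numbers above, the speedup for each model is the vLLM throughput divided by the llama.cpp throughput. The sketch below uses only the four tok/s figures quoted in the post; the dictionary layout and the `speedup` helper are illustrative names, not part of the container or of vLLM.

```python
# Throughput figures (tokens per second) as quoted in the post.
benchmarks = {
    "Qwen3-Coder-30B": {"vllm": 44, "llama_cpp": 21},
    "Qwen3-Next-80B": {"vllm": 45, "llama_cpp": 18},
}


def speedup(vllm_tps: float, llama_cpp_tps: float) -> float:
    """Return how many times faster vLLM is than llama.cpp (hypothetical helper)."""
    return vllm_tps / llama_cpp_tps


# Ratio per model, rounded to two decimals for display.
speedups = {
    name: round(speedup(r["vllm"], r["llama_cpp"]), 2)
    for name, r in benchmarks.items()
}

for name, ratio in speedups.items():
    print(f"{name}: {ratio:.2f}x")
```

On these figures the 80B model reaches the 2.5x mark, while the 30B model comes out at roughly 2.1x.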
🔧 Technical Challenges Solved:
• Built PyTorch nightly with CUDA 13.1 + SM121 support
• Patched vLLM for Blackwell architecture
• Created custom MoE expert configs for GB10
• Implemented TRITON_ATTN backend workaround
📦 Available now:
• Docker Hub: docker pull hellohal2064/vllm-dgx-spark-gb10:latest
• HuggingFace: huggingface.co/Hellohal2064/vllm-dgx-spark-gb10
The DGX Spark's 119GB unified memory opens up possibilities for running massive models locally. Happy to connect with others working on the DGX Spark Blackwell!
liked a model 21 days ago
moondream/md3p-int4
liked a model about 1 month ago
cyankiwi/Qwen3-VL-32B-Instruct-AWQ-4bit
Organizations
None yet