Hugging Face
Stas Bekman
stas
138 followers · 4 following
https://stasosphere.com/machine-learning/
StasBekman
stas00
stasbekman
AI & ML interests
Toolmaker. Software creator, optimizer and harmonizer. Makes things work and fly at Snowflake AI Research.
Training LLM/RAG/Generative AI/Machine Learning/Scalability
Recent Activity
posted an update about 2 hours ago
In parallel, we are announcing a new open source repo: https://github.com/Snowflake-AI-Research/Arctic-Platform. This is the framework for very fast RL, with other optimizations to be rolled into it in the future. It currently has all the code you need to use Arctic RL or integrate it into RL frameworks: SkyRL and VeRL integrations are available, with more framework integrations coming. Please kindly spread the word! Thank you!
posted an update about 2 hours ago
After many months of intense work, the Snowflake AI Research team is happy to present a new open source project: Arctic RL https://snowflake.com/en/blog/engineering/arctic-rl-open-source-backend/
- Arctic RL integrates with VeRL and SkyRL today; enable ZoRRo with one config flag, no code changes required
- ZoRRo delivers up to 6x actor-update acceleration and a 3.5x end-to-end training speedup, reducing Arctic-Text2SQL-R2 training from ~5 days to ~36 hours on 32 H200 GPUs
- Arctic-Text2SQL-R2 achieved a higher accuracy score (48.7) than Gemini 3.1 Pro (47.9) and Claude 4.7 (47.3) on Snowflake's evaluated enterprise SQL benchmark under the tested conditions
- Two open source recipes ship with this release: a text-to-SQL recipe that improved BIRD dev accuracy from 59.92% to 70.35%, and a multi-hop QA recipe that improved average accuracy from 69.6% to 72.3%
posted an update 13 days ago
PSA for DeepSpeed users: a long-standing critical precision bug has been identified and fixed in https://github.com/deepspeedai/DeepSpeed/pull/8066, and a new release has been made. In mixed precision mode, DeepSpeed downcast buffers that had to stay in fp32, which massively impacted correctness for large static buffers, e.g., RoPE in Qwen3 models at long sequence lengths (32K+). Hopefully this fix brings DeepSpeed close to parity with FSDP2, which has been an issue for a long time. You can still get the old behavior, but you now need to configure it manually; by default, the model's buffers now remain in their original precision. Please install deepspeed==0.19.2, which does the right thing. Thanks to Tunji Ruwase and Claude Opus 4.8 via Cursor for identifying and fixing the problem.
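To illustrate why downcasting a RoPE buffer hurts mostly at long sequence lengths, here is a minimal, simplified sketch. It is not DeepSpeed's or Qwen3's code. It assumes standard RoPE inverse frequencies with illustrative values `dim=128` and `base=10000`, emulates bf16 by rounding float32 bits, and assumes only the `inv_freq` buffer is downcast while the position-times-frequency product stays in full precision. The helper names `to_bf16`, `rope_inv_freq` and `max_angle_error` are hypothetical. Since bf16 keeps only 8 significant bits, each frequency picks up a relative error of up to about 0.4% (2^-8), and the rotary angle error grows linearly with the position.

```python
import struct


def to_bf16(x: float) -> float:
    """Round x (via float32) to bfloat16, round half to even.

    bfloat16 keeps float32's sign bit, 8 exponent bits and the top 7 mantissa
    bits, so we pack x as float32 and round away the low 16 bits.
    Assumes finite inputs within float32 range (no NaN/inf handling).
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    lsb = (bits >> 16) & 1  # lowest kept bit, used for ties-to-even
    bits = (bits + 0x7FFF + lsb) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def rope_inv_freq(dim: int = 128, base: float = 10000.0) -> list:
    """Standard RoPE inverse frequencies; dim and base are illustrative assumptions."""
    return [base ** (-(2 * i) / dim) for i in range(dim // 2)]


def max_angle_error(position: int, inv_freq: list) -> float:
    """Worst-case rotary-angle error (radians) at `position` when only the
    inv_freq buffer is downcast to bf16 and the multiply stays in full precision."""
    return max(abs(position * f - position * to_bf16(f)) for f in inv_freq)


inv_freq = rope_inv_freq()
for pos in (100, 4_096, 32_768):
    err = max_angle_error(pos, inv_freq)
    print(f"position {pos:>6}: max angle error {err:.4f} rad")
```

In this simplified model the angle error stays a small fraction of a radian for short positions but grows to tens of radians around position 32K, which scrambles the rotary phase. That is one way to see why such buffers should remain in fp32.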
stas's activity
liked a Space 8 months ago
Gpu Tflop Finder 😻
Get the TFLOPs for your GPU quickly