🧠 Reasoning datasets Collection Datasets with reasoning traces for math and code released by the community • 24 items • Updated May 19, 2025 • 189
Article • LoRA training scripts of the world, unite! • linoyts, multimodalart • Jan 2, 2024 • 79
Article • Improving Parquet Dedupe on Hugging Face Hub • yuchenglow, seanses • Oct 5, 2024 • 41
Article • Introducing BERTopic Integration with the Hugging Face Hub • MaartenGr, davanstrien • May 31, 2023 • 10
Article • Introducing Idefics2: A Powerful 8B Vision-Language Model for the community • Leyo, HugoLaurencon, VictorSanh • Apr 15, 2024 • 191
Article • Fine-Tuning Gemma Models in Hugging Face • svaibhav, alanwaketan, ybelkada, ArthurZ • Feb 23, 2024 • 46
Article • SmolLM - blazingly fast and remarkably powerful • loubnabnl, anton-l, eliebak • Jul 16, 2024 • 455
Article • Docmatix - a huge dataset for Document Visual Question Answering • andito, HugoLaurencon • Jul 18, 2024 • 78
Article • Cosmopedia: how to create large-scale synthetic data for pre-training Large Language Models • loubnabnl, anton-l, davanstrien • Mar 20, 2024 • 113
Article • Ethics and Society Newsletter #6: Building Better AI: The Importance of Data Quality • evijit, frimelle, yjernite, meg, irenesolaiman, dvilasuero, fdaudens, BrigitteTousi, giadap, sasha • Jun 24, 2024 • 34
Article • Experimenting with Automatic PII Detection on the Hub using Presidio • lhoestq, meg, presidio, omri374 • Jul 10, 2024 • 26
Article • Announcing New Dataset Search Features • lhoestq, severo, kramp • Jul 8, 2024 • 23
Article • How to directly access 150k+ Hugging Face Datasets with DuckDB and query using GPT-4o • chilijung • May 31, 2024 • 11
Article • Synthetic dataset generation techniques: generating custom sentence similarity data • davanstrien • May 23, 2024 • 16