MixtureVitae: Open Web-Scale Pretraining Dataset With High Quality Instruction and Reasoning Data Built from Permissive-First Text Sources Paper • 2509.25531 • Published Sep 29, 2025 • 9
Survivor Library Books - OCR Collection Books from the Survivor Library (mostly ~1920s & earlier) OCR'd with recent VLMs • 2 items • Updated Jul 14, 2025 • 5
Self-Correction Bench: Revealing and Addressing the Self-Correction Blind Spot in LLMs Paper • 2507.02778 • Published Jul 3, 2025 • 9
EmoNet-Face: An Expert-Annotated Benchmark for Synthetic Emotion Recognition Paper • 2505.20033 • Published May 26, 2025 • 4
EmoNet-Voice: A Fine-Grained, Expert-Verified Benchmark for Speech Emotion Detection Paper • 2506.09827 • Published Jun 11, 2025 • 21