Hierarchical Dataset Selection for High-Quality Data Sharing Paper • 2512.10952 • Published 11 days ago • 1
Causal Judge Evaluation: Calibrated Surrogate Metrics for LLM Systems Paper • 2512.11150 • Published 11 days ago • 4
Skywork-Reward-V2 Collection Scaling preference data curation to the extreme • 9 items • Updated Jul 4 • 25
Reward Models 10-2025 Collection A collection of great reward models for research and production • 7 items • Updated 6 days ago • 12
Olmo 3 Pre-training Collection All artifacts related to Olmo 3 pre-training • 10 items • Updated 13 days ago • 30
Article ViDoRe V3: a comprehensive evaluation of retrieval for enterprise use-cases Nov 5 • 57
OlmoEarth: Stable Latent Image Modeling for Multimodal Earth Observation Paper • 2511.13655 • Published Nov 17 • 9
Article The Heterogeneous Feature of RoPE-based Attention in Long-Context LLMs Nov 15 • 12
LoopTool: Closing the Data-Training Loop for Robust LLM Tool Calls Paper • 2511.09148 • Published Nov 12 • 16
Routing Manifold Alignment Improves Generalization of Mixture-of-Experts LLMs Paper • 2511.07419 • Published Nov 10 • 25
Too Good to be Bad: On the Failure of LLMs to Role-Play Villains Paper • 2511.04962 • Published Nov 7 • 52