EvasionBench: Detecting Evasive Answers in Financial Q&A via Multi-Model Consensus and LLM-as-Judge • Paper • 2601.09142 • Published 28 days ago • 10
compar:IA: The French Government's LLM arena to collect French-language human prompts and preference data • Paper • 2602.06669 • Published 4 days ago • 6