CheeseBench: Evaluating Large Language Models on Rodent Behavioral Neuroscience Paradigms • Paper • 2604.10825 • Published 9 days ago