Such a simple and ingenious method to isolate reasoning from memorization in LLMs.
The performance of reasoning models drops significantly when they are evaluated on multiple-choice questions in which the correct answer has been replaced with 'None of the others'.
arxiv.org/abs/2502.12896
None of the Others: a General Technique to Distinguish Reasoning from Memorization in Multiple-Choice LLM Evaluation Benchmarks
In LLM evaluations, reasoning is often distinguished from recall/memorization by performing numerical variations to math-oriented questions. Here we introduce a general variation method for multiple-c...
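To make the idea concrete, here is a minimal Python sketch of the variation described in the post. The text of the correct option is swapped for 'None of the others', which then becomes the correct answer, because the original answer is no longer listed. The `MCQItem` structure and the function names are hypothetical, and the paper's actual implementation may differ (for example, in where the new option is placed or how it is worded).

```python
from dataclasses import dataclass, replace

# Wording taken from the post; the paper's exact string may differ.
NONE_OPTION = "None of the others"


@dataclass(frozen=True)
class MCQItem:
    """Hypothetical container for one multiple-choice question."""
    question: str
    options: tuple[str, ...]
    answer_index: int  # index of the correct option


def none_of_the_others_variant(item: MCQItem) -> MCQItem:
    """Replace the correct option's text with NONE_OPTION.

    The original correct answer disappears from the list, so the new
    option is now the right choice. This sketch keeps it at the same index.
    """
    options = list(item.options)
    options[item.answer_index] = NONE_OPTION
    return replace(item, options=tuple(options))


def accuracy(predictions: list[int], items: list[MCQItem]) -> float:
    """Fraction of items whose predicted index matches the correct index."""
    correct = sum(p == it.answer_index for p, it in zip(predictions, items))
    return correct / len(items)


if __name__ == "__main__":
    original = MCQItem("What is 2 + 3?", ("4", "5", "6", "7"), answer_index=1)
    variant = none_of_the_others_variant(original)
    print(variant.options)  # ('4', 'None of the others', '6', '7')
```

Running the same model on the original items and on their variants, then comparing the two `accuracy` values, gives the kind of performance drop the post refers to. A model that only recalls "5" as the memorized answer no longer finds it among the options and has to reason that none of the remaining choices is correct.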