Submitted by Yang. Revisiting a Pain in the Neck: A Semantic Reasoning Benchmark for Language Models. University of Science and Technology Beijing.