Abstract - ReverseMath: Answer Inversion for Scalable and Verifiable Mathematical Problem Generation
Mathematical reasoning benchmarks are vital for evaluating large language models (LLMs), but many are static and repeatedly exposed through public evaluation and training pipelines, making it difficult to separate genuine reasoning from memorization. Meanwhile, manually constructing new math problems with reliable answers remains costly. We introduce ReverseMath, a scalable method for generating new math problems through answer inversion. Given a problem and its answer, ReverseMath masks a numerical value in the original problem, treats the original answer as a known condition, and rewrites the problem so that the masked value becomes the new answer. The generated problem reverses the original input-output relation, making its answer known by construction. We study ReverseMath for both evaluation and training. For evaluation, paired original/reversed problems reveal substantial behavioral shifts: models sometimes fail on reversed problems and even incorrectly output the original answer, suggesting memorization-like behavior. For training, ReverseMath provides automatically labeled reversed problems as data augmentation for reinforcement learning (RL). Experiments show that including ReverseMath-generated data improves mathematical reasoning performance across multiple benchmarks, demonstrating its value as both an analysis tool and a scalable source of verifiable training data.
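The inversion step and the paired evaluation described above can be illustrated with a minimal sketch. Everything named here is a hypothetical stand-in rather than the authors' implementation: the `Problem` class, the `[X]` placeholder, and the functions `mask_value`, `invert`, and `classify` are assumptions made for illustration. In particular, a fixed text template replaces the rewriting step, which the abstract does not specify in detail.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Problem:
    text: str
    answer: float


def mask_value(text: str, index: int) -> tuple[str, float]:
    """Replace the index-th number in `text` with the placeholder [X].

    Returns the masked text and the numerical value that was hidden.
    """
    matches = list(re.finditer(r"\d+(?:\.\d+)?", text))
    m = matches[index]
    masked = text[:m.start()] + "[X]" + text[m.end():]
    return masked, float(m.group())


def invert(problem: Problem, index: int) -> Problem:
    """Build a reversed problem whose answer is the masked value.

    Assumption: a fixed template stands in for the rewriting step.
    The original answer becomes a known condition, and the hidden
    value becomes the new answer, so the label is known by construction.
    """
    masked_text, hidden = mask_value(problem.text, index)
    new_text = (
        f"{masked_text} The answer to the question above is "
        f"{problem.answer:g}. What is the value of [X]?"
    )
    return Problem(text=new_text, answer=hidden)


def classify(response: float, original: Problem, reversed_: Problem) -> str:
    """Label a model's response to the reversed problem.

    "original_answer" flags the memorization-like case in which the
    model returns the answer to the original problem instead.
    """
    if response == reversed_.answer:
        return "correct"
    if response == original.answer:
        return "original_answer"
    return "other"


original = Problem(
    "A shop sells 12 boxes with 3 apples each. "
    "How many apples are there in total?",
    36,
)
reversed_problem = invert(original, index=1)
print(reversed_problem.text)
print(reversed_problem.answer)
```

In this toy setting the reversed label needs no solver, because it is read directly from the original problem text. A real pipeline would also need to confirm that the rewritten problem is well-posed and has a unique solution, which this sketch does not attempt.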
ReverseMath: Answer Inversion for Scalable and Verifiable Mathematical Problem Generation
1️⃣ One-Sentence Summary
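This paper proposes ReverseMath, an automated method that generates new math problems with known answers by masking a numerical value in an existing problem, treating the original answer as a given condition, and rewriting the problem so that the masked value becomes the new answer. The resulting problems can be used to test whether large language models genuinely reason or merely answer from memory, and they also serve as automatically labeled training data that improves mathematical reasoning performance.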