This repository compiles a comprehensive, balanced dataset matched to the mathematics curriculum for 10-year-olds. Our goal is to provide a robust resource for developing and evaluating educational tools, such as chatbots, that answer math questions at an elementary-school level.
We followed a methodical approach to construct this dataset:
- Classification of Math Questions: We categorized questions to allow for targeted educational interventions and specialized solution techniques.
- Search for Suitable Datasets: We identified datasets for each category, ensuring they align with the learning level and content requirements of elementary school math.
- Compilation and Sampling: We combined and randomly sampled from existing datasets to create a diverse collection that accurately mirrors the types of math challenges faced by 10-year-olds.
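The sampling step can be sketched roughly as follows. This is a minimal illustration, not the script used to build the dataset: the function name `balanced_sample` is hypothetical, and the even-split rule is an assumption about what "balanced" means here. It only relies on the `subcategory` field from the entry format.

```python
import random
from collections import defaultdict


def balanced_sample(entries, n, seed=0):
    """Draw about n entries, spread as evenly as possible across subcategories.

    Hypothetical helper: assumes every entry is a dict with a "subcategory" key.
    """
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible

    # Group the entries by subcategory.
    groups = defaultdict(list)
    for entry in entries:
        groups[entry["subcategory"]].append(entry)

    names = sorted(groups)
    base, extra = divmod(n, len(names))

    sample = []
    for i, name in enumerate(names):
        # The first `extra` subcategories get one additional item.
        quota = base + (1 if i < extra else 0)
        pool = groups[name]
        # Never ask for more items than the subcategory contains.
        sample.extend(rng.sample(pool, min(quota, len(pool))))

    rng.shuffle(sample)
    return sample
```

If a subcategory holds fewer items than its quota, this sketch returns fewer than `n` entries rather than topping up from other subcategories.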
Our dataset categorizes mathematical questions into three groups:
- Arithmetic: Includes basic calculations, measurement conversions, and more. (E.g. "What is 1.20m in mm?" or "What is 12+8?")
- Word Problems: Engages students with real-world scenarios requiring mathematical solutions. (E.g. "If you split 50$ equally among 5 people, how much does each get?")
- Geometry: Focuses on shape, space, and measurement problems suitable for young learners. (E.g. "What is the volume of a sphere with radius 6 cm?")
These categories were inspired by Ahn et al., 2024.
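As a quick illustration, the example questions above can each be answered with a single number. The snippet below works them out by hand; the variable names are only for illustration.

```python
import math

# Arithmetic: "What is 1.20m in mm?" (1 m = 1000 mm)
metres_to_mm = 1.20 * 1000

# Arithmetic: "What is 12+8?"
total = 12 + 8

# Word problem: 50 dollars split equally among 5 people
share = 50 / 5

# Geometry: volume of a sphere with radius 6 cm, V = 4/3 * pi * r^3
volume = 4 / 3 * math.pi * 6 ** 3

print(float(metres_to_mm), float(total), share, round(volume, 2))
```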
We adhered to rigorous criteria to ensure the dataset's relevance and quality:
- Language: English-only to maintain consistency across data.
- Educational Level: Suitable for elementary school math levels.
- Content Type: Focused exclusively on text-based datasets, avoiding any that include pictures or additional multimedia sources to ensure straightforward analysis.
- Single Float Answer: The answer to each question is a single float value. This ensures easy evaluation.
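Because every answer is a single float, a model's output can be scored with a numeric comparison. The sketch below is one possible way to do this, not an official evaluation script: the function names and tolerance values are assumptions.

```python
import math


def is_correct(predicted, expected, rel_tol=1e-6, abs_tol=1e-6):
    """Return True if `predicted` parses as a float close to `expected`."""
    try:
        value = float(predicted)
    except (TypeError, ValueError):
        # Output that cannot be read as a number counts as wrong.
        return False
    return math.isclose(value, float(expected), rel_tol=rel_tol, abs_tol=abs_tol)


def accuracy(predictions, answers):
    """Fraction of predictions that match their reference answers."""
    pairs = list(zip(predictions, answers))
    if not pairs:
        return 0.0
    return sum(is_correct(p, a) for p, a in pairs) / len(pairs)
```

Comparing with a tolerance avoids penalizing harmless rounding differences such as `0.1 + 0.2` versus `0.3`.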
The dataset is a collection of JSON objects. Each object has the following format:
{
"category": "Arithmetic | Word Problems | Geometry",
"subcategory": "<divided category further>",
"question": "<question>",
"answer": "<answer as float>",
"reasoning": "(optional) <can be an equation, python program, etc.>",
"source": "<source dataset name>"
}
Example Entry (from the SVAMP dataset):
{
"category": "Word Problem",
"subcategory": "challenge",
"question": "Dan had $ 3 left with him after he bought a candy bar. If he had $ 4 at the start, how much did the candy bar cost?",
"answer": 1.0,
"reasoning": "( 4.0 - 3.0 )",
"source": "SVAMP"
}
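An entry in this format can be checked with a few lines of Python. This is a minimal sketch: the helper `parse_entry` and the choice of which fields to treat as required are assumptions based on the format shown above, where only `reasoning` is marked optional.

```python
import json

# Fields every entry is expected to carry; "reasoning" is optional.
REQUIRED_FIELDS = ("category", "subcategory", "question", "answer", "source")


def parse_entry(raw):
    """Parse one JSON object and normalise its answer to a float."""
    entry = json.loads(raw)
    missing = [field for field in REQUIRED_FIELDS if field not in entry]
    if missing:
        raise ValueError(f"missing fields: {', '.join(missing)}")
    entry["answer"] = float(entry["answer"])
    entry.setdefault("reasoning", None)
    return entry


raw = """
{
  "category": "Word Problem",
  "subcategory": "challenge",
  "question": "Dan had $ 3 left with him after he bought a candy bar. If he had $ 4 at the start, how much did the candy bar cost?",
  "answer": 1.0,
  "reasoning": "( 4.0 - 3.0 )",
  "source": "SVAMP"
}
"""
entry = parse_entry(raw)
print(entry["source"], entry["answer"])
```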
Some files were too large for GitHub, so all data files and scripts are hosted on our Google Drive, available for download here. To save space and simplify downloading, all resources are bundled into a data.zip file. The contents of the zip file are organized as follows:
- original-data: Contains the original datasets in various formats as they were collected.
- original-data-transformed: Includes datasets that have been transformed into the unified structure described in our documentation.

The datasets are versioned into three types for each category:
- <category>_complete.csv/json: Provides the comprehensive dataset, ideal for extensive analysis.
- <category>_1000.csv/json: A balanced sample of 1000 items, perfect for in-depth testing.
- <category>_100.csv/json: A smaller sample of 100 items, designed for quick assessments.
For ease of access, the smaller samples (<category>_1000.csv/json and <category>_100.csv/json) are directly accessible within this repository in the data directory.
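Loading one of the sample files might look like the sketch below. Two things here are assumptions rather than documented facts: that each .json file holds a single JSON array of entries, and that the file names follow the dataset names listed in the overview table (for example `arithmetic_100`). The helper names are hypothetical.

```python
import json
from pathlib import Path


def sample_path(category, size, data_dir="data"):
    """Build the path of a sample file, e.g. data/arithmetic_100.json."""
    return Path(data_dir) / f"{category}_{size}.json"


def load_entries(path):
    """Read a file that is assumed to contain one JSON array of entries."""
    with Path(path).open(encoding="utf-8") as handle:
        data = json.load(handle)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of entries")
    return data


if __name__ == "__main__":
    path = sample_path("arithmetic", 100)
    if path.exists():
        print(len(load_entries(path)), "entries loaded from", path)
```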
We used the DeepL API to produce high-quality, free translations of selected datasets into German. For access to the data and more details about the translation process, please visit the translation-to-german folder.
This table gives an overview of the different dataset versions.
| Section | Name | Number of subcategories | Size |
|---|---|---|---|
| I. Arithmetic | arithmetic_complete | 14 | 7,731,654 |
| | arithmetic_1000 | 14 | 1,000 |
| | arithmetic_100 | 14 | 100 |
| II. Word Problems | wordProblems_complete | 3 | 1,995 |
| | wordProblems_1000 | 3 | 1,000 |
| | wordProblems_100 | 3 | 100 |
| III. Geometry | geometry_complete | 1 | 1,698 |
| | geometry_1000 | 1 | 1,000 |
| | geometry_100 | 1 | 100 |
An overview of the available datasets in the mathematical domain can be found in Lu et al., 2023 and in Ahn et al., 2024.
In constructing this dataset, we made a concerted effort to include a comprehensive range of datasets that are best suited for the educational level and cognitive abilities of 10-year-olds. While we don't provide extensive details on the selection process for each dataset, our overarching goal was to incorporate as many relevant and suitable datasets as possible.
| Source | Subcategory | Size | Example |
|---|---|---|---|
| Math-401 | arithmetic_mixed | 71 | log 10(797)= |
| Mathematics Dataset (Google DeepMind) | add_or_sub | 71 | What is -6.5 + -1.5? |
| | add_sub_multiple | 71 | Calculate -4 + 0 - ((-3 - -1) + 7). |
| | conversion | 71 | What is three eighths of a kilogram in grams? |
| | div | 71 | Calculate -238 divided by -3. |
| | div_remainder | 73 | What is the remainder when 255 is divided by 20? |
| | gcd | 72 | What is the highest common divisor of 75 and 390? |
| | lcm | 72 | Calculate the lowest common multiple of 1355 and 80. |
| | mul | 72 | Multiply -0.0756 and 0.14. |
| | mul_div_multiple | 71 | Evaluate 2/(-6)*(-120)/(-80). |
| | place_value | 71 | What is the tens digit of 5546? |
| | round_number | 71 | Round 4117.6 to the nearest 10. |
| | sequence_next_term | 72 | What comes next: -75, -80, -85, -90? |
| | time | 71 | How many minutes are there between 1:03 PM and 9:11 PM? |

| Source | Subcategory | Size | Example |
|---|---|---|---|
| SVAMP | challenge | 334 | At the arcade Edward won 9 tickets. If he spent 4 tickets on a beanie and later won 4 more tickets, how many would he have? |
| AddSub | add_sub | 333 | Tim has 44 books. Sam has 52 books. How many books do they have together? |
| MultiArith | multi_step | 333 | Roger had 25 books. If he sold 21 of them and used the money he earned to buy 30 new books, how many books would Roger have? |

| Source | Subcategory | Size | Example |
|---|---|---|---|
| MathQA Geometry | geometry | 1000 | Find the surface area of a 8 cm x 6 cm x 2 cm brick |
Overview of existing datasets in the mathematical domain:
- P. Lu, L. Qiu, W. Yu, S. Welleck, and K.-W. Chang, “A Survey of Deep Learning for Mathematical Reasoning.” arXiv, Jun. 21, 2023. Accessed: May 02, 2024. [Online]. Available: http://arxiv.org/abs/2212.10535
- J. Ahn, R. Verma, R. Lou, D. Liu, R. Zhang, and W. Yin, “Large Language Models for Mathematical Reasoning: Progresses and Challenges.” arXiv, Apr. 05, 2024. doi: 10.48550/arXiv.2402.00157.
- W. Liu et al., “Mathematical Language Models: A Survey.” arXiv, Feb. 23, 2024. Accessed: May 02, 2024. [Online]. Available: http://arxiv.org/abs/2312.07622
References for the datasets we sampled from:
- Math-401: W. Liu et al., “Mathematical Language Models: A Survey.” arXiv, Feb. 23, 2024. Accessed: May 02, 2024. [Online]. Available: http://arxiv.org/abs/2312.07622
- Mathematics Dataset: D. Saxton, E. Grefenstette, F. Hill, and P. Kohli, “Analysing Mathematical Reasoning Abilities of Neural Models.” arXiv, Apr. 02, 2019. doi: 10.48550/arXiv.1904.01557.
- SVAMP: A. Patel, S. Bhattamishra, and N. Goyal, “Are NLP Models really able to Solve Simple Math Word Problems?” arXiv, Apr. 15, 2021. doi: 10.48550/arXiv.2103.07191.
- AddSub: M. J. Hosseini, H. Hajishirzi, O. Etzioni, and N. Kushman, “Learning to Solve Arithmetic Word Problems with Verb Categorization,” in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), A. Moschitti, B. Pang, and W. Daelemans, Eds., Doha, Qatar: Association for Computational Linguistics, Oct. 2014, pp. 523–533. doi: 10.3115/v1/D14-1058.
- MultiArith: S. Roy and D. Roth, “Solving General Arithmetic Word Problems.” arXiv, Aug. 20, 2016. doi: 10.48550/arXiv.1608.01413.
- MathQA Geometry: A. Amini, S. Gabriel, S. Lin, R. Koncel-Kedziorski, Y. Choi, and H. Hajishirzi, “MathQA: Towards Interpretable Math Word Problem Solving with Operation-Based Formalisms,” in Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), J. Burstein, C. Doran, and T. Solorio, Eds., Minneapolis, Minnesota: Association for Computational Linguistics, Jun. 2019, pp. 2357–2367. doi: 10.18653/v1/N19-1245.