Data documentation for the paper “Automatic Debugging of Design Faults in MapReduce Applications”, published in IEEE Transactions on Software Engineering.
Link: https://doi.org/10.1109/TSE.2024.3369766

General Information: This dataset contains both the test cases used in the evaluation and the statistical analysis needed to reproduce the experiments.

Name of dataset: Supplemental material for “Automatic Debugging of Design Faults in MapReduce applications”

Name of data files in data set: The supplemental material contains the following files:
  1_testCases.zip: all test cases randomly generated for the experiments. The test cases are described in ./1_testCases/README.txt
  2_executionTestCases.zip: the aggregated data obtained after executing the test cases with the debugging techniques: the fault localization technique (MRDebug-FL), the input reduction technique (MRDebug-IR), and the combination of both techniques (MRDebug-IR-FL). The folder contains CSV files with the results of the experimentation unit, which are detailed in the ./2_executionTestCases/README.pdf file.
  3_notebook.zip: Jupyter notebook containing the analysis performed in the experiments. The notebook allows interactive execution of the statistical tests and plots. Installation instructions are in the file 3_notebook.zip/installation.txt.

Dataset language: CSV format processed with the R programming language

Date the data set was last modified: 08/09/23

Funder: This work was supported in part by the project PID2019-105455GB-C32 funded by MCIN/AEI/10.13039/501100011033 (Spain), the project PID2022-137646OB-C32 funded by MCIN/AEI/10.13039/501100011033/FEDER, UE, and the project RDS_2022-2024_2.1_Progetto_CYBER funded by MASE/PTR_22_24_INT_2_1 (Italy).

How to cite data: J. Morán, A. Bertolino, C. de la Riva and J. Tuya, "Automatic Debugging of Design Faults in MapReduce Applications," in IEEE Transactions on Software Engineering, vol. 50, no. 4, pp. 956-978, April 2024, doi: 10.1109/TSE.2024.3369766

Methodology for data collection: Test cases were generated randomly to trigger a design fault in 13 programs. The details and the techniques used in the methodology are described in Section VII of the manuscript.

Data collector(s): Jesús Morán, Antonia Bertolino, Claudio de la Riva, Javier Tuya

Date of data collection: 04/06/2022

Person to contact with questions: Jesús Morán (moranjesus@uniovi.es)

Data entry: 08/09/23

Software (including version #) used to prepare data set: Java version 1.8 running MRDebug version 1. The data are processed with R version 4.

Data processing that was performed: Quantitative analysis of the root cause of the faults using statistical tests. The details of the processing are in Section VII of the manuscript, and both the code and the documentation of the data processing are in the file 3_notebook.zip

Variables: There are multiple variables in each dataset. The files 1_testCases.zip and 2_executionTestCases.zip each contain a README file describing the variables.

File Overview: The supplemental material has the following structure:
  1_testCases.zip: contains the test cases in several folders. The structure is described in ./1_testCases/README.txt
  2_executionTestCases.zip: contains the results of executing the test cases with the three debugging techniques. Both the content and the structure of these files are described in the ./2_executionTestCases/README.pdf file.
  3_notebook.zip: contains the code of the experiments. Installation instructions are in the file 3_notebook.zip/installation.txt.
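
Example of unpacking the material: the following is a minimal Python sketch, not part of the supplemental material, showing one way to extract the three archives listed above and locate the CSV files they contain. The archive names come from this documentation. The function names, the target folder "supplemental" and the assumption that the archives sit in the current directory are hypothetical choices made for illustration. The sketch assumes nothing about the internal layout of each archive, so it searches the extracted folders recursively.

```python
import zipfile
from pathlib import Path

# Archive names as listed in this documentation.
ARCHIVES = ["1_testCases.zip", "2_executionTestCases.zip", "3_notebook.zip"]


def extract_archives(source_dir, target_dir, archives=ARCHIVES):
    """Extract each archive found in source_dir into target_dir/<archive stem>.

    Returns the names of the archives that were found and extracted.
    (Hypothetical helper, not part of the supplemental material.)
    """
    source, target = Path(source_dir), Path(target_dir)
    extracted = []
    for name in archives:
        archive = source / name
        if not archive.is_file():
            continue  # skip archives that have not been downloaded
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(target / archive.stem)
        extracted.append(name)
    return extracted


def list_files(root, suffix):
    """Return the sorted files under root with the given suffix (case-insensitive)."""
    root = Path(root)
    if not root.is_dir():
        return []
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() == suffix.lower()
    )


if __name__ == "__main__":
    # Assumption: the archives were downloaded into the current directory.
    print("Extracted:", extract_archives(".", "supplemental"))
    for csv_path in list_files("supplemental", ".csv"):
        print(csv_path)
```

After extraction, the README files mentioned above (README.txt and README.pdf) remain the authoritative description of the folders and variables. The installation of the notebook should follow installation.txt rather than this sketch.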