Automatic Testing of Design Faults in MapReduce Applications
Subject:
Big Data
Combinatorial testing
Metamorphic testing
Partition testing
Publication date:
Publisher:
IEEE
Publisher version:
Citation:
Physical description:
Abstract:
New processing models are being adopted in Big Data engineering to overcome the limitations of traditional technology. Among them, MapReduce stands out because it allows large volumes of data to be processed over a distributed infrastructure that can change at runtime. The developer designs only the functionality of the program, and its execution is managed by a distributed system. As a consequence, a program can behave differently in each execution because it is automatically adapted to the resources available at that moment. Therefore, when the program has a design fault, the fault may be revealed in some executions and masked in others. During testing, these faults are usually masked because the test infrastructure is stable; they are revealed only in production, where the environment is more hostile because of infrastructure failures, among other reasons. This paper proposes new testing techniques that aim to detect these design faults by simulating different infrastructure configurations. The techniques use random testing, and partition testing together with combinatorial testing, to generate a representative set of infrastructure configurations that, as a whole, is more likely to reveal failures. The techniques are automated by a test execution engine called MRTest, which is able to detect these faults using only the test input data, regardless of the expected output. Our empirical evaluation shows that MRTest can automatically detect these design faults within a reasonable time.
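The idea in the abstract, running the same program under several simulated infrastructure configurations and looking for disagreement, can be illustrated with a minimal Python sketch. This is not MRTest and does not reproduce the paper's algorithms: `run_job`, `find_failing_configs`, the two configuration parameters (number of input splits, combiner on or off), and the oracle that compares each run against a single-split baseline are all assumptions made for illustration. The sketch also enumerates a tiny configuration space exhaustively instead of sampling it with random, partition, or combinatorial testing.

```python
from collections import defaultdict
from itertools import product


def run_job(records, mapper, reducer, combiner=None, num_splits=1):
    """Run a toy in-memory MapReduce job under one simulated configuration."""
    # Distribute the input round-robin among `num_splits` simulated map tasks.
    splits = [records[i::num_splits] for i in range(num_splits)]
    shuffled = defaultdict(list)
    for split in splits:
        local = defaultdict(list)
        for record in split:
            for key, value in mapper(record):
                local[key].append(value)
        for key, values in local.items():
            # When enabled, the combiner pre-aggregates each map task's output.
            if combiner is not None:
                values = [combiner(key, values)]
            shuffled[key].extend(values)
    return {key: reducer(key, values) for key, values in shuffled.items()}


def find_failing_configs(records, mapper, reducer, combiner, max_splits=3):
    """Return the configurations whose output differs from the baseline run."""
    # Hypothetical oracle: one split and no combiner serves as the baseline.
    baseline = run_job(records, mapper, reducer)
    failing = []
    for num_splits, use_combiner in product(range(1, max_splits + 1), (False, True)):
        output = run_job(records, mapper, reducer,
                         combiner if use_combiner else None, num_splits)
        if output != baseline:
            failing.append((num_splits, use_combiner))
    return failing


def mapper(record):
    station, temperature = record
    yield station, temperature


def mean(key, values):
    return sum(values) / len(values)


def maximum(key, values):
    return max(values)


records = [("s1", 10.0), ("s1", 20.0), ("s1", 60.0)]
# Reusing `mean` as a combiner is a design fault: a mean of means is not the mean.
faulty = find_failing_configs(records, mapper, mean, combiner=mean)
# `maximum` is safe as a combiner: the maximum of partial maxima is the maximum.
safe = find_failing_configs(records, mapper, maximum, combiner=maximum)
print(faulty, safe)
```

With this input the faulty program disagrees with the baseline only when two splits are combined, and agrees under the other five configurations, which mirrors the abstract's point that a design fault can be masked in some executions and revealed in others. No expected output is supplied at any point; only the test input is needed.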
ISSN:
Sponsored by:
Work supported by the Spanish Ministry of Science and Technology under the PERTEST project (TIN2013-46928-C3-1-R); the Spanish Ministry of Economy and Competitiveness under the TestEAMoS (TIN2016-76956-C3-1-R) and POLOLAS (TIN2016-76956-C3-2-R) projects; the Principality of Asturias (Spain) under project GRUPIN14-007 and the Severo Ochoa predoctoral grants (BP16215); the Italian MIUR under the GAUSS project (PRIN 2015, 2015KWREMX); and ERDF funds.