Artificial Intelligence Applied to Software Testing: A Tertiary Study

University of Naples Federico II, Italy

Sapienza University of Rome, Italy

Federal University of Santa Catarina, Brazil

University of the West of Scotland, United Kingdom

University of Rome UnitelmaSapienza, Italy


ACM Computing Surveys

Context: Artificial intelligence (AI) methods and models have extensively been applied to support different phases of the software development lifecycle, including software testing (ST). Several secondary studies investigated the interplay between AI and ST but restricted the scope of the research to specific domains or sub-domains within either area.

Objective: This research aims to explore the overall contribution of AI to ST, while identifying the most popular applications and potential paths for future research directions.

Method: We executed a tertiary study, following well-established guidelines for conducting systematic literature mappings in software engineering, to answer nine research questions.

Results: We identified and analyzed 20 relevant secondary studies. The analysis was performed by drawing from well-recognized AI and ST taxonomies and mapping the selected studies according to them. The resulting mapping and discussions provide extensive and detailed information on the interplay between AI and ST.

Conclusion: The application of AI to support ST is a well-consolidated research topic of growing interest. The mapping resulting from our study can be used by researchers to identify opportunities for future research, and by practitioners looking for evidence-based information on which AI-supported technologies to adopt in their testing processes.

1 INTRODUCTION

Software testing (ST) and artificial intelligence (AI) are two research areas with a long and rich history in computing. AI methodologies and techniques have been around for more than 50 years [38] and, in the current century, with the advances in computational resources and the abundance of data, their potential has vastly increased. As a consequence, AI has been applied to fields as diverse as healthcare [39], project management [54], finance [66], law [93], and many more. Both the academic research community and industry have injected AI paradigms into solutions to traditional engineering problems. Similarly, AI has proven useful to software engineering (SE) [7, 13, 47]. ST has always been an intrinsic part of the software development lifecycle [85]. Yet, as software has become more and more pervasive, it has also grown in size and complexity [16], bringing new challenges to software testing practices [64]. Therefore, with AI poised to enhance knowledge work, there is interest in analyzing how it has been used to improve testing practices. Several studies have explored the interplay between AI and ST [51]. Yet, given the breadth and depth of each of these disciplines, high-quality review studies tend to focus their scope on orthogonal selections in each of these areas. For instance, the use of evolutionary algorithms for regression test case prioritization has been investigated in [76], while the application of natural language processing techniques in ST has been analyzed in [35]. Alternatively, unstructured review papers or position papers have proposed how these two fields could merge. These observations motivated the design of this tertiary study, based on the following considerations:

although there are already several secondary studies investigating the application of AI to ST, to manage the vastness of the two research areas, most of these studies limit their scope with an orthogonal division of one or both areas;

there is a wealth of primary studies that makes it unfeasible to approach our research goal with a secondary study, if not by limiting the scope of the research, as the identified secondary studies have done;

a systematic process makes the work reproducible, provides internal consistency to the results, and focuses the discussion on the evidence available in existing secondary studies;

a systematic mapping study is suited to structure a research area [77] and, as such, is more suitable than a systematic literature review in our research context because of the size and scope of the bodies of knowledge involved (AI and ST). Furthermore, after an initial investigation, we noted that secondary studies that applied systematic literature reviews as their research method were able to do so only by limiting their scope to a sub-domain (for instance, search-based techniques for ST). Therefore, to observe the whole possible interplay between AI and ST, a systematic mapping is the suitable research method for this tertiary study.

As a main contribution, this article provides a broad view of AI applications in support of ST. An additional novel contribution of this work is a fine-grained mapping showing how specific testing fields have been supported by specific AI sub-domains and methodologies. This mapping can be used by researchers to identify open topics for future research on new applications of AI for ST, and by practitioners to make decisions on the most suitable AI-supported technologies to introduce in their testing processes. To the best of our knowledge, this is the first tertiary study that attempts to depict a comprehensive picture of how AI is used to support ST and how the two research domains are woven together. The remainder of the article is organized as follows. Section 2 introduces the key concepts and terminology related to the areas of interest of our study. Section 3 describes the protocol we designed to support the process of selecting secondary studies of interest and extracting evidence from them. Section 4 provides insights about the process execution. Section 5 analyzes the extracted data and answers our research questions. Section 6 presents overall considerations on the results of our study and focuses on testing activities whose automation has been supported by different AI techniques. Section 7 discusses threats to the validity of our study. Finally, Section 8 concludes the article and provides final remarks.

2 BACKGROUND

AI and ST are two large and complex research areas for which there are no universally agreed-upon taxonomies or bodies of knowledge. As a way to define the language and vocabulary used throughout the article, we built two taxonomies, one for each research area. The taxonomy shown in Figure 6 reports the AI key concepts that have been used to support ST, whereas the one in Figure 7 refers to the ST key concepts that have been supported by AI. In the following sections, we provide a short description of the two research areas and of the domains and sub-domains of each taxonomy, along with a short definition of the related key concepts that are relevant to our study. For each key concept, we also provide a literature reference from which more detailed and complete definitions can be accessed.

Fig. 1. Diagram of the secondary studies selection process execution.

Fig. 2. Distribution of secondary studies per publication year and type.

Fig. 3. World map of the authors’ affiliation countries.

Fig. 4. Primary studies selected by each secondary study (including repeated primary studies).

Fig. 5. Distribution of unique primary studies per publication year.

Fig. 6. The resulting excerpt taxonomy of AI supporting ST, built starting from the EU AI Watch report [88]. Gray boxes represent key concepts not explicitly included in the AI Watch report and added as a result of the data analysis process. Each concept is annotated with the labels of the secondary studies in which it was surveyed. Original domain labels are reported in bold inside sub-domain boxes.

Fig. 7. The resulting excerpt taxonomy of AI-supported ST, built starting from the SWEBOK [18]. Gray boxes represent key concepts not explicitly included in the SWEBOK and added as a result of the data analysis process. Each concept is annotated with the labels of the secondary studies in which it was surveyed.

2.1 Artificial Intelligence

Although there exist many definitions of AI, for the aims of this study, we refer to the one given in the European Commission JRC report on AI [59]: “AI is a generic term that refers to any machine or algorithm that is capable of observing its environment, learning, and based on the knowledge and experience gained, taking intelligent action or proposing decisions. There are many technologies that fall under this broad AI definition. At the moment, ML techniques are the most widely used.” This definition was adopted by the AI Watch in [88] as the starting point for the specification of an operational definition and a taxonomy of AI aimed at supporting the mapping of the AI landscape and at detecting AI applications in a wide range of technological contexts. The taxonomy provided by the AI Watch report includes five core scientific domains, namely, Reasoning, Planning, Learning, Communication, and Perception, and three transversal domains, namely, Integration and Interaction, Services, and Ethics and Philosophy. The overall taxonomy is depicted in Figure 6, where: (i) white boxes represent domains and key concepts drawn from the AI Watch report [88], while (ii) gray boxes are additional key concepts extracted, during the mapping process, from the analyzed secondary studies.

2.1.1 Reasoning.

The AI domain studying methodologies to transform data into knowledge and to infer facts from it. This domain includes three sub-domains: knowledge representation, automated reasoning, and common sense reasoning. Knowledge representation is the area of AI addressing the problem of representing, maintaining, and manipulating knowledge [56]. Automated reasoning is concerned with the study of algorithms that allow machines to reason automatically [14]. Finally, as described in [27], common sense reasoning is the field studying the human-like ability to make presumptions about the type and essence of ordinary situations. Key concepts related to our study and belonging to this domain are: (i) fuzzy logic, a form of logic in which the truth value of variables may be any real number between 0 and 1 [72], (ii) knowledge representation and reasoning, the use of symbolic rules to represent and infer knowledge [56], (iii) ontologies, forms of knowledge representation facilitating knowledge sharing and reuse [31], and (iv) semantic web, an extension of the World Wide Web through standards set by the World Wide Web Consortium, which “...enables people to create data stores on the Web, build vocabularies, and write rules for handling data” [75].
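To make the fuzzy logic concept concrete, the following minimal Python sketch (our own illustration, not drawn from the surveyed studies; the membership values and the test-prioritization framing are assumptions) combines truth degrees in [0, 1] with the standard min/max operators:

# Minimal fuzzy-logic sketch: truth values are degrees in [0, 1],
# combined with the classic min/max (Zadeh) operators.
def fuzzy_and(a: float, b: float) -> float:
    return min(a, b)

def fuzzy_or(a: float, b: float) -> float:
    return max(a, b)

def fuzzy_not(a: float) -> float:
    return 1.0 - a

# Illustrative (hypothetical) use: degree to which a module is "risky"
# and thus worth testing first, given fuzzy memberships for
# "complex" and "recently changed".
complex_degree = 0.7   # hypothetical membership value
changed_degree = 0.4   # hypothetical membership value
risky = fuzzy_or(fuzzy_and(complex_degree, changed_degree), fuzzy_not(0.9))
print(risky)  # -> 0.4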

2.1.2 Planning.

The AI domain whose main purpose concerns the design and execution of strategies to carry out an activity, typically performed by intelligent agents, autonomous robots, and unmanned vehicles. In this domain, strategies are identified by complex solutions that must be discovered and optimized in a multidimensional space. This domain includes three highly related sub-domains dealing with the problem of optimizing the search for solutions to planning and scheduling problems, namely, planning and scheduling, searching, and optimization. Key concepts related to our study and belonging to this domain are: (i) constraint satisfaction, the process of finding a solution to a set of constraints on a set of variables [99], (ii) evolutionary algorithms, a subset of metaheuristic optimization algorithms based on mechanisms inspired by biological evolution, such as reproduction, mutation, recombination, and selection [6], (iii) genetic algorithms, a branch of evolutionary algorithms inspired by the process of natural selection and relying on biologically inspired operators such as mutation, crossover, and selection [69], (iv) graph plan algorithms, a family of planning algorithms based on the expansion of compact structures known as planning graphs [17], (v) hyper-heuristics, the field dealing with the problem of automating the design of heuristic methods to solve hard computational search problems [21], and (vi) metaheuristic optimization, the research field dealing with optimization problems using metaheuristic algorithms [19].
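As an illustration of how genetic algorithms are typically applied in search-based testing, the following minimal Python sketch (a toy example of ours, not taken from the surveyed studies; the branch-distance-style fitness and the target branch are assumptions) evolves an integer test input toward covering a hard-to-reach branch:

import random

def fitness(x):
    # Toy branch-distance-style fitness: how close input x is to
    # triggering the hypothetical hard branch `if x == 4242`.
    return -abs(x - 4242)

def evolve(pop_size=50, generations=100):
    population = [random.randint(0, 10000) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]      # selection: keep the fittest half
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            child = (a + b) // 2                   # crossover: arithmetic recombination
            if random.random() < 0.2:              # mutation: small random perturbation
                child += random.randint(-100, 100)
            children.append(child)
        population = parents + children
    return max(population, key=fitness)

print(evolve())  # typically converges to (or near) the target input 4242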

2.1.3 Learning.

The AI domain dealing with the ability of systems to automatically learn, decide, predict, adapt, and react to changes and improve from experience, without being explicitly programmed. The corresponding branch of the resulting taxonomy is mainly constructed with machine learning (ML)-related concepts. Key concepts related to our study and belonging to this domain are: (i) artificial neural networks, a family of supervised algorithms inspired by the biological neural networks that constitute animal brains [41]; training a neural network consists of observing data regarding the inputs and the expected outputs and forming probability-weighted associations between the two, which are stored within the data structure of the network itself, designed as a sequence of layers of connected perceptrons [86]; (ii) boosting, an ensemble meta-algorithm for the reduction of the bias and variance components of the error [20]; (iii) classification, a supervised task where a model is trained on a population of instances labeled with a discrete set of labels and the outcome is a set of predicted labels for a given collection of unobserved instances [55]; (iv) clustering, an unsupervised task where, given a similarity function, objects are grouped into clusters so that objects in the same cluster are more similar to each other than to objects in other clusters [105]; (v) convolutional neural networks, a specialized type of neural network that uses convolution in place of general matrix multiplication in at least one of its layers [37]; (vi) decision trees, a family of classification and regression algorithms that learn hierarchical structures of simple decision rules from data and whose resulting models can be depicted as trees where internal nodes represent decision rules and leaf nodes are the outcomes [70]; (vii) ensemble methods, algorithms leveraging a set of individually trained classifiers (such as decision trees) whose predictions are combined to produce more accurate predictions than any of the single classifiers [73]; (viii) probabilistic models, a family of classifiers that are able to predict, given an observation of an input, a probability distribution over a set of classes [40]; (ix) recurrent neural networks, neural networks with recurrent connections, which can be used to map input sequences to output sequences [15]; (x) reinforcement learning, one of the fundamental machine learning paradigms, where algorithms address the “problem faced by an agent that must learn behavior through trial-and-error interactions with a dynamic environment” [46]; (xi) regression, a set of mathematical methods for predicting a continuous outcome based on the value of one or more predictor variables [106]; (xii) supervised learning, a machine learning paradigm for problems where the available data consists of labeled examples [87]; (xiii) support vector machines, supervised learning algorithms where input features are non-linearly mapped to a very high-dimension feature space and a linear decision surface is constructed to generate classification and regression analysis models [24]; and (xiv) unsupervised learning, one of the fundamental machine learning paradigms, where algorithms try to learn patterns from unlabeled data [87].
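As a concrete (and deliberately tiny) illustration of the classification concept in a testing setting, the sketch below trains a scikit-learn decision tree to predict whether a test case is likely to fail; the three features and the handful of training rows are invented for illustration and are not taken from any surveyed study:

from sklearn.tree import DecisionTreeClassifier

# Hypothetical features per test case: [lines changed, past failures, coverage]
X_train = [
    [12, 3, 0.8],
    [2, 0, 0.4],
    [30, 5, 0.9],
    [1, 0, 0.2],
]
y_train = [1, 0, 1, 0]  # 1 = test failed on the last build, 0 = test passed

clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(X_train, y_train)

# Predict the outcome for an unobserved test case (discrete label, as in
# the definition of classification above).
print(clf.predict([[20, 4, 0.7]]))  # -> [1]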

2.1.4 Communication.

The AI domain referring to the ability to identify, process, understand, and generate information from written and spoken human communications. This domain is mainly covered by natural language processing (NLP) [45, 62]. Key concepts related to our study and belonging to this domain are: (i) information extraction, the automatic extraction of structured information, such as entities, relationships, and attributes describing entities, from unstructured sources [90], (ii) information retrieval, which deals with the problem of “finding material (usually documents) of an unstructured nature (usually text) that satisfies an information need from within large collections (usually stored on computers)” [61], (iii) natural language generation, which refers to “the process of constructing natural language outputs from non-linguistic inputs” [80], (iv) natural language understanding, which refers to “computer understanding of human language, which includes spoken as well as typed communication” [103], (v) text mining, the semi-automated process of extracting knowledge from a large number of unstructured texts [29], and (vi) word embedding, “a word representation involving the mathematical embedding from a space with many dimensions per word to a continuous vector space with a much lower dimension” [45].
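A minimal sketch of the information extraction idea applied to a requirement sentence follows; it uses nothing more than a regular expression (real NLP pipelines rely on tokenization, part-of-speech tagging, or word embeddings), and both the requirement text and the pattern are our own illustrative assumptions:

import re

requirement = ("When the user submits an empty form, "
               "the system shall display an error message.")

# Naive pattern: "When <condition>, the system shall <expected behavior>."
# Extracted pieces can seed a test precondition and an expected outcome.
m = re.match(r"When (?P<condition>.+?), the system shall (?P<expected>.+)\.",
             requirement)
if m:
    print("Test precondition:", m.group("condition"))
    print("Expected outcome:", m.group("expected"))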

2.1.5 Perception.

Refers to the ability of a system to become aware of the environment through the senses of vision and hearing. Although this is a broad domain of AI with many applications, the only key concept coming from this domain (particularly, from the sub-domain of computer vision) related to our study is image processing, the field dealing with the use of machines to process digital images through algorithms [78].

2.1.6 Integration and Interaction.

A transversal AI domain comprising, among others, the multi-agent systems sub-domain. It can be described as the domain that addresses the combination of perception, reasoning, action, learning, and interaction with the environment, as well as characteristics such as distribution, coordination, cooperation, autonomy, interaction, and integration. Key concepts related to our study and belonging to this domain are: (i) intelligent agent, an entity equipped with sensors and actuators that exhibits some form of intelligence in its action and thought [87], (ii) Q-learning, a reinforcement learning algorithm (see Section 2.1.3), which provides agents with the capability of learning to act optimally in Markovian domains by experiencing the consequences of actions, without requiring them to build maps of the domains [102], and (iii) swarm intelligence, which refers to algorithms typically based on a population of simple agents interacting locally with one another and with their environment [42].
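To sketch Q-learning in a testing context, the snippet below implements the tabular update rule Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)); the GUI-exploration framing (states as screens, actions as widget interactions, a reward for reaching new screens) is an illustrative assumption on our part, not a method attributed to the surveyed studies:

import random
from collections import defaultdict

Q = defaultdict(float)  # Q[(state, action)] -> estimated long-term value
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1

def choose_action(state, actions):
    # Epsilon-greedy policy: mostly exploit, occasionally explore.
    if random.random() < EPSILON:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def update(state, action, reward, next_state, next_actions):
    # One-step temporal-difference update of the action-value table.
    best_next = max(Q[(next_state, a)] for a in next_actions)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])

# Illustrative episode step: clicking "login" on the home screen revealed
# a previously unseen screen, so the agent receives a positive reward.
update("home", "click_login", 1.0, "login_screen", ["type_user", "type_pass"])
print(Q[("home", "click_login")])  # -> 0.5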

For completeness, we remark that the reference AI Watch taxonomy includes two additional transversal domains, unrelated to this study, namely, Services and Ethics and Philosophy. The first domain includes all forms of infrastructure, software, and platforms provided as services or applications, while the second relates to important issues regarding the impact of AI technologies on our society.

2.2 Software Testing

ST is defined by the ISO/IEC/IEEE 29119-1:2013 International Standard as a process made up of a set of interrelated or interacting activities aimed at providing two types of confirmation: verification and validation [34]. Verification is a confirmation that specified requirements have been fulfilled in a given software product (a.k.a. work item or test item), whereas validation demonstrates that the work item can be adopted by the users for their specific tasks. The main objective of ST is to assess the absence of faults, errors, or failures in the test items. Among the great number of taxonomies proposed in the literature for describing the different and heterogeneous aspects of the ST research area, in this work, we refer to the unified view proposed by the Software Engineering Body of Knowledge (SWEBOK) [18]. The SWEBOK is a guide to the broad scope of software engineering. Its core is a tested and proven knowledge base that has been developed and continues to be updated frequently, through practices that have been documented, reviewed, and discussed by the software engineering community. More precisely, in this article, we refer to dynamic testing, which comprises the activities performed to assess whether a software product works as expected when it is executed [34]. The ST taxonomy is shown in Figure 7, where the white boxes represent ST domains and key concepts drawn from the SWEBOK [18], while the gray boxes are additional key concepts, missing in the SWEBOK, extracted from the analyzed secondary studies during the mapping process. In the remainder of this section, we provide a short description of each domain and key concept of the taxonomy. Moreover, we indicate one or more references for the key concepts that are not described in the SWEBOK.

2.2.1 Test Target.

The ST domain that defines the possible objects of the testing. The target can vary from a single module, to an integration of such modules (related by purpose, use, behavior, or structure), to an entire system. In this domain, we recognized three relevant fields: (i) Unit Testing, which verifies the correct behavior, in isolation, of software elements that are separately testable; (ii) Integration Testing, which is intended to verify the correct interactions among software components; and (iii) System Testing, which is concerned with checking the expected behavior of an entire system.
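As a minimal illustration of unit testing (independent of any AI technique; the function under test is invented purely for demonstration), a Python unittest case verifies one separately testable element in isolation:

import unittest

def apply_discount(price, rate):
    # Toy unit under test: apply a percentage discount to a price.
    return round(price * (1 - rate), 2)

class TestApplyDiscount(unittest.TestCase):
    def test_ten_percent_discount(self):
        self.assertEqual(apply_discount(100.0, 0.10), 90.0)

    def test_zero_discount(self):
        self.assertEqual(apply_discount(50.0, 0.0), 50.0)

if __name__ == "__main__":
    unittest.main()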

2.2.2 Testing Objective.

The ST domain defining the purpose of a testing process. Test cases can be designed to check that the functional specifications are correctly implemented, an objective also referred to in the literature as conformance testing, correctness testing, functional testing, or feature testing [11]. In Non-functional Testing, several other nonfunctional properties may be verified as well, including reliability, usability, safety, and security, among many other quality characteristics such as compatibility [30] and quality of service (QoS) [1]. Other possible testing objectives are the following ones: (i) Acceptance Testing, which determines whether a system satisfies its acceptance criteria, usually by checking desired system behaviors against the customer’s requirements; (ii) Regression Testing, which, according to [33], is the “...selective retesting of a system or component to verify that modifications have not caused unintended effects and that the system or component still complies with its specified requirements...”; (iii) Stress Testing, which exercises the software at the maximum design load, with the goal of determining its behavioral limits and of testing defense mechanisms in critical systems; (iv) Structural Testing, whose target is to cover the internal structure of the system source code or model [89]; and (v) GUI Testing, which focuses on detecting faults related to the Graphical User Interface (GUI) and its code [89].

2.2.3 Testing Technique.

The ST domain dealing with the detection of as many failures as possible. Testing techniques have the main goal of identifying inputs that produce representative program behaviors and assessing whether these behaviors are expected or not in comparison to specific oracles. Testing techniques have been classified on the basis of how they design or generate test cases. Possible testing techniques are described as follows: (i) Combinatorial Testing, where test cases are derived by combining interesting values for every pair of a set of input variables instead of considering all possible combinations; (ii) Mutation Testing, which uses mutants, i.e., mutated versions of the source code under test, as test goals to create or improve test suites; (iii) Random Testing, which generates tests purely at random; (iv) Model-based Testing, which is used to validate requirements, check their consistency, and generate test cases focused on the behavioral aspects of the software, where the software under test is usually represented in a formal or semi-formal way by means of models; (v) Equivalence Partitioning Testing, which involves the partitioning of the input domain into a collection of subsets (or equivalence classes) based on a specified criterion or relation; (vi) Requirement-based Testing, which extracts test cases from requirements in any (partially) automated way [35]; (vii) Concolic Testing, which employs the symbolic execution of a program paired with its actual execution [11]; (viii) Metamorphic-based Testing, which uses metamorphic relations for the definition of test oracles [32]; (ix) Concurrency Testing, where tests are generated for verifying the behavior of concurrent systems [2]; and (x) Statistical Testing, where test cases are generated starting from statistical models such as Markov chains [12].
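Two of these techniques can be illustrated in a few lines: the sketch below combines Random Testing (inputs drawn purely at random) with Metamorphic-based Testing, using the relation sin(x) = sin(pi - x) as the oracle; the choice of function and relation is our own illustrative example, not one drawn from the surveyed studies:

import math
import random

# Metamorphic relation used as an oracle: sin(x) == sin(pi - x) for all x.
# Random testing supplies the inputs; the relation judges the outputs,
# sidestepping the need for precomputed expected values.
for _ in range(1000):
    x = random.uniform(-100.0, 100.0)
    assert math.isclose(math.sin(x), math.sin(math.pi - x), abs_tol=1e-9), x
print("1000 random metamorphic checks passed")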

2.2.4 Testing Activity.

The ST domain that organizes the activities performed by testers and testing teams into well-defined, controlled processes. Such activities vary from test planning to test output evaluation, in such a way as to provide assurance that the test objectives are met in a cost-effective way. Well-known testing activities presented in the literature are the following. (i) Test Case Generation, whose goal is to generate executable test cases based on the level of testing to be performed and the particular testing techniques. (ii) Test Planning is a fundamental activity of the ST process; it includes the coordination of personnel, the availability of test facilities and equipment, the creation and maintenance of all test-related documentation, and the planning for the execution of the other testing activities. (iii) Test Logs Reporting is used to identify when a test was conducted, who performed the test, what software configuration was used, and other relevant information useful to identify or reproduce unexpected or incorrect test results. (iv) Defect Tracking is the activity where defects are tracked and analyzed to determine when they were introduced into the software, why they were created (for example, poorly defined requirements, incorrect variable declaration, memory leak, programming syntax error), and when they could have been first observed in the software. (v) Test Results Evaluation is performed to determine whether the testing has been successfully executed. In most cases, “successful” means that the software performed as expected and did not have any major unexpected outcomes. Not all unexpected outcomes are necessarily faults; some are determined to be simply noise. Before a fault can be removed, an analysis and debugging effort is needed to isolate, identify, and describe it. (vi) Test Execution represents both the execution of test cases and the recording of the results of those test runs. The execution of tests should embody a basic principle of scientific experimentation: everything done during testing should be performed and documented clearly enough that another person could replicate the results. (vii) Test Environment Development regards the implementation of the environment that is used for testing. It should be guaranteed that the environment is compatible with the other software engineering tools adopted during the testing process. It should facilitate the development and control of test cases, as well as the logging and recovery of expected results, scripts, and other testing materials. (viii) Test Oracle Definition is the activity performed either to automatically generate test oracles or to support their creation [32]. (ix) Test Case Design and Specification is executed to design or specify the test cases. This activity usually starts from the analysis of the requirements of the system under test [30, 35]. (x) Test Case Optimization/Prioritization/Selection is performed for the optimized reduction, prioritization, or selection of the test cases to be executed [43]; a minimal sketch of a classical prioritization heuristic is given at the end of this section. (xi) Test Data Definition, a.k.a. test data generation, is the activity where the data for test cases are produced [98]. (xii) Test Repair is, in essence, a maintenance activity in which test scripts are adjusted to changed conditions. The need for it lies in the fact that test scripts are fragile and vulnerable to the changes introduced by developers in a newer version of the tested software [98]. (xiii) Flaky Test Prediction is the activity where tests exhibiting similar characteristics are identified and repaired; it significantly improves the overall stability and reliability of the tests [98]. A test is considered flaky when it reports a false positive or false negative test result, or when adjustments were made to the test scripts and/or to the code of the system under test. (xiv) Test Costs Estimation aims to predict the testing costs, mainly the testing time [30].
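To make the Test Case Optimization/Prioritization/Selection activity concrete, the following sketch implements the classical greedy “additional coverage” heuristic, a textbook baseline rather than a technique attributed to any specific surveyed study; the tiny suite and coverage sets are invented:

def prioritize(tests):
    # Greedy "additional coverage" prioritization: repeatedly pick the
    # test that covers the most statements not yet covered.
    ordered, covered = [], set()
    remaining = dict(tests)  # test name -> set of covered statement ids
    while remaining:
        best = max(remaining, key=lambda t: len(remaining[t] - covered))
        covered |= remaining.pop(best)
        ordered.append(best)
    return ordered

suite = {"t1": {1, 2, 3}, "t2": {3, 4}, "t3": {4, 5, 6, 7}}
print(prioritize(suite))  # -> ['t3', 't1', 't2']

The greedy choice runs each step in time linear in the remaining suite, which is why this heuristic is a common baseline against which AI-based prioritization approaches are compared.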

2.2.5 Software Testing Fundamentals.

The ST domain that covers Testing Related Terminology, such as basic definitions, terminology and key issues, and the relationships between software testing and other software engineering activities.

3 TERTIARY SYSTEMATIC MAPPING PROTOCOL

In this section, we describe the research protocol adopted to conduct our tertiary systematic mapping study. The protocol was designed following the guidelines proposed by Petersen et al. [77] to fulfill the requirements of a structured process, whose execution details are provided in Section 4. Specifically, the protocol includes the following steps: (i) definition of goals and research questions, (ii) definition of the search string, (iii) selection of electronic databases, (iv) definition of inclusion and exclusion criteria, (v) definition of quality assessment criteria, and (vi) design of the data extraction form. We describe in detail each of these steps and their outcomes in the rest of this section.

3.1 Goal and Research Questions

The goal of our study is to understand how AI has been applied to support ST. To reach this goal, we defined nine research questions (RQs) grouped into two categories: publication space and research space questions (as suggested by [77]). Publication space questions aim at characterizing the bibliographic information (i.e., venue, year of publication, authors’ affiliations, etc.) of the identified sources (i.e., secondary studies). Research space questions aim at providing the answers needed to achieve the research goal.

Publication Space (PS) RQs.

We defined the following five publication space research questions:

PS-RQ1:

How many secondary studies have been identified per publication year?

PS-RQ2:

Which types of secondary studies have been executed?

PS-RQ3:

What are the venues where the secondary studies have been published?

PS-RQ4:

What are the authors’ affiliation countries of the selected secondary studies?

PS-RQ5:

What is the amount of primary studies analyzed by the selected secondary studies and how are they distributed over time?

Research Space (RS) RQs.

We defined the following four research space research questions:

RS-RQ1:

What AI domains have been applied to support ST?

RS-RQ2:

What domains of ST have been supported by AI?

RS-RQ3:

Which ST domains have been supported by which AI domains, and how?

RS-RQ4:

What are the future research directions of AI in ST?

3.2 Search String Definition

To systematically define the search string to be used for finding secondary studies of interest, we adopted the PICOC (Population, Intervention, Comparison, Outcome, and Context) criteria, as suggested by Petersen et al. [77]. The main term of each PICOC viewpoint is described in the following:

Population: We identified Software Testing as the main term of this viewpoint, since it is the domain of interest of our study.

Intervention: We identified Artificial Intelligence as the main term of this viewpoint, since our research questions are aimed at investigating how this science has been applied to the population.

Comparison: This viewpoint is not applicable in a systematic mapping study, since no effect of the intervention on the population can be expected.

Outcome: This viewpoint is not applicable in a systematic mapping study, since no effect of the intervention on the population can be expected.

Context: We identified Secondary Study as the main term of this viewpoint, since it is the context where we expect to find sources.

To identify the keywords of the search string, we followed the general approach suggested by Kitchenham and Charters [52]. Hence, we performed a breakdown of our research questions (see Section 3.1) into individual facets (one for each PICOC viewpoint). Then, we generated a list of synonyms, abbreviations, and alternative spellings. Additional terms were obtained by considering subject headings used in journals and scientific databases. The main terms and the synonyms we inferred for the PICOC viewpoints are shown in Table 1. Finally, the search string was obtained by the conjunction (AND) of disjunction (OR) predicates, each built on the main term and the corresponding synonyms of a PICOC viewpoint; a minimal sketch of this construction is given after Table 1. Moreover, as suggested by the Kitchenham [52] and Petersen [77] guidelines, we checked our search string against four selected control papers (Garousi et al. [35], Trudova et al. [98], Catal [22], and Durelli et al. [30]). The resulting search string is shown in Box 1.

Population | Software Testing: Based Testing, Dynamic Testing, Static Testing, Test Oracle, Test Design, Test Execution, Test Report, Test Plan, Test Automation, Automated Test, Test Case, Bug Detection, Fault Detection, Error Detection, Failure Detection.
Intervention | Artificial Intelligence: AI, Linguistic, Computer Vision, Recommend System, Decision Support, Expert System, NLP, Natural Language Processing, Data Mining, Information Mining, Text Mining, Learning, Supervised, Unsupervised, Rule-based, Training, Decision Tree, Neural Network, Bayesian Network, Clustering, Genetic Programming, Genetic Algorithm, Evolutionary Programming, Evolutionary Algorithm, Evolutionary Computation, Ensemble Method, Search-based, Intelligent Agent, Naive Bayes, Ontology, Random Forest, Reasoning, SVM, Support Vector, Activation Function, Autoencoder, Backpropagation, Boosting, Cross-validation, Ground Truth, Ant Colony, Bee Colony, Particle Swarm, Robotics, Planning.
Comparison | N.A.
Outcome | N.A.
Context | Secondary Study: Survey, Mapping, Review, Literature Analysis.

Table 1. PICOC Main Terms and their Synonyms
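A minimal sketch of the conjunction-of-disjunctions construction described above follows; the term lists are deliberately abbreviated (the full lists are those of Table 1), and the exact quoting syntax differs per digital library:

# Build the search string as an AND of OR-predicates, one per PICOC viewpoint.
picoc = {
    "Population": ["Software Testing", "Test Case", "Test Oracle"],    # abbreviated
    "Intervention": ["Artificial Intelligence", "Machine Learning"],  # abbreviated
    "Context": ["Survey", "Mapping", "Review", "Literature Analysis"],
}

query = " AND ".join(
    "(" + " OR ".join(f'"{term}"' for term in terms) + ")"
    for terms in picoc.values()
)
print(query)
# ("Software Testing" OR "Test Case" OR "Test Oracle") AND
# ("Artificial Intelligence" OR "Machine Learning") AND
# ("Survey" OR "Mapping" OR "Review" OR "Literature Analysis")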

3.3 Digital Libraries Selection

To retrieve candidate studies, we selected four of the most well-known digital libraries usually adopted for conducting literature review and mapping studies [4]: ACM Digital Library, IEEE Xplore, Web of Science, and Scopus. We adapted the search string to accommodate the syntax required by each digital library search engine; hence, we built four queries that apply our search string to the title and abstract attributes. Additionally, for Scopus and Web of Science, we limited the results to the computer science and computer engineering categories. Since the ACM Digital Library and IEEE Xplore gather publications within computer science and computer engineering, no restrictions were applied in the corresponding queries.

3.4 Inclusion and Exclusion Criteria Definition

To support the selection of retrieved secondary studies, we defined exclusion and inclusion criteria. When defining these criteria, we acknowledged our complementary skills in AI and ST. Therefore, as we will explain in Section 4, we defined these criteria with the explicit intention that their application would be carried out by researchers with the skills required to properly apply them within their respective fields of expertise.

Exclusion Criteria (EC) .

We excluded a publication if at least one of the following six ECs applies:

(EC1)

The study focuses on the testing of AI-based software systems rather than the application of AI to ST.

(EC2)

The study focuses on the application of AI for either the prediction or the analysis, or the localization of: (i) errors; (ii) faults; (iii) bugs; or (iv) failures.

(EC3)

The study is a duplicate of another candidate paper.

(EC4)

The study does not provide a substantially different contribution compared to another candidate work written by the same authors.

(EC5)

The study has another candidate paper, written by the same authors, which is an extended version of it.

(EC6)

The study is a tertiary systematic mapping.

We remark that, to apply EC2, we paid special attention to confirming that the source shares our focus on dynamic testing. In particular, we aimed to exclude studies (systematic mappings or reviews) that are not focused on the design and execution of test cases.

Inclusion Criteria (IC) .

We included a publication if and only if all of the following four ICs apply:

(IC1)

The study is a secondary study.

(IC2)

The study addresses the topic of AI applied to ST.

(IC3)

The study is a peer-reviewed paper.

(IC4)

The study is written in English.

3.5 Quality Assessment Criteria Definition (QC)

To filter out low-quality publications, we scored each candidate paper according to a list of six quality assessment criteria inspired by Kitchenham et al. [53]. We report in Table 2 these QCs, along with the rationale we adopted to assign a score in {0.0, 0.5, 1.0} to each paper. We evaluated the overall quality of a candidate by summing up the six QC scores, and we excluded the papers reaching an overall score lower than 3.0 (a minimal sketch of this sum-and-threshold rule is given after Table 2).

QC1. Were there explicit research questions? 1.0: the source presents research questions that guide the secondary study through the application of PICOC (or a variation); 0.5: the source presents research questions that guide the secondary study without a formal mapping to the search strategy; 0.0: the source does not present research questions that guide the secondary study.

QC2. Were inclusion and exclusion criteria reported? 1.0: inclusion and exclusion criteria are explicitly defined; 0.5: inclusion/exclusion criteria are implicit; 0.0: inclusion and exclusion criteria are not defined and cannot be inferred.

QC3. Was the search strategy adequate? 1.0: searched in 4 or more digital libraries and included additional search strategies; 0.5: searched in 3 or 4 digital libraries with no extra search strategies; 0.0: searched up to 2 digital libraries.

QC4. Was the quality of the included studies assessed? 1.0: quality criteria explicitly defined and extracted for each secondary study; 0.5: quality issues of primary studies addressed by research questions; 0.0: no explicit quality assessment.

QC5. Were there sufficient details about the individual included studies presented? 1.0: each primary study can clearly be traced from the information provided; 0.5: only summary information is provided for each individual study; 0.0: results of individual studies are neither specified nor summarized.

QC6. Were the included studies synthesized? 1.0: data was extracted, summarized, and interpreted; 0.5: data was extracted and summarized but not interpreted; 0.0: data was not summarized nor interpreted.

Table 2. Quality Assessment Criteria
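The scoring rule reduces to a sum and a threshold, as the short sketch below shows; the scores for F8 and F16 are taken from Table 4, while the "S21" row is a hypothetical excluded candidate added for illustration:

THRESHOLD = 3.0  # papers scoring below this overall value are excluded

qc_scores = {
    "F8":  [1, 0, 1, 0, 0.5, 0.5],      # from Table 4: total 3.0, included
    "F16": [1, 1, 0.5, 0.5, 0, 0],      # from Table 4: total 3.0, included
    "S21": [0.5, 0, 0.5, 0, 0.5, 0.5],  # hypothetical candidate: total 2.0
}

for paper, scores in qc_scores.items():
    total = sum(scores)
    verdict = "included" if total >= THRESHOLD else "excluded"
    print(f"{paper}: total = {total} -> {verdict}")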

3.6 Data Extraction Form Design

To support the data extraction process, we designed the data extraction form reported in Table 3 . This form was used to report the pieces of evidence—extracted from the selected papers—that will be analyzed to answer the RQs.

1. Title | Title of the secondary study
2. Abstract | Abstract of the secondary study
3. Authors | Authors list of the secondary study
4. Year | Publication year of the secondary study (PS-RQ1)
5. Study Type | Type of secondary study, i.e., SM, Review, SLR, Multivocal (PS-RQ2)
6. Venue | Name of the venue where the secondary study was published (PS-RQ3)
7. Venue Type | Type of the venue where the secondary study was published (PS-RQ3)
8. Institutions | Authors’ institutions list of the secondary study (PS-RQ4)
9. Primary Studies | List of primary studies reviewed by the secondary study (PS-RQ5)
10. AI Space | List of extracted sentences on AI domain concepts (RS-RQ1)
11. ST Space | List of extracted sentences on ST domain concepts (RS-RQ2)
12. AI applied to ST Space | List of extracted sentences on AI applied to ST (RS-RQ3)
13. Future Directions Space | List of extracted sentences on future directions in AI applied to ST (RS-RQ4)

In this table, we enumerate and describe the fields composing the data extraction form for the PS and RS RQs.

Table 3. Data Extraction form

The form includes a list of fields organized in two sections, one dedicated to the publication space RQs and the other dedicated to the research space RQs. For each field, we provide a name, a brief description of the data that the field is meant to collect, and the RQ for which the field is used.

4 TERTIARY SYSTEMATIC MAPPING EXECUTION

In this section, we describe the execution of the tertiary systematic mapping study we conducted following the protocol introduced in Section 3. Specifically, in Section 4.1, we provide details about the selection process, while, in Section 4.2, we provide details about the data extraction process.

4.1 Selection Process Execution

The process followed to select secondary studies is shown in Figure 1. The figure provides a representation of the executed steps and their outcomes. The selection process is based on the execution of the two stages described in the following. The full selection process was executed in June 2021 and repeated in May 2022 to ensure that we did not miss any recent secondary study on the investigated topic.

4.1.1 First Stage.

This stage was executed to select a preliminary set of secondary studies and relies on the sequential execution of the following four steps:

(1) In this step, the queries (built in Section 3.3) were submitted to the four selected digital libraries. As a result, 877 secondary studies were retrieved.

(2) The 877 papers were divided into two groups. The title, abstract, and keywords of the secondary studies in each group were analyzed by two researchers—one AI expert and one ST specialist—to apply the IC and EC presented in Section 3.4. At the end of this step, 806 studies were excluded, since both researchers reached the same consensus to remove them. The remaining 71 papers were included and passed to the next step.

(3) The 71 papers were divided into two groups and the full text of each paper was read by two researchers—an AI expert and an ST specialist—to apply again the IC and EC. In the end, 29 studies were excluded, 32 were included, and 10 papers were labeled as a “doubt.” All doubts came from studies for which no agreement was reached.

(4) Two additional researchers—one AI expert and one ST specialist—were involved to reach an agreement on “doubt” papers. To this aim, all four researchers performed an additional discussion based on the papers’ full reading and analysis. At the end of the discussion, 4 studies were excluded whereas the remaining 6 were selected. As a consequence, the final set of selected papers included 38 studies.

4.1.2 Second Stage.

This stage refers to a snowballing process [104] that was conducted as a complementary search strategy to mitigate the threat of missing literature. As shown in Figure 1, the stage relies on the execution of six sequential steps, three of which (i.e., steps 2, 3, and 4) replicate the corresponding steps described in the first stage.

(1) In this automatic step, 296 secondary studies were retrieved by applying backward and forward snowballing to the 63 papers selected during the first stage. Specifically, for the backward snowballing, we collected all studies cited by the 63 papers using their references. For the forward snowballing, we used Google Scholar instead of the four digital libraries already exploited in the first stage, as it allowed us to fully automate the retrieval of papers citing one or more of the 63 papers.

(2) As a result of this step, 277 secondary studies were excluded by applying the IC and EC. We remark that, to apply EC3, we needed, as input to this step, the 63 papers selected during the first stage of the process. As a result, 19 papers were included for full reading in the following selection steps.

(3) In this step, among the 19 secondary studies, 11 were excluded, five were included, and the remaining three papers were labeled as “doubt” and needed an extra analysis.

(4) Among the three papers labeled as “doubt,” two were excluded, and the other one was included. As a consequence, six secondary studies were obtained by snowballing.

(5) The papers selected in the two stages were merged to define a set of 44 candidate secondary studies.

(6) In this step, each paper was analyzed by one of the researchers, who scored the source according to the six quality criteria described in Section 3.5. The studies with a quality score lower than 3 were excluded. Borderline papers, i.e., papers with a quality score between 2 and 4, were discussed by all the researchers. As a result of the quality assessment (see Table 4), 20 secondary studies were included in the final set of selected papers.

ID | Title | QC1 to QC6 scores | Total

Selected studies:
F1 | A systematic mapping addressing Hyper-Heuristics within Search-based Software Testing [ ] | 1, 1, 1, 0.5, 1, 1 | 5.5
F2 | NLP-assisted software testing: A systematic mapping of the literature [ ] | 1, 1, 1, 0.5, 1, 1 | 5.5
F3 | Analyzing and documenting the systematic review results of software testing ontologies [ ] | 1, 1, 1, 1, 1, 1 | 6
F4 | A systematic literature review on semantic web enabled software testing [ ] | 1, 1, 1, 1, 1, 1 | 6
F5 | Artificial intelligence in software test automation: A systematic literature review [ ] | 1, 1, 1, 1, 1, 1 | 6
F6 | On the application of genetic algorithms for test case prioritization: A systematic literature review [ ] | 0.5, 1, 1, 0.5, 0.5, 0.5 | 4
F7 | A systematic review of search-based testing for non-functional system properties [ ] | 1, 0.5, 1, 0.5, 1, 1 | 5
F8 | Systematic Literature Review on Search-based mutation testing [ ] | 1, 0, 1, 0, 0.5, 0.5 | 3
F9 | The experimental applications of search-based techniques for model-based testing: Taxonomy and systematic literature review [ ] | 0.5, 1, 1, 0, 0.5, 1 | 4
F10 | A systematic review on search-based mutation testing [ ] | 1, 1, 1, 0, 1, 1 | 5
F11 | A systematic review of the application and empirical investigation of search-based test case generation [ ] | 1, 1, 1, 0.5, 0.5, 1 | 5
F12 | Machine learning applied to software testing: A systematic mapping study [ ] | 0.5, 1, 1, 0, 1, 0.5 | 4
F13 | Using Genetic Algorithms in Test Data Generation: A Critical Systematic Mapping [ ] | 0.5, 1, 1, 0, 1, 1 | 4.5
F14 | Ontologies in software testing: A Systematic Literature Review [ ] | 0.5, 1, 1, 0, 1, 1 | 4.5
F15 | A comprehensive investigation of natural language processing techniques and tools to generate automated test cases [ ] | 0.5, 1, 0.5, 1, 1, 1 | 5
F16 | Search-based Higher Order Mutation Testing: A Mapping Study [ ] | 1, 1, 0.5, 0.5, 0, 0 | 3
F17 | Trend Application of Machine Learning in Test Case Prioritization: A Review on Techniques [ ] | 1, 1, 0, 1, 0.5, 0.5 | 4
F18 | Using machine learning to generate test oracles: A systematic literature review [ ] | 1, 0, 0.5, 0, 0.5, 1 | 3
F19 | Test case selection and prioritization using machine learning: a systematic literature review [ ] | 1, 1, 1, 0, 1, 1 | 5
F20 | A Systematic Literature Review on prioritizing software test cases using Markov chains [ ] | 1, 1, 1, 0, 1, 1 | 5

Excluded candidates (overall score below 3.0):
A survey on regression testing using nature-inspired approaches [ ] | 0, 0, 0, 0, 0.5, 0.5 | 1
The Generation of Optimized Test Data: Preliminary Analysis of a Systematic Mapping Study [ ] | 0.5, 0, 0.5, 0, 0.5, 0.5 | 2
Artificial Intelligence Applied to Software Testing: A Literature Review [ ] | 0, 0, 0.5, 0, 0.5, 0.5 | 1.5
Use of Evolutionary Algorithm in Regression Test Case Prioritization: A Review [ ] | 0.5, 0, 0, 0, 0.5, 0 | 1
An extensive evaluation of search-based software testing: a review [ ] | 0.5, 0, 1, 0, 0, 0.5 | 2
Integration of properties of virtual reality, artificial neural networks, and artificial intelligence in the automation of software tests: A review [ ] | 0.5, 0, 0, 0, 0.5, 0.5 | 1.5
A Systematic Literature Review of Test Case Prioritization Using Genetic Algorithms [ ] | 0, 0, 0, 0, 0.5, 0 | 0.5
A critical review on automated test case generation for conducting combinatorial testing using particle swarm optimization [ ] | 0, 0, 0, 0, 1, 1 | 2
A systematic review of software testing using evolutionary techniques [ ] | 0, 0, 0, 0, 1, 1 | 2
Evolutionary algorithms for path coverage test data generation and optimization: A review [ ] | 0, 0, 0, 0, 1, 1 | 2
Search-based secure software testing: A survey [ ] | 1, 0, 0, 0, 0, 0 | 1
Multi-objective test case minimization using evolutionary algorithms: A review [ ] | 0, 0, 0, 0, 1, 1 | 2
Literature survey of Ant Colony Optimization in software testing [ ] | 0, 0, 0, 0, 0, 0.5 | 0.5
Heuristic search-based approach for automated test data generation: A survey [ ] | 0.5, 0, 1, 0, 0.5, 0.5 | 2.5
Soft computing-based software test cases optimization: A survey [ ] | 0, 0, 0, 0, 0.5, 0 | 0.5
Bayesian concepts in software testing: An initial review [ ] | 0.5, 0.5, 1, 0, 0, 0.5 | 2.5
Search-based techniques and mutation analysis in automatic test case generation: A survey [ ] | 0, 0, 0, 0, 1, 0.5 | 1.5
A Survey on Testing software through genetic algorithm [ ] | 0, 0, 0, 0, 1, 1 | 2
Evolutionary software engineering, a review [ ] | 0, 0, 0, 0, 1, 0.5 | 1.5
Search-based software test data generation: A survey [ ] | 0, 0, 0, 0, 0.5, 0.5 | 1
Nature-inspired approaches to test suite minimization for regression testing [ ] | 1, 0, 0, 0, 0.5, 0 | 1.5
Review of search-based techniques in software testing [ ] | 0, 0, 0, 0, 0, 0 | 0
Object-Oriented Evolutionary Testing: A Review of Evolutionary Approaches to the Generation of Test Data for Object-Oriented Software [ ] | 0, 0, 0, 0, 0, 0 | 0
A systematic review of agent-based test case generation for regression testing [ ] | 1, 0.5, 0.5, 0, 0, 0 | 2

For each paper, we report the corresponding quality scores.

Table 4. Quality Assessment Results

Hereafter, we refer to each selected paper using the identifier (i.e., F1, F2, ..., F20) provided in the ID column of Table 4.

4.2 Data Extraction Execution

Since our RQs (see Section 3.1) cross two different research areas, we divided the authors into two groups, each containing an AI expert and an ST specialist. The two members of each group collaborated in the extraction of the pieces of evidence from each of the 20 selected studies using the extraction form shown in Table 3. Finally, to reach a broad consensus, the two groups shared the extracted pieces of evidence and discussed the differences.

5 DATA ANALYSIS

In this section, we describe the results of the analysis performed on the extracted data (see Section 4.2) to answer our RQs (see Section 3.1). Specifically, in Section 5.1, we provide the answers to our PS-RQs, while, in Section 5.2, we answer our RS-RQs.

5.1 Publication Space Research Questions—Results

In the following sections, we answer the five PS-RQs of this study.

5.1.1 PS-RQ1. How many secondary studies have been identified per publication year?.

To answer the first publication space question, we depict in Figure 2 the distribution of selected secondary studies per publication year. As shown in Figure 2, 2009 is the first year for which we selected a secondary study. Most of the selected secondary studies (75%) were published in the past six years (2017 to 2022), showing an increasing interest within the research community in conducting secondary studies about the application of AI in ST.

5.1.2 PS-RQ2. Which types of secondary studies have been executed?.

The second PS-RQ is about the types of the selected secondary studies. For each study, we report in Figure 2 the corresponding type, i.e., either Systematic Literature Review (SLR), in light green, or Systematic Literature Mapping (SLM), in blue. We followed the guidelines defined by Kitchenham and Charters [52] to verify whether a secondary study was correctly classified as an SLR or an SLM by its authors, and changed the classification when needed. For instance, F14 and F15 were originally classified as SLRs by their authors. After a careful analysis, we opted to classify them as SLMs, since their authors: (i) did not perform a quality assessment of the selected primary studies; and (ii) summarized the selected works without executing a meta-analysis. From the data presented in Figure 2, we can observe that our selection includes 10 (50%) SLRs and 10 (50%) SLMs.

5.1.3 PS-RQ3. What are the venues where the secondary studies have been published?.

The third PS-RQ aims to analyze the venues where the selected secondary studies have been published. Table 5 reports the type, name, and rank (the Scimago Journal & Country Rank, SJR, quartile for journal papers and the Computing Research and Education, CORE, rank for conference papers) of the venues of the 20 selected secondary studies. The table shows that 14 (70%) studies were published in journals and the remaining 6 studies (30%) were part of the proceedings of conferences, workshops, symposiums, or seminars. It is worth observing that 13 of the 14 journal papers have been published in top-ranked venues (according to the SJR quartile in which the venue is classified), with 6 of them published in the Information and Software Technology journal. Thus, from our selection, we can derive that the topic of AI applied to ST is largely covered by top-ranked journals.

Journal (SJR quartile):
Information and Software Technology | Q1 | F1, F2, F3, F7, F10, F20
ACM Computing Surveys | Q1 | F13
Applied Soft Computing | Q1 | F9
e-Informatica Software Engineering Journal | Q3 | F8
Empirical Software Engineering | Q1 | F19
IEEE Access | Q1 | F17
IEEE Transactions on Reliability | Q1 | F12
IEEE Transactions on Software Engineering | Q1 | F11
Journal of Systems and Software | Q1 | F4

Conference (CORE rank):
Brazilian Symposium on Systematic and Automated Software Testing | Not Available | F16
International Conference on Evaluation of Novel Approaches to Software Engineering | B | F5
International Conference on Internet of things, Data and Cloud Computing | C | F15
International Workshop on Evidential Assessment of Software Technologies | Not Available | F6
International Workshop on Test Oracles | Not Available | F18
Seminar on Ontology Research in Brazil | Not Available | F14

Table 5. Secondary Studies Per Venues’ Types and Names

5.1.4 PS-RQ4. What are the authors’ affiliation countries of the selected secondary studies?.

The fourth PS-RQ is aimed at analyzing the affiliation countries of the authors of the selected studies. Among the 20 selected studies, we found 68 different authors, with five authors (including Érica Ferreira de Souza, Juliana Marino Balera, Lionel C. Briand, and Nandamudi Lankalapalli Vijaykumar) involved in two studies each. We analyzed the countries of affiliation of these 68 authors, resulting in 16 unique affiliation countries. Since three authors reported a second affiliation country and five authors were involved in two different studies, we counted a total of 76 affiliations. Most of the selected studies have authors with affiliations from only one country, except studies F1 [11] (Brazil and UK), F2 [35] (Austria and Northern Ireland), and F3 [96] (Argentina and Uruguay), each of which included authors with affiliations from two different countries. Figure 3 shows a world map of the authors’ affiliation countries, with each color representing a different value for the number of affiliations. In particular, 27 (35.53%) affiliations are counted for Brazil, 10 (13.16%) for Malaysia, 6 (7.89%) for Sweden, 4 (5.26%) for each of Argentina, Canada, Norway, and Pakistan, 3 (3.95%) for each of the Czech Republic, India, and Iran, 2 (2.63%) for each of Austria, the United Kingdom, and Uruguay, and 1 (1.32%) for each of Luxembourg and Turkey. From the extracted data, we can observe that most of the affiliations (33 of 76) are located in South America (Brazil, Argentina, and Uruguay). Interestingly, affiliation countries that typically dominate in computer science or computer engineering (e.g., USA and China) do not occur in our observations.

5.1.5 PS-RQ5. What is the amount of primary studies analyzed by the selected secondary studies, and how are they distributed over time?.

The goal of this RQ is twofold: (i) to compute the number of primary studies that have been reviewed by the selected secondary studies and (ii) to understand how these primary studies are distributed over the publication years. Figure 4 shows, for each selected secondary study, the number of reviewed primary studies and how many of these studies are unique, i.e., works that have not been reviewed by any other secondary study. The figure shows that the 20 selected secondary studies analyzed a total of 807 primary studies, of which 710 (87.98%) were unique. Figure 5 shows the distribution of the unique primary studies per publication year. The figure shows that the 710 unique primary studies cover a period of 27 years, going from 1995 to 2021; 444 (62.5%) of these studies have been published in the past 10 years and 264 (37.18%) from 2015 to 2021. Primary studies that were “unique” in any of the older secondary studies could, in theory, have been found by the newer secondary studies. However, the publication date of a primary study is just one of the factors that may lead it to be included in only one of the secondary studies. In addition, the search protocols for secondary studies also vary greatly in the choice of search strings and inclusion and exclusion criteria, leading to the great diversity of selected primary studies. The large number of unique primary studies reviewed by the selected secondary studies and their distribution over time lead us to two interesting observations. First, we can state that our set of secondary studies is representative of the research conducted on the topic of AI applied to ST. Moreover, as will be confirmed by RS-RQ1 and RS-RQ2 (see Sections 5.2.1 and 5.2.2), the set of unique primary studies reviewed by these works is a significant sample covering broad aspects of AI in ST. Second, we can infer that the topic of AI applied to ST is of interest to the research community and that this interest has grown over the past decade. Finally, it is worth observing that the research topic of AI applied to ST is not new to the research community; indeed, the first 19 (2.67%) primary studies in this field date from the late 1990s.

5.2 Research Space Research Questions—Results

In the following sections, we answer the four RS-RQs of our study.

5.2.1 RS-RQ1. What AI domains have been applied to support ST?

To identify the AI domains from which solutions were applied to support ST, we analyzed the list of sentences about the applied AI domains that were extracted from our sources during the data extraction process, with reference to the taxonomy introduced in Section 2.1. As a result of this analysis, in Figure 6, we report, for each AI domain concept, the list of secondary studies in which we found evidence of its application in ST. The most important findings we can derive from Figure 6 follow: (1) most of the secondary studies (11 of 20) investigated the application to ST of AI solutions (i.e., algorithms, models, methods, techniques, etc.) belonging to the Planning and Scheduling / Searching / Optimization sub-domains. Specifically, the most surveyed AI solutions belonging to this domain are evolutionary algorithms, genetic algorithms, and metaheuristic optimisation; (2) 9 of 20 secondary studies focused on the application of Machine learning solutions. In particular, F12 [ 30 ] and F5 [ 98 ] covered almost all the concepts in this AI domain; (3) 6 of 20 secondary studies surveyed the support provided by Knowledge representation/Automated reasoning/Common sense reasoning AI solutions. Specifically, most of these studies analyzed the use of ontologies, with F4 [ 25 ] being the study that surveyed the largest number of concepts in this AI domain; (4) few secondary studies surveyed works on the application of Natural language processing (5 of 20), Multi-agent systems (3 of 20), and Computer vision (1 of 20). It is worth noting that, within the Natural language processing domain, the most surveyed applications are based on text mining, while only one study surveyed applications of word embeddings. Notably, only one of the secondary studies analyzed the use of image processing techniques belonging to Computer vision.

By analyzing the publication years of the selected studies (see Figure 2), we can observe that most of the works (6 of 9) surveying the application of machine learning to ST have been published very recently (2020 or later). This indicates a growing interest in this AI domain. Similarly, 4 of 5 studies investigating the use of Natural language processing have been published after 2020, highlighting the timeliness of this research field, with a special focus on text mining and word embedding. Secondary studies analyzing the use of Planning and Scheduling/Searching/Optimization cover a period spanning from 2009 to 2020, showing a consolidated research topic that is still of interest.

5.2.2 RS-RQ2. What domains of ST have been supported by AI?

Similar to what we did to answer RS-RQ1, we analyzed the sentences collected from each secondary study during the data extraction process and annotated each study with the ST domain concepts involved in them, according to the taxonomy introduced in Section 2.2. The result of this analysis is shown in Figure 7, where, for each ST domain, we report the list of secondary studies in which we found evidence of the application of an AI solution to the specific ST domain. From Figure 7, we can observe the following: (1) almost all selected secondary studies (19 of 20) have surveyed studies about the application of AI to the Testing Activity ST domain. In particular, the most recurrent ST concepts of this domain are Test Case Optimization/Prioritization/Selection (11 of 20), Test Data Definition (10 of 20), and Test Case Generation (8 of 20); (2) 12 secondary studies surveyed the use of AI in the Testing Technique domain. In this ST domain, Mutation Testing and Requirement-based Testing are the testing techniques for which most secondary studies found evidence of AI applications (6 and 4 studies, respectively); (3) 11 secondary studies showed evidence of AI applied in the Testing Objective ST domain. In particular, 5 studies showed the use of AI to support Functional Testing, 5 studies analyzed primary sources where AI was applied to Non-functional Testing, 5 studies showed evidence of the application of AI to GUI Testing, and 4 surveyed the use of AI for Regression Testing; (4) few secondary studies (3 of 20) reported evidence on the use of AI in the Test Target ST domain. Among these 3 secondary studies, 1 covered the use of AI in Unit Testing, 1 the application of AI in Integration Testing, and 2 the AI support in System Testing; (5) the Software Testing Fundamentals ST domain has been covered by 3 secondary studies. All these works reported evidence on the use of AI to support the introduction and standardization of terms and definitions in the field of Testing Related Terminology.

Overall, the most important finding we can derive from Figure 7 is the evidence of an intense application of AI to: (1) the development of test cases, including their generation and the definition of test cases’ input and expected output, i.e., test oracles. To aid the test oracle definition, AI has been applied to metamorphic-based testing and to GUI testing; (2) the management of the test cases, particularly, their prioritization and selection, which is confirmed by the use of AI for regression testing; (3) the generation of test cases from requirements using natural language processing and knowledge representation techniques; (4) the detection of equivalent mutants and the generation of new mutants in mutation testing techniques; and (5) the testing of both functional and non-functional requirements.

5.2.3 RS-RQ3. Which ST domains have been supported by which AI domains and how?

To answer RS-RQ3, we discuss the evidence collected from the selected secondary studies concerning which AI domains have been applied to support which ST domains. The bubble chart in Figure 8 reports the number of secondary studies that investigated the application of a given AI domain to a specific ST domain. From the chart, we can derive the following observations: (1) Testing Activity and Testing Objective are the only two ST domains for which we found evidence of the application of solutions from all the AI domains. Also, with the exception of Software Testing Fundamentals, the AI domains Planning, Communication, Learning, and Knowledge have been applied to all ST domains; (2) Knowledge is the only AI domain for which we found evidence of applications to Software Testing Fundamentals; moreover, it is the only AI domain involved in all the ST areas; (3) the majority of the selected secondary studies (10 of 20) analyzed the application of AI techniques belonging to the Planning domain to support the Testing Activity, thus making this the most surveyed interplay of AI and ST; (4) the second most surveyed interplay of AI and ST is Learning applied to Testing Activity. Moreover, we found that machine learning is the only key concept belonging to the Learning AI domain that has been exploited in this ST domain; (5) very few secondary studies surveyed the application of the Integration & Interaction and the Perception AI domains to ST. More precisely, Multi-agent systems and Computer vision are the only AI key concepts belonging to these domains for which we had evidence of application in ST; (6) the Software Testing Fundamentals domain characterizes fundamental ST terms and definitions, which explains why Knowledge is the only AI domain for which we found evidence of application to this ST domain.

Fig. 8.

Fig. 8. AI and ST domain pairs covered by the selected secondary studies. For each pair of domains, we report the count of distinct secondary studies surveying the corresponding applications of AI to ST.

Additionally, to deepen our discussion, we analyzed the pieces of evidence extracted from the selected secondary studies to identify in more detail which AI methodologies have been applied to which specific ST fields. The results of this analysis are reported in Tables 6(a) and 6(b). Each table cell lists the secondary studies in which we found evidence of the application of a specific AI domain/subdomain (column) to support a specific ST domain/field (row). Looking at the mapping from a bird’s-eye view, we can observe that: (1) evolutionary algorithms, genetic algorithms, and metaheuristic optimisation have been applied to almost all the ST domains and fields, and (2) test case generation, test oracle definition, test case optimization/prioritization/selection, test data definition, Requirement-based Testing, and Mutation Testing are the ST fields that have received support from most of the AI domains.

Table 6. (a) First Part of the Resulting Mapping

Table 6. (b) Second Part of the Resulting Mapping

Furthermore, to deepen the understanding of RS-RQ3, we drilled down into the cells with several sources assigned to them (i.e., 3 or 4). Such cells indicate that the associated AI concepts have been extensively applied to support the related ST fields. For such cells, in the following list, we summarize (1) the commonalities and differences in the application of the AI techniques identified by the analyzed works, (2) the difficulties and limitations of applying the AI techniques to the ST objective, and (3) practical insights.

(1)

Evolutionary algorithms and genetic algorithms were found to be the most used AI techniques to support Mutation Testing, as reported by F10 and F8. These two search-based techniques have been applied primarily for two purposes: mutant optimization and test data generation. However, most of the proposed techniques are either presented in a general manner or are not sufficiently empirically evaluated, and cannot serve as a basis for enabling practitioners to choose a specific technique for a given software system. The major challenges include the effort and cost entailed in mutation testing, which limit its application to the testing of real-world programs. As stated by F1 and F16, some of these techniques have been explored in very few works, even though they could bring the advantages of generating stronger mutants and reducing the number of mutants used. As highlighted by F10, meta-heuristic search techniques, and genetic algorithms in particular, have also been effectively applied to the selection of mutant operators, the generation of mutants, and the generation of test data. From a more practical point of view, several search-based tools and algorithms have been extensively used in search-based mutation testing.

(2)

Meta-heuristic search has also been widely used for test case and test data definition, as can be drawn from F9 and F10. According to F10, experiments conducted in this field showed unsatisfactory results, with the most important challenge being the time necessary to obtain a good solution, in terms of test cases and test data definition, when more than one solution must be found. However, preliminary results indicate that the use of meta-heuristic search techniques for reducing both the costs and the efforts of test data generation in mutation testing is promising. In this field, “global” search-based techniques (SBTs), i.e., techniques that effectively search for globally optimal solutions, have been widely applied to overcome the problem of getting stuck in local optima. Subsequently, “local” SBTs are used to efficiently obtain the optimal solution starting from the global ones. In particular, hill climbing and simulated annealing are the most common examples of local SBTs (F9); a minimal hill-climbing sketch is shown right after this list. From a more practical point of view, search-based techniques seem to outperform random search in achieving structural coverage (F11).

(3)

Metaheuristic optimisation has been extensively used across several ST fields, with examples of these applications reported in F7, F9, F1, and F8. Although several secondary studies showed that metaheuristic-based techniques have been extensively used to provide solutions for automating testing tasks (such as test case selection and test order generation) and for implementing more cost-effective testing processes, some studies, in particular F5 and F11, also highlighted the need for additional empirical experimentation to demonstrate the applicability and the usefulness of metaheuristics in more realistic industrial scenarios.

(4)

Text mining is the most widely used NLP technique for deriving test cases from requirements, with examples reported in F2 and F5. As pointed out by F2 and F15, the use of NLP-assisted software testing techniques and tools has been found highly beneficial for researchers and practitioners, as they reduce the cost of test-case generation and the amount of human resources devoted to testing activities. However, for a wide industrial usage of NLP-based testing approaches, more work is required to increase their accuracy. Moreover, comparative studies should be performed to highlight the strengths and weaknesses of NLP tools and algorithms.

(5)

Ontologies have been mainly adopted to support the introduction and standardization of terminologies and definitions in ST, with several examples of such applications reported in F3, F4, and F14. As highlighted by F3, the main benefit of having a suitable software testing ontology is to minimize the heterogeneity, ambiguity, and incompleteness problems in terms, properties, and relationships. Another potential value of using ontologies and, more generally, semantic web technologies in software testing, highlighted by F4, is that they can provide a more powerful mechanism for sharing test assets that are less application-dependent and hence more reusable. By analyzing the terminological coverage of the selected ontologies, the authors of F4 observed that most ontologies cover terms related to dynamic and functional testing; conversely, only a few ontologies consider terms related to static and non-functional testing. Similarly, the authors of F14 highlighted that most ontologies have limited coverage and that none of them is truly a reference ontology or is grounded in a foundational ontology. In conclusion, the software testing community should invest more effort to obtain a unique and well-established reference software testing ontology.

(6)

Machine learning (ML) techniques have been used for several testing activities such as oracle definition, test-case generation, test-case refinement, and test-case evaluation, with evidence of these applications reported in F12 and F18. Regarding the test oracle definition activity, F2 observed that test oracles obtained by using ML are more efficient, effective, and reusable compared with those generated with existing traditional approaches. Additionally, F12 identified the main advantages of using ML techniques in their scalability and in the minimal need for human intervention. As for the main problem faced by researchers when trying to apply ML to solve software testing problems, both F12 and F18 identified the need for a substantial amount of high-quality training data, which is the key for the algorithms to function as intended.

(7)

Supervised learning, unsupervised learning, and reinforcement learning have been widely adopted for Test Case Optimization/Prioritization/Selection, as highlighted by F19. Similarly, F17 analyzed the publication trend of ML techniques applied to Test Case Prioritization, ranking the technique categories by popularity, and F12 reports similar observations. F17 reported that supervised learning is the most used ML technique, as it benefits from the availability of historic data, which results in a high average percentage of faults detected and in good code coverage effectiveness. F17 also highlighted that ML-based test case prioritization requires a more structured process and further improvements before it is mature enough to be included in undergraduate taught programs. Interestingly, F19 highlights that, although supervised learning, unsupervised learning, reinforcement learning, and NLP-based techniques are the four main categories of ML techniques used for test case selection and prioritization, some combinations of them have also been reported in the literature. For example, NLP-based techniques, which are often used for feature preprocessing, were combined with supervised or unsupervised learning to achieve better performance for test case prioritization. F19 highlighted that the lack of standard evaluation procedures and of appropriate publicly available datasets resulting from the execution of real-world case studies makes it very challenging to draw reliable conclusions concerning the performance of ML-based test case selection and prioritization. Thus, getting the research community to converge toward common evaluation procedures, metrics, and benchmarks is vital for building a strong body of knowledge we can rely on, without which advancing the state-of-the-art remains an elusive goal.
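As a concrete illustration of the “local” SBTs mentioned in item (2), the following minimal Python sketch hill-climbs an integer input toward covering the true branch of a toy predicate, using a branch-distance-style fitness; the unit under test, the neighborhood, and the parameters are illustrative assumptions, not an approach prescribed by the surveyed studies.

```python
import random

def branch_distance(x: int) -> int:
    """Distance to satisfying the target branch `if x == 42` in a toy unit under test."""
    return abs(x - 42)

def hill_climb(seed_input: int, max_steps: int = 5_000) -> int:
    """Local SBT: move to the best neighbor until no neighbor improves the fitness."""
    current = seed_input
    for _ in range(max_steps):
        best_neighbor = min((current - 1, current + 1), key=branch_distance)
        if branch_distance(best_neighbor) >= branch_distance(current):
            return current  # an optimum (here also the global one) has been reached
        current = best_neighbor
    return current

random.seed(0)
test_input = hill_climb(seed_input=random.randint(-1000, 1000))
assert branch_distance(test_input) == 0  # the generated input covers the target branch
```

A “global” SBT, such as a genetic algorithm, would instead evolve a whole population of candidate inputs, using recombination and restarts to escape the local optima on which a single climb can stall.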

As a final consideration, we can highlight that the application of word embedding to Test Case Optimization/Prioritization/Selection has been observed only recently, in 2021, by F19 [ 74 ], which reports that “NLP-based techniques are used for processing textual data, such as topic modeling, Doc2Vec, and LSTM. NLP-based techniques can also be mixed with other ML or non-ML techniques.” Moreover, word embeddings and neural NLP models are becoming more and more pervasive in trans-disciplinary studies and applications and, since foundation models 16 are receiving much attention from both academic and industrial researchers, we expect that in the near future NLP will also be more extensively applied to support ST.
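To give an idea of the NLP-based feature preprocessing that F19 [ 74 ] describes, the sketch below embeds a few hypothetical test case descriptions with gensim’s Doc2Vec, producing dense vectors that a downstream ML model could cluster or prioritize; the corpus and the parameters are toy assumptions, not a pipeline taken from the surveyed studies.

```python
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

# Hypothetical natural language test case descriptions.
descriptions = [
    "verify login with valid credentials",
    "verify login fails with wrong password",
    "check report export to pdf",
]

corpus = [
    TaggedDocument(words=text.split(), tags=[f"tc{i}"])
    for i, text in enumerate(descriptions)
]

# Small vectors and many epochs only because the toy corpus is tiny.
model = Doc2Vec(corpus, vector_size=16, min_count=1, epochs=100, seed=1)

# Dense vectors usable as features for an ML-based prioritizer or for clustering.
vectors = {doc.tags[0]: model.infer_vector(doc.words) for doc in corpus}
print(vectors["tc0"][:4])
```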

5.2.4 RS-RQ4. What are the future research directions of AI in ST?

Table 7 summarizes the most recurrent future research directions in AI applied to ST emerging from the analysis of the selected secondary studies, together with the list of studies mentioning them. The table was built by analyzing the sentences, extracted from each study, discussing future research directions and by grouping sentences indicating similar research directions. Finally, for each group, we defined a category by means of a short summary of the research direction. The need for more rigorous experimental research is the most recurrent future research direction (8 of 20 studies). For instance, the authors of F12 [ 30 ] state that “most research efforts are not methodologically sound, and some issues remain unexplored,” while in F11 [ 3 ] the authors report that empirical evidence is needed to assess how “AI-supported techniques (can outperform) current software testing techniques.” Three studies identified the need to develop evidence with real systems, i.e., to address the lack of studies investigating the application of AI to the ST of larger and more complex software systems. As an example, the authors of F16 [ 57 ] observed that “the great majority of the conducted evaluations do not use real and large systems.” Similarly, in F12 [ 30 ], the authors identified the lack of AI applications to “a wider range of software testing problems.” We believe that this future research direction might mitigate the current challenges in the applicability and transferability of AI applications to ST in industrial settings. The authors of F13 [ 81 ] identify the need to introduce new data type representations for test data generation, so that genetic algorithms can be applied to the automated definition of input/output test values. Another emerging research direction concerns applying ML to support automation. The authors of F12 [ 30 ] suggest that more research be conducted to evaluate how machine learning approaches can be used to support ST automation, claiming: “We believe that the overarching motivation for research in this area should be automating most software-testing activities.” Moreover, the authors of F14 [ 28 ] discuss the necessity of developing an ontology for ST, as they concluded that “operational versions” of ST taxonomies must be “designed and implemented.” Finally, 9 of 20 studies do not propose any future research direction.

Table 7. Future Research Directions Indicated by the Selected Secondary Studies

More rigorous experimental research: F4, F7, F10, F11, F12, F15, F16, F17
Develop evidence with real systems: F12, F16, F18
New data type representation for test data generation: F13
Apply ML to support automation: F12
Develop an ontology for ST: F14
None: F1, F2, F3, F5, F6, F8, F9, F19, F20

6 FURTHER DISCUSSION

In this section, we first provide additional general considerations on the results of our study (Section 6.1). Then, we focus on the Testing Activities whose automation has been supported by different AI techniques and, for each AI technique, synthesize the main purpose it has been used for (Section 6.2).

6.1 Overall Considerations

Replicability of primary studies: As mentioned in Section 5.2.4, we found that 8 of 20 secondary studies have highlighted the need for rigorous empirical research to evaluate the outcomes presented by the primary studies. Drawing from this need, we believe that future secondary studies should devote more attention to this aspect by including specific research questions or quality assessment criteria aimed at evaluating the replicability of the surveyed studies.

Lack of benchmarks about the interplay between AI and ST: We observed a lack of benchmarks that practitioners and researchers can use to assess the outcomes of applying a specific AI technique to support ST. We feel that this could be an important line of research, one that can be underpinned by the mapping developed in this study. In particular, benchmarks could include datasets and case studies for which results are already known, as well as performance metrics against which the proposed AI-supported ST approaches could be compared. We also feel that the availability of these benchmarks could facilitate future research advancements by providing a common set of outcomes from which to outline new research questions and performance metrics.

Use of the mapping from the point of view of ST engineers: ST engineers can use Tables 6(a) and 6(b) to find secondary studies about the AI methodologies that have already been applied to support specific ST domains and concepts. Each non-empty cell indicates that a specific AI concept has already been applied to support a given ST activity or field. For instance, let us suppose we have a practitioner interested in “Test Data Definition.” The practitioner can look at Tables 6(a) and 6(b) and find out which AI methodologies have been leveraged to support this activity. Moreover, each of the secondary studies reported in non-empty cells supplies pointers to primary studies providing additional details on the specific application of AI in ST. In this specific example, the practitioner interested in the application of “genetic algorithms” can deepen this topic by retrieving the primary studies surveyed by the four secondary studies listed in the corresponding cell, i.e., F5, F8, F10, and F11.
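The lookup described above can also be mimicked programmatically. The following sketch encodes a fragment of the mapping as a nested dictionary; only the “Test Data Definition” and “genetic algorithms” cell (F5, F8, F10, and F11) is taken from the text above, while the other entries are hypothetical placeholders.

```python
# Fragment of the AI-to-ST mapping: ST field -> AI concept -> secondary studies.
mapping: dict[str, dict[str, list[str]]] = {
    "Test Data Definition": {
        "genetic algorithms": ["F5", "F8", "F10", "F11"],  # cell reported in the text
        "natural language processing": ["F2"],             # hypothetical entry
    },
    "Test Oracle Definition": {
        "machine learning": ["F12"],                       # hypothetical entry
    },
}

def studies_for(st_field: str, ai_concept: str) -> list[str]:
    """Return the secondary studies surveying a given AI concept for a given ST field."""
    return mapping.get(st_field, {}).get(ai_concept, [])

print(studies_for("Test Data Definition", "genetic algorithms"))  # ['F5', 'F8', 'F10', 'F11']
print(studies_for("Test Data Definition", "computer vision"))     # [] -> an "empty cell"
```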

Empty cells as food for thought for researchers: Researchers can use the mapping (Tables 6(a) and 6(b)) to identify new research opportunities by inspecting the empty cells. An empty cell in these tables means that we did not find evidence of the application of a specific AI concept to a given ST one. Possible explanations for empty cells that should be properly taken into account by researchers are:

Explanation 1:

There are not enough primary studies on the application of the specific AI concept to a given ST field of interest. As a result, such application has not permeated through the secondary sources and into the resulting mapping of this tertiary study.

Explanation 2:

It represents a greenfield opportunity for research, which can take the form of novel primary studies or of secondary studies that address the mapping associated with the specific empty cell. We note that, similar to this explanation for empty cells, an opportunity to conduct a secondary study is also associated with cells of Tables 6(a) and 6(b) that include only one, not recently published, study. As an example, if the only secondary study that surveyed a given application of AI to ST is F7, which was published in 2009, then an update of the F7 study could be of interest for the research community.

Explanation 3:

It is a false negative for our study. While we have taken great care with the analysis of our secondary sources, there is still the chance that we have missed a reported application.

Explanation 4:

It is not possible to apply the specific AI solution to the specific ST problem. The cell might be empty because the application of AI to that software testing problem is not feasible, either temporarily, due to limitations in computing power, or by construct, where the application of a specific mapping would not make sense. Researchers must be aware of this possibility when using the mapping as inspiration for research directions.

To exemplify how researchers can use empty cells, let us suppose we are interested in exploring the “New data type representation for test data generation” future research direction reported in Table 7. This future research direction relates to the application of the knowledge representation and reasoning AI concept to the Test Data Generation field; such an application corresponds to an empty cell in Table 6(a). At this point, we can use the row and column labels as relevant keywords to perform an initial search in Scopus. To follow this example, we executed the search string “(TITLE-ABS-KEY (knowledge AND representation AND reasoning) AND TITLE-ABS-KEY (test AND data AND generation))” in Scopus, and it returned 17 studies. 17 We analyzed these papers and found that just one of them could be potentially related to the empty cell or considered useful for the future research direction we are interested in. As a result, we can argue that this empty cell is consistent with Explanations 1, 2, and 3, and clearly not supportive of Explanation 4. Had several primary studies related to the empty cell been returned by Scopus, only Explanations 2 and 3 would have applied.
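The keyword search described above can also be scripted. The sketch below submits the same query to the Scopus Search API; it assumes a valid Elsevier API key and follows the publicly documented endpoint and response layout, which should be double-checked against the current API documentation before use.

```python
import requests

API_KEY = "YOUR-ELSEVIER-API-KEY"  # assumption: a valid key obtained from dev.elsevier.com
QUERY = ("TITLE-ABS-KEY(knowledge AND representation AND reasoning) "
         "AND TITLE-ABS-KEY(test AND data AND generation)")

resp = requests.get(
    "https://api.elsevier.com/content/search/scopus",
    params={"query": QUERY, "count": 25},
    headers={"X-ELS-APIKey": API_KEY, "Accept": "application/json"},
    timeout=30,
)
resp.raise_for_status()

entries = resp.json()["search-results"].get("entry", [])
print(f"{len(entries)} studies returned")
for entry in entries:
    print(entry.get("dc:title"))
```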

Use of standard or well-recognized terminologies and taxonomies: We value the use of standard or well-recognized taxonomies (i.e., AI Watch [ 88 ] and SWEBOK [ 18 ]) as sources of a common language for our domain area. As such, they have been adopted to guide the analysis process. However, our analysis process shows that this outlook is not shared by the community. This puts a toll on the analysis process (in terms of the construct validity threats we discuss in Section 7), as an agreement has to be reached on the term used to describe a phenomenon before the analysis can move forward. Needless to say, we do not view standards or well-recognized taxonomies as static: not only do they evolve, but novel research proposals might also need novel terminology. Yet, in general, we observed a lot of variation in concepts that are (or are supposed to be) well understood. We are far from the first to highlight this issue (for instance, see [ 36 , 84 ]); in particular, at the interplay of AI and software testing, Jöckel et al. [ 44 ] highlight how this issue becomes problematic for data analysis in our field and for extending and comparing research results.
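As a hint of what an “operational version” of such a shared taxonomy could look like, the sketch below uses rdflib to encode a couple of testing terms as a machine-readable vocabulary; the namespace and the chosen terms are illustrative assumptions, not an actual reference ontology.

```python
from rdflib import Graph, Literal, Namespace, RDF, RDFS

ST = Namespace("http://example.org/software-testing#")  # hypothetical namespace

g = Graph()
g.bind("st", ST)

# A tiny class hierarchy mirroring the ST taxonomy used in this study.
g.add((ST.TestingTechnique, RDF.type, RDFS.Class))
g.add((ST.MutationTesting, RDF.type, RDFS.Class))
g.add((ST.MutationTesting, RDFS.subClassOf, ST.TestingTechnique))
g.add((ST.MutationTesting, RDFS.label, Literal("Mutation Testing")))
g.add((ST.MutationTesting, RDFS.comment,
       Literal("Seeding small faults (mutants) to assess test suite adequacy.")))

print(g.serialize(format="turtle"))
```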

6.2 AI Techniques Used to Support the Automation of Testing Activities

As Table 6 shows, several AI techniques have been applied to support ST. In this section, we focus on the Testing Activities whose automation has been supported by different AI techniques and synthesize the main purpose for which each AI technique has been used. 18

AI for Test Case Generation: Secondary studies shared similar conclusions about how AI techniques have been applied to support the test case generation activity. Search-based AI techniques have been used to generate optimal test suites according to a given adequacy criterion, such as code coverage or fault detection. NLP-based techniques have mainly been used to reduce the manual effort of extracting test cases from requirements, specifications, and UML models [ 2 , 35 , 98 ]. ML is considered an emerging AI topic for the automation of test case generation. To be applied, these techniques have to learn a behavioral model of the application under test. Such a model is usually built starting from a dataset of inputs and outputs, or on the fly during the exploratory testing of the application under test. The latter approach is mostly used in GUI-based testing, where the user interface is explored and tested at the same time by triggering user events [ 30 ]. Ontologies have been used to build a vocabulary of terms specific to the application context of the software under test. The vocabulary can be used to build (1) abstract test cases, i.e., test cases that are not specific to a programming language, or (2) platform-specific executable test cases, i.e., test cases implemented in a specific programming language. Ontologies have also been used within NLP-assisted test case generation processes to impose restrictions on the context at hand and to convert textual documents into an explicit system model for scenario-based test-case generation [ 25 , 35 ].
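As a toy illustration of NLP-assisted test case derivation from requirements, the following sketch applies a naive pattern to a hypothetical “shall” statement to produce an abstract test case skeleton; the approaches surveyed in [ 2 , 35 , 98 ] rely on far richer linguistic analysis, so this is only a minimal sketch of the underlying idea.

```python
import re

# Hypothetical natural language requirement.
requirement = "The system shall reject passwords shorter than 8 characters."

# Naive pattern: "<actor> shall <expected behavior>".
match = re.match(r"(?P<actor>.+?) shall (?P<behavior>.+)\.", requirement)
assert match is not None

# Abstract (programming-language-agnostic) test case skeleton.
abstract_test_case = {
    "name": "test_" + "_".join(match["behavior"].split()[:3]),
    "precondition": f"{match['actor']} is running",
    "action": "stimulate the behavior: " + match["behavior"],
    "expected": match["behavior"],
}
print(abstract_test_case)
```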

AI for Test Case Optimization/Prioritization/Selection: The analyzed studies also pointed out a joint observation that considers ML-based techniques as the most exploited and promising ones for selecting or prioritizing test cases from a test suite in order to reduce testing resources, such as the testing time and the use of expensive devices [ 12 , 30 , 50 ]. Possible interesting applications of ML show that specific models can be trained, from a dataset of test suites, to select the test cases that minimize the testing time or to predict defects in the system under test. The reduction of the testing time also allows the introduction of ML-based test case optimization processes in modern continuous integration development processes. ML-based techniques may also be combined with NLP-based ones, where NLP is used to process textual data for building the dataset used to train the models [ 74 ]. Another common conclusion regards the use of ontologies. Semantic web-based ontologies are the most used to define traceability links between test cases, test results, and requirements. Such links are exploited to profile the test cases and to select or prioritize the ones that guarantee specific testing adequacy criteria, such as coverage of requirements or failure discovery [ 25 , 96 ].
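A minimal sketch of the ML-based prioritization idea under toy assumptions: a classifier is trained on a hypothetical history of per-test features and verdicts to predict failure probability, and the suite is then ordered so that the most failure-prone test cases run first.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical history: [recent_failures, duration_s, changed_files_touched] per test.
X = np.array([[3, 12.0, 5], [0, 45.0, 0], [1, 3.0, 2], [0, 8.0, 1], [4, 20.0, 6]])
y = np.array([1, 0, 0, 0, 1])  # 1 = the test failed on the last run
tests = ["t_login", "t_report", "t_search", "t_export", "t_payment"]

model = LogisticRegression().fit(X, y)

# Order the suite by predicted probability of failure (descending).
fail_prob = model.predict_proba(X)[:, 1]
prioritized = [name for _, name in sorted(zip(fail_prob, tests), reverse=True)]
print(prioritized)
```

In a continuous integration setting, such a model would be retrained as new execution history accumulates, which is where the training data concerns raised by F12 and F18 become relevant.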

AI for Test Data Definition: Most of the secondary studies reached a common consensus that considers ant colony optimization techniques and genetic algorithms (GAs) as the most cost-effective for automatic test data definition in the context of mutation testing [ 3 , 43 , 81 , 89 ]. GAs have been considered the most effective solution for the automatic generation of test data for structural, functional, and mutation testing, and they have also been successfully exploited to generate data for testing floating-point computations and expert systems. NLP and ML approaches have mainly been used to generate test data for GUI testing and, in particular, for mobile GUI testing. NLP has also been exploited to generate input values expressed in natural language [ 2 , 35 ], whereas ML techniques (such as Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and unsupervised and reinforcement learning) are used in automated exploratory testing to generate inputs (i.e., user events on the application GUI) that allow the exploration of application states not previously visited [ 32 , 98 ].
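To make the GA-based test data definition idea concrete, the sketch below evolves integer inputs that kill hand-seeded mutants of a toy function, using the number of killed mutants as the fitness; the function, the mutants, and the GA parameters are illustrative assumptions chosen for brevity.

```python
import random

def original(x: int) -> int:
    return 2 * x + 1 if x > 10 else 0

# Hand-seeded mutants of `original` (operator and relational replacements).
mutants = [
    lambda x: 2 * x - 1 if x > 10 else 0,   # "+" replaced by "-"
    lambda x: 3 * x + 1 if x > 10 else 0,   # "2" replaced by "3"
    lambda x: 2 * x + 1 if x >= 10 else 0,  # ">" replaced by ">="
]

def fitness(x: int) -> int:
    """Number of mutants killed by input x (their outputs differ from the original's)."""
    return sum(m(x) != original(x) for m in mutants)

random.seed(3)
population = [random.randint(-100, 100) for _ in range(20)]
for _ in range(30):  # generations
    population.sort(key=fitness, reverse=True)
    parents = population[:10]
    # Elitism plus small random perturbation as a minimal variation operator.
    population = parents + [p + random.randint(-5, 5) for p in parents]

best = max(population, key=fitness)
print(best, fitness(best))  # an input (any x > 10) killing 2 of the 3 mutants
```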

AI for Test Oracle Definition: The studies concluded that ML has the potential to solve the “test oracle problem,” i.e., the challenge of automatically generating oracles. ML algorithms have been used to generate test verdicts, metamorphic relations, and, most commonly, expected output oracles [ 30 , 98 ]. In particular, ML-based predictive models are trained to serve either as a stand-in for an existing test oracle (used to predict a test verdict) or as a way to learn a function that defines expected outputs or metamorphic relationships and that can be used to issue a verdict. Supervised and semi-supervised ML approaches seem to be the most promising; the associated ML models are trained on labeled system executions or on source code metadata. Of these approaches, many use some type of neural network, such as Backpropagation NNs, Multilayer Perceptrons, RBF NNs, probabilistic NNs, and Deep NNs. Others apply support vector machines, decision trees, and adaptive boosting [ 32 ]. The studies showed great promise but also significant open challenges. The performance of the trained ML models is influenced by the quantity, quality, and content of the available training data [ 32 ]. Models should be retrained over time. The applied techniques may be insufficient for modeling complex functions with many possible outputs. Research is limited by the overuse of simplistic examples, the lack of common benchmarks, and the unavailability of code and data. A robust open benchmark should be created, and researchers should provide replication packages. Computer vision approaches are mainly used to support oracle definition in the context of GUI-based testing [ 98 ], where issuing a verdict requires recognizing specific regions or images of the graphical user interface in order to check properties such as the color, the position on the screen, or the quality of the image.
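A minimal sketch of the “stand-in oracle” idea under toy assumptions: a model is trained on labeled executions of a hypothetical unit under test and then used to issue pass/fail verdicts by flagging observed outputs that deviate from its prediction.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Labeled executions of the unit under test: inputs and observed-correct outputs.
X_train = np.arange(0, 100).reshape(-1, 1)
y_train = np.array([3 * x + 7 for x in range(100)], dtype=float)

oracle = DecisionTreeRegressor().fit(X_train, y_train)

def verdict(x: float, observed_output: float, tol: float = 5.0) -> str:
    """Pass if the observed output is close to what the learned oracle expects."""
    expected = oracle.predict([[x]])[0]
    return "pass" if abs(observed_output - expected) <= tol else "fail"

print(verdict(42, 3 * 42 + 7))  # pass: matches the learned behavior
print(verdict(42, 999.0))       # fail: deviates from the expected output
```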

7 THREATS TO VALIDITY

This section discusses the main possible threats to the validity of our tertiary study, classifying them according to Petersen et al. [ 77 ] and drawing suggestions from Zhou et al. [ 107 ]. We classified the threats to validity into (i) threats to construct validity, (ii) threats to internal validity, and (iii) threats to external validity.

Threats to Construct Validity. The use of different terminologies for AI and ST concepts in the selected secondary studies can lead to misclassification. As a strategy to mitigate this possible threat, we started from well-known taxonomies for both the AI [ 88 ] and ST [ 18 ] domains. In addition, the process of classifying the extracted data was performed iteratively and peer-reviewed by the authors. Furthermore, relevant concepts emerging from the secondary studies were added to the adopted reference taxonomies, when missing.

Threats to Internal Validity. One of the major issues with systematic mappings is the risk of missing relevant studies. To mitigate this risk, we adopted a structured process to define and validate our search string, as suggested by Petersen et al. [ 77 ], and selected four major digital libraries on which to execute the appropriate queries derived from it. In particular, our search string was designed to retrieve the largest number of published secondary studies by searching for the terms survey, mapping, review, secondary study, or literature analysis in the title or abstract of the papers. Furthermore, a snowball search process was performed to possibly find additional studies of interest. Another possible threat regards our decision to exclude gray literature, such as technical reports and graduate theses, which could lead us to miss relevant secondary studies. However, since we reviewed secondary and not primary studies, the risk of excluding relevant but not peer-reviewed material is low. Biases or errors in the application of the IC and EC, as well as in the quality assessment of papers, are another threat to the validity of our study. We mitigated this threat by having each selected paper examined by two groups of co-authors, each including an AI expert and an ST specialist, and by having any disagreements resolved by face-to-face discussions between the members of the two groups.

Threats to External Validity. Publication bias is another common threat to the validity of secondary and tertiary studies [ 97 ]. In particular, the results of our study might have been biased by inaccurate results reported in the selected secondary studies. A common reason for this is that primary studies with negative results are less likely to be accepted for publication and, as a consequence, to be taken into account by secondary studies, and therefore they do not permeate through to a tertiary study. Another external validity threat for our study relates to the risk of not extracting all the relevant information available in the selected studies, or of incorrectly interpreting the extracted data. Both of these risks may have caused an inaccurate mapping of some analyzed studies. We tried to mitigate this threat by having an AI expert and an ST specialist involved in the data extraction and mapping of each study and by resolving any disagreements in a face-to-face discussion. Our data extraction could also have missed emerging trends reported in recently published primary studies that have not yet been surveyed by any secondary study. Also, since a tertiary study is based on data aggregated in secondary studies, it is possible that relevant information that was present in primary studies was omitted in the secondary studies and thus missed by our study. This threat is inherent to any tertiary study.

8 CONCLUSIONS

The goal of our tertiary study was to systematically understand how AI has been applied to support ST. As a result, we were able to uncover the interplay between the two domains and to reveal trends and possible future research directions. To achieve this goal, we defined nine RQs (five publication space RQs and four research space RQs) and conducted a systematic mapping study. We designed a strict research protocol and followed a systematic and peer-reviewed process to (1) select our sources of information, (2) extract evidence from them, and (3) analyze the extracted data to answer our RQs. Starting from an initial set of 877 secondary studies retrieved from four major computer science digital libraries, plus an additional set of 296 studies retrieved by applying snowballing, the selection process led us to 20 relevant high-quality secondary studies. The analysis of the data extracted from the selected studies allowed us to answer our RQs and derive the following main conclusions.

As for the publication space RQs: (1) the distribution of the selected secondary studies over the publication years (75% of them were published in the past six years), the large number of unique primary studies they surveyed (710), and the distribution of these primary studies over time (the first dating from 1995 and almost two-thirds of them appearing in the past ten years) show a growing interest from the research community in a well-consolidated research topic; (2) most of the selected studies were published in journal venues and a large part of them appeared in top-ranked journals, indicating the high importance of the topic; and (3) most of the authors’ affiliations are located in South America (Brazil, Argentina, and Uruguay), while affiliation countries that typically dominate in computer science or computer engineering publications (e.g., USA and China) do not occur in our observations.

Regarding the research space RQs: (1) several AI domains have been applied to support ST, with Planning being the most popular one, and machine learning and natural language processing the most trending; (2) several ST domains have been supported by AI. Almost all selected secondary studies surveyed the application of AI to the Testing Activity ST domain, and a majority of them surveyed the application of AI to the Testing Technique domain. Overall, it emerges that, in recent years, AI has been pervasively introduced in ST; (3) the majority of the selected secondary studies investigated the application of Planning to support the Testing Activity, which is thus the most surveyed pair of domains; (4) except for Software Testing Fundamentals, all ST domains have received support from more than one AI domain; in particular, Testing Activity and Testing Objective have seen applications from all AI domains. Similarly, by analyzing our mapping at a finer-grained level, it emerges that most ST fields have received support from more than one AI concept, with some concepts having been applied only recently (e.g., word embedding); and (5) the most frequent future research directions emerging from the selected secondary studies are (i) the need for more rigorous research, (ii) the evaluation of the proposals on larger or real-world software systems, (iii) more research to evaluate how machine learning approaches can be applied to support software testing automation, and (iv) the need for new types of data representations to apply genetic algorithms for test data generation.

To the best of our knowledge, this research is the first tertiary study investigating how AI is used to support ST. As a result of this research, we obtained a fine-grained mapping that describes the current interplay between AI and ST. Researchers can leverage this mapping to identify opportunities for future research on new secondary studies to be conducted or new applications of AI to ST to be developed. Practitioners can also use the mapping to take an informed decision on which AI technology to possibly adopt in support of their testing processes.

1 The European Commission knowledge service to monitor the development, uptake, and impact of artificial intelligence for Europe.

2 W3C https://www.w3.org/

3 SEMANTIC WEB-W3C https://www.w3.org/standards/semanticweb/

4 Control papers are used to calibrate the search string by representing, to the best of the research team’s knowledge, the characteristics of the “ideal” type of research publication that the team is looking for. Both the Kitchenham [ 52 ] and Petersen [ 77 ] guidelines highlight the need to check trial search strings against a list of already known studies. We used the Scopus digital library for the search string validation.

5 http://dl.acm.org/

6 http://ieeexplore.ieee.org/

7 https://www.webofknowledge.com/

8 http://www.scopus.com/

9 No time limits were considered in the search.

10 https://scholar.google.com/

11 https://www.scimagojr.com/

12 http://portal.core.edu.au/conf-ranks/

13 https://www.journals.elsevier.com/information-and-software-technology

14 https://www.natureindex.com/annual-tables/2021/country/all

15 We can assume that 710 is only a part of the primary works in the literature on the subject, and thus that the number of published primary studies is even higher.

16 Rishi Bommasani and Percy Liang (Oct. 2021), Reflections on foundation models: https://thegradient.pub/reflections-on-foundation-models/

17 Search performed on June 16, 2022.

18 Due to space constraints, we limit the discussion to Testing Activities that have been supported by more than three AI techniques.

  • Reference 1
  • Reference 2
  • Reference 3
  • Reference 4
  • Reference 5
  • Reference 6
  • Reference 7
  • Reference 8
  • Reference 9
  • Reference 10
  • Reference 11
  • Reference 12
  • Reference 13
  • Reference 14
  • Reference 15
  • Reference 16

Index Terms

Computing methodologies

  • Artificial intelligence

Machine learning

General and reference

Document types

Surveys and overviews

Software and its engineering

Software creation and management

Software verification and validation

Software defect analysis

Software testing and debugging

Recommendations

Contemporary challenges and solutions in applied artificial intelligence, artificial intelligence and software engineering: understanding the promise of the future, artificial intelligence and software engineering, login options.

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

  • Information
  • Contributors

Published in

cover image ACM Computing Surveys

University of Sydney, Australia

Copyright © 2023 Copyright held by the owner/author(s).

This work is licensed under a Creative Commons Attribution International 4.0 License.

In-Cooperation

Association for Computing Machinery

New York, NY, United States

Publication History

  • Published: 6 October 2023
  • Online AM: 17 August 2023
  • Accepted: 3 August 2023
  • Revised: 25 May 2023
  • Received: 19 July 2022

Check for updates

Author tags.

  • Software testing
  • Tertiary study
  • Systematic literature review
  • Systematic mapping study

Funding Sources

Other metrics.

  • Bibliometrics
  • Citations 0

Article Metrics

  • 0 Total Citations View Citations
  • 4,585 Total Downloads
  • Downloads (Last 12 months) 4,585
  • Downloads (Last 6 weeks) 654

This publication has not been cited yet

View or Download as a PDF file.

View online with eReader.

Digital Edition

View this article in digital edition.

Share this Publication link

https://dl.acm.org/doi/10.1145/3616372

Share on Social Media

  • 0 References

Export Citations

  • Please download or close your previous search result export first before starting a new bulk export. Preview is not available. By clicking download, a status dialog will open to start the export process. The process may take a few minutes but once it finishes a file will be downloadable from your browser. You may continue to browse the DL while the export process is in progress. Download
  • Download citation
  • Copy citation

We are preparing your search results for download ...

We will inform you here when the file is ready.

Your file of search results citations is now ready.

Your search export query has expired. Please try again.

Help | Advanced Search

Computer Science > Software Engineering

Title: software testing for machine learning.

Abstract: Machine learning has become prevalent across a wide variety of applications. Unfortunately, machine learning has also shown to be susceptible to deception, leading to errors, and even fatal failures. This circumstance calls into question the widespread use of machine learning, especially in safety-critical applications, unless we are able to assure its correctness and trustworthiness properties. Software verification and testing are established technique for assuring such properties, for example by detecting errors. However, software testing challenges for machine learning are vast and profuse - yet critical to address. This summary talk discusses the current state-of-the-art of software testing for machine learning. More specifically, it discusses six key challenge areas for software testing of machine learning systems, examines current approaches to these challenges and highlights their limitations. The paper provides a research agenda with elaborated directions for making progress toward advancing the state-of-the-art on testing of machine learning.
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as: [cs.SE]
  (or [cs.SE] for this version)
  Focus to learn more arXiv-issued DOI via DataCite
Journal reference: Proceedings of the AAAI Conference on Artificial Intelligence, 34(09), 13576-13582 (2020)
: Focus to learn more DOI(s) linking to related resources

Submission history

Access paper:.

  • Other Formats

license icon

References & Citations

  • Google Scholar
  • Semantic Scholar

BibTeX formatted citation

BibSonomy logo

Bibliographic and Citation Tools

Code, data and media associated with this article, recommenders and search tools.

  • Institution

arXivLabs: experimental projects with community collaborators

arXivLabs is a framework that allows collaborators to develop and share new arXiv features directly on our website.

Both individuals and organizations that work with arXivLabs have embraced and accepted our values of openness, community, excellence, and user data privacy. arXiv is committed to these values and only works with partners that adhere to them.

Have an idea for a project that will add value for arXiv's community? Learn more about arXivLabs .

Information

  • Author Services

Initiatives

You are accessing a machine-readable page. In order to be human-readable, please install an RSS reader.

All articles published by MDPI are made immediately available worldwide under an open access license. No special permission is required to reuse all or part of the article published by MDPI, including figures and tables. For articles published under an open access Creative Common CC BY license, any part of the article may be reused without permission provided that the original article is clearly cited. For more information, please refer to https://www.mdpi.com/openaccess .

Feature papers represent the most advanced research with significant potential for high impact in the field. A Feature Paper should be a substantial original Article that involves several techniques or approaches, provides an outlook for future research directions and describes possible research applications.

Feature papers are submitted upon individual invitation or recommendation by the scientific editors and must receive positive feedback from the reviewers.

Editor’s Choice articles are based on recommendations by the scientific editors of MDPI journals from around the world. Editors select a small number of articles recently published in the journal that they believe will be particularly interesting to readers, or important in the respective research area. The aim is to provide a snapshot of some of the most exciting work published in the various research areas of the journal.

Original Submission Date Received: .

  • Active Journals
  • Find a Journal
  • Proceedings Series
  • For Authors
  • For Reviewers
  • For Editors
  • For Librarians
  • For Publishers
  • For Societies
  • For Conference Organizers
  • Open Access Policy
  • Institutional Open Access Program
  • Special Issues Guidelines
  • Editorial Process
  • Research and Publication Ethics
  • Article Processing Charges
  • Testimonials
  • Preprints.org
  • SciProfiles
  • Encyclopedia

electronics-logo

Article Menu

research paper for software testing

  • Subscribe SciFeed
  • Recommended Articles
  • Google Scholar
  • on Google Scholar
  • Table of Contents

Find support for a specific problem in the support section of our website.

Please let us know what you think of our products and services.

Visit our dedicated information section to learn more about MDPI.

JSmol Viewer

A comprehensive bibliometric assessment on software testing (2016–2021).

research paper for software testing

1. Introduction

2. related work, 3. methodology, 3.1. creation of two distinguished datasets for two different time spans, 3.2. research questions for the analysis of datasets, 4. research findings, 4.1. year-wise scientific production, 4.2. top 20 publication venues, 4.3. types of documents, 4.4. top 20 web of science categories based on the publications count, 4.5. top 20 research areas in accordance with the record count of publications, 4.6. leading 20 institutions/organizations based on the frequency of publications, 4.7. the top 20 most actively contributing countries based on the frequency of publications, 4.8. continent-wise research contribution, 4.9. language of the publications, 4.10. collaboration network amongst countries, 4.11. correlation of documents on the basis of co-words, 4.12. research themes/topics, 5. future work and limitations of the research study, 5.1. future work, 5.2. limitations of the study.

  • Limited Time Frame: We have included the research publications for the six-year timeframes of the WoS database 2016–2021. Therefore, the paper does not include the research studies for the time duration before 2016.
  • Limitations of sub-domain of SE: We have a limited or bibliometric assessment on Software Testing only. However, there are many other sub-domains of Software Engineering that need to be analyzed in future works.
  • Use of ISI Web of Science (WoS): We have used one of the most commonly used and highly privileged databases, which is ISI Web of Science. Other databases can also be used.
  • Twelve research questions: Analysis on the basis of 12 research questions can be enhanced to include other bibliometric assessment parameters.

6. Conclusions

Author contributions, conflicts of interest.

  • Garousi, V. A bibliometric analysis of the Turkish software engineering research community. Scientometrics 2015 , 105 , 23–49. [ Google Scholar ] [ CrossRef ]
  • Galler, B.A. ACM president’s letter: NATO and software engineering? Commun. ACM 1969 , 12 , 301. [ Google Scholar ] [ CrossRef ]
  • Johnson, P.; Ekstedt MJacobson, I. Where’s the theory for software engineering? IEEE Softw. 2012 , 29 , 96. [ Google Scholar ] [ CrossRef ]
  • Alam, S.; Zardari, S.; Bano, M. Software engineering and 12 prominent sub-areas: Comprehensive bibliometric assessment on 13 years (2007–2019). IET Softw. 2021 , 16 , 125–145. [ Google Scholar ] [ CrossRef ]
  • Roger, S.P.; Bruce, R.M. Software Engineering: A Practitioner’s Approach ; McGraw-Hill Education: New York, NY, USA, 2005. [ Google Scholar ]
  • Wasserman, A.I. Software engineering issues for mobile application development. In Proceedings of the FSE/SDP Workshop on Future of Software Engineering Research, Santa Fe, NM, USA, 7–8 November 2010; Association for Computing Machinery: New York, NY, USA, 2010; pp. 397–400. [ Google Scholar ]
  • Gregg, D.G.; Kulkarni, U.R.; Vinzé, A.S. Understanding the Philosophical Underpinnings of Software Engineering Research in Information Systems. Inf. Syst. Front. 2001 , 3 , 169–183. [ Google Scholar ] [ CrossRef ]
  • Singh, S.K.; Singh, A. Software Testing; Vandana Publications: Lucknow, India, 2012.
  • Garousi, V.; Zhi, J. A survey of software testing practices in Canada. J. Syst. Softw. 2013, 86, 1354–1376.
  • Jindal, T. Importance of Testing in SDLC. Int. J. Eng. Appl. Comput. Sci. 2016, 1, 54–56.
  • Tan, T.B.; Cheng, W.K. Software testing levels in internet of things (IoT) architecture. In International Computer Symposium; Springer: Singapore, 2018; pp. 385–390.
  • Hamza, Z.; Hammad, M. Testing Approaches for Web and Mobile Applications: An Overview. Int. J. Comput. Digit. Syst. 2020, 9, 657–664.
  • Chauhan, R.K.; Singh, I. Latest research and development on software testing techniques and tools. Int. J. Curr. Eng. Technol. 2014, 4, 2368–2372.
  • Jayakumar, A.V.; Gautham, S.; Kuhn, R.; Simons, B.; Collins, A.; Dirsch, T.; Kacker, R.; Elks, C. Systematic software testing of critical embedded digital devices in nuclear power applications. In Proceedings of the 2020 IEEE International Symposium on Software Reliability Engineering Workshops (ISSREW), Coimbra, Portugal, 12–15 October 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 85–90.
  • Nurul, M.; Quadri, S.M.K. Software Testing Approach for Cloud Applications (STACA)–Methodology, Techniques & Tools. In Proceedings of the 2019 9th International Conference on Cloud Computing, Data Science & Engineering (Confluence), Noida, India, 10–11 January 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 19–25.
  • Sanchez-Gomez, N.; Torres-Valderrama, J.; Garcia-Garcia, J.A.; Gutierrez, J.J.; Escalona, M.J. Model-Based Software Design and Testing in Blockchain Smart Contracts: A Systematic Literature Review. IEEE Access 2020, 8, 164556–164569.
  • Murad, G.; Badarneh, A.; Qusef, A.; Almasalha, F. Software testing techniques in IoT. In Proceedings of the 2018 8th International Conference on Computer Science and Information Technology (CSIT), Amman, Jordan, 11–12 July 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 17–21.
  • Górski, T. The 1+5 Architectural Views Model in Designing Blockchain and IT System Integration Solutions. Symmetry 2021, 13, 2000.
  • Shahin, M.; Babar, M.A.; Zhu, L. Continuous Integration, Delivery and Deployment: A Systematic Review on Approaches, Tools, Challenges and Practices. IEEE Access 2017, 5, 3909–3943.
  • Górski, T. Continuous Delivery of Blockchain Distributed Applications. Sensors 2021, 22, 128.
  • Schermann, G.; Schöni, D.; Leitner, P.; Gall, H.C. Bifrost: Supporting continuous deployment with automated enactment of multi-phase live testing strategies. In Proceedings of the 17th International Middleware Conference, Trento, Italy, 12–16 December 2016; pp. 1–14.
  • Merigó, J.M.; Yang, J.-B. A bibliometric analysis of operations research and management science. Omega 2017, 73, 37–48.
  • Alam, S.; Zardari, S.; Shamsi, J. Comprehensive three-phase bibliometric assessment on the blockchain (2012–2020). Libr. Hi Tech 2022.
  • Tse, T.; Chen, T.; Glass, R.L. An assessment of systems and software engineering scholars and institutions (2000–2004). J. Syst. Softw. 2006, 79, 816–819.
  • Wohlin, C. An analysis of the most cited articles in software engineering journals—1999. Inf. Softw. Technol. 2005, 47, 957–964.
  • Wong, W.E.; Tse, T.; Glass, R.L.; Basili, V.R.; Chen, T. An assessment of systems and software engineering scholars and institutions (2002–2006). J. Syst. Softw. 2009, 82, 1370–1373.
  • Hamadicharef, B. Scientometric study of the IEEE Transactions on Software Engineering 1980–2010. In Proceedings of the 2011 2nd International Congress on Computer Applications and Computational Science, Bali, Indonesia, 15–17 November 2011; Springer: Berlin/Heidelberg, Germany, 2012; pp. 101–106.
  • Freitas, F.G.D.; Souza, J.T.D. Ten years of search based software engineering: A bibliometric analysis. In International Symposium on Search Based Software Engineering; Springer: Berlin/Heidelberg, Germany, 2011; pp. 18–32.
  • Garousi, V.; Mäntylä, M.V. Citations, research topics and active countries in software engineering: A bibliometrics study. Comput. Sci. Rev. 2016, 19, 56–77.
  • Karanatsiou, D.; Li, Y.; Arvanitou, E.-M.; Misirlis, N.; Wong, W.E. A bibliometric assessment of software engineering scholars and institutions (2010–2017). J. Syst. Softw. 2018, 147, 246–261.
  • Almaliki, M. Software Engineering in Saudi Arabia: A Bibliometric Assessment. IEEE Access 2021, 9, 17245–17255.
  • Wong, W.E.; Mittas, N.; Arvanitou, E.M.; Li, Y. A bibliometric assessment of software engineering themes, scholars and institutions (2013–2020). J. Syst. Softw. 2021, 180, 111029.
  • Mikki, S. Comparing Google Scholar and ISI Web of Science for Earth Sciences. Scientometrics 2009, 82, 321–331.
  • Van Eck, N.J.; Waltman, L. VOSviewer Manual; Universiteit Leiden: Leiden, The Netherlands, 2013; Volume 1, pp. 1–53.
  • Ravikumar, S.; Agrahari, A.; Singh, S.N. Mapping the intellectual structure of scientometrics: A co-word analysis of the journal Scientometrics (2005–2010). Scientometrics 2014, 102, 929–955.
  • Cobo, M.J.; López-Herrera, A.G.; Herrera-Viedma, E.; Herrera, F. Science mapping software tools: Review, analysis, and cooperative study among tools. J. Am. Soc. Inf. Sci. Technol. 2011, 62, 1382–1402.
  • Wang, J.; Li, X.; Wang, P.; Liu, Q. Bibliometric analysis of digital twin literature: A review of influencing factors and conceptual structure. Technol. Anal. Strat. Manag. 2022, 1–15.
Ref. | Time Duration | Data Sources | Parameters Analyzed
[ ] | 2000–2004 | WoS | Top scholars, top institutions, systems and software engineering, and research publications
[ ] | 1986–2005 | WoS | Author analysis for scholarly publications and presentation of the 20 most cited articles
[ ] | 2002–2006 | WoS | Survey of publications in the field of SE, top institutional analysis, annual publication trend, and research topics
[ ] | 1980–2010 | WoS | Scientometric study on IEEE Transactions on Software Engineering (analysis of authors, citations and keywords, collaboration networks of authors and countries)
[ ] | 2001–2010 | SBSE (Search-Based Software Engineering) venues | Authorship pattern and publication sources; analysis covering 740 SBSE publications
[ ] | 1972–2013 | Scopus | Publication rate of SE papers, citation analysis, thematic and topic analysis, country-wise research publication trend
[ ] | 2010–2017 | Google Scholar and selected publication venues | Analysis of research topics, institutions, and scholars
[ ] | 2007–2019 | WoS | Types of documents, annual scientific publications, current research areas, co-word analysis, country collaboration
[ ] | 1984–2019 | Scopus | Publication rate, subject areas, actively participating institutions, researcher participation, collaboration network analysis between the international and Saudi Arabian SE communities, assessment of citation trend
[ ] | 2013–2020 | Selected publication venues | Analysis of research topics, institutions, and scholars
The Significant Contribution of Our Research Study
In our research study, we evaluated a dataset collected from the Web of Science (WoS) over two distinct time frames to show how various bibliometric aspects of research in the Software Testing (ST) field have changed. The two equal-length review timelines are 2016–2018 and 2019–2021.
Our research study presents the top 20 countries by number of publications, showing which countries are progressing effectively and contributing the most to the field.
We detail the research collaboration relations among the top 20 countries; this parameter helps in analyzing the importance of collaboration for advancing research.
A map-based representation depicting continent-wise research contribution, in terms of publications, is another aspect of our research study.
The study also presents an analysis of the co-words that appear across articles; keywords play an important role in providing the basis for evaluating research topics/themes (a small counting sketch follows this list of contributions).
Our research work presents the top 20 most active institutions/organizations with respect to the number of publications; the record count of publications serves as a measure of research output and exhibits the progress of the various institutions/organizations.
Our research work presents emerging research topics/themes with respect to Software Testing, including a topic dendrogram.
Our paper includes findings on the basis of the top 20 WoS categories. This reflects the diversity of ST, as WoS categories are journal-based and each WoS category is mapped to research areas.
We also present the top 20 languages used as the medium for publication in the field. Although English is by far the most commonly used language for writing articles, other languages also contribute; this encourages non-English writers to make effective and valuable research contributions in the language in which they are most fluent.
Our work includes findings based on cross-disciplinary research areas, affirming that the impact of ST goes beyond Computer Science and Software Engineering.
Finally, we identify the top 20 most relevant resources (publication venues) in the field of ST.
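As a concrete illustration of this kind of bibliometric counting, the sketch below derives per-language and per-country publication counts of the kind tabulated later in this section from a tab-delimited WoS export. It is a minimal sketch, not the authors' actual pipeline: the file name savedrecs.txt is hypothetical, and we assume the standard WoS tab-delimited field tags LA (language) and C1 (author addresses).

```python
# Minimal sketch: per-language and per-country publication counts from a
# tab-delimited Web of Science export. Illustrative only -- the file name
# "savedrecs.txt" is hypothetical; the LA (language) and C1 (author
# addresses) column tags follow the standard WoS export format.
import csv
from collections import Counter

language_counts = Counter()
country_counts = Counter()

with open("savedrecs.txt", encoding="utf-8-sig", newline="") as fh:
    for record in csv.DictReader(fh, delimiter="\t"):
        # LA holds the publication language, e.g. "English".
        language_counts[record.get("LA", "").strip() or "Unknown"] += 1
        # C1 holds semicolon-separated author addresses; the country is
        # the last comma-separated token of each address. Use a set so a
        # country is counted once per record, not once per author.
        countries = {addr.rsplit(",", 1)[-1].strip().rstrip(".")
                     for addr in record.get("C1", "").split(";") if "," in addr}
        for country in countries:
            country_counts[country] += 1

total = sum(language_counts.values())
for language, n in language_counts.most_common(20):
    print(f"{language}\t{n}\t{100 * n / total:.3f}%")
```

Percentages of the form reported in the tables below (e.g., English at 97.187% of 35,161 records) fall out directly from counts like these.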
Insights | Research Questions
Annual research publication | Q1. What is the frequency of year-wise research publications?
Publication venues | Q2. What are the top 20 publication venues (publication resources) in terms of publication count?
Types of publications | Q3. What are the various types of documents present in the datasets?
Types of WoS categories | Q4. What are the 20 leading WoS categories?
Types of research areas | Q5. Which research areas constitute the top 20 research areas for Software Testing?
Research contribution of institutions/organizations | Q6. What are the leading 20 institutions/organizations based on the frequency of publications?
Research contribution of countries | Q7. What are the top 20 countries in terms of the frequency of publications?
Continent-wise research contribution | Q8. What is each continent's research participation in terms of publications?
Types of languages | Q9. What is the research contribution of different languages, as per published scholarly works, from the Software Testing aspect?
Research collaboration amongst countries | Q10. Which of the top 20 countries have the biggest research collaboration networks?
Relation amongst documents | Q11. How are documents correlated on the basis of co-words?
Research topics/themes | Q12. What are the associated research topics/themes?
Web of Science Categories | Record Count | % of 35,161
Electrical Engineering | 6382 | 18.151
Computer Science Theory and Methods | 3493 | 9.934
Computer Science Software Engineering | 2995 | 8.518
Computer Science Information Systems | 2201 | 6.260
Computer Science Interdisciplinary Applications | 1687 | 4.798
Computer Science Artificial Intelligence | 1638 | 4.659
Telecommunications | 1624 | 4.619
Mechanical Engineering | 1504 | 4.277
Multidisciplinary Engineering | 1322 | 3.760
Multidisciplinary Materials Science | 1310 | 3.726
Energy Fuels | 1235 | 3.512
Automation Control Systems | 1152 | 3.276
Civil Engineering | 1108 | 3.151
Multidisciplinary Sciences | 938 | 2.688
General Internal Medicine | 916 | 2.605
Applied Physics | 875 | 2.489
Educational Research | 827 | 2.352
Computer Science Hardware Architecture | 784 | 2.230
Instrumentation | 772 | 2.196
Radiology Nuclear Medical Imaging | 740 | 2.105
Web of Science Categories | Record Count | % of 39,937
Electrical Engineering | 5147 | 12.888
Computer Science Information Systems | 2859 | 7.159
Computer Science Software Engineering | 2829 | 7.084
Computer Science Theory and Methods | 2779 | 6.958
Materials Science: Multidisciplinary | 2023 | 5.065
Telecommunications | 1909 | 4.78
Multidisciplinary Engineering | 1656 | 4.147
Computer Science Interdisciplinary Applications | 1621 | 4.059
Computer Science Artificial Intelligence | 1575 | 3.944
Civil Engineering | 1539 | 3.854
Mechanical Engineering | 1404 | 3.516
General Internal Medicine | 1375 | 3.443
Applied Physics | 1238 | 3.1
Energy Fuels | 1234 | 3.09
Multidisciplinary Sciences | 1200 | 3.005
Environmental Sciences | 1081 | 2.707
Instrumentation | 980 | 2.454
Dentistry and Oral Surgery Medicine | 939 | 2.351
Radiology Nuclear Medical Imaging | 934 | 2.339
Automation Control Systems | 894 | 2.239
Research Areas | Record Count | % of 35,161
Engineering | 12,065 | 34.314
Computer Science | 8921 | 25.372
Materials Science | 1839 | 5.230
Telecommunications | 1624 | 4.619
Science and Technology: Other Topics | 1531 | 4.354
Physics | 1383 | 3.933
Energy Fuels | 1235 | 3.512
Automation Control Systems | 1152 | 3.276
Educational Research | 1129 | 3.211
General Internal Medicine | 950 | 2.702
Environmental Sciences and Ecology | 880 | 2.503
Chemistry | 839 | 2.386
Instrumentation | 772 | 2.196
Biochemistry and Molecular Biology | 742 | 2.110
Radiology Nuclear Medical Imaging | 740 | 2.105
Optics | 739 | 2.102
Dentistry and Oral Surgery Medicine | 687 | 1.954
Mathematics | 670 | 1.906
Business Economics | 607 | 1.726
Construction Technology | 596 | 1.695
Research Areas | Record Count | % of 39,937
Engineering | 11,717 | 29.339
Computer Science | 8622 | 21.589
Materials Science | 2617 | 6.553
Science and Technology: Other Topics | 1972 | 4.938
Telecommunications | 1909 | 4.78
Physics | 1797 | 4.5
Chemistry | 1743 | 4.364
General Internal Medicine | 1505 | 3.768
Environmental Sciences and Ecology | 1358 | 3.4
Energy Fuels | 1234 | 3.09
Educational Research | 1064 | 2.664
Instrumentation | 980 | 2.454
Dentistry and Oral Surgery Medicine | 939 | 2.351
Radiology Nuclear Medical Imaging | 934 | 2.339
Automation Control Systems | 894 | 2.239
Public Environmental Occupational Health | 810 | 2.028
Pharmacology | 807 | 2.021
Business and Economics | 805 | 2.016
Biochemistry and Molecular Biology | 804 | 2.013
Mathematics | 801 | 2.006
Affiliations | Countries | Record Count | % of 35,161
Islamic Azad University | Iran | 500 | 1.422
University of California System | USA | 447 | 1.271
Chinese Academy of Sciences CAS | China | 417 | 1.186
Udice French Research Universities | France | 408 | 1.160
Centre National De La Recherche Scientifique CNRS | France | 391 | 1.112
University of Texas System | USA | 265 | 0.754
University of London | UK | 260 | 0.739
United States Department of Energy DoE | USA | 250 | 0.711
Indian Institute of Technology System IIT System | India | 247 | 0.702
Universidade De Sao Paulo | Brazil | 240 | 0.683
Russian Academy of Sciences | Russia | 214 | 0.609
Helmholtz Association | Germany | 209 | 0.594
Harvard University | USA | 198 | 0.563
National Institute of Technology NIT System | India | 195 | 0.555
State University System of Florida | USA | 188 | 0.535
University College London | UK | 175 | 0.498
Tehran University of Medical Sciences | Iran | 174 | 0.495
Beihang University | China | 170 | 0.483
University of North Carolina | USA | 159 | 0.452
Pennsylvania Commonwealth System of Higher Education PCSHE | USA | 154 | 0.438
Affiliations | Countries | Record Count | % of 39,937
Islamic Azad University | Iran | 506 | 1.267
University of California System | USA | 501 | 1.254
Chinese Academy of Sciences | China | 482 | 1.207
Centre National De La Recherche Scientifique CNRS | France | 449 | 1.124
Udice French Research Universities | France | 431 | 1.079
University of London | UK | 293 | 0.734
University of Texas System | USA | 287 | 0.719
Indian Institute of Technology System IIT System | India | 270 | 0.676
United States Department of Energy DoE | USA | 258 | 0.646
National Institute of Technology NIT System | India | 254 | 0.636
Universidade De Sao Paulo | Brazil | 251 | 0.628
Russian Academy of Sciences | Russia | 240 | 0.601
State University System of Florida | USA | 234 | 0.586
Tehran University of Medical Sciences | Iran | 225 | 0.563
Harvard University | USA | 219 | 0.548
Helmholtz Association | Germany | 218 | 0.546
Ministry of Education Science of Ukraine | Ukraine | 212 | 0.531
Pennsylvania Commonwealth System of Higher Education PCSHE | USA | 194 | 0.486
University of Chinese Academy of Sciences CAS | China | 193 | 0.483
Shahid Beheshti University Medical Sciences | Iran | 173 | 0.433
Countries/Regions | Record Count | % of 35,161
USA | 6063 | 17.244
People's Republic of China | 5885 | 16.737
India | 2380 | 6.769
Iran | 2135 | 6.072
Germany | 1993 | 5.668
Italy | 1782 | 5.068
United Kingdom | 1529 | 4.349
Brazil | 1365 | 3.882
Spain | 1234 | 3.510
France | 1160 | 3.299
Canada | 1062 | 3.020
Russia | 974 | 2.770
Poland | 885 | 2.517
Turkey | 882 | 2.508
Australia | 820 | 2.332
Malaysia | 708 | 2.014
South Korea | 644 | 1.832
Netherlands | 634 | 1.803
Japan | 616 | 1.752
Indonesia | 542 | 1.541
Countries/Regions | Record Count | % of 39,937
People's Republic of China | 7581 | 18.982
USA | 6355 | 15.913
India | 2943 | 7.369
Iran | 2690 | 6.736
Germany | 2089 | 5.231
Italy | 1940 | 4.858
United Kingdom | 1731 | 4.334
Brazil | 1568 | 3.926
Spain | 1434 | 3.591
Canada | 1216 | 3.045
Russia | 1127 | 2.822
France | 1122 | 2.809
Australia | 1093 | 2.737
Turkey | 1090 | 2.729
Poland | 945 | 2.366
South Korea | 833 | 2.086
Saudi Arabia | 715 | 1.790
Japan | 709 | 1.775
Malaysia | 689 | 1.725
Netherlands | 663 | 1.660
Languages | Record Count | % of 35,161
English | 34,172 | 97.187
Spanish | 222 | 0.631
Portuguese | 165 | 0.469
Chinese | 150 | 0.427
Russian | 120 | 0.341
Turkish | 86 | 0.245
German | 55 | 0.156
French | 38 | 0.108
Korean | 29 | 0.082
Arabic | 23 | 0.065
Polish | 20 | 0.057
Persian | 18 | 0.051
Italian | 11 | 0.031
Ukrainian | 11 | 0.031
Slovenian | 8 | 0.023
Czech | 7 | 0.02
Hungarian | 6 | 0.017
Slovak | 6 | 0.017
Croatian | 5 | 0.014
Malay | 4 | 0.011
Bulgarian | 2 | 0.006
Japanese | 2 | 0.006
Languages | Record Count | % of 39,937
English | 38,975 | 97.591
Spanish | 211 | 0.528
Chinese | 201 | 0.503
Russian | 156 | 0.391
Portuguese | 133 | 0.333
Turkish | 60 | 0.15
German | 44 | 0.11
French | 37 | 0.093
Korean | 26 | 0.065
Ukrainian | 21 | 0.053
Polish | 17 | 0.043
Italian | 9 | 0.023
Hungarian | 7 | 0.018
Persian | 6 | 0.015
Czech | 5 | 0.013
Japanese | 5 | 0.013
Arabic | 4 | 0.01
Croatian | 2 | 0.005
Malay | 2 | 0.005
Slovenian | 2 | 0.005
Welsh | 2 | 0.005
Keywords | Occurrences
Behavior | 647
Design | 864
Model | 1022
Optimization | 497
Performance | 897
Simulation | 816
System | 734
Systems | 462
Classification | 368
Identification | 438
Models | 362
Prediction | 411
Software | 1438
Validation | 411
Children | 393
Diagnosis | 339
Impact | 428
Management | 464
Prevalence | 410
Risk | 388
Keywords | Occurrences
Behavior | 1083
Design | 1181
Model | 1345
Optimization | 846
Performance | 1403
Simulation | 991
System | 861
Classification | 554
Identification | 532
Machine learning | 757
Prediction | 631
Reliability | 493
Software | 1852
Validation | 533
Diagnosis | 500
Impact | 813
Management | 682
Meta-analysis | 536
Prevalence | 702
Risk | 596
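The tables above report raw keyword occurrence counts; a co-word analysis additionally asks how often two keywords occur together in the same article, which is the kind of pair count that clustering tools such as VOSviewer build their maps from. Below is a minimal sketch of that pairing step; the `articles` sample is invented for illustration, and in practice the keyword lists would come from the keyword fields of a WoS export.

```python
# Minimal co-word sketch: count undirected keyword pairs that co-occur in
# the same article. The `articles` sample is invented for illustration.
from collections import Counter
from itertools import combinations

articles = [
    ["software", "testing", "model"],
    ["software", "machine learning", "prediction"],
    ["testing", "model", "validation"],
]

pair_counts = Counter()
for keywords in articles:
    # Sorting makes the pair undirected: ("model", "software") and
    # ("software", "model") collapse into a single key.
    for pair in combinations(sorted(set(keywords)), 2):
        pair_counts[pair] += 1

for (kw1, kw2), n in pair_counts.most_common(5):
    print(f"{kw1} -- {kw2}: {n}")
```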
Share and Cite

Zardari, S.; Alam, S.; Al Salem, H.A.; Al Reshan, M.S.; Shaikh, A.; Malik, A.F.K.; Masood ur Rehman, M.; Mouratidis, H. A Comprehensive Bibliometric Assessment on Software Testing (2016–2021). Electronics 2022, 11, 1984. https://doi.org/10.3390/electronics11131984


Software Testing Research and Practice

  • Conference paper
  • First Online: 01 January 2003

  • Antonia Bertolino

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2589)

Included in the following conference series:

  • International Workshop on Abstract State Machines


The paper attempts to provide a comprehensive view of the field of software testing. The objective is to put all the relevant issues into a unified context, although admittedly the overview is biased towards my own research and expertise. In view of the vastness of the field, for each topic problems and approaches are only briefly tackled, with appropriate references provided to dive into them. I do not mean to give here a complete survey of software testing. Rather I intend to show how an unwieldy mix of theoretical and technical problems challenge software testers, and that a large gap exists between the state of the art and of the practice.

References

1. Adams, E.: Optimizing Preventive Service of Software Products. IBM J. of Research and Development 28 (1) (1984) 2–14
2. Agedis: Automated Generation and Execution of Test Suites for DIstributed Component-based Software. <http://www.agedis.de/index.shtml>
3. Bache, R., Müllerburg, M.: Measures of Testability as a Basis for Quality Assurance. Software Engineering Journal 5 (March 1990) 86–92
4. Baresi, L., Young, M.: Test Oracles. Tech. Report CIS-TR-01-02. Online at <http://www.cs.uoregon.edu/~michal/pubs/oracles.html>
5. Basanieri, F., Bertolino, A., Marchetti, E.: The Cow Suite Approach to Planning and Deriving Test Suites in UML Projects. Proc. 5th Int. Conf. UML 2002, Dresden, Germany. LNCS 2460 (2002) 383–397
6. Basili, V.R., Selby, R.W.: Comparing the Effectiveness of Software Testing Strategies. IEEE Trans. Software Eng. 13 (12) (1987) 1278–1296
7. Bernot, G., Gaudel, M.C., Marre, B.: Software Testing Based On Formal Specifications: a Theory and a Tool. Software Eng. Journal 6 (1991) 387–405
8. Bertolino, A.: ISSTA 2002 Panel: Is ISSTA Research Relevant to Industrial Users? In [36] 201–202 (see also the following entries, 203–209)
9. Bertolino, A.: Knowledge Area Description of Software Testing. Guide to the SWEBOK, Joint IEEE-ACM Software Engineering Coordinating Committee (2001). Online at: <http://www.swebok.org>
10. Bertolino, A., Corradini, F., Inverardi, P., Muccini, H.: Deriving Test Plans from Architectural Descriptions. Proc. Int. ACM Conf. on Soft. Eng. (2000) 220–229
11. Bertolino, A., Inverardi, P., Muccini, H.: An Explorative Journey from Architectural Tests Definition down to Code Tests Execution. Proc. Int. Conf. on Soft. Eng. (2001) 211–220
12. Bertolino, A., Marré, M.: A General Path Generation Algorithm for Coverage Testing. Proc. 10th Int. Soft. Quality Week, San Francisco, CA (1997) pap. 2T1
13. Bertolino, A., Polini, A.: Re-thinking the Development Process of Component-Based Software. ECBS 2002 Workshop on CBSE, Lund, Sweden (2002)
14. Bertolino, A., Polini, A.: WCT: a Wrapper for Component Testing. Proc. Fidji'2002, Luxembourg (to appear) (2002)
15. Bertolino, A., Strigini, L.: On the Use of Testability Measures for Dependability Assessment. IEEE Trans. Software Eng. 22 (2) (1996) 97–108
16. Binder, R.V.: Testing Object-Oriented Systems: Models, Patterns, and Tools. Addison-Wesley (2000)
17. Bochmann, G.V., Petrenko, A.: Protocol Testing: Review of Methods and Relevance for Software Testing. Proc. Int. Symp. on Soft. Testing and Analysis (ISSTA), Seattle (1994) 109–124
18. Börger, E.: The Origins and the Development of the ASM Method for High Level System Design and Analysis. J. of Universal Computer Science 8 (1) (2002) 2–74. Online at: <http://www.jucs.org/jucs_8_1/the_origins_and_the>
19. Briand, L., Labiche, Y.: A UML-Based Approach to System Testing. Software and Systems Modeling 1 (1) (2002) 10–42
20. Brinksma, E., Tretmans, J.: Testing Transition Systems: An Annotated Bibliography. Proc. of MOVEP'2k, Nantes (2000) 44–50
21. Carver, R.H., Tai, K.-C.: Use of Sequencing Constraints for Specification-Based Testing of Concurrent Programs. IEEE Trans. on Soft. Eng. 24 (6) (1998) 471–490
22. Choi, J.-D., Zeller, A.: Isolating Failure-Inducing Thread Schedules. In [36] 210–220
23. Coward, P.D.: Symbolic Execution Systems: A Review. Software Eng. J. (1988) 229–239
24. Clarke, L.A., Richardson, D.J.: Applications of Symbolic Evaluation. The J. of Systems and Software 5 (1985) 15–35
25. Crnkovic, I.: Component-based Software Engineering: New Challenges in Software Development. John Wiley & Sons (2001)
26. DeMillo, R.A., Offutt, A.J.: Constraint-Based Test Data Generation. IEEE Trans. Software Eng. 17 (9) (1991) 900–910
27. Dick, J., Faivre, A.: Automating The Generation and Sequencing of Test Cases From Model-Based Specifications. Proc. FME'93, LNCS 670 (1993) 268–284
28. Dijkstra, E.W.: Notes on Structured Programming. T.H. Rep. 70-WSK03 (1970). Online at: <http://www.cs.utexas.edu/users/EWD/ewd02xx/EWD249.PDF>
29. Duesterwald, E., Gupta, R., Soffa, M.L.: Rigorous Data Flow Testing Through Output Influences. Proc. 2nd Irvine Software Symposium, Irvine, CA (1992) 131–145
30. Duran, J.W., Ntafos, S.C.: An Evaluation of Random Testing. IEEE Trans. Software Eng. SE-10 (4) (1984) 438–444
31. Dunsmore, A., Roper, M., Wood, M.: Further Investigations into the Development and Evaluation of Reading Techniques for Object-Oriented Code Inspection. Proc. 24th Int. Conf. on Soft. Eng., Orlando, FL, USA (2002) 47–57
32. Edelstein, O., Farchi, E., Nir, Y., Ratsaby, G., Ur, S.: Multithreaded Java Program Test Generation. IBM Systems Journal 41 (2002) 111–125
33. Fagan, M.R.: Design and Code Inspections to Reduce Errors in Program Development. IBM Systems Journal 15 (3) (1976) 182–211
34. Fenton, N.E., Ohlsson, N.: Quantitative Analysis of Faults and Failures in a Complex Software System. IEEE Trans. Software Eng. 26 (8) (2000) 797–814
35. Forgacs, I., Bertolino, A.: Preventing Untestedness in Data-flow Based Testing. Soft. Testing, Verification and Reliability 12 (1) (2001) 29–61
36. Frankl, P.G. (Ed.): Proc. ACM Sigsoft Int. Symposium on Soft. Testing and Analysis ISSTA 2002, Soft. Engineering Notes 27 (4), Roma, Italy (July 2002)
37. Frankl, P.G., Hamlet, R.G., Littlewood, B., Strigini, L.: Evaluating Testing Methods by Delivered Reliability. IEEE Trans. Software Eng. 24 (8) (1998) 586–601
38. Freedman, R.S.: Testability of Software Components. IEEE Trans. Software Engineering 17 (6) (1991) 553–564
39. Gamma, E., Helm, R., Johnson, R., Vlissides, J.: Design Patterns: Elements of Reusable Object-Oriented Software. Addison-Wesley (1994)
40. Gao, J.: Component Testability and Component Testing Challenges. Proc. ICSE Workshop on Component-Based Soft. Eng. (2000). <http://www.sei.cmu.edu/cbs/cbse2000/paper/>
41. Gao, J., Gupta, K., Gupta, S., Shim, S.: On Building Testable Software Components. Proc. ICCBSS2002, LNCS 2255 (2002) 108–121
42. Gargantini, A., Riccobene, E.: ASM-Based Testing: Coverage Criteria and Automatic Test Sequence. J. of Universal Computer Science 7 (11) (2001) 1050–1067. Online at: <http://www.jucs.org/jucs_7_11/asm_based_testing_coverage>
43. Grieskamp, W., Gurevich, Y., Schulte, W., Veanes, M.: Generating Finite State Machines from Abstract State Machines. In [36] 112–122
44. Hamlet, D.: Continuity in Software Systems. In [36] 196–200
45. Hamlet, D., Taylor, R.: Partition Testing Does Not Inspire Confidence. IEEE Trans. Software Eng. 16 (12) (1990) 1402–1411
46. Harrold, M.J.: Testing: A Roadmap. In A. Finkelstein (Ed.), The Future of Software Engineering, ACM Press (2000) 63–72
47. Harrold, M.J., Soffa, M.L.: Selection of Data for Integration Testing. IEEE Software (March 1991) 58–65
48. Hierons, R.M.: Testing from a Z Specification. Soft. Testing, Verification and Reliability 7 (1997) 19–33
49. Hierons, R., Derrick, J. (Eds.): Special Issue on Specification-based Testing. Soft. Testing, Verification and Reliability 10 (2000)
50. Hiller, M., Jhumka, A., Suri, N.: PROPANE: An Environment for Examining the Propagation of Errors in Software. In [36] 81–85
51. Howden, W.E.: Weak Mutation Testing and Completeness of Test Sets. IEEE Trans. Software Eng. 8 (4) (1982) 371–379
52. ISO/IEC 9126, Information Technology: Software Product Evaluation, Quality Characteristics and Guidelines for Their Use (1991)
53. Fernandez, J.-C., Jard, C., Jeron, T., Nedelka, L., Viho, C.: Using On-the-fly Verification Techniques for the Generation of Test Suites. Proc. of the 8th Int. Conf. on Computer Aided Verification (1996)
54. Korel, B.: Automated Software Test Data Generation. IEEE Trans. Software Eng. 16 (8) (1990) 870–879
55. Laprie, J.C.: Dependability: Its Attributes, Impairments and Means. In [66] 3–18
56. Latella, D., Massink, M.: On Testing and Conformance Relations for UML Statechart Diagrams Behaviours. In [36] 144–153
57. Littlewood, B., Popov, P.T., Strigini, L., Shryane, N.: Modeling the Effects of Combining Diverse Software Fault Detection Techniques. IEEE Trans. Software Eng. 26 (12) (2000) 1157–1167
58. Littlewood, B., Strigini, L.: Validation of Ultra-High Dependability for Software-based Systems. Communications of the ACM 36 (11) (1993) 69–80
59. Lyu, M.R. (Ed.): Handbook of Software Reliability Engineering. IEEE Comp. Soc. Press/McGraw-Hill (1996)
60. Morell, L.J.: A Theory of Fault-based Testing. IEEE Trans. Software Eng. 16 (8) (1990) 844–857
61. Morris, J., Lee, G., Parker, K., Bundell, G.A., Lam, C.P.: Software Component Certification. IEEE Computer (September 2001) 30–36
62. Myers, G.J.: The Art of Software Testing. Wiley (1979)
63. Orso, A., Harrold, M.J., Rosenblum, D.: Component Metadata for Software Engineering Tasks. Proc. EDO2000, LNCS 1999 (2000) 129–144
64. Ostrand, T.J., Balcer, M.J.: The Category-Partition Method for Specifying and Generating Functional Tests. ACM Comm. 31 (6) (1988) 676–686
65. Pargas, R., Harrold, M.J., Peck, R.: Test-Data Generation Using Genetic Algorithms. J. of Soft. Testing, Verification, and Reliability 9 (1999) 263–282
66. Randell, B., Laprie, J.C., Kopetz, H., Littlewood, B. (Eds.): Predictably Dependable Computing Systems. Springer (1995)
67. Rapps, S., Weyuker, E.J.: Selecting Software Test Data Using Data Flow Information. IEEE Trans. Software Eng. SE-11 (1985) 367–375
68. Richardson, D.J., Clarke, L.A.: Partition Analysis: A Method Combining Testing and Verification. IEEE Trans. Software Eng. SE-11 (1985) 1477–1490
69. Richardson, D., Thompson, M.C.: The Relay Model for Error Detection and its Application. Proc. 2nd Wksp Soft. Testing, Verification, and Analysis, Banff, Alberta, ACM/Sigsoft and IEEE (July 1988) 223–230
70. Rothermel, G., Harrold, M.J.: Analyzing Regression Test Selection Techniques. IEEE Trans. Software Eng. 22 (8) (1996) 529–551
71. Stafford, J.A., Wolf, A.L.: Annotating Components to Support Component-Based Static Analyses of Software Systems. Proc. Grace Hopper Celebration of Women in Computing (2001)
72. TESTNET: Integration of Testing Methodologies. <http://www-lor.int-evry.fr/testnet/>
73. TGV: Test Generation from transition systems using Verification techniques. Online at <http://www.inrialpes.fr/vasy/cadp/man/tgv.html>
74. Thevenod-Fosse, P., Waeselynck, H., Crouzet, Y.: Software Statistical Testing. In [66] 253–272
75. Tretmans, J.: Conformance Testing with Labeled Transition Systems: Implementation Relations and Test Generation. Computer Networks and ISDN Systems 29 (1996) 49–79
76. Vegas, S.: Characterisation Schema for Selecting Software Testing Techniques. Int. Software Engineering Research Network (ISERN'01), Strathclyde, Scotland (2001)
77. Voas, J.M.: PIE: A Dynamic Failure-Based Technique. IEEE Trans. Software Eng. 18 (8) (1992) 717–727
78. Voas, J.M., Miller, K.W.: Software Testability: The New Verification. IEEE Software (May 1995) 17–28
79. Voas, J.: Certifying Off-the-Shelf Software Components. IEEE Computer (June 1998) 53–59
80. Voas, J.: Developing a Usage-Based Software Certification Process. IEEE Computer (August 2000) 32–37
81. Wang, Y., King, G., Wickburg, H.: A Method for Built-in Tests in Component-based Software Maintenance. Proc. European Conference on Soft. Maintenance and Reengineering (1998) 186–189
82. Weyuker, E.J.: Translatability and Decidability Questions for Restricted Classes of Program Schemas. SIAM J. on Computers 8 (4) (1979) 587–598
83. Weyuker, E.J.: On Testing Non-testable Programs. The Computer Journal 25 (4) (1982) 465–470
84. Weyuker, E.J., Jeng, B.: Analyzing Partition Testing Strategies. IEEE Trans. Software Eng. 17 (7) (1991) 703–711
85. Weyuker, E.J., Ostrand, T.J.: Theories of Program Testing and the Application of Revealing Subdomains. IEEE Trans. Software Eng. SE-6 (1980) 236–246
86. Wood, M., Roper, M., Brooks, A., Miller, J.: Comparing and Combining Software Defect Detection Techniques: A Replicated Empirical Study. Proc. ESEC/FSE, LNCS 1301 (1997) 262–277
87. Zhu, H., Hall, P.A.V., May, J.H.R.: Software Unit Test Coverage and Adequacy. ACM Computing Surveys 29 (1997) 366–427


Author information

Authors and Affiliations

ISTI-CNR, Area della Ricerca CNR di Pisa, Italy

Antonia Bertolino


Editor information

Editors and Affiliations

Dipartimento di Informatica, Università di Pisa, Via Buonarotti 2, 56127, Pisa, Italy

Egon Börger

CEA, Università di Catania, Piazza Università 2, 95124, Catania, Italy

Angelo Gargantini

Dipartimento di Matematica e Informatica, Università di Catania, Città Universitaria, viale A. Doria 6, 95125, Catania, Italy

Elvinia Riccobene


Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Bertolino, A. (2003). Software Testing Research and Practice. In: Börger, E., Gargantini, A., Riccobene, E. (eds) Abstract State Machines 2003. ASM 2003. Lecture Notes in Computer Science, vol 2589. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36498-6_1


DOI: https://doi.org/10.1007/3-540-36498-6_1

Published: 14 March 2003

Publisher Name: Springer, Berlin, Heidelberg

Print ISBN: 978-3-540-00624-4

Online ISBN: 978-3-540-36498-6





Cheriton School of Computer Science systems and networking researchers receive 2024 CNOM Test of Time Paper Award

A team of systems and networking researchers from the Cheriton School of Computer Science has received the 2024 CNOM Test of Time Paper Award for “Dynamic Controller Provisioning in Software Defined Networks,” work that was presented originally at the IEEE/ACM/IFIP International Conference on Network and Service Management in 2013.

The CNOM Test of Time Paper Award recognizes exceptional papers published from 10 to 12 years in the past in flagship conferences and journals supported by CNOM, the IEEE Communications Society Technical Committee on Network Operation and Management. The prestigious annual award celebrates research that has been deemed outstanding and whose contents remain vibrant and useful today.

Under the direction of Professor Raouf Boutaba, graduate students Md. Faizul Bari, Arup Raton Roy, Shihabur Rahman Chowdhury, Qi Zhang, Mohamed Faten Zhani and Reaz Ahmed proposed a management framework that examines dynamic controller provisioning in software-defined networks. Since its publication in 2013, their paper has been cited almost 500 times as of June 2024 according to Google Scholar.


Left to right: Md. Faizul Bari, Arup Raton Roy, Shihabur Rahman Chowdhury, Qi Zhang, Mohamed Faten Zhani, Reaz Ahmed, Professor Raouf Boutaba

The CNOM award selection committee noted that the paper addresses the challenge of efficiently provisioning controllers in software defined networks to overcome limitations related to performance and scalability in large-scale wide-area network deployments. It is the first paper that introduces the Dynamic Controller Provisioning Problem and proposes a framework that dynamically adjusts the number and locations of controllers based on network conditions. Its contributions have paved the way for more efficient and scalable software defined network architectures, thereby influencing subsequent software defined network research and development efforts in the CNOM community and beyond.

More about this award-winning research

Software-defined networking is a new paradigm that uses network programming to configure and manage networks dynamically. By separating the control plane from the data plane and shifting control to a conceptually centralized controller, software-defined networking allows network operators to implement a wide range of network policies, such as routing, security, and fault tolerance, and to quickly deploy new network technologies.

The most common software-defined networking implementation in use when the research team published their paper in 2013 relied on a logically centralized controller with a global view of the network. When a switch receives a new flow, it asks the controller to install appropriate forwarding rules along the desired flow path; the time required to complete this operation is known as the flow setup time. In a large-scale wide-area network deployment, however, this rudimentary centralized approach runs into performance and scalability limits. First, it is not always possible to find an optimal placement of the controller that ensures acceptable latencies between the controller and switches in different geographic locations. Second, a single controller usually has limited resource capacity and thus cannot handle the large volume of flows originating from all the infrastructure switches. In that case, the average flow setup time can rise significantly and degrade application and service performance.

To address these limitations, later proposals advocated deploying multiple controllers that work in tandem to better manage network traffic flows. This approach, however, introduces a new problem: minimizing flow setup times by dynamically adapting the number of controllers and their locations to demand fluctuations in the network. The researchers called this the Dynamic Controller Provisioning Problem.

Specifically, this problem requires enough controllers to handle the current network traffic, and their locations should ensure low switch-to-controller latencies. However, multi-controller deployment also requires regular state synchronization between the controllers to maintain a consistent view of the network. This communication overhead can be significant if the number of controllers in the network is large. Finally, as network traffic patterns and volumes at different locations can vary significantly over time, the controller placement scheme has to react to network hotspots and dynamically re-adjust the number and location of controllers. Hence, the solution to the Dynamic Controller Provisioning Problem requires finding the right trade-off between performance and overhead.
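One way to make that trade-off concrete is as an optimization problem; the formalization below is our own hedged sketch, not necessarily the notation used in the paper. Let x_ij = 1 if switch i is assigned to a controller at candidate location j, and y_j = 1 if a controller is active at location j:

\[
\min \; \sum_{i,j} d_{ij}\,x_{ij} \;+\; \alpha \sum_{j \ne k} s_{jk}\,y_j\,y_k \;+\; \beta \sum_{i,j} r_{ij}\,x_{ij}
\]
\[
\text{subject to} \quad \sum_{j} x_{ij} = 1 \;\; \forall i, \qquad \sum_{i} f_i\,x_{ij} \le C_j\,y_j \;\; \forall j,
\]

where d_ij is the switch-to-controller latency (the flow-setup term), s_jk the state-synchronization cost between two active controllers, r_ij the cost of reassigning switch i relative to the current configuration, f_i the flow-request rate of switch i, C_j the capacity of a controller at location j, and alpha, beta weights trading overhead against performance. The quadratic y_j y_k term would be linearized in an actual integer linear program.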

Earlier work by Heller and colleagues examined a static version of the problem where controller placement is fixed over time. These researchers analyzed the impact of the controller locations on the average and worst-case controller-to-switch propagation delay. However, a static controller placement configuration may not be suitable as network conditions can change over time.

To address this limitation, the Cheriton researchers proposed a management framework to dynamically deploy multiple controllers within a wide-area network. Specifically, they considered the dynamic version of the controller placement problem where both the numbers and locations of controllers are adjusted according to network dynamics. Their solution considered the dynamics of traffic patterns in the network, while minimizing costs for switch state collection, inter-controller synchronization, and switch-to-controller reassignment. 

They formulated the Dynamic Controller Provisioning Problem mathematically as an integer linear program that accounts for all of these costs, and then proposed two heuristics that dynamically estimate the number of controllers and decide their placement to achieve the desired objectives. The effectiveness of their solution was demonstrated using real-world traces and wide-area network topologies, with results showing that the proposed algorithms strike the right balance between average flow setup time and inter-controller communication overhead.
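To give a flavor of how such heuristics operate, here is a deliberately simplified greedy sketch (our own illustration, not the paper's two heuristics, which optimize the richer model with synchronization and reassignment costs): assign every switch to its lowest-latency active controller, and whenever some controller is overloaded, activate one more controller at the candidate location that most reduces total setup latency.

```python
# Simplified greedy sketch of dynamic controller provisioning (our own
# illustration, not the paper's heuristics). latency[i][j] is the latency
# from switch i to candidate controller location j, load[i] is the
# flow-request rate of switch i, and capacity is the per-controller
# limit -- all of these inputs are invented for the example.

def provision(latency, load, capacity):
    num_switches, num_locations = len(latency), len(latency[0])
    # Start with the single location that minimizes total latency.
    active = {min(range(num_locations),
                  key=lambda j: sum(row[j] for row in latency))}

    def assignment():
        # Map each switch to its lowest-latency active controller.
        return [min(active, key=lambda j: latency[i][j])
                for i in range(num_switches)]

    def total_latency(locations):
        return sum(min(latency[i][j] for j in locations)
                   for i in range(num_switches))

    while True:
        current = assignment()
        used = {j: 0.0 for j in active}
        for i, j in enumerate(current):
            used[j] += load[i]
        overloaded = any(used[j] > capacity for j in active)
        if not overloaded or len(active) == num_locations:
            return active, current
        # Activate the inactive location that most reduces total latency.
        best = min((j for j in range(num_locations) if j not in active),
                   key=lambda j: total_latency(active | {j}))
        active.add(best)

# Example: 4 switches, 3 candidate controller locations.
latency = [[1, 4, 7], [2, 3, 6], [8, 2, 3], [9, 1, 4]]
print(provision(latency, load=[5, 5, 5, 5], capacity=12))
```

A real deployment would re-run a procedure of this kind periodically as traffic shifts, which is exactly the dynamic aspect the framework addresses.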

To learn more about the award-winning research upon which this article is based, please see Md. Faizul Bari, Arup Raton Roy, Shihabur Rahman Chowdhury, Qi Zhang, Mohamed Faten Zhani, Reaz Ahmed, Raouf Boutaba. Dynamic Controller Provisioning in Software Defined Networks. Proceedings of the 9th International Conference on Network and Service Management (CNSM 2013), Zurich, Switzerland, 2013, pp. 18–25.


COMMENTS

  1. Software Testing Techniques: A Literature Review

    Software testing is the process of evaluating software applications to determine whether there are any defects, ensuring the quality of the software product. However, software testing is generally ...

  2. Artificial Intelligence in Software Testing : Impact, Problems

    Artificial Intelligence is gradually changing the landscape of software engineering in general [5] and software testing in particular [6], both in research and in industry. In the last two decades, AI has been found to have made a considerable impact on the way we are approaching software testing.

  3. Artificial Intelligence in Software Testing: A Systematic Review

    Software testing is a crucial component of software development. With the increasing complexity of software systems, traditional manual testing methods are becoming less feasible. Artificial Intelligence (AI) has emerged as a promising approach to software testing in recent years. This review paper aims to provide an in-depth understanding of the current state of software testing using AI. The ...

  4. Mapping the structure and evolution of software testing research over

    Research in software testing is growing and rapidly-evolving. Based on the keywords assigned to publications, we seek to identify predominant research topics and understand how they are connected and have evolved. ... Quantity versus impact of software engineering papers: a quantitative study. Scientometrics, 112 (2) (2017), pp. 963-1006, 10. ...

  5. Artificial Intelligence Applied to Software Testing: A Tertiary Study

    Section 1 INTRODUCTION. Software testing (ST) and artificial intelligence (AI) are two research areas with a long and ripe history in computing. AI methodologies and techniques have been around for more than 50 years [] and, in the current century, with the advances in computational resources and the abundance of data, their potential has vastly increased.

  6. Software Testing, Verification and Reliability

    Software Testing, Verification and Reliability (STVR) is an international journal, publishing 8 issues per year. It publishes papers on theoretical and practical issues of software testing, verification and reliability. The goal of the journal is to publish high-quality papers that help researchers, educators and practitioners understand ...

  7. Research on software testing techniques and software automation testing

    Abstract: Software Testing is a process, which involves, executing of a software program/application and finding all errors or bugs in that program/application so that the result will be a defect-free software. Quality of any software can only be known through means of testing (software testing). Through the advancement of technology around the world, there increased the number of verification ...

  8. A Literature Review on Software Testing Techniques

    All the research work is maintained in a sequential manner, and ambiguous and unrelated studies were neglected. As this work is related to software testing techniques, there were acceptance and exclusion criteria to shortlist papers based on software testing techniques. The acceptance criteria are as follows: 1.

  9. A Decade of Intelligent Software Testing Research: A Bibliometric Analysis

    In this study, we used the Web of Science database to acquire bibliometric data about intelligent software testing papers conducted between 2012 and 2022, and we used Biblioshiny from the R bibliometrix package, alongside VOSviewer, to analyze the data, extract insights, and answer research questions about the authors ...

  10. Software Testing with Large Language Models: Survey, Landscape, and Vision

    Software Testing with Large Language Models: Survey, Landscape, and Vision. Junjie Wang, Yuchao Huang, Chunyang Chen, Zhe Liu, Song Wang, Qing Wang ... from both the software testing and LLMs perspectives. The paper presents a detailed discussion of the software testing tasks for ... research in this area, highlighting potential avenues for ...

  11. Software Testing Techniques: A Literature Review

    Software testing is an inevitable part of the Software Development Lifecycle, and keeping in line with its criticality in the pre and post development process makes it something that should be catered with enhanced and efficient methodologies and techniques. ... This paper aims to discuss the existing as well as improved testing techniques for ...

  12. Object-Oriented Software Testing: A Review

    This paper presents a review about object-oriented software testing and to handle the challenges with the proposed techniques, methods, and different modeling approaches. OO testing is complex as it contains arbitrary components, and it is harder for testing teams than traditional models containing sequential components.

  13. [2205.00210] Software Testing for Machine Learning

    The paper provides a research agenda with elaborated directions for making progress toward advancing the state-of-the-art on testing of machine learning. Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Machine Learning (cs.LG) Cite as: arXiv:2205.00210 [cs.SE] (or arXiv:2205.00210v1 [cs.SE] for this version)

  14. A Comprehensive Bibliometric Assessment on Software Testing ...

    The research study provides a comprehensive bibliometric assessment in the field of Software Testing (ST). The dynamic evolution in the field of ST is evident from the publication rate over the last six years. The research study is carried out to provide insight into the field of ST from various research bibliometric aspects. Our methodological approach includes dividing the six-year time ...

  15. Software Testing Research and Practice

    The paper attempts to provide a comprehensive view of the field of software testing. The objective is to put all the relevant issues into a unified context, although admittedly the overview is biased towards my own research and expertise. In view of the vastness of the field, for each topic problems and approaches are only briefly tackled, with ...

  16. PDF Software Testing: A Research Travelogue (2000-2014)

    for software testing. Where the Future of Software Engineering (FOSE) track is concerned, two such papers have appeared: Mary Jean Harrold's 2000 paper, "Testing: A Roadmap" [88] (already mentioned), and Antonia Bertolino's 2007 paper, "Software Testing Research: Achievements, Challenges, Dreams" [19]. ...

  17. Software-testing education: A systematic literature mapping

    To more clearly understand, characterize and distinguish software-testing education in universities from training in industry, we model the concepts as a context diagram, as shown in Fig. 1. To clarify the focus of this SLM paper, we have highlighted our focus with a grey background in Fig. 1. On the left-hand side of Fig. 1 are the higher-education institutions that train students and produce ...

  18. Software Testing Research Challenges: An Industrial Perspective

    There have been rapid recent developments in automated software test design, repair and program improvement. Advances in artificial intelligence also have great potential impact to tackle software testing research problems. In this paper we highlight open research problems and challenges from an industrial perspective. This perspective draws on our experience at Meta Platforms, which has been ...

  19. Introduction to Software Testing

    - Use appropriate test terminology in communication; specifically: test fixture, logical test case, concrete test case, test script, test oracle, and fault. - Describe the motivations for white and black box testing. - Compare and contrast test-first and test-last development techniques. - Measure test adequacy using statement and branch coverage.

  20. SAT Practice and Preparation

    From free practice tests to a checklist of what to bring on test day, College Board provides everything you need to prepare for the digital SAT. Timeline. Step 1: Now Download and install the Bluebook app. Step 2: Two Weeks Before Test Day Take a full-length practice test in Bluebook. ...

  21. Browse journals and books

    Browse Calls for Papers beta. Browse 5,060 journals and 35,600 books. A; A Review on Diverse Neurological Disorders. ... The Nuclear Research Foundation School Certificate Integrated, Volume 2. Book ... Accelerated Testing and Validation. Testing, Engineering, and Management Tools for Lean Development. Book

  22. Machine Learning Applied to Software Testing: A Systematic Mapping

    Also, ML has been used to evaluate test oracle construction and to predict the cost of testing-related activities. The results of this paper outline the ML algorithms that are most commonly used to automate software-testing activities, helping researchers to understand the current state of research concerning ML applied to software testing.

  23. FAR

    FAC Number Effective Date HTML DITA PDF Word EPub Apple Books Kindle; 2024-05: 05/22/2024

  24. Cheriton School of Computer Science systems and networking researchers

    A team of systems and networking researchers from the Cheriton School of Computer Science has received the 2024 CNOM Test of Time Paper Award for "Dynamic Controller Provisioning in Software Defined Networks," work that was presented originally at the IEEE/ACM/IFIP International Conference on Network and Service Management in 2013. The CNOM Test of Time Paper Award recognizes ...