Data from: Comprehensive Evaluation of Association Measures for Fault Localization
Datasets usually provide raw data for analysis. This raw data often comes in spreadsheet form, but can be any collection of data, on which analysis can be performed.
This record contains the underlying research data for the publication "Comprehensive Evaluation of Association Measures for Fault Localization" and the full-text is available from: https://ink.library.smu.edu.sg/sis_research/1330
In statistics and data mining communities, there have been many measures proposed to gauge the strength of association between two variables of interest, such as odds ratio, confidence, Yule-Y, Yule-Q, Kappa, and gini index. These association measures have been used in various domains, for example, to evaluate whether a particular medical practice is associated positively to a cure of a disease or whether a particular marketing strategy is associated positively to an increase in revenue, etc. This paper models the problem of locating faults as association between the execution or non-execution of particular program elements with failures. There have been special measures, termed as suspiciousness measures, proposed for the task. Two state-of-the-art measures are Tarantula and Ochiai, which are different from many other statistical measures. To the best of our knowledge, there is no study that comprehensively investigates the effectiveness of various association measures in localizing faults. This paper fills in the gap by evaluating 20 wellknown association measures and compares their effectiveness in fault localization tasks with Tarantula and Ochiai. Evaluation on the Siemens programs show that a number of association measures perform statistically comparable as Tarantula and Ochiai.