Concurrency faults are among the most damaging types of faults that can affect the dependability of today’s computer systems. Currently, concurrency faults such as process-level races, order violations, and atomicity violations represent the largest class of faults reported to various Linux bug repositories. Clearly, existing approaches for testing for such faults during software development are inadequate: these faults escape in-house testing efforts, are discovered only during deployment, and must then be debugged.
The main reason concurrency faults are hard to test is that the conditions that allow them to occur can be difficult to replicate, causing them to appear non-deterministically. Once these faults have been discovered during deployment and reported back to engineers, they remain very challenging to reproduce for the same reason. Furthermore, because concurrency faults can be complex, it is difficult for users to diagnose them correctly. This can lead to bug reports that contain insufficient information or are entirely incorrect.
The goal of this dissertation is to make the process of reproducing concurrency faults more effective and efficient. Effectiveness means that faults can be reproduced more deterministically and that engineers can continue to debug applications despite incomplete reports. Efficiency means that, using our proposed approaches, engineers spend less time on the debugging process: less time to develop detectors, less time to identify applications that can instigate reported faults, and less time to run those applications to reproduce the reported faults. The results of our empirical evaluations reveal that the proposed systems collectively make concurrency fault reproduction more effective, efficient, and accurate.
Advisors: Witawas Srisa-an and Gregg Rothermel