Perpetual availability is an important operational goal in today's computer systems. However, achieving this goal is challenging because modern software systems contain faults that can cause them to fail. For example, multi-threading is widely used in modern software to fully utilize the computing capability of multicore processors. However, employing multi-threading can lead to concurrency faults such as deadlocks and data races that are notoriously difficult to isolate, detect, and repair. Data races, which involve two concurrent accesses to the same data where at least one is a write, are the most common concurrency faults.
As our first step, we investigate the main sources of race detection overhead and find that a large effort is spent on repeatedly monitoring operations that cannot cause data races or have already been identified as causes of races. Based on these observations, we propose two orthogonal optimizations for race detection: Stationary Object Suppression (SOS) and Loop Iteration Sampling (LIS). SOS employs a dynamic program analysis technique to filter out Stationary Objects, which are read-only objects that can be shared by multiple threads. As such, they can never participate in data races. By eliminating monitoring operations on Stationary Objects, SOS can detect up to six times more races within an overhead budget than Pacer, a state-of-the-art sampling-based race detector.
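The filtering idea behind SOS can be illustrated with a minimal sketch. It is not the dissertation's implementation: the class name `StationaryFilter`, its method `on_access`, and the rule used here (an object counts as stationary until a write is observed after a second thread has touched it) are all assumptions made for illustration.

```python
class StationaryFilter:
    """Hypothetical filter that forwards only accesses to non-stationary objects."""

    def __init__(self):
        self.first_thread = {}  # object id -> thread that first touched it
        self.shared = set()     # objects touched by more than one thread
        self.mutated = set()    # objects written after becoming shared
        self.monitored = []     # accesses handed to the (omitted) race detector

    def on_access(self, obj_id, thread_id, is_write):
        """Record one access; return True if it should be monitored for races."""
        owner = self.first_thread.setdefault(obj_id, thread_id)
        if owner != thread_id:
            self.shared.add(obj_id)
        # Assumption: a write to an already shared object ends its stationary status.
        if is_write and obj_id in self.shared:
            self.mutated.add(obj_id)
        if obj_id in self.mutated:
            self.monitored.append((obj_id, thread_id, is_write))
            return True
        return False


f = StationaryFilter()
f.on_access("config", 1, True)    # initialised by thread 1 before sharing
f.on_access("config", 2, False)   # later only read: treated as stationary
f.on_access("counter", 1, True)
f.on_access("counter", 2, True)   # written after sharing: must be monitored
```

In this sketch, accesses to `config` never reach the detector, while `counter` is monitored from its first write after sharing onward.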
Although SOS can greatly reduce the number of objects to monitor for race detection, it still repeatedly monitors sources of races that have already been identified. A further investigation shows that loops in a program contribute substantially to the occurrence of such repetitive data races. We propose a sampling-based race detector, LIS, which adjusts the sampling rate for data access operations within loops to be inversely proportional to the number of iterations.
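A sampling rate that is inversely proportional to the iteration count can be sketched as follows. This is an illustration under stated assumptions rather than the LIS algorithm itself: the names `sampling_rate` and `LoopSampler`, the `base_rate` parameter, and the choice to use the number of iterations seen so far (rather than a total known in advance) are hypothetical.

```python
import random


def sampling_rate(iteration, base_rate=1.0):
    """Rate for the given 1-based iteration: base_rate / iteration, capped at 1."""
    if iteration < 1:
        raise ValueError("iteration must be >= 1")
    return min(1.0, base_rate / iteration)


class LoopSampler:
    """Hypothetical per-loop sampler that monitors later iterations less often."""

    def __init__(self, base_rate=1.0, seed=0):
        self.base_rate = base_rate
        self.rng = random.Random(seed)  # seeded so that runs are repeatable
        self.iterations = {}            # loop id -> iterations seen so far

    def should_monitor(self, loop_id):
        """Count one more iteration of the loop and decide whether to monitor it."""
        n = self.iterations.get(loop_id, 0) + 1
        self.iterations[loop_id] = n
        return self.rng.random() < sampling_rate(n, self.base_rate)


sampler = LoopSampler()
monitored = sum(sampler.should_monitor("loop-A") for _ in range(1000))
```

With `base_rate=1.0`, the first iteration of each loop is always monitored and the expected number of monitored iterations grows only logarithmically with the iteration count, which is the property that keeps repeated accesses in hot loops from dominating the overhead.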
To achieve perpetual availability, the next step is to address these software faults as they are detected during deployment. We propose a race healing system that can automatically generate and apply repairs during program execution. The system applies a fix immediately after a race is detected to prevent the race from occurring again.
Advisers: Witawas Srisa-an and Matthew B. Dwyer