Date of this Version
In this paper, we propose an architectural approach, Supplementary Partial Parity (SPP), to address the availability issue of parity-encoded RAID systems. SPP exploits free storage space and idle time to generate and update a set of partial parity units that cover a subset of the disks (or data stripe units) during failure-free and idle or lightly loaded periods, thus supplementing the existing full parity units for improved availability. By applying exclusive-OR operations appropriately among partial parity, full parity, and data units, SPP can reconstruct the data on failed disks with a fraction of the original overhead that is proportional to the partial parity coverage, thus significantly reducing the overhead of data regeneration, especially under heavy workload. By providing redundant parity coverage, SPP can also potentially tolerate more than one disk failure with much greater flexibility, thus significantly improving the system’s reliability and availability.
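The XOR relationship described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names (`xor_units`, `reconstruct`), the single-stripe layout, and the choice of one partial parity unit covering half of the disks are all assumptions made for illustration. The sketch only shows why a partial parity unit lets a lost data unit be rebuilt from fewer reads than the full parity alone would require.

```python
from functools import reduce


def xor_units(units):
    """Bytewise XOR of a non-empty list of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*units))


def reconstruct(data, full_parity, partial_parity, covered, failed):
    """Rebuild data[failed] for one stripe; return (unit, units_read).

    Hypothetical layout: `covered` is the set of disk indices protected by
    the partial parity unit; the full parity covers every disk.
    """
    if failed in covered:
        # Failed disk is covered: XOR the partial parity with the surviving
        # covered units only.
        reads = [data[i] for i in covered if i != failed]
        reads.append(partial_parity)
    else:
        # Failed disk is not covered: full XOR partial parity equals the XOR
        # of all uncovered units, so only the surviving uncovered units are
        # read in addition to the two parity units.
        reads = [data[i] for i in range(len(data))
                 if i not in covered and i != failed]
        reads += [full_parity, partial_parity]
    return xor_units(reads), len(reads)


# One stripe across four data disks; the partial parity covers disks 0 and 1.
data = [b"\x11\xaa", b"\x22\xbb", b"\x44\xcc", b"\x88\xdd"]
covered = {0, 1}
full_parity = xor_units(data)
partial_parity = xor_units([data[i] for i in covered])

rebuilt, reads = reconstruct(data, full_parity, partial_parity, covered, failed=0)
```

In this toy stripe, rebuilding a covered disk reads two units and rebuilding an uncovered disk reads three, whereas reconstruction from the full parity alone would read four (three surviving data units plus the full parity unit).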
Due to its supplementary nature, SPP provides a more efficient and flexible redundancy protection mechanism than the conventional full parity approach. SPP offers multiple optional levels depending on partial parity coverage and performance/cost targets. Depending on the actual workload and the available resources, SPP can be adaptively and dynamically activated, deactivated, and adjusted while the original RAID system continues to serve user requests online. We conduct extensive trace-driven experiments to evaluate the performance of the SPP approach. The experimental results demonstrate that SPP significantly improves reconstruction time and user response time simultaneously.