Computer Science and Engineering, Department of


Date of this Version

Fall 12-14-2012


A DISSERTATION Presented to the Faculty of The Graduate School at the University of Nebraska In Partial Fulfillment of Requirements For the Degree of Doctor of Philosophy, Major: Engineering, Under the Supervision of Professors Hong Jiang and Sharad C. Seth. Lincoln, Nebraska: December, 2012

Copyright (c) 2012 Dongyuan Zhan


Judicious management of on-chip last-level caches (LLC) is critical to alleviating the memory wall of chip multiprocessors (CMP). Although there already exist many LLC management proposals, belonging to either the spatial or temporal dimension, they fail to capture and utilize the inherent interplays between the two dimensions in capacity management. Therefore, this dissertation is targeted at exploring and exploiting the spatiotemporal interactions in LLC capacity management to improve CMPs' performance. Based on this general idea, we address four specific research problems in the dissertation.

For the private LLC organization, prior-art proposals can improve the efficacy of inter-core cooperative caching at the coarse-grained application level. However, they are still suboptimal because they are unable to take advantage of the diverse capacity demands at the fine-grained set level. We introduce the SNUG LLC design that exploits the set-level non-uniformity of capacity demands and thus further improves performance.

Still for the private LLC management, we notice that neither spatial nor temporal LLC management schemes, working separately as in prior work, can deliver robust performance under various circumstances due to set-level non-uniform capacity demands. We propose a novel adaptive scheme, called STEM, to solve the problem by interactively managing both spatial and temporal dimensions of capacity demands at the set level.

For the shared LLC organization, existing proposals try to improve either locality or utility for heterogeneous workloads. But we find that none of them can deliver consistently the best performance under a variety of workloads due to applications' diverse locality and utility features. To address the problem, we present the CLU LLC design that co-optimizes the locality & utility of co-scheduled threads and thus adapts to more diverse workloads than the prior-arts.

To make a cache management strategy practical for industry, we will need to cut the overhead of the re-reference prediction value (RRPV). We observe that delicately-tuned replacement policies rooted in single-bit RRPVs can closely approximate the performance of their correspondents with log{associativity}-bit RRPVs. Therefore, we propose a novel practical shared LLC design, called COOP, which entails a 1-bit RRPV per cacheline and a lightweight monitor per core for locality & utility co-optimization. At a considerably low storage cost, COOP achieves higher performance than the two recent practical replacement policies that rely on 2-bit RRPVs but are oriented towards locality optimization only.

Advisers: Hong Jiang and Sharad C. Seth