Computing, School of

First Advisor

Hong Jiang

Second Advisor

Sharad C. Seth

Date of this Version

Fall 12-14-2012

Document Type

Dissertation

Comments

A dissertation presented to the faculty of the Graduate School at the University of Nebraska in partial fulfillment of requirements for the degree of Doctor of Philosophy

Major: Engineering. Under the supervision of Professors Hong Jiang and Sharad C. Seth. Lincoln, Nebraska, December 2012

Abstract

Judicious management of on-chip last-level caches (LLCs) is critical to alleviating the memory wall in chip multiprocessors (CMPs). Although many LLC management proposals already exist, each operating in either the spatial or the temporal dimension, they fail to capture and exploit the inherent interplay between the two dimensions in capacity management. This dissertation therefore explores and exploits the spatiotemporal interactions in LLC capacity management to improve CMP performance. Based on this general idea, we address four specific research problems.

For the private LLC organization, prior-art proposals can improve the efficacy of inter-core cooperative caching at the coarse-grained application level. However, they are still suboptimal because they are unable to take advantage of the diverse capacity demands at the fine-grained set level. We introduce the SNUG LLC design that exploits the set-level non-uniformity of capacity demands and thus further improves performance.

Also for private LLC management, we observe that neither spatial nor temporal LLC management schemes, working separately as in prior work, can deliver robust performance under various circumstances, because capacity demands are non-uniform at the set level. We propose a novel adaptive scheme, called STEM, that solves the problem by interactively managing both the spatial and temporal dimensions of capacity demands at the set level.

For the shared LLC organization, existing proposals try to improve either locality or utility for heterogeneous workloads. However, we find that none of them consistently delivers the best performance across a variety of workloads, because applications differ widely in their locality and utility characteristics. To address the problem, we present the CLU LLC design, which co-optimizes the locality and utility of co-scheduled threads and thus adapts to more diverse workloads than prior art.

To make a cache management strategy practical for industry, we need to cut the overhead of the re-reference prediction value (RRPV). We observe that carefully tuned replacement policies based on single-bit RRPVs can closely approximate the performance of their counterparts with log2(associativity)-bit RRPVs. Therefore, we propose a novel practical shared LLC design, called COOP, which requires only a 1-bit RRPV per cache line and a lightweight monitor per core for locality and utility co-optimization. At a low storage cost, COOP achieves higher performance than two recent practical replacement policies that rely on 2-bit RRPVs but are oriented towards locality optimization only.
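To make the idea of a single-bit RRPV concrete, the following is a minimal sketch of a generic not-recently-used style replacement policy for one cache set, together with the per-cache storage arithmetic. It is an illustration only and is not the COOP design, whose tuning and per-core monitor are not described in this abstract. The class name `OneBitRRPVSet`, the helper `rrpv_storage_bits`, and the cache parameters in the usage note are all hypothetical.

```python
import math
from typing import List, Optional


class OneBitRRPVSet:
    """One cache set with a 1-bit RRPV per line (generic NRU-style sketch, not COOP)."""

    def __init__(self, associativity: int) -> None:
        self.tags: List[Optional[int]] = [None] * associativity
        # RRPV 1 = predicted distant re-reference (eviction candidate),
        # RRPV 0 = predicted near re-reference.
        self.rrpv: List[int] = [1] * associativity

    def access(self, tag: int) -> bool:
        """Return True on a hit; on a miss, fill the line and return False."""
        if tag in self.tags:
            # Hit: mark the line as likely to be re-referenced soon.
            self.rrpv[self.tags.index(tag)] = 0
            return True
        if 1 not in self.rrpv:
            # No eviction candidate left: age every line in the set.
            self.rrpv = [1] * len(self.rrpv)
        victim = self.rrpv.index(1)  # first line predicted "distant"
        self.tags[victim] = tag
        self.rrpv[victim] = 0
        return False


def rrpv_storage_bits(num_lines: int, bits_per_line: int) -> int:
    """Total RRPV storage for a cache with num_lines cache lines."""
    return num_lines * bits_per_line


if __name__ == "__main__":
    # Hypothetical cache: 2 MiB, 64-byte lines, 16-way set associative.
    lines = (2 * 1024 * 1024) // 64
    assoc = 16
    print(rrpv_storage_bits(lines, 1))                      # 1-bit RRPV
    print(rrpv_storage_bits(lines, int(math.log2(assoc))))  # log2(associativity)-bit RRPV
```

For the hypothetical cache above, the 1-bit scheme stores one bit for each of the 32,768 lines, whereas a log2(associativity)-bit scheme stores four bits per line, so the RRPV storage differs by a factor of four.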

Advisors: Hong Jiang and Sharad C. Seth

Included in

Computer and Systems Architecture Commons

School of Computing: Dissertations, Theses, and Student Research

Spatiotemporal Capacity Management for the Last Level Caches of Chip Multiprocessors
