Date of this Version
IEEE TRANSACTIONS ON COMPUTERS, VOL. 63, NO. 7, JULY 2014
Most chip-multiprocessors nowadays adopt a large shared last-level cache (SLLC). This paper is motivated by our analysis and evaluation of state-of-the-art cache management proposals which reveal a common weakness. That is, the existing alternative replacement policies and cache partitioning schemes, targeted at optimizing either locality or utility of co-scheduled threads, cannot deliver consistently the best performance under a variety of workloads. Therefore, we propose a novel adaptive scheme, called CLU, to interactively co-optimize the locality and utility of co-scheduled threads in thread-aware SLLC capacity management. CLU employs lightweight monitors to dynamically profile the LRU (least recently used) and BIP (bimodal insertion policy) hit curves of individual threads on runtime, enabling the scheme to co-optimize the locality and utility of concurrent threads and thus adapt to more diverse workloads than the existing approaches. We provide results from extensive execution-driven simulation experiments to demonstrate the feasibility and efficacy of CLU over the existing approaches (TADIP, NUCACHE, TA-DRRIP, UCP, and PIPP).