Scalability of multi-threaded applications on manycore processors
Modern applications deploy multiple threads to take advantage of manycore processors. However, many recent studies have shown that neither applications' execution time nor their energy consumption consistently reaps the expected benefits from multi-threaded execution on manycore processors. This dissertation investigates the scalability of multi-threaded Java applications and parallel I/O, and seeks to design and implement multiple thread schedulers to improve both the performance and the energy consumption of manycore processor systems.

As the first example, garbage collection in multi-threaded Java applications unexpectedly takes longer with more CPUs. This behavior not only prolongs execution time but also incurs higher energy consumption. We discover that the same parallelism that is meant to improve scalability can also hinder performance by introducing more competition for limited heap resources. Our study shows that the Completely Fair Scheduler (CFS) is responsible for the increased competition among threads. If the threads are instead scheduled in first-in-first-out order, the scalability of both the garbage collector and the applications can be significantly improved.

Based on this observation, we propose three approaches for optimally selecting scheduling policies. Our first approach uses profile information collected from one execution to fine-tune subsequent executions. Our second approach, ARES, dynamically collects information and performs on-the-fly policy selection during execution. Our third approach, PASS, identifies execution phases with parallelism during execution and selects a different scheduler for those phases accordingly.

As the second example, NUMA-based NVMe devices use multiple I/O threads to achieve higher throughput and shorter latency. However, the access penalty of an NVMe SSD attached to a remote CPU affects both the scalability of applications with multiple I/O threads and the energy consumption of the CPU. Our comprehensive study shows that the penalty resulting from CPU contention is much smaller than that of accessing remote NVMe SSDs in a NUMA system. Based on this insight, we develop a performance-energy algorithm to select "the lesser of two evils". The proposed algorithm, ESN, is an energy-efficient, profiling-based I/O thread scheduler that manages I/O threads accessing NVMe SSDs on NUMA systems. Our empirical evaluation shows that ESN can achieve optimal I/O performance while consuming up to 75% less energy by using fewer CPUs.
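The abstract does not describe how ESN works internally, so the following is only a minimal sketch of the general idea of profiling-based selection between CPU contention and remote NVMe access. The `Placement` type, the `select_placement` function, the `tolerance` parameter, and all numbers are hypothetical and chosen for illustration; they are not taken from the dissertation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    """One profiled way of placing I/O threads (all values hypothetical)."""
    name: str
    cpus_used: int
    throughput_mbps: float  # I/O throughput observed during profiling
    power_watts: float      # average CPU power observed during profiling


def select_placement(profiles, tolerance=0.05):
    """Return the lowest-power placement whose throughput is within
    `tolerance` of the best profiled throughput.

    This is an assumed selection rule, not the ESN algorithm itself.
    """
    if not profiles:
        raise ValueError("need at least one profiled placement")
    best = max(p.throughput_mbps for p in profiles)
    good_enough = [
        p for p in profiles if p.throughput_mbps >= best * (1 - tolerance)
    ]
    # Among near-optimal placements, prefer lower power, then fewer CPUs.
    return min(good_enough, key=lambda p: (p.power_watts, p.cpus_used))


# Hypothetical profiling results for three placements.
profiles = [
    Placement("spread across both nodes", 16, 2000.0, 120.0),
    Placement("packed on SSD-local node", 8, 1950.0, 70.0),
    Placement("packed on remote node", 8, 1400.0, 72.0),
]

choice = select_placement(profiles)
print(choice.name)  # prints "packed on SSD-local node"
```

With these made-up numbers, packing the threads onto the node local to the SSD accepts some CPU contention, stays within 5% of the best throughput, and uses fewer CPUs, which mirrors the trade-off the abstract describes.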
Qian, Junjie, "Scalability of multi-threaded applications on manycore processors" (2016). ETD collection for University of Nebraska - Lincoln. AAI10141645.