Scalable file systems and operating systems support for big data applications
Abstract
The arrival of Big Data computing at the application level, together with emerging substrate technologies such as manycore CPUs and Storage Class Memory (SCM) at the architecture level, has partially invalidated the decades-old concepts and assumptions behind traditional file system design and turned them into sources of performance bottlenecks. This dissertation starts with a thorough literature review that provides a big picture of existing data-management solutions, as well as of state-of-the-art OS scalability research in the context of manycore CPUs and SCM devices. It then presents three related but orthogonal solutions that tackle file system scalability issues at different layers. The first solution is a distributed file-search service called Propeller. The main challenge in this research is to provide a high-performance and accurate file-search service without significantly impacting I/O-intensive workloads in large file systems. Thus, in this work, we explore and exploit application-aware access patterns, captured by Access-Causality Graphs (ACGs), to help Propeller optimize its file-index partitioning strategy. By applying the ACGs captured from applications, Propeller retains access locality within a small index, which further improves index performance. In the second solution, we re-evaluate the design of modern file systems and re-think and re-conceptualize the file system namespace. We propose a new form of file system, the searchable file system, in which the identity of files and directories is based on user queries. A prototype of this new form of file system, called the Versatile Searchable File System (VSFS), is implemented to demonstrate the feasibility and benefits of such a file system. The third solution proposed in this dissertation evaluates the scalability of the latest Linux kernel storage stack on top of manycore CPUs and SCM.
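To make the idea of ACG-driven index partitioning more concrete, the following is a minimal sketch under stated assumptions, not Propeller's actual algorithm. It assumes (hypothetically) that an access-causality edge a -> b is recorded whenever the same process touches file b immediately after file a, and that files connected by edges observed at least `min_weight` times are placed in the same index partition. The names `build_acg`, `partition_index`, and `min_weight`, and the sample trace, are illustrative only.

```python
from collections import defaultdict


def build_acg(trace):
    """Build a weighted access-causality graph from (pid, path) events.

    Hypothetical rule: an edge a -> b is counted whenever the same process
    accesses b right after a. This is illustrative, not Propeller's method.
    """
    last_access = {}          # pid -> last file path touched by that process
    edges = defaultdict(int)  # (src, dst) -> number of observed transitions
    for pid, path in trace:
        prev = last_access.get(pid)
        if prev is not None and prev != path:
            edges[(prev, path)] += 1
        last_access[pid] = path
    return edges


def partition_index(trace, min_weight=1):
    """Group files into index partitions: connected components of the ACG
    restricted to edges seen at least min_weight times (assumed rule)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for _, path in trace:
        find(path)  # every accessed file gets a partition, even if isolated
    for (src, dst), weight in build_acg(trace).items():
        if weight >= min_weight:
            union(src, dst)

    groups = defaultdict(list)
    for f in list(parent):
        groups[find(f)].append(f)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical trace: process 1 runs a simulation, process 2 edits notes.
trace = [
    (1, "/sim/run1/input.dat"),
    (1, "/sim/run1/output.dat"),
    (2, "/home/a/notes.txt"),
    (2, "/home/a/todo.txt"),
    (1, "/sim/run1/log.txt"),
]
print(partition_index(trace))
```

With this trace, the three simulation files end up in one partition and the two note files in another, so a search scoped to one application's files touches only a small index. Raising `min_weight` trades locality for smaller partitions; how the dissertation tunes such a threshold, if it uses one at all, is not stated in this abstract.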
Our evaluations demonstrate that the current Linux storage stack scales poorly on high-core-count NUMA systems. These results strongly suggest that Linux kernel developers should revise the shared-memory model underlying the design of the Linux storage stack. A distributed Virtual File System is proposed to eliminate the cache-coherence overhead for directory entries and to improve I/O parallelism by reducing lock contention.
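The ownership idea behind a partitioned directory-entry cache can be illustrated with a small user-space sketch. This is an assumption-laden illustration of generic hash partitioning with one lock per partition, not the distributed VFS proposed in the dissertation: the class `PartitionedDentryCache`, the choice to route entries by parent directory, and the use of `zlib.crc32` as the routing hash are all hypothetical. In Python the sketch only shows how lookups for different partitions avoid a single global lock; the cache-coherence savings the abstract refers to would only appear in kernel code whose partitions are pinned to cores.

```python
import threading
import zlib


class PartitionedDentryCache:
    """Hypothetical sketch: each partition has its own dict and its own lock,
    so operations on different partitions never contend on a global lock."""

    def __init__(self, num_partitions):
        self.partitions = [dict() for _ in range(num_partitions)]
        self.locks = [threading.Lock() for _ in range(num_partitions)]

    def owner(self, parent_dir):
        # Route by parent directory so one directory's entries share a
        # partition (an assumption made for this sketch). crc32 is used
        # because it is deterministic across runs, unlike str hashing.
        return zlib.crc32(parent_dir.encode()) % len(self.partitions)

    def insert(self, parent_dir, name, inode):
        p = self.owner(parent_dir)
        with self.locks[p]:
            self.partitions[p][(parent_dir, name)] = inode

    def lookup(self, parent_dir, name):
        p = self.owner(parent_dir)
        with self.locks[p]:
            return self.partitions[p].get((parent_dir, name))


cache = PartitionedDentryCache(num_partitions=4)


def worker(core_id):
    # Each simulated "core" populates its own directory.
    d = f"/data/core{core_id}"
    for i in range(100):
        cache.insert(d, f"file{i}", core_id * 1000 + i)


threads = [threading.Thread(target=worker, args=(c,)) for c in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(cache.lookup("/data/core2", "file7"))  # 2007
```

Routing by parent directory keeps a directory listing within one partition, at the cost of a possible hot spot when many cores work in the same directory; a real design would have to weigh that trade-off.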
Subject Area
Computer Engineering
Recommended Citation
Xu, Lei, "Scalable file systems and operating systems support for big data applications" (2014). ETD collection for University of Nebraska-Lincoln. AAI3631019.
https://digitalcommons.unl.edu/dissertations/AAI3631019