Enabling distributed scientific computing on the campus

Derek Weitzel, University of Nebraska - Lincoln

Abstract

Campus research computing has evolved from many small decentralized resources, such as individual desktops, to fewer, larger centralized resources, such as clusters. This change has been necessitated by the increasing size of researchers' workloads, but it has harmed the researchers' user experience. We propose to improve the user experience on computational resources by creating an overlay cluster that researchers are able to control. This overlay should transparently scale to national cyberinfrastructure as the user's demands increase. We explore methods for improving the user experience when submitting jobs on a campus grid. To this end, we created a remote submission and overlay computational framework called Bosco. This framework can remotely submit processing from the user's laptop to clusters on the campus or on national cyberinfrastructure. To illustrate the possibilities of improving the user experience of remote submission, we created BoscoR, an interface to Bosco in R, the popular statistics and data processing programming language. Bosco improves the user experience of submitting to campus clusters while also being an efficient method for job management. To solve some of the issues with data distribution on opportunistic resources, we created the CacheD, a data management framework for managing and provisioning storage resources on the campus. The CacheD optimizes transfers to multiple resources by using the peer-to-peer transfer protocol BitTorrent. Further, the CacheD optimizes data shared between multiple jobs by caching the input data directly on the execution resources. The CacheD decreases stage-in time compared with current transfer methods, and decreases it significantly when the data is already cached. Finally, we explain how to control data distribution on a campus through a comprehensive policy framework, which is implemented in the CacheD. We present the policy language, its currently available attributes, and how to extend the policy language beyond the default behavior. Multiple examples are given for different data distribution scenarios observed on campus resources. By combining easy-to-use campus job submission with Bosco, efficient data distribution with the CacheD, and a policy language to manage the data distribution, we have created a unified framework for campus computing.

Subject Area

Computer science

Recommended Citation

Weitzel, Derek, "Enabling distributed scientific computing on the campus" (2015). ETD collection for University of Nebraska-Lincoln. AAI3716626.
https://digitalcommons.unl.edu/dissertations/AAI3716626
