Date of this Version
Björn Barrefors. Dynamic Data Management In A Data Grid Environment. Master’s thesis, University of Nebraska-Lincoln, 1400 R Street, Lincoln, NE, USA 68588, December 2015.
A data grid is a geographically distributed set of resources providing a facility for computationally intensive analysis of large datasets to a large number of geographically distributed users. In the scientific community, data grids have become increasingly popular as scientific research is driven by large datasets. Until recently, developments in data management for data grids have focused on management of data at lower layers in the data grid architecture. With dataset sizes expected to approach exabyte scale in coming years, data management in data grids is facing a new set of challenges, in particular the problem of automatically placing and deleting data replicas to make optimal use of grid resources.
This thesis describes a dynamic data management framework to handle automatic replica creation and deletion in a data grid environment. The dynamic data manager uses machine learning to predict data popularity and balance the system for improved end-user performance. We implement the dynamic data manager for CMS, one of the largest high-energy physics experiments in the world, and evaluate the performance of the deployed system.
Adviser: David Swanson