Computer Science and Engineering, Department of


Date of this Version

Fall 12-4-2015

Document Type



Björn Barrefors. Dynamic Data Management In A Data Grid Environment. Master’s thesis, University of Nebraska-Lincoln, 1400 R Street, Lincoln, NE, USA 68588, December 2015.


A THESIS Presented to the Faculty of The Graduate College at the University of Nebraska In Partial Fulfillment of Requirements For the Degree of Master of Science, Major: Computer Science, Under the Supervision of Professor David Swanson. Lincoln, Nebraska: December, 2015

Copyright © 2015 Björn Barrefors.

The code for this thesis can be found at


A data grid is a geographically distributed set of resources providing a facility for computationally intensive analysis of large datasets to a large number of geographically distributed users. In the scientific community, data grids have become increasingly popular as scientific research is driven by large datasets. Until recently, developments in data management for data grids have focused on management of data at lower layers in the data grid architecture. With dataset sizes expected to approach exabyte scale in coming years, data management in data grids is facing a new set of challenges, in particular the problem of automatically placing and deleting data replicas to make optimal use of grid resources.

This thesis describes a dynamic data management framework to handle automatic replica creation and deletion in a data grid environment. The dynamic data manager uses machine learning to predict data popularity and balance the system for improved end-user performance. We implement the dynamic data manager for CMS, one of the largest high-energy physics experiments in the world, and evaluate the performance of the deployed system.

Adviser: David Swanson