Computer Science and Engineering, Department of


Published in Journal of Physics: Conference Series 180 (2009) 012047. Copyright 2009 IOP Publishing Ltd.


Hadoop is an open-source data processing framework that includes a scalable, fault-tolerant distributed file system, HDFS. Although HDFS was designed to work in conjunction with Hadoop's job scheduler, we have re-purposed it to serve as a grid storage element by adding GridFTP and SRM servers. We have tested the system thoroughly in order to understand its scalability and fault tolerance. The turn-on of the Large Hadron Collider (LHC) in 2009 poses a significant data management and storage challenge; we have been working to introduce HDFS as a solution for data storage for one LHC experiment, the Compact Muon Solenoid (CMS).