
School of Computing: Dissertations, Theses, and Student Research

First Advisor

David R. Swanson

Second Advisor

Ying Lu

Date of this Version

Summer 7-25-2011

Document Type

Thesis

Comments

A thesis presented to the faculty of the Graduate College at the University of Nebraska in partial fulfillment of requirements for the Degree of Master of Science

Major: Computer Science

Under the Supervision of Professor David Swanson and Professor Ying Lu. Lincoln, Nebraska: May 2011

Copyright 2011 Chen He

Abstract

Molecular Dynamics (MD) simulation is a computationally intensive application used in multiple fields. Because of its inherent computational parallelism, it can exploit a distributed environment. However, most existing implementations focus on performance enhancement and may not provide fault tolerance at every time step.

MapReduce is a framework, first proposed by Google, for processing huge amounts of data in a distributed environment. The simplicity of its programming model and its tolerance of node failures at run time make it popular not only for commercial applications but also for scientific computing.
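The programming model described above can be illustrated with a toy, single-process sketch: a map function emits key/value pairs, a shuffle groups them by key, and a reduce function combines each group. The helper `run_mapreduce` and its `fail_once` parameter are hypothetical and are not Hadoop's API; `fail_once` simulates a failed map task that is simply re-executed, which is safe because map tasks are deterministic and a failed attempt commits no output.

```python
from collections import defaultdict


def run_mapreduce(records, map_fn, reduce_fn, fail_once=frozenset()):
    """Toy MapReduce (hypothetical helper, not Hadoop): map, shuffle, reduce.

    Map tasks whose record index is in `fail_once` fail on their first
    attempt and are re-executed, mimicking task-level fault tolerance.
    """
    intermediate = defaultdict(list)
    for i, record in enumerate(records):
        attempts = 0
        while True:
            attempts += 1
            try:
                if i in fail_once and attempts == 1:
                    raise RuntimeError(f"simulated failure of map task {i}")
                # Output is materialized fully before being committed, so a
                # failed attempt leaves no partial results behind.
                pairs = list(map_fn(record))
                break
            except RuntimeError:
                continue  # re-execute the same deterministic task
        for key, value in pairs:  # shuffle: group values by key
            intermediate[key].append(value)
    return {key: reduce_fn(key, values)
            for key, values in sorted(intermediate.items())}


# Usage: the classic word count, with map task 1 failing once.
def wc_map(line):
    for word in line.split():
        yield word, 1


def wc_reduce(word, counts):
    return sum(counts)


result = run_mapreduce(["a b a", "b c"], wc_map, wc_reduce, fail_once={1})
print(result)  # {'a': 2, 'b': 2, 'c': 1}
```

The result is identical with or without the simulated failure, which is the property that lets the framework recover from node failures by rescheduling tasks rather than restarting the whole job.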

In this thesis, we develop a novel MD simulation solution based on Hadoop MapReduce (MDMR) that is communication-free and fault tolerant at each time step. Through emulation of Hadoop MapReduce and the introduction of a run-time program monitor, we can predict the execution time of an MD simulation system of a given size. We also demonstrate the improvements in performance and energy consumption obtained by implementing MDMR in a hybrid MapReduce environment with GPU hardware (MDMR-G).
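To make the idea of a communication-free, per-time-step MapReduce job concrete, the sketch below shows one possible way to cast a single MD time step as map and reduce phases. It is a hypothetical decomposition under stated assumptions, not the MDMR design from the thesis: every map task receives a read-only snapshot of all positions and computes forces for its own block of atoms (so map tasks never exchange messages), and each reduce call advances one atom. The names (`md_map`, `md_reduce`, `md_step`), the 1D Lennard-Jones potential in reduced units, and the symplectic Euler integrator are all simplifying assumptions chosen for brevity.

```python
from collections import defaultdict

EPS, SIGMA, MASS, DT = 1.0, 1.0, 1.0, 1e-3  # hypothetical reduced units


def lj_force(xi, xj):
    """1D Lennard-Jones force exerted on atom i by atom j."""
    dx = xi - xj
    r2 = dx * dx
    sr6 = (SIGMA * SIGMA / r2) ** 3
    return 24.0 * EPS * (2.0 * sr6 * sr6 - sr6) / r2 * dx


def md_map(block, positions, velocities):
    """Map task: total force on each atom in `block`, computed from a
    read-only snapshot of all positions (no inter-task communication)."""
    for i in block:
        f = sum(lj_force(positions[i], positions[j])
                for j in range(len(positions)) if j != i)
        yield i, (positions[i], velocities[i], f)


def md_reduce(atom_id, values):
    """Reduce task: advance one atom by one time step (symplectic Euler)."""
    (x, v, f), = values  # exactly one record per atom id
    v_new = v + f / MASS * DT
    return x + v_new * DT, v_new


def md_step(positions, velocities, block_size):
    """One time step = one MapReduce job over blocks of atoms."""
    groups = defaultdict(list)
    blocks = [range(s, min(s + block_size, len(positions)))
              for s in range(0, len(positions), block_size)]
    for block in blocks:  # each block would be an independent map task
        for key, value in md_map(block, positions, velocities):
            groups[key].append(value)
    new = [md_reduce(i, groups[i]) for i in range(len(positions))]
    return [x for x, _ in new], [v for _, v in new]


x0, v0 = [0.0, 1.1, 2.3, 3.2], [0.0] * 4
x1, v1 = md_step(x0, v0, block_size=2)
print(sum(v1))  # close to 0: pairwise forces cancel, momentum is conserved
```

Because each map task reads only the previous step's snapshot, the result does not depend on how atoms are partitioned into tasks, and a failed task can be re-run from that snapshot. The cost of this simplicity is that every task needs the full position array at each step.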

To evaluate MDMR, we construct a 32-node MapReduce cluster and a run-time MapReduce program monitor. We emulate MDMR and propose a formula that predicts MDMR execution time for the Map and Reduce stages. The emulation results demonstrate that our formula can predict MDMR execution time within 9.1% variance. Our run-time monitor shows that MDMR can obtain high computational power efficiency for large MD simulation systems. We also build a hybrid MapReduce cluster with GPGPU accelerators. In this environment, MDMR obtains a 20-times speedup and reduces energy consumption by 95% compared with a cluster of the same size without GPGPU accelerators.
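The two GPGPU figures can be related through the identity energy = average power × time. Assuming both figures describe the same workload on equally sized clusters, and writing $C$ for the CPU-only cluster and $G$ for the GPGPU cluster (notation introduced here for illustration), the arithmetic is:

```latex
E = \bar{P}\,t
\quad\Longrightarrow\quad
\frac{E_G}{E_C} = \frac{\bar{P}_G}{\bar{P}_C}\cdot\frac{t_G}{t_C},
\qquad
\frac{t_G}{t_C} = \frac{1}{20},\quad
\frac{E_G}{E_C} = 1 - 0.95 = 0.05
\quad\Longrightarrow\quad
\frac{\bar{P}_G}{\bar{P}_C} = \frac{0.05}{1/20} = 1 .
```

Under these assumptions, the reported numbers are consistent with the GPGPU cluster drawing roughly the same average power as the CPU-only cluster while finishing 20 times sooner, so the energy saving would come almost entirely from the shorter run time. This is a reading of the abstract's figures, not a separately reported measurement.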

Advisors: David Swanson and Ying Lu
