Molecular Dynamics (MD) simulation is a computationally intensive application used in multiple fields. Because of its inherent computational parallelism, it can exploit a distributed environment. However, most existing implementations focus on performance enhancement and may not provide fault tolerance at every time-step.
MapReduce is a framework first proposed by Google for processing huge amounts of data in a distributed environment. The simplicity of its programming model and its tolerance of node failures at run-time have made it very popular not only for commercial applications but also in scientific computing.
In this thesis, we develop a novel communication-free solution for MD simulation based on Hadoop MapReduce (MDMR) that is fault-tolerant at each time-step. Through emulation of Hadoop MapReduce and introduction of a run-time program monitor, we can predict the execution time of an MD simulation system of a given size. We also demonstrate the performance and energy consumption improvements obtained by implementing MDMR in a hybrid MapReduce environment with GPU hardware (MDMR-G).
To evaluate MDMR, we construct a 32-node MapReduce cluster and a run-time MapReduce program monitor. We emulate MDMR and propose a formula for predicting MDMR execution time in the Map and Reduce stages. The emulation results demonstrate that our formula can predict MDMR execution time within 9.1% variance. Our run-time monitor shows that MDMR can obtain high computational power efficiency for large MD simulation systems. We also build a hybrid MapReduce cluster with GPGPU accelerators. MDMR in this environment obtains a 20-fold speedup and reduces energy consumption by 95% compared with a cluster of the same size without GPGPU accelerators.
Advisers: David Swanson and Ying Lu