Computer Science and Engineering, Department of


Date of this Version



University of Nebraska-Lincoln, Computer Science and Engineering
Technical Report # TR-UNL-CSE-2005-0009


Most of the parallelism in scientific and numeric applications takes the form of loops, so loop transformation has been studied extensively, especially in the areas of programming languages and compiler design. Almost all existing transformation approaches are control-centric: the transformation process starts by partitioning the iteration space, and the data space is decomposed only as a side effect. Originally designed for shared-memory multiprocessors, these control-centric approaches may not be suitable in some circumstances for current loosely coupled clusters and the Grid, where memory is physically distributed. In this paper, we introduce a novel data-centric, design-pattern-based approach called DCDP that transforms loops and automatically generates parallel code for execution on clusters. DCDP partitions the data space first and then generates the processing code for each data partition, so as to minimize data communication and synchronization among cluster nodes. To generate more efficient parallel code, DCDP brings the notion of design patterns, widely used in software engineering, into the field of parallel compilers. Instead of generating MPI-like code, DCDP outputs self-contained objects that are distributed to and executed on cluster nodes, thus exploiting the advantages of object-oriented programming. In addition, DCDP employs a feedback mechanism that gathers and analyzes dynamic runtime environment information to further optimize the parallelization process. To evaluate the feasibility and advantages of DCDP, we have designed and implemented a proof-of-concept compiler called PJava and compared the performance of the code it generates automatically with that of handcrafted JOPI (Java Object-Passing Interface, a Java dialect of MPI) code on a benchmark application, matrix multiplication. Experiments show that the PJava-generated code achieves performance comparable to that of the JOPI code.
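To make the data-centric idea concrete, the following minimal Java sketch partitions the data space of a matrix multiplication first (contiguous row blocks of A and the matching rows of C) and then hands each partition to a self-contained object that computes its rows without communicating with other partitions. It is an illustration under stated assumptions, not PJava's generated code and not the JOPI API: the names `DataCentricMatMul`, `RowBlockTask`, and `multiply` are hypothetical, and threads in a single JVM stand in for cluster nodes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustrative sketch only: all names here are hypothetical, not PJava output.
final class DataCentricMatMul {

    // A self-contained work object: it carries its own block of rows of A
    // and a read-only reference to B, and returns the matching rows of C.
    static final class RowBlockTask implements Callable<double[][]> {
        private final double[][] aRows;
        private final double[][] b;

        RowBlockTask(double[][] aRows, double[][] b) {
            this.aRows = aRows;
            this.b = b;
        }

        @Override
        public double[][] call() {
            int inner = b.length;
            int cols = b[0].length;
            double[][] cRows = new double[aRows.length][cols];
            for (int i = 0; i < aRows.length; i++) {
                for (int k = 0; k < inner; k++) {
                    double aik = aRows[i][k];
                    for (int j = 0; j < cols; j++) {
                        cRows[i][j] += aik * b[k][j];
                    }
                }
            }
            return cRows;
        }
    }

    // Partition the data space (rows of A and C) first, then run one task per
    // partition. 'parts' must be at least 1; threads stand in for cluster nodes.
    static double[][] multiply(double[][] a, double[][] b, int parts) {
        int rows = a.length;
        double[][] c = new double[rows][];
        ExecutorService pool = Executors.newFixedThreadPool(parts);
        try {
            List<Future<double[][]>> results = new ArrayList<>();
            List<Integer> offsets = new ArrayList<>();
            for (int part = 0; part < parts; part++) {
                int lo = (int) ((long) rows * part / parts);
                int hi = (int) ((long) rows * (part + 1) / parts);
                if (lo == hi) {
                    continue; // more partitions than rows: skip empty blocks
                }
                double[][] block = Arrays.copyOfRange(a, lo, hi);
                offsets.add(lo);
                results.add(pool.submit(new RowBlockTask(block, b)));
            }
            // Gather: each partition's rows are placed back at their offset.
            for (int r = 0; r < results.size(); r++) {
                double[][] cRows = results.get(r).get();
                System.arraycopy(cRows, 0, c, offsets.get(r), cRows.length);
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("partition task failed", e);
        } finally {
            pool.shutdown();
        }
        return c;
    }

    public static void main(String[] args) {
        double[][] a = {{1, 2}, {3, 4}};
        double[][] b = {{5, 6}, {7, 8}};
        System.out.println(Arrays.deepToString(multiply(a, b, 2)));
    }
}
```

In this sketch the only data exchanged between the coordinator and a partition is the partition's input rows and its result rows, which is the property a data-first decomposition aims for; a control-centric version would instead split the loop indices and leave data placement implicit.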