Cluster computing has become an important paradigm for solving large-scale problems. However, as the size of a cluster increases, so does the complexity of resource management and maintenance. Automated performance control and resource management are therefore expected to play critical roles in sustaining the evolution of cluster computing. Current cluster scheduling practice is similar in sophistication to early supercomputer batch scheduling algorithms, and it gives no consideration to desired quality-of-service (QoS) attributes. To fully exploit the power of computational clusters, new scheduling algorithms need to be developed that provide high performance, QoS assurance, fault tolerance, energy savings, and streamlined management of cluster resources.
The challenge in developing real-time scheduling algorithms for cluster and grid computing, however, is to support various types of applications. Broadly speaking, computational loads submitted to a cluster can be categorized into three types: sequential, modularly divisible, and arbitrarily divisible. An arbitrarily divisible workload model is a good approximation of many real-world applications, e.g., distributed search for a pattern in text, audio, graphical, and database files; distributed processing of big measurement data files; and many simulation problems. All elements in such an application often demand an identical type of processing, and relative to the huge total computation, the processing on each individual element is infinitesimally small. As such applications become a major type of cluster workload, providing QoS to arbitrarily divisible loads becomes a significant problem for cluster-based research computing facilities.
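As an informal illustration of the arbitrarily divisible model, the sketch below splits a load evenly across identical nodes and estimates how many nodes a deadline requires. It is not an algorithm from this dissertation: it assumes homogeneous nodes, ignores data transmission cost, and the names `min_nodes_for_deadline`, `split_evenly`, and `unit_cost` are hypothetical.

```python
import math


def min_nodes_for_deadline(load_size, unit_cost, deadline):
    """Smallest node count n such that load_size * unit_cost / n <= deadline.

    Assumes identical nodes and no communication cost (a simplification).
    """
    if deadline <= 0:
        raise ValueError("deadline must be positive")
    return max(1, math.ceil(load_size * unit_cost / deadline))


def split_evenly(load_size, n_nodes):
    """Partition load_size elements into n_nodes chunks that differ by at most one."""
    base, extra = divmod(load_size, n_nodes)
    # The first `extra` nodes each take one additional element.
    return [base + (1 if i < extra else 0) for i in range(n_nodes)]


if __name__ == "__main__":
    # Hypothetical numbers: 1000 elements, 0.5 time units each, deadline of 100.
    n = min_nodes_for_deadline(1000, 0.5, 100)
    print(n, split_evenly(1000, n))
```

Because every element needs the same processing and each is tiny relative to the whole, the load can be cut at any point, which is what makes this kind of even partitioning possible.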
The problem of providing performance guarantees to divisible load applications has not been studied systematically. The objective of this dissertation is to provide assured QoS to cluster and grid applications by developing new real-time scheduling theory and algorithms, in particular real-time divisible load scheduling algorithms for cluster computing. We develop and apply these algorithms to provide QoS for grid and High Performance Computing (HPC) applications. Specifically, we address the aforementioned challenges by investigating and developing 1) real-time scheduling algorithms for divisible loads, 2) a real-time scheduling algorithm for divisible loads with advance resource reservation, 3) an efficient real-time divisible load scheduling algorithm for large clusters, and 4) feedback-control-based real-time divisible load scheduling algorithms that provide predictable performance in unpredictable environments.
Advisor: Ying Lu