//img.uscri.be/pth/ca22242279cd5adc3372d5ff4afa08c7d86f527a
Cet ouvrage fait partie de la bibliothèque YouScribe
Obtenez un accès à la bibliothèque pour le lire en ligne
En savoir plus

Grid workflow scheduling based on incomplete information [Elektronische Ressource] / vorgelegt von Jörg Schneider

De
135 pages
Grid Work ow Scheduling based onIncomplete Informationvorgelegt vonDiplom-InformatikerJ org Schneideraus Berlinvon der Fakult at IV - Elektrotechnik und Informatikder Technischen Universit at Berlinzur Erlangung des akademischen GradesDoktor der Ingenieurwissenschaften{ Dr.-Ing. {genehmigte DissertationPromotionsausschuss:Vorsitzender: Prof. Dr. Peter PepperGutachter: Prof. Dr. Hans-Ulrich Hei hter: Prof. Dr. Odej KaoGutachter: Prof. Dr. Cesar de RoseTag der wissenschaftlichen Aussprache: 10.11.2009Berlin 2010D 83AcknowledgmentsI would like to thank especially my supervisor Prof. Dr. Hans-Ulrich Hei for theopportunity to work with him and to write this thesis. I really appreciated the inde-pendence and freedom working in his research group. I also thank Prof. Dr. Cesarde Rose, PUCRS Porto Alegre, and Prof. Dr. Odej Kao for serving as reviewer forthis thesis.Special thanks to my colleagues and the students I worked with for the dis-cussions, motivation, and support. In particular, I like to thank Sebastian Boelter,Dr. Lars-Olof Burchard, J org Decker, Stanimir Dragiev, Tiago Ferreto, Julius Gehr,Barry Linnert, Arthur Lochstampfer, and Robert Lubkoll.3Contents1 Introduction 92 Coupling Resources in the Grid 152.1 How to build up a Grid? . . . . . . . . . . . . . . . . . . . . . . . . . 152.2 Timing: Advance Reservation . . . . . . . . . . . . . . . . . . . . . . 172.2.1 Formalization . . . . . . . . . . . . . . . . . . . . . . . . . . .
Voir plus Voir moins

Grid Work ow Scheduling based on
Incomplete Information
vorgelegt von
Diplom-Informatiker
J org Schneider
aus Berlin
von der Fakult at IV - Elektrotechnik und Informatik
der Technischen Universit at Berlin
zur Erlangung des akademischen Grades
Doktor der Ingenieurwissenschaften
{ Dr.-Ing. {
genehmigte Dissertation
Promotionsausschuss:
Vorsitzender: Prof. Dr. Peter Pepper
Gutachter: Prof. Dr. Hans-Ulrich Hei hter: Prof. Dr. Odej Kao
Gutachter: Prof. Dr. Cesar de Rose
Tag der wissenschaftlichen Aussprache: 10.11.2009
Berlin 2010
D 83Acknowledgments
I would like to thank especially my supervisor Prof. Dr. Hans-Ulrich Hei for the
opportunity to work with him and to write this thesis. I really appreciated the inde-
pendence and freedom working in his research group. I also thank Prof. Dr. Cesar
de Rose, PUCRS Porto Alegre, and Prof. Dr. Odej Kao for serving as reviewer for
this thesis.
Special thanks to my colleagues and the students I worked with for the dis-
cussions, motivation, and support. In particular, I like to thank Sebastian Boelter,
Dr. Lars-Olof Burchard, J org Decker, Stanimir Dragiev, Tiago Ferreto, Julius Gehr,
Barry Linnert, Arthur Lochstampfer, and Robert Lubkoll.
3Contents
1 Introduction 9
2 Coupling Resources in the Grid 15
2.1 How to build up a Grid? . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.2 Timing: Advance Reservation . . . . . . . . . . . . . . . . . . . . . . 17
2.2.1 Formalization . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.2.2 Manage advance reservations . . . . . . . . . . . . . . . . . . 23
2.2.3 Measuring Fragmentation . . . . . . . . . . . . . . . . . . . . 29
2.2.4 Typical properties of advance reservation systems . . . . . . . 33
2.3 Grid work ow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
2.3.1 Example Grid Work ows . . . . . . . . . . . . . . . . . . . . 36
2.3.2 Formalization . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
2.3.3 Heterogeneous Grid environment and moldable jobs . . . . . 40
2.3.4 Generating workloads containing Grid work ows . . . . . . . 40
2.4 The Virtual Resource Manager - A Grid Resource Manager . . . . . 43
3 Grid Work ow Languages 45
3.1 Work ow Language and Representations . . . . . . . . . . . . . . . . 45
3.2 Taxonomy of Grid Work ow Languages . . . . . . . . . . . . . . . . 47
3.2.1 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
3.2.2 Language Properties . . . . . . . . . . . . . . . . . . . . . . . 48
3.2.3 Representation Properties . . . . . . . . . . . . . . . . . . . . 53
3.3 Examples of Grid Work ow Languages . . . . . . . . . . . . . . . . . 54
3.3.1 YAWL - Yet another work ow language . . . . . . . . . . . . 54
3.3.2 AGWL and CGWL . . . . . . . . . . . . . . . . . . . . . . . 54
3.3.3 GJobDL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
3.3.4 GridAnt . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
3.3.5 Unicore . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
3.3.6 VRM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
3.4 Evaluation of the Taxonomy . . . . . . . . . . . . . . . . . . . . . . . 58
4 Fast Admission of Grid Work ows 61
4.1 What is a Good Online Scheduler? . . . . . . . . . . . . . . . . . . . 61
5Contents
4.2 Admission Protocol . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
4.2.1 Communication between a Grid Broker and Resources . . . . 65
4.2.2 between a User and Grid Brokers . . . . . . 66
4.2.3 Level of Commitment vs. Information Hiding . . . . . . . . . 67
4.3 Related Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
4.4 Greedy Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
4.4.1 Using single job admissions . . . . . . . . . . . . . . . . . . . 72
4.4.2 List Scheduler . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
4.4.3 Reduce makespan: HEFTSync . . . . . . . . . . . . . . . . . 73
4.4.4 HEFTSyncBT: Backtracking Extension . . . . . . . . . . . . 76
4.4.5 HLST - Applying the HEFT Algorithm Backwards . . . . . . 76
4.5 Optimization Algorithms . . . . . . . . . . . . . . . . . . . . . . . . 77
4.5.1 De ning a Constraint Satisfaction Problem . . . . . . . . . . 78
4.5.2 Optimization Function . . . . . . . . . . . . . . . . . . . . . . 80
4.5.3 Heterogeneous Grid environments and moldable jobs . . . . . 81
4.6 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
4.6.1 Simulation Setup . . . . . . . . . . . . . . . . . . . . . . . . . 82
4.6.2 In uence of the proposed admission Protocol . . . . . . . . . 84
4.6.3 Online scheduling algorithms . . . . . . . . . . . . . . . . . . 87
5 Re-Optimizing the Schedule 89
5.1 Related work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
5.2 What is a Good O ine Scheduler? . . . . . . . . . . . . . . . . . . . 91
5.3 Grid Level vs. Resource Level . . . . . . . . . . . . . . . . . . . . . . 92
5.4 Grid initiated optimization . . . . . . . . . . . . . . . . . . . . . . . 93
5.4.1 Extending the admission protocol . . . . . . . . . . . . . . . . 93
5.4.2 Optimization on Demand . . . . . . . . . . . . . . . . . . . . 95
5.4.3 Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
5.4.4 Reduce the Search Space . . . . . . . . . . . . . . . . . . . . 97
5.5 Local initiated optimization . . . . . . . . . . . . . . . . . . . . . . . 98
5.5.1 Extending the Admission protocol . . . . . . . . . . . . . . . 98
5.5.2 Optimization Scheme . . . . . . . . . . . . . . . . . . . . . . 100
5.6 Local only optimization . . . . . . . . . . . . . . . . . . . . . . . . . 100
5.6.1 Rerouting of network reservations . . . . . . . . . . . . . . . 101
5.7 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
5.7.1 Optimization at Grid Level . . . . . . . . . . . . . . . . . . . 103
5.7.2 Network rerouting . . . . . . . . . . . . . . . . . . . . . . . . 104
6 Handling Resource Outages by Re-Scheduling 107
6.1 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
6.2 Failure Model and Objectives . . . . . . . . . . . . . . . . . . . . . . 108
6Contents
6.3 De ning a Rescue horizon . . . . . . . . . . . . . . . . . . . . . . . . 109
6.3.1 Simple reference models . . . . . . . . . . . . . . . . . . . . . 110
6.3.2 Objectives for the rescue horizon . . . . . . . . . . . . . . . . 111
6.3.3 Load based horizon de nition . . . . . . . . . . . . . . . . . . 112
6.3.4 Feedback based horizon de nition . . . . . . . . . . . . . . . 113
6.4 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
7 Conclusion and Future Work 119
Previously published work 123
Bibliography 125
71 Introduction
optimize handle resource
schedule downtime
While the processing power of single computers is continuously rising, there is still
a need for supercomputers as the size of the problems to be solved increases, too.
Most sup are now connected to the Internet making them easily ac-
cessible from all over the world and scientists even have a choice of using multiple
supercomputers of di erent providers. These providers are not only classical com-
pute centers, but also companies and scienti c institutions owning high performance
computers and selling unused capacity.
In order to use the potential of all the available computing power, it is necessary
to know the hardware and software equipment as well as administrative regulation
of each supercomputer. To solve this problem, the concept of Grid computing
was developed. The goal is to provide high performance computing power in a
reliable, standardized, and easy to use way, similar to electrical power in the energy
grid. In the same way as the energy grid, the computational Grid spans multiple
loosely coupled organizations providing services using a wide range of hardware.
The computational Grid even includes non-uniform services, i.e., also non-computer
resources like network bandwidth, storage capacity, or rare scienti c instruments.
A similar concept is the Cloud computing. The Cloud enables the access to
diverse resources, too. However, resources are booked for an unde ned time period
in the Cloud and provide capacity for repeated service usages, e.g., by webservice
calls. In the de nition used in this thesis, Grid jobs are executed only once and
report then the results back to the submitter. Despite a lot of similarities between
the Cloud and the Grid, the techniques developed in this thesis focus on Grid
environments with scienti c applications. However, some results can be adapted to
be used in the Cloud.
To use the variety of services and resources in a coordinated manner, the user
needs a more complex interface than the power plug in the energy grid. Grid
work ows have been established as a way to de ne complex Grid applications as a
set of jobs and to express the interdependencies between them. The user models
his application as a Grid work ow and submits it to a Grid broker. The Grid
broker then negotiates with all connected resource provider allocations matching the
resource and timing requirements of the user. The Grid broker can also guarantee
a speci c service level, e.g., that all involved resources provide security measures
or that there will be enough capacity to complete the Grid work ow by a user-
speci ed deadline. In this case the Grid broker also has to control whether all
9
fast specifies
admission input1 Introduction
optimize handle resource
schedule downtime
Figure 1.1: The whole lifecycle of a Grid work ow from the Grid brokers point of
view.
involved resources stick to the guaranteed service level.
The Grid broker can allocate resources most e ciently being in control of every
resource in the Grid with very detailed information about their state. However,
in a widely distributed and multi-organizational setup this is rarely possible. Re-
source providers desire to stay in charge of their resources and also control which
information is available about the resource and its state. Typical information to
hide is the load of the resource. This is necessary for a variety of reasons, like
preventing a competing company from monitoring the load and inferring about in-
ternal processes of the resource owner, e.g., the release cycles of future products.
Thus, the Grid broker utilizing such resources has to make its decisions with as
little information and control as possible.
This thesis provides means for the Grid broker to handle Grid work ows in such
an environment. There are four stages in the lifecycle of the Grid work ow from the
Grid broker’s point of view: speci cation, online scheduling, o ine scheduling, and
rescue scheduling. For all these four stages, the speci c requirements are analyzed
and techniques are presented to e ciently cope with them.
In order to provide guarantees for the deadline and to support co-allocation |
the allocation of multiple resources for the very same time | advance reservations
have to be used. In contrast to the widely used queuing approach, an advance
reservation enabled resource manager negotiates the actual execution time during
10
fast specifies
admission input