Grid workflow scheduling based on incomplete information [Elektronische Ressource] / vorgelegt von Jörg Schneider

technische_universitat_berlin - Joerg Schneider

Découvre YouScribe en t'inscrivant gratuitement

Je m'inscris

Obtenez un accès à la bibliothèque pour le consulter en ligne
En savoir plus

135 pages

English

Obtenez un accès à la bibliothèque pour le consulter en ligne
En savoir plus

A propos
Informations
Extrait

Description

Sujets

Informatik

Informations

Publié par	technische_universitat_berlin
Publié le	01 janvier 2010
Nombre de lectures	9
Langue	English
Poids de l'ouvrage	2 Mo

Extrait

Grid Work ow Scheduling based on
Incomplete Information
vorgelegt von
Diplom-Informatiker
J org Schneider
aus Berlin
von der Fakult at IV - Elektrotechnik und Informatik
der Technischen Universit at Berlin
zur Erlangung des akademischen Grades
Doktor der Ingenieurwissenschaften
{ Dr.-Ing. {
genehmigte Dissertation
Promotionsausschuss:
Vorsitzender: Prof. Dr. Peter Pepper
Gutachter: Prof. Dr. Hans-Ulrich Hei hter: Prof. Dr. Odej Kao
Gutachter: Prof. Dr. Cesar de Rose
Tag der wissenschaftlichen Aussprache: 10.11.2009
Berlin 2010
D 83Acknowledgments
I would like to thank especially my supervisor Prof. Dr. Hans-Ulrich Hei for the
opportunity to work with him and to write this thesis. I really appreciated the inde-
pendence and freedom working in his research group. I also thank Prof. Dr. Cesar
de Rose, PUCRS Porto Alegre, and Prof. Dr. Odej Kao for serving as reviewer for
this thesis.
Special thanks to my colleagues and the students I worked with for the dis-
cussions, motivation, and support. In particular, I like to thank Sebastian Boelter,
Dr. Lars-Olof Burchard, J org Decker, Stanimir Dragiev, Tiago Ferreto, Julius Gehr,
Barry Linnert, Arthur Lochstampfer, and Robert Lubkoll.
3Contents
1 Introduction 9
2 Coupling Resources in the Grid 15
2.1 How to build up a Grid? . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.2 Timing: Advance Reservation . . . . . . . . . . . . . . . . . . . . . . 17
2.2.1 Formalization . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.2.2 Manage advance reservations . . . . . . . . . . . . . . . . . . 23
2.2.3 Measuring Fragmentation . . . . . . . . . . . . . . . . . . . . 29
2.2.4 Typical properties of advance reservation systems . . . . . . . 33
2.3 Grid work ow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
2.3.1 Example Grid Work ows . . . . . . . . . . . . . . . . . . . . 36
2.3.2 Formalization . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
2.3.3 Heterogeneous Grid environment and moldable jobs . . . . . 40
2.3.4 Generating workloads containing Grid work ows . . . . . . . 40
2.4 The Virtual Resource Manager - A Grid Resource Manager . . . . . 43
3 Grid Work ow Languages 45
3.1 Work ow Language and Representations . . . . . . . . . . . . . . . . 45
3.2 Taxonomy of Grid Work ow Languages . . . . . . . . . . . . . . . . 47
3.2.1 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
3.2.2 Language Properties . . . . . . . . . . . . . . . . . . . . . . . 48
3.2.3 Representation Properties . . . . . . . . . . . . . . . . . . . . 53
3.3 Examples of Grid Work ow Languages . . . . . . . . . . . . . . . . . 54
3.3.1 YAWL - Yet another work ow language . . . . . . . . . . . . 54
3.3.2 AGWL and CGWL . . . . . . . . . . . . . . . . . . . . . . . 54
3.3.3 GJobDL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
3.3.4 GridAnt . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
3.3.5 Unicore . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
3.3.6 VRM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
3.4 Evaluation of the Taxonomy . . . . . . . . . . . . . . . . . . . . . . . 58
4 Fast Admission of Grid Work ows 61
4.1 What is a Good Online Scheduler? . . . . . . . . . . . . . . . . . . . 61
5Contents
4.2 Admission Protocol . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
4.2.1 Communication between a Grid Broker and Resources . . . . 65
4.2.2 between a User and Grid Brokers . . . . . . 66
4.2.3 Level of Commitment vs. Information Hiding . . . . . . . . . 67
4.3 Related Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
4.4 Greedy Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
4.4.1 Using single job admissions . . . . . . . . . . . . . . . . . . . 72
4.4.2 List Scheduler . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
4.4.3 Reduce makespan: HEFTSync . . . . . . . . . . . . . . . . . 73
4.4.4 HEFTSyncBT: Backtracking Extension . . . . . . . . . . . . 76
4.4.5 HLST - Applying the HEFT Algorithm Backwards . . . . . . 76
4.5 Optimization Algorithms . . . . . . . . . . . . . . . . . . . . . . . . 77
4.5.1 De ning a Constraint Satisfaction Problem . . . . . . . . . . 78
4.5.2 Optimization Function . . . . . . . . . . . . . . . . . . . . . . 80
4.5.3 Heterogeneous Grid environments and moldable jobs . . . . . 81
4.6 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
4.6.1 Simulation Setup . . . . . . . . . . . . . . . . . . . . . . . . . 82
4.6.2 In uence of the proposed admission Protocol . . . . . . . . . 84
4.6.3 Online scheduling algorithms . . . . . . . . . . . . . . . . . . 87
5 Re-Optimizing the Schedule 89
5.1 Related work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
5.2 What is a Good O ine Scheduler? . . . . . . . . . . . . . . . . . . . 91
5.3 Grid Level vs. Resource Level . . . . . . . . . . . . . . . . . . . . . . 92
5.4 Grid initiated optimization . . . . . . . . . . . . . . . . . . . . . . . 93
5.4.1 Extending the admission protocol . . . . . . . . . . . . . . . . 93
5.4.2 Optimization on Demand . . . . . . . . . . . . . . . . . . . . 95
5.4.3 Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
5.4.4 Reduce the Search Space . . . . . . . . . . . . . . . . . . . . 97
5.5 Local initiated optimization . . . . . . . . . . . . . . . . . . . . . . . 98
5.5.1 Extending the Admission protocol . . . . . . . . . . . . . . . 98
5.5.2 Optimization Scheme . . . . . . . . . . . . . . . . . . . . . . 100
5.6 Local only optimization . . . . . . . . . . . . . . . . . . . . . . . . . 100
5.6.1 Rerouting of network reservations . . . . . . . . . . . . . . . 101
5.7 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
5.7.1 Optimization at Grid Level . . . . . . . . . . . . . . . . . . . 103
5.7.2 Network rerouting . . . . . . . . . . . . . . . . . . . . . . . . 104
6 Handling Resource Outages by Re-Scheduling 107
6.1 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
6.2 Failure Model and Objectives . . . . . . . . . . . . . . . . . . . . . . 108
6Contents
6.3 De ning a Rescue horizon . . . . . . . . . . . . . . . . . . . . . . . . 109
6.3.1 Simple reference models . . . . . . . . . . . . . . . . . . . . . 110
6.3.2 Objectives for the rescue horizon . . . . . . . . . . . . . . . . 111
6.3.3 Load based horizon de nition . . . . . . . . . . . . . . . . . . 112
6.3.4 Feedback based horizon de nition . . . . . . . . . . . . . . . 113
6.4 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
7 Conclusion and Future Work 119
Previously published work 123
Bibliography 125
71 Introduction
optimize handle resource
schedule downtime
While the processing power of single computers is continuously rising, there is still
a need for supercomputers as the size of the problems to be solved increases, too.
Most sup are now connected to the Internet making them easily ac-
cessible from all over the world and scientists even have a choice of using multiple
supercomputers of di erent providers. These providers are not only classical com-
pute centers, but also companies and scienti c institutions owning high performance
computers and selling unused capacity.
In order to use the potential of all the available computing power, it is necessary
to know the hardware and software equipment as well as administrative regulation
of each supercomputer. To solve this problem, the concept of Grid computing
was developed. The goal is to provide high performance computing power in a
reliable, standardized, and easy to use way, similar to electrical power in the energy
grid. In the same way as the energy grid, the computational Grid spans multiple
loosely coupled organizations providing services using a wide range of hardware.
The computational Grid even includes non-uniform services, i.e., also non-computer
resources like network bandwidth, storage capacity, or rare scienti c instruments.
A similar concept is the Cloud computing. The Cloud enables the access to
diverse resources, too. However, resources are booked for an unde ned time period
in the Cloud and provide capacity for repeated service usages, e.g., by webservice
calls. In the de nition used in this thesis, Grid jobs are executed only once and
report then the results back to the submitter. Despite a lot of similarities between
the Cloud and the Grid, the techniques developed in this thesis focus on Grid
environments with scienti c applications. However, some results can be adapted to
be used in the Cloud.
To use the variety of services and resources in a coordinated manner, the user
needs a more complex interface than the power plug in the energy grid. Grid
work ows have been established as a way to de ne complex Grid applications as a
set of jobs and to express the interdependencies between them. The user models
his application as a Grid work ow and submits it to a Grid broker. The Grid
broker then negotiates with all connected resource provider allocations matching the
resource and timing requirements of the user. The Grid broker can also guarantee
a speci c service level, e.g., that all involved resources provide security measures
or that there will be en