
Self-healing distributed systems [electronic resource] / presented by Benjamin Satzger

238 pages

Self-healing distributed systems
Dissertation
for the degree of
Doctor of Natural Sciences (Dr. rer. nat.)
submitted to the
Department of Computer Science
of the University of Augsburg
presented by
Benjamin Satzger
in 2008
Examiner: Prof. Dr. rer. nat. Theo Ungerer
Co-examiner: Prof. Dr. rer. nat. Bernhard Bauer
Date of oral examination: 2008-12-18

Abstract
The growing complexity of distributed systems demands new ways of
control. This work addresses self-healing in distributed environments. The
term self-healing denotes a fairly new area of research and is used rather
broadly, but it can be seen as dynamic fault tolerance. This work
proposes generic concepts and algorithms for building self-healing systems.
The detection of node failures in distributed environments is a non-trivial
problem. Failure detectors are an important component of many
fault-tolerant distributed systems. This work proposes a new failure
detection algorithm that offers high flexibility and good performance.
Furthermore, an approach is presented that reduces the message overhead
of failure detectors.
New grouping algorithms are introduced to enable scalable self-monitoring.
They allow monitoring relations to be established autonomously in complex
large-scale distributed systems.
This work also proposes a failure recovery engine based on automated
planning, which manages a distributed system according to user-defined
objectives. It can generate and execute plans to autonomously recover a
system from unwanted states.
Finally, ideas for a generic self-healing architecture for highly complex
distributed systems are presented. The design is based on psychological and
sociological concepts.

Zusammenfassung (Summary)
Owing to the increasing complexity of distributed systems, new methods of
control and administration are needed. This thesis deals with self-healing
in distributed environments. The term self-healing denotes a relatively new
area of research and is used broadly, but it can be understood as dynamic
fault tolerance. This thesis proposes generic concepts for building
self-healing systems.
Detecting node failures in distributed systems is a non-trivial problem.
Failure detectors are an important component of many fault-tolerant
distributed systems. This thesis introduces a new, particularly flexible
failure detection algorithm with good detection rates. In addition, an
approach is presented that makes the use of failure detectors more
efficient.
New grouping algorithms are introduced that enable scalable self-monitoring
and establish monitoring relations autonomously.
A failure recovery component based on an automated planning approach is
presented, which manages a distributed system according to user-defined
goals. It is able to generate and execute plans in order to autonomously
restore a specified system state.
The thesis concludes with ideas for a generic architecture for highly
complex self-healing systems, based on psychological and sociological
concepts.

Acknowledgements
First, I would like to thank my adviser Prof. Dr. Theo Ungerer for his
excellent mentoring. I have not only learnt about computer science, but
also about performing research, organising things, and communicating with
others.
I would also like to thank my co-chair Prof. Dr. Bernhard Bauer for his
dedication to my writing and Prof. Dr. Elisabeth André for agreeing to
act as examiner.
This is a great opportunity to thank all my colleagues at the University of
Augsburg for their support, discussions, and comments on my work.
I would like to thank all members of the Organic Computing research
community for inspiring me with their excellent research.
I am grateful for my sources of funding: the priority program 1183 “Organic
Computing” of the German Research Foundation (DFG) and the Bavarian
state government.
Finally, I wish to thank my wife Melanie Lucas-Satzger who is a great help
in anything but computer science.

Dedicated to my family
