Local evaluation of policies for discounted Markov decision problems [Elektronische Ressource] / vorgelegt von Andreas Tuchscherer

technische_universitat_berlin

Découvre YouScribe en t'inscrivant gratuitement

Je m'inscris

Obtenez un accès à la bibliothèque pour le consulter en ligne
En savoir plus

208 pages

Deutsch

Obtenez un accès à la bibliothèque pour le consulter en ligne
En savoir plus

A propos
Informations
Extrait

Description

Sujets

Mathematics

Informations

Publié par	technische_universitat_berlin
Publié le	01 janvier 2010
Nombre de lectures	19
Langue	Deutsch
Poids de l'ouvrage	3 Mo

Extrait

Local Evaluation of Policies
for Discounted Markov Decision Problems
vorgelegt von
Dipl.-Math. oec. Andreas Tuchscherer
Von der Fakultat II { Mathematik und Naturwissenschaften
der Technischen Universitat Berlin
zur Erlangung des akademischen Grades
Doktor der Naturwissenschaften
{ Dr. rer. nat. {
genehmigte Dissertation
Berichter: Prof. Dr. Dr. h.c. mult. Martin Grotschel
Prof. Dr. Jorg Rambau
Vorsitzender: Prof. Dr. Fredi Troltzsch
Tag der wissenschaftlichen Aussprache: 8. Dezember 2010
Berlin 2010
D 83Zusammenfassung
Das Bestimmen realistischer Indikatoren fur die Gute von Online-Algorith-
men fur gegebene Online-Optimierungsprobleme ist eine schwierige Aufga-
be. Bisher ubliche Ansatze, wie die kompetitive Analyse, weisen erhebliche
Nachteile auf. Sofern stochastische Informationen uber zukunftige Anfragen
zur Verfugung stehen, konnten Markov-Entscheidungs-Probleme (MPDs) ei-
ne geeignete Alternative darstellen. Da jedoch die Anzahl an Zustanden in
MDPs, die aus praktischen Anwendungen hervorgehen, ublicherweise expo-
nentiell in den ursprunglichen Eingabeparametern ist, sind Standardverfah-
ren zur Analyse von Strategien bzw. Online-Algorithmen nicht anwendbar.
In dieser Arbeit wird ein neues algorithmisches Verfahren zur lokalen
Evaluierung der Gute von Strategien fur diskontierte MDPs vorgestellt. Der
Ansatz basiert auf einem Spaltengenerierungsalgorithums zur Approximati-
on der gesamten erwarteten diskontierten Kosten einer unbekannten optima-
len Strategie, einer gegebenen Strategie oder einer einzelnen Entscheidung.
Der Algorithmus bestimmt eine"-Approximation, indem lediglich ein relativ
kleiner lokaler Teil des gesamten Zustandsraumes untersucht wird. Es wird
gezeigt, dass die Anzahl der zur Approximation erforderlichen Zustande un-
abhangig von der Gesamtzahl an Zustanden ist. Die berechneten Approxi-
mationen sind normalerweise deutlich besser als die theoretische Schranken,
die andere Ansatze liefern.
Neben dem Pricing-Problem wird die Struktur der linearen Programme
untersucht, die in der Spaltengenerierung auftreten. Daruber hinaus werden
verschiedene Erweiterungen des Algorithmus vorgeschlagen und analysiert.
Diese zielen darauf ab, gute Approximationen schnell berechnen zu konnen.
Das Potential des Verfahrens wird beispielhaft anhand von diskontier-
ten MDPs untersucht, die aus Online-Optimierungsproblemen hervorgehen.
Bei diesen handelt es sich um das Bin-Coloring-Problem, ein Terminvergabe-
Problem und ein Aufzugssteuerungs-Problem. Das Verfahren ist zumeist in
der Lage, Indikatoren fur die Gute von Online-Algorithmen zu bestimmen, die
Beobachtungen aus Simulationen deutlich besser widerspiegeln als die kompe-
titive Analyse. Auerdem lassen sich Schwachstellen der betrachteten Online-
Algorithmen aufdecken. Dadurch konnte ein neuer Online-Algorithmus fur
das Bin-Coloring-Problem entwickelt werden, der auch in Simulationen bes-
ser abschneidet als bisher existierende Algorithmen.
iiiAbstract
Providing realistic performance indicators of online algorithms for a given
online optimization problem is a dicult task in general. Due to signi-
cant drawbacks of other concepts like competitive analysis, Markov decision
problems (MDPs) may yield an attractive alternative whenever reasonable
stochastic information about future requests is available. However, the num-
ber of states in MDPs emerging from real applications is usually exponential
in the original input parameters. Therefore, the standard methods for ana-
lyzing policies, i. e., online algorithms in our context, are infeasible.
In this thesis we propose a new computational tool to evaluate the behav-
ior of policies for discounted MDPs locally, i. e., depending on a particular
initial state. The method is based on a column generation algorithm for
approximating the total expected discounted cost of an unknown optimal
policy, a concrete policy, or a single action (which assumes actions at other
states to be made according to an optimal policy). The algorithm determines
an"-approximation by inspecting only relatively small local parts of the total
state space. We prove that the number of states required for providing the
approximation is independent of the total number of states, which underlines
the practicability of the algorithm. The approximations obtained by our al-
gorithm are typically much better than the theoretical bounds obtained by
other approaches.
We investigate the pricing problem and the structure of the linear pro-
grams encountered in the column generation. Moreover, we propose and
analyze di erent extensions of the basic algorithm in order to achieve good
approximations fast.
The potential of our analysis tool is exemplied for discounted MDPs
emerging from di erent online optimization problems, namely online bin col-
oring, online target date assignment, and online elevator control. The results
of the experiments are quite encouraging: our method is mostly capable to
provide performance indicators for online algorithms that much better re ect
observations made in simulations than competitive analysis does. Moreover,
the analysis allows to reveal weaknesses of the considered online algorithms.
This way, we developed a new online algorithm for the online bin coloring
problem that outperforms existing ones in our analyses and simulations.
vAcknowledgments
This thesis emerged from the projects\Combinatorial aspects of logistics"and
\Stability, sensitivity, and robustness in combinatorial online-optimization"
of the DFG Research Center Matheon. Firstly, I want to thank Martin
Grotschel for giving me the opportunity to collaborate in these projects, for
his support, and for the freedom he gave me concerning the main focus of
my research. My thanks also go to Ralf Borndorfer for providing the general
conditions for me at the Zuse Institute Berlin to nish my thesis.
Related to the considered subject, I especially thank Jorg Rambau for
coming up with this exciting topic, his valuable advice, his encouragement,
and much support with regard to contents. The cooperation with him was
very pleasant, unfortunately it was intensive only for a short time since he
moved to Bayreuth (and I was not willing to do so). First studies concerning
the local evaluation of policies for discounted Markov decision problems have
been done in collaboration with Jorg Rambau, Stefan Heinz, Volker Kaibel,
and Matthias Peinhardt. I am grateful for the nice and fruitful joint research.
Particularly, I thank Benjamin Hiller for many inspiring discussions, valuable
feedback to my preliminary results, and some support concerning the use of
ALT X. I also thank Siarhei Makarevich who was a great help by implementingE
parts of the code for the proposed method and by providing many of the
computational results and associated illustrations.
Moreover, I am indebted to my proofreaders Benjamin Hiller, Siarhei
Makarevich, and Cornelia Tuchscherer. In particular, Benjamin’s feedback
and suggestions were very fruitful and helped me a lot to improve the pre-
sentation.
I thank my parents Cornelia and Wolfgang and my parents-in-law Brigitte
and Karl-Heinz for all their support in the last months by often taking care
of my son Rafael. At last, I have to mention two very important persons.
Special thanks go to my wife Susanne for her patience and bearing the stress-
ful time we recently had. And, I exceedingly thank Jesus Christ for making
all of this possible.
vii