Technische Universität München
Lehrstuhl VI: Echtzeitsysteme und Robotik

Studies in Continuous
Black-box Optimization

Tom Schaul

Full reprint of the dissertation approved by the Fakultät für Informatik of the
Technische Universität München for the award of the academic degree of
Doktor der Naturwissenschaften (Dr. rer. nat.).

Chair: Univ.-Prof. B. Brügge, Ph.D.
Examiners of the dissertation: 1. Univ.-Prof. Dr. H.-J. Bungartz
                               2. Univ.-Prof. Dr. H. J. Schmidhuber

The dissertation was submitted to the Technische Universität München on
06.06.2011 and accepted by the Fakultät für Informatik on 28.06.2011.

Abstract
Optimization is the research field that studies the design of algorithms for
finding the best solutions to problems we humans throw at them. While the
whole domain is of great practical utility, the present thesis focuses on the
subfield of continuous black-box optimization, presenting a collection of
novel, state-of-the-art algorithms for solving problems in that class.
First, we introduce a general-purpose algorithm called Natural Evolution
Strategies (NES). In contrast to typical evolutionary algorithms, which search
in the vicinity of the fittest individuals in a population, evolution strategies
aim at repeating the type of mutations that led to those individuals. We can
characterize those mutations by a search distribution. The key idea of NES is
to ascend the gradient on the parameters of that distribution towards higher
expected fitness. We show how plain gradient ascent is destined to fail, and
provide a viable alternative that instead follows the natural gradient
to adapt the search distribution, which appropriately normalizes the update
step with respect to its uncertainty. Being derived from first principles, the
NES approach can be extended to all types of search distributions that allow a
parametric form, not just the classical multivariate Gaussian one. We derive a
number of NES variants for different distributions, and show how they are useful
on different problem classes. In addition, we rein in the computational cost,
avoiding costly matrix inversions through an incremental change of coordinates.
Two additional, novel techniques, importance mixing and adaptation sampling,
allow us to automatically tune the learning rate and batch size to the problem,
and thereby further reduce the average number of required fitness evaluations.
A third technique, restart strategies, provides the algorithm with additional
robustness in the presence of multiple local optima, or noise.
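To make the core update concrete: writing the expected fitness under the search
distribution pi(z|theta) as J(theta) = E[f(z)], the plain search gradient is
grad_theta J = E[f(z) grad_theta log pi(z|theta)], and NES preconditions it with
the inverse Fisher information matrix, theta <- theta + eta F^-1 grad_theta J.
Below is a minimal sketch of this update for the separable case (a Gaussian with
diagonal covariance, as in the SNES variant of Chapter 2), where the natural
gradient has a simple closed form in local coordinates; the population size,
learning rates, and rank-based utilities are common defaults assumed for
illustration, not values fixed by this abstract.

```python
import numpy as np

def snes(f, mu, sigma, iterations=1000, seed=0):
    """Minimal sketch of separable NES (SNES): natural-gradient ascent on the
    mean and per-coordinate standard deviations of a Gaussian search
    distribution. Maximizes f; hyperparameters are common defaults."""
    rng = np.random.default_rng(seed)
    d = len(mu)
    lam = 4 + int(3 * np.log(d))                    # population size
    eta_mu = 1.0
    eta_sigma = (3 + np.log(d)) / (5 * np.sqrt(d))  # learning rates
    # Rank-based utilities ("fitness shaping"); they sum to zero, so the
    # update is invariant to monotone transformations of the fitness.
    ranks = np.arange(1, lam + 1)
    u = np.maximum(0.0, np.log(lam / 2 + 1) - np.log(ranks))
    u = u / u.sum() - 1.0 / lam
    for _ in range(iterations):
        s = rng.standard_normal((lam, d))   # samples in local coordinates
        z = mu + sigma * s                  # candidates ~ N(mu, diag(sigma^2))
        order = np.argsort([f(zi) for zi in z])[::-1]   # best first
        s = s[order]
        # Natural-gradient updates (closed form for the diagonal Gaussian)
        mu = mu + eta_mu * sigma * (u @ s)
        sigma = sigma * np.exp(0.5 * eta_sigma * (u @ (s ** 2 - 1)))
    return mu

# Usage: minimize the 10-dimensional sphere function by maximizing -f
# best = snes(lambda x: -np.sum(x ** 2), mu=np.ones(10), sigma=np.ones(10))
```

Importance mixing and adaptation sampling, mentioned above, would additionally
reuse samples across iterations and adapt the learning rate online; they are
omitted from this sketch for brevity.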
Second, we introduce a new approach to costly black-box optimization, when
fitness evaluations are very expensive. Here, we model the fitness function
using state-of-the-art Gaussian process regression, and use the principle of
artificial curiosity to direct exploration towards the most informative next
evaluation candidate. Both the expected fitness improvement and the expected
information gain can be derived explicitly from the Gaussian process model,
and our method constructs a front of Pareto-optimal points according to these
two criteria. This makes the exploration-exploitation trade-off explicit, and
permits maximally informed candidate selection.
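Both selection criteria can be read directly off the Gaussian process
posterior. The sketch below assumes a zero-mean GP with an RBF kernel and uses
the closed-form expected improvement over the incumbent together with an
entropy-based information gain, 0.5 * log(1 + var(x)/noise); the kernel, the
noise level, and this particular information-gain expression are illustrative
assumptions, not necessarily the exact quantities derived in Chapter 3. A
simple dominance filter then extracts the Pareto front over the two scores.

```python
import numpy as np
from scipy.stats import norm

def gp_posterior(X, y, Xq, length_scale=1.0, noise=1e-4):
    """Posterior mean/variance at query points Xq of a zero-mean GP with an
    RBF kernel (illustrative modelling choices, not prescribed here)."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-0.5 * d2 / length_scale ** 2)
    K = k(X, X) + noise * np.eye(len(X))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    Ks = k(X, Xq)
    V = np.linalg.solve(L, Ks)
    mean = Ks.T @ alpha
    var = 1.0 - (V ** 2).sum(axis=0)     # RBF prior variance k(x, x) = 1
    return mean, np.maximum(var, 1e-12)

def expected_improvement(mean, var, f_best):
    """Closed-form expected improvement over the incumbent (maximization)."""
    s = np.sqrt(var)
    g = (mean - f_best) / s
    return (mean - f_best) * norm.cdf(g) + s * norm.pdf(g)

def information_gain(var, noise=1e-4):
    """Entropy reduction from observing a candidate: 0.5*log(1 + var/noise),
    one standard information-gain measure for GPs (assumed here)."""
    return 0.5 * np.log1p(var / noise)

def pareto_front(ei, ig):
    """Indices of candidates not dominated on (EI, information gain)."""
    keep = []
    for i in range(len(ei)):
        dominated = np.any((ei >= ei[i]) & (ig >= ig[i])
                           & ((ei > ei[i]) | (ig > ig[i])))
        if not dominated:
            keep.append(i)
    return keep

# Usage: pick the next evaluation among candidates Xq given data (X, y)
# mean, var = gp_posterior(X, y, Xq)
# front = pareto_front(expected_improvement(mean, var, y.max()),
#                      information_gain(var))
```

Selecting only from this front makes the trade-off explicit: every excluded
candidate is beaten on both expected improvement and information gain by some
candidate on the front.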
We empirically validate our algorithms on a broad set of benchmarks. We also
include a study of how continuous black-box optimization can solve challenging
neuro-evolution problems, where multi-dimensional recurrent neural networks
are trained to play the game of Go. At the same time, we establish to what
degree those network architectures can transfer good playing behavior from
small to large board sizes.
In summary, this dissertation presents a collection of novel algorithms for
the general problem of continuous black-box optimization, as well as for a
number of special cases, each validated empirically.
Dedicated to
Joé Schaul
In memory of
Adina Mosincat
Table of Contents
Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii
Table of Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii
List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi
List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . xiii
List of Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . xv
Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xvii
0.1 Thesis Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . xvii
0.2 Related Publications . . . . . . . . . . . . . . . . . . . . . . . . . xviii
0.3 Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxi
Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . xxiii
1 Introduction: Continuous Black-box Optimization . . . . . . 1
1.1 Problem Definition . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1.1 Continuous Optimization: Real-valued Solution Space . . 2
1.1.2 Black-box Optimization: Unknown Function . . . . . . . 2
1.1.3 Local Optimization: Find a Nearby Optimum . . . . . . . 3
1.1.4 Noisy Optimization: Functions Corrupted by Noise . . . . 3
1.1.5 Multi-objective Optimization . . . . . . . . . . . . . . . . 4
1.2 Evaluating Optimization Algorithms . . . . . . . . . . . . . . . . 5
1.3 State-of-the-Art Approaches . . . . . . . . . . . . . . . . . . . . . 6
1.3.1 Evolutionary Methods . . . . . . . . . . . . . . . . . . . . 6
1.3.2 Response Surface Methods . . . . . . . . . . . . . . . . . 7
1.4 Impact and Applications . . . . . . . . . . . . . . . . . . . . . . . 8
1.5 Related Problem Domains . . . . . . . . . . . . . . . . . . . . . . 8
1.6 Open Questions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2 Natural Evolution Strategies . . . . . . . . . . . . . . . . . . . 11
2.1 The NES Family . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.1.1 Chapter Outline . . . . . . . . . . . . . . . . . . . . . . . 12
2.2 Search Gradients . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.2.1 Search Gradients for Gaussian Distributions . . . . . . . . . 14
2.2.2 Limitations of Plain Search Gradients . . . . . . . . . . . 15
2.3 Using the Natural Gradient . . . . . . . . . . . . . . . . . . . . . 16
2.4 Performance and Robustness Techniques . . . . . . . . . . . . . . 19
2.4.1 Fitness Shaping . . . . . . . . . . . . . . . . . . . . . . . . 20
2.4.2 Fitness Baselines . . . . . . . . . . . . . . . . . . . . . . . 20
2.4.3 Importance Mixing . . . . . . . . . . . . . . . . . . . . . . 22
2.4.4 Adaptation Sampling . . . . . . . . . . . . . . . . . . . . . 24
2.4.5 Restart Strategies . . . . . . . . . . . . . . . . . . . . . . 25
2.5 Techniques for Multinormal Distributions . . . . . . . . . . . . . 28
2.5.1 Using Exponential Parameterization . . . . . . . . . . . . 28
2.5.2 Using Natural Coordinates . . . . . . . . . . . . . . . . . 29
2.5.3 Orthogonal Decomposition of Multinormal Parameter Space . . . 31
2.5.4 Connection to CMA-ES . . . . . . . . . . . . . . . . . . . 32
2.5.5 Elitism . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
Use for Multi-objective NES . . . . . . . . . . . . . . . . . 37
2.6 Beyond Multinormal Distributions . . . . . . . . . . . . . . . . . 38
2.6.1 Separable NES . . . . . . . . . . . . . . . . . . . . . . . . 38
2.6.2 Rotationally-symmetric Distributions . . . . . . . . . . . 40
Sampling from Radial Distributions . . . . . . . . . . . . 41
Computing the Fisher Information Matrix . . . . . . . . . 41
2.6.3 Heavy-tailed NES . . . . . . . . . . . . . . . . . . . . . . 42
Groups of Invariances . . . . . . . . . . . . . . . . . . . . 43
Cauchy Distribution . . . . . . . . . . . . . . . . . . . . . 44
2.7 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
2.7.1 Experimental Setup and Hyperparameters . . . . . . . . . 46
2.7.2 Black-box Optimization Benchmarks . . . . . . . . . . . . 47
2.7.3 Separable NES . . . . . . . . . . . . . . . . . . . . . . . . 48
Separable and Non-separable Benchmarks . . . . . . . . . 48
Neuro-evolution . . . . . . . . . . . . . . . . . . . . . . . 53
Lennard-Jones Potentials . . . . . . . . . . . . . . . . . . 54
2.7.4 Heavy Tails and Global Optimization . . . . . . . . . . . 56
2.7.5 Results Summary . . . . . . . . . . . . . . . . . . . . . . . 57
2.8 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
3 Curiosity-driven Optimization . . . . . . . . . . . . . . . . . . 61
3.1 Artificial Curiosity . . . . . . . . . . . . . . . . . . . . . . . . . . 61
3.1.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . 62
3.1.2 Artificial Curiosity as a Guide for Optimization . . . . . . 62
3.1.3 Formal Framework . . . . . . . . . . . . . . . . . . . . . . 63
3.2 Exploration/Exploitation Trade-off . . . . . . . . . . . . . . . 63
3.3 Curiosity-driven Optimization (General Form) . . . . . . . . . . 65
3.4 Models of Expected Fitness and Information Gain . . . . . . . . 65
3.4.1 A Good Model Choice: Gaussian Processes . . . . . . . . 66
3.4.2 Derivation of Gaussian Process Information Gain . . . . . 66
3.5 Curiosity-driven Optimization with Gaussian Processes . . . . . 67
3.6 Minimal Asymptotic Requirements . . . . . . . . . . . . . . . . . 68
3.6.1 Reaching Optima at Arbitrary Distance . . . . . . . . . . 68
3.6.2 Locating Optima with Arbitrary Precision
