Data-driven system identification via evolutionary retrieval of Takagi-Sugeno fuzzy models [electronic resource] / by Ingo Renners

otto-von-guericke-universitat_magdeburg - Ingo Renners

183 pages

German

Subjects

Computer science

Information

Published by	otto-von-guericke-universitat_magdeburg
Published on	1 January 2004
Number of reads	21
Language	German
File size	4 MB

Excerpt

Data-Driven System Identification
via
Evolutionary Retrieval of Takagi-Sugeno Fuzzy Models
Dissertation
submitted for the academic degree of
Doktoringenieur (Dr.-Ing.)
to the Faculty of Computer Science (Fakultät für Informatik)
of the Otto-von-Guericke-Universität Magdeburg
by Dipl.-Inf. Ingo Renners
born 16 August 1969 in Opladen
Soest, 9 March 2004
Summary
System identification has the task of representing a number of related
components of the real world in a model. If this mapping is achieved by
transferring human expert knowledge into a model, it is called knowledge-based
modeling. If, however, the information about the system is available only
implicitly and informally in data sets, mapping this knowledge with the help of
algorithms is called data-driven modeling.
This thesis proposes using the class of so-called Takagi-Sugeno fuzzy models
for data-driven system identification. This is justified by the availability of
effective learning algorithms for this class of models. Furthermore, it is
often advantageous to be able to interpret the models found during system
identification. Particular emphasis is therefore placed on formulating various
interpretability factors, which can be combined into an objective and easily
implemented interpretability measure for Takagi-Sugeno models.
To identify optimal model structures, new concepts from the field of
heuristics, specifically evolutionary computation methods, are applied as a
generally usable search method. Optimal and lean model structures are highly
desirable with regard to accuracy, but especially with regard to the
generalization capability of models. However, the necessary encoding of
potential models within an artificial evolution plays an important, if not the
decisive, role. For this reason, this thesis proposes an encoding method that
is novel in this context. The search space of an evolutionary algorithm is
spanned by so-called genotype templates, which are formulated with the help of
a context-free grammar.
The proposed method for system identification by means of Takagi-Sugeno models
is then tested on an artificial problem and on a complex real-world problem.
The real-world problem concerns the identification of models that predict the
toxicity of molecules. These models are thus intended to uncover and establish
a relationship between easily measured or computed properties of molecules,
so-called molecular descriptors, and their toxicity.
Abstract
System identification is the task of mapping several related components of a
real-world system into a model. If this is done by transferring human expertise
into a model, the process is called knowledge-driven modeling. If the system
information is embedded in databases and the implicit expertise is mapped into
a model by algorithms, the process is called data-driven modeling.
This thesis proposes the class of Takagi-Sugeno fuzzy models as the target for
data-driven system identification. This class of models makes it possible to
use powerful learning algorithms. At the same time, the human interpretability
of the resulting models can be assured.
For this reason, the necessary interpretability factors are worked out and an
objective interpretability measure for Takagi-Sugeno fuzzy models is formulated.
Evolutionary computation, as a general search method, is used to identify an
optimal model structure. Optimal and sparse model structures are desirable for
reasons of accuracy and generalization capability. The way in which candidate
solutions (i.e. models) are encoded in evolutionary algorithms is a central
factor in population-based search methods. The author proposes a novel
grammar-based method to formulate genotype templates. These templates are used
to define the genotype search space.
The presented approach of data-driven system identification via evolutionary
retrieval of Takagi-Sugeno fuzzy models is tested with artificial data and with
a complex real-world dataset concerning the prediction of molecular toxicity.
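As a rough illustration of the model class named in the abstract, the following minimal sketch evaluates a first-order Takagi-Sugeno model: each rule has a fuzzy antecedent and a linear consequent, and the model output is the firing-strength-weighted average of the rule outputs. The sketch is not taken from the thesis. Gaussian membership functions, the product as conjunction operator, and all names (`gaussian`, `ts_output`, `rules`) and parameter values are assumptions chosen for illustration.

```python
import math


def gaussian(x, center, width):
    """Gaussian membership degree of x in a fuzzy set (assumed shape)."""
    return math.exp(-0.5 * ((x - center) / width) ** 2)


def ts_output(x, rules):
    """Evaluate a first-order Takagi-Sugeno model at input vector x.

    Each rule is a pair (antecedent, coeffs):
      antecedent: list of (center, width) pairs, one per input variable
      coeffs:     [b0, b1, ..., bn] of the linear consequent
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for antecedent, coeffs in rules:
        # Firing strength: product of the membership degrees of all inputs.
        strength = 1.0
        for xi, (center, width) in zip(x, antecedent):
            strength *= gaussian(xi, center, width)
        # Linear rule consequent: b0 + b1*x1 + ... + bn*xn.
        local = coeffs[0] + sum(b * xi for b, xi in zip(coeffs[1:], x))
        weighted_sum += strength * local
        weight_total += strength
    # Weighted average of the rule outputs (assumes at least one rule fires).
    return weighted_sum / weight_total


# Hypothetical two-rule model with a single input.
rules = [
    ([(0.0, 1.0)], [0.0, 1.0]),   # IF x is near 0 THEN y = x
    ([(4.0, 1.0)], [8.0, -1.0]),  # IF x is near 4 THEN y = 8 - x
]
print(ts_output([2.0], rules))
```

Once the firing strengths are fixed, the output is linear in the consequent coefficients, so those coefficients can be fitted by linear least squares. This is one reason such models are convenient to learn from data.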
Acknowledgments
First of all, I want to thank my wife. The impression that my real love is a
computer could always have arisen, but it was never true, nor was I ever
accused of it.
I also want to thank Prof. Grauel, one of the most reliable men I know, for his
steady support, for motivating discussions, and for holding off much of the
bureaucracy many people have to fight with. I consider the scientific freedom
granted during my research a valuable gift.
This scientific freedom was also supported by the Ministry of Sciences and
Research, North Rhine-Westphalia, through financial support and especially
within the European Union project COMET. Concerning this project, my special
thanks go to Dr. Benfenati, who provided me with the newest toxicity dataset
used in the successor project IMAGETOX.
I also want to thank Prof. Kruse, who recommended this doctorate. Furthermore,
he and his colleague Dr. Borgelt provided some very helpful suggestions which
contributed to the completion of this thesis.
Finally, I want to honor the idea of open source software, which allows me and
millions of other people to use thousands of algorithms and applications for
free.
Contents
1 Introduction
1.1 Problem Statement and Motivation
1.2 Thesis Contributions
1.3 Thesis Structure
1.3.1 Organization of the Content
1.3.2 Notation
1.3.3 Summary
2 System Identification
2.1 Model Types
2.1.1 Scaled Models
2.1.2 Flowcharts
2.1.3 Look Up Tables
2.1.4 Mathematical Models
2.2 Application Areas of System Identification
2.2.1 System Identification with Computational Intelligence
2.2.2 Simulation
2.2.3 Analysis
2.2.4 Optimization
2.2.5 Prediction
2.2.6 Control
2.3 Tasks in System Identification
2.3.1 Selecting a Model Class
2.3.2 the Model Structure
2.3.3 Parameter Optimization of the Model
2.4 Parameter Optimization with Different Error Measures
2.4.1 Loss Functions and Cost Functions
2.4.2 Linear Parameter Optimization
2.4.3 Polynomial Models
2.5 Model Complexity and Regularization
2.5.1 Model Complexity and Model Flexibility
2.5.2 Bias Error and Variance Error
2.5.3 Bias/Variance Tradeoff
2.5.4 Implicit Structure Optimization
2.5.5 Explicit Structure Optimization
2.6 Model Generalization Estimation
2.6.1 Good and Best Feature Subset
2.6.2 Training, Validation and Test Dataset
2.6.3 Cross Validation
2.6.4 Information Criteria
2.7 Summary
3 Takagi-Sugeno Fuzzy Models
3.1 Fuzzy Logic and Fuzzy Models
3.2 Fuzzy Inference Systems
3.2.1 Membership Function Types
3.2.2 Fuzzy Operators
3.2.3 Reasoning Mechanism
3.2.4 Defuzzification Method
3.2.5 The Output Evaluation of a Takagi-Sugeno Fuzzy Model
3.3 Interpretability Conditions of Fuzzy Models
3.3.1 Fuzzy Set Configurations Causing Semantic Inconsistency
3.3.2 Interpretability Factors
3.3.3 An Exemplary Interpretability Measure
3.3.