Causal inference from statistical data [Electronic resource] / by Xiaohai Sun

Karlsruher Institut für Technologie

220 pages

English

Causal Inference from Statistical Data
Dissertation
approved by the Faculty of Informatics
of the Universität Fridericiana zu Karlsruhe (TH)
for the academic degree of
Doctor of Natural Sciences
by
Xiaohai Sun
from Shanghai, China
Date of oral examination: 15 April 2008
First reviewer: PD Dr. Dominik Janzing
Second reviewer: Prof. Dr. Bernhard Schölkopf
Abstract
“Automatic causal discovery” is a rather young research area that has attracted increasing
attention in recent years as more and better data have become available. Until the early
nineties, most researchers still shied away from discussing formal methods for inferring
causal structure from purely observational statistical data, that is, without controlled
experiments (interventions). The seminal works of Spirtes, Glymour, and Scheines [153]
and of Pearl [125] over the last fifteen years have established a promising basis
for learning causality from such data. Bayesian networks serve as a concrete vehicle:
the corresponding directed acyclic graph can be interpreted causally. Tests of
statistical (conditional) independence between observed random variables provide the
primary tool for learning such causal graphs. The theory and the practical applications of
their approach, however, are far from fully developed. The essential shortcomings are the
following. First, the test of independence is based on the strict assumption
of a multivariate Gaussian distribution. Second, if very few independence relationships
are present, only a few causal directions can be determined. This thesis addresses both
problems directly.
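To make the first shortcoming concrete, the following is a minimal sketch of a Gaussian-based (conditional) independence test, assuming it takes the common form of a Fisher z-test on partial correlations. The abstract does not name a specific test, so this form, the function name `fisher_z_pvalue`, and the toy data are illustrative assumptions. The example shows that such a test cannot see a purely nonlinear dependence.

```python
import math
import numpy as np

def fisher_z_pvalue(x, y, z=None):
    """Two-sided p-value for zero (partial) correlation of x and y given z.

    Assumes joint Gaussianity; z is an optional (n,) or (n, k) conditioning set.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.shape[0]
    k = 0
    if z is not None:
        z = np.asarray(z, dtype=float).reshape(n, -1)
        k = z.shape[1]
        design = np.column_stack([np.ones(n), z])
        # Partial correlation = correlation of the residuals after
        # linearly regressing x and y on the conditioning variables.
        x = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
        y = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    r = float(np.clip(np.corrcoef(x, y)[0, 1], -0.999999, 0.999999))
    # Fisher z-transform; approximately standard normal under independence.
    stat = math.sqrt(n - k - 3) * abs(math.atanh(r))
    # 2 * (1 - Phi(stat)) written with the complementary error function.
    return math.erfc(stat / math.sqrt(2.0))

# Toy data (assumption): a symmetric grid, so x and x**2 are uncorrelated.
x = np.linspace(-1.0, 1.0, 201)
print("linear    y = 2x  :", fisher_z_pvalue(x, 2.0 * x))
print("nonlinear y = x^2 :", fisher_z_pvalue(x, x ** 2))
```

On the symmetric grid, y = x^2 is a deterministic function of x, yet its linear correlation with x vanishes, so the test reports a p-value close to 1 and would wrongly accept independence.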
A so-called kernel-based test of independence is developed further; it makes no
specific assumption about the distribution. The kernel method maps the data into an
appropriate feature space by a nonlinear transformation, so that nonlinear relations
in the original space can also be captured by correlations in the feature space.
The singular values of the inherent covariance matrix provide a measure of the
magnitude of statistical dependences, which serves as a very useful additional tool
for learning causal structures.
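The idea in this paragraph can be sketched with a Hilbert-Schmidt type dependence statistic computed from Gram matrices. The block below is only an illustration under stated assumptions: it uses the widely known biased empirical estimator trace(KHLH)/n^2, Gaussian kernels with a median-distance bandwidth, and a permutation test for the p-value. The function names and these choices are hypothetical and may differ from the estimator and test developed in the thesis.

```python
import numpy as np

def gaussian_gram(v, sigma=None):
    """Gaussian (RBF) Gram matrix; bandwidth defaults to the median pairwise distance."""
    v = np.asarray(v, dtype=float).reshape(len(v), -1)
    sq = np.sum((v[:, None, :] - v[None, :, :]) ** 2, axis=-1)
    if sigma is None:
        positive = sq[sq > 0]
        sigma = float(np.sqrt(np.median(positive))) if positive.size else 1.0
    return np.exp(-sq / (2.0 * sigma ** 2))

def center(gram):
    """Center a Gram matrix in feature space: H @ gram @ H with H = I - 11^T / n."""
    n = gram.shape[0]
    h = np.eye(n) - np.full((n, n), 1.0 / n)
    return h @ gram @ h

def hsic(x, y):
    """Biased empirical dependence statistic trace(K H L H) / n^2."""
    k_centered = center(gaussian_gram(x))
    l = gaussian_gram(y)
    n = l.shape[0]
    # trace(K H L H) equals the elementwise sum of (H K H) * L for symmetric L.
    return float(np.sum(k_centered * l)) / n ** 2

def hsic_permutation_pvalue(x, y, n_perm=200, seed=0):
    """Permutation p-value: shuffle y to simulate the independence hypothesis."""
    rng = np.random.default_rng(seed)
    k_centered = center(gaussian_gram(x))
    l = gaussian_gram(y)
    n = l.shape[0]
    observed = np.sum(k_centered * l) / n ** 2
    exceed = 0
    for _ in range(n_perm):
        p = rng.permutation(n)
        # Permuting the samples of y permutes rows and columns of its Gram matrix.
        if np.sum(k_centered * l[np.ix_(p, p)]) / n ** 2 >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)

# Toy data (assumption): y depends on x only nonlinearly.
x = np.linspace(-1.0, 1.0, 100)
y = x ** 2
print("statistic:", hsic(x, y))
print("p-value  :", hsic_permutation_pvalue(x, y))
```

Unlike a correlation-based test, the statistic is computed in the feature space induced by the kernel, so the quadratic relation in the toy data contributes to it even though x and y are linearly uncorrelated.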
A new inference principle for determining causal directions is developed for the
rarely examined case in which no statistical independence relations are present.
In such situations, the complexity of the conditional distributions gives hints about
the causal direction.
Experiments with many simulated and real-world data sets show that the proposed methods
surpass other state-of-the-art approaches to the same problem in certain aspects.
Zusammenfassung (Summary)
“Automated discovery of causal relationships” is still a fairly young research area
that has received growing attention in recent years because more and better data are
available. Until the early nineties, most scientists were still reluctant to study the
learning of cause-effect relationships from statistical data based solely on
observations, i.e., without the aid of interventions. Over the past fifteen years,
Spirtes, Glymour, and Scheines [153] and Pearl [125] have laid promising foundations
for the machine learning of causal structures. These rest on Bayesian networks, whose
associated directed acyclic graph can be interpreted causally. The most important tool
for learning such causal graphs is testing for (conditional) statistical dependences
between the random variables under consideration. The theory and the practical
implementation of these approaches, however, are far from mature. The most important
shortcomings are the following: first, the independence tests rest on the strong
assumption of multivariate Gaussian distributions; second, only a few causal directions
can be identified when few independence relations are present. The contribution of this
thesis addresses precisely these two drawbacks.
A so-called kernel-based independence test is developed further; it does not require
the assumption of a particular distribution. The kernel method maps data by a nonlinear
transformation into a suitable feature space, in which originally nonlinear
relationships also manifest themselves as correlations. The singular values of the
covariance matrix quantify the strength of the statistical dependences, which proved
very useful as an additional tool for learning causal structures.
A new inference principle is developed for estimating causal directions in the hitherto
rarely considered case in which no statistical independences are present. Here the
complexity of conditional distributions provides hints about the causal direction.
Experiments with simulated and real data show that the proposed methods in some respects
outperform the currently established approaches to the same problem.
Contents
1. Introduction and Motivation
1.1. Causal modeling framework
1.2. Task of causal inference
1.3. State-of-the-art causal inference algorithms
1.4. Inductive causation
1.5. Thesis goal and outline
2. Kernel Dependence Measure
2.1. Linear and nonlinear dependence
2.2. Positive definite kernel and RKHS
2.3. Cross-covariance operator and independence
2.4. Conditional cross-covariance operator and conditional independence
2.5. Hilbert-Schmidt dependence measure
2.6. Empirical estimation of Hilbert-Schmidt dependence measure
2.7. Computation of empirical Hilbert-Schmidt dependence measure
3. Kernel Statistical Test of Independence
3.1. State-of-the-art tests of independence
3.2. Statistical test via kernel dependence measure
3.3. Simulated experiments with kernel independence test
3.3.1. Examples for kernel independence test on continuous domains
3.3.2. Examples for kernel independence test on time series
3.3.3. Numerical comparison of independence tests on continuous domain
3.3.4. Numerical comparison of independence tests on discrete domain
3.4. Real-world experiments with kernel independence test
3.4.1. Digoxin clearance
3.4.2. Rats’ weights
3.4.3. Doctor visits and age/gender
4. From Independence Relations to Causal Structure
4.1. Logic of independence relations in DAG
4.2. Conflicts of representing independence relations
4.2.1. Relevant independence constraints
4.2.2. Non-transitivity conflicts
4.2.3. Non-intersection conflicts
4.2.4. Non-chordality conflicts
4.3. Constraint-based clustering algorithm
4.4. Constraint-based orientation algorithm
4.5. Robust causal learning algorithm (RCL)
4.6. Real-world experiments with RCL
4.6.1. College plans
4.6.2. Egyptian skulls
4.6.3. Montana outlook poll
4.6.4. Caenorhabditis elegans
4.6.5. Metastatic melanoma
5. From Magnitude of Dependences to Causal Structure
5.1. Problems of learning structure via independence tests
5.2. Identifying colliders via magnitude of dependences
5.3. Orientation heuristics via collider identification
5.4. Simulated experiments with orientation heuristics
5.4.1. Simulated data from noisy OR gates
5.4.2. Simulated data from