Knowledge-intensive subgroup mining [Elektronische Ressource] : techniques for automatic and interactive discovery / vorgelegt von Martin Atzmüller
222 pages
English

Découvre YouScribe en t'inscrivant gratuitement

Je m'inscris

Knowledge-intensive subgroup mining [Elektronische Ressource] : techniques for automatic and interactive discovery / vorgelegt von Martin Atzmüller

Découvre YouScribe en t'inscrivant gratuitement

Je m'inscris
Obtenez un accès à la bibliothèque pour le consulter en ligne
En savoir plus
222 pages
English
Obtenez un accès à la bibliothèque pour le consulter en ligne
En savoir plus

Description

Knowledge IntensiveSubgroupMining–TechniquesforAutomaticandInteractiveDiscoveryDissertationzurErlangungdesnaturwissenschaftlichenDoktorgradesderBayerischenJulius–Maximilians–Universitat¨ Wurzb¨ urgvorgelegtvonMartinAtzmuller¨ausBadKissingenWurzb¨ urg,2006Eingereichtam: 01.06.2006beiderFakultat¨ fur¨ MathematikundInformatik1. Gutachter: Prof. Dr. FrankPuppe2. Prof. Dr. KatharinaMorik3. Gutachter: Prof. Dr. NadaLavracTagdermundlichen¨ Prufung:¨ 15.12.2006AbstractData mining has proved its significance in various domains and applications. As an im portant subfield of the general data mining task, subgroup mining can be used, e.g., formarketing purposes in business domains, or for quality profiling and analysis in medicaldomains. The goal is to efficiently discover novel, potentially useful and ultimately inter-esting knowledge. However, in real world situations these requirements often cannot befulfilled,e.g.,iftheappliedmethodsdonotscaleforlargedatasets,iftoomanyresultsarepresentedtotheuser,orifmanyofthediscoveredpatternsarealreadyknowntotheuser.Thisthesisproposesacombinationofseveraltechniquesinordertocopewiththesketchedproblems: We discuss automatic methods, including heuristic and exhaustive approaches,and especially present the novel SD Map algorithm for exhaustive subgroup discoverythat is fast and effective. For an interactive approach we describe techniques for sub groupintrospectionandanalysis,andwepresentadvancedvisualizationmethods,e.g.

Sujets

Informations

Publié par
Publié le 01 janvier 2006
Nombre de lectures 14
Langue English
Poids de l'ouvrage 7 Mo

Extrait

Knowledge IntensiveSubgroupMining–
TechniquesforAutomatic
andInteractiveDiscovery
DissertationzurErlangungdes
naturwissenschaftlichenDoktorgrades
derBayerischenJulius–Maximilians–Universitat¨ Wurzb¨ urg
vorgelegtvon
MartinAtzmuller¨
aus
BadKissingen
Wurzb¨ urg,2006Eingereichtam: 01.06.2006
beiderFakultat¨ fur¨ MathematikundInformatik
1. Gutachter: Prof. Dr. FrankPuppe
2. Prof. Dr. KatharinaMorik
3. Gutachter: Prof. Dr. NadaLavrac
Tagdermundlichen¨ Prufung:¨ 15.12.2006Abstract
Data mining has proved its significance in various domains and applications. As an im
portant subfield of the general data mining task, subgroup mining can be used, e.g., for
marketing purposes in business domains, or for quality profiling and analysis in medical
domains. The goal is to efficiently discover novel, potentially useful and ultimately inter-
esting knowledge. However, in real world situations these requirements often cannot be
fulfilled,e.g.,iftheappliedmethodsdonotscaleforlargedatasets,iftoomanyresultsare
presentedtotheuser,orifmanyofthediscoveredpatternsarealreadyknowntotheuser.
Thisthesisproposesacombinationofseveraltechniquesinordertocopewiththesketched
problems: We discuss automatic methods, including heuristic and exhaustive approaches,
and especially present the novel SD Map algorithm for exhaustive subgroup discovery
that is fast and effective. For an interactive approach we describe techniques for sub
groupintrospectionandanalysis,andwepresentadvancedvisualizationmethods,e.g.,the
zoomtable that directly shows the most important parameters of a subgroup and that can
beusedforoptimizationandexploration. Wealsodescribevariousvisualizationsforsub
group comparison and evaluation in order to support the user during these essential steps.
Furthermore,weproposetoincludepossiblyavailablebackgroundknowledgethatiseasy
to formalize into the mining process. We can utilize the knowledge in many ways: To fo
custhesearchprocess,torestrictthesearchspace,andultimatelytoincreasetheefficiency
of the discovery method. We especially present background knowledge to be applied for
filteringtheelementsoftheproblemdomain,forconstructingabstractions,foraggregating
values of attributes, and for the post processing of the discovered set of patterns. Finally,
thetechniquesarecombinedintoaknowledge intensiveprocesssupportingbothautomatic
and interactive methods for subgroup mining. The practical significance of the proposed
approach strongly depends on the available tools. We introduce the VIKAMINE system
asahighly integratedenvironmentforknowledge intensiveactivesubgroupmining.
Also,wepresentanevaluationconsistingoftwoparts: Withrespecttoobjectiveevaluation
criteria, i.e., comparing the efficiency and the effectiveness of the subgroup discovery
methods, we provide an experimental evaluation using generated data. For that task we
present a novel data generator that allows a simple and intuitive specification of the data
characteristics. TheresultsoftheexperimentalevaluationindicatethatthenovelSD Map
methodoutperformstheotherdescribedalgorithmsusingdatasetssimilartotheintended
application concerning the efficiency, and also with respect to precision and recall for the
heuristicmethods. Subjectiveevaluationcriteriaincludetheuseracceptance,thebenefitof
the approach, and the interestingness of the results. We present five case studies utilizing
the presented techniques: The approach has been successfully implemented in medical
andtechnicalapplicationsusingreal worlddatasets. Themethodwasverywellaccepted
bytheusersthatwereabletodiscovernovel,useful,andinterestingknowledge.Acknowledgments
My work was supported by many people: Without their helpful advice and encourage
ment, it would have been impossible to finish this thesis. First, I would like to thank my
supervisorProf.Dr.FrankPuppefromtheUniversityofWurzb¨ urg: Heprovidedmewitha
perfectenvironment of research withopen doorsand openminds. Inspiring mein fruitful
discussions, he has been a source of invaluable ideas. Furthermore, I want to thank the
reviewers Prof. Dr. Nada Lavrac from the Jozef Stefan Institute, Ljubljana, Slovenia, and
Prof.Dr.KatharinaMorikfromtheUniversityofDortmund,Germany.
Additionally, I want to thank my colleagues of the AI working group at the University of
¨Wurzburg, especially Joachim Baumeister for being my partner in the learning methods
dream team. He supported my work by contributing ideas, discussing countless issues,
andwritingpublicationswithme. AlexanderHornlein,¨ ManuelFehler,NormanBrummer¨ ,
MartinSchuhmann,RainerHerrler,FranziskaKlugl Frohnme¨ yerandmyothercolleagues
also supported me during the years. Last but not least, I want to thank Petra Braun -
the good heart of our group - who especially helped me cope with various problems of
bureaucracy and other things of ordinary life. The practical parts of my work would have
notbeenpossiblewithoutProf.Dr.Hans PeterBuscher(DRK KlinikenBerlin K openick)¨
and Achim Hemsing (Department of Prosthodontics, University of Wurzb¨ urg) who were
mypartnersinthemedicalprojects. Iamgratefulfortheirvaluableandconstructiveinput
tomywork. FurtherthanksgotoGeorgBuscher,forhissupportwiththemedicaldatasets.
I would especially like to thank Julia Eischer, Joachim Baumeister, Alexander Hornlein,¨
ManuelFehler,andNormanBrummer¨ forreadingandpolishingpartsofmythesis.
I thank my family for their encouragement, their support, and for providing me with an
anchor of life, especially my father and mother, and my brother. Finally, I owe the most
important gratitude to my wife Irene. She supported me in our private life and always
encouragedmewhilefinishingthiswork.
Ithankyouall.
MartinAtzmuller¨
Wurzb¨ urg,2006Contents
I Introduction 1
1 Motivation 3
1.1 ResearchGoalandContext . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2 OverviewoftheApproach . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2.1 AlgorithmsforSubgroupDiscovery . . . . . . . . . . . . . . . . 5
1.2.2 ExploitingBackgroundKnowledge . . . . . . . . . . . . . . . . 5
1.2.3 SubgroupIntrospectionandAnalysis . . . . . . . . . . . . . . . 6
1.2.4 VisualizationMethods . . . . . . . . . . . . . . . . . . . . . . . 7
1.2.5 Knowledge IntensiveActiveSubgroupMining . . . . . . . . . . 7
1.3 CriteriaandProblemsofanEvaluation . . . . . . . . . . . . . . . . . . . 8
1.3.1 Predictivevs. DescriptiveDataMining . . . . . . . . . . . . . . 8
1.3.2 EvaluatingtheEffectivenessandtheEfficiency . . . . . . . . . . 8
1.3.3 UserAcceptanceoftheDiscoveredPatterns . . . . . . . . . . . . 9
1.3.4 IncludingSubjectiveInterestingnessMeasures . . . . . . . . . . 10
1.4 Results. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.5 StructureofthisWork . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2 TheKnowledge IntensiveActiveSubgroupMiningProcess 15
2.1 MotivationforKnowledge IntensiveActiveSubgroupMining . . . . . . 15
2.2 SubgroupMining–AnOverview . . . . . . . . . . . . . . . . . . . . . 15
2.2.1 RelationtoKnowledgeDiscoveryinDatabases . . . . . . . . . . 15
2.2.2 SubgroupDiscovery . . . . . . . . . . . . . . . . . . . . . . . . 17
2.2.3 TheGeneralSubgroupMiningProcess . . . . . . . . . . . . . . 18
2.3 ProcessModelforKnowledge IntensiveActiveSubgroupMining . . . . 19
2.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
II Methods 21
3 SubgroupDiscovery 23
3.1 TheSubgroupDiscoveryApproach . . . . . . . . . . . . . . . . . . . . 23
3.1.1 DefinitionofBasicOntologicalKnowledge . . . . . . . . . . . . 24
3.1.2 TargetVariable . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.1.3 SubgroupDiscoveryContext . . . . . . . . . . . . . . . . . . . . 25
3.1.4DescriptionLanguage . . . . . . . . . . . . . . . . . . 25vi Contents
3.1.5 SubgroupQualityandInterestingnessMeasures . . . . . . . . . . 27
3.1.6 SearchStrategiesforSubgroupDiscovery . . . . . . . . . . . . . 30
3.2 PruningtheSearchSpace . . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.3 RedundancyManagementofSetsofSubgroups . . . . . . . . . . . . . . 34
3.3.1 GroupingSubgroupsbyClustering . . . . . . . . . . . . . . . . 36
3.3.2 SubgroupSelectionbyWeightedCovering . . . . . . . . . . . . 36
3.3.3 CausalSubgroupAnalysisandSelection . . . . . . . . . . . . . 38
3.4 SubgroupDiscoveryAlgorithms . . . . . . . . . . . . . . . . . . . . . . 42
3.4.1 SizeoftheSearchSpace . . . . . . . . . . . . . . . . . . . . . . 42
3.4.2 Exhaustive . . . . . . . . . . . . . . . . . . . . . . . . . 42
3.4.3 BeamSearch . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
3.4.4 PRIM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
3.4.5 Apriori SD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
3.4.6 SD Map . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
3.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
4 Knowledge IntensiveSubgroupMining 55
4.1 ProblemDefinition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
4.2 TypesandClassesofBackgroundKnowledge . . . . . . . . . . . . . . . 56
4.2.1 ConstraintKnowledge . . . . . . . . . . . . . . . . . . . . . . . 57
4.2.2 OntologicalKno . . . . . . . . . . . . . . . . . . . . . . 58
4.2.3 AbstractionKnowledge . . . . . . . . .

  • Univers Univers
  • Ebooks Ebooks
  • Livres audio Livres audio
  • Presse Presse
  • Podcasts Podcasts
  • BD BD
  • Documents Documents