A Logic-Based Approach to Multimedia Interpretation
Dissertation approved by the doctoral committee of the Technische Universität Hamburg-Harburg for the award of the academic degree Doktor der Naturwissenschaften (Dr. rer. nat.)
by
Atila Kaya
from Izmir, Turkey
2011
Reviewers:
Prof. Dr. Ralf Möller
Prof. Dr. Bernd Neumann
Prof. Dr. Rolf-Rainer Grigat
Day of the defense: 28.02.2011
Abstract
The availability of metadata about the semantics of information in multimedia documents is crucial for building semantic applications that offer convenient access to relevant information and services. In this work, we present a novel approach for the automatic generation of rich semantic metadata based on surface-level information. For the extraction of the required surface-level information, state-of-the-art analysis tools are used. The approach exploits a logic-based formalism as the foundation for knowledge representation and reasoning. To develop a declarative approach, we formalize a multimedia interpretation algorithm that exploits formal inference services offered by a state-of-the-art reasoning engine. Furthermore, we present the semantic interpretation engine, a software system that implements the logic-based multimedia interpretation approach, and test it through experimental studies. We use the results of our tests to evaluate the fitness of our logic-based approach in practice. Finally, we conclude this work by highlighting promising areas for future work.
To my dear parents and wife
Sevgili anneme, babama ve eşime.
Acknowledgements
This thesis is the result of five years of work in the Institute for Software Systems (STS) research group at the Hamburg University of Technology (TUHH). I am grateful to my advisor, Prof. Dr. Ralf Möller, for giving me the opportunity to conduct such exciting research and for mentoring me. I would also like to thank Prof. Dr. Bernd Neumann and Prof. Dr. Rolf-Rainer Grigat for reviewing this work.
I would like to express my gratitude to all my colleagues at the STS research group: Sofia Espinosa, Sylvia Melzer, Alissa Kaplunova, Tobias Näth, Kamil Sokolski, Maurice Rosenfeld, Oliver Gries, Anahita Nafissi, Dr. Hans-Werner Sehring, Olaf Bauer, Rainer Marrone, Sebastian Wandelt, Volker Menrad and Gustav Munkby. Special thanks go to Dr. Patrick Hupe and Dr. Michael Wessel, who always supported and encouraged me.
I am also indebted to the STS staff, Hartmut Gau, Ulrike Hantschmann, Thomas Rahmlow and Thomas Sidow, for their excellent administrative and technical support.
Finally, I would like to thank my parents Tükez and Dursun, and my wife Justyna for their love, care and continuous support.
List of Figures

The hybrid approach for obtaining deep semantic annotations
Interpretation of complex concept descriptions
A graphical representation of the concept definition Person, which requires modeling of a triangular structure
A graphical representation of an ABox with an inferred role assertion (dashed) caused by the transitive role R
An example UML class diagram
An example TBox T
The multimedia interpretation process. Input: analysis ABox. Output: interpretation ABox(es). Background knowledge: domain ontology and interpretation rules
Interpretation of a document consisting of observations and their explanations
The multimedia interpretation approach including processing steps for analysis, interpretation and fusion
A rule used by the Wimp3 system for network construction
The Bayesian network constructed for plan recognition

4.1 The architecture of the semantic interpretation engine, which is deployed into the Apache Tomcat servlet container. The Apache Axis is a core engine for web services. The semantic interpretation engine exploits the inference services offered by RacerPro. Each RacerPro instance is dedicated to a single modality.
4.2 A sample web page with athletics news
4.3 The image taken from the sample web page in Figure 4.2
4.4 The ABox imageABox01 representing the results of image analysis for the image in Figure 4.3
4.5 An excerpt of the TBox T for the athletics domain
4.6 An excerpt of the image interpretation rules R_ima for the athletics domain
4.7 The ABox A′ after the addition of Δ1
4.8 The interpretation ABoxes imageABox01 interpretation1 and imageABox01 interpretation2 returned by the semantic interpretation engine
4.9 The caption of the image shown in Figure 4.3
4.10 The ABox captionABox01 representing the results of text analysis for the caption in Figure 4.9
4.11 Another excerpt of the TBox T for the athletics domain
4.12 An excerpt of the caption interpretation rules R_cap for the athletics domain
4.13 The interpretation ABox captionABox01 interpretation1 returned by the semantic interpretation engine
4.14 The first paragraph of the text segment of the sample web page
4.15 The ABox textABox01 representing the results of text analysis for the text segment in Figure 4.14
4.16 Another excerpt of the TBox T for the athletics domain
4.17 An excerpt of the text interpretation rules R_tex for the athletics domain
4.18 The ABox A′ after the addition of the explanation Δ2
4.19 The interpretation ABox textABox01 interpretation1 returned by the semantic interpretation engine
4.20 The ABox sampleABox1
4.21 A sample TBox T
4.22 A set of text interpretation rules R_1
4.23 Two possible interpretation results for the same analysis ABox sampleABox1, where the one on the left-hand side is preferred
4.24 The ABox sampleABox2
4.25 A set of text interpretation rules R_2 containing a single rule
4.26 Two different interpretation results for the analysis ABox sampleABox2, where the one on the left-hand side is preferred
4.27 The sample analysis ABox sampleABox3
4.28 A set of text interpretation rules R_3
4.29 Two different interpretation results for the analysis ABox sampleABox3, where the one on the left-hand side is preferred
4.30 An excerpt of the axioms, which are added to the background knowledge T
4.31 All assertions of the interpretation ABox captionABox01 interpretation1 as returned by the semantic interpretation engine
4.32 The analysis ABox of a sample web page
4.33 A sample image interpretation ABox
4.34 A sample caption interpretation ABox
4.35 The fused interpretation ABox of the sample web page

5.1 The number of fiat assertions (x) and the time (y) spent in minutes for the interpretation of 500 text analysis ABoxes
5.2 The number of fiat assertions (x) and the time (y) spent in minutes for the interpretation of selected text analysis ABoxes
5.3 The sum of fiat and bona fide assertions (x) and the time (y) spent in minutes for the interpretation of 500 text analysis ABoxes
5.4 The number of fiat and bona fide assertions (x) and the time (y) spent in minutes for the interpretation of selected text analysis ABoxes