A probabilistic framework for information modelling and retrieval based on user annotations on digital objects [electronic resource] / by Ingo Peter August Frommholz

universitat_duisburg-essen - Ingo Frommholz

277 pages

English

Description

Subjects

Computer science

Information

Published by	universitat_duisburg-essen
Published on	1 January 2008
Number of reads	22
Language	English
File size	4 MB

Excerpt

A Probabilistic Framework for Information
Modelling and Retrieval Based on User Annotations
on Digital Objects
Dissertation approved by the Faculty of Engineering (Fachbereich Ingenieurwissenschaften)
of the University of Duisburg-Essen
for the award of the academic degree of
Doctor of Natural Sciences (Doktor der Naturwissenschaften)
by
Diplom-Informatiker
Ingo Peter August Frommholz
from Bochum-Wattenscheid
Referee: Prof. Dr.-Ing. Norbert Fuhr
Co-referee: Prof. Dr. Maristella Agosti
Date of the oral examination: 21 October 2008

For Damaris.

Abstract
Annotations are a means to make critical remarks, to explain and comment on things, to add notes
and give opinions, and to relate objects. Nowadays, they can be found in digital libraries and
collaboratories, for example as a building block for scientific discussion on the one hand or as
private notes on the other. We further find them in product reviews, scientific databases and
many “Web 2.0” applications; even well-established concepts like emails can be regarded as
annotations in a certain sense. Digital annotations can be (textual) comments, markings (i.e.
highlighted parts) and references to other documents or document parts. Since annotations
convey information which is potentially important to satisfy a user’s information need, this
thesis tries to answer the question of how to exploit annotations for retrieval. It
gives a first answer to the question of whether retrieval effectiveness can be improved with
annotations.
A survey of the “annotation universe” reveals some facets of annotations; for example, they
can be content level annotations (extending the content of the annotated object) or meta level
ones (saying something about the annotated object). Besides the annotations themselves, other
objects created during the process of annotation, namely the annotated fragments, can be
interesting for retrieval. These objects are integrated into an object-oriented model comprising
digital objects such as structured documents and annotations as well as fragments. This
model reflects the different relationships among the various objects. From this model,
the basic data structure for annotation-based retrieval, the structured annotation hypertext, is
derived.
In order to thoroughly exploit the information contained in structured annotation hypertexts,
a probabilistic, object-oriented logical framework called POLAR is introduced. In POLAR,
structured annotation hypertexts can be modelled by means of probabilistic propositions
and four-valued logics. POLAR allows for specifying several relationships among annotations
and annotated (sub)parts or fragments. Queries can be posed to extract the knowledge
contained in structured annotation hypertexts. POLAR supports annotation-based retrieval, i.e.
document and discussion search, by applying an augmentation strategy (knowledge augmentation,
where propositions are propagated from subcontexts like annotations, or relevance augmentation,
where retrieval status values are propagated) in conjunction with probabilistic inference, where
P(d → q), the probability that a document d implies a query q, is estimated. POLAR’s
semantics is based on possible worlds and accessibility relations. It is implemented on top of
four-valued probabilistic Datalog.
POLAR’s core retrieval functionality, knowledge augmentation with probabilistic inference, is
evaluated for discussion and document search. The experiments show that all relevant POLAR
objects, merged annotation targets, fragments and content annotations, are able to increase
retrieval effectiveness when used as a context for discussion or document search. Additional
experiments reveal that we can determine the polarity of annotations with an accuracy of
around 80%.

Acknowledgements
I would like to take this opportunity to thank those who accompanied me on the long way
throughout the time this thesis was created, who supported me in several ways and who showed
interest in my work.
I thank my former and current colleagues at Fraunhofer IPSI in Darmstadt and the Information
Systems group at the University of Duisburg-Essen, especially Holger Brocks, André
Everts, Marcello L’Abbate, Adelheit Stein, Matthias Hemmje, Sascha Kriewel and Claus-Peter
Klas. They always found the time for discussion, to give technical support or just to listen.
Special thanks go to Henrik Nottelmann, a brilliant and nice guy who left us much too early, to
Erich Neuhold, who was involved in my work when he was institute director at IPSI, and to Piklu
Gupta, a native English speaker and near-native German speaker (and, besides, a nice guy), who
helped to translate even complicated German sentences into English. I also thank Marc Lechtenfeld
for his fantastic master’s thesis on machine-learning methods to determine the polarity
of annotations, Dennis Korbar, who helped me by providing the infrastructure to create the
ZDNet testbed, and Ray Larson for reading an early version of this thesis.
Ulrich Thiel was the one who mentored me during my time at Fraunhofer IPSI. He showed
me the “other side of IR”, namely the cognitive, more user-oriented one. Ulrich’s comments
sometimes gave me a very hard time, but made me learn a lot.
Thomas Rölleke made the heart of this work possible by providing his superb HySpirit
framework. Without him, none of the proposed framework could actually have been executed. Thanks
for good advice, a nice afternoon on a sailing boat and your patience in answering many
questions. And of course thanks for POOL.
During a visit to Padua, I had the opportunity for good and fruitful discussions with Maristella
Agosti and Nicola Ferro. Their collaboration enriched my work significantly. I’d like to
thank them for good advice, the nice time I had with them, for the good collaboration in
DELOS and for their interest in my work.
Especially I’d like to thank Norbert Fuhr. He is the person mainly involved in my work. His
inspiration, his deep knowledge and his support paved the way to make this thesis possible. He
was also the one giving me the opportunity to continue the work started in Darmstadt when I
began working at his chair in Duisburg.
Finally, heartfelt thanks go to my family and especially my wife Damaris. She is the one
who suffered most while I was writing up this thesis, and her infinite patience cannot be
measured.
Thank you.
Ingo Frommholz
Darmstadt/Duisburg, November 2008

Contents
1 Introduction
I The Annotation Universe
2 The Universe – Applications, Facets and Properties
2.1 Digital Annotations
2.1.1 Definition and Usage
2.1.2 Annotations in Digital Libraries and Collaboratories
2.1.3 Annotations on the Web
2.1.4 Email Discussions and Usenet News
2.1.5 Semantic Annotation
2.1.6 Scientific Databases
2.1.7 Linguistic Annotation
2.2 Facets of Annotations
2.2.1 Annotations as Metadata
2.2.2 Annotations as Content
2.2.3 Annotations as Dialogue Acts
2.2.4 Annotations as References
2.2.5 Polarity of Annotations
2.2.6 Annotations and Hypertexts
2.3 Summary and Discussion
3 A Model of the Annotation Universe for Annotation-based IR
3.1 Main Classes
3.1.1 Digital Objects
3.1.2 Structured Documents
3.1.3 Annotatable Objects and Annotations
3.1.4 Fragments
3.1.5 Annotation Types
3.1.6 Scope, Permission and Polarity
3.1.7 Multiclassification
3.2 Structured Annotation Hypertext
3.2.1 Hypertext
3.2.2 Structured Annotation Hypertext
3.3 Summary and Discussion
II The POLAR Framework
4 Annotation-based Knowledge Modelling and Retrieval with POLAR
4.1 Information Retrieval
4.1.1 Introduction
4.1.2 An Overview of Retrieval Models
4.1.3 Hypertext, Structured Document and Web Retrieval
4.1.4 An