A schema-based peer-to-peer infrastructure for digital library networks [electronic resource] / by Wolf Siberski

Gottfried Wilhelm Leibniz Universität Hannover - Wolf Siberski

120 pages

English

Description

Subjects

Computer science

Information

Published by	Gottfried Wilhelm Leibniz Universität Hannover
Published on	1 January 2006
Language	English

Excerpt

A Schema-based Peer-to-Peer Infrastructure
for Digital Library Networks
Dissertation approved by the
Faculty of Electrical Engineering and Computer Science
of Gottfried Wilhelm Leibniz Universität Hannover
for the degree of
Doktor der Naturwissenschaften
Dr. rer. nat.
by
Dipl.-Inform. Wolf Siberski
born on 10 February 1966 in Göttingen
2006
Referee: Prof. Dr. Wolfgang Nejdl
Co-referees: Prof. Dr. Karl Aberer
Prof. Dr. Udo Lipeck
Date of doctoral examination: 15 December 2006

Ben Zoma said: Who is wise? He who learns from every man, as it is said:
“From all my teachers have I gained understanding”
Pirkei Avot 4,1
ACKNOWLEDGEMENTS
First and foremost, I would like to thank my advisor Prof. Dr. Wolfgang Nejdl. He introduced
me to methodical research, always had time for scientific discussions, gave me the freedom to
pursue my research goals, and provided an excellent research environment, to mention just a
few points. In short, this thesis would not have been possible without his ample support and
guidance.
I would also like to thank my other referees Prof. Dr. Karl Aberer and Prof. Dr. Udo Lipeck
for their very helpful comments and suggestions.
I’m grateful to Prof. Dr. Heinz Züllighoven and Prof. Dr. Christiane Floyd, who have shaped
my understanding not only of software, but also of computer science in general.
The collaboration and discussion with my colleagues at L3S Research Center, University of
Hannover, and elsewhere was an indispensable source of information and has spawned a lot of
insights for this thesis. But what is even more important, our joint work was always a pleasure,
and we had a lot of fun together. I would like to thank all of my colleagues for their cooperation
and openness, especially Dr. Uwe Thaden, Dr. Wolf-Tilo Balke, and Dr. Peter Dolog.
It is tremendously helpful to work in a smooth administrative and technical environment. Katia
Capelli, Thomas Lösch, Dr. Christoph Strutz, Iris Zieseniß, Claudia Saalbach, and Marko
Brosowski provide such an environment for L3S, and were always very supportive when I
came to them with my minor or major requests.
While writing a thesis, it is probably inevitable to face some stumbling blocks. The
guidance of Dr. Alexandra Fischer-Flebbe helped me overcome mine.
I will always be grateful for the love and care of my parents. They gave me self-confidence
and intellectual curiosity, the basis for all my work.
Finally, my wife Susanne and my children Dana and Jona bore it with exceeding patience that
I couldn’t spend enough time with them, and sustained me every day with their love and
affection.
ABSTRACT
A Schema-based Peer-to-Peer Infrastructure for Digital Libraries
in
English
In today’s connected world, users are not content with searching only one local library or
archive, but want and need to take a substantial number of collections into account when
looking for relevant information. Currently, most digital libraries and catalog systems
support only local search, and only a few facilities offer federated search over several libraries.
One reason is that central federation instances incur significant infrastructure costs, and
libraries have only limited incentives to offer such services. An appealing solution is to avoid
a central federation instance and use a completely distributed infrastructure instead, thus also
distributing the infrastructure efforts. In this thesis, we present such an infrastructure,
which combines peer-to-peer, distributed database, and Semantic Web technology to provide
seamless search in an open network of digital libraries.
The proposed solution is based on a super-peer topology, in which the most powerful nodes
form a network backbone and take over mediator-like responsibilities to distribute queries and
merge results. The network content is modeled as a database fragmented over all nodes. Our
basic algorithm, SPQR (super-peer-based query routing), processes queries according to
classic relational algebra and is shown to always produce the correct result set with
respect to this fragmented database. We present an implementation of our approach that
enables the interconnection of library systems conforming to established Open Archives
Initiative standards. An extension of SPQR for preference-based queries allows users to
retrieve ‘best matches’ for their queries instead of only exact matches. Extensive evaluations
based on a peer-to-peer simulation framework show the algorithm’s performance and scalability.
Keywords: peer-to-peer networks, distributed databases, digital libraries
ABSTRACT
A Schema-based Peer-to-Peer Infrastructure for Digital Libraries
in
German (translated into English)
Today’s connectivity means that users of libraries and archives are no longer content with
a single information source when searching for relevant information, but want and need to
consult a larger or smaller number of information providers. At present, most catalog systems
and digital libraries support only local search, and only a small number of services offer
federated search across many libraries. One reason is that such services entail noticeable
infrastructure costs, and each individual library has little incentive to bear these costs.
An attractive solution to this problem is to avoid central services altogether and to use a
fully distributed infrastructure instead; in this way, the infrastructure effort is also
distributed across all participating libraries. In this thesis, we present such an
infrastructure, which combines approaches from peer-to-peer networks, distributed databases,
and the Semantic Web to enable transparent search in an open network of digital libraries.
The proposed solution is based on a super-peer topology in which the most powerful nodes
form a network backbone and take on the mediator tasks of distributing queries and merging
results. The information offered in the network is modeled as a database fragmented over all
nodes. Relational queries in this distributed database are processed by the algorithm SPQR
(Super-peer-based Query Routing), whose correctness is shown. Furthermore, the implementation
of an SPQR-based network is described, with which library systems conforming to established
standards of the Open Archives Initiative can be interconnected. Building on SPQR, we present
an algorithm for processing preference-based queries that makes it possible to identify
‘best matches’ for user queries. Extensive evaluations using a simulation framework for
peer-to-peer networks show the efficiency and scalability of the presented algorithms.
Keywords: peer-to-peer networks, distributed databases, digital libraries
Contents
1 Introduction
1.1 A Short History of Library Catalogs
1.2 Digital Libraries
1.3 Problem Statement and Outline
2 Foundations
2.1 Relational Databases
2.2 Distributed Databases
2.3 Semantic Web
2.4 Peer-to-Peer Networks
3 Design Dimensions of Schema-Based Peer-to-Peer Networks
3.1 Network Properties
3.2 Data Storage and Access
3.3 Data Integration
3.4 Overview of Schema-Based P2P Algorithms and Systems
3.5 Summary
4 Super-Peer-Based Query Routing
4.1 Assumptions
4.2 The HyperCuP Super-Peer Topology
4.3 Model
4.4 Index Structures
4.5 Query Routing
4.6 Index Updates
4.7 A Simulation Framework for Schema-based Peer-to-Peer Networks
4.8 Evaluation
5 A Digital Library Network Prototype for Open Archives
5.1 The Open Archives Initiative Protocol for Metadata Harvesting
5.2 Edutella Architecture and Implementation
5.3 A Query Exchange Language
5.4 OAI-P2P Architecture and Implementation
5.5 Experiences
6 Preference-based Query Evaluation for Super-Peer Networks
6.1 Preference-based Querying for Relational Databases
6.2 Basic Scoring Functions for Document Search
6.3 Progressive, Preference-based SPQR
6.4 Evaluation
7 Summary and Future Work
7.1 Su