Processing rank aware queries in schema based P2P systems [Elektronische Ressource] / von Katja Hose

technische_universitat_ilmenau - Wilken

Le téléchargement nécessite un accès à la bibliothèque YouScribe
Tout savoir sur nos offres

262 pages

English

Le téléchargement nécessite un accès à la bibliothèque YouScribe
Tout savoir sur nos offres

A propos
Informations
Extrait

Description

Sujets

Informatik

Informations

Publié par	technische_universitat_ilmenau
Publié le	01 janvier 2009
Nombre de lectures	6
Langue	English
Poids de l'ouvrage	4 Mo

Extrait

Processing Rank-Aware Queries in
Schema-Based P2P Systems
Dissertation
Zur Erlangung des akademischen Grades
Doktor-Ingenieur (Dr.-Ing.)
vorgelegt der
¨ ¨Fakultat fur Informatik und Automatisierung
¨der Technischen Universitat Ilmenau
von
Dipl.-Inf. Katja Hose
Tag der Einreichung: 27. Oktober 2008
Tag der wissenschaftlichen Aussprache: 17. April 2009
Gutachter
1. Kai-Uwe Sattler
2. Alon Halevy
3. Felix Naumann
urn:nbn:de:gbv:ilm1-2009000084Processing Rank-Aware Queries in
Schema-Based P2P Systems
Katja Hose
A dissertation submitted
in partial fulfillment of the requirements
for the degree of
Doktor-Ingenieur (Dr.-Ing.)
Faculty of Computer Science and Automation
¨Technische Universitat Ilmenau
October 2008
Reading Committee
1. Kai-Uwe Sattler
2. Alon Halevy
3. Felix NaumannAbstract
In recent years, there has been considerable research with respect to query processing
in data integration and P2P systems. Conventional data integration systems consist
of multiple sources with possibly diﬀerent schemas, adhere to a hierarchical structure,
and have a central component (mediator) that manages a global schema. Queries are
formulated against this global schema and the mediator processes them by retrieving
relevant data from the sources transparently to the user. Arising from these systems,
eventually Peer Data Management Systems (PDMSs), or schema-based P2P systems
respectively, have attracted attention. Peers participating in a PDMS can act both as
a mediator and as a data source, are autonomous, and might leave or join the network
at will. Due to these reasons peers often hold incomplete or erroneous data sets and
mappings. The possibly huge amount of data available in such a network often results
in large query result sets that are hard to manage. Due to these reasons, retrieving the
complete result set is in most cases diﬃcult or even impossible. Applying rank-aware
query operators such as top-N and skyline, possibly in conjunction with approximation
techniques, is a remedy to these problems as these operators select only those result
records that are most relevant to the user. Being aware that in most cases only a small
fraction of the complete result set is actually output to the user, retrieving the complete
set before evaluating such operators is obviously ineﬃcient.
Therefore, the questions we want to answer in this dissertation are how to compute
such queries in PDMSs and how to do that eﬃciently. We propose strategies for eﬃcient
query processing in PDMSs that exploit the characteristics of rank-aware queries and op-
tionally apply approximation techniques. A peer’s relevance is determined on two levels:
on schema-level and on data-level. According to its relevance a peer is either considered
for query processing or not. Because of heterogeneity queries need to be rewritten, en-
abling cooperation between peers that use diﬀerent schemas. As existing query rewriting
techniques mostly consider conjunctive queries only, we present an extension that al-
lows for rewriting queries involving rank-aware query operators. As PDMSs are dynamic
systems and peers might update their local data, this dissertation addresses not only
the problem of considering such structures within a query processing strategy but also
the of keeping them up-to-date. Finally, we provide a system-level evaluation
by presenting SmurfPDMS (SiMUlating enviRonment For Peer Data Management Sys-
tems) – a system created in the context of this dissertation implementing all presented
techniques.
Processing Rank-Aware Queries in Schema-Based P2P Systems vZusammenfassung
Eﬃziente Anfragebearbeitung in Datenintegrationssystemen sowie in P2P-Systemen ist
bereits seit einigen Jahren ein Aspekt aktueller Forschung. Konventionelle Datenintegra-
tionssysteme bestehen aus mehreren Datenquellen mit ggf. unterschiedlichen Schemata,
sind hierarchisch aufgebaut und besitzen eine zentrale Komponente: den Mediator, der
ein globales Schema verwaltet. Anfragen an das System werden auf diesem globalen
Schema formuliert und vom Mediator bearbeitet, indem relevante Daten von den Daten-
quellen transparent fur¨ den Benutzer angefragt werden. Aufbauend auf diesen Systemen
entstanden schließlich Peer-Daten-Management-Systeme (PDMSs) bzw. schemabasierte
P2P-Systeme. An einem PDMS teilnehmende Knoten (Peers) k¨ onnen einerseits als Me-
diatoren agieren andererseits jedoch ebenso als Datenquellen. Darub¨ er hinaus sind diese
Peers autonom und k¨ onnen das Netzwerk jederzeit verlassen bzw. betreten. Die potentiell
riesige Datenmenge, die in einem derartigen Netzwerk verfugbar¨ ist, fuhrt¨ zudem in der
Regel zu sehr großen Anfrageergebnissen, die nur schwer zu bew¨altigen sind. Daher ist das
Bestimmen einer vollst¨ andigen Ergebnismenge in vielen F¨ allen ¨außerst aufw¨ andig oder
sogar unm¨ oglich. In diesen F¨ allen bietet sich die Anwendung von Top-N- und Skyline-
Operatoren, ggf. in Verbindung mit Approximationstechniken, an, da diese Operatoren
lediglich diejenigen Datens¨ atze als Ergebnis ausgeben, die aufgrund nutzerdeﬁnierter
Ranking-Funktionen am relevantesten fur¨ den Benutzer sind. Da durch die Anwendung
dieser Operatoren zumeist nur ein kleiner Teil des Ergebnisses tats¨ achlich dem Benutzer
ausgegeben wird, muss nicht zwangsl¨ auﬁg die vollst¨ andige Ergebnismenge berechnet wer-
den sondern nur der Teil, der tats¨ achlich relevant fur¨ das Endergebnis ist.
Die Frage ist nun, wie man derartige Anfragen durch die Ausnutzung dieser Erkennt-
nis eﬃzient in PDMSs bearbeiten kann. Die Beantwortung dieser Frage ist das Haupt-
anliegen dieser Dissertation. Zur L¨ osung dieser Problemstellung stellen wir eﬃziente
Anfragebearbeitungsstrategien in PDMSs vor, die die charakteristischen Eigenschaften
ranking-basierter Operatoren sowie Approximationstechniken ausnutzen. Peers werden
dabei sowohl auf Schema- als auch auf Datenebene hinsichtlich der Relevanz ihrer Daten
gepruft¨ und dementsprechend in die Anfragebearbeitung einbezogen oder ausgeschlossen.
Durch die Heterogenit¨ at der Peers werden Techniken zum Umschreiben einer Anfrage von
einem Schema in ein anderes n¨ otig. Da existierende Techniken zum Umschreiben von An-
fragen zumeist nur konjunktive Anfragen betrachten, stellen wir eine Erweiterung dieser
Techniken vor, die Anfragen mit ranking-basierten Anfrageoperatoren beruc¨ ksichtigt. Da
PDMSs dynamische Systeme sind und teilnehmende Peers jederzeit ihre Daten ¨andern
k¨ onnen, betrachten wir in dieser Dissertation nicht nur wie Routing-Indexe verwendet
werden, um die Relevanz eines Peers auf Datenebene zu bestimmen, sondern auch wie
sie gepﬂegt werden k¨ onnen. Schließlich stellen wir SmurfPDMS (SiMUlating enviRon-
ment For Peer Data Management Systems) vor, ein System, welches im Rahmen dieser
Dissertation entwickelt wurde und alle vorgestellten Techniken implementiert.
Processing Rank-Aware Queries in Schema-Based P2P Systems viiAcknowledgements
There are a couple of people I want to thank. First, I would like to thank Kai-Uwe
Sattler not only for his guidance and for supervising the project but also for oﬀering me
a PhD position after I graduated – without the oﬀer I might never have started a career
in database research. Second, I would like to thank Alon Halevy and Felix Naumann for
many discussions and for the feedback they have given me. Another person who paved
the way for me to become a computer scientist is Rainer Werner. It was he who helped
me to get along with my ﬁrst PC – especially after I have carefreely tried out some DOS
commands without knowing what they actually did – the format command in particular
had a surprising and long-lasting eﬀect. I am also grateful to my parents and my sister for
encouraging me to keep going no matter how much eﬀort it cost to reach my goal. I would
also like to thank Marcel Karnstedt and all other fellow PhD students at the department
who joined me on the way to receive a PhD for those many discussions and support in
all aspects. I would like to particularly mention Anja Fischer for reading several drafts
of this dissertation. I also appreciate the help of the students whose diploma theses I
supervised and who contributed to the development of SmurfPDMS: Christian Lemke,
Jana Quasebarth, Daniel Zinn, and Andreas Job.
Processing Rank-Aware Queries in Schema-Based P2P Systems ix