Advanced indexing and query processing for multidimensional databases [Elektronische Ressource] / vorgelegt von Evangelos Dellis

philipps-universitat_marburg

Découvre YouScribe en t'inscrivant gratuitement

Je m'inscris

Obtenez un accès à la bibliothèque pour le consulter en ligne
En savoir plus

197 pages

English

Obtenez un accès à la bibliothèque pour le consulter en ligne
En savoir plus

A propos
Informations
Extrait

Description

Sujets

Informatik

Informations

Publié par	philipps-universitat_marburg
Publié le	01 janvier 2007
Nombre de lectures	6
Langue	English
Poids de l'ouvrage	1 Mo

Extrait

Advanced Indexing and Query Processing
for Multidimensional Databases
Dissertation zur Erlangung des
Doktorgrades der Naturwissenschaften
(Dr. rer. nat.)
dem Fachbereich Mathematik und Informatik
der Philipps-Universit˜at Marburg vorgelegt von
Evangelos Dellis
aus Trikala
Marburg/Lahn 2007Vom Fachbereich Mathematik und Informatik
der Philipps-Universit˜at Marburg als Dissertation am 3. September 2007
angenommen.
Erstgutachter: Prof. Dr. Bernhard Seeger
Zweitgutachter: Prof. Dr. Yannis Theodoridis (University of Piraeus)
Tag der Mundlic˜ hen Prufung˜ am 7. September 2007.Acknowledgement
I would like to express my thanks to all people who supported me during the
past years while I have been working on this thesis. I extend my warmest
thanks to my supervisor, Professor Dr. Bernhard Seeger. He took particular
care to maintain a good working atmosphere within the group and to provide
a supportive and inspiring environment. I am grateful to Prof. Dr. Yannis
Theodoridis who was readily willing to act as the second referee to this work.
I would also like to thank Dr. J˜org K˜amper, Max Planck Institute for Terres-
trial Microbiology, Marburg, Germany and Dr. Georgios Paliouras, National
Center for Scientiﬂc Research ’Demokritos’, Athens, Greece for their ﬂnancial
support. This work could not have grown and matured without the discus-
sionswithmycolleagues.InparticularIwouldliketothankmycolleaguesIlya
VladimirskiyandespeciallyAkriviVlachouwhoinspiredmetoagreatextent.
Otherfruitfuldiscussionswhichbroughtthisworkforwardtookplacewith(in
alphabetical order): Michael Cammert, Christoph Heinz, Jurgen˜ Kr˜amer, To-
biasRiemenschneiderandSonyVaupel.Ithankthemall.Ialsoappreciatethe
substantial help of the students whose study thesis or diploma thesis I super-
vised,including:YuchenDeng,LiangDong,TaiLin,HuiminYangandLiping
Yu. Particular thanks go to Oliver Dippel and Mechthild Kessler. Finally I
like to thank my family and friends, who constantly supported me during the
development of this thesis. Especially my parents who always supported my
career and encouraged me to ﬂnd my way.
Evangelos Dellis
Marburg, July 2007.Abstract
Many new applications, such as multimedia databases, employ the so-called
feature transformation which transforms important features or properties of
data objects into high-dimensional points. Searching for ’similar’ or ’non-
dominated’ objects based on these features is thus a search of points in this
feature space. To support e–cient query processing in these high dimensional
databases, high-dimensional indexes are required to prune the search space
and e–cient query processing strategies employing these indexes have to be
designed.Inordertomanagethehugevolumeofsuchcomplexdata,advanced
database systems are employed. In contrast to conventional database systems
thatsupportexactmatchqueries,theuseroftheseadvanced
focuses on applying similarity search and skyline retrieval techniques.
Based on an analysis of typical advanced database systems - such as mul-
timedia databases, electronic market places, and decision support systems -
thefollowingfourchallengingcharacteristicsofcomplexityaredetected:high-
dimensionality nature of data, re-usability of existing index structures, novel
(moreexpressive)queryoperatorsforadvanceddatabasesystemsande–cient
analysis of complex high-dimensional data. Therefore, the general goal of this
thesis is the improvement of the e–ciency of index based query processing in
high-dimensional data spaces and the development of novel query operators.
The ﬂrst part of this thesis deals with similarity query processing tech-
niques. We introduce a new approach to indexing multidimensional data that
isparticularlysuitableforthee–cientincrementalprocessingofnearestneigh-
bor queries. The basic idea is to split the data space vertically into multiple
low-andmedium-dimensionaldataspaces.Thedatafromeachoftheselower-
dimensional subspaces is organized by using a standard multi-dimensional
index structure. In order to perform incremental NN-queries on top of index-
striping e–ciently, we ﬂrst develop an algorithm for merging the results re-
ceived from the underlying indexes. Then, an accurate cost model relying on
a power law is presented that determines an appropriate number of indexes.
Moreover, we consider the problem of dimension assignment, where each di-
mension is assigned to a lower-dimensional subspace, such that the cost ofX Abstract
nearest neighbor queries is minimized. Furthermore, a generalization of the
iDistance technique, called Multidimensional iDistance (MiD), for k nearest
neighbor query processing is presented. Three main steps are performed for
buildingMiD.InagreementwithiDistance,ﬂrstly,datapointsarepartitioned
into clusters and, secondly, a reference point is determined for every cluster.
However, the third step substantially diﬁers from iDistance as a data object
is mapped to a m-dimensional distance vector where m > 1 generally holds.
The m dimensions are generated by splitting the original data space into m
subspaces and computing the partial distance between the object and the
reference point for every subspace. The resulting m-dimensional points can
be indexed by an arbitrary point access method like an R-tree. Our crucial
parameter m is derived from a cost model that is based on a power law. We
present range and k-NN query processing algorithms for MiD.
The second part of this thesis deals with skyline query processing tech-
niques. We ﬂrst introduce the problem of Constrained Subspace Skyline
Queries (CSSQ). We present a query processing algorithm which builds on
multiple low-dimensional index structures. Due to the usage of well perform-
inglowdimensionalindices,constrainedsubspaceskylinequeriesforarbitrary
large subspaces are e–ciently supported. Eﬁective pruning strategies are ap-
plied to discard points from dominated regions. An important ingredient of
our approach is the workload-adaptive strategy for determining the number
of indexes and the assignment of dimensions to the indexes. Furthermore, we
introduce the concept of Reverse Skyline Queries (RSQ). Given a set of data
points P and a query point q, an RSQ returns the data objects that have
the query object in the set of their ’dynamic’ skyline. Such kind of dynamic
skyline corresponds to the skyline of a transformed data space where point
q becomes the origin and all points are represented by their distance to q.
In order to compute the reverse skyline of an arbitrary query point, we ﬂrst
propose a Branch and Bound algorithm (called BBRS), which is an improved
customization of the original BBS algorithm. To reduce even more the com-
putational cost of determining if a point belongs to the reverse skyline, we
propose an enhanced algorithm (called RSSA), that is based on accurate pre-
computed approximations of the skylines. These approximations are used to
identify whether a point belongs to the reverse skyline or not.
The eﬁectiveness and e–ciency of all proposed techniques are discussed
and veriﬂed by comparison with conventional approaches in versatile experi-
mental evaluations on real-world datasets.Abstract (in German)
Eine weit verbreitete Technik in vielen neuen Anwendungen, wie z. B. Mul-
timedia Datenbanken, ist die sogenannte Feature-Transformation, bei der
wichtige Eigenschaften der Datenobjekte in Punkte eines mehrdimension-
alen Vektorraums, die sogenannten Feature-Vektoren ub˜ erfuhrt˜ werden. Die
Suche nach ’˜ahnlichen’ oder ’nicht-dominierten’ Objekten, die auf diesen
Feature-Vektoren basieren ist realisiert als Suche nach Punkten in diesem
mehrdimensionalen Vektorraum. Damit eine e–ziente Anfragebearbeitung in
diesen hoch-dimensionalen Datenbanken unterstutzt˜ werden kann, sind hoch-
dimensionale Indexstrukturen notwendig, die den Suchraum beschr˜anken,
und es mussen˜ e–ziente Anfragebearbeitungsstrategien, die diese Indexe
verwenden, entwickelt werden. Bei der eﬁectiven Verwaltung dieser grossen
MengevonkomplexenDatenkommenDatenbanksystemefur˜ Nicht-Standard-
Anwendungen zum einsatz. Im Gegensatz zu herk˜ommlichen Datenbanksys-
temen, die ’exact-match’ Anfragen unterstutzen,˜ fokussiert der Benutzer
˜dieser Nicht-Standard Datenbanksysteme auf Ahnlichkeitsuche und sogenan-
nte Skyline-Techniken.
AusgehendvoneinerAnalysedertypischenhochentwickeltenDatenbanksys-
teme, wie Multimedia-Datenbanksysteme, elektronische Marktpl˜atze, und
Entscheidungsunterstutzungssysteme,˜ werden die folgenden vier grundlegen-
denCharakteristikafestgestellt:hoch-dimensionaleNaturderDaten,Wiederver-
wendbarkeitvonexistierendenIndexstrukturen,neue(ausdrucksst˜arkere)An-
fragetypen fur˜ Nicht-Standard-Datenbanksysteme, und e–ziente Analyse von
komplexenhoch-dimensionalenDaten.DasHauptzieldervorliegendenArbeit
ist deshalb die Verbesserung der Performanz bei der indexbasierten Anfrage-
bearbeitung in hochdimensionalen Datenr˜aumen und die Entwicklung neuer
Anfrageoperatoren.
˜DerersteTeilderArbeitbesch˜aftigtsichmitMethodenderAhnlichkeitssuche.
Wirpr˜asentiereneineneueIndexierungstechnikfur˜ mehrdimensionalenDaten,
die besonders geeignet ist fur˜ die inkrementelle Bearbeitung von N˜achste-
Nachbar (NN) Anfragen. Die Daten werden zuerst in mehreren niedrigdi-
mensionalen R˜aumen verteilt und dann mit Standard-XII Abstract (in German)
Ind