Xcerpt: a rule-based query and transformation language for the web [Elektronische Ressource] / von Sebastian Schaffert

ludwig-maximilians-universitat_munchen - Sebastian Schaffert

Découvre YouScribe en t'inscrivant gratuitement

Je m'inscris

Obtenez un accès à la bibliothèque pour le consulter en ligne
En savoir plus

257 pages

English

Obtenez un accès à la bibliothèque pour le consulter en ligne
En savoir plus

A propos
Informations
Extrait

Description

Sujets

Informatik

Informations

Publié par	ludwig-maximilians-universitat_munchen
Publié le	01 janvier 2004
Nombre de lectures	27
Langue	English
Poids de l'ouvrage	2 Mo

Extrait

Xcerpt: A Rule-Based Query and Transformation Language
for the Web
Dissertation
zur Erlangung des akademischen Grades des
Doktors der Naturwissenschaften
¨ ¨an der Fakultat fur Mathematik, Informatik und Statistik
¨ ¨der Ludwig-Maximilians-Universitat Munchen
von
Sebastian Schaffert
Oktober 2004Erstgutachter: Prof. Dr. Franc¸ois Bry (Universita¨t Mu¨nchen)
Zweitgutachter: Prof. Dr. Georg Gottlob (TU Wien)
Tag der mu¨ndlichen Pru¨fung: 13. Dezember 2004Abstract
This thesis investigates querying the Web and the Semantic Web. It proposes a new rule-
based query language called Xcerpt. Xcerpt differs from other query languages in that it uses
patterns instead of paths for the selection of data, and in that it supports both rule chaining
and recursion. Rule chaining serves for structuring large queries, as well as for designing
complex query programs (e.g. involving queries to the Semantic Web), and for modelling
inference rules. Query patterns may contain special constructs like partial subqueries, optional
subqueries, or negated subqueries that account for the particularly ﬂexible structure of data on
the Web.
Furthermore, this thesis introduces the syntax of the language Xcerpt, which is illustrated
on a large collection of use cases both from the conventional Web and the Semantic Web.
In addition, a declarative semantics in form of a Tarski-style model theory is described, and
an algorithm is proposed that performs a backward chaining evaluation of Xcerpt programs.
This algorithm has also been implemented (partly) in a prototypical runtime system. A salient
aspect of this algorithm is the speciﬁcation of a non-standard uniﬁcation algorithm called
simulation uniﬁcation that supports the new query constructs described above. This uniﬁcation
is symmetric in the sense that variables in both terms can be bound. On the other hand it is in
contrast to standard uniﬁcation assymmetric in the sense that the uniﬁcation determines that
the one term is a subterm of the other term.
Zusammenfassung
Diese Arbeit untersucht das Anfragen des Webs und des Semantischen Webs. Sie stellt
eine neue regel-basierte Anfragesprache namens Xcerpt vor. Xcerpt unterscheidet sich von an-
deren Anfragesprachen insofern, als dass es zur Selektion von Daten sog. Pattern (,,Muster”)
verwendet und sowohl Regelschliessen als auch Rekursion unterstu¨tzt, was sowohl zur Struk-
turierung gro¨ßerer Anfragen als auch zur Erstellung komplexer Anfrageprogramme, und zur
Modellierung von Inferenzregeln dient. Anfrage-Pattern ko¨nnen spezielle Konstrukte, wie
partielle Teilanfragen, optionale Teilanfragen, oder negierte Teilanfragen, enthalten, die der
besonders ﬂexiblen Struktur von Daten im Web genu¨gen.
In dieser Arbeit wird weiterhin die Syntax von Xcerpt eingefu¨hrt, und mit Hilfe mehrerer
Anwendungsszenarien sowohl aus dem konventionellen als auch aus dem semantischen Web
erla¨utert. Ausserdem wird eine deklarative Semantik im Stil von Tarski’s Modelltheorie
beschrieben und ein Algorithmus vorgeschlagen, der eine ru¨ckwa¨rtsschliessende Auswer-
tung von Xcerpt durchfu¨hrt und in einem prototypischen Laufzeitsystem implementiert wurde.
Wesentlicher Bestandteil des Ru¨ckwa¨rtsschliessens ist die Speziﬁkation eines nicht-standard
Uniﬁkations-Algorithmus, der die oben genannten speziellen Xcerpt-Konstrukte beru¨ck-
sichtigt. Diese Uniﬁkation ist symmetrisch in dem Sinne, dass sie Variablen in beiden
angeglichenen (,,uniﬁzierten”) Termen binden kann. Andererseits ist sie im Gegensatz zur
Standarduniﬁkation assymmetrisch in dem Sinne, dass der dadurch geleistete Angleich den
einen Term als ,,Teilterm” des anderen erkennt.
Sebastian Schaffert III“At times our own light goes out and is rekindled by a spark from another person. Each of us
has cause to think with deep gratitude of those who have lighted the ﬂame within us.”
— Albert Schweitzer
Acknowledgements
The language Xcerpt presented in this thesis would not exist today without the continuous support and
contribution of many fellow researchers and students at the University of Munich, at the University of
Linko¨ping (Sweden), and elsewhere. Of those, particular gratitude goes to the following colleagues:
• Franc¸ois Bry, University of Munich, with whom I had many – sometimes heated but always fruitful
– discussions on almost all issues concerning Xcerpt.
• Włodzimierz Drabent, Polish Academy of Sciences, Warszawa, for discussing and proof-reading,
and hinting to some important mistakes.
• Norbert Eisinger, University of Munich, for spending much time with me discussing syntax and
semantics of the language, and for being (almost) always there when I needed him.
• Georg Gottlob, Technical University of Vienna, for being willing to take the burden of being second
supervisor of this thesis
• Jan Małuszyn´ski, University of Linko¨ping, for giving me the chance to present my work and for
many interesting discussions and ideas.
Also, I thank Tim Furche (University of Munich), Paula-Lavinia Pa˘traˆnjan (University of Munich), and
Artur Wilk (University of Linko¨ping) for working with me on several aspects of Xcerpt.
Furthermore, several graduate students, who worked on diploma and project theses during the develop-
ment of Xcerpt, deserve to be mentioned separately:
• Sacha Berger, who is now a fellow researcher, developed in his diploma thesis the visual language
visXcerpt which builds upon Xcerpt and has received a lot of attention in the research community. A
short introduction into this work can be found near the end of this thesis.
• Oliver Bolzer currently investigates Semantic Web querying with Xcerpt as part of his diploma thesis
and contributed many improvements to the source code of the prototypical runtime system presented
in this thesis.
• Sebastian Kraus worked extensively with the language Xcerpt during his diploma thesis, in which
he developed a comprehensive set of use cases for Xcerpt, some of which also appear in this thesis.
His work had much inﬂuence on the development of appropriate language constructs for Xcerpt.
• Andreas Schroeder developed in his project thesis several different approaches to backward chain-
ing in Xcerpt. Discussions and work with him ultimately lead to the operational semantics as it is
presented in this thesis.
. . . and last but not least: most gratitude goes to my family for supporting me patiently during the course
of this thesis.
This research has been partly funded by the European Commission and by the Swiss Federal Ofﬁce
for Education and Science within the 6th Framework Programme project REWERSE number 506779 (cf.
http://rewerse.net).
Traunstein and Munich, October 2004
IV Sebastian SchaffertCONTENTS
I Introduction and Motivation 1
1 Introduction 3
1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2 Outline of this Thesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.3 Design Principles of Xcerpt . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.3.1 Referential Transparency and Answer Closedness . . . . . . . . . . . . . . . . . . 5
1.3.2 Answers as Arbitrary XML Data . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.3.3 Pattern-Based Queries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.3.4 Incomplete Speciﬁcation of Query Patterns . . . . . . . . . . . . . . . . . . . . . 9
1.3.5 Rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.3.6 Forward and Backward Chaining . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.3.7 Separation of Querying and Construction . . . . . . . . . . . . . . . . . . . . . . 12
1.3.8 Reasoning Capabilities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2 Data Representation on the Web 15
2.1 Semistructured Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.1.1 Traditional Database Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.1.2 Semistructured Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.1.3 Other Languages for Representing Semistructured Data . . . . . . . . . . . . . . . 18
2.2 XML – the Extensible Markup Language . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.2.1 Markup Languages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.2.2 A Generic Markup Language for the Web . . . . . . . . . . . . . . . . . . . . . . 21
2.2.3 Anatomy of an XML Document . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.2.4 XML Schema Languages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.2.5 XML References: ID/IDREF . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
2.2.6 XML Namespaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
2.3 XML, Semistructured Expressions and Semistructured Data . . . . . . . . . . . . . . . . 31
2.4 Three Scenarios for Querying Semistructured Data . . . . . . . . . . . . . . . . . . . . . 32
2.4.1 Student Database . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
2.4.2 Bookstore . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.4.3 Document-Centric: PhD Thesis .