Étude de la dynamique des documents actifs pour la gestion d'information distribuées, On the dynamics of active documents for distributed data management
168 pages
English


Description

Supervised by Serge Abiteboul
Thesis defended on 11 February 2011: Paris 11
One of the main problems that Web applications must handle today is the evolution of data. In this thesis, we consider this problem and, more precisely, the evolution of active documents. Active documents are XML documents that can evolve through the activation of Web service calls. This formalism has already been used in the context of distributed information management. The main contributions of this thesis are the theoretical study of several notions underlying the implementation of two systems, one managing applications that manipulate data streams and the other managing workflow applications. We first study notions related to the maintenance of views over active documents. These notions are used in the implementation of a data stream processor, called an Axlog widget, which manipulates streams through an active document. The second contribution concerns the expressiveness of different formalisms for constraining the sequencing of the activations of an active document. This study was motivated by the implementation of Axart, a system that manages data-centric workflows using active documents.
One of the major issues faced by Web applications is the management of evolving data. In this thesis, we consider this problem and in particular the evolution of active documents. Active documents are a formalism describing the evolution of XML documents through the activation of Web service calls included in the document. This formalism has already been used in the context of distributed data management [Abiteboul 08a]. The main contributions of this thesis are theoretical studies motivated by two systems, which manage stream applications and workflow applications respectively. In a first contribution, we study the problem of view maintenance over active documents. The results served as the basis for an implementation of stream processors based on active documents, called Axlog widgets. In a second contribution, we see active documents as the core of data-centric workflows and consider various ways of expressing constraints on the evolution of documents. The implementation, called Axart, validated the approach of a data-centric workflow system based on active documents.

The hidden Web (also known as the deep or invisible Web), that is, the part of the Web not directly accessible through hyperlinks, but through HTML forms or Web services, is of great value, but difficult to exploit. We discuss a process for the fully automatic discovery, syntactic and semantic analysis, and querying of hidden-Web services. We first propose a general architecture that relies on a semi-structured warehouse of imprecise (probabilistic) content. We provide a detailed complexity analysis of the underlying probabilistic tree model. We describe how we can use a combination of heuristics and probing to understand the structure of an HTML form. We present an original use of a supervised machine-learning method, namely conditional random fields, in an unsupervised manner, on an automatic, imperfect, and imprecise annotation based on domain knowledge, in order to extract relevant information from HTML result pages. So as to obtain semantic relations between inputs and outputs of a hidden-Web service, we investigate the complexity of deriving a schema mapping between database instances, solely relying on the presence of constants in the two instances. We finally describe a model for the semantic representation and intensional indexing of hidden-Web sources, and discuss how to process a user's high-level query using such descriptions.
-XML
-Active documents
-Satisfiability
-Relevance
-Workflow
Source: http://www.theses.fr/2011PA112003/document

Information

Published by
Number of reads 26
Language English
File size 1 MB

Extract

Doctoral thesis in computer science
Étude de la dynamique des documents actifs
pour la gestion d’information distribuées
On the dynamics of active documents for
distributed data management
Pierre Bourhis
11 February 2011
Jury
Serge Abiteboul DR INRIA Saclay (advisor)
Michael Benedikt Prof. Univ. Oxford (reviewer)
Albert Benveniste DR INRIA Rennes
Nicole Bidoit Prof. Univ. Paris Sud
Anca Muscholl Prof. Univ. Bordeaux 1
Victor Vianu Prof. Univ. San Diego
tel-00598299, version 1 - 6 Jun 2011
Abstract
One of the major issues faced by Web applications is the management of evolving data. In this thesis, we
consider this problem and in particular the evolution of active documents. Active documents are a formalism
describing the evolution of XML documents through the activation of Web service calls included in the document.
This formalism has already been used in the context of distributed data management [Abiteboul 08a]. The main
contributions of this thesis are theoretical studies motivated by two systems, which manage stream
applications and workflow applications respectively. In a first contribution, we study the problem of view
maintenance over active documents. The results served as the basis for an implementation of stream processors
based on active documents, called Axlog widgets. In a second contribution, we see active documents as the core
of data-centric workflows and consider various ways of expressing constraints on the evolution of documents.
The implementation, called Axart, validated the approach of a data-centric workflow system based on active
documents.
Keywords: XML, active documents, satisfiability, relevance, workflow
With the exception of Appendix C, which is a summary of the thesis in French, this thesis is written in English.
Introduction
One of the major issues faced by Web applications is the management of evolving data. In
this thesis, we consider this problem and in particular the evolution of active documents. Active
documents are a formalism describing the evolution of XML documents through the activation of
Web service calls included in the document. This formalism has already been used in the context of
distributed data management [Abiteboul 08a]. The main contributions of this thesis are theoretical
studies motivated by two systems, which manage stream applications and workflow applications
respectively. In a first contribution, we study the problem of view maintenance over active documents.
The results served as the basis for an implementation of stream processors based on active documents,
called Axlog widgets. In a second contribution, we see active documents as the core of data-centric
workflows and consider various ways of expressing constraints on the evolution of documents. The
implementation, called Axart, validated the approach of a data-centric workflow system based on
active documents.
In the first part, we focus on streaming applications. The Web includes a large number of sources
consisting of XML streams, such as news or blog feeds. Many Web pages are simply aggregations
of news feeds. At the heart of such pages, one finds stream querying. We present a formal model,
called Axlog, that captures simple queries over streaming sources. Our approach is in the spirit of
view maintenance over active documents. Our main contribution, published in [3], is a study of
two theoretical notions: satisfiability and relevance. We briefly outline an algorithm to maintain
a view over a document that includes input streams; see [4]. The algorithm uses these theoretical
notions to combine several database techniques: optimization techniques for the evaluation of
datalog queries, view maintenance techniques, stream processing techniques for XML, and filtering
techniques. The Axlog model is supported by the system P2PMonitor, which is demonstrated in [9].
P2PMonitor is a peer-to-peer system that monitors other peer-to-peer systems by managing XML
stream queries. The system P2PMonitor and the view maintenance algorithm are presented
in detail in the thesis of Bogdan Marinoiu [Marinoiu 09].
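To make the idea of maintaining a view over a document fed by an input stream more concrete, here is a minimal Python sketch. It is not the algorithm of [4] and it ignores tree patterns, joins, datalog, satisfiability, and relevance; it only shows a selection view being updated incrementally, with the output expressed as a stream of update requests. The names `Item`, `ViewMaintainer`, and `run` are hypothetical and introduced purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, Iterator, List, Tuple


@dataclass(frozen=True)
class Item:
    """One element added to the document by an input stream (hypothetical shape)."""
    source: str
    title: str


@dataclass
class ViewMaintainer:
    """Keeps a filtered view of a growing document up to date, one update at a time."""
    predicate: Callable[[Item], bool]
    view: List[Item] = field(default_factory=list)

    def on_insert(self, item: Item) -> List[Tuple[str, Item]]:
        """Apply one input update and return the resulting view update requests."""
        if self.predicate(item):
            self.view.append(item)
            return [("insert", item)]
        # The new item does not match the query, so the view is unchanged.
        return []


def run(maintainer: ViewMaintainer, stream: Iterable[Item]) -> Iterator[Tuple[str, Item]]:
    """Turn an input stream of items into an output stream of view updates."""
    for item in stream:
        yield from maintainer.on_insert(item)


if __name__ == "__main__":
    feed = [Item("blog", "First post"), Item("news", "Weather"), Item("blog", "Second post")]
    blog_view = ViewMaintainer(predicate=lambda it: it.source == "blog")
    for update in run(blog_view, feed):
        print(update)
```

The point of the sketch is that the view is never recomputed from scratch: each input update is examined once, and only the updates that change the view produce output.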
In the second part, we address the problem of sequencing interactions between Web applications.
E-commerce Websites are a good example of applications where sequencing is crucial. The
interactions between the users and the applications are constrained in order to conform to a
workflow. Several workflow languages have been introduced, such as BPEL. However, most of these
languages focus on the sequencing of the actions rather than on the data used in the process. New
kinds of workflow languages that focus more on data, called data-centric workflows, have recently
been introduced. We present the AXML Artifact model [5], inspired by the Business Artifact
model, a data-centric workflow language introduced by IBM. Our main contribution, published in
[2], studies and compares different ways of expressing the sequencing of operations, based on
different paradigms including automata, pre- and post-conditions for operations, and temporal logic.
We briefly describe a system [8] implementing a portion of the AXML Artifact model.
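As a rough illustration of two of the paradigms mentioned above, the following Python sketch expresses the same sequencing constraint twice: once as a finite automaton over operation names, and once as pre-conditions and effects over the data. The e-commerce operations (`add_item`, `pay`, `ship`) and all function names are hypothetical, and nothing here reflects the actual syntax or semantics of the AXML Artifact model; it is only a toy example of the contrast between control-centric and data-centric specifications.

```python
from typing import Callable, Dict, Iterable, Tuple

# Style 1: sequencing given by a finite automaton over operation names.
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("empty", "add_item"): "cart",
    ("cart", "add_item"): "cart",
    ("cart", "pay"): "paid",
    ("paid", "ship"): "shipped",
}


def accepted_by_automaton(ops: Iterable[str]) -> bool:
    """Return True if every operation follows an allowed transition."""
    state = "empty"
    for op in ops:
        next_state = TRANSITIONS.get((state, op))
        if next_state is None:
            return False
        state = next_state
    return True


# Style 2: sequencing given by pre-conditions and effects on the data itself.
Order = Dict[str, object]

PRE: Dict[str, Callable[[Order], bool]] = {
    "add_item": lambda d: not d["paid"],
    "pay": lambda d: d["items"] > 0 and not d["paid"],
    "ship": lambda d: bool(d["paid"]) and not d["shipped"],
}

EFFECT: Dict[str, Callable[[Order], Order]] = {
    "add_item": lambda d: {**d, "items": d["items"] + 1},
    "pay": lambda d: {**d, "paid": True},
    "ship": lambda d: {**d, "shipped": True},
}


def accepted_by_conditions(ops: Iterable[str]) -> bool:
    """Return True if each operation's pre-condition holds when it is applied."""
    doc: Order = {"items": 0, "paid": False, "shipped": False}
    for op in ops:
        if op not in PRE or not PRE[op](doc):
            return False
        doc = EFFECT[op](doc)
    return True


if __name__ == "__main__":
    for run in (["add_item", "pay", "ship"], ["pay"], ["add_item", "ship"]):
        print(run, accepted_by_automaton(run), accepted_by_conditions(run))
```

On this toy example the two specifications accept the same sequences of operations, but the second one never names a control state: the allowed order follows from the data alone.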
The thesis is organized in two parts. The first deals with streaming applications and the second
with sequencing. In each part, we use the same organization. After an overview of the part, we
discuss the related work. We then present the model and study theoretical issues for the particular
model. Finally, we briefly discuss the implementation work based on these theoretical studies.
The first part is composed of three chapters. Chapter 1 presents the related work. Chapter 2
describes the model and the study of the two key notions of satisfiability and relevance in the
context of the Axlog model. Chapter 3 describes the algorithm proposed to efficiently implement
applications based on the Axlog model and the system P2PMonitor supporting them.
The second part is composed of three chapters. Chapter 4 presents the related work. Chapter
5 describes the core of the model and the study of constraints specifying the evolution of active
documents. Finally, Chapter 6 discusses some extensions of the model and an implementation.
Detailed proofs are provided in the appendix. Appendix C is a summary of the thesis in French.
Part I.
Maintenance of Views over Active
Documents
Overview of Part I
Many Web applications are based on dynamic interactions between Web components exchanging
flows of information. Such a situation arises, for instance, in mashup systems [Ennals 07] or when
monitoring distributed autonomous systems [Abiteboul 07]. This is a challenging problem that has
recently attracted a lot of attention; see Web 2.0 [O’Reilly ]. Starting from datalog and Active
XML technologies, we introduce a novel model, Axlog, for capturing interactions between Web
components and show how it can be supported efficiently. An Axlog widget uses an active document
interacting with the rest of the world via streams of updates. Its input streams specify updates to
the document (in the spirit of RSS feeds), whereas its output streams are defined by queries on the
document. More precisely, an output stream represents the list of update requests needed to maintain
the view for the query. The queries we consider here are tree pattern queries with value joins
(and a template to produce an XML result). Our data model and queries may include a time
dimension, an essential feature for such
