BENCHMARKING AN XML MEDIATOR

Florin DRAGAN, Georges GARDARIN
PRiSM Laboratory University of Versailles
78035 Versailles Cedex, France
email: Florin.Dragan@prism.uvsq.fr, georges.gardarin@prism.uvsq.fr
Abstract: In recent years, XML has become the universal interchange format. Many investigations have been made on storing, querying and integrating XML with existing applications, and many XML-based commercial DBMSs have appeared lately. This paper reports on the analysis of an XML mediator federating several existing XML DBMSs. We measure their storage and querying capabilities directly through their Java API and indirectly through the XLive mediation tool. For this purpose we have created a simple benchmark consisting of a set of queries and a variable test database. The main goal is to reveal the weaknesses and the strengths of the implemented indexing and federating techniques. We analyze two commercial native XML DBMSs and an open-source relational to XML mapping middleware. We first pass the queries directly to the DBMSs, and then go through the XLive XML mediator. The results suggest that text XML is not the best format to exchange data between a mediator and a wrapper, and also show some possible improvements of XQuery support in mediation architectures.

1. INTRODUCTION

As XML capabilities have become more and more popular, a lot of XML-based products and interfaces have been proposed. Several XML DBMSs that have been developed try, on the one hand, to offer the well-known capabilities of a standard DBMS and, on the other hand, to implement new functionalities and reach new levels of performance. At the same time, more and more classical DBMSs add new extensions to store and retrieve XML documents.

To measure and compare their performance, many XML benchmarks have been proposed that "stress" different parts of the systems, most often the storage engine and the query processor, by means of a generally complex set of queries. Each benchmark is composed of a test database and a set of queries that try to be as general and complete as possible. There are also a few domain-specific benchmarks that propose a specific database format and a set of queries specific to the simulated applications. The most used metric is the response time for executing a query, but a few others (such as the size on disk needed to store a given document) have also been proposed.

The purpose of this paper is to present a simple, general mini-benchmark composed of a few queries and a variable data set, in order to evaluate some techniques implemented in the core of the DBMSs under the pressure of an XQuery mediator. We are mostly interested in the implemented indexing and mediation techniques and in how they are influenced by the size of the data set. Using our mini-benchmark, we test two native XML commercial DBMSs and one open-source XML to relational mapping middleware and analyze their response times. Next, we apply our benchmark to an XML mediator to find the delays introduced by the mediation operations. The conclusions show that XML mediation is a time-consuming operation that has to be optimized in both communication and processing time.

The rest of this paper is organized as follows. In the next section we give an overview of XML mediation technology, focusing on the XLive full-XML mediator. Section 3 presents our mini-benchmark (query set and data set). Section 4 introduces the analyzed local systems and their characteristics, followed in Section 5 by the results obtained by applying our benchmark and their interpretation. In Section 6, we present the results of the benchmarking operations using the mediator. In the last section, we summarize our results and suggest some improvements to the mediator architecture.

2. XML MEDIATION

Mediation technology based on XML and XQuery is under development, and some products are already available. In this section, we survey this new technology and describe our XLive mediator (see www.xquark.org for an industrial open source version).

2.1 Basics and Backgrounds

With the advent of XQuery as a standard for querying XML collections [XQuery, 2003], several mediator systems have been developed using XQuery and XML schema as pivot language and model. Examples of full XML mediators are the Enosys XML Integration Platform (EXIP) [Papakonstantinou, 2003], the Software A.G. EntireX XML Mediator, the Liquid Data mediator of BEA derived from EXIP, and the e-XMLMedia XML Mediator, a predecessor of our current XLive project [Gardarin, 2002].

XML mediators focus on supporting the XQuery query language on XML views of heterogeneous data sources. The data are integrated dynamically from multiple information sources. Queries are used as view definitions. At run time, the application issues XML queries against the views. Queries and views are translated into some XML algebra and combined into single algebra query plans. Sub-queries are sent to local wrappers that process them locally and return XML results. Finally, the global query processor evaluates the result, using appropriate integration and reconstruction algorithms.
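As a rough illustration of this flow, the following Python sketch mimics a mediator that sends sub-queries to two wrappers, receives text XML, reparses it, and integrates the partial results with a hash join on the ISBN. It is not XLive code: the `StaticWrapper` class, its `execute` method, the `<reviews>` root element, and the sample documents are hypothetical, while the element and attribute names follow the benchmark queries of Section 3.

```python
import xml.etree.ElementTree as ET

# Hypothetical in-memory "sources"; a real wrapper would evaluate the XQuery sub-query.
CATALOG_XML = (
    '<catalog>'
    '<book isbn="1"><title>A</title><price currency="CDN">20</price></book>'
    '<book isbn="2"><title>B</title><price currency="USD">150</price></book>'
    '</catalog>'
)
REVIEW_XML = (
    '<reviews>'
    '<review><book isbn="1"/><review rating="4">fine</review></review>'
    '<review><book isbn="2"/><review rating="1">poor</review></review>'
    '<review><book isbn="2"/><review rating="2">weak</review></review>'
    '</reviews>'
)


class StaticWrapper:
    """Toy wrapper: ignores the sub-query text and returns its result as text XML."""

    def __init__(self, xml_text):
        self.xml_text = xml_text

    def execute(self, subquery):
        return self.xml_text  # results cross the wrapper boundary as plain text


def mediate_join(catalog_wrapper, review_wrapper):
    """Reparse both partial results and hash-join them on @isbn (in the spirit of Q10)."""
    catalog = ET.fromstring(catalog_wrapper.execute('collection("catalog")'))
    reviews = ET.fromstring(review_wrapper.execute('collection("review")'))

    # Build phase: index the books by ISBN.
    books_by_isbn = {book.get("isbn"): book for book in catalog.iter("book")}

    # Probe phase: keep the rating of each review whose book is in the catalog.
    ratings = []
    for rev in reviews.findall("review"):
        isbn = rev.find("book").get("isbn")
        if isbn in books_by_isbn:
            ratings.append((isbn, rev.find("review").get("rating")))
    return ratings


result = mediate_join(StaticWrapper(CATALOG_XML), StaticWrapper(REVIEW_XML))
print(result)
```

Every partial result is parsed again on the mediator side before the join can start, which is the cost the experiments in Section 6 measure.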

XQuery is a powerful language, which encompasses SQL and much more. Notably, it is able to query rich and extensible data types; it is a functional language, so that any valid expression applied to a valid expression is a valid query; and it will soon incorporate XQuery Text for full-text queries. XQuery Text shall provide functionalities such as single-word search, phrase search, support for stop words, search on prefix, postfix and infix, proximity searching, word normalization, diacritics, ranking and relevance. All these features will make XQuery an ideal language for querying integrated data sources.

2.2 Overview of XLive Mediator

In the XLive project, we use the mediation architecture shown in Figure 1 to support enterprise information integration. It follows the classical wrapper-mediator architecture as defined in [Wiederhold, 1992]. The communication between wrappers and mediator follows a common interface, which is defined by an applicative Java or Web service interface named XML/DBC. With XML/DBC, requests are defined in XQuery and results are returned in text XML format.

[Diagram labels: Web Interface, Java Application, Mediator, Wrapper, RDB1 (Oracle), RDB2 (MySQL), XML DB3 (Xyleme)]

Figure 1 - XLive Architecture

Our architecture is composed of mediators that deal with distributed XML sources and wrappers that cope with the heterogeneity of the sources (DBMS, Web pages, etc.). The XLive mediator is a data integration middleware managing XML views of heterogeneous data sources. Using the XLive mediator, one can integrate heterogeneous data sources without replicating their data, while the sources remain autonomous.

The XLive mediator is entirely based on W3C standard technology: XML, XQuery, XML-Schema, SAX, DOM and SOAP. All information exchanges rely on the XML format. XML-Schema is used for metadata representation. Wrappers provide schemas to export information about local data structures. XQuery is employed for querying both the mediator and the wrappers. Connectivity between mediator and wrappers relies on the XML/DBC programming interface, an extension of JDBC to integrate XQuery. More information about the XLive mediator can be found in [Dang-Ngoc, 2003].

To integrate a new source into the mediation architecture, a wrapper must be built. It has to implement the XML/DBC programming interface. DBMSs are data-oriented sources, and metadata are provided to describe sources and mappings. DBMS wrappers translate data sources into XML and process a possibly reduced subset of XQuery on the source data. In the case of a Web source, the wrapper brings more intelligence: it aims at semantically integrating Web information into a common model accessible to programs.

3. PROPOSED BENCHMARK

Several benchmarks have been developed for XML DBMSs, among them XMach-1 [XMach-1, 2001], XMark [XMark, 2001], X007 [X007, 2002] and XBench [XBench, 2004]. They all have their merits, but are in general too complex for current mediators, both in functionality and size. In this section, we introduce our simpler benchmark.

3.1 Presentation

We propose a simple generic benchmark for testing the basic functionalities of an XML mediator and evaluating the performance of the different join algorithms and indexing schemes of the local sources. The existing benchmarks generally propose a set of complex queries that evaluate many of the properties of the query processor in the same query. By relying on simple operations, our goal is to stress only certain functions: local indexing, XML transfer and parsing, join algorithms, etc. Another reason for proposing only simple queries is that we used our benchmark to test the XLive mediator, which performs basic XQuery to integrate multiple sources. Generally, it takes a long time for a mediator to perform complex join operations (a time that depends on the mediator join algorithms and on external parameters such as the network delay, the remote DBMS capabilities, and the speed at which the source transfers the results). Yet another reason to use a simple XQuery benchmark is that most tested DBMSs only support the core of XQuery with realistic performance on the computer we are using.

3.2 Data Set

The data set is composed of two document models: one data-oriented and the other text-oriented. With a small depth (at most 3) and a small width (at most 5), the two documents have a simple structure that facilitates the evaluation of different structural selection queries. The two documents are logically connected, which makes it possible to perform simple join operations between documents located on different systems. A graphical representation of the schema of the two documents is given in Figures 2 and 3. The schema is variable in the sense that neither the number of authors of a book nor the number of paragraphs in a review is constant. The textual content is generated from the most popular English words extracted from Shakespeare's plays.

Figure 2: Catalog schema

Figure 3: Review schema

In order to evaluate the performance of the XML systems, we generated three data sets of 300, 750 and 1500 documents, each document having a size of less than 2K. We used the ToXgene utility [toXgene], starting from a provided example to generate our data set.
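The data sets themselves were produced with ToXgene, whose templates are not reproduced here. Purely to illustrate what a "variable" schema means, the sketch below generates a catalog with a random number of authors per book using Python's standard library. The element and attribute names (`catalog`, `book`, `isbn`, `author`, `price`, `currency`) are those used by the queries of Section 3.3; the word list, the value ranges, the `USD` currency, the `title` element placement, and the function name are assumptions.

```python
import random
import xml.etree.ElementTree as ET

WORDS = ["the", "and", "to", "of", "you", "my", "that", "in"]  # placeholder vocabulary
CURRENCIES = ["CDN", "USD"]  # "CDN" appears in Q3; "USD" is an assumption


def make_catalog(n_books, max_authors=3, seed=0):
    """Build a <catalog> with n_books <book> elements and 1..max_authors authors each."""
    rng = random.Random(seed)  # fixed seed so that a data set can be regenerated
    catalog = ET.Element("catalog")
    for i in range(n_books):
        book = ET.SubElement(catalog, "book", isbn=str(1000 + i))
        ET.SubElement(book, "title").text = " ".join(rng.choice(WORDS) for _ in range(3))
        for _ in range(rng.randint(1, max_authors)):
            ET.SubElement(book, "author").text = rng.choice(WORDS).capitalize()
        price = ET.SubElement(book, "price", currency=rng.choice(CURRENCIES))
        price.text = str(rng.randint(10, 200))
    return catalog


catalog = make_catalog(100)
xml_text = ET.tostring(catalog, encoding="unicode")
print(len(catalog.findall("book")), "books,", len(xml_text), "characters")
```

Changing `n_books` scales the data set while the per-book structure stays the same, which is the property the three data sets rely on.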

3.3 Queries

Our benchmark proposes a representative set of XML DBMS query functionalities, which can be grouped as follows:

(i) Simple XPath expressions. Queries Q1 and Q2 are XQueries that require selections on element and attribute names:

    Q1: for $b in collection("catalog")/catalog/book return $b

    Q2: for $currency in collection("catalog")/catalog/book/price/@currency return $currency

(ii) XPath with predicates. Q3, Q4 and Q5 introduce predicates to perform simple selections. The Q3 predicate tests for exact equality:

    Q3: for $b in collection("catalog")/catalog/book where $b/price/@currency = "CDN" return $b

Q4 contains a "range" predicate:

    Q4: for $b in collection("catalog")/catalog/book where $b/price < 100 return $b

Q5 contains the two previous predicates:

    Q5: for $b in collection("catalog")/catalog/book where $b/price < 100 and $b/price/@currency = "CDN" return $b

(iii) Recursive path optimization. Q6 contains a recursive wildcard "//" expression that tests how well path evaluation is optimized (sometimes called the indexing of "//"):

    Q6: for $col in collection("catalog") return $col//price

(iv) Result ordering. To test the performance of generating an ordered result, we have introduced an order-by XQuery:

    Q7: for $col_rev in collection("review"), $rev in $col_rev/review, $rate in $rev/review/@rating order by ($rate) return $rev

(v) Text search. Q8 contains the "contains" predicate to stress some text indexing capabilities:

    Q8: for $b in collection("catalog")/catalog/book where contains($b/author, "Fumio") return $b

(vi) Joins on values. Q9 and Q10 require joins between the two documents:

    Q9 (join and text searching):
        for $col_cat in collection("catalog"), $col_rev in collection("review"),
            $b in $col_cat/catalog/book, $rev in $col_rev/review, $rev_rev in $rev/review
        where $b/@isbn = $rev/book/@isbn and contains($rev_rev, "dolphins")
        return $b/@genres

    Q10 (equality join):
        for $col_cat in collection("catalog"), $rev_cat in collection("review"),
            $b in $col_cat/catalog/book, $r in $rev_cat/review
        where $b/@isbn = $r/book/@isbn
        return $r/review/@rating

(vii) Result generation. Q11 tests the performance of the "query processor" when generating new results:

    Q11:
        for $col_rev in collection("review"),
            $rev in $col_rev/review
        where $rev/review/@rating < 2
        return
            <lowRateBook>
                <title>{$rev/book/title/text()}</title>
                {for $col_cat in collection("catalog"),
                     $b in $col_cat/book
                 where $b/@isbn = $rev/book/@isbn
                 return <price>{$b/price/text()}</price>}
            </lowRateBook>

3.4 Metrics

For evaluating a query processor, we measure the query execution time and the size (in bytes) of the result.

3.5 Benchmark Host

To run the benchmark and evaluate the different DBMSs, we used a PC with the following configuration:

    vendor_id     : GenuineIntel
    cpu family    : 6
    model         : 9
    model name    : Intel® Pentium® M processor 1600MHz
    stepping      : 5
    cpu MHz       : 1598.674
    cache size L1 : 0 KB
    fpu           : yes
    cpuid level   : 2
    wp            : yes
    flags         : fpu vme de pse tsc msr mce cx8 sep mtrr pge mca cmov pat clflush dts acpi mmx fxsr sse sse2 tm
    bogomips      : 3191.60

OS: RedHat Linux 9, kernel version 2.4.20-8

3.6 Benchmarking Method

All the systems were evaluated using the Java API they provide. Each query was run ten times and the average execution time is reported. Every measured execution was preceded by a warm-up step, which consists of executing the same query several times before the evaluation. The set of results was not trimmed by eliminating the minimum or maximum.
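The measurement procedure can be sketched as follows. This is a minimal Python illustration of the method, not the Java harness used for the experiments: `benchmark` and `fake_query` are hypothetical names, the number of warm-up runs (3) is an assumption since the text only says "several", and `fake_query` merely stands in for a call to a DBMS API.

```python
import time


def benchmark(run_query, warmup_runs=3, measured_runs=10):
    """Return (mean_ms, samples_ms) for measured_runs executions after a warm-up."""
    for _ in range(warmup_runs):      # warm-up: timings are discarded
        run_query()
    samples_ms = []
    for _ in range(measured_runs):
        start = time.perf_counter()
        run_query()
        samples_ms.append((time.perf_counter() - start) * 1000.0)
    # Plain arithmetic mean: the minimum and maximum are deliberately kept.
    return sum(samples_ms) / len(samples_ms), samples_ms


def fake_query():
    """Stand-in for a call to a DBMS API; it only burns a little CPU."""
    return sum(i * i for i in range(10_000))


mean_ms, samples = benchmark(fake_query)
print(f"mean over {len(samples)} runs: {mean_ms:.2f} ms")
```

Because no outliers are removed, a single slow run (for example one hit by garbage collection) shifts the reported mean.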

4. DATA SOURCE DBMSs

In this section, we present the local DBMSs that handle the data sources and are integrated in the mediation platform. For commercial reasons, we call the native XML DBMSs XDBMS1 and XDBMS2.

4.1 XDBMS1

XDBMS1 is a native European XML DBMS that handles the storage, retrieval, indexing, integration and distribution of semi-structured data. The basic components are the repository, tailored to tree data, and the index manager, which provides two kinds of indexes: a standard B-tree for indexing dates and integers, and a full-text index that indexes keywords as well as path labels. The main features of XDBMS1 are scalability and rapid query processing based on an index kept in memory. Its query language is a limited XQuery that does not support XPath exactly. For this reason, we had to translate all our queries into XDBMS1 XQuery.

When a document is loaded in the XDBMS1 repository, it is automatically indexed. XDBMS1 uses an XML index that takes into consideration the data from an XML file and also the metadata, storing information about the meaning and the context of the words. All the words resulting from stemming are indexed. This means that, basically, neither the separators nor the very frequent words (such as prepositions or articles) are indexed. Different forms of the same word are indexed under the same root word and only the position of the word is added to the index. For example, for the plural of a word only its position in the document is added to the index; no new entry is created (the entry being represented by the singular). Generally the index managed by XDBMS1 is small, the maximum expansion factor relative to the data being 80.
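The principle of such a stemmed index can be sketched in a few lines of Python. This is only an illustration of the idea, not the XDBMS1 implementation: the stemmer (stripping a final "s"), the stop-word list, and the function names are all assumptions.

```python
import re
from collections import defaultdict

STOP_WORDS = {"the", "a", "an", "of", "in", "on", "and", "to"}  # illustrative list


def naive_stem(word):
    """Very rough stemmer: strips a final 's' so that plurals share the singular's entry."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def build_index(text):
    """Map each stem to the list of word positions where one of its forms occurs."""
    index = defaultdict(list)
    words = re.findall(r"[a-z]+", text.lower())  # separators are dropped here
    for position, word in enumerate(words):
        if word in STOP_WORDS:
            continue  # very frequent words are not indexed
        index[naive_stem(word)].append(position)
    return dict(index)


index = build_index("The dolphin and the dolphins swim in the sea")
print(index)
```

Here "dolphin" and "dolphins" end up under one entry that records both positions, which is why an exact-match predicate gains little from this kind of index while a "contains" search benefits directly.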

4.2 XDBMS2

To introduce a different native DBMS into our benchmark, we selected XDBMS2, another European XML DBMS. XDBMS2 is a native XML DBMS that provides advanced XML data processing and storage functionality. Besides storage and querying, it provides versioning, indexing, link management, publishing, schema verification, etc. XDBMS2 makes it possible to create multiple types of indexes, among them value, attribute ID, element name and full-text indexes.

XDBMS2 provides a user-controlled indexing system that creates a new set of indexes for each library or document loaded in the repository. Depending on how they are updated, the indexes are divided into "live" and "non-live". The "live" indexes are updated automatically when new data is loaded, as opposed to the "non-live" indexes, which are updated only on request. Several types of indexes are implemented:

- library indexes: A library is a logical structure that can contain a set of documents and other libraries. The presence of libraries can reduce the "range" of queries and, consequently, speed up their execution. Library indexes are live indexes.
- id attribute indexes: They store elements by their ID attributes specified in the DTD or XML-Schema. They are live indexes created at document or library level.
- element name indexes: The elements in a library or document are indexed by their name. XDBMS2 provides the facility of indexing all the elements or only the elements of a certain selection.
- value indexes: These are live indexes, created at library or document level, that store elements by their value or by the value of their attributes. It is possible to specify the type of the values that will be put in the index, it being the user's responsibility to convert the element/attribute values to the declared index types.
- full text indexes: They store elements by their textual values or by the values of their attributes. Unlike value indexes, this type of index makes it possible to select elements that have a certain word in their textual value.
- content conditioned indexes: Only certain nodes are indexed, according to a user-defined key. The user has to write a filter for selecting the nodes that will be indexed and to assign a key to each equivalence class.
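To make the role of a value index concrete, the following Python sketch answers the Q3 predicate (`price/@currency = "CDN"`) once by scanning every book and once through a dictionary keyed on the attribute value. It is a toy model under stated assumptions, not the XDBMS2 index structure: the sample catalog and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

CATALOG = ET.fromstring(
    '<catalog>'
    '<book isbn="1"><price currency="CDN">20</price></book>'
    '<book isbn="2"><price currency="USD">150</price></book>'
    '<book isbn="3"><price currency="CDN">90</price></book>'
    '</catalog>'
)


def build_value_index(root, element_path, attribute):
    """Map each attribute value to the list of <book> elements that carry it."""
    index = defaultdict(list)
    for book in root.findall("book"):
        node = book.find(element_path)
        if node is not None and node.get(attribute) is not None:
            index[node.get(attribute)].append(book)
    return index


def scan(root, value):
    """Evaluate the Q3 predicate without an index: visit every book."""
    return [b for b in root.findall("book") if b.find("price").get("currency") == value]


currency_index = build_value_index(CATALOG, "price", "currency")
indexed = [b.get("isbn") for b in currency_index["CDN"]]   # one dictionary lookup
scanned = [b.get("isbn") for b in scan(CATALOG, "CDN")]    # full traversal
print(indexed, scanned)
```

Both paths return the same books; the index pays its cost once at load time, which matches the "live" update behaviour described for value indexes.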

4.3 XQuark Bridge

XQuark Bridge is an XQuery wrapper proposed in open source by the XQuark company to query relational databases in XQuery. With XQuark Bridge, each table is seen as a flat XML collection of documents. Queries can be formulated in XQuery to generate nested XML or to define XML views of relational tables. The views can in turn be queried in XQuery. XQuark Bridge works in a similar way with Oracle, SQL Server or MySQL. Another goal of our benchmark is to evaluate XQuark Bridge in comparison with native XML DBMSs.

5. DBMS EVALUATIONS

In this section we first present the results of analyzing XDBMS1 and XDBMS2 with the proposed set of queries. Next, we present the results of analyzing XQuark Bridge. We created two clusters for XDBMS1 and, similarly, two libraries for XDBMS2, each of them containing one category of documents. A full-text index and an element name index have been created with XDBMS2. The XDBMS1 database is indexed with the default indexing configuration.

5.1 Results of experiments with XDBMS1 and XDBMS2

We are interested in both the query evaluation and the result generation processes. XDBMS2 uses a lazy evaluation technique; thus, to evaluate all the results, our test method iterates fully through the iterator.
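The need to drain the iterator can be illustrated with a Python generator standing in for a lazily evaluated result set. The sketch is hypothetical (the per-item cost is simulated with `time.sleep`, and the function names are assumptions); it only shows that the time observed right after the query call says little about the total evaluation time.

```python
import time


def lazy_results(n, cost_per_item_s=0.001):
    """Generator standing in for a lazily evaluated result set."""
    for i in range(n):
        time.sleep(cost_per_item_s)  # the work happens only when an item is requested
        yield i


def timed(n):
    """Return elapsed seconds after the call, after the first item, and after all items."""
    start = time.perf_counter()
    results = lazy_results(n)             # returns immediately: nothing computed yet
    after_call = time.perf_counter() - start
    next(results)                         # forces the first result
    after_first = time.perf_counter() - start
    count = 1 + sum(1 for _ in results)   # drain the rest of the iterator
    after_all = time.perf_counter() - start
    return after_call, after_first, after_all, count


after_call, after_first, after_all, count = timed(20)
print(f"call: {after_call * 1000:.2f} ms, first: {after_first * 1000:.2f} ms, "
      f"all {count}: {after_all * 1000:.2f} ms")
```

Timing only the call would credit a lazy engine with near-zero cost, so a fair comparison has to include the full iteration.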

Table 1 contains the results for a data set with the two document models (one structured and one unstructured) organized in 201 files (one for the structured document and 200 for the others). The size of the single structured document is 40K and each of the other documents is less than 2K. The size of the data set is in this case around 300K.

Query   Time XDBMS1   Time XDBMS2   Results (elements)
Q1      32,6ms        60,2ms        100
Q2      14,4ms        21,1ms        100
Q3      34,6ms        18,4ms        100
Q4      33,9ms        24,6ms
Q5      32,2ms        17,0ms
Q6      13,6ms        7,2ms         100
Q7      190,6ms       77,7ms        200
Q8      12,2ms        1,9ms
Q9      25,6ms        269,6ms
Q10     37,7ms        249,4ms       200
Q11     72,3ms        35,5ms

Table 1: Results for DataSet 1

Table 2 presents the results for data set 2, of size 704K.

Query   Time XDBMS1   Time XDBMS2   Results (elements)
Q1      77,5ms        88,5ms        250
Q2      20,0ms        23,6ms        250
Q3      82,5ms        41,6ms        250
Q4      54,2ms        39,5ms        168
Q5      71,0ms        29,9ms        168
Q6      21,0ms        16,2ms        250
Q7      382,0ms       153,5ms       500
Q8      16,2ms        4,8ms
Q9      45,8ms        1509ms        212
Q10     88,9ms        1404ms        500
Q11     272,8ms       76,4ms        148

Table 2: Results for DataSet 2

Table 3 presents the results for data set 3, of size 1.3M.

Query   Time XDBMS1   Time XDBMS2   Results (elements)
Q1      152,3ms       113,6ms       500
Q2      29,9ms        31,4ms        500
Q3      153,2ms       69,9ms        500
Q4      105,3ms       54,7ms        339
Q5      131,3ms       54,2ms        339
Q6      34,3ms        25,3ms        500
Q7      686,6ms       301,2ms       1000
Q8      19,1ms        7,1ms
Q9      93,3ms        5859,6ms      427
Q10     207,9ms       5717,4ms      1000
Q11     766,8ms       165,1ms       285

Table 3: Results for DataSet 3

We also measured the time required by XDBMS1 to generate the first result. The results are presented in Table 4:

Query

DS1

DS2

DS3

10,5

9,2

9,5

9,7

10,1

10,0

9,2

9,4

11,2

10,4

8,4

8,5

8,6

62,1

152,1

320,2

9,3

9,2

9,8

11,4

11,8

11,2

Q10

11,5

10,5

Q11

14,3

15,5

Table 4: XDBMS1 time to generate the first result

5.2 Some Discussions

Generally, XDBMS1 times are better when evaluating queries that involve simple selections. For queries Q1 and Q2, with element selection, XDBMS1 gives good results (this would generally be expected when using a structure index, but XDBMS1 indexes only stemmed words), but for the third query XDBMS2 results are better. The third query requires the evaluation of a simple equality predicate that compares the value of an attribute (an exact match).

The same thing happens on Q5, where the same predicate is present. The cause may be the XDBMS2 value index, which performs better than the stemmed indexing proposed by XDBMS1. This is also supported by the fact that the times for Q1 and Q3, which have the same number of results and the same returned structure, are very close for XDBMS1 and very different for XDBMS2, the latter being influenced by the value index.

According to the XDBMS1 developers, it would be better for XDBMS1 if a "contains" statement replaced the equality predicate; query processing would then benefit from the XDBMS1 stemming technique. However, the benchmark would then be different.

Q7 involves the creation of a sorted set of results (order-by clause) on the attribute value. Again the XDBMS2 value index tends to be more efficient. For Q8 (text searching) XDBMS2 performs better. This can be explained by the small text index that is used directly by XDBMS2 (Q8 was reformulated with the ftd function to take advantage of the XDBMS2 text index).

Q9 and Q10 require the computation of join operations; the execution time is greatly influenced by the join algorithms. XDBMS1 join algorithms seem to be more optimized and to generate the results faster. When performing the join operation with active indexes, XDBMS2 times are higher than without the indexes; thus, we present the response times obtained without indexes.

Q11 stresses the repository and involves new result generation. The result creation technique used by XDBMS2 seems to be more efficient.

Generally, XDBMS1 does not perform very well on simple selections when the size of the database increases. This is somewhat at odds with stemming-based indexing, which should be very efficient on large data sets. For the join operations, even on bigger data sets, XDBMS1 works very well, with fast results.

5.3 Results of experiments with XQuark Mapping XML to Tables

In this sub-section, we present the results of analyzing XQuark Bridge with the proposed set of queries. We first ran XQuark on top of Oracle, and then on top of MySQL. For each data set, we defined the natural mapping to relational tables with foreign keys for joins. The tables were indexed on keys and foreign keys. We ran the benchmark on our single-processor system. Results are given in Tables 5, 6 and 7. They are quite good for Oracle but not so good for MySQL (Q5 is 0 for MySQL because of a wrapper fault). This is due to the fact that the nested SQL queries resulting from the mapping are not processed efficiently by MySQL.

Query   Time Oracle   Time MySQL   Results (elements)
Q1      132,0         127,2        100,0
Q2      23,8          33,9         100,0
Q3      105,1         122,1        100,0
Q4      91,0          113,0        69,0
Q5      91,1          0,0          69,0
Q6      32,6          27,5         100,0
Q7      281,8         469,2        200,0
Q8      35,4          56,2         4,0
Q9      51,2          249,7        79,0
Q10     26,1          238,2        200,0
Q11     86,7          2312,4       60,0

Table 5: XQuark results for DS1

Query   Time Oracle   Time MySQL   Results (elements)
Q1      162,3         217,6        250,0
Q2      33,0          11,5         250,0
Q3      173,0         236,9        250,0
Q4      134,8         185,8        168,0
Q5      154,1         0,0          168,0
Q6      37,5          15,8         250,0
Q7      590,5         674,7        500,0
Q8      72,1          110,8        10,0
Q9      40,3          9998,8       212,0
Q10     61,6          1546,6       500,0
Q11     133,6         55531,4      148,0

Table 6: XQuark results for DS2

Query   Time Oracle   Time MySQL   Results (elements)
Q1      302,9         355,9        500,0
Q2      46,2          15,5         500,0
Q3      335,7         433,0        500,0
Q4      231,0         327,7        339,0
Q5      235,0         0,0          339,0
Q6      39,2          41,9         500,0
Q7      1119,5        1290,7       1000,0
Q8      80,3          179,6        15,0
Q9      80,2          42493,8      427,0
Q10     61,1          26934,0      1000,0
Q11     214,3         277804,6     285,0

Table 7: XQuark results for DS3

6. MEDIATOR EVALUATION

We ran the benchmark queries on top of the XLive mediator using XDBMS1 and XDBMS2 as data sources. With multiple data sources, times are roughly the same as with one. Thus, we only report the results for the mediator on top of a single data source.

6.1 Results of experiments

Tables 8, 9 and 10 present the results of evaluating the queries using a mediator on top of XDBMS1 and XDBMS2 for all the data sets. Most of the time in the mediator is spent iterating over the intermediate results and constructing the final result. As XLive exchanges data with sources in text XML (as with Web services), all the partial results have to be reparsed, which is costly in Java on a small portable computer. Better results could be obtained if the mediator used a cache to temporarily store source query results in an easy-to-serialize format.

Query   Time XDBMS1   Time XDBMS2   Results (elements)
Q1      444,5         524,8         100
Q2      245,4         213,1         100
Q3      504,2         417,8         100
Q4      333,4         264,9
Q5      422,2         306,8
Q6      206,9         269,1         100
Q7      992,1         2151,8        200
Q8      137,5         4,61
Q9      423,3         1939,6
Q10     698,6         3293,5        200
Q11     945,5         1527,5

Table 8: Mediator results for DS1

Query   Time XDBMS1   Time XDBMS2   Results (elements)
Q1      758,9         2378,5        250
Q2      335,7         313,6         250
Q3      879,1         1804,8        250
Q4      652,1         866,0         168
Q5      674,8         893,2         168
Q6      263,9         263,7         250
Q7      1242,7        3412,0        500
Q8      136,8         7,24
Q9      508,0         3128,4        212
Q10     997,5         7986,1        500
Q11     2008,4        2854,8        148

Table 9: Mediator results for DS2

Query   Time XDBMS1   Time XDBMS2   Results (elements)
Q1      1106,6        7490,3        500
Q2      388,3         759,5         500
Q3      1144,5        8174,7        500
Q4      759,2         3998,3        339
Q5      739,1         3661,3        339
Q6      933,8         979,3         500
Q7      1887,5        4816,6        1000
Q8      131,8         8,52
Q9      829,6         6160,3        428
Q10     1586,4        14962,6       1000
Q11     4411,7        4051,7        285

Table 10: Mediator results for DS3

6.2 Some Discussions

It is important to mention that the mediator evaluation time is strongly influenced by the Java API provided by the mediated DBMSs. This may mean that sometimes the generated sub-queries are the best possible.

Another important point is that, at the mediator level, it is not always possible to benefit from the best indexing techniques of each local data source. For example, when evaluating Q8 on XDBMS2, taking advantage of the text index requires the use of the non-XQuery function "XDBMS2:fts". On the other hand, the mediator supports standard XQuery with no specific functions. Thus, an optimized translation from XQuery to XDBMS2 functions would require more parameters and constant synchronization of the wrapper with the vendor's different optimal functions. Another current problem that penalizes the mediator evaluation is the translation from the XLive XQuery to the dialects of the real DBMSs, which are in reality far from the standards. This factor should disappear with the final standardization of XQuery.

For DS1, the total time for running the whole benchmark with XDBMS1 is 499 ms, while it is 5353 ms with the mediator on top of XDBMS1. This shows an average factor of 10, mainly due to data transfer and parsing. Total time with XDBMS2 is 71 versus 922 with the mediator on top of XDBMS2. This shows an average factor of 13. The global difference may come from the quality of the wrapper (better optimizations have been made with XDBMS1). The ratios with the other data sets, DS2 and DS3, are a bit better (approximately 7 and 5) for XDBMS1. These lower ratios are caused by the fact that the query processing time, at XDBMS1 level, grows "faster" than the time required to parse additional results, at the mediator level.
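The XDBMS1 figures for DS1 can be rechecked with a few lines of Python. The two lists below copy the XDBMS1 columns of Tables 1 and 8, assuming the rows appear in the order Q1 to Q11; the variable names are illustrative only.

```python
# Response times (ms) for XDBMS1 on DS1, rows assumed to be in the order Q1..Q11.
direct_ms = [32.6, 14.4, 34.6, 33.9, 32.2, 13.6, 190.6, 12.2, 25.6, 37.7, 72.3]      # Table 1
mediated_ms = [444.5, 245.4, 504.2, 333.4, 422.2, 206.9, 992.1, 137.5, 423.3, 698.6, 945.5]  # Table 8

total_direct = sum(direct_ms)
total_mediated = sum(mediated_ms)
overall_factor = total_mediated / total_direct

# Per-query slowdown introduced by the mediator.
per_query = {
    f"Q{i}": mediated / direct
    for i, (direct, mediated) in enumerate(zip(direct_ms, mediated_ms), start=1)
}

print(f"direct: {total_direct:.1f} ms, mediated: {total_mediated:.1f} ms, "
      f"factor: {overall_factor:.1f}")
for query, factor in per_query.items():
    print(f"{query}: x{factor:.1f}")
```

The totals agree with the 499 ms and 5353 ms quoted above, and the per-query factors show how unevenly the mediation overhead is spread across the queries.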

Figures 4, 5 and 6 give the detailed ratios between the response time with the mediator and the direct response time. The ratio for XDBMS2 increases for bigger data sets. This means that the time required to analyze more results (due to iteration, parsing, and serialization) grows "faster" than the additional time required by XDBMS2 to generate more results.

[Chart: ratio of mediated to direct response time per query; series XDBMS1 and XDBMS2]
