//img.uscri.be/pth/dcc8ce0628c40e0841e6099c50fb1b3fa6102d99
Cet ouvrage fait partie de la bibliothèque YouScribe
Obtenez un accès à la bibliothèque pour le lire en ligne
En savoir plus

XyView: Universal Relations Revisited

De
15 pages
XyView: Universal Relations Revisited Dan Vodislav Sophie Cluet Gregory Corona Imen Sebei CNAM/CEDRIC Paris, France INRIA Rocquencourt, France Xyleme Paris, France CNAM/CEDRIC Paris, France Abstract We present XyView, a practical solu- tion for fast development of user- (web forms) and machine-oriented applications (web services) over a repository of het- erogeneous schema-free XML documents. XyView provides the means to view such a repository as an array that can be queried using a QBE-like interface or through sim- ple selection/projection queries. It ex- tends the concept of universal relations in mainly two ways: (i) the input is not a re- lational schema but a potentially large set of XML data guides; (ii) the view is not defined explicitely by a unique query but implicitly by various mappings so as to avoid data loss and duplicates generated by joins. Developed on top of the Xyleme content management system, XyView can easily be adapted to any system support- ing XQuery. Keywords: XML views, heterogeneous data integration, application development tools, universal relation 1 Introduction For decades, companies have produced digital data such as notes, contracts, emails, progress re- ports, minutes, etc. This data constitute a mine of useful information that is largely unexploited.

  • projection

  • xyview

  • ple selection-projection

  • xml document

  • single struc- ture

  • projection queries

  • view schemas


Voir plus Voir moins

XyView:UniversalRelationsRevisited

DanVodislavSophieCluet
CNAM/CEDRICINRIA
Paris,FranceRocquencourt,France
vodislav@cnam.frSophie.Cluet@inria.fr

Abstract
Wepresent
XyView
,apracticalsolu-
tionforfastdevelopmentofuser-(web
forms)andmachine-orientedapplications
(webservices)overarepositoryofhet-
erogeneousschema-freeXMLdocuments.
XyView
providesthemeanstoviewsucha
repositoryasanarraythatcanbequeried
usingaQBE-likeinterfaceorthroughsim-
pleselection/projectionqueries.Itex-
tendstheconceptofuniversalrelationsin
mainlytwoways:(i)theinputisnotare-
lationalschemabutapotentiallylargeset
ofXMLdataguides;(ii)theviewisnot
definedexplicitelybyauniquequerybut
implicitlybyvariousmappingssoasto
avoiddatalossandduplicatesgenerated
byjoins.DevelopedontopoftheXyleme
contentmanagementsystem,XyViewcan
easilybeadaptedtoanysystemsupport-
ingXQuery.
Keywords
:XMLviews,heterogeneousdataintegration,
applicationdevelopmenttools,universalrelation
1Introduction
Fordecades,companieshaveproduceddigital
datasuchasnotes,contracts,emails,progressre-
ports,minutes,etc.Thisdataconstituteamine
ofusefulinformationthatislargelyunexploited.
TheadventofXLMprovidestheopportunityto
changethat.Manyenterprisesarenowconsider-
ingstoringtheirhomedatainXMLrepositories
soastobeabletoquerytheminasignificantway,
i.e.,withtoolsmoresophisticatedthanfulltext
searchengines.Inthispaper,weareaddressing
theproblemofqueryingsuchrepositories.More

Gre´goryCoronaImenSebei
XylemeCNAM/CEDRIC
Paris,FranceParis,France
Gregory.Corona@xyleme.comimen.sebei@cnam.fr

precisely,weareinterestedindeveloping,easily
andquickly,simplequeryAPI(webservices)or
userinterfaces(webforms)overtheserepositories.
Animportantcharacteristicoftheapplications
weareconsideringisthattheydealwithlegacy
datathathavebeenmostlyproducedbyhuman
beingsusingstandardtexteditors.Asaresult,
thedatais(i)poorlytyped(wellformedrather
thanvalidXML)and(ii)highlyheterogeneous
(althoughdocumentshavestrongsemanticcon-
nections).Thesefeaturesareparticularlychal-
lengingsincetheycallforsophisticatedtoolsto
easetheapplicationprogrammertaskwhileatthe
sametimedisablingmostexistingapproaches.
Thesolutionweproposeborrowsfromtheuni-
versalrelationparadigmoftheseventies[18]:
XyView
providesthemeanstoeasilyviewasetof
heterogeneousXMLdocumentsasasinglearray
thatcanbequeriedthroughsimpleselectionsand
projections.Obviously,thecontextbeingXML,
thearraycontainsXMLsubtreesandisbuiltus-
ingXQuery.Butthefundamentaldifferencesbe-
tweenuniversalrelationsandourapproachare
thefollowing:

Thearrayisnotdefinedbyonequerybut
byaspecificationofhowasimpleselection-
projectionuserqueryistobetranslatedinto
anXQuery.
Thisdifferenceisimportant.Theproblem
withuniversalrelationsisthat,unlessthe
databaseschemahasparticularlyniceprop-
ertieswhichisrarelythecase,projectionop-
erationsgeneratemanyduplicatesthatare
notalwayseasytoremove.Thisisdueto
thejoinoperationsenteringthedefinitionof
theuniversalrelation.Alternatively,thejoin
operationscanalsobethecauseofmissing
information.Thisisusuallysolvedbyintro-

ducingouter-joinsbutatthecostofhaving
todealwithnullvalues.
Notethattheseproblemsofdatalossanddu-
plicatesmayoccuranytimeaviewisdefined
asastructuredquery(SQLorXQuery).
Ourapproachisnottodefinetheviewasa
querybutratherasavirtualsetofqueries
thataregeneratedontheflytofittheuser
currentrequirements.Inthisway,weavoid
incompleteorverboseanswers.

Todealwiththecomplexityoftheinput
data,wedefineviewsintwosteps.Thefirst
dealswithdataheterogeneityandsomehow
mapssemanticallyconnectedheterogeneous
documentsintoatargetstructure.Atrun
time,thisstepgeneratesunions.Thesecond
stepcorrespondstoastandardviewdefini-
tionwheredataisaggregated.Atruntime,
thisleadstojoins.
Somehowsimilartoageneralwrapper-
mediatorarchitecture,ourviewmodeladds
anintermediarylevelthat(i)stronglystruc-
turestheviewbyseparatingunionsfrom
joins,and(ii)provideshomogeneousXML
typingfortheuniversalrelationelements.
WeimplementedXyViewasasetoftoolson
topoftheXyleme[19]XMLrepository,butit
caneasilybeadaptedtoanysystemsupporting
XQuery.TheXyViewtoolscovertheviewdef-
initionprocessbutalsogenerationofwebform
applicationsandwebservices.Althoughitsex-
pressivepowerislimitedaswillbeexplainedin
thispaper,XyViewhasproveditsworthwithsev-
eralindustrialapplications.
Therestofthepaperisorganizedasfollows.
Thenextsectionpresentsanexampleapplication
scenariothatillustratestheproblemwearead-
dressing.Section3describestheXyViewmodel.
Section4explorestheexpressivenessandsome
moresubtlefeaturesofthemodel,thensection5
describestheXyViewsystemthatisbuiltontop
ofanXMLrepository.Thefinalsectionspresent
therelatedworkandexploresomefuturework.
2ExampleApplicationScenarioand
Motivation
Theexamplethatwepresenthereisadrasticsim-
plificationofareallifeapplication.Asportsnews

companyhandlesseveraltypesofnewswires.The
wiresarewellformedXMLdocuments,withno
globalschema,thathavebeenextractedfromtext
files.Thesefileshavebeeneditedbyvariouslo-
calcorrespondentsovertheyears,accordingto
thecompany(mostlyverbal)editingrecommen-
dations.Thewireshavedifferentstructures,de-
pendingonthesportandthekindofinformation
theycontain.
Forlackofspaceandeaseofunderstanding,
weshowhereonlytwosuchwiresaboutfoot-
ball(soccer)andinasimplifiedform.Thefirst
considersresultsfromnationalleagues(e.g.,Doc-
ument1),andthesecondresultsfrominterna-
tionalinter-countrygames(e.g.,Document2).
Thenewscompanywantstobuildanapplication
thatqueriesthroughsimplewebformsthevarious
footballresultswiresandasportsencyclopedia
withdetailedinformationaboutfootballplayers
(Document3).
Theapplicationmanipulatesdocumentswhose
structuresaresimilar,butnotnecessarilyidenti-
cal,toDocuments1,2and3.Notably,otherdoc-
umentsmayhavemoreorlessinformation.These
threekindsofdocumentsarestoredinasingle
XMLcontentmanagementsystemincollections
whoserespectiveidentifiersareNationalURI,In-
ternationalURIandEncyclopediaURI.
<!--Document1:Nationalleagueresult-->
<GameResult>
<WireHeading>...</WireHeading>
<Description>RealMadrid1-Valencia0</Description>
<Date>2004-05-22</Date>
<Team>
<Name>RealMadrid</Name>
<Scored>1</Scored>
<Scorer><Name>Zidane</Name>
<Goals>1</Goals>
</Scorer>
</Team>
<Team>
<Name>Valencia</Name>
<Scored>0</Scored>
</Team>
</GameResult>
<!--Document2:Inter-countrygame-->
<ResultDate="2004-03-15">
<Summary>France1-Spain1</Summary>
<Scorers>
<ScorerGoals="1">
<Name>Zidane</Name>
<Country>France</Country>
</Scorer>
<ScorerGoals="1">
<Name>Raul</Name>
<Country>Spain</Country>
</Scorer>
</Scorers>
</Result>

<!--Document3:Sportsencyclopedia-->
<Encyclopedia>
<Football>
<Player><Name>Zidane</Name>
<Biography>...</Biography>
</Player>
...</Football>
...</Encyclopedia>
Theapplicationqueries,asthoseinFigure1,
mayconcernfootballresults(Q
1
),playerbiogra-
phies(Q
2
),orboth(Q
3
).
Theseapparentlysimplequeriesareinfact
ratherhardtoprograminXQueryasillustrated
byFigures2,3and4(issuesregardingthetyping
ofresultsarediscussedinSection4,weassume
herethatqueriesreturnsimplestrings).First,
onemustfindwhatdocumentsareneededamong
thevariousdocumenttypesinthesystem,and
whataretheirunderlyingstructures(documents
maybeschema-free).Then,onemustunderstand
whataretheXMLelements(andtheiraccess
paths)involvedineachquery,andhowtocom-
binethemtoproducetheresult(e.g.Q
1
simply
involvesaunionwhileQ
3
involvestwojoinsand
aunion).Andlast,butnotleast,theapplication
queriesmustbecorrectlyexpressedinXQuery.
Q
1
:
“GamesinwhichZidanescoredmorethanonce”
Q
2
:
“ThebiographyofZidane”
Q
3
:
“Biographiesofscorersfromgameson2004-09-08”
Figure1:Samplequeries
Programmersofgraphicaluserinterfacesare
notdatabaseexperts.Theyareusuallymorecom-
fortablewithJava,servlets,stylesheets,etc.than
withdatabaseschemasorXQuery.Yet,theyhave
toprogram(andmodify)manyqueriestosat-
isfytheircustomersneeds.Ourobjectiveswith
XyViewistooptimizetheirproductivitybyal-
lowingthemtoviewthedatabaseassomething
assimpleasaqueryformconsistingoffieldsthat
canbeusedtofilterorextractdata.
Notethatthisisanoldidea,universalrelations
intheseventiesaddressedthesameproblem.The
databasewasviewedasasinglerelationqueried
usingsimpleselectionsandprojections.
Yet,thereisacrucialdifferencebetween
XyViewanduniversalrelationsastheywerede-
finedintheseventies.Asamatteroffact,XyView

union(
For$docincollection(
NationalURI
)
$var1in$doc/GameResult,
$var2in$var1/Team/Scorer,
$var3in$var1/Description
Where$var2/Nameftcontains’Zidane’and
$var2/Goals>1
Returnstring($var3),
For$docincollection(
InternationalURI
)
$var1in$doc/Result,
$var2in$a//Scorer,
$var3in$a/Summary
Where$var2/Nameftcontains’Zidane’and
$var2/@Goals>1
Returnstring($var3))
Figure2:QueryQ1inXQuery
differsfromanystandardviewmechanismrelying
onquerycomposition:
aXyViewviewisnot
definedbyaqueryandisnotequivalentto
aquery
.Inthenextsection,weexplainthisin
moredetails.Butletusfirstseewhydefininga
universalrelationusingaquerymaybeproblem-
.citaSo,weconsiderthataviewisdefinedbya
queryandthataqueryonaviewcorrespondsto
thecombinationoftwoqueries.Goingbacktoour
example,afirstinterestingexerciceistodefinethe
viewquerythat,combinedrespectivelywiththe
threeselection-projectionqueries
Q
1
-Q
3
,would
returnthequeriesofFigures2-4.Thisqueryhas
toprovideafullviewoverthevariousdocuments
structures.Thus,itwouldnaturallyfeature(i)
thejoinandunionoperationsofQuery
Q
3
,as
wellas(ii)avariableforeachinternalnodessoas
topreservetheconnexionbetweentheelements
belongingtothesamesubtree.Asaconsequence:

Thejoinoperationswouldmakeitimpossi-
bletoreturnthebiographyofplayerswho
arenotpartofsomegames,and,alterna-
tively,wouldleadtoreturningseveralbiogra-
phyoccurencesformostplayers.Information
losscanbesolvedbyintroducingouter-joins,
buttheygeneratenullvaluesandtheneedto
dealwiththem.Asforduplicates,theycan
beeliminatedbyintroducing
distinct
oper-
ations,but(i)itissometimesverydifficult
For$docincollection(
EncyclopediaURI
)
$var1in$doc/Football/Player,
$var2in$var1/Name,
$var3in$var1/Biography
Where$var2ftcontains’Zidane’
Returnstring($var3)
Figure3:QueryQ2inXQuery

union(
For$doc1incollection(
NationalURI
),
$var1in$doc1/GameResult,
$doc2incollection(
EncyclopediaURI
),
$var2in$doc2/Encyclopedia/Football/Player,
$var3in$var2/Biography
Where$var1/Date=xs:date(’2004-09-08’)and
$var1/Team/Scorer/Name=$var2/Name
Returnstring($var3),
For$doc1incollection(
InternationalURI
),
$var1in$doc1/Result,
$doc2incollection(
EncyclopediaURI
),
$var2in$doc2/Encyclopedia/Football/Player,
$var3in$var2/Biography
Where$var1/@Date=xs:date(’2004-09-08’)and
$var1//Scorer/Name=$var2/Name
Returnstring($var3))
Figure4:QueryQ3inXQuery
todistinguishbetweengoodandbaddupli-
catesand(ii)distinctoperationshaveacost
(notablywhenthedesiredorderisnotthat
requiredbythedistinctoperation).

Inasimilarway,butwithnoapparentrea-
sonnableXQuerysolution,variablesmaybe
responsibleforinformationloss.E.g.,con-
sideravariableontheinternalnoderep-
resentingscorers(requiredifonewantsto
queryboththeirnameandtheirgoals).If
itcannotbeinstanciated,thecorresponding
parentelementwouldbediscarded.Asare-
sult,gameswithnoscorerwouldbediscarded
fromallresults.
Finally,theexamplewegaveisrathersimple.
Inrealapplications,theviewdesignerhastoma-
nipulatemanymorestructures.Definingtheview
querythenbecomesneartoimpossibleforanon-
expert.Asfortheoptimizationofthegenerated
nestedqueries,thechancesarethatitwillbe
.roopThenextsectiondetailsthebasicXyView
model.Then,Section4addssomeexpressive
powertothissimplemodelandSection5briefly
describesthegraphicaltoolsthattheGUIpro-
grammerusestobuildhis/hersimplequeryenvi-
ronmentandtoeasetheGUIprogramming.
3TheXyViewmodel
InXyView,viewsaredefinedbyasetofmap-
pingsandjoinconditionsthatspecifyhowasim-
pleselection-projectionuserqueryistranslated
intoanXQuery.Thisapproachovercomesthe
problemsdiscussedabove.Uselessjoinsandvari-
ablesarenotconsideredatquerytranslation;the

viewdefinitionisequivalenttoavirtualsetofflat
andeasilyoptimizablequeriesthataregenerated
ontheflytofittheusercurrentrequirements.
Notably,giventheappropriatespecification,the
queriesofFigures2-4wouldbegeneratedatrun
timebyXyViewtoanswerQueriesQ
1
-Q
3
.
Thisresultsinasimplerdefinitionandmain-
tainanceofviews,usingintuitivegraphicaledi-
tors.Giventhecomplexityofviewqueriescaused
bydataheterogeneity,thisisacrucialadvantage
fortheviewdesigner.
Also,inordertocopewithheterogeneousdata,
XyViewaddsanintermediarylevelintheview
definitionprocess.Tothephysicalandview
schemas,weaddlogicalschemaswhosepurpose
istoprovidehomogeneitytosemanticallyrelated
data.Moreprecisely:
1.Thefirstleveldealswithschema-freedata,by
defining
physicaldataviews
thatsummarize
XMLaccesspathstousefulinformationin
documents.
2.Thesecondleveldealswithheterogeneity,
bydefiningintegrated
logicaldataviews
over
unions
ofphysicaldataviewswithsimilar
contents.
3.Thethirdleveldefinesthe
userdataview
as
joins
betweenthelogicaldataviews.
Figure5illustratesthisthreeleveldefinition.
Itisbuiltusingthesampledataintroducedin
Section2.
Ontherighthandsideisthe
userdataview
.
Itconsistsofasetofso-called
“concepts”
that
theuserwantstoquery.Conceptsare
typed
bytheviewdesigner.Forinstance
PlayerGoals
and
TeamGoals
areintegers,
GameDate
isoftype
date,theotherconceptsareconsideredasXML
strings(orelements,aswillbeexplainedinthe
nextsection).
Asisthecasewithuniversalrelations,the
querylanguagesupportedatthislevelconsistsof
selectionsandprojections.Forinstance,Query
Q
3
,thatreturnsbiographiesofscorersfromgames
on2004-09-08consistsofaselectionon
GameDate
=
2004-09-08
andaprojectionon
Biography
.
Onthelefthandsideofthefigure,arethe
phys-
icaldataviews(PDV)
.Theyrepresentthe
dataasitisstoredintherepository.Intheexam-
ple,therearethreephysicaldataviews(
National
,
International
and
Encyclopedia
),thefirsttworep-
resentingrespectivelylocalandinternationalsoc-

Physical Data ViewsLogical Data ViewsUser Data View
National
GameResult
DescriptionDateTeam
Game
emaGNameScoredScorerPlayer
DateDescriptionTeam
PlayerGoals
NameGoals
NameNbOfGoalsScorerBiography
International
ResultNameNbOfGoalsTeam
//TeamGoals
Date(@)SummaryScorer
emaGGoals(@)NameCountryGameDate
EncyclopediaEncyclopedia
GameDescription
EncyclopediaEncyclopedia
FootballFootball
PlayerPlayer
NameBiographyNameBiography

Figure5:FromTreestoTable
cergamesresults,theotherasportencyclopedia.notusefulfortheapplication.PDV
International
Thetreesare
datasummaries
,i.e.treesgath-containsanexampleofshortcut:element
Scorers
eringusefulaccesspathstodataelementsinthehasbeendiscardedfromthepathto
Scorer
,be-
XMLdocuments.SimilartoLoredataguides[8],causeitisuselessandremovingitintroducesno
theyaregeneratedbythesystemtocopewiththeambiguity;theedgeleadingto
Scorer
ismarked
factthatmanydocumentsaresimplywell-formedwith
//
.Thissimplificationprocesscangreatly
anddonotcomewithaDTDorXMLSchema.easetheviewdesignprocess,bykeepingonlyuse-
InXyleme,thesesummariesaregeneratedatfulaccesspathsfrompossiblycumbersomedocu-
loadingtime,thereisonesummaryperdistinctmentstructures.Intheexample,documentstruc-
rootelement.XyViewalsoprovidesatooltoex-turesaresimpleandthedifferencebetweenthe
tractthesesummariesfromagivensetofdoc-twotypesofsoccergamesresultislight.Inreal
uments.Inbothcases,weuseanincrementallife,structuresareoftenmuchmorecomplex.
algorithmthattakesalltheXMLpathsindocu-Inthecenterofthefigure,wehavegottenrid
mentsandextendsthedatasummaries(initiallyofthesoccergamesresultsheterogeneitybyin-
restrictedtotherootelements)witheventuallytroducingso-called
logicaldataviews(LDV)
.
newsubpaths.NotethatthealgorithmdoesnotLogicaldataview
Game
unifiesinasinglestruc-
careaboutdatatypes.Itistheviewdesignerwhoturegameresultsfromdocumentsdescribedby
associatestypestoviewconcepts.MorewillbePDVs
National
and
International
.Notethatthe
saidonthistopicinthesequel.secondLDV(
Encyclopedia
)isaduplicationofthe
Whendesigningtheview,onecaneditdatacorrespondingPDV.Inreallife,wedonotdupli-
summariestoremovebranchesthatareuselesscatedataviews,wedidithereforthesakeof
fortheapplicationortocreateshortcutsinlongclarity.
branchesbyusingadescendant(
//
)connectionWenowillustratehowonegoesfirstfromphys-
betweentwonodes.Thisiswhatwehavedoneinicaltologicalthentouserdataviewsandthe
theexample.E.g;,thesubtree
WireHeading
hasqueriesthatareassociatedwitheachlevel.Next,
beenremovedfromthestructureofDocument1,wefurtherdetailhowuserqueriesaretranslated
whilethestructureofDocument3onlyfeaturesintoqueriesagainsttherepository.
the
Football
element,othersportshavingbeendis-
carded.Also,the
Biography
elementisnotde-
tailedinthePDV,becauseitsinternalstructureis

3.1FromPhysicaltoLogicalDataViews
Beforewedetailthewayonegoesfromphysical
tologicaldataviews,letussayalittlemoreabout
physicalandlogicaldataviews.
Aphysicaldataviewconsistsofadata
summaryandasetofso-called
clusters
in
whichwefinddocumentsconformingtothesum-
mary(theremaybedocumentsconformingto
othersummariesaswell).Aclusteristheunitin
whichwestoredocumentsandprovidesanentry
pointintherepository.Aclustercanbequeried
inXQueryasacollectionofdocuments,byus-
ingthe
fn:collection
functionontheclusterURI.
SinceaPDVisdefinedoverasetofclusters,
queryingaPDVimpliesaunionoperationover
theseclusters.Forthesakeofclarity,inthefol-
lowingexamplesweconsiderasingleclusterfor
eachPDV.
Alogicaldataviewisanannotateddata
summary
.Theannotationsrepresentthecorre-
spondence(mappings)betweenphysicalandlog-
icaldataviews.ThisisillustratedinFigure6for
LDV
Game
andthe
National
and
International
PDVs.NotethattoeachnodeintheLDVdata
summaryisassociatedthesetofcorresponding
nodesinthephysicaldataviews.Tokeepthefig-
urereadable,onlymappingsforLDVnodes
Game
and
Date
areillustrated.
MappingsbetweenLDVsandPDVsarebased
oncorrespondencesbetweenLDVandPDVtree
nodes.Ifoneconsidersthatanodeinatreecan
easilybeidentifiedbyitspathfromtheroot,one
cannotethatthisapproachtorepresentingcor-
respondencebetweentreesisclosetothe
path-
to-pathmappings
usedin[4].AlthoughanLDV
nodecanbemappedtoseveralPDVnodes,we
imposethefollowing
restriction
:aLDVnodecan
bemappedto
atmostonenodeinthesamePDV
.
Thisrestrictioneliminatesanyambiguityingoing
fromLDVtoPDVforqueryprocessing.Thecon-
sequenceisthatanyqueryontheLDVistrans-
latedonthePDVinaunique(andeasytocom-
pute)way.Therestrictiontransformsthecorre-
spondencebetweenLDVsandPDVsintoa
DTD-
to-DTDmapping
,followingtheapproachescom-
paredin[4].Theadvantagesareprecision(norisk
ofincorrectcombinationsofPDVnodesatquery
translation)andfasttranslation.Itisalwayspos-
sibletorespecttherestrictionbycarefullychoos-
ingineachPDVatmostonenodewiththesame

Physical Data ViewsLogical DataViews
National
GameResult
DescriptionDateTeam
NameScoredScorer
Game
NameGoalsGame
International
DateDescriptionTeam
Result//
Date(@)SummaryScorerNameNbOfGoalsScorer
Goals(@)NameCountryNameNbOfGoals
Mappings
Game:Game −−> National:GameResult, International:Result
.
G
.
a
.
m
.
e
.
:
.
G
.
a
.
m
.
e
.
/D
..
ate −−> National:GameResult/Date, International:Result/Date
Figure6:Fromphysicaltologicaldataviews
meaning;documentstructuresnotrespectingthe
restrictioncanbesplitintoseveralPDVs.
Comparedtoclassicalquery-basedmethodsto
definecorrespondencesbetweenschemas,thesim-
plicityofournode-to-nodemappingsapproach
providesseveraladvantages.

Inmanycases,mappingscanbesemi-
automaticallygeneratedbyrelyingonthese-
manticscarriedbyasequenceoftags(see
[15,17]).

Theprocessofcreatingthesemappingscan
easilybesupportedbyagraphicalinterface.

Correspondencesbasedonnode-to-node
mappingsareeasiertomaintainthanquery-
basedones.

InSection4,wewillseethatsuchmappings
caneasilybeextendedinordertosupporta
richersemantics.
Aqueryagainstalogicaldataviewistrans-
latedstraightforwardlyintoaunionofqueries
againstitscorrespondingphysicaldataviews.As
willbeexplainedinSection3.3,eachmemberof
theunionisobtainedbytransformingpathsfrom
theLDVqueryintothecorrespondingpathsin
eachPDV.
3.2FromLogicaltoUserDataViews
Auserdataviewconsistsofasetoftyped
concepts,theircorrespondencewithnodes
inthelogicaldataviewsandasetofpred-
icatesthatareusedtojointhelogicaldata
views
(intheexample,asinglejoinpredicateis
defined).ThisisillustratedinFigure7.Notethat

Logical Data ViewsUser Data View
emaGGamePlayer
DateDescriptionTeamPlayerGoals
Biography
NameNbOfGoalsScorer
maeTNameNbOfGoalsTeamGoals
Encyclopedia
Game
Encyclopedia
FootballGameDate
PlayerGameDescription
NameBiography
Mappings
Player > Game:Game/Team/Scorer/Name
PlayerGoalEsn cycl>o pGeadmiae::EGnacymcel/oTpeeadmia//SFcooortebr/alNl/bPlOafyGeor/alNsame
.
Bi
.
o
.
gr
.
a
.
p
.
h
.
y
.

.

.
>
.

.
En
.
c
.
y
.
cl
.
o
.
pedia:Encyclopedia/Football/Player/Biography
snioJ(Game:Game/Team/Scorer/Name, Encyclopedia:Encyclopedia/Football/Player/Name, "=")
Figure7:Fromlogicaltouserdataview
eachconcepthasatleastamappingtosomeLDV
node,concept
Player
beingtheonlyonemapped
tobothLDVs.Thejoinpredicatespecifiesthe
joinedLDVnodesandthejoinoperator(’=’in
ourexample).Ifseveraljoinpredicatesconnect
twoLDVs,theglobaljoinconditionisthecon-
junctionoftheindividualpredicates.
Aqueryagainstauserdataviewconsistsof
asetofselectionpredicatesandasetofpro-
jectedconcepts.Asasimpleexample,consider
QueryQ
3
whosecorrespondinguserqueryisgiven
ontheleftsideofFigure8.
Wenowexplainindetailsthetranslationalgo-
rithmfromuserqueriestophysicalqueries,via
logicalqueries.
3.3MoreonTranslatingUserQueries
LetusnowconsiderQueryQ
3
asanexampleto
illustratethetranslationalgorithms.Itinvolves
ajoinbetweenthetwoLDVsinordertoreturn
thebiographiesofscorersfromgamesplayedon
2004-09-08.
Definition1
AuserqueryinXyViewhasthe
mrofQ:Select
c
1
,...,c
n
Where
cond
1
(c
c
1
)
and
...
and
cond
m
(c
cm
)
where
c
i
and
c
cj
,
i
=1,...,
n
,
j
=1,...,
m
,are
userdataviewconceptsand
cond
j
arepredicates
overasingleconcept,basedonapredefinedsetof
operators(’=’,’contains’,’
>
’,etc).

Figure8showsontheleftsidetheuserquery
Q
3
andillustratesstepbystepitstranslationinto
XQuery.Thetranslationalgorithmforauser
query
Q
consistsof
fivesteps
:
1.IdentifyLDVsandjoinsinvolvedinuser
query
Q
;
2.Produceatreerepresentationof
Q
byadding
queryannotationstotheLDVtrees;
3.ForeachLDVannotatedtree,findthesubset
ofPDVtreesthatmatch
Q
;
4.Generateallcombinationsofjoinsbetween
;sVDP5.GeneratethefinalXQuerybyunioningthe
combinationsofstep4.
1petSOneidentifiesthesetsofconcepts(C
Q
),LDVs
(L
Q
)andjoins(J
Q
)involvedin
Q
.Theyarethe
following:
SC
Q
=
{
c
1
,...,c
n
}{
c
c
1
,...,c
cm
}
SL
Q
=L
Qstrict
L
Qjoin
L
Qstrict
containsallLDVshavingatleasta
non-joinnodemappedtosomeconceptinC
Q
L
Qjoin
containsallLDVsnotinL
Qstrict
nec-
essarytoconnectbychainedjoinsLDVsfrom
L
Qstrict
J
Q
=
{
j
=((
l
1
,
path
1
),(
l
2
,
path
2
),
op
)
|
j
isajoin,
l
i

L
Q
,
path
i
isapathin
l
i
,i=1,2,
op
isthe
joinpredicate
}
Intheexample(Figure8,Step1),
C
Q
3
=
{
Biography
,
GameDate
}
L
Q
3
=
{
Game
,
Encyclopedia
}
,because
Game
hasanodemappedtoconcept
GameDate
and
Encyclopedia
hasanodemappedtoconcept
Biography
J
Q
3
=
{
((
Game
,Game/Team/Scorer/Name),
(
Encyclopedia
,Encyclope-
dia/Football/Player/Name),’=’)
}
includes
theonlyexistingjoin,becauseL
Q
3
containsboth
joinedLDVs.
NotethatthedefinitionofL
Q
issuchthatit
discardsuselessjoins.
2petSOneaddsqueryannotationstonodesofLDV
treesfromL
Q
.

User Query
Q3
Select
Biography
erehWGameDate=2004-09-08
Step 1Step 2Step 3Step 4Step 5
identify LDVs and joins in queryadd query annotations to LDVsfind PDVs matching the querygenerate and annotate combinations of PDV joinsgenerate XQuery
ConceptsLDV
Game
PDVs for LDV
Game
union (
1Biography
Game
NationalEncyclopedia
For
$doc1 in collection(NationalURI),
National
GameDate
GameResultEncyclopedia$var1 in $doc1/GameResult,
Date
. . .
Team
International
$doc2 in collection(EncyclopediaURI),
LDVs
$var2 in $doc2/Encyclopedia/Football/Player,
{= 2004-09-08}
both have mappings for
. . .
DateTeamFootball$var3 in $var2/Biography
Game
. . .
Scorerthe marked nodes
{= 2004-09-08}
Encyclopedia
Date and Name
. . .
ScorerPlayer
Where
$var1/Date = xs:date('2004-09-08') and
$var1/Team/Scorer/Name = $var2/Name
Joins
Name
(=)
Return
string($var3)
Game/Team/Scorer/Name =
Join (=)
NameName
Biography
,Encyclopedia/Football/Player/Name
LDV
Encyclopedia
PDVs for LDV
Encyclopedia
2
For
$doc1 in collection(InternationalURI),
Encyclopedia
InternationalEncyclopedia
$var1 in $doc1/Result,
Encyclopedia
ResultEncyclopedia$doc2 in collection(EncyclopediaURI),
Footballhas mappings for//$var2 in $doc2/Encyclopedia/Football/Player,
the marked nodesDate(@)
. . .
ScorerFootball$var3 in $var2/Biography
PlayerName and Biography
{= 2004-09-08}
Where
$var1/@Date = xs:date('2004-09-08') and
. . .
NamePlayer$var1//Scorer/Name = $var2/Name
(=)
Return
string($var3)
Name
Biography
Name
Biography)

Figure8:StepsfortranslatingQueryQ
3
Definition2
Thefollowingquerynodeannota-
correspondence.Forsimplicity,weconsiderstrict
tionsaredefinedforLDVnodes:
matchingintherestofthesection.

isProjected
,abooleanthatistrueiffthenode
Intheexample,weobtain
ismappedtoaprojectedconcept(c
1
,...,c
n
);
P
Q
3
,Game
=
{
National
,
International
}
and

condSet
,asetofconditionpredicatescom-
P
Q
3
,Encyclopedia
=
{
Encyclopedia
}
,becauseall
posedofpredicates
cond
j
of
Q
overaconcept
themarkednodesinStep2aremappedinto
mappedtothenode;
allthecorrespondingPDVs.Thisnotalways

isJoined
,abooleanthatistrueiffthenode
thecase;supposethatgamedatesarelacking
occursinJ
Q
.
fromPDV
National
,inthatcase
National
must
beremovedfromP
Q
3
,Game
,because
Date
isa
Definition3
ALDVtreenodeiscalled
amarkednodeinLDV
Game
.
markednode
if
isProjected
=trueor
condSet
6
=

or
isJoined
=true.
Step4
OnecomputesallthecombinationsComb
Q
of
Figure8,Step3showsqueryannotationsaddedjoinsbetweenPDVsfoundatStep3.
toLDVtreesforQ
3
.Theprojectednode,
Bi-
Notethatjoinsmayben-ary(inopposition
ography
(
isProjected
=true),isinboldfont;jointobinary),thismayoccurwhentheschemahas
nodes(
isJoined
=true)areconnectedthroughamorethantwoLDVs.
dashedline;andtheselectionnode(
condSet
6
=

)Foreach
comb

Comb
Q
,thePDVnodesget
isannotatedwiththesetofconditionpredicates.thesamequeryannotationastheLDVnodesto
Weremovednodesthatarenotmarkedandnotwhichtheyaremapped.APDVnodemappedto
involvedinthequery.noLDVnodegets
isProjected
=false,
condSet
=

and
isJoined
=false.
Step3
Also,foreach
comb

Comb
Q
,thesetJ
comb
Foreach
l

L
Q
,thesetofPDVsmatching
Q
isofjoinsonthePDVsisobtainedfromJ
Q
byre-
P
Q,l
=
{
p
|
p
isaPDV,

n
markednodeof
l

placingtheLDVnodesbythecorrespondingPDV

n
0
of
p
mappedto
n
}
nodes.
Inourexample,therearetwosuchcombina-
Thisisoneofthetwosemanticsimplementedtions,showninFigure8,Step4.
byXyView,the
strictmatching
.Itrequiresthat
allmarkednodesoftheLDVhaveacorrespon-
Step5
denceinthePDV.Theothersemantics,
relaxed
Foreach
comb

Comb
Q
,representingajoinbe-
matching
,allowsselectionandjoinnodeswithouttweenPDVs,a
For-Where-Return
queryisgener-

ated.ThefinalXQueryisobtainedbyunioning
allthesejoinqueries.
Thealgorithmforgeneratinga
For-Where-
Return
queryfromacombinationisdescribedin
Figure9.The
For/Where/Return
clausesare
concatenationsoftheclausesgeneratedforeach
individualPDV.Tothe
Where
clause,onemust
alsoconcatenatethejoinconditionsfromJ
comb
.
ForWhereReturn
(
comb
,J
comb
)

String
foreach
p
i

comb
repeat
forClause
i
=
For
(p
i
)
whereClause
i
=
Where
(p
i
)
returnClause
i
=
Return
(p
i
)
endfor
forClause=
concat
(forClause
1
,...)
whereClause=
concat
(whereClause
1
,...,
Join
(J
comb
))
returnClause=
concat
(returnClause
1
,...)
returnconcat
(’For’,forClause,
’Where’,whereClause,’Return’,returnClause)
end
ForWhereReturn
Figure9:For-Where-Returnalgorithm
Letusdescribenowthealgorithms
For
,
Where
and
Return
thatgeneratethecorre-
spondingclausesforasingleannotatedPDV.
The
For
clausedefinesvariablesandaccess
pathstoquerieddatainthePDV.Variablegen-
erationrespectsthefollowingrules:
Rule1
Avariableisdefinedforeachprojected
nodeinthePDV.
Rule2
ForanytwomarkednodesofaPDV,
thereisavariabledefinitionfortheirleastcom-
monancestorinthePDVtree.
Rule2ensurestheclosestpossiblecontextfor
theXMLelementsaddressedbythequery,i.e.
thosecorrespondingtomarkednodesinthePDV.
Forinstance,itensuresthatinqueryQ
3
thedate
andthescorernamebelongtothesamegame.
ConsideringtheannotatedPDV
National
inthe
firstcombinationinFigure8,Step4,thereare
onlytwomarkednodes
Date
and
Name
,noneof
themprojected.Then,theonlyvariableelement
definedonnationalgamesisthatof
GameResult
,
theirleastcommonancestor.
Thealgorithmforgeneratingthe
For
clausefor
asinglePDVispresentedinFigure10.Itadds
somenewqueryannotationstotreenodes:

variable
,astring(possiblynull)containing
thenameofthevariablegeneratedforthe
;edon•
markedAncestor
,aboolean,trueiffthe
node’ssubtreecontainsatleastamarked

.edonItalsoaddsthefollowingannotationforeach
:VDP•
variable
,thenameofthevariablegenerated
forthedocumentsthatmatchthePDV;

varNodeList
,thelistofvariablenodesinthe
PDV,followingtheorderofvariablesinthe
For
clause.
ForClause
(
pdv
)

String
pdv
.variable=
GenerateVar
()
pdv
.varNodeList=
VariableGen
(
pdv
.root)
forClause=
concat
(
pdv
.variable,’incollection(’,
pdv
.collectionURI,’)’)
foreach
n
i

pdv
.varNodeList
repeat
forClause=
concat
(forClause,’,’,n
i
.variable,
’in’,
AncestorVarAndPath
(n
i
,
pdv
))
endfor
return
forClause
end
ForClause
VariableGen
(n)

NodeList
foreach
n
i

n.children
repeat
varNodeList
i
=
VariableGen
(n
i
)
endfor
childrenVarNodeList=
concat
(varNodeList
1
,...)
if
n.isProjected
then
n.variable=
GenerateVar
()
n.markedAncestor=true
else
maChildren=
NbMarkedAncestor
(n.children)
n.markedAncestor=n.condSet
6
=

or
n.isJoined
or
maChildren
>
0
if
maChildren
>
1
then
n.variable=
GenerateVar
()
else
n.variable=
null
fidnefidneif
n.variable
6
=
nullthen
returnconcatList
([n],childrenVarNodeList)
elsereturn
childrenVarNodeList
fidneend
VariableGen
Figure10:“For”clausegenerationforaPDV
The
ForClause
functionusesthe
VariableGen
functiontoobtaintheorderedlistofvariable
nodesinthePDV.The
For
clausestartsbydefin-
ingthedocumentvariableiteratinginthecollec-
tionassociatedtothePDV.Allvariablenames
aregeneratedbycallstofunction
GenerateVar
,
thatreturnsadifferentstringateachcall.The
restofthe
For
clausedefinesvariablesforeach
variablenode.The
AncestorVarAndPath
function
searchesforthefirstvariableancestorofthenode,
thenreturnsthisvariableconcatenatedwiththe
pathfromthisancestortothenode.Ifnovari-
ableancestorexists,oneusesthePDVvariable
andthepathfromtheroottothenode.
The
VariableGen
functionbuildsthelistof
variablenodesinthesubtreeoftheparameter

node
n
,butalsoannotateswith
variable
and
markedAncestor
eachnodeinthesubtree.First,it
recursivelybuildsthevariablenodelistsforeach
childof
n
,thenconcatenatestheselists.Then,it
mustdecideif
n
isavariablenodeornot;ifnot,
theresultistheconcatenatedlistfromchildren,
else
n
isaddedinfrontofthislist.Thisproduces
aconsistentorderforthe
For
clause,becausea
nodeisalwaysplacedbeforeitsdescendants.
Rules1and2areusedtodecideif
n
isavari-
ablenode.Thisistrueeitherif
n
isprojected,or
ifithasatleast2childrenbeing
markedAncestor
.
Inthelattercase,itiseasytodemonstratethat
n
istheleastcommonancestorofmarkednodes
fromthesubtreesofthesechildren.Function
NbMarkedAncestor
returnsthenumberofnodes
being
markedAncestor
intheparameterlist.Also,
n
isitself
markedAncestor
ifitisprojectedorifit
hasatleastonechildbeing
markedAncestor
.
Notethatthealgorithmdoesnotgenerateuse-
lessvariables,onlymarkednodes(i.e.neededin
theuserquery)areconnectedthroughvariables
onthelastcommonancestor.
Weendthissectionbybrieflydescribingalgo-
rithmsfortheotherclauses.Thealgorithmfor
the
Where
clauseproducesa
conjunctive
condi-
tion.IttakesallthePDVnodeswith
condSet
6
=

andgeneratesaconditionpredicateonthenode,
foreachelementof
condSet
.Thenodeisidenti-
fiedbythepathfromitsfirstvariableancestor.
Asimilaralgorithmisusedtogeneratejoincon-
ditionsinthe
Where
clause;thedifferenceisthat
joinpredicatesconcerntwonodes,notone.
Notethatthetypesoftheviewconceptsare
usedtogeneratewell-typedconstantsinthe
Where
clause.Forinstance,becauseconcept
GameDate
isoftypedate,condition
GameDate=
2004-09-08
inQueryQ
3
istranslatedinto
...=
xs:date(’2004-09-08’)
.Notethatthepreviousex-
pressionmayproduceanerroriftheelementvalue
typeisnotcompatiblewiththeconstanttype.
Inrealityweuseadedicatedcomparisonfunc-
tionthatchecksalsotypecompatibility(return-
ing
false
iftypesarenotcompatible).Forthesake
ofclarity,weusethesimpleformofconditionsin
thispaper.
The
Return
clausedescribesthequeryresult,
usingthevariablesofprojectedPDVnodes.The
Return
clausedeterminestheXMLtypeofthe
result.SeveralchoicesarepossibleinXyView;
theyarediscussedinthenextsection.Here,we

transformthebiographyelementstoreturnsim-
plestrings.
ThefinalXQuerycorrespondingtoQ
3
isshown
inFigure8,Step5.
4DeeperinsideXyView
Sofar,wehavepresentedviewsasprovidingaflat,
relational-like,representationofarbitraryXML
trees.Theflatteningisperformedbyaccess-
ingnodesthroughpathexpressions(preserving
thenodesdependency)andapplyingtheXQuery
string()
operationontheprojectednodes.The
transformationfromlogicaltouserdataviews
thencorrespondstoasimplesequenceofjoinop-
erationsbetweenresultsofpathexpressionsfol-
lowedbyprojection/mapoperations.Themain
differencewithastandardviewmechanismisthat
theviewqueryisnotdefined
apriori
butrather
inanopportunisticway,dependingontheuser
query,soastoavoidduplicatesandinformation
lossthatwouldbegeneratedbyunecessaryjoins.
Inthetransformationfromphysicaltological
dataviews,joinsarereplacedbyunions.
Notethatifaprojectedconcepthasanatomic
typeotherthanstring,acastoperatorisapplied
totheflattenstringvalue.Forinstance,when
concept
PlayerGoals
,oftypeinteger,isprojected,
the
Return
clausehastheform:
Returnxs:integer(string($var))
.
Toavoiderrorsproducedbycastoperations,
weuseinrealityadedicatedfunctionthatchecks
iftheconversionispossibleandifnotreturnsthe
inputstringvalue.Forthesakeofclarity,weuse
onlythecastoperatorhere.
Wenowaddresstwoissuesconcerning(i)the
possibilitytoreturntreesratherthanflatresults
and(ii)theadditionofsomeexpressivepower.
4.1TreeResults
Therearethreepossibilitiestoreturntreerather
thanflatresults.Thefirstandsimplestonecon-
sistsintypingresultsaccordingtothePDVs,i.e.,
returningtreesastheyarestoredinthereposi-
tory.Inotherwords,thetreesareflattenedfrom
theirrootstotheprojectednodes,theprojected
nodesarereturnedastheyare.Notethatthis
solutionleadstoheterogeneousresults.
Forinstance,considertheexampleofQuery
Q
1
,modifiedtoaskforallthegameinformation
(concept
Game
),insteadofonlythegamede-