Conditional Random Fields for XML Trees
8 pages
English


Description

Level: Higher education, Doctorate, Bac+8
Conditional Random Fields for XML Trees
Florent Jousse, Rémi Gilleron, Isabelle Tellier, and Marc Tommasi
INRIA Futurs and Lille University, LIFL, Mostrare Project

Abstract. We present XML Conditional Random Fields (XCRFs), a framework for building conditional models to label XML data. XCRFs are Conditional Random Fields over unranked trees (where every node has an unbounded number of children). The maximal cliques of the graph are triangles consisting of a node and two adjacent children. We equip XCRFs with efficient dynamic programming algorithms for inference and parameter estimation. We experiment with XCRFs on tree labeling tasks for structured information extraction and schema matching. Experimental results show that labeling with XCRFs is suitable for these problems.

1 Introduction

We address the task of labeling XML documents with Conditional Random Fields (CRFs). Many different problems in information science, such as information extraction, data integration, data matching and schema matching, are performed on XML documents and can be dealt with using XML labeling. Lafferty et al. introduced CRFs in [LMP01]. A CRF represents a conditional distribution p(y|x) with an associated graphical structure. CRFs have been successfully used in many sequence labeling tasks such as those arising in part-of-speech tagging [SRM04], shallow parsing [SP03], named entity recognition [ML03] and information extraction [PMWC03,SC04]; for an overview

  • xml trees

  • lsd system

  • tree-shaped graphical

  • graphical structure

  • nodes

  • fields over

  • dom tree

  • training

  • valued parameter


Topics

Information

Published by
Number of reads: 25
Language: English

Excerpt

Conditional Random Fields for XML Trees
Florent Jousse¹, Rémi Gilleron², Isabelle Tellier², and Marc Tommasi²
INRIA Futurs and Lille University, LIFL, Mostrare Project, http://www.grappa.univ-lille3.fr/mostrare
¹ jousse@grappa.univ-lille3.fr, ² first.last@univ-lille3.fr
Abstract. We present XML Conditional Random Fields (XCRFs), a framework for building conditional models to label XML data. XCRFs are Conditional Random Fields over unranked trees (where every node has an unbounded number of children). The maximal cliques of the graph are triangles consisting of a node and two adjacent children. We equip XCRFs with efficient dynamic programming algorithms for inference and parameter estimation. We experiment with XCRFs on tree labeling tasks for structured information extraction and schema matching. Experimental results show that labeling with XCRFs is suitable for these problems.
1 Introduction
We address the task of labeling XML documents with Conditional Random Fields (CRFs). Many different problems in information science, such as information extraction, data integration, data matching and schema matching, are performed on XML documents and can be dealt with using XML labeling. Lafferty et al. introduced CRFs in [LMP01]. A CRF represents a conditional distribution p(y|x) with an associated graphical structure. CRFs have been successfully used in many sequence labeling tasks such as those arising in part-of-speech tagging [SRM04], shallow parsing [SP03], named entity recognition [ML03] and information extraction [PMWC03,SC04]; for an overview, see Sutton and McCallum's survey [SM06]. The idea of defining CRFs for tree-structured data has shown up only recently. Basically, the proposals differ in the graphical structure associated with the CRFs. In [RKK+02], the output variables are independent. Other approaches such as [CC04,Sut04] define the graphical structure on rules of context-free or categorial grammars. Viola and Narasimhan in [VN05] consider discriminative context-free grammars, trying to combine the advantages of non-generative approaches (such as CRFs) and the readability of generative ones. All these approaches apply to ranked rather than unranked trees. As far as we know, their graphical models are limited to edges. We develop XCRFs, a new instance of CRFs that properly accounts for the inherent tree structure of XML documents. In an XML document, every node has an unlimited number of ordered children, and a possibly unbounded number of unordered attributes. The graphical structure for XCRFs is defined by: for ordered (parts of the) trees, the maximal cliques of the graph are all triangles
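
As an illustration of the triangular cliques described in the abstract (a node together with two adjacent children), the following minimal Python sketch enumerates them for a small XML document. It is not the authors' implementation: the function name `triangle_cliques` and the sample document are hypothetical, attributes are ignored, and nodes with fewer than two children simply yield no triangle here.

```python
import xml.etree.ElementTree as ET


def triangle_cliques(root):
    """Yield (parent, left, right) for every pair of adjacent children.

    Hypothetical helper for illustration only: it covers the ordered
    children of each element and ignores XML attributes.
    """
    for parent in root.iter():          # pre-order walk over all elements
        children = list(parent)         # ordered children of this element
        for left, right in zip(children, children[1:]):
            yield parent, left, right


# Sample document (an assumption, not taken from the paper).
doc = ET.fromstring("<a><b/><c><e/><f/></c><d/></a>")
cliques = [(p.tag, l.tag, r.tag) for p, l, r in triangle_cliques(doc)]
print(cliques)
# [('a', 'b', 'c'), ('a', 'c', 'd'), ('c', 'e', 'f')]
```

Because each triangle involves only a parent and two neighbouring siblings, the number of cliques grows linearly with the number of nodes, whatever the arity of the tree.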