A Dynamic Approach to Dimensionality Reduction in Relational Learning
14 pages
English


Description

Level: Higher education, Doctorate, Bac+8
A Dynamic Approach to Dimensionality Reduction in Relational Learning
Erick Alphonse and Stan Matwin
LRI - Bât 490, Université Paris-Sud, 91405 ORSAY CEDEX
{alphonse,stan}@lri.fr

Abstract. This paper argues that in order to perform data mining on large relational databases with multiple tables, one needs to go beyond the traditional attribute-value learning (AVL) techniques. Inductive Logic Programming (ILP) lifts the expressivity to the level of first-order logic (FOL), which is well suited to this task. Several subsets of FOL with different expressive power have been proposed in ILP. The Datalog language is expressive enough to represent realistic learning problems when data is given directly in a multi-relational database. The difficulty lies in the fact that the more expressive the hypothesis language the learner works with, the more critical the dimensionality of the learning task. The dimensionality problem, addressed for decades in Machine Learning, is typically tackled by Feature Subset Selection (FS) techniques. The idea of re-using these techniques in ILP runs immediately into a problem, as examples have variable size and do not share the same set of literals. The long-term goal of this research is to develop tools that will scale up ILP learners to make them usable on realistic data mining tasks presented by the KDD community.
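
The obstacle named in the abstract, that relational examples have variable size and do not share the same set of literals, can be illustrated with a small Python sketch. The attribute names and the `atom`/`bond` predicates are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
# Attribute-value learning (AVL): every example is a row over the same
# fixed attributes, so a feature selector can score each column.
avl_examples = [
    {"weight": 12, "charge": -1},
    {"weight": 16, "charge": 0},
]
avl_columns = [set(example) for example in avl_examples]

# Relational (Datalog-style) examples: each one is a set of ground literals.
# Predicates and constants below are hypothetical.
relational_examples = [
    {("atom", "m1", "a1", "carbon"),
     ("atom", "m1", "a2", "oxygen"),
     ("bond", "m1", "a1", "a2")},
    {("atom", "m2", "b1", "carbon")},
]

sizes = [len(example) for example in relational_examples]
shared = set.intersection(*relational_examples)

print(avl_columns[0] == avl_columns[1])  # True: same attributes everywhere
print(sizes)                             # [3, 1]: examples differ in size
print(shared)                            # set(): no literal common to all
```

In the attribute-value case a column exists for every example and can be kept or dropped globally. In the relational case there is no such shared column to select, which is why a direct reuse of feature selection does not apply.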

  • tackling large-size

  • problem

  • relational database

  • substitution

  • critical

  • propositional logic

  • matching

  • large relational

  • when sear


Information

Number of reads: 15
Language: English

Excerpt

In KDD, even more than in Machine Learning, it is natural to perform [...] discovery working on data derived from relational databases. In this context, the [...] of foreign keys requires the use of a relatively expressive representation, such as Datalog [11, 3]. In Machine Learning, the idea of [...] general knowledge from examples in First Order Logic (FOL) has been known as Inductive Logic Programming for the last 10 years. Many achievements have been [...], but researchers have now realized that there exists a dichotomy between expressiveness and efficiency [19]. One of the main difficulties that prevents ILP from tackling large-size problems typical of KDD applications is the dimensionality of the hypothesis space, which is [...] larger than in AVL. Even more importantly, the coverage test (or [...] query problem in the relational database terminology) involves matching of FOL formulae representing the hypotheses against the training examples. Since this type of matching is NP-complete, coverage testing, in the worst case, is exponential [...]
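
The coverage test described in the excerpt, matching a hypothesis against a training example, can be sketched as a naive backtracking search for a substitution. This is an illustrative sketch under stated assumptions, not the authors' method: hypotheses are represented as lists of literal tuples, examples as lists of ground literal tuples, variables are strings starting with an uppercase letter, and the `atom`/`bond` predicates are hypothetical.

```python
def is_variable(term):
    # Assumed convention: variables start with an uppercase letter.
    return isinstance(term, str) and term[:1].isupper()


def match_literal(literal, fact, theta):
    """Extend substitution theta so that literal equals the ground fact.

    Returns the extended substitution, or None if no extension exists.
    """
    if literal[0] != fact[0] or len(literal) != len(fact):
        return None
    theta = dict(theta)  # copy so that backtracking stays side-effect free
    for term, value in zip(literal[1:], fact[1:]):
        if is_variable(term):
            if theta.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return theta


def covers(body, example, theta=None):
    """Return a substitution mapping every body literal into example, or None."""
    theta = {} if theta is None else theta
    if not body:
        return theta
    first, rest = body[0], body[1:]
    for fact in example:  # each choice point may need to be undone later
        extended = match_literal(first, fact, theta)
        if extended is not None:
            result = covers(rest, example, extended)
            if result is not None:
                return result
    return None


example = [("atom", "a1", "carbon"), ("atom", "a2", "oxygen"), ("bond", "a1", "a2")]
hypothesis = [("atom", "X", "carbon"), ("bond", "X", "Y"), ("atom", "Y", "oxygen")]
rejected = [("atom", "X", "oxygen"), ("bond", "X", "Y")]

print(covers(hypothesis, example))  # {'X': 'a1', 'Y': 'a2'}
print(covers(rejected, example))    # None
```

Each body literal can be tried against every literal of the example, so this search may explore on the order of len(example) ** len(body) combinations before failing, which gives a concrete sense of the worst-case exponential cost the excerpt refers to.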
