MPI and MapReduce
CCGSC 2010, Flat Rock NC, September 8 2010

Geoffrey Fox
gcf@indiana.edu
http://www.infomall.org  http://www.futuregrid.org
Director, Digital Science Center, Pervasive Technology Institute
Associate Dean for Research and Graduate Studies, School of Informatics and Computing
Indiana University Bloomington
MapReduce

Map(Key, Value)
Reduce(Key, List)
A hash function maps the results of the map tasks to reduce tasks

• Implementations (Hadoop – Java; Dryad – Windows) support:
– Splitting of data with customized file systems
– Passing the output of map functions to reduce functions
– Sorting the inputs to the reduce function based on the intermediate keys
– Quality of service
• 20 petabytes per day (on an average of 400 machines) processed by Google using MapReduce (September 2007)
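As a concrete illustration of the Map(Key, Value) / Reduce(Key, List) pattern above, here is a minimal Hadoop (Java) word-count job; the word-count problem and all class names are illustrative choices, not taken from the slide. The framework hash-partitions the intermediate keys to reduce tasks and sorts each reducer's input by key, matching the bullets above.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map(Key, Value): emit (word, 1) for every word in the input split.
  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();
    @Override
    protected void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      for (String token : value.toString().split("\\s+")) {
        if (token.isEmpty()) continue;
        word.set(token);
        context.write(word, ONE); // hash-partitioned to a reduce task by key
      }
    }
  }

  // Reduce(Key, List): inputs arrive sorted by intermediate key.
  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable v : values) sum += v.get();
      context.write(key, new IntWritable(sum));
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // local pre-aggregation of map output
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}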
[Figure: Data Partitions → Map tasks → Reduce tasks → Outputs]

MapReduce "File/Data Repository" Parallelism
Map = (data parallel) computation reading and writing data
Reduce = Collective/Consolidation phase, e.g. forming multiple global sums as in a histogram
MPI or Iterative MapReduce: Map Reduce Map Reduce Map Reduce ...

[Figure: Instruments feed Disks 1, 2, 3; Map tasks read and write the disks, Reduce consolidates results for a Portal/Users, with Communication linking the repeated Map and Reduce phases]
Typical Application Challenge: DNA Sequencing Pipeline

[Figure: pipeline from modern commercial gene sequencers (Illumina/Solexa, Roche/454 Life Sciences, Applied Biosystems SOLiD) over the Internet → Read Alignment → FASTA file of N sequences → Blocking → block Pairings → Sequence Alignment/Assembly (MapReduce) → Dissimilarity Matrix of N(N-1)/2 values → Pairwise clustering and MDS (MPI) → Visualization]
• Linear Algebra or Expectation-Maximization based data mining performs poorly on MapReduce – it is equivalent to using MPI while writing messages to disk and restarting processes at each step/iteration of the algorithm (see the sketch below)
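To make that cost concrete, here is a minimal sketch of the naive approach: driving an iterative algorithm with one Hadoop job per iteration, so the static input is re-read and the model round-trips through the file system every iteration. The paths and the EMStepMapper/EMStepReducer stand-ins are hypothetical, not from the slide.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class NaiveIterativeDriver {

  // Stand-in for the E-step: a real mapper would read the current model from
  // the configuration and emit per-component sufficient statistics.
  public static class EMStepMapper extends Mapper<Object, Text, Text, Text> {
    @Override
    protected void map(Object key, Text value, Context ctx)
        throws IOException, InterruptedException {
      // hypothetical E-step logic
    }
  }

  // Stand-in for the M-step: a real reducer would aggregate statistics
  // into updated model parameters.
  public static class EMStepReducer extends Reducer<Text, Text, Text, Text> {
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    int maxIterations = Integer.parseInt(args[0]);
    for (int i = 0; i < maxIterations; i++) {
      Job job = Job.getInstance(conf, "em-step-" + i);
      job.setJarByClass(NaiveIterativeDriver.class);
      job.setMapperClass(EMStepMapper.class);
      job.setReducerClass(EMStepReducer.class);
      job.setOutputKeyClass(Text.class);
      job.setOutputValueClass(Text.class);
      // The static input data is re-read from disk on every iteration...
      FileInputFormat.addInputPath(job, new Path("/data/observations"));
      // ...and the updated model is written back to disk on every iteration,
      // with all map/reduce processes torn down and restarted in between.
      FileOutputFormat.setOutputPath(job, new Path("/model/iter-" + i));
      if (!job.waitForCompletion(true)) System.exit(1);
      conf.set("model.path", "/model/iter-" + i); // next iteration reads the previous model
    }
  }
}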
Metagenomics

This visualizes the results of dimension reduction to 3D of 30,000 gene sequences from an environmental sample. The many different genes are classified by a clustering algorithm and visualized by MDS dimension reduction.
All-Pairs Using MPI or DryadLINQ

Calculate Pairwise Distances (Smith-Waterman-Gotoh)
[Figure: total runtime of DryadLINQ vs. MPI for 35,339 and 50,000 sequences; y-axis 0–20,000. 125 million distances computed in 4 hours and 46 minutes]
• Calculate pairwise distances for a collection of genes (used for clustering, MDS)
• Fine-grained tasks in MPI
• Coarse-grained tasks in DryadLINQ (a block-decomposition sketch follows the reference below)
• Performed on 768 cores (Tempest Cluster)
Moretti, C., Bui, H., Hollingsworth, K., Rich, B., Flynn, P., & Thain, D. (2009). All-Pairs: An Abstraction for Data Intensive Computing on Campus Grids. IEEE Transactions on Parallel and Distributed Systems, 21, 21-36.
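A minimal Java sketch of the coarse-grained decomposition: the upper triangle of the N×N distance matrix is cut into square tiles, and each tile becomes one task (here a thread-pool job). The distance() function is a placeholder, not a real Smith-Waterman-Gotoh scorer, and the class and parameter names are illustrative.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class AllPairs {

  // Placeholder metric; a real pipeline would use a Smith-Waterman-Gotoh scorer.
  static double distance(String a, String b) {
    return Math.abs(a.length() - b.length());
  }

  // Cut the upper triangle of the N x N matrix into blockSize x blockSize tiles;
  // each tile is one coarse-grained task (DryadLINQ-style granularity), so the
  // N(N-1)/2 distances are computed once and mirrored into the lower triangle.
  static double[][] pairwise(String[] seqs, int blockSize, int threads)
      throws InterruptedException {
    int n = seqs.length;
    double[][] dist = new double[n][n];
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    for (int bi = 0; bi < n; bi += blockSize) {
      for (int bj = bi; bj < n; bj += blockSize) { // upper-triangular tiles only
        final int i0 = bi, j0 = bj;
        pool.execute(() -> {
          for (int i = i0; i < Math.min(i0 + blockSize, n); i++) {
            for (int j = Math.max(j0, i + 1); j < Math.min(j0 + blockSize, n); j++) {
              dist[i][j] = dist[j][i] = distance(seqs[i], seqs[j]); // matrix is symmetric
            }
          }
        });
      }
    }
    pool.shutdown();
    pool.awaitTermination(1, TimeUnit.HOURS);
    return dist;
  }
}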
Smith Waterman: MPI, DryadLINQ, Hadoop

[Figure: Time per Actual Calculation (ms), 0.000–0.025, vs. No. of Sequences (10,000–40,000) for Hadoop SW-G, MPI SW-G, and DryadLINQ SW-G]
Hadoop is Java; MPI and Dryad are C#
Twister (MapReduce++)

• Streaming based communication
• Intermediate results are directly transferred from the map tasks to the reduce tasks – eliminates local files
• Cacheable map/reduce tasks
– Static data remains in memory
• Combine phase to combine reductions
• User Program is the composer of MapReduce computations
• Extends the MapReduce model to iterative computations (a driver-loop sketch follows below)

[Figure: architecture – the User Program and MR Driver connect through a Pub/Sub Broker Network to Worker Nodes, each running an MRDeamon (D) with Map (M) and Reduce (R) workers; Data Read/Write against the File System via Data Split. Iterative flow: Static data → Configure() → Iterate { Map(Key, Value) → Reduce(Key, List) → Combine(Key, List) → δ flow back to the User Program } → Close(). Different synchronization and intercommunication mechanisms are used by the parallel runtimes.]
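A minimal sketch of the driver-side iteration pattern this enables; the IterativeJob interface below is illustrative only, not the actual Twister API. The point is that configure() caches the static data in worker memory once, each runMapReduce() call moves only the small dynamic model (the δ flow), and close() releases the cached tasks.

import java.util.function.BiPredicate;

public class IterativeDriver {

  // Illustrative iterative-MapReduce job, in the spirit of Twister (not its real API).
  interface IterativeJob<Model> {
    void configure(String staticDataPartitionFile); // cache static data in worker memory once
    Model runMapReduce(Model current);              // map -> reduce -> combine, streamed via the broker network
    void close();                                   // release the cached map/reduce tasks
  }

  static <Model> Model iterate(IterativeJob<Model> job, Model initial,
                               BiPredicate<Model, Model> converged, int maxIterations) {
    job.configure("partitions.pf"); // hypothetical partition file describing the data splits
    Model model = initial;
    for (int i = 0; i < maxIterations; i++) {
      Model next = job.runMapReduce(model); // only the small dynamic model moves each iteration
      boolean done = converged.test(model, next);
      model = next;
      if (done) break;
    }
    job.close();
    return model;
  }
}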
Iterative and non-Iterative Computations

K-means
Smith Waterman is a non-iterative case and of course runs fine

[Figure: Performance of K-Means]
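K-means is the canonical iterative case: assigning points to centroids is the (data parallel) map, and recomputing centroids from per-cluster sums is the reduce/combine, exactly the global-sums pattern noted earlier. Here is a minimal sequential Java sketch of one iteration, as an assumption about structure rather than the benchmarked code; a parallel version would run the point loop as map tasks over cached data splits.

public class KMeansStep {

  // One K-means iteration: the point loop is the data-parallel "map"
  // (assign each point to its nearest centroid and accumulate partial sums);
  // the final division is the "reduce/combine" (global sums -> new centroids).
  static double[][] step(double[][] points, double[][] centroids) {
    int k = centroids.length, d = centroids[0].length;
    double[][] sums = new double[k][d];
    int[] counts = new int[k];
    for (double[] p : points) {
      int best = 0;
      double bestDist = Double.MAX_VALUE;
      for (int c = 0; c < k; c++) {
        double dist = 0;
        for (int j = 0; j < d; j++) {
          double diff = p[j] - centroids[c][j];
          dist += diff * diff;
        }
        if (dist < bestDist) { bestDist = dist; best = c; }
      }
      counts[best]++;
      for (int j = 0; j < d; j++) sums[best][j] += p[j];
    }
    double[][] next = new double[k][d];
    for (int c = 0; c < k; c++)
      for (int j = 0; j < d; j++)
        next[c][j] = counts[c] > 0 ? sums[c][j] / counts[c] : centroids[c][j]; // keep empty clusters in place
    return next;
  }
}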
Matrix Multiplication (64 cores)

[Figure: performance of matrix multiplication comparing Square blocks (Twister), Row/Col decomposition (Twister), and Square blocks (OpenMPI)]
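For reference, a sequential Java sketch of the "Square blocks" decomposition named in the plot legend: C = A × B computed tile by tile, where the parallel runtimes would schedule each block product as one task. The blockSize parameter and the dense double[][] representation are illustrative choices.

public class BlockMatMul {

  // C = A * B with square-block decomposition: each (bi, bj, bk) block product
  // is independent work that a parallel runtime could assign to one task.
  static double[][] multiply(double[][] a, double[][] b, int blockSize) {
    int n = a.length;
    double[][] c = new double[n][n];
    for (int bi = 0; bi < n; bi += blockSize)
      for (int bj = 0; bj < n; bj += blockSize)
        for (int bk = 0; bk < n; bk += blockSize)          // accumulate block products into C
          for (int i = bi; i < Math.min(bi + blockSize, n); i++)
            for (int k = bk; k < Math.min(bk + blockSize, n); k++) {
              double aik = a[i][k];
              for (int j = bj; j < Math.min(bj + blockSize, n); j++)
                c[i][j] += aik * b[k][j];
            }
    return c;
  }
}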