
Introduction to Hadoop C++ Extension



Added on: 16 September 2011
肖 康 xiaokang@baidu.com
Outline
Big Picture
Why HCE
HCE Implementation
HCE Usage
HCE Reference
Other Works
Baidu Statistics
Current
>10 clusters, 4000 nodes
largest cluster: 1000 nodes
8 cores / 16 GB / 12 × 1 TB per node
data per day: >3 PB
jobs per day: >30,000 (3w)
Soon
>10000 nodes
data per day: 10PB
Big Picture
Algorithm Description Layer
Computing Model
Classification Clustering
Vector Regression
Scheduling Layer (HPC Scheduler, Agent)
Communication Intensive – HPC …
SQL-like Representation Layer
Computing Model
MapReduce
DAG
Scheduling Layer (DC Scheduler, Agent)
Data & Computing Intensive DC...
Computing Resource Management Layer
Why HCE
Current API
Java
streaming / bistreaming
pipes
Why HCE
Java language efficiency
sort, compress/decompress
C++: 10% to 40% improvement
Java memory control
full-featured C++ API
HCE Implementation
[Figure: HCE implementation diagram; only the label "Data" is recoverable]
HCE Implementation
HCE Usage
basic interface
setup(), cleanup(), map(), reduce(): not optional; return 0 for success
emit() outputs a K/V pair; TaskContext provides access to conf and counters
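The contract above can be illustrated with a small stand-alone sketch. It does not use the real HCE headers, which the slides do not show: `TaskContext`, `Mapper`, `TaggingMapper`, `runMapTask`, the `demo.tag` key and the `records` counter are all hypothetical names chosen for illustration. The sketch only mirrors what the slide states: every hook is implemented and returns 0 for success, `emit()` writes an output K/V pair, and the context carries conf and counters.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for HCE's TaskContext (assumption, not the real class):
// it holds job configuration values and named counters.
class TaskContext {
public:
    std::map<std::string, std::string> conf;
    std::map<std::string, long> counters;

    std::string get(const std::string& key, const std::string& fallback) const {
        auto it = conf.find(key);
        return it == conf.end() ? fallback : it->second;
    }
    void incrCounter(const std::string& name, long delta = 1) {
        counters[name] += delta;
    }
};

// Hypothetical mapper base class. All hooks are pure virtual because the
// slide says they are not optional; each returns 0 for success.
class Mapper {
public:
    virtual ~Mapper() = default;
    virtual int setup(TaskContext& ctx) = 0;
    virtual int map(const std::string& key, const std::string& value,
                    TaskContext& ctx) = 0;
    virtual int cleanup(TaskContext& ctx) = 0;

    // Collected output; a real framework would hand this to the shuffle.
    std::vector<std::pair<std::string, std::string>> output;

protected:
    void emit(const std::string& key, const std::string& value) {
        output.emplace_back(key, value);
    }
};

// Example mapper: reads a tag from conf in setup(), prefixes each value
// with it in map(), and counts processed records.
class TaggingMapper : public Mapper {
    std::string tag_;

public:
    int setup(TaskContext& ctx) override {
        tag_ = ctx.get("demo.tag", "none");
        return 0;
    }
    int map(const std::string& key, const std::string& value,
            TaskContext& ctx) override {
        if (value.empty()) return 1;  // non-zero signals failure
        emit(key, tag_ + ":" + value);
        ctx.incrCounter("records");
        return 0;
    }
    int cleanup(TaskContext&) override { return 0; }
};

// Hypothetical driver: runs setup, then map once per record, then cleanup,
// and stops at the first non-zero return code.
int runMapTask(Mapper& mapper, TaskContext& ctx,
               const std::vector<std::pair<std::string, std::string>>& input) {
    if (int rc = mapper.setup(ctx)) return rc;
    for (const auto& kv : input) {
        if (int rc = mapper.map(kv.first, kv.second, ctx)) return rc;
    }
    return mapper.cleanup(ctx);
}
```

A caller would fill `ctx.conf`, pass a `TaggingMapper` and a list of records to `runMapTask`, and treat any non-zero result as a failed task.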
HCE Usage
wordcount map
HCE Usage
wordcount reduce
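The code from the two wordcount slides did not survive extraction. As a rough idea of what such a map and reduce pair computes, here is a sketch written as plain functions with an emit callback. It is not the HCE API: `Emit`, `wordcountMap`, `wordcountReduce` and the in-memory `runWordcount` driver are hypothetical names, and the grouping step stands in for the framework's shuffle.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical emit callback: receives one output key/value pair.
using Emit = std::function<void(const std::string&, const std::string&)>;

// Map step: split the input line on whitespace and emit (word, "1").
// Returns 0 for success, following the convention on the slides.
int wordcountMap(const std::string& line, const Emit& emit) {
    std::istringstream in(line);
    std::string word;
    while (in >> word) {
        emit(word, "1");
    }
    return 0;
}

// Reduce step: sum all counts received for one word and emit the total.
int wordcountReduce(const std::string& word,
                    const std::vector<std::string>& counts,
                    const Emit& emit) {
    long total = 0;
    for (const auto& c : counts) {
        total += std::stol(c);
    }
    emit(word, std::to_string(total));
    return 0;
}

// In-memory stand-in for the framework: group map output by key
// (a std::map keeps keys sorted), then call reduce once per key.
std::map<std::string, std::string> runWordcount(
        const std::vector<std::string>& lines) {
    std::map<std::string, std::vector<std::string>> grouped;
    Emit collect = [&grouped](const std::string& k, const std::string& v) {
        grouped[k].push_back(v);
    };
    for (const auto& line : lines) {
        wordcountMap(line, collect);
    }

    std::map<std::string, std::string> result;
    Emit store = [&result](const std::string& k, const std::string& v) {
        result[k] = v;
    };
    for (const auto& kv : grouped) {
        wordcountReduce(kv.first, kv.second, store);
    }
    return result;
}
```

Counts travel as strings here only because the slides describe output as generic K/V pairs; a real job might use a binary encoding instead.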
HCE Usage
wordcount run
Partitioner
Combiner
Out
$HADOOP_HOME/bin/hadoop hce \
    -mapper wordcount-demo \
    -reducer wordcount-demo \
    -file ./wordcount-demo \
    -jobconf mapred.reduce.tasks=1 \
    -input /user/test/sample_input \
    -output /user/test/sample_output
RecordWriter
RecordReader