Introduction to Hadoop C++ Extension

mtoledan - Xiaokang


Information

Published by	mtoledan
Published on	16 September 2011
Number of reads	193
Language	English

Excerpt

Introduction to Hadoop C++ Extension

肖康 xiaokang@baidu.com

Outline

Big Picture

Why HCE

HCE Implementation

HCE Usage

HCE Reference

Other Works

Baidu Statistics

Current

>10 clusters, 4000 nodes

largest cluster: 1000 nodes

8 cores / 16 GB / 12 × 1 TB per node

data per day: >3 PB

jobs per day: >30,000 (3w)

Soon

>10000 nodes

data per day: 10PB

Big Picture

Communication Intensive – HPC …
    Algorithm Description Layer
    Computing Model: Classification Clustering, Vector Regression
    Scheduling Layer (HPC Scheduler, Agent)

Data & Computing Intensive – DC...
    SQL-like Representation Layer
    Computing Model: MapReduce, DAG
    Scheduling Layer (DC Scheduler, Agent)

Computing Resource Management Layer

Why HCE

Current API

Java

streaming / bistreaming

pipes

Why HCE

Java language efficiency

sort, compress/decompress

C++: 10% ~ 40% improvement

Java memory control

full-featured C++ API

HCE Implementation

HCE Usage – basic interface

setup(), cleanup(), map(), and reduce() are not optional; each returns 0 on success.

emit() outputs a key/value pair; TaskContext gives access to the job configuration and counters.
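
The interface described above can be sketched roughly as follows. This is a minimal illustration and not the actual HCE headers: the class layouts, the `getConf`/`setConf`/`incrementCounter` methods, the `PrefixMapper` example, and the `runMapper` driver are all hypothetical stand-ins. Only the callback names, `emit()`, `TaskContext`, and the "return 0 for success" convention come from the slide.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

using KeyValue = std::pair<std::string, std::string>;

// Hypothetical stand-in for TaskContext: job configuration plus counters.
class TaskContext {
public:
    void setConf(const std::string& key, const std::string& value) { conf_[key] = value; }
    std::string getConf(const std::string& key) const {
        auto it = conf_.find(key);
        return it == conf_.end() ? std::string() : it->second;
    }
    void incrementCounter(const std::string& name, long delta = 1) { counters_[name] += delta; }
    long counter(const std::string& name) const {
        auto it = counters_.find(name);
        return it == counters_.end() ? 0 : it->second;
    }
private:
    std::map<std::string, std::string> conf_;
    std::map<std::string, long> counters_;
};

// Hypothetical mapper base class: every callback must be implemented
// and returns 0 on success, as the slide states.
class Mapper {
public:
    explicit Mapper(TaskContext& ctx) : context_(ctx) {}
    virtual ~Mapper() = default;
    virtual int setup() = 0;
    virtual int map(const std::string& key, const std::string& value) = 0;
    virtual int cleanup() = 0;
    const std::vector<KeyValue>& output() const { return output_; }
protected:
    // emit() records one output key/value pair (kept in memory in this sketch).
    void emit(const std::string& key, const std::string& value) { output_.emplace_back(key, value); }
    TaskContext& context_;
private:
    std::vector<KeyValue> output_;
};

// Example mapper: reads a "prefix" setting in setup() and counts records in map().
class PrefixMapper : public Mapper {
public:
    using Mapper::Mapper;
    int setup() override { prefix_ = context_.getConf("prefix"); return 0; }
    int map(const std::string& key, const std::string& value) override {
        emit(prefix_ + key, value);
        context_.incrementCounter("records");
        return 0;
    }
    int cleanup() override { return 0; }
private:
    std::string prefix_;
};

// Hypothetical local driver: setup once, map per record, cleanup once.
// Any non-zero return code aborts the run and is passed back to the caller.
int runMapper(Mapper& mapper, const std::vector<KeyValue>& input) {
    int rc = mapper.setup();
    if (rc != 0) return rc;
    for (const auto& kv : input) {
        rc = mapper.map(kv.first, kv.second);
        if (rc != 0) return rc;
    }
    return mapper.cleanup();
}
```

Treating the integer return value as the only error channel keeps the callbacks free of exceptions, which is one plausible reason for the "return 0 for success" convention.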

HCE Usage – wordcount map

HCE Usage – wordcount reduce
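
The code for the wordcount map and reduce slides did not survive in this excerpt. The sketch below shows what such a pair could look like, assuming the map()/reduce()/emit() conventions from the basic interface slide. `Collector`, `WordCountMapper`, `WordCountReducer`, and the local `runWordCount` driver are hypothetical names for illustration, not the HCE API, and the in-memory grouping only imitates the shuffle that the framework would perform.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

using KV = std::pair<std::string, std::string>;

// Hypothetical sink standing in for the framework side of emit().
struct Collector {
    std::vector<KV> records;
    void emit(const std::string& key, const std::string& value) { records.emplace_back(key, value); }
};

// Map: split each input line on whitespace and emit (word, "1").
struct WordCountMapper {
    Collector& out;
    int map(const std::string& /*key*/, const std::string& value) {
        std::istringstream words(value);
        std::string word;
        while (words >> word) out.emit(word, "1");
        return 0;  // 0 signals success
    }
};

// Reduce: sum the counts received for one word and emit (word, total).
struct WordCountReducer {
    Collector& out;
    int reduce(const std::string& key, const std::vector<std::string>& values) {
        long total = 0;
        for (const auto& v : values) total += std::stol(v);
        out.emit(key, std::to_string(total));
        return 0;
    }
};

// Local driver (assumption): map every line, group values by key in sorted
// order to imitate the shuffle, then reduce each group.
std::vector<KV> runWordCount(const std::vector<std::string>& lines) {
    Collector mapped;
    WordCountMapper mapper{mapped};
    for (const auto& line : lines)
        if (mapper.map("", line) != 0) return {};

    std::map<std::string, std::vector<std::string>> grouped;
    for (const auto& kv : mapped.records) grouped[kv.first].push_back(kv.second);

    Collector reduced;
    WordCountReducer reducer{reduced};
    for (const auto& group : grouped)
        if (reducer.reduce(group.first, group.second) != 0) return {};
    return reduced.records;
}
```

Calling `runWordCount` on a couple of short lines returns one record per distinct word, ordered by key.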

HCE Usage – wordcount run

Partitioner

Combiner

Out

$HADOOP_HOME/bin/hadoop hce \
    -mapper wordcount-demo \
    -reducer wordcount-demo \
    -file ./wordcount-demo \
    -jobconf mapred.reduce.tasks=1 \
    -input /user/test/sample_input \
    -output /user/test/sample_output

RecordWriter

RecordReader