Facebook's Petabyte Scale Data Warehouse using Hive and Hadoop

40 pages

English


mtoledan - Lucy Chung




Topics

Facebook

Hadoop

MapReduce

Information

Published by	mtoledan
Published on	04 August 2011
Number of reads	112
Language	English

Excerpt

Facebooks Petabyte Scale Data Warehouse using Hive and Hadoop

Wednesday, January 27, 2010

Why Another Data Warehousing System?

ednesday,January27,2010

Data, data and more data
– 200 GB per day in March 2008
– 12+ TB (compressed) raw data per day today
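
To put the two figures above side by side, here is a rough back-of-the-envelope sketch. It assumes decimal units (1 TB = 1000 GB), treats "today" as the talk date of January 2010 (about 22 months after March 2008), and assumes both figures measure comparable volumes, although the slide marks only the later one as compressed. The constant names are illustrative.

```python
# Figures from the slide; the unit conversion and month count are assumptions.
GB_PER_DAY_MARCH_2008 = 200
TB_PER_DAY_JAN_2010 = 12   # lower bound, the slide says "12+ TB"
GB_PER_TB = 1000           # assumes decimal units
MONTHS_ELAPSED = 22        # March 2008 to January 2010

# Overall growth in daily volume between the two dates.
growth_factor = TB_PER_DAY_JAN_2010 * GB_PER_TB / GB_PER_DAY_MARCH_2008

# Constant month-over-month multiplier that would produce that growth.
monthly_growth = growth_factor ** (1 / MONTHS_ELAPSED)

print(f"overall growth: at least {growth_factor:.0f}x")
print(f"implied compound growth: about {(monthly_growth - 1) * 100:.0f}% per month")
```

Under these assumptions the daily volume grew at least sixtyfold, which is the scale of growth the rest of the talk argues existing systems could not absorb.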

Trends Leading to More Data

– Free or low cost of user services
– Realization that more insights are derived from simple algorithms on more data

Deficiencies of Existing Technologies

– Cost of analysis and storage on proprietary systems does not support trends towards more data
– Limited scalability does not support trends towards more data
– Closed and proprietary systems

Let's try Hadoop

Pros
– Superior in availability/scalability/manageability
– Efficiency not that great, but throw more hardware
– Partial availability/resilience/scale more important than ACID

Cons: Programmability and Metadata
– Map-reduce hard to program (users know SQL/bash/Python)
– Need to publish data in well-known schemas
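
To illustrate the programmability point, here is a minimal sketch of a "count rows per user" job written in map-reduce style, next to the single SQL statement that answers the same question. It runs in plain Python rather than on Hadoop, and the sample records and the function names (mapper, shuffle, reducer, run_job) are hypothetical, not taken from the talk.

```python
from collections import defaultdict

# Hypothetical input records (user_id, page); not from the talk.
page_views = [("alice", "/home"), ("bob", "/home"), ("alice", "/photos")]

def mapper(record):
    """Emit a (key, 1) pair for each input record."""
    user_id, _page = record
    yield user_id, 1

def shuffle(pairs):
    """Group emitted values by key, as a framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reducer(key, values):
    """Sum the counts collected for one key."""
    return key, sum(values)

def run_job(records):
    """Run map, shuffle and reduce in sequence on an in-memory list."""
    mapped = (pair for record in records for pair in mapper(record))
    return dict(reducer(key, values) for key, values in shuffle(mapped).items())

views_per_user = run_job(page_views)

# The same question in SQL is a single statement:
#   SELECT user_id, COUNT(*) FROM page_views GROUP BY user_id;
```

Even in this toy form, the user has to think in terms of keys, grouping and per-key aggregation, whereas the SQL version states only the result wanted.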

Solution: HIVE

sday,January27,2010

What is HIVE?




A system for managing and querying structured data built on top of Hadoop
– Map-Reduce for execution
– HDFS for storage
– Metadata in an RDBMS
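
The "metadata in an RDBMS" idea can be sketched as a small catalog that maps a table name to a storage location and a column list. The sketch below uses an in-memory SQLite database as a stand-in for the RDBMS. The schema, the helper names (register_table, describe) and the HDFS path are all hypothetical and do not reflect Hive's actual metastore layout.

```python
import sqlite3

# In-memory SQLite stands in for the RDBMS; the schema is illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tables (name TEXT PRIMARY KEY, location TEXT)")
conn.execute(
    "CREATE TABLE columns (table_name TEXT, position INTEGER, name TEXT, type TEXT)"
)

def register_table(name, location, columns):
    """Record where a table's files live and what columns they contain."""
    conn.execute("INSERT INTO tables VALUES (?, ?)", (name, location))
    conn.executemany(
        "INSERT INTO columns VALUES (?, ?, ?, ?)",
        [(name, i, col, typ) for i, (col, typ) in enumerate(columns)],
    )

def describe(name):
    """Look up the storage location and ordered schema for a table."""
    location = conn.execute(
        "SELECT location FROM tables WHERE name = ?", (name,)
    ).fetchone()[0]
    cols = conn.execute(
        "SELECT name, type FROM columns WHERE table_name = ? ORDER BY position",
        (name,),
    ).fetchall()
    return location, cols

register_table(
    "page_views",
    "hdfs://namenode/warehouse/page_views",  # hypothetical path
    [("user_id", "string"), ("page", "string")],
)
location, schema = describe("page_views")
```

Keeping the schema in a database separate from the data files is what lets a query layer resolve a table name to files and column types before it plans any execution.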

Key Building Principles:
– SQL as a familiar data warehousing tool
– Extensibility: Types, Functions, Formats, Scripts
– Scalability and Performance
– Interoperability
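
As one illustration of the "Scripts" extensibility point, a user script can act as a row-by-row filter: it reads delimited rows, transforms them, and writes rows back. The sketch below assumes rows arrive as tab-separated text lines with two columns (user_id, page). The column layout and the transform function are hypothetical.

```python
import io

def transform(lines):
    """Lowercase the page column of tab-separated (user_id, page) rows."""
    for line in lines:
        user_id, page = line.rstrip("\n").split("\t")
        yield f"{user_id}\t{page.lower()}\n"

# As a standalone script this would read stdin and write stdout:
#   sys.stdout.writelines(transform(sys.stdin))
# Here an in-memory buffer stands in for the input stream.
sample = io.StringIO("alice\t/Home\nbob\t/PHOTOS\n")
output = list(transform(sample))
```

Because the script only sees text on an input stream, it can be written in whatever language the user already knows, which fits the earlier observation that users know SQL, bash and Python.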