Facebook's Petabyte Scale Data Warehouse using Hive and Hadoop

Wednesday, January 27, 2010
Why Another Data Warehousing System?
Data, data and more data
– 200 GB per day in March 2008
– 12+ TB (compressed) of raw data per day today
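The two data points imply steep growth. Below is a back-of-the-envelope check. It rests on these assumptions:
– decimal units
– "12+ TB" read as exactly 12 TB (so every result is a lower bound)
– "today" taken as the January 2010 date of the talk
– the fact that only the later figure is marked compressed is ignored

Under these assumptions, daily volume grew about 60x in 22 months, roughly 20% per month compounded. A single year at the 2010 rate adds more than 4 PB, consistent with the "petabyte scale" in the title.

```python
# Figures from the "Data, data and more data" slide.
# Assumptions (not from the slide): decimal units (1 TB = 1,000 GB,
# 1 PB = 1,000,000 GB), "12+ TB" treated as exactly 12 TB, and
# 22 months between March 2008 and January 2010.
GB_PER_DAY_2008 = 200
GB_PER_DAY_2010 = 12 * 1000
MONTHS = 22

def growth_factor(old_gb: float, new_gb: float) -> float:
    """How many times larger the daily volume became."""
    return new_gb / old_gb

def monthly_growth_rate(factor: float, months: int) -> float:
    """Constant compound monthly rate that yields `factor` after `months`."""
    return factor ** (1 / months) - 1

def yearly_volume_pb(gb_per_day: float) -> float:
    """Data accumulated over 365 days at a constant daily rate, in PB."""
    return gb_per_day * 365 / 1_000_000

factor = growth_factor(GB_PER_DAY_2008, GB_PER_DAY_2010)
print(f"growth factor: {factor:.0f}x")
print(f"implied monthly growth: {monthly_growth_rate(factor, MONTHS):.1%}")
print(f"one year at the 2010 rate: {yearly_volume_pb(GB_PER_DAY_2010):.2f} PB")
```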
Trends Leading to More Data
– Free or low-cost user services
– Realization that more insights are derived from simple algorithms applied to more data
Deficiencies of Existing Technologies
– Cost of analysis and storage on proprietary systems does not support the trend towards more data
– Limited scalability does not support the trend towards more data
– Closed and proprietary systems
Let's try Hadoop
Pros
– Superior in availability/scalability/manageability
– Efficiency not that great, but we can throw more hardware at it
– Partial availability/resilience/scale more important than ACID
Cons: programmability and metadata
– Map-reduce is hard to program (users know SQL/bash/Python)
– Need to publish data in well-known schemas
Solution: HIVE
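To make the programmability point concrete, below is a single-process Python sketch of counting page views per URL. It spells out the map, shuffle and reduce steps a developer has to write. With a SQL layer the same job is one query, e.g. `SELECT page_url, COUNT(1) FROM page_views GROUP BY page_url;`. The `page_views` table, its columns and the sample records are hypothetical. The sketch also simulates the phases in memory rather than calling Hadoop's actual Java API.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical page_views records: tab-separated (userid, page_url) lines,
# as they might be stored in a plain text file in HDFS.
RECORDS = ["1\thome", "2\tprofile", "1\thome", "3\thome"]

def map_phase(line):
    """Map: parse one record and emit (page_url, 1)."""
    _userid, page_url = line.split("\t")
    yield page_url, 1

def shuffle(pairs):
    """Shuffle: group emitted values by key (the framework does this step)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: sum the 1s emitted for one page_url."""
    return key, sum(values)

def page_view_counts(records):
    """Chain the three phases; returns {page_url: view_count}."""
    pairs = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle(pairs)
    return dict(reduce_phase(k, vs) for k, vs in sorted(grouped.items()))

print(page_view_counts(RECORDS))
```

Even in this toy form, the developer must choose the key, write the record parsing, and wire the phases together. That is the burden the slide attributes to Map-reduce for users who already know SQL.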
What is HIVE?
A system for managing and querying structured data built on top of Hadoop
– Map-Reduce for execution
– HDFS for storage
– Metadata in an RDBMS
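A toy model of the metadata half of this split helps show why the RDBMS matters. For each table, it records where the files live in HDFS and what their columns are, so a query compiler can plan Map-Reduce jobs over plain files. The sketch uses Python's built-in `sqlite3` as a stand-in RDBMS. The two-table schema, the `page_views` table and its columns are hypothetical and far simpler than Hive's actual metastore. The path follows Hive's default warehouse directory, `/user/hive/warehouse`.

```python
import sqlite3

# Hypothetical, heavily simplified metastore schema (NOT Hive's real one):
# one row per table, one row per column.
SCHEMA = """
CREATE TABLE tbls (tbl_name TEXT PRIMARY KEY, location TEXT NOT NULL);
CREATE TABLE cols (tbl_name TEXT REFERENCES tbls(tbl_name),
                   position INTEGER, col_name TEXT, col_type TEXT);
"""

def create_metastore() -> sqlite3.Connection:
    """An in-memory SQLite database stands in for the metastore RDBMS."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn

def register_table(conn, name, location, columns):
    """Record where a table's files live in HDFS and its column layout."""
    conn.execute("INSERT INTO tbls VALUES (?, ?)", (name, location))
    conn.executemany(
        "INSERT INTO cols VALUES (?, ?, ?, ?)",
        [(name, i, col, typ) for i, (col, typ) in enumerate(columns)],
    )
    conn.commit()

def lookup_table(conn, name):
    """What a compiler would ask before planning a job: input dir + columns."""
    row = conn.execute(
        "SELECT location FROM tbls WHERE tbl_name = ?", (name,)
    ).fetchone()
    if row is None:
        raise KeyError(f"unknown table: {name}")
    cols = conn.execute(
        "SELECT col_name, col_type FROM cols WHERE tbl_name = ? ORDER BY position",
        (name,),
    ).fetchall()
    return row[0], cols

conn = create_metastore()
register_table(conn, "page_views", "/user/hive/warehouse/page_views",
               [("userid", "BIGINT"), ("page_url", "STRING")])
location, columns = lookup_table(conn, "page_views")
print(location)
print(columns)
```

Keeping this small, frequently updated catalog in an RDBMS is a natural fit. The bulk data itself stays in HDFS files that Map-Reduce jobs can scan in parallel.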
Key Building Principles
– SQL as a familiar data warehousing tool
– Extensibility: Types, Functions, Formats, Scripts
– Scalability and Performance
– Interoperability
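One concrete form of the "Scripts" extension point is Hive's `TRANSFORM ... USING` clause. It streams each row to an external program as a line of tab-separated fields on stdin and reads tab-separated rows back from stdout. For example, `ADD FILE normalize_url.py;` followed by `SELECT TRANSFORM(userid, page_url) USING 'python normalize_url.py' AS (userid, page_url) FROM page_views;` would pass every row through the script. The script name, the `page_views` table and the normalization rule below are hypothetical, a minimal sketch of such a script.

```python
import io

# Hypothetical TRANSFORM script (normalize_url.py) for a hypothetical
# page_views table: input and output rows are tab-separated lines.

def normalize_row(fields):
    """Lower-case the URL and drop one trailing slash; pass userid through."""
    userid, page_url = fields
    url = page_url.lower()
    if len(url) > 1 and url.endswith("/"):
        url = url[:-1]
    return [userid, url]

def run(instream, outstream):
    """Read tab-separated rows, transform them, write tab-separated rows."""
    for line in instream:
        fields = line.rstrip("\n").split("\t")
        outstream.write("\t".join(normalize_row(fields)) + "\n")

# Local demo with in-memory streams. Deployed as a TRANSFORM script,
# the entry point would instead be run(sys.stdin, sys.stdout).
demo_out = io.StringIO()
run(io.StringIO("1\tHome/\n2\tProfile\n"), demo_out)
print(demo_out.getvalue(), end="")
```

Because the contract is just lines on stdin and stdout, users can extend queries in whatever language they already know (bash, Python) without writing Java.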