Facebook's Petabyte Scale Data Warehouse using Hive and Hadoop
40 pages
English



Information

Published on 4 August 2011
Number of reads: 112
Language: English

Excerpt

Facebooks Petabyte Scale Data Warehouse using Hive and Hadoop
Wednesday, January 27, 2010

Why Another Data Warehousing System?
Data, data and more data
– 200 GB per day in March 2008
– 12+ TB (compressed) raw data per day today

Trends Leading to More Data
– Free or low cost of user services
– Realization that more insights are derived from simple algorithms on more data

Deficiencies of Existing Technologies
– Cost of Analysis and Storage on proprietary systems does not support trends towards more data
– Limited Scalability does not support trends towards more data
– Closed and Proprietary Systems

Let's try Hadoop
Pros
– Superior in availability/scalability/manageability
– Efficiency not that great, but throw more hardware
– Partial availability/resilience/scale more important than ACID
Cons: Programmability and Metadata
– Map-reduce hard to program (users know sql/bash/python)
– Need to publish data in well known schemas
Solution: HIVE
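
To make the programmability point concrete, here is a minimal sketch (not taken from the slides) of a per-page view count written in map-reduce style, next to the one-line SQL an analyst would probably prefer. Everything in it is an assumption made for illustration: the `LOG` records, the `page_views` table name, and the in-process `shuffle` helper, which only imitates the grouping step that Hadoop performs between the map and reduce phases.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical page-view log: (user_id, page) pairs. Illustrative data only.
LOG = [("u1", "home"), ("u2", "home"), ("u1", "profile"), ("u3", "home")]

# What an analyst would like to write instead (shown for comparison, not executed).
SQL = "SELECT page, COUNT(*) FROM page_views GROUP BY page"


def map_phase(record):
    """Emit one (page, 1) pair per input record."""
    _user_id, page = record
    yield page, 1


def shuffle(pairs):
    """Group values by key, imitating what the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Collapse all values for one key into a single count."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle and reduce in-process over the given records."""
    mapped = chain.from_iterable(map_phase(r) for r in records)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


if __name__ == "__main__":
    print(run_job(LOG))
```

Even in this toy form, the grouping logic is spread over three functions, while the SQL states the same intent in one line. That gap is the programmability cost the slide refers to.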

What is HIVE?
A system for managing and querying structured data built on top of Hadoop
– Map-Reduce for execution
– HDFS for storage
– Metadata in an RDBMS
Key Building Principles:
– SQL as a familiar data warehousing tool
– Extensibility: Types, Functions, Formats, Scripts
– Scalability and Performance
– Interoperability
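
The split between storage and metadata can be sketched as a toy model: raw delimited files hold the data, and a small relational catalog records where each table lives and how to interpret its columns. This is an illustrative assumption, not Hive's actual metastore schema or API. The `FILES` dictionary stands in for HDFS, an in-memory SQLite database stands in for the RDBMS, and the table, column and path names are all hypothetical.

```python
import sqlite3

# Stand-in for HDFS: path -> raw tab-delimited file contents (hypothetical data).
FILES = {
    "/warehouse/page_views/part-0": "u1\thome\nu2\thome\n",
    "/warehouse/page_views/part-1": "u1\tprofile\n",
}


def make_metastore():
    """Create a toy catalog in an in-memory RDBMS (not Hive's real schema)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE tables (name TEXT PRIMARY KEY, location TEXT, columns TEXT)"
    )
    db.execute(
        "INSERT INTO tables VALUES (?, ?, ?)",
        ("page_views", "/warehouse/page_views/", "user_id,page"),
    )
    return db


def scan(metastore, table):
    """Yield rows of `table` as dicts, using the catalog to interpret raw files."""
    row = metastore.execute(
        "SELECT location, columns FROM tables WHERE name = ?", (table,)
    ).fetchone()
    if row is None:
        raise KeyError(f"unknown table: {table}")
    location, columns = row
    names = columns.split(",")
    for path in sorted(FILES):
        if path.startswith(location):
            for line in FILES[path].splitlines():
                yield dict(zip(names, line.split("\t")))


if __name__ == "__main__":
    for record in scan(make_metastore(), "page_views"):
        print(record)
```

The files themselves carry no schema. Only the catalog lookup turns a line of tab-separated text into named columns, which is why publishing data in well known schemas depends on a shared metadata store.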