tutorial-04
20 pages
English

tutorial-04

Le téléchargement nécessite un accès à la bibliothèque YouScribe
Tout savoir sur nos offres
20 pages
English
Le téléchargement nécessite un accès à la bibliothèque YouScribe
Tout savoir sur nos offres

Description

OLAP | Data Analysis with DB21 Data Analysis and OLAP Aggregate functions summarize large volumes of data. Online Analytical Processing (OLAP):. Interactive analysis of data.. Allows data to be summarized and viewed in di erent ways in an online fashion(with negligible delay). OLAP data is modeled multi-dimensionally. It can be modeled as dimension attributes and measure attributes:. Given a relation used for data analysis, we can identify some of its attributes asmeasure attributes, since they measure some value and can be aggregated upon(e.g., number of sales, inhabitants, passengers, etc.).. Someoftheotherattributesoftherelationareidenti ed asdimensionattributes,since they de ne the dimensions on which measure attributes and summaries ofmeasure attributes are viewed.c 2005 Jens Teubner, Andre Seifert, University of Konstanz 11.1 Cross Tabulation and its Relational Representation A cross tabulation, also referred to as a pivot table, is a table where. values of one of the dimension attributes form the row headers,. values of another attribute form the column headers, and. valuesinindividual cellsare(aggregates of)thevaluesofthedimensionattributesthat specify the cell. The table below is an example of a cross tab:SexMale Female TotalAustralia 9;913;658 9;999;486 19;913;144Denmark 2;676;377 2;737;015 5;413;392Germany 40;413;132 42;011;477 82;424;609Netherlands 8;079;392 8;238;807 16;318;199United States 143;957;558 149;070;013 293 ...

Informations

Publié par
Nombre de lectures 22
Langue English

Extrait

OLAP | Data Analysis with DB2
1 Data Analysis and OLAP
Aggregate functions summarize large volumes of data.
Online Analytical Processing (OLAP):
. Interactive analysis of data.
. Allows data to be summarized and viewed in di erent ways in an online fashion
(with negligible delay).
OLAP data is modeled multi-dimensionally.
It can be modeled as dimension attributes and measure attributes:
. Given a relation used for data analysis, we can identify some of its attributes as
measure attributes, since they measure some value and can be aggregated upon
(e.g., number of sales, inhabitants, passengers, etc.).
. Someoftheotherattributesoftherelationareidenti ed asdimensionattributes,
since they de ne the dimensions on which measure attributes and summaries of
measure attributes are viewed.
c
2005 Jens Teubner, Andre Seifert, University of Konstanz 11.1 Cross Tabulation and its Relational Representation
A cross tabulation, also referred to as a pivot table, is a table where
. values of one of the dimension attributes form the row headers,
. values of another attribute form the column headers, and
. valuesinindividual cellsare(aggregates of)thevaluesofthedimensionattributes
that specify the cell.
The table below is an example of a cross tab:
Sex
Male Female Total
Australia 9;913;658 9;999;486 19;913;144
Denmark 2;676;377 2;737;015 5;413;392
Germany 40;413;132 42;011;477 82;424;609
Netherlands 8;079;392 8;238;807 16;318;199
United States 143;957;558 149;070;013 293;027;571
Total 205;040;117 212;056;798 417;096;915
c
2005 Jens Teubner, Andre Seifert, University of Konstanz 2
Country In relational DBMSs cross tabs are re-
Country Sex Population
presented as relations:
Australia male 9;913;658 female 9;999;486
. The value all is used to represent Australia all 19;913;144
Denmark male 2;676;377
aggregates.rk female 2;737;015
. The SQL 1999 standard uses null
Denmark all 5;413;392
Germany male 40;413;132
values in place of all. female 42;011;477
. DB2 used the minus sign (-) to
Germany all 82;424;609
denote aggregate values.
Netherlands male 8;079;392 female 8;238;807 all 16;318;199
United States male 143;957;558 States female 149;070;013
United States all 293;027;571
all male 205;040;117
all female 212;056;798
all all 417;096;915
c
2005 Jens Teubner, Andre Seifert, University of Konstanz 31.2 OLAP | Terminology
The operation of changing the dimensions in a cross tab is called pivoting.
Supposeananalystwishestoseeapopulationcrosstaboncountriesandsexfora xed
2
value of the size of the states of the respective countries, for example, 10;000km
instead of the sum across all states:
. Such an operation is referred to as slicing.
. If values from multiple dimensions are xed, the operation is called dicing.
The operation of moving from ner-granula rity data to a coarser granularity is called
a roll-up.
The opposite direction | that of moving from coarse granularity data to ne granu-
larity data | is called drill down.
c
2005 Jens Teubner, Andre Seifert, University of Konstanz 42 Extended Aggregation
2.1 SQL-92 vs. SQL-99
SQL-92 aggregation functionality quite limited.
. Very useful aggregates are either very hard or impossible to specify.
Data cube operations,
Complex aggregates (e.g., median, variance),
Binary (e.g., correlation, regression curves),
Rankingqueries(e.g.,assigneachfootballteamarankbasedonthetotalnumber
of point, goal di erence, goals scored).
SQL-99 OLAP extensions provide a variety of aggregation functions to address the
above limitations.
. Supported by DB2 version 6.
c
2005 Jens Teubner, Andre Seifert, University of Konstanz 52.2 Extended Aggregation in DB2
GROUP BY and GROUPING SETS statements are used to group individual rows into
combined sets based on the value in one, or more, columns.
ROLLUP and CUBE statements are short-hand forms of particular types of
GROUPING SETS statement.
2.2.1 Cube Operation
CUBE operation computes union of GROUP BY’s on every subset of the speci ed at-
tributes.
Consider the following example query:
SELECT country, sex, sum(population)
FROM population
GROUP BY CUBE(country, sex);
c
2005 Jens Teubner, Andre Seifert, University of Konstanz 6n
This computes the union of 2 with n=2 groupings of the population relation:
f(country, sex), (country), (sex), ()g,
where () denotes an empty group by list.
For each grouping, the result contains the null value for attributes not present in
the grouping.
Query above computes the relational representation of the population cross tab that
we saw earlier.
The function grouping() can be used to identify what rows come from which parti-
cular grouping set.
. A value of 1 indicates that the corresponding data eld is null because the row is
from of a grouping set that does not involve this row.
. Otherwise, the value is zero.
c
2005 Jens Teubner, Andre Seifert, University of Konstanz 7 Example:
SELECT country, sex, sum(population),
grouping(country) AS country_flag,
grouping(sex) AS sex_flag,
FROM population
GROUP BY CUBE(country, sex);
YoucanusetheCASE expression intheSELECT clause toreplace such nulls(presented
as \-\) by a value such as all.
For example: Replace country in the previous query by:
CASE WHEN grouping(country) = 1 THEN ’all’ ELSE country END AS country
c
2005 Jens Teubner, Andre Seifert, University of Konstanz 82.2.2 Rollup Operation
ROLLUP operation generates union on every pre x of speci ed list of attributes.
The following example query:
SELECT country, sex, sum(population)
FROM population
GROUP BY ROLLUP(country, sex);
generates the union of three groupings:
f(country, sex), (country), ()g
Rollup can be used to generate aggregates at multiple levels of a hierarchy.
Suppose their exists the dimension stretch in the population relation which can be
used to aggregate by town, state, and country.
c
2005 Jens Teubner, Andre Seifert, University of Konstanz 9 Then the query
SELECT country, state, town, sex, sum(population)
FROM population
GROUP BY ROLLUP(country, state, town, sex);
would give a hierarchical summary by sex and by stretch.
Multiple roll-ups and cubes can be used in a single group by clause.
. Each generates set of group by lists.
. Cross product of sets gives overall set of group by lists.
The following example query:
SELECT year, country, sex, sum(population)
FROM population
GROUP BY ROLLUP (year), ROLLUP(country, sex);
generates the groupings:
f(year), ()g X f(country,sex), (country), ()g

f(year,country,sex), (year,country), (year), (country,sex), (country), ()g
c
2005 Jens Teubner, Andre Seifert, University of Konstanz 10

  • Univers Univers
  • Ebooks Ebooks
  • Livres audio Livres audio
  • Presse Presse
  • Podcasts Podcasts
  • BD BD
  • Documents Documents