Recent developments in the analysis of large-scale data sets

eurostat
RECENT DEVELOPMENTS
IN THE ANALYSIS OF
LARGE-SCALE DATA SETS
Proceedings of a seminar
held in Luxembourg,
16-18.11.1983
Eurostat News
Special number
1984
eurostat
DE EUROPÆISKE FÆLLESSKABERS STATISTISKE KONTOR
STATISTISCHES AMT DER EUROPÄISCHEN GEMEINSCHAFTEN
ΣΤΑΤΙΣΤΙΚΗ ΥΠΗΡΕΣΙΑ ΤΩΝ ΕΥΡΩΠΑΪΚΩΝ ΚΟΙΝΟΤΗΤΩΝ
STATISTICAL OFFICE OF THE EUROPEAN COMMUNITIES
OFFICE STATISTIQUE DES COMMUNAUTÉS EUROPÉENNES
ISTITUTO STATISTICO DELLE COMUNITÀ EUROPEE
BUREAU VOOR DE STATISTIEK DER EUROPESE GEMEENSCHAPPEN
L-2920 Luxembourg — Tél. 43011 — Télex : Comeur Lu 3423
B-1049 Bruxelles, Bâtiment Berlaymont, Rue de la Loi 200 (Bureau de liaison) — Tél. 235.11.11
Denne publikation kan fås gennem de salgssteder, som er nævnt på omslagets tredje side i dette hæfte.
Diese Veröffentlichung ist bei den auf der dritten Umschlagseite aufgeführten Vertriebsbüros erhältlich.
Την έκδοση αυτή μπορείτε να την προμηθευτείτε από τα γραφεία πωλήσεων τα οποία αναφέρονται
στην τρίτη σελίδα του εξωφύλλου.
This publication is obtainable from the sales offices mentioned on the inside back cover.
Pour obtenir cette publication, prière de s'adresser aux bureaux de vente dont les adresses sont indiquées à la
page 3 de la couverture.
Per ottenere questa pubblicazione, si prega di rivolgersi agli uffici di vendita i cui indirizzi sono indicati nella
3ª pagina della copertina.
Deze publikatie is verkrijgbaar bij de verkoopkantoren waarvan de adressen op blz. 3 van het omslag vermeld zijn.
This volume is also available in:
FR: Catalogue number: CA-AB-84-006-FR-C
Cataloguing data can be found at the end of this publication
The views expressed in this publication are the personal views of their authors. They do not
express opinions, or policies, either of the Commission, or of national governments.
As long as stocks last, interested readers may obtain
— copies of 3 unpublished papers described on page 10
— copies of (uncorrected) translations into German and Italian of Chapters 2, 3, 4, 7, 8, 9, 10,
12 and 14
on application to Mr A. D. Cunningham, Eurostat, Bâtiment Jean Monnet,
L-2920 Luxembourg.
Luxembourg: Office for Official Publications of the European Communities, 1985
Catalogue number: CA-AB-84-006-EN-C
© ECSC - EEC - EAEC, Brussels · Luxembourg, 1985
Printed in the FR of Germany
Table of Contents
0. PREFACE
P.B.R. de Geus
1. INTRODUCTION
E. Malinvaud
2. THE ROLE OF MODELS IN OFFICIAL STATISTICS
J.A. Nelder
3. MATHEMATICAL MODELLING OF THE EEC LABOUR FORCE SURVEY
M. Aitkin and R. Healey
4. THE TRANSITION FROM ELEMENTARY TO SECONDARY EDUCATION IN
1964 AND 1977, A LOG-LINEAR ANALYSIS
H. Stronkhorst and J. Pannekoek
5. ANALYSIS OF CATEGORICAL DATA FROM SURVEYS WITH COMPLEX
DESIGNS: SOME CANADIAN EXPERIENCES
D.A. Binder, M. Gratton, M.A. Hidiroglou
6. REPLICATION APPROACHES TO THE LOG-LINEAR ANALYSIS OF DATA
FROM COMPLEX SURVEYS
R.E. Fay III
7. USES OF A LINEAR MODEL WITH THE DATA FROM THE 1978 SURVEY
ON THE STRUCTURE OF EARNINGS
D. Depardieu and J.F. Payen
8. THE PROGRESSION OF WAGE DISTRIBUTIONS
A.D. Airth
9. EDA AND LARGE-SCALE DATA SETS - AN EXAMPLE FROM THE BRITISH
NEW EARNINGS SURVEY
J. Bibby
10. EXPLORATORY ANALYSIS OF SURVEY DATA - THE ROLE OF CORRESPONDENCES
L. Lebart
11. LARGE-SCALE SURVEY ANALYSIS
J.J. Daudin, R. Tomassone, P. Trecourt
12. ADJUSTING INPUT-OUTPUT TABLES
A. Bachem and B. Korte
13. ENTROPY AND DIVERGENCE IN INFORMATION THEORY AND STATISTICS
G. Longo and A. Sgarro
14. MEASURES OF CONCENTRATION, ENTROPY AND SELECTION CRITERIA
A. Jacquemin
15. MULTIVARIATE ANALYSIS METHODS FOR DISCRETE VARIABLES
A.Z. Israels, J.G. Bethlehem, J. van Driel,
M.E. Jansen, J. Pannekoek, S.J.M. de Ree, D. Sikkel
16. CLOSING REMARKS
J.A. Zighera
Bibliography
List of participants
PREFACE
P.B.R. DE GEUS
Eurostat
The papers reproduced in this volume were first presented at an international
seminar "Recent developments in the analysis of large data sets"
held by the Statistical Office of the European Communities (Eurostat) at
Luxembourg in November 1983. An important aim of the seminar was to provide
for an exchange of ideas between institutions in different countries
engaged in developing new methods of data analysis but giving emphasis to
different techniques. Discussion concentrated, in particular, on the
role of two methods, correspondence analysis and log-linear modelling,
but the seminar touched on other techniques such as, for example, Exploratory
Data Analysis (EDA) and on aspects of information theory relevant
to large-scale data analysis. The methods discussed depend largely on
developments in the capacity and effectiveness of computers, which now
make possible the detailed analysis of large masses of data and the heavy
calculations necessary to distil the information into a manageable compass.
This seminar formed part of a series of international seminars on statistical
matters which Eurostat has arranged under the auspices of the Commission
of the European Communities. Each seminar has differed from its
predecessors but all have dealt with matters of topical importance and
all have produced useful results. The seminars have brought together statisticians,
analysts, economists, sociologists and other data users from
all countries of the European Community and from further afield.
Academics have participated in each of the seminars but the seminar to
which this volume refers was distinguished by the strength of the academic
representation. Although there has not been space to reproduce in full the
discussions on each paper, the value of such discussion to the seminar as
a whole should not go without recognition. In preparing the papers for
publication some latitude was given to authors to revise the original
texts. It was also necessary to ask the authors of longer papers to reduce
them somewhat in length. In principle, however, there has been no
substantial departure from the texts which were presented. The rapporteur
was Professor Zighera of the University of Paris X, who has prepared this
report of the proceedings in conjunction with the Chairman and Eurostat.
Finally, Eurostat was indeed fortunate that Professor Edmond Malinvaud,
the distinguished French statistician, and Director of the Institut
National de la Statistique et des Etudes économiques (INSEE), Paris,
accepted an invitation to chair the seminar. That the seminar was such
a marked success owes much to his Chairmanship.
INTRODUCTION
E. MALINVAUD
Institut national de la statistique et des études économiques
In the life of institutions as in private life, there are moments when
one has to make a choice committing oneself to a long-term course, moments
when the road forks and one wonders which way to go.
National statistical institutes encountered such a moment at the end of
the war when they had to decide whether or not they should implement probability
sampling in their surveys. A little later another moment came
when they had to decide whether it should be their job to prepare national
accounts. Today they face a similar choice with regard to data analysis.
It is to help them in this choice that this book, a product of a Eurostat
seminar, is devoted; it gives the reader food for thought on two aspects
of the problem. On the one hand, the nature of the choice should become
clearer in the light of the applications described here and after examination
and discussion at the seminar, and in the light of the general
conclusions which the participants felt able to draw. On the other hand,
this book will provide information on the current status of those data
analysis techniques most likely to meet official statisticians' needs.
Let us have a closer look at the challenge and see what this book has to
offer.
A CHALLENGE TO OFFICIAL STATISTICIANS
At the current stage of development in public statistics, the mass of data
collected contains such a wealth of information that it is far in excess
of what can be passed on by usual methods of publication and dissemination.
Statisticians everywhere are asking themselves what the best procedures are
for coping with this relatively new situation. They realise that efficient
archiving facilities have to be provided to keep data accessible for a long
period of time to those who need them or to researchers who wish to carry
out diverse analyses. It is also clear that data banks have to be set up
which can exploit the full power of modern data processing to store far
more data than traditional publications can. Finally, it is also clear that
a special effort must be made to document data which are so numerous that,
in the absence of documentation, they are liable to be misinterpreted or
misused.
But how great an effort has to be made? One begins to have doubts when one
realises that well documented statistical information in published form is
already scarcely intelligible in many cases, precisely because of its wealth.
Faced with a table, publication of which would involve 20 pages of closely
printed figures because of the multiplicity of the cross-classification
criteria and the detail in the nomenclatures, a user who is interested in
a particular box or a small number of boxes could be quite satisfied, but
how many readers would feel unable to pick out the information which ought
to emerge from the figures spread out under their eyes? These readers might
consider, and with good reason, that the statistician had not completed his
task, and that it was up to him to unearth the information buried in the
table.
Various modern techniques for data analysis claim to meet this demand, at
least in some cases. Ought official statisticians not get used to exploiting
these techniques and to taking processing of data one stage beyond that
of conventional tabulation? Would they not then be better at picking out
hidden information in large-scale surveys and passing on information which
is often significant although it does not appear obvious at first sight?
This is a question which official statisticians are facing today.
To avoid any risk of confusion, it should be noted that this is a different
matter from a much older issue, namely whether statistical departments
should devote part of their resources and efforts to social and economic
analyses. This is a familiar issue, as are the advantages and hazards which
an affirmative answer entails for the relevance and objectivity of the
statistics prepared by these departments. Different countries give different
answers to this question; some, such as France, where the INSEE is
heavily involved in economic analysis, are clearly in its favour, others
are less involved, and others still not at all.
As regards the data analyses in question today, we are not concerned with
in-depth treatment of some topic taken from the sociological and economic
field by utilising all the available sources, constructing a model and
feeling obliged to draw conclusions, which might be provisional or which
might be nothing more than admissions of ignorance. We are, rather, talking
of initial exploratory processing with the aims of descriptive statistics,
which official statisticians have always thought to fall within
their province.