The how and why of statistical classifications

Statisticians rely on coordinated classifications to map the economic and social sphere. They use regulatory classifications—chiefly because they are required to do so—and statistical classifications, which they create or help to develop. Moreover, the European Union framework is a driver for close harmonization of national classifications. Their convergence, now largely complete in the area of economic production, is still in progress in the socio-economic field.
Michel Boeda*
Statisticians measure entities that inevitably have been defined beforehand in a field that has been identified, named, bounded, and “cadastered” (one might say “taxonomized”). They can use the most refined level of a classification like a zoom, a detailed level beyond which the possibility or significance of the measurement is lost. From another angle, the aggregated (“collapsed”) levels offer summary information.

Statisticians (here: official statisticians) are involved in many fields but use only a limited number of coordinated classifications. We can link classifications: for example, those of activities and products (which are symmetrical) and those of occupations and social categories (which are nested). We can also combine them (specializations and educational attainment) or create […]

Classifications age because reality changes. They need to be revised periodically, but that is a difficult exercise for statisticians. The linkage between consecutive series is a complicated, imperfect, and problematic process that requires complex solutions (see article by Pinel). There is no exact correspondence between one or more items of the old and new classifications; otherwise, the “revision” would simply involve switching categories (why bother?) or nesting (change of scale).

Nesting is precisely the key to international comparability, and especially to European Union (EU) harmonization, now essential.

Overview of classifications

Statisticians begin by finding their bearings in a territory and laying down markers in areas that have largely been classified without their participation, such as accounting, law, and regulations.

[Figure: Systema Naturæ (“The Systems of Nature”) (1748), by the taxonomist Carl Von Linnæus. Source: Wikipedia]

Box 1: Nomenclatures and classifications

In ancient Rome, the nomenclator was the usher who announced the names and titles of senators: a nomenklatura before its time! The term “nomenclature” refers to the concept of naming. “Classifications” are more suggestive of the need to organize knowledge categories.

The structuring of the economic and social sphere has been a very gradual process. International harmonization is recent and incomplete. Some typologies are built from data analyses and intended mainly for study purposes. Multiple-use statistical classifications are informed by principles and objectives. To identify (for example, in a business register or by assigning a geographic code) is not to classify.

The terms “nomenclature” and “classification” exist in French and English. French speakers tend to use the first, English speakers the second. We can treat them as synonyms, each focusing on one aspect of the concept: a system for filing items in drawers, with specific instructions on what goes where (classification), and a set of labels describing the general content of each drawer (nomenclature).

* Michel Boeda was Head of the Classifications Division at INSEE’s Head Office, 1989-1995, then Deputy Head of the Statistical and Accounting Standards Department, and has ended his career at CEFIL (INSEE’s Training Center in Libourne), where, among other projects, he organized seminars to prepare statisticians from the European Union accession countries for their new statistical environment.

1. Originally published as “Les nomenclatures statistiques: pourquoi et comment,” Courrier des statistiques (French series), no. 125, Nov.-Dec. 2008, pp. 5-11, http://www.insee.fr/fr/ffc/

Courrier des statistiques, English series no. 15, 2009
Box 2: Classifications and mathematics

Partition
A “flat” (single-level) classification forms a partition of the field studied, that is, a breakdown into disjoint equivalence classes. A multi-level classification consists of nested partitions. Partitions have a “lattice structure” with respect to nesting, just as whole numbers do with respect to divisibility: A nests within B, B nests within A, or there is no nesting. Like least common multiples (LCMs) and largest common denominators (LCDs), the “product” classification (intersection of two classifications) is the one in which we need to collect information if we want to publish results in both classifications; the “sum” classification (union) is the one in which we can compare the result of the data collected in either classification (Arkhipoff, 1976). We can identify equivalence classes (codes, descriptions) but there is no natural order. An international classification cannot serve as a bank of basic items and, at the same time, supply the nested aggregation categories for national classifications.

Tree structure
Classifications (espalier tree structures) fall within the scope of graph theory. This approach is better suited to research on the “closeness” of two classifications, notably between countries, by specifying a “distance” between tree structures. We can also assess the homogeneity of “more or less dense” tree structures using an entropic concept (disaggregation vs. aggregation) applied to information distribution.
INSEE has taken part in a European research project implicitly aimed at transcending dialectical debates by means of a technical approach, notably for revising the international product classification. There are indeed three ways to design such a classification:
– by origin (European approach)
– by purpose (American approach)
– by intrinsic nature of products (as in initial CPC).
Methodological advances have not sufficed to settle the debate (Boeda et al., 2002).

Data analysis
Instead of taking formal classification structure as its starting point, data analysis uses information on the objects to be classified in order to deduce aggregation classes and tree structures. It assumes the existence of data, a metric for the “distance” between two objects, and a choice of levels for aligning the tree-structure levels. The classification obtained depends on the data: any new information may call it into question (Volle et al., 1970). Area divisions for study purposes routinely draw on data analysis. Its initial applications have revealed relevant and robust macroeconomic groupings. The classification of sports activities has relied on data analysis (Desrosières, 1972).
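The “product” and “sum” classifications of Box 2 can be illustrated with a short sketch. It assumes that each classification is given as a mapping from elementary items to class codes; the items, the codes, and the function names are invented for the illustration and do not come from any official classification.

```python
from collections import defaultdict


def product_classification(class_a, class_b):
    """Common refinement: two items stay together only if both classifications group them."""
    groups = defaultdict(set)
    for item in class_a:
        groups[(class_a[item], class_b[item])].add(item)
    return sorted(sorted(group) for group in groups.values())


def sum_classification(class_a, class_b):
    """Finest common aggregation: items grouped by either classification end up together."""
    parent = {item: item for item in class_a}

    def find(item):
        # Follow parent links up to the representative of the item's class.
        while parent[item] != item:
            parent[item] = parent[parent[item]]
            item = parent[item]
        return item

    for classification in (class_a, class_b):
        first_seen = {}
        for item, code in classification.items():
            if code in first_seen:
                parent[find(item)] = find(first_seen[code])
            else:
                first_seen[code] = item

    groups = defaultdict(set)
    for item in parent:
        groups[find(item)].add(item)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical items classified under an "old" and a "new" classification.
old = {"wheat": "A1", "rice": "A1", "milk": "A2", "cheese": "A2", "wine": "A3"}
new = {"wheat": "B1", "rice": "B1", "milk": "B2", "cheese": "B3", "wine": "B3"}

print(product_classification(old, new))
print(sum_classification(old, new))
```

In this toy case the product classification separates milk from cheese because the new classification does, while the sum classification merges milk, cheese, and wine because each pair is grouped by one classification or the other. Data collected at the product level can be published in both systems; results can only be compared across the two systems at the sum level.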
But statisticians also contribute to the evolution of regulatory, economic, social, and other standards. One example is national accounting, which defines and organizes flows in the economic system. Similarly, sociodemographic statistics explore different aspects of people and their relationship to work, an indicator of social category. Statisticians flesh out and arrange the administrative foundations, thereby helping to structure the economic and social sphere.

French administrative territorial units, from regions to municipalities (communes), are the fruit of our history: regions are ranked in NUTS (EU Nomenclature of Territorial Units for Statistics) at the second level, the 36,000 communes at the fifth level. These indivisible atoms of the French Geographic Code account for about one-third of EU items, hardly a balanced situation. There are a host of other geographic divisions that statistical methods have helped to define. Special mention should be made of “employment areas” (zones d’emploi), which are based on commuting patterns but comply with the administrative constraints imposed by regional boundaries.

To structure the world of enterprises, the SIRENE register uses categories that reflect mandatory reporting by firms and offer an outline of “institutional sectors.” The General Chart of Accounts (Plan Comptable Général), a central reference, is not a statistical classification, although it incorporates features requested by statisticians. In France, corporate tax returns for “income from industrial and commercial activities” (bénéfices industriels et commerciaux: BIC) notably serve to prepare “intermediate corporate accounts” (comptes intermédiaires des entreprises), which can be broken down by economic activity.

National accounting, a representation of economic flows, makes reference to various classifications defined by the United Nations System of National Accounts (SNA 93) in the accounts of institutional sectors: transactions in goods and services, distribution-of-income transactions, and financial transactions. National accounting also relies on classifications of activities (for industry accounts) and products (for supply-and-use tables) as well as on functional classifications for household consumption and government expenditures. “Satellite accounts” build bridges between national accounting and various sectors with specific classifications, such as tourism, research, and agriculture.

The health field (diseases, causes of death, and so on) has its own international statistical norms. It also uses social-insurance management tools for tracking medical procedures, medical and paramedical occupations, and so on. The same is true for education with UNESCO’s International Standard Classification of Education (ISCED) and management tools used by French educational district authorities (rectorats).

Classifications regarding individuals (giving age, vital statistics, nationality, and other characteristics) are used by statisticians in the most neutral manner possible. But they are primarily
administrative classifications subject to various limitations with respect to civil or penal legal age, number of tax-deduction units per household, and so on. Having been rejected by the Constitutional Council, ethnic- and religious-based typologies are not used in France.

Box 3: Naming, or why words count

The Swiss classification of activities had distinguished between (and therefore named) metal roofing work as Bauspenglerei (German), travaux de ferblanterie (French), and lavori di lattoneria (Italian): three languages, three different metals. Moreover, in “French French,” ferblanterie would be replaced by zinguerie (evoking zinc rather than tin)! This example shows that literal translation is not always possible.

Translations from French into English and back again offer surprises: INSEE had suggested adding “certification of civil-engineering structures” (in French: certification des ouvrages d’art) to the explanatory notes for “technical inspection services.” The translation came back as “authentication of works of art” (authentification d’œuvres d’art).

In French, occupation can change with gender: the boulanger (“baker,” masculine) kneads the dough and minds the oven, the boulangère (“baker,” feminine) serves customers and operates the cash register. But the human brain is rather good at decoding ambiguities: of the three expressions coupe de cheveux (“haircut”), coiffeur (“hairdresser”), and salon de coiffure (“hair salon”), only the first denotes an activity; the second describes an occupation, and the third an establishment.

A likelihood test in a past population census had turned up a totally anomalous number of farmers in urban areas, nearly all of them female. By checking the sources, INSEE was able to identify the source of the anomaly: the occupation jardinière d’enfants (“kindergarten worker,” feminine) had been shortened to jardinière (“gardener,” feminine), since the watchword at the time was to save computer processing space. The recurring error was easy to correct.

The customs classification, used extensively by statisticians, clearly illustrates the constraints of a regulatory classification. Its goal is to enable international trade to expand in a transparent setting and to provide a framework where rules can be stated with their legal consequences. Such rules apply to customs duties and refunds, quotas, narcotics, arms, hazardous products, and so on. The first obligation, therefore, is an unambiguous identification of all merchandise, objectively observable with state-of-the-art technology (for example: traces of genetically modified organisms [GMOs]).

Customs categories are therefore far more focused on the boundaries of an item than on its core, the opposite of the statistician’s approach. Their description may be either a very long enumeration or a simple “other,” a balancing item spelled out at the next level of detail. Often, the economic destination of products is of little interest to customs authorities. The statistician consequently interprets “crawler tractor” as a construction machine, a “wheeled tractor” as an agricultural machine.

Statisticians may also be called in to address a specific need. Three very different examples (education/training, waste, and physical and sports activities) illustrate “co-building” between classification supply and demand, involving various players and institutions.

The classification of education/training specializations addresses a long-latent need to reconstruct a classification that had become obsolete and mainly focused on public education programs provided in initial schooling. Technical training programs were poorly represented and, most importantly, lifelong education for working adults was ignored. Via CNIS, INSEE has invited representatives from the public education system and the continuing education system, two worlds that usually do not work together, to sit around the same table (Gensbittel et al., 1992). A classification needs to be negotiated; it cannot be forced on users.

The classification of waste is a result of the technical impossibility of implementing the “European Waste Catalogue” prepared by jurists. To put it bluntly, the catalogue merely listed economic activities and inserted “waste from” before each. But many types of waste are not generated by activities. Examples include products at the end of their life cycle, from old documents to the French aircraft carrier Clemenceau.

At Eurostat’s request, IFEN, ADEME, INSEE, and a few international experts formed a working group. It defined waste categories by their nature, ranking them by hazard where applicable (chemical, radioactive or biological), on the basis of degradability or recyclability in other cases. A secondary criterion was the sequence of mandatory treatment stages for the waste flows: collection, sorting, processing, and disposal.

The working group’s report was buried for three years. It was resurrected as an appendix to the 2002 EU regulation on waste statistics, but with an artificial linkage to the European Waste Catalogue, which goes to show how hard it is to abrogate a regulation.

The classification of physical and sports activities was developed by a working group composed of INSEE and the statistical unit of the Ministry of Youth Affairs and Sports (MJS, 2002). A wide variety of data, supplementing those of the 2000 Sports Participation Survey, were “crunched” through the ascending hierarchical classification (AHC) analysis method. The project leader was responsible for assigning weights to the different data categories. The outcome was a classification comprising 9 classes, 34 families, and 335 disciplines. While the names of the disciplines are drawn from standard sports terminology, the groupings identified by the data analysis are outright creations. Thus the labels invented to describe them mean nothing to people outside of the working group. Good luck to these new expressions!
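To give an idea of what an ascending hierarchical classification does, here is a minimal sketch. The article does not say which variables, weights, distance, or linkage rule the working group used, so everything below is an assumption made for the illustration: five invented disciplines described by two invented indicators, a weighted Euclidean distance, and average linkage.

```python
import math


def weighted_distance(x, y, weights):
    """Euclidean distance in which each data category carries its own weight."""
    return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(x, y, weights)))


def ascending_classification(points, weights, n_classes):
    """Start from single items and repeatedly merge the two closest groups."""
    clusters = [[name] for name in points]

    def linkage(c1, c2):
        # Average linkage: mean distance over all pairs drawn from the two groups.
        total = sum(
            weighted_distance(points[a], points[b], weights) for a in c1 for b in c2
        )
        return total / (len(c1) * len(c2))

    while len(clusters) > n_classes:
        pairs = (
            (i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))
        )
        i, j = min(pairs, key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return sorted(sorted(c) for c in clusters)


# Hypothetical indicators: (share practised outdoors, share practised in teams).
disciplines = {
    "football": (0.90, 0.95),
    "rugby": (0.95, 0.90),
    "judo": (0.05, 0.10),
    "karate": (0.10, 0.05),
    "hiking": (1.00, 0.20),
}
weights = (1.0, 1.0)  # hypothetical weights, standing in for the project leader's choices

families = ascending_classification(disciplines, weights, n_classes=3)
print(families)
```

The groups that come out have no names of their own, which is the situation described above: the disciplines carry standard labels, but the families produced by the analysis must be named afterwards. Changing the weights or adding data can change the groupings.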
Structural classifications: activities and products

Formerly, each application program was implemented using a specific classification, for activities as well as products. There was no clear distinction between economic activities and individual activities (occupations). This tower of Babel prevented full use of the information available.

Modern times

The inter-departmental register that preceded SIRENE created an opportunity to impose the classification of economic activities (NAE 59). The national accounts, most notably the input-output table, provided a strong case for a classification of products arranged in the same way as the classification of activities. At the same time, it made sense to submit product questionnaires to the firms that carried the corresponding Principal Economic Activity (APE) code. These desiderata were fulfilled by the “NAP 73” classification of activities and products. As its name does not indicate, it actually consisted of a pair of mirror classifications with 600 matching items. The “products” section was later refined in order to adapt it to the “industry” surveys and to begin the structuring of the vast tertiary sector (NODEP: detailed classification of products).

The inter-ministerial decree promulgating NAP made its use compulsory in official statistics. It specified that the classification by activity did not, in itself, create rights or duties for firms. And it reminded non-statistician users of their own responsibility (see article by Roussel).

The strictly national history of NAP ended after twenty years of good and faithful service (Lainé, 1999). But customs classifications had been sidelined, hence the lack of consistency at detailed level between the production and external-trade spheres.

Box 4: Understanding the activities-products correspondence

Each activity generates characteristic products. Must every product originate from a single activity? If we applied this principle to the lowest level of the classification of activities, we would be assuming a nesting relation that would make the product classification a sort of expanded version of the activity classification. The U.N. Statistical Commission eventually decided to design the CPC like the balance of payments.

Yet a concrete example brings the issue back to its proper proportions: in the old CPF, the “production of fish” activity corresponded to the “fish” product. But why deprive ourselves of the distinction between fishing and fish-farming activities, which exhibit major differences such as employment at sea or on land, processing equipment vs. boats, and resource management? Admittedly, statisticians cannot discern fish (unless, perhaps, they are also gourmets). We therefore have a product that is common to two activities at the detailed level (coding linked to the higher level). Taking an ordinary activities-products correspondence as our starting point, we had merely distinguished between two modes of production for reasons of relevance to business statistics, without impacting product statistics.

Retail trade offers a case that is more complex but open to the same analysis. Merchandising consists in offering customers the products they want in the right conditions. This is a service that justifies a profit margin. Each contract lists the products sold (invoice), and retail trade is accordingly broken down by product ranges sold. Retailing activity takes various forms, such as specialized stores, non-specialized department stores, street markets, mail order, and online vendors. Each form should be tracked, along with its effects on employment, urban planning, and the social bond. Here, the activities-products relationship takes the form of a matrix that cross-tabulates retailing methods and margins per product range sold.

The customs model

Starting in the late 1960s, the European Customs Union required Member States to use national customs classifications based on a “nesting” European matrix. The only latitude granted to individual countries was the right to subdivide any European item at the most detailed (“final”) level.

[Figure: Russian dolls. Source: Wikipedia]

This model has been systematized. Since 1988, the same “Russian doll” arrangement applies:
– The first six digits of the customs code are those of the Harmonized System (HS).
– The next two allow a doubling of detailed breakdowns (from 5,000 to 10,000 items) in the EU’s “Combined Nomenclature” (CN) for the Common Customs Tariff and external trade statistics.
– The French classification (NGP) has a ninth position to express our exceptions: wine, cheese, and so on.

These classifications evolve in tandem, their regulatory purpose leaving little room to accommodate statisticians’ needs.

The international and EU decision has been to use the customs classifications as the reference, which (in theory) settles the issues of consistency between the production sphere and the external-trade sphere. Every good is defined by a whole number of HS positions at international level and by a whole number of CN positions (if needed) in Europe. There are some adaptations, however. For example, customs classifications recognize only processed milk (a product of the food industry) and ignore raw milk (a product of livestock breeding) and
many perishable products such as fresh pastry. Experience has led to a loosening of strict principles in the latest revision of the classifications (see article by Lacroix and Fuger).

The Single European Market

The first European Community classification of activities (NACE 70) is contemporary with NAP 73, but there is no one-for-one equivalence. The prospect of a single market in 1993 required good comparability of national statistics on the production sector. The solution was a revised NACE in which national classifications would be nested.

The operation was carried out in tandem with the third revision of the International Standard Industrial Classification of All Economic Activities (ISIC), administered by the United Nations, hence the same Russian-doll arrangement: ISIC Rev. 3 - NACE Rev. 1 - NAF. Each is broken down in detail in the next classification, but the nesting is not visible in the code.

The operation has just been repeated, in parallel with the fourth revision of ISIC. This time, EU transparency has been achieved. The present issue of Courrier des statistiques is largely devoted to the operation and its impact on French statistics.

The national implementation of the latest classification change reproduced the initial procedure (Boeda, 1996), but with better testing, supervision, and documentation. The timetable was tighter, as the statistical calendar was now more Europeanized.

The first result of EU discussions is the linkage between activities and products, as advocated by the French. This contributes to the overall consistency with customs classifications (see diagram on loose sheet inserted in this issue).

Along with the demands of national accountants, the decisive role fell to statisticians in charge of industrial statistics (Prodcom): to what should one link a European list of several thousand industrial goods, if not to the activity code of the industry of origin? Eurostat quickly understood that the EU classification of products would become an empty shell if it failed to fit in between Prodcom and NACE.

This realization led to the establishment of the Classification of Products by Activity (CPA). The CPA code reproduces the NACE code at aggregate levels, broken down into two supplementary positions for a detailed description, plus a two-digit position for the Prodcom list (for goods-producing industries).

In other words, this repeated the NAP 73 arrangement, fleshed out by NODEP and industry surveys. The now larger majority of EU statistical authorities has approved the CPA structure, although the latest CPC revision endorses its initial structure and U.S. statisticians have defended an alternative choice.

[Figure: French Classifications of Economic Activities and Products, 1973 version (NAP 73). Source: INSEE]

Box 5: The association criterion

The French theoretical approach is based on a correspondence between activities and products and an association criterion. This prescribes a grouping (including to form an elementary “building block”) that respects the associations most often encountered in units. Multiple activity is thus minimal and the significance of the classification is maximal.

Underneath this empirical observation lies a microeconomic determinism. If the market-entry cost of a key product is high (machinery, technology, research, etc.), the firm that takes the step sets up a near-monopoly on the production that depends on the key product. The firm has every incentive to press its advantage. Conversely, the producer of an ordinary product, exposed to stiff competition, will try to tailor the range to its customers, including as reseller. It is the combined set of products and activities that develops its structure in the market: groupings are shaped by a supply- or demand-driven rationale, as the case may be.

The association criterion, seldom articulated, informs discussions during revision processes. For example, the latest revision saw the end of the centuries-old association between printing and publishing, a separation between production and repair of industrial goods, a confirmation of the association between trade and repair of motor vehicles, and a convergence of multimedia activities.

Macroeconomic classifications

Macroeconomic analysis must operate on large, economically significant categories, combining market characteristics and corporate strategy. For example, consumer industries have to be accommodative toward wholesalers and retailers, woo customers, and segment the market; by contrast, capital-goods industries exploit their technical know-how or that of a network of specialized subcontractors by catering to the needs of large customers. That is what the “Summary Economic Classification” (Nomenclature Économique de Synthèse: NES) sought to capture through its groupings, in contrast to the ISIC/NACE groupings, which were solely production-focused. Ultimately, NES was adopted only in France, whose statistical institute (INSEE)
displays the singular characteristic of performing economic studies as well. Another aim was to counter non-coordinated groupings in official statistics. The issue re-emerged in the latest revision (see article by Madinier), leading to an EU compromise that involved the abandonment of NES and made official the dissemination of statistics on different grouping levels.

Functional classifications

Beyond the production system, we need to track the uses of products, above all household consumption.

The current international classification is the Classification of Individual Consumption by Purpose (COICOP: only the English abbreviation is used), with no EU or national version. A conversion table shows the correspondence with CPC (and hence CPA/CPF). COICOP is used for the Household Budget Survey, conducted throughout the EU, and to present the EU Harmonised Index of Consumer Prices (HICP). Purchasing power parities (PPPs) between countries are determined by means of a detailed breakdown of the lowest COICOP level.

The Classification of the Functions of Government (COFOG) is the international system used to categorize government expenditures. The latest version, revised for consistency with COICOP, is more specifically aimed at breaking down general-government final consumption (in the national-accounting sense) by category: general administration, defense, public order, education, health, social protection, and so on. Individual-consumption items such as education and health can thus be aggregated with similar items financed directly by households.

Structural classifications: occupations and socio-occupational categories

France’s system of “Occupations and Socio-occupational Categories” (Professions et Catégories Socioprofessionnelles: PCS) succeeds the “Socio-occupational Categories” (Catégories Socioprofessionnelles: CSP), still used in everyday language. This is a distinctly French development (see article by Desrosières). The very concept of social category, introduced in the middle of the Cold War, was bold. Technically, PCS comprises two nested classifications: occupations and socio-occupational categories.

The basic rationale is that social identity is built in the workplace. Occupation is decisive for social positioning. It is understood in the broad sense, i.e., including job description, skills, status, and terms of “collective agreements” between employers and employees in each industry. A person’s occupation reflects his or her education and training, family background, and the context in which (s)he engages in it. Income, lifestyle, and consumption patterns go hand in hand with occupation. The correlation also applies to retired people, and to households, so strongly does endogamy persist in occupational categories.

Econometricians therefore view the social category as an overall indicator that possesses a high explanatory power when applied to household behavior, and so eliminates the need for multiple kinds of information that are hard to access (such as income). There is no comparable international system: occupations lie within the scope of the International Labor Office (ILO), whereas social categories tend to be the object of academic study.

France has used PCS in censuses since 1982, introducing a new version in the 1999 census. A 2003 revision concerned only the “Occupations” section. The classification is used in household surveys, while its variant for employees (PCS-ESE, where ESE stands for “Emploi Salarié en Entreprise” [paid employment in firms]) is used for surveys or administrative forms filed by employers.

At the same time as INSEE was updating its national classification of occupations, Eurostat was promoting the application of the International Standard Classification of Occupations (ISCO, 1988 version). However, ISCO was issued in a marginally adapted form for Europe called ISCO(COM). This venture registered some successes, particularly in new Member States with obsolete national classifications. But an old ambiguity endures: ISCO is focused not so much on people’s occupations as on the jobs they hold (see articles by Brousse and Torterat, who notably discuss ISCO and its European future).

PCS is very accurate if the protocol is followed strictly. That is not so easy a task, as it requires information ranging beyond the job position. ISCO is probably easier to code but leaves a wider margin for interpretation. The ILO expressly recognizes the need for national classifications of occupations, which should reflect the structure of national employment markets as faithfully as possible.

Work on ISCO 2008 is ending without truly convincing results, all the more so as it was performed on an international scale and that certain now-established EU practices have been challenged, such as the use of the category “administrative managers in the public sector” (cadres administratifs publics).

Absent an approved international standard, social categorization remains a strictly EU undertaking. By trying to promote convergence between these fields, the European Socio-economic Classification (ESeC) project is leading the way.

Eurostat had already been obliged to recommend pseudo-social categories (ISCO-based occupational groupings) for EU Household Budget Surveys: in so doing, it endorsed a founding principle of the French PCS. The theoretical inspiration for the current project is Goldthorpe’s table of classes; the project reference is the socio-economic classification (ESeC: see article by Brousse). The studies under way seek to measure the prototype’s capacity to capture occupations in a PCS and/or ISCO framework and to provide adequate explanatory power in various applications of the
“common core” questionnaire in EU household surveys.

Conclusion

Economic classifications are highly standardized in the EU because the process began long ago, and because trade, technology, and the single market all provided incentives in the same direction (with reservations for services, where local specificities endure).

In the social sphere, the historical legacy of particularisms is becoming the rule. After half a century of European convergence, reciprocal recognition of education degrees has not made much progress, and the language barrier perpetuates heavy labor-market segmentation.

The advantages of international harmonization in the production sphere vastly outweigh the drawbacks. That argument is less self-evident in the social sphere. Despite their age, “tailor-made” national systems remain attractive by comparison to an off-the-shelf EU system that has not yet found its bearings.
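As a closing illustration of the nesting that makes harmonization workable, the “Russian doll” customs coding described earlier (six HS digits, two further CN digits, a ninth national NGP position) can be sketched as follows. The function names and the sample codes are hypothetical; only the 6/8/9 digit layout comes from the article.

```python
def split_customs_code(code: str) -> dict:
    """Split a 9-digit national customs code into its nested HS, CN, and NGP levels."""
    if len(code) != 9 or not code.isdigit():
        raise ValueError("expected a 9-digit customs code")
    return {
        "HS": code[:6],   # international level: first six digits
        "CN": code[:8],   # EU level: HS code plus two digits
        "NGP": code,      # French level: CN code plus a ninth position
    }


def is_nested(detailed: str, aggregate: str) -> bool:
    """A detailed item nests within an aggregate item when its code extends the aggregate code."""
    return detailed.startswith(aggregate)


levels = split_customs_code("123456789")  # made-up digits, not a real tariff line
print(levels)
print(is_nested(levels["NGP"], levels["HS"]))
```

Because each level only appends positions to the level above, national figures can be aggregated to EU or international totals by truncating the code. This is the property the article calls nesting that is visible in the code.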

