
Van Imhoff (Evert), Post (Wendy). - Microsimulation methods for population projections
Microsimulation differs from traditional macrosimulation in using a sample rather than the total population, in operating at the level of individual data rather than aggregated data, and in being based on repeated random experiments rather than average numbers. This article presents the circumstances in which microsimulation can be of greater value than the more conventional methods. It is particularly relevant when the results of the process being studied are complex whereas the forces driving it are simple. A particular problem in microsimulation results from the fact that the projections are subject to random variation. Various sources of random variation are examined, but the most important is the one we refer to as specification randomness: the more explanatory variables are included in the model, the greater the degree of random variation affecting the output of the model. After a brief survey of the microsimulation models which exist in demography, a number of the essential characteristics of microsimulation are illustrated using the KINSIM model for projecting the future size and structure of kinship networks.

Van Imhoff (Evert), Post (Wendy). - Méthodes de micro-simulation pour des projections de population
La micro-simulation se distingue de la macro-simulation traditionnelle, en utilisant un échantillon plutôt que la population totale, en travaillant au niveau de données individuelles plutôt que de données agrégées, et en se basant sur des expériences aléatoires répétées plutôt que sur des nombres moyens. Nous présentons ici les circonstances sous lesquelles la micro-simulation peut être plus intéressante que des méthodes plus conventionnelles. Elle est particulièrement appropriée si les résultats du processus étudié sont complexes, tandis que les forces qui lui sont sous-jacentes sont simples. Un problème difficile en micro-simulation vient de ce que les projections sont sujettes à des variations aléatoires. Diverses sources d'aléas sont présentées, mais la plus importante est ce que nous appelons l'aléa de spécification : plus on introduit de variables explicatives dans le modèle, plus le degré d'aléa auquel les sorties du modèle sont sujettes sera important. Après une revue rapide des modèles de micro-simulation qui existent en démographie, plusieurs des caractéristiques essentielles de la micro-simulation sont illustrées avec le modèle KINSIM, pour projeter la taille et la structure des réseaux de parenté futurs.

Van Imhoff (Evert), Post (Wendy). - Métodos de micro-simulación para proyecciones de población
Algunos de los elementos que distinguen la micro-simulación de la macro-simulación tradicional son: el uso de muestras en lugar de la población total, el trabajo a nivel de datos individuales en vez de datos agregados y el uso de experimentos aleatorios repetidos en lugar de medias. El artículo presenta las condiciones bajo las cuales la micro-simulación puede ser más interesante que los métodos convencionales. La micro-simulación es especialmente apropiada si los resultados del proceso estudiado son complejos mientras que las fuerzas subyacentes son simples. Una de las dificultades existentes en micro-simulación es que las proyecciones están sujetas a variaciones aleatorias. Existen varias fuentes de error, pero el más importante es el derivado de la propia especificación del modelo: cuantas más variables explicativas se introduzcan en el modelo, mayor será el nivel de error al cual los resultados del modelo están sujetos. Después de realizar una rápida revisión de los modelos de micro-simulación que existen en demografía, el artículo ilustra varias características esenciales de la micro-simulación a través del modelo KINSIM, para proyectar el tamaño y la estructura de las redes de parentesco futuras.


Source : Persée ; Ministère de la jeunesse, de l’éducation nationale et de la recherche, Direction de l’enseignement supérieur, Sous-direction des bibliothèques et de la documentation.


Citer ce document / Cite this document :

Van Imhoff Evert, Post Wendy. Microsimulation methods for population projection. In: Population, 10e année, n°1, 1998. pp. 97-136.

http://www.persee.fr/web/revues/home/prescript/article/pop_0032-4663_1998_hos_10_1_6824


MICROSIMULATION METHODS FOR POPULATION PROJECTION

Evert VAN IMHOFF* and Wendy POST**

I. - Introduction

Population projections are almost invariably produced with the so-called cohort-component method. In its simplest form, this method boils down to the following. The population is classified by sex (males and females) and age group (cohorts). For each combination of sex s and age x, the initial population is transformed into a projected final population of sex s and age x+1 by projecting the population changes, distinguished by type (components). Typical components are mortality and fertility. These calculations are repeated for successive time intervals, where the final population of one interval serves as the initial population for the next interval, until the end of the projection period has been reached.

The basic idea behind the cohort-component model is that the population changes because individuals experience certain demographic events, and that the mechanisms underlying these events differ between the sexes, age groups, and the type of event. The total number of events of a certain type, for each combination of age and sex, is projected as the result of two factors: the size of the population exposed to the risk of experiencing the event; and the level (or intensity) of the risk for individual persons, which may be interpreted as a measure of demographic behaviour.
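As an illustrative sketch (not code from the article), the two factors can be combined in a single projection step for one sex-age combination; the rates used here are hypothetical:

```python
# Sketch of one cohort-component step for a single sex-age group.
# The rates are hypothetical; real projections use age- and sex-specific
# schedules for each component (mortality, fertility, ...).

def project_step(exposed, mortality_rate, fertility_rate):
    """Project events as (population at risk) x (risk intensity)."""
    deaths = exposed * mortality_rate
    births = exposed * fertility_rate
    survivors = exposed - deaths   # this group is aged x+1 in the next interval
    return survivors, births

survivors, births = project_step(100_000, mortality_rate=0.001, fertility_rate=0.10)
print(survivors, births)  # → 99900.0 10000.0
```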

Suppose that we want to project the number of children born during the year out of 100,000 women aged 25. The population consists of 100,000 women and each 25-year-old woman has a probability of 0.10 of bearing a child during the year (i.e. the age-specific fertility 'rate' is 0.10). Now according to the traditional methods of demographic projection, which might be called macrosimulation, the projected number of births is obtained by applying the fertility probability to the size of the group of women: 0.10 x 100,000 yields 10,000 projected births.

* Netherlands Interdisciplinary Demographic Institute (NIDI), The Hague, Netherlands.
** Department of Medical Statistics, Faculty of Medicine, Leyden University, Netherlands.

Population: An English Selection, special issue New Methodological Approaches in the Social Sciences, 1998, 97-138.

In contrast, microsimulation would proceed as follows:(1)
— first, a sample of, say, 1,000 women is drawn from the population;
— next, for each woman in the sample, a random experiment is done with 0.10 probability of success. More specifically, in each experiment a random number is drawn from the uniform distribution over the (0,1) interval. If the number drawn is less than 0.10, the woman is deemed to have a child. For understandable reasons, this roulette-like procedure is known as the Monte Carlo technique. On average, 1,000 experiments will yield 100 successes, i.e. births. However, in a particular model run there can be either fewer or more than 100 simulated births;
— finally, the number of births in the sample is scaled to the population level: 100 births among a sample of 1,000 implies 10,000 births for a population of 100,000.
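The three steps above can be sketched as follows (illustrative only; the sample size and the random seed are arbitrary choices, not taken from the article):

```python
import random

random.seed(42)          # fixed seed so a rerun is reproducible

POPULATION = 100_000
SAMPLE = 1_000           # women drawn from the population
P_BIRTH = 0.10           # age-specific fertility probability

# One Monte Carlo experiment per woman: a uniform draw on (0,1)
# below 0.10 is deemed a birth.
births_in_sample = sum(1 for _ in range(SAMPLE) if random.random() < P_BIRTH)

# Scale the sample result back to the population level.
projected_births = births_in_sample * (POPULATION // SAMPLE)
print(births_in_sample, projected_births)
```

A particular run will typically yield somewhat fewer or more than 100 sample births; this is exactly the random variation the article returns to in section IV.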

Now in this particular example, microsimulation is needlessly complicated. The projection problem is so trivial that no demographer would ever use microsimulation to solve it. Nevertheless, the example illustrates the three essential ingredients of the microsimulation approach that distinguish it from traditional macrosimulation:
— the model uses a sample rather than the total population;
— it works on the level of individual data rather than grouped data;
— it relies on repeated random experiments rather than on average fractions.

Together, these three ingredients imply several strengths, as well as several weaknesses, for microsimulation. These strengths and weaknesses will be discussed extensively in this article, but at the outset it should be stressed that microsimulation can do certain things that macrosimulation cannot. For this reason alone, microsimulation should definitely be taken seriously as a potentially powerful tool for demographic as well as non-demographic projection purposes.

The origins of the microsimulation approach go back to the late 1950s (Orcutt, 1957; Orcutt et al., 1961). With the advances in computer technology, the method has gained increasing popularity in recent years. However, many of the advantages and principles of the approach have been recognized independently and dispersed over several disciplines (Clarke, 1986). In demography, quite a number of applications of microsimulation exist today - an overview will be given later on in this paper - but a coherent literature on the essentials of demographic microsimulation is lacking. With this paper, we hope to provide a first attempt in this direction.

The outline of this paper is as follows. In section II, the conceptual similarities and differences between micro- and macrosimulation, already briefly introduced above, will be elaborated. In section III, we will outline the strengths of microsimulation, and the resulting relevance of microsimulation for demographic projection purposes. However, microsimulation does have its drawbacks, too. In particular, the issue of randomness is extremely important in microsimulation, which is why we devote a separate section (section IV) to it. Section V deals with several other issues that are specific to microsimulation. Existing microsimulation models in demography, which can be viewed as concrete applications of the concept of microsimulation, are briefly reviewed in section VI. The flavour of microsimulation can best be obtained by having a closer look at a particular application. Therefore, in section VII we briefly discuss the microsimulation model KINSIM developed at NIDI, not because KINSIM is a particularly spectacular representative of the microsimulation family, but rather because it neatly illustrates several of the essential characteristics of microsimulation that were discussed from a more general perspective in earlier sections. The paper ends with a summary of the main conclusions.

(1) This description applies to microsimulation in its conventional form. Variance-reduction techniques like the sorting method work slightly differently. This issue will be elaborated in section IV.

II. - Microsimulation versus macrosimulation

There are numerous conceptual and practical differences between microsimulation methods and macrosimulation methods. However, the importance of these differences should not be overstated. Microsimulation and macrosimulation have a lot of fundamental principles in common. Therefore, in order to get a better understanding of the differences, we start this section with a discussion of the common properties of both methods.

1. Common properties

Making a population projection is making statements about the future of the population. If such statements about the future are to be meaningful, they must be based on a more or less valid description of the various processes that govern the population system. In short, population projections must be based on a model. For the purpose of this paper, we define a model as a simplified, quantitative description of reality. This definition excludes several meanings in which the term 'model' is also commonly used. All models are simplifications, most models describe reality (with varying degrees of success), but only some are quantitative (Van Imhoff et al., 1995).

In particular, a projection model does not contain any non-specified parameters. This is in contrast to theoretical models, which describe how variables are linked without specifying the exact functional form of the relationship. This is also in contrast to estimation models, in which the functional form is specified but the parameters of the function are not.

All demographic projection models are simplified, quantitative descriptions of the processes that determine population structures. They are simplified in the sense that not all variables affecting population structures are included in the model (they are also simplified in terms of functional form). They are quantitative in the sense that one set of numbers goes in and another set of numbers comes out. As a matter of fact, because of the obvious efficiency gains that are achieved if all necessary calculations are made by computer rather than by hand, many projection models are also concrete computer programs. Strictly speaking, the computer program that actually produces the numbers coming out of the model should not be confused with the model itself. In practice, the term 'model' is frequently also applied to the computer program, and, admittedly, the dividing line is not always easy to discern.

This latter observation is particularly true for microsimulation models, which lose virtually all of their usefulness once the computer is taken away. In principle, an algebraic representation of a model can be manipulated to study the properties of the model. A macro model which is not too complicated might be analysed in this way. For more complex macro models, analytic manipulation becomes infeasible, so that numerical simulation methods have to be used to study the implications of the model. Micro models typically are such that either an algebraic representation is impossible, or the algebraic representation is so complicated that it does not allow analytic manipulation. Thus, a micro model almost by definition implies numerical simulation, and numerical simulation implies a computer program.

Once we have a quantitative description of the population system, it can be used for describing how this system will develop over time, from the present into the future. Since the model is a simplified description of reality, it will always contain certain elements that are exogenous to the model, i.e. their quantitative value is not explained inside the model. For projection purposes, therefore, the model will have to be supplemented by hypotheses concerning the future values of these exogenous elements. Within the context of the projection model, such elements are usually referred to as model parameters.

Since projection models make statements about the future, they must always contain the time element in one way or another. In this sense, all projection models are dynamic by definition. However, there are many ways in which time can be included in the model. Merely adding an index t to all model variables and parameters hardly warrants the term 'dynamic'. A truly dynamic model should not only specify what the system looks like in the year 2000, but also how the system is supposed to get there. In other words, the processes that underlie the changes in the system variables should be explicitly included in the model.(2) In a truly dynamic model, the focus is on

"events rather than things, processes rather than states, as the ultimate component of the world of reality" (Ryder, 1964, p. 450).

(2) In a way, the dynamic-versus-static issue is gradual rather than fundamental. For instance, if a demographic forecaster hypothesizes that mortality rates will fall by 10% between now and the year 2000, one could argue that the mortality component of the model is static since he does not specify how the mortality rates are going to fall. Similarly, the headship rate method for producing household projections is generally termed static since the changes in age-specific headship are not explicitly specified; however, the changes in the age-specific population size to which these headship rates are applied are explicitly modelled, so the headship rate model is dynamic at least to some extent.

Now if we recall the elementary example given in the introduction to this paper, microsimulation and macrosimulation are essentially two alternative methods for making similar statements about the future. Given a description of reality ("the number of births is determined by the number of women and the age-specific probability of bearing a child") and given a hypothesis about the future value of the model parameters ("the fertility rate will be 0.10"), both methods arrive at the same statement about the future ("the expected number of births will be 10,000"). Of course, this does not imply that both approaches are equally suitable implementations for all descriptions of reality. However, conceptually the two approaches share the essential feature of being based on a simplified description of the real world. Just as the term 'simulation' suggests, the method of simulation, whether micro or macro, is based on the idea of imitating the process under consideration. Although the real world is imitated by a simulation model - and, when supplemented by hypotheses on future values of the parameters, by a projection model as well -, a model is just a model and therefore not capable of reproducing the real world.

"What we make when we simulate is not a likeness of the operation of the world, but a likeness of some sets of our own ideas concerning the operation of the world" (Wachter, 1987).

2. Differences

Both the microsimulation approach and the macrosimulation approach simulate a dynamic process: they describe the development of a system over time in terms of the events that underlie the changes of the central variables in the model. Now the essential feature of an event is that it is of the either/or type: either it happens, or it doesn't. At the population level one can speak of the 'average' occurrence of a certain type of event, but this average remains ultimately based on the individual occurrences. Thus, events are random variables that occur with a certain probability. When making a statement about a certain future number of events, we are in fact making a statement about the expected value of a random variable. In doing so, both the microsimulation and the macrosimulation approach rely upon the Law of Large Numbers. However, they do so in different ways. A macro model assumes that the size of the population (100,000 women) is so large that the projected number of events (births) may be set equal to its expected value (which is 10,000). A micro model assumes that the number of repetitions of the random experiment in the sample (1,000) is so large that the resulting projected number of events (which might be anything) will approximately equal its expected value (100 in the sample; 10,000 after scaling to the population level).

Since the simulated process is inherently random, any projection into the future is subject to random variation. The descriptive model is probabilistic. Therefore, the corresponding projection model should, in principle, not only produce an expected value but also an indication of the variation around the expected value. In macrosimulation, the random nature of the process is generally disregarded altogether. Things like standard errors could in principle be calculated in macro models as well, but this is hardly ever done in practice, primarily because the necessary calculations are extremely complicated. In contrast, the random nature of the process is explicitly modelled in microsimulation, viz. in the form of repeated probabilistic experiments (drawing random numbers and deciding whether or not the event should be deemed to have taken place). Thus, the projections produced by microsimulation are subject to random variation. Performing several model runs in succession produces different projections, from which standard errors can be directly calculated. However, it should be added that this is still insufficiently done in practice. Too often, microsimulators just produce one model run and leave it at that.
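As a sketch of this point (with hypothetical parameters, not the authors' code), repeating the Monte Carlo birth projection a number of times gives a direct estimate of the standard error:

```python
import random
import statistics

def one_run(sample=1_000, p_birth=0.10, scale=100):
    """One microsimulation run, scaled to the population level."""
    births = sum(1 for _ in range(sample) if random.random() < p_birth)
    return births * scale

random.seed(0)
runs = [one_run() for _ in range(50)]     # 50 independent model runs

mean_projection = statistics.mean(runs)   # close to the expected 10,000 births
run_std_error = statistics.stdev(runs)    # spread across runs
print(round(mean_projection), round(run_std_error))
```

A single run would give only one of the values in `runs`, with no indication of how far it may lie from the expected value; the spread across runs is what supplies that indication.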

At the heart of any modelling exercise lies the specification of the state space: the representation of the components of the system of interest. At the individual level, the state space consists of a number of characteristics or attributes, each of which can take a certain value. At the population level, the state space consists of all possible combinations of attribute values: it is a breakdown of the individuals comprising the population by relevant characteristics. If there are K attributes and Mi categories for attribute i = 1, ..., K, the state space at the macro level consists of M1 x M2 x ... x MK cells: a matrix of this size is required for a complete description of the population by relevant characteristics. In contrast, at the micro level each individual is characterized by a vector of attribute values of length K; a total population of N individuals can then be described by a matrix with N x K cells.
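A quick numeric sketch (with hypothetical attribute counts, not taken from the article) shows how fast the two representations diverge:

```python
from math import prod

categories = [2, 6, 100, 3, 5, 4, 10]  # M1..MK for K = 7 hypothetical attributes
K = len(categories)
N = 10_000                              # individuals in the micro sample

macro_cells = prod(categories)          # M1 x M2 x ... x MK
micro_cells = N * K                     # one attribute vector per individual

print(macro_cells, micro_cells)         # → 720000 70000
```

Adding one more attribute multiplies the macro table by its number of categories, while the micro list grows only by N extra cells.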

In macrosimulation, the calculations required for the projection are carried out in terms of the cells in the aggregate cross-classification table: for each cell, the projection model should evaluate how the number it contains will change over time. Microsimulation, on the other hand, does its calculations in terms of the individual records: for each individual, the attribute vector is updated according to the specifications of the model and the results of the Monte Carlo experiments. This has two important consequences for the distinction between microsimulation and macrosimulation. First, in microsimulation the behavioural equations of the underlying descriptive model should be reformulated into model specifications at the individual level; in macrosimulation they should be translated into behavioural equations at the aggregate level. Second, the storage and retention of information in microsimulation occurs via a list of individuals and their attributes; in macrosimulation this is done via the aggregate cross-classification table. For most applications where a relatively large number of attributes is considered, the size of the aggregate table, which consists of M1 x M2 x ... x MK cells, is much larger than the size of the list, which consists of N x K cells only. We will return to this issue in the next section.
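The two storage schemes can be sketched as follows (a deliberately tiny, hypothetical state space):

```python
from itertools import product

sexes = ["male", "female"]
ages = [0, 1, 2]                 # tiny age range, for illustration only

# Macrosimulation: one count per cell of the cross-classification table.
table = {cell: 0 for cell in product(sexes, ages)}
table[("female", 2)] = 5         # five women aged 2

# Microsimulation: one record (attribute vector) per sampled individual.
records = [{"sex": "female", "age": 2} for _ in range(5)]

print(len(table), len(records))  # → 6 5
```

The macro table must hold a cell for every combination, occupied or not, whereas the micro list holds only the individuals actually present in the sample.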

A further difference between microsimulation and macrosimulation is that the latter works in terms of the population as a whole, while the former typically works in terms of a sample. There are two main reasons for this. First, it would be very unpractical - and infeasible even with modern computer technology - to include a record for each individual member of the population. Second, microsimulation models typically take into account a much larger number of covariates than do macro models. The joint distribution of all state variables and covariates is generally unknown at the population level. Therefore, the necessary data are obtained from sample surveys, either cross-sectional surveys or longitudinal panels. These survey data can be fed directly into the database of individual records on which the microsimulation model operates. Naturally, macro models also frequently rely on survey data for estimating information that is not available on the population level. However, the link between the sample and the model is much more explicit in the case of microsimulation: the list underlying microsimulation is a sample, while the aggregate table underlying macrosimulation is the total population, possibly supplemented by sample information.

Another difference, closely related to the previous one, concerns the relationship between the empirical data feeding the model and the specification of the behavioural equations. In every modelling exercise, there is always a moment during the stage of model building at which the data have to be taken into account. In a macro approach, there is a fair degree of flexibility on this point. Naturally, when specifying a macro model the state space has to be properly taken into account right from the start, but for most of the covariates the estimation phase usually comes at a later stage and relationships can be specified in an indirect way. In contrast, in a microsimulation approach all the data have to be taken into account from the very beginning. To see this, we must recall that in microsimulation all behavioural equations, in principle, operate at the individual level. Therefore, all explanatory variables should be available in the records of individual attributes (possibly including links to other individuals in the database). What is more, to the extent that these explanatory variables are allowed to change over time, the model should also include a behavioural equation for this change. Thus, microsimulation models can be regarded as models which generate their own explanatory variables.

If we look at this problem from a different angle, we cannot avoid the fundamental trade-off that any effort in modelling human behaviour must face. This is the trade-off between information intensity, on the one hand, and the capacity to make meaningful predictions, on the other hand. The dependent variables of human behaviour are always stochastic. Equally, our knowledge of the determinants of human behaviour is far from complete. These two facts together imply that there are limits to the complexity of a projection model: beyond a certain point, the model becomes so complex that the resulting projections are no longer meaningful, being dominated by randomness. This holds both for macro and micro models. However, in macro models it is much easier to isolate the central process from its surroundings, by treating certain variables as truly exogenous. In doing so, one in fact acknowledges that partial processes are insufficiently understood to justify their inclusion in the model. In micro models, on the other hand, all explanatory variables must be included at the individual level, and as a consequence, processes for generating time-dependent explanatory variables (explanatory for the main process) must be included in the model as well. Thus, macro models suffer from information loss, while micro models suffer from high data requirements and a much larger influence of disturbance terms.

A final difference is that, because of the tight link between data and model in microsimulation models, standardization of computer software is much more difficult for micro models than for macro models. Existing microsimulation computer applications are almost impossible to transfer, and the software is not really user-friendly. Many macro models are much more accessible because of the availability of excellent software.

III. - The usefulness of microsimulation for demographic projection purposes

In the preceding section, we have discussed several similarities as well as differences between microsimulation and macrosimulation approaches. It was stated that microsimulation and macrosimulation are essentially two alternative methods for making similar statements about the future. However, despite this essential similarity, for practical purposes one will virtually never be indifferent between microsimulation and macrosimulation. Some types of statements about the future are more conveniently arrived at using a microsimulation approach, others will require a macrosimulation approach. For concrete research questions, the differences between the two approaches imply that one method has certain advantages over the other. In this section, we will indicate the circumstances under which microsimulation might be more useful than more conventional methods.

1. Strong points of microsimulation

A first strong point of microsimulation is its performance under conditions of a sizeable state space. If the number of individual attributes included in the model and the number of values that these can take becomes larger and larger, macro models tend to become unmanageable: the size of the state space increases exponentially with the number of categories included in the model. Recall that the aggregate table in a macro model contains M1 x M2 x ... x MK cells, while the size of the list (or database) in a micro model consists of N x K cells. For even moderately sized problems, the former will be much larger than the latter. As an example, consider a purely demographic model for France, in which the population is classified by:
— sex (males/females: 2 categories);
— parity (females only; 0, ..., 5+: 6 categories);

— current age (0, ..., 99+: 100