Deconvolution problems in density estimation [Electronic resource] / submitted by Christian Wagner

Published on: Thursday, 1 January 2009
Source: VTS.UNI-ULM.DE/DOCS/2009/6858/VTS_6858_9504.PDF
Number of pages: 190

Deconvolution problems
in density estimation
Dissertation
for the attainment of the doctoral degree Dr. rer. nat.
at the Faculty of Mathematics and Economics
of Ulm University
submitted by
Christian Wagner
from
Öhringen
2009

Incumbent Dean: Prof. Dr. Werner Kratz
First reviewer: Prof. Dr. Ulrich Stadtmüller
Second reviewer: Prof. Dr. Volker Schmidt
Date of doctoral examination: 8 June 2009

Contents
Introduction and Summary
Motivation
Concepts in density estimation
Focus of this work
Deconvolution of densities
Results about approximative deconvolution estimators
Results about contaminated-data-only models
Aggregated data models and corresponding results
Conclusions
1 Density Estimation Methods
1.1 Direct density estimation
1.1.1 Quality measures
1.1.2 Asymptotic properties of the direct density estimation
1.2 Density deconvolution
1.2.1 Classical consistency results
1.2.2 Supersmooth target densities
1.2.3 Approximative deconvolution methods
1.3 Aggregated data models
2 Unknown Error Density
2.1 TAYLEX and SIMEX estimators
2.1.1 Justification of the appearing bias reduction
2.1.2 Consistency results
2.1.3 Simulations
2.1.4 Proofs
2.2 Modified variance
2.2.1 Model and estimator
2.2.2 Consistency results
2.2.3 Simulations
2.2.4 Proof of Theorem 2.2.1
2.3 Additional error
2.3.1 Model and estimator
2.3.2 Consistency results
2.3.3 Proof of Theorem 2.3.1
3 Aggregated Data Models
3.1 Estimators and assumptions
3.2 Consistency and minimax rates
3.3 Properties of the unweighted estimator
3.4 Aggregated data models in density deconvolution
3.5 Proofs
3.5.1 Proofs of Lemmas 3.2.1 and 3.2.2
3.5.2 Proofs of Theorems 3.2.1 and 3.2.2
3.5.3 Proof of Theorem 3.2.3
3.5.4 Proof of Theorem 3.3.1
3.5.5 Proof of Theorem 3.4.1
A Appendix
A.1 Spaces of continuous functions
A.2 Integration theory
A.3 Fourier transforms
A.4 Characteristic functions
A.5 Sobolev spaces
A.6 Inequalities
B Auxiliary Results
List of Figures
List of Symbols
Bibliography
German Summary

Introduction and Summary
Motivation
In many circumstances repeated measurements of a quantity are observed and one would like to
gain as much information as possible from these observations. Examples are the log return per
day of a company share, or the policy holders’ lifespans for life insurances. Other quantities of
interest could be participants’ blood pressures in a clinical study or households’ consumptions
of electricity per year. There are a variety of other situations where repeated observations are
possible, for instance in biology, geology, other fields of science, or social science. In statistics, these
repeated measurements are often modeled as realisations of random objects, so-called random
variables. Under this assumption the first characteristics of the data one might consider are the
mean or the variance. Yet, both values contain only partial information about the distribution
of a random variable, whereas complete information is given by the cumulative distribution
function, with its empirical counterpart, the so-called empirical distribution function. However,
when plotting the empirical distribution function, one only receives a step function and it is
hard to obtain more detailed information from this graph. In case of an absolutely continuous
random variable, the density function of the quantity of interest can be estimated to circumvent
this downside. Using an appropriate estimate of the density function allows for information
about modes, symmetry, and frequent values of the random variable to be gathered. Even more
information about modes or the change in frequency of the random variable’s values should be
attainable through an estimate of the density’s derivatives. Both the estimation of a density and
its derivatives will be addressed in this work. However, since both problems lead to comparable
considerations, the introduction focuses on density estimation.
Concepts in density estimation
There are two fundamentally different approaches to estimating a density. The first is parametric
density estimation, where one assumes that the observations come from a parametric
family of densities that has to be specified in advance. The task is then to estimate the
parameters that fit the data best. The second approach is nonparametric density estimation,
where no particular functional form is imposed on the density. Instead, one tries to find an
estimate using only minor assumptions, such as some smoothness of the density. The parametric
procedure has advantages but also considerable drawbacks. Foremost, it is difficult to specify
an appropriate parametric family, since this information is often not directly accessible
from the situation in which the data were observed. Yet a misspecified family of
densities leads to an estimate that does not capture important structures of the density,
which contradicts the objective of obtaining as much information from the data as possible.
Additionally, even if one insists on using a parametric approach, a nonparametric estimate will
give a good starting point to find an appropriate model. For this reason, nonparametric density
estimation will be studied in more detail here.
Nonparametric density estimation offers various procedures, among others orthogonal series
methods and histograms with fixed or random partitions. Histograms with fixed partitions were
studied, for example, in Révész [1971]; for an introduction to the other methods mentioned, see
for instance Prakasa Rao [1983]. However, the estimators most
commonly used in nonparametric density estimation are the so-called kernel density estimators
introduced in Rosenblatt [1956] and studied in more detail in Parzen [1962]. An estimator of this
type had already been studied in a less general setting in Akaike [1954]. Due to its importance,
the explicit formula is given here. For $n$ independent random variables $X_1,\dots,X_n$ that
are identically distributed as $X$, the kernel density estimator of the density $f_X$ at the point $\xi$ is
given by
$$\hat f_X(\xi) = \frac{1}{nh}\sum_{j=1}^{n} K\Bigl(\frac{\xi - X_j}{h}\Bigr),$$
where $h$ is a positive real number, the so-called bandwidth, and $K(y)$ is a so-called kernel function.
For instance, the kernel could be the standard normal density.
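As a minimal sketch of the formula above, the following Python/NumPy code evaluates $\hat f_X$ on a grid, using the standard normal density as the kernel $K$. The function names `gaussian_kernel` and `kde`, the simulated sample and the bandwidth $h = 0.4$ are illustrative assumptions, not taken from the text.

```python
import numpy as np

def gaussian_kernel(y):
    """Standard normal density, used here as the kernel K."""
    return np.exp(-0.5 * y**2) / np.sqrt(2.0 * np.pi)

def kde(xi, sample, h, kernel=gaussian_kernel):
    """Evaluate (1/(n h)) * sum_j K((xi - X_j) / h) at every point of xi."""
    xi = np.atleast_1d(np.asarray(xi, dtype=float))
    sample = np.asarray(sample, dtype=float)
    u = (xi[:, None] - sample[None, :]) / h   # shape (len(xi), n)
    return kernel(u).sum(axis=1) / (sample.size * h)

# Usage on simulated standard normal data (an assumed example, not data from the text)
rng = np.random.default_rng(0)
X = rng.standard_normal(500)
grid = np.linspace(-6.0, 6.0, 1201)
f_hat = kde(grid, X, h=0.4)
print(f"estimate at 0: {kde(0.0, X, h=0.4)[0]:.3f}")
```

Since a density kernel is nonnegative and integrates to one, so does the estimate; the grid is chosen wide enough that a simple Riemann sum over it is close to one.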
The kernel density estimator is not very sensitive to the choice of the kernel, in contrast
to the choice of the bandwidth h. The bandwidth’s importance is justified by the observation
that the kernel density estimator is in general not an unbiased estimator of $f_X$, which means
that its expected value is not the value $f_X(\xi)$ itself. However, it is common for estimators in
nonparametric estimation procedures to be biased. Because of this bias, there are two opposing
objectives when choosing an appropriate bandwidth. On the one hand, it would be good to choose $h$
small such that only observations very close to $\xi$ have an impact on the density estimate at the
point $\xi$. This approach will usually give an estimate with a small bias but a large
variance, since only very few observations determine the value of the estimator. On the other
hand, it would be good to use a large $h$ in order to include many observations, resulting
in a small variance but a large bias. This so-called bias-variance tradeoff is an intrinsic difficulty
of nonparametric estimation procedures. Since this tradeoff is controlled by the choice of
the bandwidth, it is very important to find appropriate bandwidths.
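The bias-variance tradeoff can be made visible with a small Monte Carlo experiment. The sketch below assumes standard normal data, a Gaussian kernel, $n = 200$ and the point $\xi = 0$ (all hypothetical choices) and compares a very small bandwidth with a large one; the helper names `kde_at` and `bias_variance` are illustrative.

```python
import numpy as np

def kde_at(xi, sample, h):
    """Gaussian-kernel density estimate at a single point xi."""
    u = (xi - sample) / h
    return np.exp(-0.5 * u**2).sum() / (np.sqrt(2.0 * np.pi) * sample.size * h)

def bias_variance(h, n=200, reps=500, xi=0.0, seed=1):
    """Monte Carlo bias and variance of the estimate at xi for standard normal
    data (an assumed setting chosen so that the true value is known)."""
    rng = np.random.default_rng(seed)
    estimates = np.array([kde_at(xi, rng.standard_normal(n), h) for _ in range(reps)])
    true_value = np.exp(-0.5 * xi**2) / np.sqrt(2.0 * np.pi)  # standard normal density at xi
    return estimates.mean() - true_value, estimates.var()

for h in (0.05, 1.0):
    b, v = bias_variance(h)
    print(f"h={h}: bias={b:+.4f}, variance={v:.5f}")
```

With these settings the small bandwidth should show a negligible bias and a comparatively large variance, and the large bandwidth the reverse; the exact numbers depend on the seed.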
In finite samples, there are various procedures to find reasonable bandwidths. A popular
approach in this context is least squares cross-validation, developed independently in Rudemo [1982]
and Bowman [1984]. Another commonly used technique is plug-in bandwidth selection, in
which an approximate formula for the so-called mean integrated squared error (MISE), defined
in (1.1.5) below, is derived and subsequently minimized in $h$. Further methods are, among others,
smoothed cross-validation, see Müller [1985] and Staniswalis [1989], or empirical-bias bandwidth
selection (EBBS), see Ruppert [1997].
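Least squares cross-validation chooses $h$ by minimizing $\int \hat f^2 - \frac{2}{n}\sum_i \hat f_{-i}(X_i)$, where $\hat f_{-i}$ omits the $i$-th observation; this estimates the integrated squared error up to the term $\int f^2$, which does not depend on $h$. The following is a minimal sketch assuming a Gaussian kernel, for which $\int \hat f^2$ has a closed form, and a plain grid search; the function names and the grid are hypothetical.

```python
import numpy as np

SQRT2PI = np.sqrt(2.0 * np.pi)

def lscv_score(h, x):
    """LSCV(h) = integral of f_hat^2 - (2/n) * sum_i f_hat_{-i}(X_i), Gaussian kernel."""
    n = x.size
    d = (x[:, None] - x[None, :]) / h          # pairwise scaled differences
    # The Gaussian kernel convolved with itself is the N(0, 2) density,
    # which gives the integral of f_hat^2 in closed form.
    int_f2 = np.exp(-0.25 * d**2).sum() / (np.sqrt(4.0 * np.pi) * n**2 * h)
    # Leave-one-out estimates f_hat_{-i}(X_i): drop the diagonal terms j == i.
    k = np.exp(-0.5 * d**2) / SQRT2PI
    loo = (k.sum(axis=1) - k.diagonal()) / ((n - 1) * h)
    return int_f2 - 2.0 * loo.mean()

def lscv_bandwidth(x, grid):
    """Return the grid point that minimizes the LSCV criterion."""
    scores = np.array([lscv_score(h, x) for h in grid])
    return grid[np.argmin(scores)]

# Usage on simulated data (assumed example)
rng = np.random.default_rng(2)
x = rng.standard_normal(200)
grid = np.linspace(0.05, 1.5, 146)
h_cv = lscv_bandwidth(x, grid)
print(f"LSCV bandwidth: {h_cv:.2f}")
```

A grid search is used only for transparency; the criterion is known to be quite variable, so in practice one would inspect the whole score curve rather than trust a single minimizer.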
In the context of this work, however, asymptotically optimal bandwidth choices are relevant.
This means that a sequence of bandwidths $h_n$ with good properties has to be found. Thus, the
aim is to find sequences $h_n$ such that the distance of the resulting estimate to the true
underlying density $f$ tends to zero with the fastest possible rate.
Therefore, some concept of distance between an estimator $\hat f_n$ and the true density $f$ is
required. One possible approach is to measure the error only at a single point, but since one estimates
a function defined on the whole real line, it is preferable to measure the distance on the
whole real line as well. The two most studied measures of deviation on the real line are the
MISE, which is the expected value of the squared $L^2$-distance between the estimator $\hat f_n$ and
the target density $f$, and the mean integrated absolute error (MIAE), defined in (1.1.7) below,
which uses the $L^1$-distance. Although the $L^1$-norm seems to give the appropriate distance,
considering that the distance between densities has to be evaluated, the MISE is more commonly
used. This popularity is justified by the fact that the MISE allows a direct decomposition into
a bias part and a variance part, see (1.1.6) below, and by its more tractable computational properties.
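For reference, the standard definitions of the two error measures and the bias-variance decomposition of the MISE are written out below; we assume they agree with (1.1.5) to (1.1.7), which are not part of this excerpt. The decomposition follows by exchanging expectation and integral (Tonelli's theorem, since the integrand is nonnegative) and expanding the square pointwise, where the cross term $2\,\mathbb{E}\bigl[(\hat f_n(x) - \mathbb{E}\hat f_n(x))\bigr]\bigl(\mathbb{E}\hat f_n(x) - f(x)\bigr)$ vanishes.

```latex
\mathrm{MISE}(\hat f_n)
  = \mathbb{E}\int_{\mathbb{R}} \bigl(\hat f_n(x) - f(x)\bigr)^2\,dx
  = \underbrace{\int_{\mathbb{R}} \bigl(\mathbb{E}\hat f_n(x) - f(x)\bigr)^2\,dx}_{\text{integrated squared bias}}
  + \underbrace{\int_{\mathbb{R}} \operatorname{Var}\hat f_n(x)\,dx}_{\text{integrated variance}},
\qquad
\mathrm{MIAE}(\hat f_n) = \mathbb{E}\int_{\mathbb{R}} \bigl|\hat f_n(x) - f(x)\bigr|\,dx .
```

No comparable additive split exists for the MIAE, which is one reason why it is harder to analyse.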
Results about the MIAE as a measure of quality in density estimation are for instance given in
Devroye and Györfi [1985], Devroye [1987], and Eggermont and LaRiccia [2001].
In order to find the fastest possible rates of convergence for the distances introduced above,
it is necessary to restrict the considerations to subsets of all densities, so-called density classes,
such that the difficulty of the density estimation is comparable across the whole subset. The optimal
rates of convergence of the MISE for different density classes were first studied in Watson and
Leadbetter [1963]. In Davis [1977], it is proved that the usual kernel density estimator reaches
these rates when using a special kernel, the so-called sinc-kernel defined in (1.1.8) below. It is
not clear in advance, however, that with appropriately chosen bandwidths $h_n$ the kernel estimator
can also reach optimal rates of convergence for more general kernels. Nonetheless, e.g. for the
MIAE in Devroye [1987], see Theorem 1.1.2 below, a bandwidth choice can be found such that
the kernel estimator reaches the fastest attainable rates.
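The sinc-kernel estimator can be sketched as follows, assuming that the kernel in (1.1.8) is the usual $K(x) = \sin(x)/(\pi x)$, whose Fourier transform (in the characteristic-function convention) is the indicator of $[-1, 1]$. Consequently the estimate equals $\frac{1}{2\pi}\int_{|t|\le 1/h} e^{-it\xi}\hat\varphi_n(t)\,dt$ with the empirical characteristic function $\hat\varphi_n$, and the code checks this identity numerically. Function names and settings are hypothetical.

```python
import numpy as np

def sinc_kernel(x):
    """K(x) = sin(x) / (pi x) with K(0) = 1/pi; note np.sinc(t) = sin(pi t) / (pi t)."""
    return np.sinc(x / np.pi) / np.pi

def sinc_kde(xi, sample, h):
    """Kernel estimate with the sinc kernel (it may take negative values)."""
    xi = np.atleast_1d(np.asarray(xi, dtype=float))
    u = (xi[:, None] - sample[None, :]) / h
    return sinc_kernel(u).sum(axis=1) / (sample.size * h)

def spectral_cutoff_estimate(xi, sample, h, m=4001):
    """The same estimate in the Fourier domain:
    (1/(2 pi)) * integral over |t| <= 1/h of exp(-i t xi) * empirical char. function(t)."""
    t = np.linspace(-1.0 / h, 1.0 / h, m)
    ecf = np.exp(1j * t[:, None] * sample[None, :]).mean(axis=1)
    integrand = np.exp(-1j * t * xi) * ecf
    dt = t[1] - t[0]
    integral = dt * (integrand.sum() - 0.5 * (integrand[0] + integrand[-1]))  # trapezoidal rule
    return integral.real / (2.0 * np.pi)

# Usage on simulated data (assumed example)
rng = np.random.default_rng(5)
sample = rng.standard_normal(300)
direct = sinc_kde(0.3, sample, h=0.5)[0]
fourier = spectral_cutoff_estimate(0.3, sample, h=0.5)
print(f"direct: {direct:.6f}, via Fourier domain: {fourier:.6f}")
```

The Fourier-domain form is the one that carries over to deconvolution, where the empirical characteristic function is additionally divided by that of the error.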
The choice of the optimal bandwidths and the optimal convergence rates usually depend on
n and parameters influenced by the kernel K and the underlying density f that are in general
unknown in advance. Hence, it could be argued that in the finite-sample case the asymptotically
optimal bandwidth choices are not helpful. However, analysing the convergence properties of
an estimator for growing sample size $n$ and comparing the derived rates with the best attainable
ones in the corresponding situation are important questions in their own right. This importance
is justified by the fact that these rates indicate how much improvement in the error one can
hope for as the sample size grows.
The parameter that essentially determines the optimal convergence rates is the smoothness of
the density $f$. In short, the smoother $f$ is, the faster the attainable rate of convergence. Yet, for
common smoothness classes, the best attainable rates of the MISE that can be reached uniformly
over the whole class are slower than the usual parametric rate $1/n$, see Theorem 1.1.4 below.
It is nonetheless possible to reach the rate $1/n$ here too, as proved in Watson and Leadbetter
[1963]. In this reference it is shown that the rate $1/n$ is attainable for densities $f$ whose
characteristic function has bounded support. Moreover, it is proved there that this rate is the best rate any
density estimator can reach for arbitrary densities $f$. It is important to note that to reach the
rate $1/n$ in parametric density estimation, the density must follow a parametric
model and the correct model must have been specified beforehand.
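Why smoothness controls the rate can be seen from a standard heuristic, sketched here under the assumption (not a statement of Theorem 1.1.4) that for a class of smoothness $\beta$ and a suitable kernel the squared bias is of order $h^{2\beta}$ and the integrated variance of order $1/(nh)$, with constants $C_1, C_2 > 0$:

```latex
\mathrm{MISE}(h) \approx C_1 h^{2\beta} + \frac{C_2}{nh},
\qquad
\frac{d}{dh}\Bigl(C_1 h^{2\beta} + \frac{C_2}{nh}\Bigr)
  = 2\beta C_1 h^{2\beta-1} - \frac{C_2}{nh^2} = 0
\iff
h_n = \Bigl(\frac{C_2}{2\beta C_1\, n}\Bigr)^{1/(2\beta+1)} \asymp n^{-1/(2\beta+1)},
\\[4pt]
\mathrm{MISE}(h_n) \asymp n^{-2\beta/(2\beta+1)} .
```

For every finite $\beta$ the exponent $2\beta/(2\beta+1)$ is below one, so the rate is slower than $1/n$ and approaches it only as $\beta \to \infty$.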
Focus of this work
Although density estimation is well studied, there are many settings where classical approaches
are not directly applicable. From the introductory examples it can already be seen that in
many practical applications direct access to the data of interest might not be possible. In such
situations, the target density $f$ has to be reconstructed from the data. To explain
this problem further, the blood pressure example from above is considered in more detail.
There, the observations might additionally depend on the time of day the blood pressure was
measured, the person who performed the measurement, or some plain measurement error. Yet,
only the participant’s true blood pressure is of interest. Models in which the data are not
directly observable but only contaminated by some unobservable additive effects are called
errors-in-variables models. In this situation, a so-called deconvolution problem for the densities
has to be solved, which will be explained a little later.
Another possible situation, where reconstruction of the density is crucial, is exhibited in the
model of electricity consumption. Here one might not only be interested in the consumption
per household but also in the consumption per individual. In this setting, however, a large
amount of the data is not a direct observation of an individual person’s consumption but the
consumption of a larger household. Hence, these observations are the sum of the consumptions
of more than one individual. Here, the data obtained from the larger households cannot be used
directly, whereas from a statistical point of view it would be desirable to include these data in
an estimation procedure. Models of this type are called aggregated data models; they will also be
introduced in more detail in the next sections.
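To illustrate the aggregated data model, the sketch below assumes (hypothetically) that every observed household consists of $k = 2$ members whose consumptions are i.i.d. gamma distributed, so that only their sum is recorded. For independent summands the characteristic function of the sum is the $k$-th power of the individual one, $\varphi_Y = \varphi_X^k$, which the code checks against the empirical characteristic function. The distribution, its parameters and all names are assumptions for illustration; the models studied later may differ, for instance by mixing household sizes.

```python
import numpy as np

def ecf(t, sample):
    """Empirical characteristic function (1/n) * sum_j exp(i t Y_j) on a grid of t values."""
    return np.exp(1j * np.outer(t, sample)).mean(axis=1)

rng = np.random.default_rng(3)
n, k = 20000, 2                 # hypothetical: n households with k members each
shape, scale = 2.0, 1.5         # assumed gamma law for an individual's consumption
individual = rng.gamma(shape, scale, size=(n, k))
household = individual.sum(axis=1)   # only the household totals are observed

t = np.linspace(-1.0, 1.0, 21)
phi_individual = (1.0 - 1j * scale * t) ** (-shape)   # gamma characteristic function
max_gap = np.abs(ecf(t, household) - phi_individual**k).max()
print(f"max |ecf of totals - phi_X^k| on the grid: {max_gap:.4f}")
```

The relation $\varphi_Y = \varphi_X^k$ is what links aggregated observations back to the individual density, much as $\varphi_W = \varphi_X\varphi_\varepsilon$ does in the errors-in-variables model.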
This work focuses on introducing estimators for the different settings studied that
fit realistic datasets better than the classical approaches. For all estimators, their respective convergence
properties will be analysed and, in particular, optimal rates will be derived where possible.
To this end, all proved rates will be compared with the known rates that are optimal under additional
assumptions, and in one situation a minimax rate of convergence will be proved. Since
the interest is in asymptotic properties, data-dependent choices of the parameters will not be
addressed here.
Deconvolution of densities
As explained above, for many realistic datasets an errors-in-variables model is useful. Further
examples and justifications for such models can for instance be found in Carroll et al. [1995]. In
these models, the observable quantity is usually modeled as a random variable $W$, which can be
written as the sum of the random variable $X$ of interest and the error variable $\varepsilon$. Consequently,
the observable random variable is given by $W = X + \varepsilon$, where $X$ and $\varepsilon$ are assumed to be
independent. Hence, one can only observe a sample from the convolved distribution and, assuming
$X$ and $\varepsilon$ to be absolutely continuous, from the convolved density $f_W = f_X * f_\varepsilon$. Thus, if one is
interested in the density $f_X$ and $f_\varepsilon$ is required to be known, a deconvolution problem has to be
solved. Usually, these problems are easier to solve in the Fourier domain.
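A quick numerical check of $f_W = f_X * f_\varepsilon$: for the assumed choices $X \sim N(0,1)$ and $\varepsilon \sim N(0, 0.25)$, the sum $W$ is $N(0, 1.25)$, and a Riemann-sum convolution of the two densities on a grid should reproduce that density. This is only a sanity check under these Gaussian assumptions; `normal_pdf` is a hypothetical helper.

```python
import numpy as np

def normal_pdf(x, sd):
    """Density of N(0, sd^2)."""
    return np.exp(-0.5 * (x / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

dx = 0.01
x = np.arange(-1000, 1001) * dx   # symmetric grid on [-10, 10] containing 0
f_X = normal_pdf(x, 1.0)          # assumed target density: N(0, 1)
f_eps = normal_pdf(x, 0.5)        # assumed error density: N(0, 0.25)

# Riemann-sum approximation of (f_X * f_eps)(x); mode="same" keeps the
# output aligned with the (odd-length, centred) grid.
f_W = np.convolve(f_X, f_eps, mode="same") * dx
f_W_exact = normal_pdf(x, np.sqrt(1.0 + 0.25))   # W = X + eps ~ N(0, 1.25)
max_err = np.abs(f_W - f_W_exact).max()
print(f"max deviation: {max_err:.2e}")
```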
To find an estimator for $f_X$, one uses the fact that a convolution becomes a multiplication
in the Fourier domain, i.e. $\varphi_W(t) = \varphi_X(t)\varphi_\varepsilon(t)$, where $\varphi_W$, $\varphi_X$, and $\varphi_\varepsilon$ denote the
characteristic functions of the random variables $W$, $X$, and $\varepsilon$, respectively. Hence, if $\varphi_\varepsilon(t)$ is nonzero on the whole
real line, $\varphi_X$ can be evaluated from $\varphi_X(t) = \varphi_W(t)/\varphi_\varepsilon(t)$. A commonly used idea is
to find an estimator of $\varphi_W(t)$, denoted by $\hat\varphi_W(t)$. Afterwards, this estimator is divided by $\varphi_\varepsilon(t)$ and
Fourier inversion is applied to define an estimator for $f_X$. In general, the quotient $\hat\varphi_W(t)/\varphi_\varepsilon(t)$ is
not integrable, so some regularization technique for the inverse Fourier transform is needed. The
amount of necessary regularization depends heavily upon the behaviour of $\varphi_\varepsilon(t)$ as $|t|$ approaches
infinity, the so-called tail behaviour. In order to distinguish different tail behaviours, two
types of density classes are usually studied. First, for so-called supersmooth random variables
$\varepsilon$ the characteristic function $\varphi_\varepsilon$ is supposed to have exponential decay, see (1.2.6) below for
an exact definition. This exponential decay implies that $f_\varepsilon$ is infinitely differentiable, see
Theorem A.5.3 below. The second density class consists of so-called ordinary smooth
random variables $\varepsilon$, where the characteristic function $\varphi_\varepsilon$ is supposed to have polynomial decay,
see (1.2.7) below for an exact definition. Here, the decay implies the existence of finitely many
derivatives of $f_\varepsilon$, see again Theorem A.5.3.
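A minimal sketch of the Fourier approach just described, using a spectral cut-off $|t| \le 1/h$ as the regularization (one common choice, swapped in here for illustration; other regularizations exist). It assumes a known centred Laplace error with scale $b$, whose characteristic function $1/(1+b^2t^2)$ decays polynomially, so the error is ordinary smooth, and simulated normal $X$; all names and parameter values are hypothetical. The last lines compare how fast $1/\varphi_\varepsilon$ grows for this error and for a supersmooth normal error with the same scale.

```python
import numpy as np

def trapezoid(y, dx):
    """Trapezoidal rule on a uniform grid."""
    return dx * (y.sum() - 0.5 * (y[0] + y[-1]))

def laplace_cf(t, b):
    """Characteristic function of a centred Laplace law with scale b (ordinary smooth)."""
    return 1.0 / (1.0 + (b * t) ** 2)

def deconvolution_estimate(xi, w, h, b, m=801):
    """Spectral cut-off estimate of f_X(xi):
    (1/(2 pi)) * integral over |t| <= 1/h of exp(-i t xi) * phi_hat_W(t) / phi_eps(t) dt.
    The cut-off is the regularization: without it the integrand need not be integrable."""
    t = np.linspace(-1.0 / h, 1.0 / h, m)
    phi_hat_w = np.exp(1j * np.outer(t, w)).mean(axis=1)   # empirical char. function of W
    integrand = np.exp(-1j * t * xi) * phi_hat_w / laplace_cf(t, b)
    return trapezoid(integrand, t[1] - t[0]).real / (2.0 * np.pi)

# Assumed simulation: X ~ N(0, 1) is unobserved, only W = X + eps is available.
rng = np.random.default_rng(4)
n, b = 2000, 0.5
x = rng.standard_normal(n)
w = x + rng.laplace(0.0, b, size=n)
f0 = deconvolution_estimate(0.0, w, h=0.5, b=b)
print(f"estimate of f_X(0): {f0:.3f} (true value {1 / np.sqrt(2 * np.pi):.3f})")

# Growth of 1/phi_eps at t = 6: polynomial for the Laplace error,
# exp(b^2 t^2 / 2) for a normal N(0, b^2) error.
t_big = 6.0
growth_laplace = 1.0 / laplace_cf(t_big, b)
growth_normal = np.exp(0.5 * (b * t_big) ** 2)
```

The much faster growth of $1/\varphi_\varepsilon$ for the normal error forces a far stronger cut-off, which is the mechanism behind the slow rates for supersmooth errors.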
For the best attainable convergence rates, it is very important whether the error density is
ordinary smooth or supersmooth. In case of an ordinary smooth density $f_X$, the optimal rates
are algebraic if the known error density is also ordinary smooth, see Theorem 1.2.6 below for a
precise statement, whereas the best attainable rates in case of a known supersmooth error density
$f_\varepsilon$ are only logarithmic, see also Theorem 1.2.6. These logarithmic rates are rather unpleasant,
since many popular densities, such as the normal density, are supersmooth. The first
proofs of lower bounds for the rates were given in Fan [1991a]. More precisely, it is shown in this
reference that the convergence rates mentioned before are optimal for the estimation of a density
and its derivatives at a point $\xi$. The same result for $L^p$-norms over bounded intervals is shown
in Fan [1993]. Yet, in case the density $f_X$ is supersmooth, faster rates are attainable. There in
