Université Joseph Fourier de Grenoble (UJF)
Extraction of Three-dimensional Information from Images
Application to Computer Graphics
Extraction d’informations tridimensionnelles à partir d’images
Application à l’informatique graphique
Sylvain PARIS
Thesis presented for the degree of
Doctor of the Université Joseph Fourier
Specialty: Computer Science
Ministerial decrees of July 5, 1984 and March 30, 1992
Prepared in the laboratory
ARTIS-GRAVIR/IMAG-INRIA, UMR CNRS C5527.
Composition of the jury:
François SILLION   Thesis advisor
Georges-Pierre BONNEAU   Jury president
Wolfgang HEIDRICH   Reviewer
Bernard PÉROCHE   Reviewer
Long QUAN   Examiner

Contents
1 Introduction 1
2 Surface reconstruction 5
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.2 Previous work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.3 Problem statement and design of the functional . . . . . . . . . . . . . . . . . . . . 37
2.4 General presentation of graph cuts . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
2.5 Global discrete solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
2.6 Practical algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
2.7 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
2.8 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
3 Patchwork reconstruction 75
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
3.2 Motivation and concept definition . . . . . . . . . . . . . . . . . . . . . . . . . 76
3.3 Implementation using graph cut and distance field . . . . . . . . . . . . . . . . . . . 80
3.4 Two practical algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
3.5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
4 Face relighting 91
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
4.2 Previous work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
4.3 Overview of the technique . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
4.4 Detail texture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
4.5 Parameters of the skin model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
4.6 Implementation of the rendering engine . . . . . . . . . . . . . . . . . . . . . . . . 121
4.7 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
4.8 Conclusions and future work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
5 Capture of hair geometry 131
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
5.2 Previous work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
5.3 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
5.4 Orientation of the segments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
5.5 Practical implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
5.6 Captured hair results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154
5.7 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
5.8 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
6 Conclusions 163
6.1 Future work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
Appendices 167
A Technical details and perspective on surface reconstruction 167
A.1 First-order and second-order regularization terms . . . . . . . . . . . . . . . . . . . 167
A.2 Some ideas to extend the functional . . . . . . . . . . . . . . . . . . . . . . . . . . 170
B Technical details on hair capture 175
B.1 Influence of the Projection Profile on Canny’s Filter . . . . . . . . . . . . . . . . . . 175
B.2 More figures on the orientation measure . . . . . . . . . . . . . . . . . . . . . . . . 177
B.3 Geometric registration of the viewpoints . . . . . . . . . . . . . . . . . . . . . . . . 179
C Résumé français 181
C.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
C.2 Reconstruction de surface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 184
C.3 Reconstruction par patchwork . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 190
C.4 Ré-éclairage de visage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192
C.5 Capture de la géométrie d’une chevelure . . . . . . . . . . . . . . . . . . . . . . . . 196
C.6 Conclusion générale . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 200
List of Figures 205
List of Tables 209
Bibliography 211
My thanks go to everyone who contributed to this thesis.
I would not be here without the education my family has given me from the very beginning. Obviously,
there would have been no thesis either without François, whose qualities largely make up for the fact that
he never has a pen. It is a true stroke of luck to have as thesis advisor someone who supervised me
without confining me, who left me free in my choices without abandoning me. Thanks to all my colleagues
at the lab for the richness of their environment and for their countless helping hands. I especially wish
to thank Samuel, Stéphane and Sylvain: it was a real pleasure to go through these three years of thesis
alongside them; and also Gilles, Joëlle and Xavier, who saw me barge into their office more than
regularly and who always took the time to help me out and, above all, to teach me "the customs of
the field". A thought as well for all those who welcomed me warmly in Hong Kong, in particular Long.
Thanks, of course, to the DGA for funding my work, and especially to the members of the follow-up
committee for their wise advice. I am very proud to have had Wolfgang, Long, Bernard, Georges-Pierre
and François on my jury to conclude this thesis. I am particularly grateful to Wolfgang and Bernard for
their work on my manuscript, and to Wolfgang and Long for traveling from Vancouver and Hong Kong
for my defense. Finally, thanks to my friends: those who made life in Grenoble pleasant and those who
kept in touch even though I was far away.
I have chosen not to draw up an endless list of names, because I would inevitably have forgotten some.
But if you helped me get here, if we shared a discussion, a problem, a bug, a deadline, a tea, a beer,
a hike, an evening out, a rough patch, a fit of laughter... these thanks are for you.
1
Introduction
Ce manuscrit est en anglais ; un résumé français est proposé en annexe C, page 181.
This dissertation is in English; a French summary can be found in Appendix C on page 181.
This dissertation focuses on the creation of the information used in Computer Graphics. We
especially concentrate on the data needed to render images. These data can be roughly categorized
into three types: the shape of the objects, their appearance, and their lighting environment. We are
mainly interested in the creation of the first two data types (object shape and appearance), even
if we regularly deal with the light. This thesis proposes several methods to generate these data from
real images: we do not ask the user to work directly on the object modeling, but rely on her to
provide one or several images of the targeted object. These pictures are then automatically analyzed
by the computer, which extracts the sought data.
From this approach, we expect data that are more faithful to the original object and a shorter creation time.
Let us go into more detail with a practical example.
Consider a simple Computer Graphics task, say producing the data needed to render an image
of a teapot. The basic approach is to rely on human skills: the teapot shape is designed in software
dedicated to 3D modeling, and a porcelain appearance is defined through dialog boxes that let the user
choose the characteristics of the material (color, shininess, patterns, etc.). This modeling process is
especially time-consuming, since the user is in charge of creating everything. The quality of the
result also depends on the user’s proficiency. A faster approach is to select these data from a library of
previously created shapes and materials. It eases the current task, but it does not address the issue of
the creation of the original data within the library.
Once these data have been created, there are several well-known methods to render the teapot
image. This rendering step is out of the scope of this dissertation.
Modeling an existing object
But now, consider a trickier case. One does not want just any teapot anymore, but the teapot she uses every
day. A direct approach is to first create a standard teapot and then work from it to match the
original teapot. Producing an approximate match is feasible. But imagine that the teapot is decorated
with some carved patterns and that there are golden paintings over the base material. In that case,
an accurate match is almost intractable for the user. One has to use techniques more sophisticated than
user-driven modeling.
To capture the shape of an object, one can use a 3D scanner: an apparatus with a laser or a
beam of modulated light that measures the 3D shape of an object. The acquisition of the material can
also be done using a gonioreflectometer [221]: the interaction of the material with light is measured
for all possible light positions and view directions. Both techniques are accurate but require specific
equipment and configuration, which often hinders their use. In addition, several
cases defeat these measurements. For instance, 3D scanners suffer from translucent and dispersive
materials (e.g. glass, fur, hair).
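To illustrate the measurement principle only (this is not the apparatus used in this dissertation), a gonioreflectometer essentially tabulates reflectance for every (light, view) configuration. The sketch below sweeps a grid of coplanar light and view elevations over a hypothetical Phong-style material standing in for the physical sample; the material model and grid resolution are illustrative assumptions.

```python
import math

def phong_brdf(n_dot_l, n_dot_h, kd=0.6, ks=0.3, shininess=32.0):
    """Toy analytic material standing in for a physical sample."""
    if n_dot_l <= 0.0:
        return 0.0
    return kd / math.pi + ks * (max(n_dot_h, 0.0) ** shininess)

def measure_brdf(steps=8):
    """Tabulate reflectance over a grid of light/view elevation angles,
    mimicking a gonioreflectometer sweeping both directions."""
    table = {}
    for i in range(steps):
        for j in range(steps):
            theta_l = (i + 0.5) * (math.pi / 2) / steps  # light elevation
            theta_v = (j + 0.5) * (math.pi / 2) / steps  # view elevation
            # half-vector elevation for coplanar light/view directions
            theta_h = 0.5 * (theta_l + theta_v)
            table[(i, j)] = phong_brdf(math.cos(theta_l), math.cos(theta_h))
    return table

samples = measure_brdf()
print(len(samples))  # one reflectance value per (light, view) pair
```

A real device samples full 4D direction pairs (and wavelengths), which is precisely why the acquisition is so heavy: the table grows with the fourth power of the angular resolution.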
Our proposal
In this dissertation, we present an alternative approach based on images. For the teapot problem,
we propose to use one or several photographs of the original teapot: an analysis of these images
produces the data needed to reproduce it. Compared to a user-driven approach, we expect several
improvements:
Shorter user time: Obviously, an image-based algorithm requires less user time. The overall process may
last longer because of numerical computation; however, during this step, the user can work on other tasks.
Objective characterization of the data: If one asks several persons to determine the shininess of the
same teapot, the answers are likely to differ. The different values result from different
subjective evaluations, and it would be hard to choose the right one. On the other hand, an
algorithm has a deterministic criterion whose accuracy can be studied. In addition, the measure
is reproducible.
More details: When it comes to reproducing an existing object, the user may miss some details and
features, whereas an automatic reconstruction performs an exhaustive scan of the images.
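As a toy illustration of what a deterministic, reproducible criterion means (this is not an algorithm from this dissertation), consider estimating the albedo of a Lambertian surface from pixel intensities with known shading. A least-squares fit always returns the same value on the same data, whereas a human guess would not; the data below are fabricated for the example.

```python
def estimate_albedo(intensities, shading):
    """Least-squares fit of I = albedo * S: a deterministic criterion,
    so the estimated albedo is reproducible, unlike a subjective guess."""
    num = sum(i * s for i, s in zip(intensities, shading))
    den = sum(s * s for s in shading)
    return num / den

# Noisy observations of a surface whose true albedo is 0.75.
shading = [0.2, 0.4, 0.6, 0.8, 1.0]
intensities = [0.16, 0.29, 0.46, 0.59, 0.74]
albedo = estimate_albedo(intensities, shading)
print(round(albedo, 3))
```

Because the criterion is explicit (sum of squared residuals), its accuracy under a given noise model can be studied analytically, which is exactly the property subjective evaluation lacks.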
However, this does not mean that we aim at replacing user-driven creation. We propose a com-
plementary way to produce 3D data. Our approach is compatible with the user-driven and library-based
creation processes: the shape stemming from the images can be further edited by the user and/or stored
in a shape library for later reuse.
Robustness
Unfortunately, we believe that perturbation is inherent in the capture process. The images are in-
evitably corrupted by some noise, blur, distortion, etc. The question is how we handle this fact. There
are two extreme solutions. On the one hand, we can allow the user to shoot images without any con-
straints. This implies that the effort must be made during the analysis step: the underlying algorithm
must be robust enough to extract satisfying data. The counterpart is an easy acquisition process
accessible to a non-specialist user. On the other hand, the capture setup can be totally constrained
(e.g. a dark room, a high-quality calibrated lens, a professional camera, a robotic gantry, etc.). This
requires a specialist to drive the process, but the input data can be considered "perfect". It allows the
algorithm to almost ignore the robustness issue and to focus on precision. Some researchers have
described intermediate solutions that impose a limited set of constraints on the user in order to extract
more accurate data. This defines a continuous trade-off between ease of acquisition and accuracy.
In this dissertation, we have deliberately chosen an approach that is oriented toward ease
of acquisition. This does not mean that accuracy is neglected. It simply implies that we strive to design
robust algorithms: we expect them to cope with potentially perturbed input in order to alleviate the
requirements on the user side. It also implies that the accuracy may not always be comparable to
what can be extracted from a highly controlled environment. Nonetheless, we expect the following
advantages:
A less cumbersome acquisition process: The input data come from a standard digital still or movie
camera. Nowadays, quality cameras of small size are commonly available. In addition,
if we need a dark room, we strive to be robust enough to work with a common room with only
the lights turned off. We will not require a room with black matte walls and a blue screen.
A more flexible acquisition system: From the same images, we can run different analyses depending
on the observed content, whereas dedicated apparatuses have to be entirely changed when they
do not fit the targeted object.
A better correspondence with input images: In some cases, we may be able to work with an arbitrary
background. This makes it possible to work with images of an object within its context. If the task
is to modify the original pictures (e.g. change the appearance of an object, insert an additional
object), the data obtained directly from such images are likely to be more consistent than those
stemming from a technique that separates the object from its environment (e.g. the latter may
suffer from misalignment).
A better balance of the extracted details: Acquisition from images is more likely to capture the visually
important features, whereas other techniques (e.g. laser scanners) may recover unnoticeable
details and miss others that are small but have a large visual impact.
Our approach: case study
Then comes a crucial question: what do we aim for? What kind of information do we extract? It is our
personal conviction that the general case is not tractable: robustly acquiring the geometry and
the appearance of an object without any a priori knowledge cannot be done. We believe that a reason-
able approach is to focus on a typical scenario for which we can rely on some known characteristics.
One can then imagine developing various scenarios and letting the user select the appropriate algorithm. The
risk would be multiplying the number of scenarios, but as long as we study scenarios of broad interest,
we are convinced that this approach is valid and efficient.
Therefore, we have chosen this approach in this dissertation. We identify a few useful cases
leading to interesting applications and concentrate on them. We address three main issues. We first
present in Chapter 2 a method to recover the surface of a matte object from a short image sequence
with a moving viewpoint. This approach is extended to a more general set of images in Chapter 3.
This technique is designed to be generic, i.e. we avoid specific assumptions in order to handle objects
that are as general as possible.
We then show in Chapter 4 how the appearance of a human face can be recovered from a single
image and how the extracted data can be used to render the original face under a new lighting envi-
ronment. We end this document with Chapter 5, which exposes a technique to capture hair geometry
using multiple images from a fixed viewpoint and a moving light. These last two cases (face and
hair) target specific entities that are truly important characteristic features of a person. High accuracy
is mandatory to provide a visual match that makes the original person recognizable. We therefore
develop dedicated tools to reach a precision higher than that of user-based and generic techniques. General
conclusions are given in the last chapter.