Automatic hand tracking from monocular video sequences, Model-based 3D hand pose estimation from monocular video

Thesee - Martin De La Gorce

229 pages

English

Description

Supervised by Nikos Paragios
Thesis defended on 14 December 2009, Ecole Centrale Paris
This thesis presents two methods for automatically obtaining a three-dimensional description of the motion of a hand from a monocular video sequence of that hand. Using the information provided by the video, the goal is to determine the full set of kinematic parameters needed to describe the spatial configuration of the different parts of the hand. This set of parameters consists of the angle of each joint together with the global position and orientation of the wrist. The problem is difficult: the hand has many degrees of freedom and self-occlusions are ubiquitous, which makes it hard to estimate the configuration of partially or fully hidden parts. The two new methods proposed in this thesis improve on the state of the art for this problem in certain respects. Both rely on a hand model whose spatial configuration is adjusted so that its projection into the image matches the observed image of the hand as closely as possible. This process is guided by a cost function that gives a quantitative measure of how well the projected model aligns with the observed image. The model is fitted by iterative refinement with a quasi-Newton gradient-descent scheme that minimizes this cost function. The two proposed methods differ mainly in the choice of model and cost function. The first method relies on a hand model made of ellipsoids and on a cost function built from a statistical model of the color distributions of the hand and of the image background. The second method relies on a triangulated model of the hand surface that is textured and shaded; its cost function directly measures, pixel by pixel, the difference between the observed image and the synthetic image obtained by projecting the hand model into the image.
When computing the gradient of the cost function, particular attention was paid to the terms arising from changes in surface visibility near self-occlusions, terms that were neglected in previous methods. Unfortunately, neither method runs in real time, which for now rules out their use for human-computer interaction. Faster computers combined with improvements to these methods might eventually make real-time performance possible.
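The role of visibility terms can be seen in one dimension. The toy setting below is an assumption for illustration, not the thesis's formulation. An occluding part of constant color `C_FG` covers the interval [0, x) of a 1D image, the background color `C_BG` covers the rest, and the cost is the integrated squared difference to a hypothetical observed profile `observed(u)`. Differentiating the integrand pixel by pixel gives zero, because away from the boundary the synthetic image does not depend on x. The entire derivative comes from the moving boundary: (I(x) - C_FG)^2 - (I(x) - C_BG)^2. A point-sampled rendering, by contrast, yields a piecewise-constant cost whose finite-difference gradient is zero almost everywhere.

```python
# Toy 1D illustration (hypothetical setup): an occluding part of constant color
# covers [0, x) and the background covers [x, 1].
C_FG, C_BG = 1.0, 0.0


def observed(u):
    """Hypothetical observed intensity profile on [0, 1]."""
    return u


def midpoint_integral(f, a, b, n=2000):
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


def continuous_cost(x):
    """Integrated squared error, integrating each visible region separately
    so that the boundary position x enters the cost continuously."""
    inside = midpoint_integral(lambda u: (observed(u) - C_FG) ** 2, 0.0, x)
    outside = midpoint_integral(lambda u: (observed(u) - C_BG) ** 2, x, 1.0)
    return inside + outside


def boundary_term(x):
    """Derivative contributed by the moving visibility boundary: as x grows,
    the point u = x switches from background to foreground."""
    return (observed(x) - C_FG) ** 2 - (observed(x) - C_BG) ** 2


def pixelated_cost(x, n_pixels=64):
    """Point-sampled rendering: each pixel is entirely foreground or background."""
    centers = [(i + 0.5) / n_pixels for i in range(n_pixels)]
    return sum((observed(c) - (C_FG if c < x else C_BG)) ** 2 for c in centers) / n_pixels


def central_difference(f, x, h=1e-4):
    return (f(x + h) - f(x - h)) / (2 * h)


x = 0.3
print(boundary_term(x))                        # analytic boundary term, equal to 1 - 2x here
print(central_difference(continuous_cost, x))  # agrees with the boundary term
print(central_difference(pixelated_cost, x))   # no pixel center crosses x: 0.0
```

A gradient method that only differentiates pixel values would see zero here and never move the boundary. Accounting for the boundary term is what makes the continuous cost usable.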
-Hand tracking
-Articulated model
-Occlusions
In this thesis we propose two methods that automatically recover a full description of the 3D motion of a hand given a monocular video sequence of this hand. Using the information provided by the video, our aim is to determine the full set of kinematic parameters required to describe the pose of the skeleton of the hand. This set of parameters is composed of the angles associated with each joint and the global position and orientation of the wrist. This problem is extremely challenging. The hand has many degrees of freedom and self-occlusions are ubiquitous, which makes it difficult to estimate the pose of occluded or partially occluded hand parts. In this thesis, we introduce two novel methods of increasing complexity that improve, to a certain extent, the state of the art for the monocular hand tracking problem. Both are model-based methods built on a hand model that is fitted to the image. This process is guided by an objective function that measures how well the projection of the hand model, for a given set of parameters, matches the image. The fitting is achieved through an iterative refinement technique based on gradient descent that aims at minimizing the objective function. The two methods differ mainly in the choice of the hand model and of the cost function. The first method relies on a hand model made of ellipsoids and a simple discrepancy measure based on global color distributions of the hand and the background. The second method uses a triangulated surface model with texture and shading and exploits a robust distance between the synthetic and observed images as its discrepancy measure. When computing the gradient of the discrepancy measure, particular attention is given to terms related to changes of visibility of the surface near self-occlusion boundaries, which are neglected in existing formulations. Our hand tracking method is not real-time, so interactive applications are not yet possible. Increases in computing power and improvements to our method might make real-time performance attainable.
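One common way to build a discrepancy measure from global color distributions is assumed here for illustration; the abstract does not give the thesis's exact likelihood. The approach learns color histograms for hand and background pixels. It then scores a candidate silhouette by the negative log-likelihood of the image, explaining pixels inside the silhouette with the hand histogram and the rest with the background histogram. The bin count `BINS`, the smoothing constant `alpha` and the synthetic test image are hypothetical. In practice the histograms would be learned from a labeled frame rather than from the true mask of the frame being scored.

```python
import numpy as np

BINS = 8  # hypothetical quantization: 8 bins per RGB channel


def color_index(img):
    """Map an (H, W, 3) uint8 image to one histogram bin index per pixel."""
    q = (img.astype(np.int64) * BINS) // 256
    return (q[..., 0] * BINS + q[..., 1]) * BINS + q[..., 2]


def fit_histogram(img, mask, alpha=1.0):
    """Normalized color histogram of the pixels selected by mask, with additive smoothing."""
    counts = np.bincount(color_index(img)[mask], minlength=BINS ** 3).astype(float)
    counts += alpha
    return counts / counts.sum()


def silhouette_cost(img, mask, p_hand, p_bg):
    """Negative log-likelihood: pixels inside mask are drawn from p_hand,
    pixels outside from p_bg."""
    idx = color_index(img)
    return -(np.log(p_hand[idx[mask]]).sum() + np.log(p_bg[idx[~mask]]).sum())


# Synthetic test image: a skin-like rectangle on a bluish background, plus noise.
rng = np.random.default_rng(0)
img = np.empty((40, 40, 3), np.uint8)
img[:] = (40, 60, 200)
true_mask = np.zeros((40, 40), bool)
true_mask[10:30, 12:28] = True
img[true_mask] = (210, 150, 120)
noise = rng.integers(-20, 21, size=img.shape)
img = np.clip(img.astype(int) + noise, 0, 255).astype(np.uint8)

p_hand = fit_histogram(img, true_mask)
p_bg = fit_histogram(img, ~true_mask)
shifted = np.roll(true_mask, 5, axis=1)  # a misaligned candidate silhouette
cost_true = silhouette_cost(img, true_mask, p_hand, p_bg)
cost_shifted = silhouette_cost(img, shifted, p_hand, p_bg)
print(cost_true, cost_shifted)
```

Because each pixel is either inside or outside the mask, this cost jumps as the silhouette moves across pixel boundaries. A gradient-based fit needs a continuous version of it.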
-Hand tracking
-Deformable model
-Model-based shape from shading
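The second method compares a textured, shaded rendering with the observed image using a robust pixel-wise distance. The abstract does not name the robust function or the shading model, so the sketch below assumes a Huber penalty and Lambertian shading with an ambient term. In place of pose fitting, it estimates a single global intensity gain by grid search, a toy stand-in. About 10% of the observed pixels are replaced by values the model cannot explain, such as another object covering the hand. The robust cost keeps the estimate near the true gain, while the squared error is pulled far off. The normals, albedo, light direction, `delta` and outlier value are all hypothetical.

```python
import numpy as np


def lambertian(albedo, normals, light, ambient=0.2):
    """Per-pixel Lambertian shading with an ambient term (assumed model)."""
    l = np.asarray(light, dtype=float)
    l = l / np.linalg.norm(l)
    return albedo * (ambient + np.clip(normals @ l, 0.0, None))


def huber(r, delta):
    """Quadratic for |r| <= delta, linear beyond: large residuals have bounded influence."""
    a = np.abs(r)
    return np.where(a <= delta, 0.5 * r ** 2, delta * (a - 0.5 * delta))


def robust_image_cost(synthetic, observed, delta=0.1):
    """Sum of Huber penalties over all pixels."""
    return float(huber(synthetic - observed, delta).sum())


rng = np.random.default_rng(1)
normals = rng.normal(size=(32, 32, 3))
normals[..., 2] = np.abs(normals[..., 2]) + 0.1  # make every normal face the camera
normals /= np.linalg.norm(normals, axis=-1, keepdims=True)
albedo = np.full((32, 32), 0.8)
rendered = lambertian(albedo, normals, light=(0.0, 0.0, 1.0))

# Observed image: the rendering itself, except ~10% of pixels replaced by
# values the model cannot explain.
observed = rendered.copy()
outliers = rng.random(rendered.shape) < 0.1
observed[outliers] = 5.0

# Estimate an unknown global gain g (true value 1.0) under both costs.
gains = np.linspace(0.5, 2.0, 1501)
g_robust = gains[np.argmin([robust_image_cost(g * rendered, observed) for g in gains])]
g_l2 = gains[np.argmin([np.sum((g * rendered - observed) ** 2) for g in gains])]
print(g_robust, g_l2)
```

The linear tail of the Huber penalty caps how much any single pixel can pull the estimate. This matters when parts of the hand image are not explained by the model.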
Source: http://www.theses.fr/2009ECAP0045/document

Information

Published by	Thesee
Language	English
File size	9 MB

Excerpt

ECOLE CENTRALE DE PARIS
PhD THESIS
to obtain the title of
PhD of Science
of Ecole Centrale de Paris
Specialty: Applied Mathematics
Defended by
Martin de La Gorce
Model-based 3D Hand Pose Estimation from Monocular Video
Thesis Advisor: Nikos Paragios
prepared at Ecole Centrale de Paris, MAS laboratory
defended on December 14, 2009
Jury:
Reviewers: Dimitri Metaxas - Rutgers University
Pascal Fua - EPFL
Advisor: Nikos Paragios - Ecole Centrale de Paris
Examiners: Radu Patrice Horaud - INRIA
Renaud Keriven - Ecole des Ponts ParisTech
Adrien Bartoli - Université d'Auvergne
Bjorn Stenger - Toshiba Research
Invited: David Fleet - University of Toronto
tel-00619637, version 1 - 6 Sep 2011
Acknowledgments
I would like to thank the people who helped me while preparing my PhD over these last four years.
Thanks to my PhD advisor, Nikos Paragios, who was supportive and had confidence in my work. He gave me the freedom I needed, suggested very relevant directions, and was invaluable in helping me communicate my research results and give them visibility. I also greatly appreciated his encouragement to establish international collaborations by visiting other prestigious research centers.
Thanks to David Fleet for the extremely productive collaboration we started during a two-month visit to the University of Toronto. While discussing the results I obtained with the method presented in the third chapter of this manuscript, he made a simple but sound remark that motivated the direction taken in the fourth chapter: “if you needed to add shading onto your hand model to get a good visualization of your results, that means that you should add shading in the generative model you use for the tracking”. I deeply appreciated his striving for perfection in the explanation of scientific ideas and his enthusiasm for discussing in-depth technical aspects as well as the great challenges of the field in general. His help has also been invaluable in writing this manuscript.
I am grateful to my thesis reviewers, Dimitri Metaxas and Pascal Fua, for having kindly agreed to review this work. I appreciated their valuable comments.
Thanks to Radu Patrice Horaud, Renaud Keriven, Adrien Bartoli and Bjorn Stenger for examining it and for the constructive discussion during the thesis defense.
Thanks to all the current and former members of the Medical Imaging and Computer Vision Group at the Applied Mathematics Department of Ecole Centrale for the friendly international environment and the great working atmosphere. In particular, I would like to thank Chaohui Wang and Mickaël Savinaud for our fruitful collaboration. Thanks to Noura and Regis for their moral support. Thanks to my friends Geoffray and Romain for confirming, by their example, the idea that having hobbies is a necessary condition for doing a good PhD thesis. And finally, thanks to all my other friends and my family members who have supported me during these four years.
Index
1 Introduction
1.1 General introduction
1.2 Applications of Hand tracking
1.2.1 Animation
1.2.2 Quantitative Motion analysis
1.2.3 Sign Language Recognition
1.2.4 2D Human-Computer interaction
1.2.5 3D Human-Computer interaction
1.2.6 The hand as a high-DOF control device
1.3 Hand Pose Estimation Scientific Challenges
1.4 Contributions & outline
1.4.1 Part-based Hand Representation and Statistical Inference
1.4.2 Triangular Mesh with Texture & Shading
1.4.3 Outline
2 State of the art
2.1 Acquisition framework
2.1.1 Monocular setting
2.1.2 Small baseline stereo setting
2.1.3 Wide baseline setting
2.1.4 Other settings
2.1.5 Markers
2.1.6 Context
2.2 Model-based tracking
2.2.1 General principle
2.2.2 Hand models
2.2.3 Image features
2.2.4 Fitting procedures
2.3 Discriminative methods / Learning-Based Methods
2.3.1 Principle
2.3.2 Database indexing methods
2.3.3 Regression techniques
2.4 Other approaches
2.5 Related literature
2.5.1 Non-optical systems
2.6 Limitations of existing methods
3 Silhouette Based Method
3.1 Method overview
3.2 Articulated model
3.2.1 Forward kinematics
3.2.2 Forward Kinematics Differentiation
3.2.3 Hand anatomy terms
3.2.4 The hand skeleton model
3.2.5 Linear constraints on joint angles
3.2.6 Model calibration
3.3 Hand surface model and projection
3.3.1 Surface model
3.3.2 Camera model
3.3.3 Ellipsoid projection
3.3.4 Convex polytope projection
3.3.5 Filled ellipses/polygons union
3.3.6 Intersecting two ellipses
3.3.7 Intersecting an ellipse with a polyline
3.3.8 Intersecting boundaries of two polygons
3.4 Matching cost
3.4.1 Generative color models
3.4.2 The discontinuous likelihood
3.4.3 The continuous likelihood
3.5 Numerical computation of the matching cost
3.5.1 Line segments
3.5.2 Ellipsoid arcs
3.5.3 Approximating filled ellipses by polygons
3.6 The Matching Cost Derivatives
3.6.1 Differentiation of the polytope transformation and projection
3.6.2 Differentiation of the ellipsoid transformation and projection
3.6.3 Differentiation of ellipses to convex polygons conversion
3.6.4 Differentiation of segment intersections
3.6.5 Force on silhouette vertices
3.6.6 Second-order derivatives
3.7 Pose estimation
3.7.1 Sequential Quadratic Programming with BFGS update
3.7.2 Variable metric descent
3.7.3 Trust-Region method
3.7.4 Comparing the three Optimization methods
3.7.5 Exact versus Approximate Matching cost and derivatives
3.7.6 Smart Particle Filtering
3.8 Discussion
3.8.1 Validation
3.8.2 Summary
4 Method with texture & shading
4.1 Overview
4.2 Hand geometry
4.2.1 The choice of triangulated surface
4.2.2 Linear Blend Skinning