Contribution à l’étude et implantation de systèmes intelligents modulaires auto-organisateurs, Contribution to the Study and Implementation of Intelligent Modular Self-organizing Systems

Thesee - Ivan Budnyk

Découvre YouScribe en t'inscrivant gratuitement

Je m'inscris

Obtenez un accès à la bibliothèque pour le consulter en ligne
En savoir plus

236 pages

English

Obtenez un accès à la bibliothèque pour le consulter en ligne
En savoir plus

A propos
Informations
Extrait

Description

Sous la direction de Kurosh Madani
Thèse soutenue le 08 décembre 2009: Paris Est
Les problèmes de la classification ont reçu une attention considérable dans des différents champs d’ingénierie comme traitement des images biomédicales, identification a partir de la voix, reconnaissance d'empreinte digitale etc. Les techniques d’intelligence artificielles, incluant les réseaux de neurones artificiels, permettent de traiter des problèmes de ce type. En particulier, les problèmes rencontrés nécessitent la manipulation de bases de données de tailles très importantes. Des structures de traitement adaptatives et exploitant des ensembles de classificateurs sont utilisées. Dans cette thèse, nous décrivons principalement le développement et des améliorations apportées à un outil de classification désigné par le terme Tree-like Divide to Simplify ou T-DTS. Nos efforts se sont portés sur l’un des modules de cet outil, le module d’estimation de complexité. L’architecture de l’outil T-DTS est très flexible et nécessite le choix d’un nombre important de paramètres. Afin de simplifier l’exploitation de T-DTS, nous avons conçu et développé une procédure automatique d’optimisation d’un de ces plus importants paramètres, le seuil de décision associé à la mesure de complexité. La contribution principale de cette thèse concerne le développement de modules pouvant s’implanté sur une architecture de calcul matérielle parallèle. Ce ceci permet de se rapproché d’une implantation purement matérielle de l’outil T-DTS
-Traitement de l’information
-Systèmes complexes
-Intelligence artificielle
-Systèmes d’apprentissages artificiels modulaires
-Cassification
-Estimation de la complexité
Classification problems deal with separating group of objects into sets of smaller classes; this set of problems have received considerable attention in diverse engineering fields such as biomedical imaging, speaker identification, fingerprint recognition, etc. Several effective approaches for automated classification were suggested based on artificial intelligence techniques, including neural networks. Still, one of the major challenges faced by these approaches is a large scale of data required for successful classification. In this thesis, we explore a possible solution to this problem based on a module-based Tree-like Divide to Simplify (T-DTS) classification model. We focus on enhancing the key module of this approach - complexity estimation module. Furthermore, we provide an automated procedure for optimizing key complexity estimation parameters of the T-DTS model; this considerably improves usability and allows for a more effective configuration of decomposition reasoning of the approach. Another major contribution of this work employs further development of T-DTS modules that could be implemented using parallel computer architecture, thereby allowing T-DTS to utilize an underlying hardware to the fullest extent
-Information processing
-Complex systems
-Artificial intelligence
-Modular artificial learning systems
-Classification
-Complexity estimating
Source: http://www.theses.fr/2009PEST0013/document

Sujets

Traitement de l'information

Système complexe

Intelligence artificielle

Artificial Intelligence

Informations

Publié par	Thesee
Nombre de lectures	112
Langue	English
Poids de l'ouvrage	3 Mo

Extrait

Thèse Présentée pour l’obtention du titre de DOCTEUR DE L’UNIVERSITÉ PARIS-EST Spécialité: Sciences Informatiques par Ivan BUDNYK Contribution à l’Étude et Implantation de Systèmes Intelligents Modulaires Auto-Organisateurs Soutenue publiquement 8 décembre 2009 devant la commission d’examen composée de

Rapporteur Prof. Gilles BERNARD Université Paris 8 Rapporteur Prof. Vladimir GOLOVKO Brest State Technical University Examinateur Prof. Hichem MAAREF Université d’Évry-Val d’Essonne Examinateur Dr. Abdennasser CHEBIRA Université Paris-Est (Paris 12) Invité Dr. Semen GOROKHOVSKYI Université nationale de Kiev-Mohyla-Académie Directeur de thèse Prof. Kurosh MADANI Université Paris-Est (Paris 12) LISSI Laboratoire Images, Signaux et Systèmes Intelligents EA 3956 Université Paris-Est (Paris 12), l’IUT Sénart-Fontainebleau, Département GEII Bât A., Av. P. Point, F-77127 Lieusaint ; Tél : +33(0)164134486, Fax : +33(0)164134503, http://lissi.univ-paris12.fr

Thesis Presented to obtain the degree of DOCTOR OF UNIVERSITY PARIS-EST Topic: Computer Sciences by Ivan BUDNYK Contribution to the Study and Implementation of Intelligent Modular Self-organizing Systems Defended on 8 December 2009 in presence of commission composed by

Rapporteur Prof. Gilles BERNARD University Paris 8 Rapporteur Prof. Vladimir GOLOVKO Brest State Technical University Examiner Prof. Hichem MAAREF University of Évry Val d’Essonne Examiner Dr. Abdennasser CHEBIRA University Paris-Est (Paris 12) Invited Dr. Semen GOROKHOVSKYI National University of Kyiv-Mohyla Academy Doctorate director Prof. Kurosh MADANI University Paris-Est (Paris 12) LISSI Laboratory of Images, Signals and Intelligent Systems EA 3956 University Paris-Est (Paris 12), IUT Senart-Fontainebleau, Department GEII Bat A., Av. P. Point, F-77127 Lieusaint ; Tel : +33(0)164134486, Fax : +33(0)164134503, http://lissi.univ-paris12.fr

Abstract Classification problems deal with separating group of objects into sets of smaller classes; this set of problems have received considerable attention in diverse engineering fields such as biomedical imaging, speaker identification, fingerprint recognition, etc. Several effective approaches for automated classification were suggested based on artificial intelligence techniques, including neural networks. Still, one of the major challenges faced by these approaches is a large scale of data required for successful classification. In this thesis, we explore a possible solution to this problem based on a module-based Tree-like Divide to Simplify (T-DTS) classification model. We focus on enhancing the key module of this approach -complexity estimation module. Furthermore, we provide an automated procedure for optimizing key complexity estimation parameters of the T-DTS model; this considerably improves usability and allows for a more effective configuration of decomposition reasoning of the approach. Another major contribution of this work employs further development of T-DTS modules that could be implemented using parallel computer architecture, thereby allowing T-DTS to utilize an underlying hardware to the fullest extent. Key words: Information processing, complex systems, artificial intelligence, modular artificial learning systems, classification, complexity estimating. Résume Les problèmes de la classification ont reçu une attention considérable dans des différents champs dingénierie comme traitement des images biomédicales, identification a partir de la voix, reconnaissance d'empreinte digitale etc. Les techniques dintelligence artificielles, incluant les réseaux de neurones artificiels, permettent de traiter des problèmes de ce type. En particulier, les problèmes rencontrés nécessitent la manipulation de bases de données de tailles très importantes. Des structures de traitement adaptatives et exploitant des ensembles de classificateurs sont utilisées. Dans cette thèse, nous décrivons principalement le développement et des améliorations apportées à un outil de classification désigné par le terme Tree-like Divide to Simplify ou T-DTS. Nos efforts se sont portés sur lun des modules de cet outil, le module destimation de complexité . Larchitecture de loutil T-DTS est très flexible et nécessite le choix dun nombre important de paramètres. Afin de simplifier lexploitation de T-DTS, nous avons conçu et développé une procédure automatique doptimisation dun de ces plus importants paramètres, le seuil de décision associé à la mesure de complexité. La contribution principale de cette thèse concerne le développement de modules pouvant simplanté sur une architecture de calcul matérielle parallèle. Ce ceci permet de se rapproché dune implantation purement matérielle de loutil T-DTS. Mots clés: Traitement de linformation, systèmes complexes, intelligence artificiel, systèmes dapprentissages artificiels modulaires, classification, estimation de la complexité .

Acknowledgement I would like to express my deepest gratitude to Prof. Kurosh Madani for his continuous support and patience. His help and guidance throughout my studies at Laboratoire Images Signaux et Systèmes Intelligents (LISSI / EA 3956) were invaluable. His experience was truly of great benefit to me and his advice in academia and otherwise has undoubtedly left a mark on me. I would also like to thank Dr. Abdennasser Chebira for his valuable input and meticulous efforts in reviewing much of this research, as well as his patience while I was preparing this thesis. I would also like to thank my thesis committee members Prof. Gilles Bernard, Prof. Vladimir Golovko and Prof. Hichem Maaref for the useful comments, suggestions, and the references. I am grateful to the faculty members of lIUT de Sénart including Dr. Veronique Amarger, Dr. Christophe Sabourin and Dr. Amine Chohra for their assistance. My deepest thanks also go to the members and ex-members of LISSI including Dr. Nadia Kanaoui, Dr. El Khier Sofiane Bouyoucef, Dr. Mariusz Rybnik, Dr. Lamine Thiaw, Dr. Matthieu Voiry, Weiwei Yu, Dalel Kanzari, Ting Wang, Dominik Maximilián Ramík and Arash Bahrammirzaee. I also thank Frances leading international exchange operator ÉGIDE, who gave me the initial opportunity to pursue research in France. Numerous other people and lifelong friends with whom I have worked and lived in France, Czech Republic and Ukraine, deserve many thanks as well, especially Dr. Maksym Petrenko and Dr. Semen Gorokhovskyi. Finally, I would like to thank my parents, Georgiy and Ganna Budnyk, because without them I would not be the person I am today.

Table of contents

List of tables............................................................................................................................v List of figures ........................................................................................................................ vi Index of symbol......................................................................................................................x General introduction ...............................................................................................................1 I State-of-the-art of classification approaches ........................................................................6 I.1 Concepts of classification............................................................................................7 I.2 Clustering methods ......................................................................................................8 I.3 Main classification methods ......................................................................................17 I.4 T-DTS (Tree-like Divide To Simplify)approach.......................................................37 I.5 Conclusion .................................................................................................................42 II Complexity concepts .........................................................................................................44 II.1 Introduction to complexity concepts ........................................................................44 II.2 Computational complexity measurement.................................................................48 II.3 ANN-structure based classification complexity estimator.......................................75 II.4 Conclusion................................................................................................................85 III T-DTS software architecture............................................................................................86 III.1 T-DTS concept: self-tuning procedure ...................................................................94 III.2 T-DTS software architecture and realization ........................................................101 III.3 Conclusion............................................................................................................118 IV Validation aspects ..........................................................................................................120 IV.1 ANN-structure based complexity estimators........................................................120 IV.1.1 Hardware-based validation .........................................................................121 IV.1.2 Software-based validation ...........................................................................133 IV.1.3 Summary......................................................................................................149 IV.2 T-DTS ...................................................................................................................152 IV.2.1 ANN-structure based complexity estimator validation ...............................152 IV.2.2 T-DTS self-tuning procedure validation......................................................160 IV.2.3 Summary......................................................................................................172 IV.3 Conclusion ............................................................................................................173 General conclusion and perspectives ..................................................................................175

iii

Appendixes .........................................................................................................................179 A The list of publication ...............................................................................................180 B Approaches to defining complexity ..........................................................................181 B.1 Defining complexity. Genesis of the concept complexity ...............................182 B.2 Systems attributes and complexity..................................................................183 C Neural Networks in hardware...................................................................................187 C.1 IBM© ZISC®-036 Neurocomputer .................................................................189 C.2 Systems attributes and complexity..................................................................193 Bibliography.......................................................................................................................197

List of tables II.1 Advantages and disadvantages of complexity estimating techniques............................72 IV.1 Benchmarks complexity rates obtained using IBM© ZISC®-036 implementation of ANN structure based complexity estimator ........................................................................123 IV.2 Complexity rates obtained for Splice-junction DNA classification problem (original database) using IBM© ZISC®-036 Neurocomputer ..........................................................132 IV.3 Complexity rates obtained for Splice-junction DNA classification problem (re-encoded database) using ANN-structure based and other applications ..............................147 IV.4 Complexity rates obtained for Tic-tac-toe endgame classification problem using sixteen classification complexity criteria including ANN-structure based complexity estimator..............................................................................................................................148 IV.5 Classification results: Four spiral benchmark, two classes, generalization database size 500 prototypes, learning database size 500 prototypes ...............................................163 IV.6 Classification results: Tic-tac-toe endgame classification problem ...........................167 IV.7 Classification results: Splice-junction DNA sequences classification problem, three classes, generalization and learning database size 1595 prototypes ...................................170 IV.8 Consolidation of classification results: Splice-junction DNA sequences classification problem ...............................................................................................................................171

List of figures I.1 SVM: space mapping using linear hyperplane ................................................................21 I.2 SVM: space mapping using different space kernel functions Φ .....................................22 I.3 General block scheme diagram of the T-DTS structure constructing .............................40 II.1 Description and interpretation process ...........................................................................52 II.2 Retained adherence subset for two classes near the boundary.......................................69 II.3 Taxonomy of classification complexity (separability) measures ...................................71 II.4 Examples of Voronoy polyhedron for 2D and 3D classification problems ...................77 II.5 Q(m) indicator function behaviour.................................................................................78 III.1 Diagram of T-DTS implementation for classification tasks .........................................87 III.2 Scheme of T-DTS learning concept..............................................................................90 III.3 T-DTS operating ...........................................................................................................90 III.4 An example of maximal possible decomposition tree ..................................................95 III.5 An example of distribution of the clusters number over [Amin; Amax] complexity interval .................................................................................................................................97 III.6 T-DTS software architecture.......................................................................................102 III.7 Principal T-DTS v. 2.50 Matlab software architecture...............................................104 III.8 Detailed T-DTS v. 2.50 Matlab software architecture ...............................................107 III.9 Matlab T-DTS software realization v. 2.50, Control panel ........................................108 III.10 T-DTS GUI: Results : 2 stripe-like benchmark (576 prototypes).............................113 III.11 GUI of T-DTS, decomposition clusters chart..........................................................114 III.12 GUI of T-DTS, decomposition tree charts................................................................115 III.13 GUI of T-DTS, decomposition tree charts in 3D......................................................115 III.14 Menu of T-DTS, Configuration ................................................................................116 III.15 Menu of T-DTS, Set Constant ..................................................................................117 III.16 Menu of T-DTS, Set EC Options ..............................................................................117 III.17 Menu of T-DTS, Analysis .........................................................................................117 IV.1 Stripe classification benchmarks ...............................................................................122 IV.2 Stripe classification benchmarks : Q i (m) behavior versus learning database size m , LSUP ZISC®-036 mode.....................................................................................................124 IV.3 Stripe classification benchmarks : Q i (m) behavior versus learning database size m , L1 ZISC®-036 mode................................................................................................................124 IV.4 Benchmarks classification rates behavior versus learning database size m , LSUP ZISC®-036 mode................................................................................................................126

IV.5 Benchmarks classification rates behavior versus learning database size m , L1 ZISC®-036 mode ............................................................................................................................126 IV.6 Q k (m) evaluation for DNA splice-junction classification problem using different ZISC®-036 k -MIF parameters ( k : 55,56, 4096) for: Q 55 (m), Q 56 (m), Q 4096 (m), m k  corresponds to calculated m 0 for each k -curve....................................................................129 IV.7 Quality check of RCE-kNN-like Voronoy polyhedron construction based on its generalization ability performed for k -MIF parameter k=55 ..............................................129 IV.8 Quality check of RCE-kNN-like Voronoy polyhedron construction based on its generalization ability performed for k -MIF parameter k=56 ..............................................131 IV.9 Quality check of RCE-kNN-like Voronoy polyhedron construction based on its generalization ability performed for k -MIF parameter k=4096 ..........................................131 IV.10 Square classification benchmarks, 2 classes, 2000 prototypes.................................134 IV.11 ANN-structure based complexity estimator evaluation for: Square benchmarks, 2 classes, 2000 prototypes, MIF = 1024, 3 distance modes...................................................135 IV.12 ANN-structure based complexity estimator evaluation for: 8 Stripe benchmark, 2 classes, 2000 prototypes, LSUP distance mode..................................................................136 IV.13 ANN-structure based complexity estimator evaluation for: 8 Stripe benchmark, 2 classes (4&4 stripes), LSUP distance mode .......................................................................137 IV.14 Grid classification benchmarks in D1 D2 and D3 dimension ..................................138 IV.15 ANN-structure based complexity estimator evaluation for: Grid benchmark, 2 classes, EUCL distance mode .............................................................................................139 IV.16 ANN-structure based complexity estimator evaluation for: Grid and 8-stripe-benchmarks, 2 classes, EUCL distance mode, MIF=1024 .................................................140 IV.17 Four spiral classification benchmarks, 2 classes, 2000 prototypes ..........................142 IV.18 ANN-structure based complexity estimator evaluation for: 4 Spiral benchmark, 2 classes, EUCL distance mode .............................................................................................142 IV.19 Six classification benchmarks [from left to right, from top to down: 2 Stripes (2ST), 2 Grids (2GR), 2 Squares (2SQ), 2 Sinusoids (2SN), 2 Spirals (2SP), 2 Circles (2CR) with small overlapping zone] ......................................................................................................143 IV.20 ANN-structure based complexity estimator evaluation for: 6 classification benchmarks, 2 classes, 2000 prototypes, MIF=1024..........................................................144 IV.21 Validation Matlab implementation of ANN-structure based complexity estimator embedded into T-DTS framework: 2 Stripe benchmark, 2 classes, generalization database size 1000 prototypes, learning database size 1000 prototypes, DU  CNN, PU  LVQ1 ..153 IV.22 Validation ANN-structure based complexity estimator embedded into T-DTS framework: 10 Stripe benchmark, 2 classes, generalization database size 1600 prototypes, learning database size 400 prototypes, DU  CNN, PU  LVQ1 .......................................154 IV.23 Validation Matlab implementation of ANN-structure based complexity estimator embedded into T-DTS framework: Tic-tac-toe endgame problem, 2 classes, generalization database size 479 prototypes, learning database size 479 prototypes, DU  CNN, PU  MLP FF GDM...................................................................................................................157 _ _

vii

IV.24 Validation ANN-structure based complexity estimator embedded into T-DTS framework: Tic-tac-toe endgame problem, 2 classes, generalization database size 766 prototypes, learning database size 192 prototypes, DU  CNN, PU  MLP_FF_GDM ....158 IV.25 Validation ANN-structure based complexity estimator embedded into T-DTS framework: Splice-junction DNA sequence classification problem, 3 classes, generalization database size 1520 prototypes, learning database size 380 prototypes, DU  CNN, PU  MLP FF GDM...................................................................................................................158 _ _ IV.26 Validation T-DTS self-tuning threshold procedure, Average learning rate (including its corridor of the standard deviations) as a function of θ - threshold: 4 Spiral benchmark, 2 classes, generalization database size 500 prototypes, learning database size 500 prototypes, DU  CNN, PU  PNN, Fisher measure based complexity estimator ................................160 IV.27 Validation T-DTS self-tuning threshold procedure, Average generalization rate (including its corridor of the standard deviations) as a function of θ -threshold: 4 Spiral benchmark, 2 classes, generalization database size 500 prototypes, learning database size 500 prototypes, DU  CNN, PU  PNN, Fisher measure based complexity estimator ......161 IV.28 Validation T-DTS self-tuning threshold procedure, Average clusters number as a function of θ -threshold: 4 Spiral benchmark, 2 classes, generalization database size 500 prototypes, learning database size 500 prototypes, DU  CNN, PU  PNN, Fisher measure based complexity estimator.................................................................................................161 IV.29 Validation T-DTS self-tuning threshold procedure, Performance estimating function P( θ ): 4 Spiral benchmark, 2 classes, generalization database size 500 prototypes, learning database size 500 prototypes, DU  CNN, PU  PNN, Fisher measure based complexity estimator..............................................................................................................................162 IV.30 Validation T-DTS self-tuning threshold procedure, Clusters number distribution: 4 Spiral benchmark, 2 classes, learning database size 500 prototypes, DU  CNN, Collective entropy based complexity estimator ...................................................................................163 IV.31 Validation T-DTS self-tuning threshold procedure: 10 stripe benchmark, 2 classes, generalization database size 1000 prototypes, learning database size 1000 prototypes, DU  CNN, PU  LVQ1, 4 complexity estimators ......................................................................164 IV.32 Validation T-DTS self-tuning threshold procedure, Clusters number distribution: Tic-tac-toe endgame problem, 2 classes, DU  CNN, Collective entropy complexity estimator..............................................................................................................................166 IV.33 Validation T-DTS self-tuning threshold procedure, Clusters number distribution: Tic-tac-toe endgame problem, 2 classes, DU  CNN, ANN-structure based complexity estimator..............................................................................................................................166 IV.34 Validation T-DTS self-tuning threshold procedure, Splice-junction DNA sequences classification problem, 3 classes, generalization database size 1520 prototypes, learning database size 380 prototypes, DU  CNN, PU  MLP_FF_GDM, 3 complexity estimators .............................................................................................................................................168 IV.35 Validation T-DTS self-tuning threshold procedure, Clusters number distribution: Splice-junction DNA sequences classification problem, 3 classes, learning database size 1595 prototypes, DU  CNN, Purity PRISM based complexity estimator.........................170 IV.36 Validation T-DTS self-tuning threshold procedure, Clusters number distribution: Splice-junction DNA sequences classification problem, 3 classes, learning database size

viii