La lecture en ligne est gratuite
Le téléchargement nécessite un accès à la bibliothèque YouScribe
Tout savoir sur nos offres
Télécharger Lire

A coprocessor for fast searching in large databases [Elektronische Ressource] : associative computing engine / by Christophe Layer

De
148 pages
A Coprocessor for Fast Searching in LargeDatabases: Associative Computing EngineDISSERTATIONsubmitted in partial fulfillment ofthe requirements for the degree ofDoktor-Ingenieur (Dr.-Ing.)to the Faculty of Engineering Scienceand Informatics of the University of UlmbyChristophe Layerfrom DijonFirst Examiner: Prof. Dr.-Ing. H.-J. PfleidererSecond Examiner: Prof. Dr.-Ing. T. G. NollActing Dean: Prof. Dr. rer. nat. H. PartschExamination Date: June 12, 2007c Copyright 2002-2007by Christophe Layer— All Rights Reserved —AbstractAs a matter of fact, information retrieval has changed considerably in recent years withthe expansion of the Internet and the advent of always more modern and inexpensivemass storage devices. Due to the enormous increase in stored digital contents, multi-media devices must provide effective and intelligent search functionalities. However,in order to retrieve a few relevant kilobytes from a large digital store, one moves upto hundreds of gigabytes of data between memory and processor over a bandwidth-restrictedbus. Theonlylong-termsolutionistodelegatethistasktothestoragemediumitself.For that purpose, this thesis describes the development and prototyping of an as-sociative search coprocessor called ACE: an Associative Computing Engine dedicatedto the retrieval of digital information by means of approximate matching procedures.
Voir plus Voir moins

A Coprocessor for Fast Searching in Large
Databases: Associative Computing Engine
DISSERTATION
submitted in partial fulfillment of
the requirements for the degree of
Doktor-Ingenieur (Dr.-Ing.)
to the Faculty of Engineering Science
and Informatics of the University of Ulm
by
Christophe Layer
from Dijon
First Examiner: Prof. Dr.-Ing. H.-J. Pfleiderer
Second Examiner: Prof. Dr.-Ing. T. G. Noll
Acting Dean: Prof. Dr. rer. nat. H. Partsch
Examination Date: June 12, 2007c Copyright 2002-2007
by Christophe Layer
— All Rights Reserved —Abstract
As a matter of fact, information retrieval has changed considerably in recent years with
the expansion of the Internet and the advent of always more modern and inexpensive
mass storage devices. Due to the enormous increase in stored digital contents, multi-
media devices must provide effective and intelligent search functionalities. However,
in order to retrieve a few relevant kilobytes from a large digital store, one moves up
to hundreds of gigabytes of data between memory and processor over a bandwidth-
restrictedbus. Theonlylong-termsolutionistodelegatethistasktothestoragemedium
itself.
For that purpose, this thesis describes the development and prototyping of an as-
sociative search coprocessor called ACE: an Associative Computing Engine dedicated
to the retrieval of digital information by means of approximate matching procedures.
In this work, the two most important features are the overall scalability of the design
allowing the support of different bit widths and module depths along the data path,
as well as the obtainment of the minimum hardware area while ensuring the maximum
throughput of the global architecture.
After simulation, the correctness of the results delivered by the FPGA prototyping
board could be verified using a text encoding scheme based on hashing theory. The
performance of the hardware platform has been compared with the same application
running in software on a general purpose processor. Accordingly, a speed-up of three to
four orders of magnitude is achieved for the FPGA system versus many recent personal
computers with different internal hardware configurations.Acknowledgments
First and foremost, I express my gratitude to Prof. H.-J. Pfleiderer for his constant
support, motivation and encouragements. Thank you for welcoming me in the Institute
of Microelectronics at the University of Ulm and giving me the opportunity to pursue
my research interests.
Furthermore, I am very grateful to Prof. T. G. Noll of the RWTH Aachen for his
interest in my work and his agreement to report about it. Thank you for the extremely
helpful feedback, for the suggestions about the completion of the manuscript and for
advising on this dissertation.
I would also like to thank Prof. P. Ruj´an for his continuous encouragements and
for pursuing my work over further international projects. Dr. G. Lapir is gratefully
acknowledged for the discussions at an earlier stage of this project and for providing
information and materials about the associative search method.
Iwould like tothank my colleagues fromthe Institute ofMicroelectronics, especially
M. Bschorr, C. Gu¨nter, O. Pf¨ander, J. Rauscher, W. Schlecker, K. Schmidt, R. Schreier
and E. Schubert for all the constructive discussions and comments, as well as for the
very comfortable atmosphere at work. Nonetheless, I am indebted to G. Kirilov for his
expertise and contribution to the synthesis of the memory controller.
Finally, special thanks go to Mrs. H¨ofer for her kindness, help and support.
Ulm, October 18, 2007 Christophe LayerTable of Contents
Chapter I ⋄ Introduction......................................... 1
1 Information Retrieval Systems ....................................... 1
2 The Problem with Storage Devices ................................... 3
2.1 Running into the Memory Wall .................................. 3
2.2 Need for Hardware Support...................................... 5
3 Organization of the Thesis .......................................... 6
Chapter II ⋄ Background ........................................ 9
1 Designing Digital Systems........................................... 9
1.1 Integrated Circuits Implementation Strategies...................... 10
1.2 Microprocessor Design Techniques ................................ 14
1.3 Field Programmable Gate Arrays................................. 16
1.4 Semiconductor Memory Devices.................................. 18
2 Pattern Matching in Strings ......................................... 20
2.1 Classical Algorithms............................................ 20
2.2 Flexible Pattern Matching....................................... 21
2.3 Dynamic Programming ......................................... 22
2.4 Hash Functions and Text Signatures .............................. 23
3 Sorting Techniques and Algorithms................................... 25
3.1 Sequential Sorting.............................................. 25
3.2 Bit-Level Structures ............................................ 28
3.3 Sorting Networks............................................... 30
3.4 Summarization ................................................ 31
Chapter III ⋄ Related Work..................................... 33
1 State of the Art in Software ......................................... 33
1.1 Web Search Engines ............................................ 33
1.2 Software Functionalities for Computers............................ 34
2 Hardware Accelerators.............................................. 35
2.1 Associative and Parallel Processors Systems ....................... 35
2.2 Merging Logic and Memory ..................................... 36
2.3 Special Purpose Coprocessors .................................... 38
2.4 Summarization ................................................ 39
3 Associative Access Method .......................................... 40
3.1 Building the Signature File ...................................... 40
3.2 The Retrieval Process........................................... 43
Chapter IV ⋄ System Level Analysis........................... 49
1 Motivation and Expectations ........................................ 49
1.1 Problem Statement............................................. 49
1.2 Proposed Research ............................................. 50
2 Profiling and Analysis .............................................. 52
2.1 Exploration of the Software Model................................ 53VIII Table of Contents
2.2 Sequential Algorithm Analysis ................................... 54
3 Hardware Accelerator Design ........................................ 58
3.1 Parallelization of the Algorithm .................................. 58
3.2 Modular System Architecture .................................... 62
Chapter V ⋄ Architectural Hardware Design ................. 65
1 System Management and Peripherals ................................. 65
1.1 Operation Scheduling ........................................... 65
1.2 Designing the Memory Interface.................................. 66
2 Building the Computational Data Path ............................... 71
2.1 Penalty Calculating Unit........................................ 71
2.2 Score Calculating Unit.......................................... 74
3 Hardware Sorting and Merging....................................... 80
3.1 Parallel Sorting with Bitonic Networks ............................ 80
3.2 Optimization Methodologies ..................................... 83
3.3 Hardware Merging Solutions ..................................... 85
Chapter VI ⋄ Results and Evaluation.......................... 91
1 Hardware Implementation........................................... 91
1.1 Adaptation to the Development Platform.......................... 91
1.2 Synchronizing the Processing Units ............................... 94
1.3 Synthesis Results............................................... 98
2 Evaluation of the Hardware Model ................................... 100
2.1 Benchmarking Environment ..................................... 100
2.2 Scaling the Design.............................................. 103
2.3 Performance Appraisal.......................................... 104
Chapter VII ⋄ Conclusion and Outlook........................ 109
1 On the Associative Computing Engine ................................ 109
2 Future Work ...................................................... 111
Appendix A ⋄ On the Realization of Logarithms............. 113
1 Error Analysis ..................................................... 113
2 Error Correction ................................................... 116
3 Exponential Function............................................... 118
Appendix B ⋄ High Throughput Memory Controller........ 119
1 Finite State Machine ............................................... 119
2 SDRAM Timings .................................................. 120
List of Abbreviations ............................................. 123
Bibliography and References .................................... 125List of Figures
1.1 The process of retrieving information .................................. 2
1.2 Storage system trends ............................................... 4
1.3 Hardware performance trends and memory wall......................... 4
1.4 Hierarchical organization of the thesis ................................. 6
2.1 Flexibility, performance and power consumption trade-off ................ 10
2.2 Increasing the throughput using pipelining ............................. 11
2.3 Accumulator design using the cut-set technique ......................... 12
2.4 Retiming an architecture with feedback ................................ 13
2.5 Representation of an island-style FPGA ............................... 16
2.6 FPGA density and performance trends vs. CPU ........................ 17
2.7 Semiconductor memory devices classification ........................... 19
2.8 Example of pattern matching with the BM algorithm.................... 21
2.9 Deterministic automaton for pattern matching.......................... 22
2.10 Edit distance examples using dynamic programming..................... 23
2.11 Progression of the data in the insertion-sort algorithm ................... 26
2.12 Progression of the data in the selection-sort algorithm ................... 27
2.13 Progression of the data in the bubble-sort algorithm..................... 28
2.14 Progression of the data in the radix-sort algorithm ...................... 29
2.15 Elementary sorting networks in a Knuth diagram ....................... 30
2.16 Progression of the data in the bitonic sorting algorithm .................. 31
2.17 Bitonic merging network in a Knuth diagram ........................... 31
3.1 Different associative processor designs ................................. 36
3.2 Trigram based signature of a character string ........................... 41
3.3 Textual filtering and compilation of the BAM .......................... 42
3.4 A two phases search algorithm........................................ 44
3.5 Possible distributions of the results after the filtering phase............... 44
3.6 Coding a long query string using trigrams.............................. 46
4.1 Prospective hardware accelerator system ............................... 52
4.2 Software profiling of the entire retrieval algorithm ....................... 54
4.3 Flow chart of the AAF algorithm ..................................... 56
4.4 Profiling measurements against query length............................ 57
4.5 Profiling measurements against BAM size .............................. 57
4.6 Temporal versus spatial locality....................................... 59
4.7 Parallelization of the AAF algorithm .................................. 61
4.8 Modular system architecture ......................................... 63
5.1 Vertical accesses to the BAM......................................... 67
5.2 Transistor level representation of a DRAM and an SRAM cell ............ 68
5.3 BAM mapping onto standard SDRAM memory devices .................. 70
5.4 Two possibilities to handle the burst accesses........................... 72
5.5 Bit-level signal processing in the PCU ................................. 73X List of Figures
5.6 Calculation of the penalty using a state-machine. ....................... 73
5.7 RTL design of the Penalty Calculating Unit ............................ 74
5.8 Linear approximation of the binary logarithm .......................... 76
5.9 Realization of the NLF using a Barrel shifter ........................... 78
5.10 Recurrent realization of the NLF module............................... 78
5.11 RTL design of the Score Calculating Unit .............................. 79
5.12 Batcher’s bitonic sorting network in a Knuth diagram ................... 80
5.13 Stone’s regular structure pattern of the bitonic sorting network ........... 81
5.14 Comparator set for sorting networks................................... 82
5.15 Controlled switch comparators........................................ 82
5.16 A 16 keys recurrent bitonic sorting network ............................ 83
5.17 Retiming switch comparator modules.................................. 84
5.18 Recombination of recurrent bitonic sorters ............................. 85
5.19 Merging proposal based on bitonic sort algorithm ....................... 86
5.20 Merging proposal based on insertion sort algorithm...................... 87
5.21 Recurrent sorting network based on the odd-even transposition ........... 88
5.22 Sorting structure based on the parallel bubble sort algorithm ............. 89
5.23 Extremities of the bubble sorting structure ............................. 89
6.1 Hardware prototyping module with FPGA ............................. 92
6.2 System on chip architecture of the ACE ............................... 93
6.3 Dimensioning the data path of the ACE ............................... 94
6.4 Timing diagrams for PCU to RSU1 ................................... 96
6.5 Timing diagrams for SCU to RMU1 ................................... 97
6.6 Two dimensional Fourier transform of the BAM ........................ 102
6.7 Spatial frequency analysis: expectations and unwished results............. 102
6.8 Using a three level sorting unit in the RSU............................. 103
6.9 Performance measurements of the AAF with different architectures ........ 106
A.1 NLF for different input widths........................................ 114
A.2 Absolute and relative error in the NLF ................................ 115
A.3 Correction of the NLF using xor gates ................................ 116
A.4 Correction of the NLF using a LUT ................................... 117
A.5 Comparison of the absolute error in the NLF ........................... 117
A.6 Reverse NLF function with an 8-bit input.............................. 118
B.1 State machine of the memory controller................................ 120
B.2 Timings diagram for the memory controller ............................ 121

Un pour Un
Permettre à tous d'accéder à la lecture
Pour chaque accès à la bibliothèque, YouScribe donne un accès à une personne dans le besoin