Synthesis and exploration of loop accelerators for systems-on-a-chip [electronic resource] / submitted by Hritam Dutta
229 pages
English



Published: 01 January 2011
File size: 2 MB

Synthesis and Exploration of Loop
Accelerators for Systems-on-a-Chip
Submitted to the Faculty of Engineering (Technische Fakultät) of the
Universität Erlangen-Nürnberg
for the degree of
DOKTOR-INGENIEUR
by
Hritam Dutta
Erlangen 2011
Approved as a dissertation by the Faculty of Engineering
of the Universität Erlangen-Nürnberg
Date of submission: 10 January 2011
Date of doctoral defense: 3 March 2011
Dean: Prof. Dr.-Ing. Reinhard German
Reviewers: Prof. Dr.-Ing. Jürgen Teich
Prof. Christian Lengauer, Ph.D.
Acknowledgements
I owe my deepest gratitude to my adviser, Professor Jürgen Teich, for always being
enthusiastic about proposing and discussing new ideas. He also gave me a great deal
of freedom, as well as valuable scientific and editorial feedback. I would also like to thank
Professor Christian Lengauer for agreeing to serve on my dissertation committee
and for his suggestions for improving the dissertation. My sincere gratitude also goes to
Professor Bernard Pottier and Professor Ulrich Rüde for introducing me to new ideas
and fields of research.
My special thanks go to all my colleagues, especially Frank Hannig, Dmitrij Kissler,
Joachim Keinert, Richard Membarth, Moritz Schmid, Jens Gladigau, and Dirk Koch,
for brainstorming sessions and intensive co-operation, which led to key scientific
progress and enriched my knowledge. I appreciate Frank’s patience in reading
the whole dissertation and making valuable suggestions. I was fortunate to have wonderful
office mates in Mateusz Majer and Tobias Ziermann, and I thank them both for
all the technical and non-technical discussions. My sincere acknowledgements also
go to my external colleagues Sebastian Siegel, Rainer Schaffer (TU Dresden), Wolfgang
Haid (ETH Zürich), and Samar Yazdani (UBO, Brest) for co-operation on important
research problems. I also appreciate the efforts of undergraduate students, especially
Teddy Zhai and Holger Ruckdeschel, in the software development of the PARO
methodology. I am also deeply indebted to Sonja Heidner and Ina Derr for helping me sort
out several administrative issues. I greatly value the friendship of all the people who
made my stay in Erlangen a real pleasure. My family has been a constant source
of love, concern, support, and strength all these years. I would like to express my
heartfelt gratitude to my family and dedicate this dissertation to them.
Hritam Dutta
Contents
1. Introduction
1.1. Next Generation Applications
1.2. Accelerator based SoC Architectures
1.3. Programming Models for SoC
1.4. Problem Definition
1.5. Contributions and Bibliographic notes
1.6. A Guided Tour through the Thesis
2. Fundamentals and Related Work
2.1. Algorithm Specification in the Polytope Model
2.1.1. Fundamentals: Algorithm Specification
2.1.2. Specification of Communicating Loop Nests
2.1.3. Related Work
2.2. A Generic Accelerator Scheme
2.2.1. Characterization and Classification of Loop Accelerators
2.2.2. Accelerator Subsystem for Streaming Application
2.3. High-level Synthesis of Hardware Accelerators
2.3.1. Front End: Loop Transformations
2.3.1.1. Program Transformations
2.3.1.2. Tiling
2.3.2. Front End: Scheduling
2.3.2.1. Global Scheduling and Binding
2.3.2.2. Local Scheduling and Resource Binding
2.3.3. Back End: Synthesis
2.3.3.1. Synthesis of Processor Element
2.3.3.2. Synthesis of Array Interconnection Structure
2.3.3.3. Synthesis of Control Hardware
2.4. Accelerator Design Space Exploration
2.5. High-level Synthesis Tools
2.6. Conclusion
3. Accelerator Generation: Loop Transformations and Back End
3.1. Loop Optimizations for Accelerator Tuning
3.1.1. Loop Transformations
3.1.1.1. Loop Permutation
3.1.1.2. Loop Tiling
3.1.2. Hierarchical Tiling
3.1.2.1. Tiling: Decomposition of the Iteration Space
3.1.2.2. Embedding: Splitting of Data Dependencies
3.1.2.3. Iteration dependent Conditions
3.1.2.4. Parallelization of Tiled Piecewise Linear Algorithms
3.1.3. Results: Scalability and Overhead of Hierarchical Tiling
3.2. Controller Generation
3.2.1. Accelerator Control Engine: Architecture and Synthesis Methodology
3.2.1.1. Counter Generation
3.2.1.2. Determination of Processor Element Type
3.2.1.3. Global and Local Controller Unit
3.2.1.4. Propagation of Global Control and Counter Signals
3.2.2. I/O Communication Controller
3.2.2.1. Buffer Modeling and Synthesis
3.2.2.2. I/O Controller Synthesis
3.3. Results
3.3.1. Embedded Computation Motifs
3.3.2. Impact of Compiler Transformations on Controller Overhead
3.4. Conclusion
4. Accelerator Subsystem for Streaming Applications: Synthesis and System Integration
4.1. Communicating Loop Model
4.1.1. Loop Graph
4.1.2. Accelerator Model
4.1.3. Mapping: Putting it all together
4.2. Automated Generation of a Communicating Accelerator Subsystem
4.2.1. Modeling of Communication Channels
4.2.1.1. Simplified Windowed Synchronous Data Flow Model
4.2.1.2. Conversion from the Polyhedral Model to the Data Flow Representation
4.2.2. Multi-dimensional FIFO: Architecture and Synthesis
4.3. Synthesis of Accelerators for MPSoCs
4.3.1. Interface Synthesis
4.3.1.1. Accelerator Memory Map Generation
4.3.1.2. Hardware Wrapper
4.3.1.3. Software Driver
4.3.2. Accelerator Integration in SoC
4.4. Results
4.4.1. Overhead of Communication Primitives
4.4.2. Accelerators as Components in SoC
4.5. Conclusion
5. Design Space Exploration: Accelerator Tuning
5.1. Single Accelerator Exploration
5.1.1. Model Representation and Problem Definition
5.1.2. Multiple Objectives
5.1.3. Objective Functions
5.1.3.1. Rapid Estimation Models
5.1.4. Optimization Engine
5.1.4.1. Baseline: Random or Exhaustive Search
5.1.4.2. Evolutionary Algorithms
5.2. Performance Analysis of Accelerators in an SoC System
5.2.1. Modular Performance Analysis (MPA)
5.2.2. Objective Parameter Estimation for Accelerators
5.2.2.1. Accelerator Performance: Service Curve Estimation
5.2.3. Optimal Configuration Selection in System Context
5.2.4. Case Study
5.2.4.1. Motion JPEG Decoder
5.3. Conclusion and Summary
6. Conclusions and Outlook
6.1. Conclusion
6.2. Future Work
A. Glossary
B. Hermite Normal Form
C. Loop Benchmarks
German Part
Bibliography
List of Abbreviations
Curriculum Vitae
