Lehrstuhl für Integrierte Systeme
Technische Universität München
Methodology for
System Partitioning of
Chip-Level Multiprocessor Systems
Winthir Anton Brunnbauer
Vollständiger Abdruck der von der Fakultät für Elektrotechnik und Informationstechnik
der Technischen Universität München zur Erlangung des akademischen Grades eines
Doktor-Ingenieurs
genehmigten Dissertation.
Vorsitzender: Univ.-Prof. Dr.-Ing. Ulf Schlichtmann
Prüfer der Dissertation:
1. Univ.-Prof. Dr. sc.techn. (ETH) Andreas Herkersdorf
2. Hon.-Prof. Dr.-Ing. Lajos Gazsi, Ruhr-Universität Bochum
Diese Dissertation wurde am 06.12.05 bei der Technischen Universität München eingere-
icht und durch die Fakultät für Elektrotechnik und Informationstechnik am 05.07.06 an-
genommen."He who can properly define and divide is to be considered a god."
Plato (427-347 BC)
iiiAcknowledgements
The very beginnings of this work go back to the visions of Prof. Ingolf Ruge by
bringing networking applications and marketing together. To be at the leading edge of
technology, he arranged for the western outpost of the Lehrstuhl für Integrierte
Schaltungen located at and funded by Infineon Technologies NA Corp., San Jose, CA,
USA. I owe him special thanks for the opportunity to live and work in the Silicon Valley
and be a part of the research community.
After my return from the USA, Prof. Ruge’s successor, Prof. Andreas Herkersdorf,
accommodated me at his institute and accepted the supervision of my PhD work. This
knowledge and experience was a very valuable enrichment of this thesis. Thank you very
much for fostering me!
For his offer to act as the second examiner, I am very grateful to Prof. Lajos Gazsi. His
interest during the early stages of my PhD work inspired me that this topic is red hot.
I reminisce the "Munich Base" - the networking group of LIS with Thomas Wild, Nuria
Pazos and Jürgen Foag - and my PhD buddy Gordon Cichon during my time at the LIS
and Infineon. Unfortunately, he decided to change his plans and move back to Germany
sooner than planned.
During and especially in the end of my PhD time, I appreciated the intense supervision
of Thomas Wild. Thank you so much! Especially, I remember our daily field tests of video
conferencing between two continents. Moreover, I owe special thanks to Verena Draga,
Wolfgang Kohtz, Dr. Walter Stechele, and Doris Zeller for their support at LIS.
Many thanks to Charles Bry and Gunnar Hagen of Infineon Technologies NA Corp. for
taking care of me through economy ups and downs in the Silicon Valley.
I have appreciated the endless discussions about all the world and his brother with Sven
Haar, Roland Zukunft, Fabian Vogelbruch and Paul Zuber. I’m especially grateful to
Fabian and Paul to be accompanied by them during the last steps of my PhD work.
A special thank you to my proof-readers Charles Bry, Anton Lindner, Hubert
Mooshofer, Rainer Seidl, and Thomas Wild.
Many thanks to my parents, Otto and Margot Brunnbauer, for enabling me to pursue
this doctorate. In particular, I am very much obliged to my wife Anja and Rainer Seidl for
believing in me and motivating me during difficult times. This work is dedicated to them.
The financial support for this work by Infineon Technologies NA Corp. and Lehrstuhl
für Integrierte Systeme is gratefully acknowledged.
Winthir Brunnbauer
München, December 2005
vAbstract
The steady reduction of structure size in microelectronics allows the production of
increasingly more complex and powerful integrated circuits. The increase of the
productivity gap requires the development of new suitable design methodologies in order
to increase the design productivity. By raising the entry level of automated design tools to
higher abstraction levels, the usage of prefabricated modules permits a briefer and more
efficient development of complex systems. Many design methodologies have been
proposed which support only limited implementation possibilities based on the architect’s
experience. Hitherto, such kind of design tools can especially be found at research
institutes.
The design of such integrated circuits, to find a suitable architecture meeting all
specifications, is a challenging and complex task. Single chips comprising the same
functionalities which have been implemented on boards are increasingly common, so-
called systems on chip (SoC). The allocation of microprocessors and dedicated hardware
on the SoC and the mapping of the different functionalities to hardware or software
resources is a nontrivial task. In order to identify the best suitable implementation which
meets the requirements, an extensive architecture exploration is necessary. Based on the
ratio of software and hardware, the performance flexibility of SoC can be adjusted with
respect to the application requirements.
This thesis addresses the partitioning of control dominated applications, in this case
from the networking domain. Hitherto, only few algorithms are known which support this
requirement during partitioning. Furthermore, the presented methodology partitions HW/
SW systems with special focus on the internal communication of complex SoC
architectures. This factor is increasingly recognized as a crucial factor for the efficiency of
SoC. The necessity to use an efficient design of suitable communication architectures
becomes important regarding potential system complexity.
A main contribution of this thesis is the extension of a constructive heuristic for the
partitioning of functionalities with the consideration of events lying in the future. In
particular, the consideration of inevitable imminent data transfers which may cause
blockings of communication resources can result in improvements of the performance. The
proposed variants of the improved partitioning methodology selects and assigns one task
after another to suitable resources based on list scheduling. The application modeling
carried out in this thesis utilizes annotated process graphs for the representation of
functionalities and estimations of worst case execution times (WCET) for the resource
library. With consideration of the current allocations of all resources, each task/resource
combination which is ready to be partitioned is evaluated. The highest dynamic task
priority decides about the selection of task and resource. A later change of the
accomplished assignments cannot be performed. In this way, very short runtimes of the
algorithms are possible even for large process graphs. However, these early assignments
viimay affect the overall performance unfavorably.
Partitioning algorithms usually handle each functionality of design models as self-
contained which is individually assigned to a resource. Due to the granularity on the task
graph, for instance, a memory access is represented as one functionality distributed in two
nodes as design entity which is processed by one resource. However, partitioning
algorithms may use different resources for these two nodes. In order to avoid such
situations, a further contribution of this thesis is the enforcement of implementation
constraints on multiple tasks. With the help of Common Implementation Nodes (CIN)
grouped to clusters, various resources can be evaluated and one resource is assigned for all
CINs of this cluster. This processing of design models permits a flexible consideration of
design constraints.
On the basis of synthetic process graphs as well as a real-world example DiffServ, a
networking application on a router linecard, the performances of the partitioning
algorithms have been examined. Different scenarios of process graphs and target
architectures allow statements about the behavior and applicability of the partitioning
algorithms. The investigations of the variants of the constructive heuristic algorithms
introduced in this thesis exhibit considerable improvements for wide ranges of parameters
in relation to the conventional constructive algorithm. The provision for known inevitable
events during task priority calculation allows a reduction in scheduling latency up to
about 35%. However, the structure dependencies of the constructive algorithms on the
process graphs and the content of the target architectures significantly influence the
outcome. The improved algorithms cannot outperform the reference algorithm for all
scenarios. To compensate for the structure dependencies within clusters, the usage of all
CINs within a cluster for the priority computation instead of only the first CIN leads to
better results in most cases.
The dependency of the constructive partitioning algorithms on the structure of the
models becomes apparent with the evaluation of DiffServ. It has been shown that the
performance of the introduced algorithms cannot be forecasted in advance. However, the
short runtimes in the order of seconds allow the consecutive use of several variants of
constructive algorithms to determine the best result. Thus, constructive partitioning
heuristics can profit from the new approaches.
viiiTable of Contents
1 Introduction........................................................................................... 1
1.1 System-Level Design......................................................................................... 1
1.2 Design Flow....................................................................................................... 7
1.3 Scope and Objective ........................................................................................ 13
1.4 Outline ............................................................................................................. 14
2 Related Work ...................................................................................... 15
2.1 Exact Methodologies ....................................................................................... 16
2.1.1 Enumeration .......................................................................................... 16
2.1.2 Integer Linear Programming (ILP) ....................................................... 17
2.2 Heuristic Methodologies.................................................................................. 19
2.2.1 Simulated Annealing (SA) .................................................................... 20
2.2.2 Tabu Search (TS) 20
2.2.3 Genetic/ Evolutionary Algorithms ........................................................ 21
2.2.4 Hierarchical Clustering ......................................................................... 23
2.2.5 Greedy Algorithms................................................................................ 24
2.3 Features of Partitioning Algorithms ................................................................ 27
2.3.1 Consideration of Communication ......................................................... 28
2.3.2 Support of Conditions ........................................................................... 28
2.3.3 Look-Ahead........................................................................................... 29
2.3.4 Clustering .............................................................................................. 30
2.4 Summary, Comments and Conclusions........................................................... 30
ix3 Reference Algorithm........................................................................... 33
3.1 Modeling.......................................................................................................... 34
3.2 Constructive Heuristic by Xie et al.................................................................. 38
3.2.1 Partitioning Algorithm .......................................................................... 38
3.2.2 Calculation of List Scheduling Priorities .............................................. 41
3.2.3 Detection of Conditional Branches ....................................................... 42
3.2.4 Examples ............................................................................................... 44
3.3 Reference Constructive Algorithm.................................................................. 45
3.3.1 Algorithm Adjustments......................................................................... 45
3.3.2 Implementation ..................................................................................... 46
3.3.3 Improved Condition Support................................................................. 47
3.4 Performance Evaluation of the Reference Algorithm ..................................... 49
3.4.1 Generation of synthetic test pattern....................................................... 49
3.4.2 Evaluation Environment and Tools 52
3.4.3 Evaluated Partitioning Algorithms........................................................ 54
3.4.4 Design Model and Architecture Assumptions ...................................... 55
3.4.5 Results ................................................................................................... 56
3.5 Summary and Conclusions .............................................................................. 59
4 Enhanced Constructive Algorithm.................................................... 61
4.1 Partitioning Issues and Motivation for Improvement...................................... 62
4.2 Look-Ahead ..................................................................................................... 65
4.2.1 Resource Selection ................................................................................ 65
4.2.2 Sequencing of Task Scheduling ............................................................ 68
4.2.3 Algorithm Improvements for Resource Selection................................. 68
4.2.4 Algorithm Improvements for Sequencing of Task Scheduling............. 72
4.3 Clustering......................................................................................................... 73
4.3.1 Different Design Objectives.................................................................. 74
x