A flow analysis framework for realistic scheme programs [Elektronische Ressource] / vorgelegt von Eric Jean Knauel

De
A flow-analysis framework forrealistic Scheme programsDissertationder Fakultat¨ fur¨ Informations- und Kognitionswissenschaftender Eberhard-Karls-Universitat¨ Tubing¨ enzur Erlangung des Grades einesDoktors der Naturwissenschaften (Dr. rer. nat.)vorgelegt vonDipl.-Inform. Eric Jean Knauelaus Buchholz in der NordheideTubingen¨2008Tag der mun¨ dlichen Qualifikation: 7. Mai 2008Dekan: Prof. Dr. Michael Diehl1. Berichterstatter: Prof. Dr. Herbert Klaeren2. Berich Prof. Dr. Schroeder-HeisterAbstractIt is possible to scale control-flow analyses for higher-order languages to com-plete, fully-fledged programming languages and consequently compute the flowanalysis of realistic programs. This dissertation gives a formal specification ofa universal flow-analysis framework for the functional programming languagesScheme and PreScheme that covers all aspects and features of these languages.Moreover, the dissertation proposes new implementation strategies and tech-niques that have been developed and tested in context of an implementationfor Scheme 48. These techniques yield an efficient implementation that enablesanalysis of realistic programs.ZusammenfassungEs ist moglich, Kontrollflussanalysen fur Sprachen hoherer Ordnung auf reali-¨ ¨ ¨stische Programme anzuwenden. Diese Dissertation spezifiziert eine universellverwendbare Flussanalyse fur die funktionalen Programmiersprachen Scheme¨und PreScheme, die alle Aspekte und Fahigkeiten dieser Sprachen abdeckt.
Publié le : mardi 1 janvier 2008
Lecture(s) : 21
Tags :
Source : TOBIAS-LIB.UB.UNI-TUEBINGEN.DE/VOLLTEXTE/2008/3363/PDF/DRUCKVERSION_CLICK.PDF
Nombre de pages : 166
Voir plus Voir moins

A flow-analysis framework for
realistic Scheme programs
Dissertation
der Fakultat¨ fur¨ Informations- und Kognitionswissenschaften
der Eberhard-Karls-Universitat¨ Tubing¨ en
zur Erlangung des Grades eines
Doktors der Naturwissenschaften (Dr. rer. nat.)
vorgelegt von
Dipl.-Inform. Eric Jean Knauel
aus Buchholz in der Nordheide
Tubingen¨
2008Tag der mun¨ dlichen Qualifikation: 7. Mai 2008
Dekan: Prof. Dr. Michael Diehl
1. Berichterstatter: Prof. Dr. Herbert Klaeren
2. Berich Prof. Dr. Schroeder-HeisterAbstract
It is possible to scale control-flow analyses for higher-order languages to com-
plete, fully-fledged programming languages and consequently compute the flow
analysis of realistic programs. This dissertation gives a formal specification of
a universal flow-analysis framework for the functional programming languages
Scheme and PreScheme that covers all aspects and features of these languages.
Moreover, the dissertation proposes new implementation strategies and tech-
niques that have been developed and tested in context of an implementation
for Scheme 48. These techniques yield an efficient implementation that enables
analysis of realistic programs.
Zusammenfassung
Es ist moglich, Kontrollflussanalysen fur Sprachen hoherer Ordnung auf reali-¨ ¨ ¨
stische Programme anzuwenden. Diese Dissertation spezifiziert eine universell
verwendbare Flussanalyse fur die funktionalen Programmiersprachen Scheme¨
und PreScheme, die alle Aspekte und Fahigkeiten dieser Sprachen abdeckt. Fur¨ ¨
die praktische Umsetzung dieser Analyse werden neuartige Implementierungs-
strategienund-technikenvorgestellt,dieimRahmeneinerImplementierungfur¨
Scheme 48 entwickelt und erprobt wurden. Diese Techniken fuh¨ ren zu einer effi-
zienten Implementierung, welche die Analyse realistischer Programme erlaubt.4Contents
1 Introduction 9
1.1 Implementation Scalability . . . . . . . . . . . . . . . . . . . . . 9
1.2 Executable semantic model . . . . . . . . . . . . . . . . . . . . . 12
1.3 Setting the scene . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.4 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
1.5 Structure of this dissertation . . . . . . . . . . . . . . . . . . . . 14
2 Intermediate Language 17
2.1 Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.2 Printing CPS Programs . . . . . . . . . . . . . . . . . . . . . . . 22
2.3 Semantics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.4 Complete programs . . . . . . . . . . . . . . . . . . . . . . . . . . 30
2.4.1 Complete PreScheme programs . . . . . . . . . . . . . . . 30
2.4.2 Complete Scheme programs . . . . . . . . . . . . . . . . . 31
2.4.3 Treating complete programs . . . . . . . . . . . . . . . . . 31
2.4.4 Initial state . . . . . . . . . . . . . . . . . . . . . . . . . . 32
3 Flow analysis of Programs 33
3.1 Control flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.2 Using control-flow information for optimization . . . . . . . . . . 35
3.3 Control-flow analysis for higher-order languages . . . . . . . . . . 36
4 Analysis Framework 39
4.1 Semantic domains . . . . . . . . . . . . . . . . . . . . . . . . . . 39
4.2 State transition . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
4.3 Semantic correctness . . . . . . . . . . . . . . . . . . . . . . . . . 48
4.4 Language-specific abstractions. . . . . . . . . . . . . . . . . . . . 52
4.4.1 Abstracting PreScheme values . . . . . . . . . . . . . . . 53
4.4.2g Scheme values . . . . . . . . . . . . . . . . . 60
d4.5 Precision and Time . . . . . . . . . . . . . . . . . . . . . . . . . 73
4.5.1 k-CFA time abstractions . . . . . . . . . . . . . . . . . . . 74
4.6 Flow analysis examples. . . . . . . . . . . . . . . . . . . . . . . . 75
5 Executable Semantic Model 83
5.1 Example trace . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
5.2 Implementation with PLT Redex . . . . . . . . . . . . . . . . . . 86
5.2.1 Grammar definition . . . . . . . . . . . . . . . . . . . . . 86
5.2.2 Metafunctions . . . . . . . . . . . . . . . . . . . . . . . . 90
56 CONTENTS
5.2.3 Reduction relations . . . . . . . . . . . . . . . . . . . . . . 100
5.2.4 Generating traces . . . . . . . . . . . . . . . . . . . . . . . 102
6 Implementation 105
6.1 Abstract syntax . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
6.2 Semantic domains . . . . . . . . . . . . . . . . . . . . . . . . . . 106
6.2.1 Representing time . . . . . . . . . . . . . . . . . . . . . . 106
6.2.2 Representing the binding environment . . . . . . . . . . . 106
6.2.3 Representing variable environments . . . . . . . . . . . . 107
6.2.4 Representing abstract values . . . . . . . . . . . . . . . . 107
6.2.5 Implementing the visited set . . . . . . . . . . . . . . . . 112
6.3 Description of flow information . . . . . . . . . . . . . . . . . . . 114
6.4 Analysis with Garbage Collection . . . . . . . . . . . . . . . . . . 116
6.4.1 Semantics with global garbage collection . . . . . . . . . . 117
6.4.2 Abstract values with GC marks . . . . . . . . . . . . . . . 122
6.4.3 Garbage Collection and variable environment . . . . . . . 124
6.4.4 Splitting the root set . . . . . . . . . . . . . . . . . . . . . 127
6.5 Experimental results . . . . . . . . . . . . . . . . . . . . . . . . . 127
6.6 Testing and Debugging . . . . . . . . . . . . . . . . . . . . . . . . 131
7 Conclusion 133
7.1 Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
7.2 Future work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
A Notation 137
B Semantic Correctness 139
C PreScheme Primops 149
C.1 Primops with one return value . . . . . . . . . . . . . . . . . . . 149
C.2 Pr for compound values . . . . . . . . . . . . . . . . . . . . 151
C.3 Primops with multiple return values . . . . . . . . . . . . . . . . 153
D Scheme Primops 155
D.1 Arithmetic Primops . . . . . . . . . . . . . . . . . . . . . . . . . 157
D.2 Primops for compound values . . . . . . . . . . . . . . . . . . . . 158
D.3 Procedures with rest list argument . . . . . . . . . . . . . . . . . 161
D.4 Multiple return values . . . . . . . . . . . . . . . . . . . . . . . . 162Acknowledgments
First,IthankmyadvisorProfessorHerbertKlaeren. ProfessorKlaerenprovided
a pleasant work environment and allowed me great latitude in my choice of
research activities, and agreed with this dissertation project. I thank Professor
Peter Schroeder-Heister who immediately agreed to co-review this dissertation.
I am greatly indebted to Mike Sperber. Mike was the first to direct my
attention to research in compiler construction and especially to flow analyses.
He wakened my curiosity for these subjects. I am deeply thankful that Mike
shared his extensive knowledge with me, and the time and effort he spent read-
ing the various drafts of this dissertation. Mike’s profound remarks, numerous
correctionsandsuggestions—theremainingmistakesaremine—reallyhelped
me improve this work.
My colleague Marcus Crestani deserves special thanks. Marcus not only
carefully proof-read this dissertation and suggested valuable improvements, but
also relieved me of parts of my teaching workload during the last semester.
I am grateful to Matthew Might and Olin Shivers for patiently answering
all my e-mails on flow analyses, garbage collection for flow analyses, and frame
strings. Matthew Might also gave me access to his ΔCFA and ΓCFA implemen-
tations. Matthew and Olin also shared draft versions of their latest research
results with me. For all of this, I am grateful.
I thank Robby Findler for answering my questions on PLT Redex — using
this tool to model the semantics ranks among the most pleasant parts of this
work.
Martin Gasbichler is an expert on Scheme 48 and I thank him for patiently
answering all my questions on the byte code, native code, and PreScheme com-
pilers, the virtual machine, and other aspects of this complex system. Espe-
cially, I thank Martin for helping me develop the id tables for Scheme 48. My
colleague Holger Gast was always available to help me with typesetting prob-
lems — I would probably still be struggling with strange T X problems withoutE
his help.
Frank Suh¨ l and Thomas Rasch, both close friends of mine, provided many
helpful comments and encouragement — for both I am grateful.
I am very grateful for the support from my parents Hermann and Marie-
Louise. Marie-Louise proof-read the whole work and suggested lots of linguistic
improvements. Both supported and encouraged me tireless from day one of my
studies till the end of this dissertation project.
Finally, I thank my wife Gertraud for her love, support, and patience during
the months I suffered from something she identified as the “compiling syn-
drome” — a state of mind, characterized by confusion and mental absence,
caused by long debugging sessions of flow analyses.8 CONTENTSChapter 1
Introduction
It is possible to scale control-flow analyses for higher-order languages to com-
plete, fully-fledged programming languages and consequently compute the flow
analysis of realistic programs.
Only few compilers for functional languages employ a flow analysis to gain
information on the run-time behavior of a program and subsequently optimize
the program. This failure of flow analysis for functional languages in practice is
paradoxical,asarichamountoftechnicalliteratureproposesnewtechniquesfor
computing analyses, develops more precise abstractions, and finds new applica-
tions for analyses. The following observations help understanding the reasons
for this paradox:
• Technicalliteratureconsidersminimalisttoylanguages,leavingopenmany
questions that arise when adapting an analysis for a real programming
language.
• There are very few reports on experience with implementation techniques
for flow analyses. Most literature only reports on prototype and proof-of-
concept implementations.
• Only few descriptions of flow analyses show how they scale to realistic
programs.
This dissertation aims at eliminating these shortcomings.
This introduction describes the problems that arise when scaling to a com-
plete language (Section 1.1). Section 1.2 motivates the need for an executable
semantic model. Section 1.3 describes the compiler architecture that is the ba-
sis for my flow analysis. A summary of the contributions (Section 1.4) and an
overview of dissertation (Section 1.5) follow.
1.1 Implementation Scalability
Researchliteratureintheareaofflowanalysesforhigher-orderlanguagespresents
new ideas as formal specifications with corresponding soundness proofs, accom-
panied by measurements supposed to show the relevance and usefulness of the
910 CHAPTER 1. INTRODUCTION
proposed ideas. It is striking that almost all measurements take place in proof-
of-concept implementations, or at best, prototype implementations. These im-
plementations have in common that they feature only a minimal subset of lan-
guage constructs and are only applied to small programs.
However, there are a few notable exceptions:
• The Bigloo Scheme compiler [Serrano, 2004] uses a special-purpose flow
analysis to optimize closure allocation [Serrano, 1995].
• The MLton compiler for Standard ML uses a flow analysis to perform a
closure conversion that translates the code to a first-order intermediate
language [Cejtin et al., 2000]. Subsequently, MLton carries out optimiza-
tions on the first-order language.
• In the past, the Chez Scheme compiler used a flow analysis to improve
the inlining optimization [Jagannathan and Wright, 1996; Ashley, 1997].
The flow-directed inlining algorithm, however, was replaced in favor of a
special inlining algorithm that does not require a flow analysis [Waddell
and Dybvig, 1997] — the developers considered a flow analysis slow and
impractical [Dybvig, 2006].
Each of these exceptions use a flow analysis geared towards a single and specific
purpose. This dissertation, however, considers a flow analysis that produces
results that are useful for more than one application: It computes the control
flow of the program and information on the values used in a program. The
analysis is more general than the analyses described above as its results can
be used to drive more than one optimization. For example, the results may be
used for an inlining optimization as well as for the elimination of run-time type
checks. Additionally, this dissertation gives a formal specification that covers
the complete PreScheme and Scheme programming languages while the cited
publications only consider the aspects of the language that are relevant for the
specific purpose of the analysis.
Defining a flow analysis for a full programming language is different from
specifying an analysis for a small and artificial language. There are three chal-
lengesforscalingaflowanalysis: Addingsupportforcomplexlanguagefeatures,
integrationwithacompiler,anddevelopingefficientimplementationtechniques.
Consider language features first. The challenge in supporting a complete
language lies in the interaction and dependencies of the features: It is often not
possible to treat the features independently. Procedure calls with variable arity,
for example, may allocate heap objects and thus the analysis needs to model
heap objects. The abstraction functions for heap objects, on the other hand,
rely on calls to distinguish references. For a precise model of compound heap
objects a specific representation of integers in the analysis is indispensable, and
so on.
Consequently,theanalysisspecifiedinthisdissertationconsidersall features
of Scheme (and PreScheme) in concert. To my knowledge, this coherent and
complete specification of a general-purpose flow analysis for Scheme is novel. In
addition to the core features the specification also covers the following features:
• Procedure calls with variable arity and a precise abstraction of the rest
list.

Soyez le premier à déposer un commentaire !

17/1000 caractères maximum.