Understanding
and Writing
Compilers
A do-it-yourself guide
Richard Bornat
Middlesex University, London.
richard@bornat.me.ukFirst published|1979.
Internet edition|2007; corrected 2008.
Copyright
c 1979, 2007, 2008 Richard Bornat.
Permission to copy without fee all or part of this material is granted provided
that the copies are not made or distributed for direct commercial advantage,
the copyright notice above and the title of the book appear, and notice is given
that copying is by permission of Richard Bornat.
Original edition published by Macmillan Press Ltd., Basingstoke, UK.
Internet by Richard Bornat, 28 Albany Road, LONDON N4
4RL, UK. richard@bornat.me.ukPreface to the online
edition
I wrote this book on compiling in the late 1970s. It was a success. I still meet
people who learnt about from it. I get three or four requests a year
from people who’d like a copy. I used to tell them to use Abebooks.com, but
now there aren’t any copies even there. Since I own the copyright (thanks to
Macmillan), I can publish it again, for free.
For a while I tried to reproduce the original book from the original nro source,
but Unix version 6 nro is long dead, and the printer-driver hacks that made
Abold and underlining work are even deader. So I hacked it into LT X, and hereE
it is. I xed the errors that I knew about, changed a very few infelicities, and
otherwise tried to change as little as possible.
Why isn’t it in C?
When I began to write this book I was heavily into BCPL. By the time it was
nished I was a C programmer. The display editor ded that I developed to write
the book was written in C. C is in many respects a direct descendant of BCPL.
I was pressed, even before the book came out, to rewrite all the examples in C.
I didn’t do that for various reasons, some good and some bad. It would be an
enormous e ort to do it now, because C is now so far from its origins.
Old-fashioned opinions
I agree with quite a lot of this book, but by no means all of it. I’ve decided to
change nothing. For better or worse, here it is.
iiiPreface
In the past compiler writers and designers seemed to form an elite group within
computing science, set apart by their esoteric knowledge and their ability to
produce large, important system programs which really worked. The admiration
of the computing public, whether it was once deserved or not, is no longer
merited now that the principles of programming-language implementation are
so well understood. Compiler-writing is no longer a mystery.
This book attempts to explain and demystify the principles of compiler writing
so that you can go out and build a working compiler of your own. There is
enough detail in this book for you to build a for quite a complicated
language { certainly PASCAL, perhaps ALGOL 68 or SIMULA 67 { but it
doesn’t attempt an encyclopaedic coverage of the eld. It is intended more as an
introduction to compiler-writing and a do-it-yourself kit for the compiler-writer,
giving enough detail for you to understand the principles of the subject, than as
a survey of past history or present horizons. The of interpretation are
close enough to those of compilation for chapter 19 to give a simple introduction
to interpreter writing.
The method of treatment and the relative amount of attention given to various
topics in this book re ects my own views about the relative importance of those
topics. There is a separate section on run-time support, less attention is paid
than is perhaps usual to the topic of parsing or syntax analysis and the discussion
of translation is totally oriented to tree-walking. I have presented the subject
in this way for both practical and educational reasons. First, the object code
instruction sequences which implement run-time support are more important
in practice than is usually recognised. It is di erences in run-time mechanisms,
as much as or more than anything else, which distinguish one language from
another { say SIMULA 67 from ALGOL 68, POP-2 from ALGOL 60 { and
the e ciency of run-time support code fragments is crucial to the e ciency of
the object program. Second, I believe it is more important to give a practical
description of syntax analysis in a book which is intended for the practical
compiler-writer than to give a more formal and complete introduction to the
topic. The syntax analysis mechanisms chosen for illustration in section IV] are
selected for their practical relevance. Of the three mechanisms presented, the
‘one-track’ and ‘operator-precedence’ mechanisms are now rather old-fashioned
but are still quite adequate to the task of parsing popular modern languages.
iiiiv
Finally, my insistence on tree-walking as the best means of translation is both
because it makes explanation of translation algorithms much easier and enables
me to bring out the topic of ‘crucial code fragments’ which forms so much of the
life of the professional compiler writer; also in my experience it is a practical
way in which both novice and expert can quickly build a working translator
containing the minimum number of errors.
Throughout the book I have emphasised the fact that the task of compilation can
be divided into separate modular sub-tasks. It is largely the identi cation of and
emphasis on this essential modularity that has clari ed the subject. Emphasis
on modular design also helps me to avoid discussing every known technique
for each of the tasks { if you know what the task is, one or two good ways to
accomplish it and how to recognise another good way if somebody shows you
one, then you are on the way to becoming a capable compiler writer.
Throughout the book I have emphasised the need for the compiler to provide
a service to its users. It seems to me that the demands of system or compiler
e ciency are too often given precedence over the justi able demands of the
user who wants understandable error reports, accurate and reliable object code
or strict adherence to an industry standard. The same goes, I believe, for the
demands of small compiler size or simplicity of construction. A good compiler
can be acceptably e cient and reasonably small yet still provide a good user
service. Indeed I believe that the well-designed compiler will out-perform the
‘e cient’ special-purpose construction, but even if it isn’t so the compiler writer
should stand by the principle that machines are provided to save human time
and e ort. A few seconds of machine time which saves minutes (or more often
hours) of human time is machine time well spent!
Host, Object and Source Language in Examples
In the examples throughout the book most algorithms are written in a version
of the language BCPL, details of which are brie y explained in appendix A.
Some example algorithms are in PASCAL: I would have used PASCAL more
extensively were it not for the fact that its lack of block structure, lack of
conditional expressions and lack of a simple ‘union-type’ convention forces an
obscure programming style. BCPL’s advantage is that it is untyped and that
therefore the examples can show the bare bones of an algorithm without too
much unnecessary hedging round of type declarations. At the same time this
is a drawback: BCPL’s lack of data structure declarations makes it di cult to
explain some fairly simple algorithms. Early examples give some explanation
of the data structures which are manipulated by the algorithms: it is worth
pointing out that in every case the values which are manipulated are pointers
to record-structures rather than the structures themselves.
Appendix B explains the assembly code which is used to illustrate the oper-
ation of the translation algorithms in sections II and III. It is the code of a
single-address, multi-register single-segment-addressing machine. Throughoutv
the book there is an emphasis on the machine-independence of compiler design
and the fact that details of the object machine’s instruction set don’t a ect the
design of the compiler. Nevertheless it is useful, when discussing translation al-
gorithms, to illustrate the code of an example object machine in order to show
the advantages of good design of code fragments.
With the present explosion in the use of microprocessors interest in compil-
ing has re-emerged, particularly interest in compiling system-programming lan-
guages. The problems of compiler-writing for small machines are mainly to do
with producing compact object code: the examples presented in this book are
not directly oriented towards this purpose but may be readily adapted to it.
The desire to produce a very small compiler, as opposed to a small object pro-
gram, should be curbed until you properly understand the principles of compiler
design (when perhaps the desire will go away!)
The source language which is used to illustrate syntax analysis and translation
algorithms has a variable syntax and semantics, oating somewhere in the space
between BCPL, ALGOL 60, ALGOL 68 and PASCAL. Some attention is given
to the detailed di culties which arise in writing a compiler for each of these
languages. I didn’t wish to limit the discussion to those features displayed by
any particular language, however, nor to get bogged down in the details of that
language. Most examples therefore relate to the general di culties of compiling
any language which includes recursive procedures, block-structuring, dynamic
arrays, pointer variables, and so on.
Acknowledgements
I’m enormously indebted to all the people who have argued with me about
compilers and compiling over the years, lectured to me, listened to me, corrected
me and generally helped me develop the ideas I present in this book. Many of
the good ideas are those of my mentors. Thanks especially to Je Rohl for
helping me in my rst steps and encouraging me to develop my ideas as far as
they have gone: thanks also to Bernard Sufrin for so many invaluable discussions
about the best way to build and to explain compilers. Thanks to my colleagues
at the University of Essex { Bruce Anderson, Mike Brady,