Flow-Based Function Customization in the Presence of Representation Pollution, by Allyn Dimock, Ian Westmacott, Robert Muller, Franklyn Turbak and Joe Wells. Submitted, November 16, 2000.
Abstract: The CIL compiler for core Standard ML compiles whole ML programs using a novel typed intermediate language that supports the generation of type-safe customized data representations. In this paper, we present empirical data comparing the relative efficacy of several different customization strategies for function representations. We develop a cost model to interpret dynamic counts of operations required for each strategy. One of our results is data showing that compiling with a function representation strategy that makes customization decisions based on the presence or absence of free variables of a function and which removes representation pollution by introducing multiway dispatch (what we call the selective sink splitting strategy) can produce better code than that produced by a defunctionalizing strategy similar to that in the literature.
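One of the compared strategies is defunctionalization. As a rough illustration (ours, not the paper's compiler), defunctionalization replaces each lambda with a tagged record of its free variables and routes calls through a single multiway dispatch:

```python
# Illustrative sketch (not the paper's algorithm): defunctionalization
# replaces each lambda with a tagged record of its free variables, and
# calls go through one multiway "apply" dispatch.

def make_adder(n):          # direct style: returns a closure
    return lambda x: x + n

# Defunctionalized form: one tag per lambda in the program.
def apply_fn(fn, arg):
    tag, env = fn
    if tag == "adder":      # body of `lambda x: x + n`
        return arg + env["n"]
    raise ValueError(f"unknown function tag: {tag}")

adder5 = ("adder", {"n": 5})        # closure record, no code pointer
assert apply_fn(adder5, 3) == 8
```

The dispatch in `apply_fn` is the multiway branch whose cost the paper's model accounts for; selective strategies avoid it where flow information shows only one function can reach a call site.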
[Compressed Postscript (144K)]
[DVI (175K)]
Program Representation Size in an Intermediate Language
with Intersection and Union Types,
by Allyn Dimock, Ian Westmacott, Robert Muller, Franklyn Turbak,
Joe Wells and Jeff Considine. To appear in the proceedings of
the Third International Workshop on Types in Compilation,
Montreal, Canada, September 22, 2000.
Abstract:
The CIL compiler for core Standard ML compiles whole programs using the
CIL typed intermediate language with flow labels and intersection and
union types. Flow labels embed flow information in the types and
intersection and union types support precise polyvariant type and flow
information, without the use of type-level abstraction or quantification.
Compile-time representations of CIL types and terms are potentially
large compared to those for similar types and terms in systems based
on quantified types. The listing-based nature of intersection and
union types, together with flow label annotations on types, contribute
to the size of CIL types. The CIL term representation duplicates
portions of the program where intersection types are introduced and
union types are eliminated. This duplication makes it easier to
represent type information and to introduce multiple representation
conventions, but incurs a compile-time space cost.
This paper presents empirical data on the compile-time space costs of
using CIL. These costs can be made tractable using standard
hash-consing techniques. Surprisingly, the duplicating nature of CIL
has acceptable compile-time space performance in practice on the
benchmarks and flow analyses that we have investigated. Increasing
the precision of flow analysis can significantly reduce compile-time
space costs. There is also preliminary evidence that the flow
information encoded in CIL's type system can effectively guide data
customization.
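Hash-consing, the technique mentioned above for making these space costs tractable, can be sketched as follows (a generic Python illustration, not the CIL compiler's implementation): structurally equal type terms are interned so that duplicated types share a single representation.

```python
# Minimal hash-consing sketch: every (operator, arguments) node is
# built at most once, so structurally equal terms are physically shared.

_table = {}

def mk(op, *args):
    """Return the unique node for (op, args), creating it if needed."""
    key = (op, args)
    if key not in _table:
        _table[key] = key          # in practice: an interned node object
    return _table[key]

int_t = mk("int")
pair  = mk("prod", mk("int"), mk("int"))
pair2 = mk("prod", mk("int"), mk("int"))
assert pair is pair2               # equal types are the identical object
```

Because shared nodes are identical objects, equality checks on large duplicated types reduce to pointer comparisons, which is what keeps the duplicating representation tractable.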
This paper supersedes an earlier version
[DimockWestmacottMullerTurbakWellsConsidine00b] which appeared in an informal proceedings of the workshop.
[Compressed Postscript (231K)]
[DVI (313K)]
Space Issues in Compiling with Intersection and Union Types,
by Allyn Dimock, Ian Westmacott, Robert Muller, Franklyn Turbak,
Joe Wells and Jeff Considine. In the Informal Proceedings
of the Third International Workshop on Types in Compilation,
Montreal, Canada, September 22, 2000. The Informal Proceedings
were published as CMU Technical Report CMU-CS-00-161.
Abstract:
The CIL compiler for core Standard ML compiles whole programs using the
CIL typed intermediate language with flow labels and intersection and
union types. Flow labels embed flow information in the types and
intersection and union types support precise polyvariant type and flow
information, without the use of type-level abstraction or quantification.
Compile-time representations of CIL types and terms are potentially
large compared to those for similar types and terms in systems based
on quantified types. The listing-based nature of intersection and
union types, together with flow label annotations on types, contribute
to the size of CIL types. The CIL term representation duplicates
portions of the program where intersection types are introduced and
union types are eliminated. This duplication makes it easier to
represent type information and to introduce multiple representation
conventions, but incurs a compile-time space cost.
This paper presents empirical data on the compile-time space costs of
using CIL. These costs can be made tractable using standard
hash-consing techniques. Surprisingly, the duplicating nature of CIL
has acceptable compile-time space performance in practice on the
benchmarks and flow analyses that we have investigated. Increasing
the precision of flow analysis can significantly reduce compile-time
space costs. There is also preliminary evidence that the flow
information encoded in CIL's type system can effectively guide data
customization.
This preliminary version of the paper is superseded by
[DimockWestmacottMullerTurbakWellsConsidine00a].
[Compressed Postscript (225K)]
[DVI (278K)]
[Bibtex Citation]
A Calculus with Polymorphic and Polyvariant Flow Types,
by Joe Wells, Allyn Dimock, Robert Muller and Franklyn Turbak.
To appear in the Journal of Functional Programming,
Cambridge University Press.
Abstract:
We present the CIL calculus, a typed lambda calculus which is intended
to serve as the basis of an intermediate language for optimizing compilers
for higher-order polymorphic programming languages. The
language CIL has function, product, sum and recursive
types. In addition, CIL has a novel formulation of
intersection and union types and flow labels on both terms
and types. The calculus is formulated to support the
integration of polyvariant flow information in a polymorphically
typed program representation. These flow types can
guide a compiler in generating customized data representations
in a strongly typed setting. We prove that the calculus
satisfies confluence and subject reduction properties.
This paper supersedes [WellsDimockMullerTurbak97].
[Compressed Postscript (259K)]
[DVI (262K)]
[Bibtex Citation]
Sufficient Conditions for the Existence of
Unique Minimal Fixpoints of Nonmonotonic Functions,
by Yuli Zhou and Robert Muller.
To appear in the Logic Journal of the IGPL, Oxford University
Press.
Abstract: We prove several lattice theoretical fixpoint
theorems based on the classical theorem of Knaster and Tarski. These
theorems give sufficient conditions for a system of generally
nonmonotonic functions on a complete lattice to define a unique
minimal fixpoint. The primary objective of this paper is to develop a
domain theoretic framework to study the semantics of logic programs as
well as various rule-based systems where the rules define generally
nonmonotonic functions on lattices.
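For the classical monotone case that these theorems generalize, least-fixpoint computation on a finite lattice can be sketched as follows (our illustration of the Knaster-Tarski setting, not the paper's nonmonotonic construction):

```python
# Iterating a monotonic function from bottom on a finite lattice reaches
# its least fixpoint. Here the lattice is the powerset of {a, b, c}
# ordered by inclusion, with bottom = the empty set.

def lfp(f, bottom=frozenset()):
    x = bottom
    while True:
        y = f(x)
        if y == x:          # f(x) = x: we have reached a fixpoint
            return x
        x = y

# f always includes "a", and adds "b" whenever "a" is present.
f = lambda s: s | {"a"} | ({"b"} if "a" in s else set())
assert lfp(f) == {"a", "b"}
```

The paper's contribution is precisely the cases this sketch excludes: conditions under which such iteration still pins down a unique minimal fixpoint when `f` is not monotonic.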
This paper supersedes [ZhouMuller90].
[Compressed Postscript (K)]
[DVI (90K)]
Two Applications of Standardization and Evaluation in
Combinatory Reduction Systems,
by Robert Muller and Joe Wells. Revised version under review.
Abstract:
This paper presents two worked applications of a general framework
that can be used to support reasoning about the operational equality
relation defined by a programming language semantics.
The framework, based on Combinatory Reduction Systems, facilitates the
proof of standardization theorems for programming calculi.
The importance of standardization theorems to programming language
semantics was shown by Plotkin: standardization together with
confluence guarantees that two terms equated in the calculus are
semantically equal.
The framework is applied to the call-by-value lambda calculus and
to an untyped version of the CIL calculus.
The latter is a basis for an intermediate language being used in a
compiler.
This and the following paper are companion papers.
[Compressed Postscript (253K)]
[DVI (229K)]
[Bibtex Citation]
Standardization and Evaluation in Combinatory Reduction Systems,
by Joe Wells and Robert Muller. Working paper.
Abstract:
A rewrite system has standardization iff for any rewrite
sequence there is an equivalent one which contracts the redexes in a
standard order. Standardization is extremely useful for
finding normalizing strategies and proving that a rewrite system
for a programming language is sound with respect to the language's
operational semantics.
Although for some rewrite systems the standard order can be simple,
e.g., left-to-right or outermost-first, many systems need a more
delicate order.
There are abstract notions of standard order which always apply, but
proofs (often quite difficult) are required that the rewrite system
satisfies a number of axioms and not much guidance is provided for
finding a concrete order that satisfies the abstract definition.
This paper gives a framework based on combinatory reduction systems
(CRS's) which is general enough to handle many programming languages.
If the CRS is orthogonal and fully extended and a good redex
ordering can be found, then a standard order is obtained together with
the standardization theorem.
If the CRS also satisfies further criteria, then a good redex ordering
is mechanically obtained from the rewrite rules.
If the CRS is a constructor system and satisfies an additional
requirement, then definitions of value and evaluation
providing an operational semantics are automatically obtained together
with a Plotkin/Wadsworth/Felleisen-style standardization theorem.
This and the preceding paper are companion papers.
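For the ordinary lambda calculus, the standard order is leftmost-outermost reduction, which reaches a normal form whenever one exists. A minimal sketch (ours, using naive substitution and assuming distinct bound names, not the paper's CRS machinery):

```python
# Terms: ("var", x) | ("lam", x, body) | ("app", f, a)

def subst(body, x, val):  # naive substitution; assumes no name clashes
    tag = body[0]
    if tag == "var":
        return val if body[1] == x else body
    if tag == "lam":
        return body if body[1] == x else ("lam", body[1], subst(body[2], x, val))
    return ("app", subst(body[1], x, val), subst(body[2], x, val))

def step(t):
    """One leftmost-outermost beta step, or None if t is normal."""
    tag = t[0]
    if tag == "app":
        f, a = t[1], t[2]
        if f[0] == "lam":                    # contract the outermost redex
            return subst(f[2], f[1], a)
        s = step(f)
        if s is not None:
            return ("app", s, a)
        s = step(a)
        return None if s is None else ("app", f, s)
    if tag == "lam":
        s = step(t[2])
        return None if s is None else ("lam", t[1], s)
    return None

# Leftmost-outermost discards a diverging argument instead of looping:
omega = ("lam", "x", ("app", ("var", "x"), ("var", "x")))
Omega = ("app", omega, omega)            # the non-terminating term
t = ("app", ("lam", "x", ("var", "y")), Omega)
assert step(t) == ("var", "y")
```

The example shows why the order matters: an innermost strategy would reduce `Omega` forever, while the standard order normalizes in one step.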
Region Inference with Rank-2 Intersection Types and its
Formalization in Isabelle,
by Ian Westmacott, Robert Muller, Torben Amtoft and Joe Wells. Working
paper.
Abstract:
This working paper develops a Tofte-Talpin style region inference
system for terms in a polymorphic programming language with rank-2
intersection types. This subsumes core ML. Region inference systems
translate untyped programs into equivalent programs containing
explicit storage management annotations that allocate and
deallocate storage in a stack-like manner. We prove the soundness of
our translation with respect to the operational semantics. Except
where noted, all of the definitions, axioms, lemmas and theorems appearing
in this paper have been formalized in Isabelle and all proofs have been
mechanically verified.
Comments welcome!
Compiling with Polymorphic and Polyvariant Flow Types,
by Franklyn Turbak, Allyn Dimock, Robert Muller and Joe Wells, in the
Proceedings of the First International Workshop on Types in Compilation,
Amsterdam, June, 1997.
Abstract:
Optimizing compilers for function-oriented and object-oriented languages
exploit type and flow information for efficient implementation.
Although type and flow information (both control and
data flow) are inseparably intertwined, compilers usually compute
and represent them separately.
This is partly because the usual polymorphic type systems
use universal and existential quantifiers, which are difficult to
combine with flow annotations.
In the Church Project, we are experimenting with intermediate
languages that integrate type and flow information into a single
flow type framework. This
integration facilitates the preservation of flow and type information
through program transformations. In this paper we describe CIL, an
intermediate language supporting polymorphic types and polyvariant
flow information and describe its application in program optimization.
We are experimenting with this intermediate language in a flow and
type-directed compiler for a functional language. The types of
CIL encode flow information (1) via labels that approximate
sources and sinks of values and (2) via intersection and union types,
finitary versions of universal and existential types that support type
polymorphism and (in combination with the labels) polyvariant flow
analysis. Accurate flow types expose opportunities for a wide range
of optimizing program transformations.
[Compressed Postscript (119K)]
[DVI (67K)]
[Bibtex Citation]
Strongly Typed Flow-Directed Representation Transformations,
by Allyn Dimock, Robert Muller, Franklyn Turbak and Joe Wells, in the
Proceedings of the International Conference on Functional Programming,
Amsterdam, June, 1997.
Abstract:
We present a new framework for transforming data representations in
a strongly typed intermediate language.
Our method allows both value producers (sources) and value consumers
(sinks) to support multiple representations,
automatically inserting any required code.
Specialized representations can be easily chosen for particular
source/sink pairs.
The framework combines several representation transformation techniques.
As an instance of our framework, we provide a function representation
transformation that encompasses both closure conversion and inlining.
Our framework is adaptable to data other than functions.
[Compressed Postscript (187K)]
[Bibtex Citation]
A Typed Intermediate Language for Flow-Directed Compilation,
by Joe Wells, Allyn Dimock, Robert Muller and Franklyn Turbak, in the
Proceedings of TAPSOFT'97: Theory and Practice of Software Development,
7th International Joint Conference CAAP/FASE, Lille, France,
April, 1997.
Abstract:
We present a typed intermediate language for optimizing compilers for
function-oriented and polymorphically typed programming languages
(e.g., ML). The language CIL is a typed lambda calculus with product,
sum, intersection, and union types as well as function types annotated
with flow labels. A novel formulation of intersection and union types
supports encoding flow information in the typed program
representation. This flow information can direct optimization.
This paper is superseded by [WellsDimockMullerTurbak99].
[Compressed Postscript (143K)]
[Bibtex Citation]
A Staging Calculus and its Application to the Verification
of Translators (Preliminary Report), by Robert Muller, in the
Proceedings of the 21st Annual ACM SIGPLAN-SIGACT Symposium on
Principles of Programming Languages, pp. 389-396.
Abstract: We develop a calculus in which the computation steps
required to execute a computer program can be separated into discrete
stages. The calculus, denoted lambda-2, is embedded within the pure
untyped lambda calculus. The main result of the paper is a
characterization of sufficient conditions for confluence for terms in
the calculus. The condition can be taken as a correctness criterion
for translators that perform reductions in one stage leaving residual
redexes over for subsequent computation stages. As an application of
the theory, we verify the correctness of a macro expansion
algorithm. The expansion algorithm is of some interest in its own
right since it solves the problem of desired variable capture using
only the familiar capture avoiding substitutions.
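Capture-avoiding substitution, the familiar tool the expansion algorithm builds on, can be sketched as follows (our own illustration, not the paper's algorithm): bound variables are renamed fresh before substituting under a binder.

```python
# Illustrative capture-avoiding substitution on lambda terms.
# Terms: ("var", name) | ("lam", name, body) | ("app", f, a)

import itertools
_fresh = itertools.count()

def subst(term, x, val):
    """Substitute val for free occurrences of x in term."""
    tag = term[0]
    if tag == "var":
        return val if term[1] == x else term
    if tag == "app":
        return ("app", subst(term[1], x, val), subst(term[2], x, val))
    # lam: rename the binder so it cannot capture free vars of val
    _, y, body = term
    if y == x:
        return term                      # x is shadowed; stop here
    z = f"v{next(_fresh)}"               # fresh name, never captures
    body = subst(body, y, ("var", z))
    return ("lam", z, subst(body, x, val))

# Substituting y into (lam y. x) must not capture the free y:
t = subst(("lam", "y", ("var", "x")), "x", ("var", "y"))
assert t[2] == ("var", "y")              # the free y stays free
```

The paper's point is the converse problem: macros sometimes *want* a binder to capture a variable, and the algorithm achieves that desired capture using only substitutions like this one.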
[Compressed Postscript (92K)]
[DVI (62K)]
[Bibtex Citation]
Safe and Decidable Type Checking in an Object-Oriented Language,
by Kim B. Bruce, Jon Crabtree, Allyn Dimock, Robert Muller,
Thomas P. Murtagh, Robert van Gent, in the Proceedings of the
ACM OOPSLA '93 Conference on Object-Oriented Programming Systems,
Languages and Applications.
Abstract: Over the last several years, much
interesting work has been done in modelling object-oriented
programming languages in terms of extensions of the bounded
second-order lambda calculus, $F_{\le}$. Unfortunately, it has
recently been shown by Pierce (\cite{Pierce92}) that type checking
$F_{\le}$ is undecidable. Moreover he showed that the undecidability
arises in the seemingly simpler problem of determining whether one
type is a subtype of another. In \cite{Bruce92,BrPOPL93}, the first
author introduced a statically-typed, functional, object-oriented
programming language, TOOPL, which supports classes, objects, methods,
instance variables, subtypes, and inheritance. The semantics of TOOPL
is based on $F_{\le}$, so the question arises whether type checking in
this language is decidable. In this paper we show that type checking
for TOOPLE, a minor variant of TOOPL (Typed Object-Oriented
Programming Language), is decidable. The proof proceeds by showing
that subtyping is decidable, that all terms of TOOPLE have minimum
types (which are in fact computable), and then using these two results
to show that type checking is decidable. Interestingly, conditional
statements introduce significant problems which necessitated the
computation of generalized least upper bounds of types. The
interaction of the least upper bounds with the implicit recursion in
object and class definitions and the contravariant nature of function
spaces makes the computation of appropriate least upper bounds more
subtle than might be expected. Our algorithm fails to be polynomial in
the size of the term because the size of the type of a term can be
exponential in the size of the term. Nevertheless, it performs well in
practice.
This paper concentrates on the language without instance variables,
though the results can be extended to the full language, at the cost
of some added complexity.
[Compressed Postscript (155K)]
[DVI (110K)]
[Bibtex Citation]
M-LISP: A Representation Independent Dialect of LISP
with Reduction Semantics,
by Robert Muller,
ACM Transactions on Programming Languages and Systems, Vol. 14, No. 4,
October, 1992, pp. 589-616.
Abstract:
In this paper we introduce M-LISP, a simple new dialect of LISP which is
designed with an eye toward reconciling LISP's metalinguistic power with
the structural style of operational semantics advocated by Plotkin
[Plo75]. We begin by reviewing the original definition of LISP
[Mc60] in an attempt to clarify the source of its metalinguistic
power. We find that it arises from a problematic clause in this definition.
We then define the abstract syntax and operational semantics of M-LISP,
essentially a hybrid of M-expression LISP and Scheme. Next, we tie the
operational semantics to the corresponding equational logic. As usual,
provable equality in the logic implies operational equality.
Having established this framework we then extend M-LISP with the
metalinguistic eval and reify operators (the latter is
a non-strict operator which converts its argument to its metalanguage
representation.) These operators encapsulate the metalinguistic
representation conversions that occur globally in S-expression LISP.
We show that the naive versions of these operators render LISP's
equational logic inconsistent. On the positive side, we show that a
naturally restricted form of the eval operator is confluent and
therefore is a conservative extension of M-LISP. Unfortunately, we
must weaken the logic considerably to obtain a consistent theory of
reification.
This paper supersedes [Muller91].
[Compressed Postscript (149K)]
[DVI (105K)]
[Bibtex Citation]
Abstract Interpretation in Weak Powerdomains,
by Robert Muller and Yuli Zhou,
in the
Proceedings of the ACM Conference on LISP and
Functional Programming, June 1992.
Abstract:
We introduce a class of semantic domains, weak powerdomains,
that are intended to serve as value spaces for abstract
interpretations in which safety is a concern. We apply them
to the analysis of PCF programs. In the classical abstract
interpretation approach, abstract domains are constructed
explicitly and the abstract semantics is then related to the
concrete semantics. In the approach presented here, abstract
domains are derived directly from concrete domains. The
conditions for deriving the domains are intended to be as general
as possible while still guaranteeing that the derived domain has
sufficient structure so that it can be used as a basis for
computing correct information about the concrete semantics. We
prove three main theorems, the last of which ensures the
correctness of abstract interpretation of PCF programs given safe
interpretations of the constants. This generalizes earlier results
obtained for the special case of strictness analysis.
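As a toy instance of abstract interpretation (ours; the paper's weak powerdomains are a far more general construction), arithmetic can be evaluated over a sign lattice, with safety meaning the abstract result covers every concrete one:

```python
# Abstract multiplication over the sign lattice {"-", "0", "+", TOP},
# where TOP means "sign unknown".

TOP = "?"

def sign(n):
    """Abstraction: map a concrete integer to its sign."""
    return "+" if n > 0 else "-" if n < 0 else "0"

def abs_mul(a, b):
    """Abstract counterpart of multiplication on signs."""
    if a == "0" or b == "0":
        return "0"                  # 0 * anything = 0, even if unknown
    if TOP in (a, b):
        return TOP
    return "+" if a == b else "-"   # like signs give +, unlike give -

# Safety check: the abstract result matches the concrete computation.
assert abs_mul(sign(-3), sign(4)) == sign(-3 * 4)
assert abs_mul(TOP, "0") == "0"
```

A safe interpretation of each constant, in this sense, is exactly the hypothesis of the paper's correctness theorem for PCF programs.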
[Compressed Postscript (102K)]
[DVI (58K)]
[Bibtex Citation]
M-LISP: Its Natural Semantics and Equational Logic (Extended
Abstract),
by Robert Muller,
in the
Proceedings of the ACM SIGPLAN and IFIP Symposium on
Partial Evaluation and Semantics Based Program Manipulation,
June 1991.
Abstract:
The LISP evaluator is a virtual machine analog of the stored-program
computer on which it executes --- it has universal power and dynamically
constructed representations of programs can be converted by the eval
operator into executable programs. In this paper we study the natural
operational semantics and equational logic of a dialect of pure LISP
and an extension which includes the eval operator and fexprs (i.e.,
non-strict functions whose arguments are passed by-representation).
We begin by defining a natural operational semantics for the pure subset
of M-LISP, a simple, representation-independent hybrid of McCarthy's
original M-expression LISP and Scheme. We then establish the connection
between the semantics and its equational logic in the usual way and prove
that the logic is sound and consistent.
With this as our setting we define the axioms and inference rules which
give a deterministic operational semantics for the eval operator and
fexprs. While the new constructions achieve the same ends as their
S-expression LISP counterparts, they do so in somewhat different ways.
The new versions encapsulate within the constructions the
representation decoding which occurs globally in S-expression LISP.
Turning to the equational logics we confirm the folklore that these
constructs are in some sense unreasonable: as they are traditionally
defined, neither construct is Church-Rosser (CR). On the positive side
we show that a naturally restricted form of the eval operator is CR and
therefore its equational theory is consistent.
This paper is superseded by [Muller92].
[Compressed Postscript (106K)]
[DVI (67K)]
[Bibtex Citation]
Domain Theory for Nonmonotonic Functions,
by Yuli Zhou and Robert Muller,
Proceedings of the Second
International Conference on Algebraic and Logic Programming, Nancy,
France, October 1990.
Abstract:
We prove several lattice theoretical fixpoint theorems based on
the classical theorem of Knaster and Tarski. These theorems give sufficient
conditions for a system of generally nonmonotonic functions on
a complete lattice to define a unique minimal fixpoint. The primary objective of
this paper is to develop a domain theoretic framework to study the semantics
of logic programs as well as various rule-based systems where the
rules define generally nonmonotonic functions on lattices.
This paper is superseded by [ZhouMuller98].
[Compressed Postscript (117K)]
[DVI (69K)]
[Bibtex Citation]
Selected Technical Reports
Proceedings of the First International Workshop on Types in Compilation
(TIC97), Amsterdam, The Netherlands, June, 1997,
published as Technical Report BCCS-97-03, Computer Science Department,
Boston College.
Abstract:
This is the informal proceedings of the TIC97 workshop.
Lecture Notes on Domain Theory,
Harvard University CRCT Technical Report TR-06-92.
Abstract:
What follows is a collection of lecture notes for a series of three
lectures introducing Scott's {\em domain theory}. The lecture notes are
not intended to serve as a primary reference but rather as a supplement
to a more comprehensive treatment such as \cite{sc86} or \cite{st77}.
Abstract Semantics of First-Order Recursive Schemes,
with Y. Zhou, Harvard University CRCT Technical Report TR-08-91.
Abstract:
We develop a general framework for deriving abstract domains from
concrete semantic domains in the context of first-order recursive
schemes and prove several theorems which ensure the correctness
(safety) of abstract computations. The abstract domains, which we
call {\em Weak Hoare powerdomains}, subsume the roles of both the
abstract domains and the collecting interpretations in the abstract
interpretation literature.
Semantic Prototyping in M-LISP:
A Representation Independent Dialect
of LISP with Reduction Semantics,
Harvard University CRCT Technical Report TR-05-90.
Abstract:
In this paper we describe a new semantic metalanguage which simplifies
prototyping of programming languages. The system integrates Paulson's
semantic grammars within a new dialect of LISP, M-LISP, which has somewhat
closer connections to the $\lambda$-calculus than other LISP dialects
such as Scheme. The semantic grammars are expressed as attribute grammars.
The generated parsers are M-LISP functions that can return denotational
(i.e., higher-order) representations of abstract syntax. We illustrate
the system with several examples and compare it to related systems.
Syntax Macros in M-LISP:
A Representation Independent Dialect
of LISP with Reduction Semantics,
Harvard University CRCT Technical Report TR-04-90.
Abstract:
In this paper we present an efficient algorithm for avoiding unintended
name captures during syntax macro transcription in LISP. The algorithm
is a hybrid of Kohlbecker's {\em Macro-by-Example} and {\em Hygienic}
algorithms adapted for a representation-independent dialect of LISP.
The adaptation yields a substantially different model of syntax macros
than is found in S-expression LISP dialects. The most important
difference is that $\lambda$-binding patterns become apparent when
an abstraction is first (i.e., partially) transcribed in the syntax
tree. This allows us to avoid a larger class of name captures than
is possible in S-expression dialects such as Scheme where
$\lambda$-binding patterns are not apparent until the tree
is completely transcribed.