CS366 Programming Languages
Spring 2015
Boston College
Prof. Muller

Lecture 1

A language is a system that enables the expression of ideas. A programming language is a system that enables the expression of algorithms. Generally speaking, the programmer is expressing the computational steps to be carried out by a computing device.

This course is concerned with the design, specification and implementation of programming languages.

Major take-aways from this course:

1. A deeper understanding of computer software that will yield benefits for many years.
   - You will be able to master new programming languages much faster than most people.
   - You will be able to design and implement new languages when you need to.
2. An introduction to an interesting and increasingly important style of programming.

Language Design

There are thousands of different programming languages with varying purposes and user communities. Some, such as C and Java, are general purpose; some have large user communities, some small. Some are special purpose, such as PostScript or PDF. Almost all programming languages are text-based and support the manipulation of multiple types of data; almost all provide for the introduction and management of symbolic NAMES that are meaningful to humans.

For the purposes of this course, we will consider general purpose programming languages, their essential properties and design desiderata. Almost all general purpose programming languages are centered around the idea of a function (procedure, method or routine).

For the purposes of this course, we will emphasize the interests of the software CONSUMER. We will consider the relationship between:

1. the consumer,
2. the programmer,
3. the compiler developer, and
4. the language designer.
For the purposes of this class, the design goal that we seek to achieve is:

To design programming languages so that the compiler developer can implement them in such a way that software executes using reasonable resources, and so that the software consumer is justified in being confident that the software does what the programmer thinks it does.

              Efficiency
               /      \
              /        \
    Reliability ------ Ease of Development & Maintenance

The core idea of this course is to design and develop a sequence of increasingly realistic languages, with variations, introducing one key feature at a time and seeing how each new feature affects the language. We will progress through 5 or 6 programming languages.

Reliability

Typed vs. Untyped Languages

A TYPE describes the range of values that a variable can take on during execution; in the program text it typically appears as an annotation on the variable. A language in which variables can be consistently associated with types is called a TYPED LANGUAGE. Otherwise the language is UNTYPED.

Explicitly typed vs. implicitly typed languages: in an explicitly typed language the types are part of the syntax (as in C or Java); in an implicitly typed language they can be omitted and are inferred (as in ML or Haskell).

A TYPE SYSTEM is that part of a typed language that keeps track of the types associated with variables and expressions.

Execution Errors

Trapped errors (the computation stops immediately):
- software trap (e.g., divide by zero)
- hardware trap (e.g., overflow)

Untrapped errors (they go unnoticed and later cause arbitrary behavior):
- array index out of bounds, when bounds are not checked at run time

A language is SAFE if its implementation does not allow untrapped errors. An untyped language can enforce safety by performing run-time checks. Typed languages may enforce safety by statically rejecting all programs that are potentially unsafe. Typed languages may also use a mixture of run-time and static checks. Typed languages usually also aim to rule out large classes of trapped errors, along with the untrapped ones.

Execution Errors and Well-Behaved Programs

For any given language, we may designate a subset of the possible execution errors as FORBIDDEN errors. The forbidden errors should include all of the untrapped errors, plus a subset of the trapped errors.
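To make the trapped/untrapped distinction concrete, here is a minimal Python sketch (the course itself uses F#). It assumes a hypothetical flat memory, modelled as a Python list, in which an array `a` occupies cells 0 to 2 and an unrelated variable `x` sits in cell 3. Python lists are themselves bounds-checked, so the list only stands in for raw memory; the names `unchecked_store`, `checked_store`, `checked_div` and `TrappedError` are illustrative, not part of any library.

```python
# Hypothetical memory layout: array `a` in cells 0-2, variable `x` in cell 3.
# A Python list is itself bounds-checked; here it merely stands in for raw memory.
memory = [10, 20, 30, 7]
A_BASE, A_LEN, X_ADDR = 0, 3, 3


class TrappedError(Exception):
    """A forbidden operation stopped by a run-time check."""


def unchecked_store(i, value):
    # No bounds check, like an unchecked C array write: an out-of-range
    # index silently overwrites a neighbouring cell (i == 3 hits `x`).
    # This is an untrapped error.
    memory[A_BASE + i] = value


def checked_store(i, value):
    # Check the index first, so the error is trapped instead.
    if not 0 <= i < A_LEN:
        raise TrappedError(f"index {i} out of bounds for length {A_LEN}")
    memory[A_BASE + i] = value


def checked_div(n, d):
    # Report division by zero as a recoverable exception.
    if d == 0:
        raise TrappedError("division by zero")
    return n // d


unchecked_store(3, 99)
print(memory[X_ADDR])  # 99: x was overwritten and nothing noticed

memory[X_ADDR] = 7
try:
    checked_store(3, 99)
except TrappedError as err:
    print("trapped:", err)  # trapped: index 3 out of bounds for length 3
print(memory[X_ADDR])  # 7
```

The checked versions are what an untyped but safe language does on every array access and division: the forbidden error becomes a trapped, recoverable one instead of silent corruption.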
A program fragment is said to be WELL-BEHAVED if it does not cause any forbidden error to occur. A well-behaved fragment is safe. A language where all of the (legal) programs have good behavior is called strongly checked. Thus, with respect to a given type system, the following holds for a strongly checked language:

• No untrapped errors occur (safety guarantee).
• None of the trapped errors designated as forbidden errors occur.
• Other trapped errors may occur; it is the programmer’s responsibility to avoid them.

Typed languages can enforce good behavior (including safety) by performing static (i.e., compile-time) checks to prevent unsafe and ill-behaved programs from ever running. These languages are statically checked; the checking process is called typechecking, and the algorithm that performs this checking is called the typechecker. A program that passes the typechecker is said to be well-typed; otherwise, it is ill-typed, which may mean that it is actually ill-behaved, or simply that it could not be guaranteed to be well-behaved. Examples of statically checked languages are ML and Pascal (with the caveat that Pascal has some unsafe features).

Untyped languages can enforce good behavior (including safety) in a different way, by performing sufficiently detailed run-time checks to rule out all forbidden errors. (For example, they may check all array bounds and all division operations, generating recoverable exceptions when forbidden errors would happen.) The checking process in these languages is called dynamic checking; LISP is an example of such a language. These languages are strongly checked even though they have neither static checking nor a type system.

Even statically checked languages usually need to perform tests at run time to achieve safety. For example, array bounds must in general be tested dynamically. The fact that a language is statically checked does not necessarily mean that execution can proceed entirely blindly.
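Static and dynamic checking can be contrasted on a tiny, hypothetical expression language of integers, booleans, addition and conditionals, encoded as nested Python tuples. This is only a sketch under that assumed encoding: `typecheck` rejects programs before they run, while `evaluate` instead checks operand kinds as it executes. The second sample program is rejected by the typechecker even though running it never reaches the bad branch; it is ill-typed, yet well-behaved.

```python
# Hypothetical expression encoding (nested tuples):
#   ("int", n) | ("bool", b) | ("add", e1, e2) | ("if", cond, then, else)


class IllTyped(Exception):
    """Raised by the static typechecker, before the program runs."""


class CheckFailed(Exception):
    """Raised by the evaluator's dynamic (run-time) checks."""


def typecheck(e):
    """Return "int" or "bool", or raise IllTyped without running e."""
    tag = e[0]
    if tag in ("int", "bool"):
        return tag
    if tag == "add":
        if typecheck(e[1]) != "int" or typecheck(e[2]) != "int":
            raise IllTyped("add expects two ints")
        return "int"
    if tag == "if":
        if typecheck(e[1]) != "bool":
            raise IllTyped("if condition must be bool")
        t_then, t_else = typecheck(e[2]), typecheck(e[3])
        if t_then != t_else:
            raise IllTyped("if branches must have the same type")
        return t_then
    raise IllTyped(f"unknown expression {tag!r}")


def evaluate(e):
    """Evaluate e, checking operand kinds dynamically as it goes."""
    tag = e[0]
    if tag in ("int", "bool"):
        return e[1]
    if tag == "add":
        a, b = evaluate(e[1]), evaluate(e[2])
        # type(...) is int excludes bool, which Python treats as an int subclass.
        if type(a) is not int or type(b) is not int:
            raise CheckFailed("add applied to a non-int")
        return a + b
    if tag == "if":
        c = evaluate(e[1])
        if type(c) is not bool:
            raise CheckFailed("if condition is not a bool")
        return evaluate(e[2]) if c else evaluate(e[3])
    raise CheckFailed(f"unknown expression {tag!r}")


bad = ("add", ("int", 1), ("bool", True))
safe_but_rejected = ("if", ("bool", True), ("int", 1), bad)

for prog in (bad, safe_but_rejected):
    try:
        typecheck(prog)
    except IllTyped as err:
        print("rejected statically:", err)

print(evaluate(safe_but_rejected))  # 1: the bad branch never runs
try:
    evaluate(bad)
except CheckFailed as err:
    print("caught dynamically:", err)
```

The typechecker is conservative: it must reject `safe_but_rejected` because it cannot know, in general, which branch will run. The evaluator accepts it but pays for that with a check on every operation.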
Polymorphism: the ability to reuse the same code at different types.

Untyped vs. Typed Languages

Specification

PL = Syntax + Semantics

Syntax

Grammars and Parsers. We can use the theory of grammars to automate much of the processing of the syntactic form of a language.

Semantics

Human utterances have meaning. If I tell you "Please turn it down," you know what I mean and (maybe) take action. What does a program mean? This is a really fascinating area (Programming Language Semantics). We will use so-called "Natural Semantics".

Implementation

We will use the programming language F# to write interpreters and mini-compilers. Our interpreters will interpret abstract syntax trees (ASTs). Our compilers will translate from ASTs to byte code for stack machines. We'll use either the Sublime Text or the Emacs editor, together with some F# customization code and make, as our IDE.

The structure of a simple language implementation:

    pgm -> LEXER -> token stream -> PARSER -> ast -> INTERPRETER

or

    pgm -> LEXER -> token stream -> PARSER -> ast -> TRANSLATOR -> bytecode

where the byte code is in the language of a given VIRTUAL MACHINE (VM), e.g., JVM, .NET, ...

or

    pgm -> LEXER -> token stream -> PARSER -> ast -> TRANSLATOR -> ast -> optimizer -> machinecode

where the machinecode is in the native language of a particular computing device.

History of PLs

Assembly Language, Fortran (1953, Backus), Algol (1958, Backus et al), COBOL (1959, Hopper), Simula (1962, Dahl), Pascal (1968, Wirth), C (1969, Ritchie), C++ (1979, Stroustrup), Java (1991, Gosling et al), ...

LISP (1960, McCarthy), ISWIM (1965, Landin), Scheme (1974, Steele), ML (1970s, Milner), Smalltalk (1980, Kay), Python (1989, van Rossum), Haskell (1992, Peyton Jones et al), ...

JavaScript (1995, Eich), OCaml (1996, Leroy et al), Ruby (1990s, Matsumoto), ...
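The first two implementation pipelines (LEXER -> PARSER -> INTERPRETER, and LEXER -> PARSER -> TRANSLATOR -> bytecode for a stack VM) can be sketched end to end in Python for a hypothetical language of integer literals, `+`, `*` and parentheses. The token format, the tuple-based AST, the recursive-descent parser and the `PUSH`/`ADD`/`MUL` instruction set are all assumptions made for illustration; they are not the course's F# code.

```python
import re

# Hypothetical toy language: integer literals, +, * and parentheses.
# Grammar (* binds tighter than +):
#   expr := term ("+" term)*
#   term := atom ("*" atom)*
#   atom := NUM | "(" expr ")"

def lex(src):
    """LEXER: program text -> list of (kind, value) tokens."""
    tokens = []
    for text in re.findall(r"\d+|\S", src):
        if text.isdecimal():
            tokens.append(("NUM", int(text)))
        elif text in "+*()":
            tokens.append((text, None))
        else:
            raise SyntaxError(f"unexpected character {text!r}")
    return tokens

def parse(tokens):
    """PARSER: tokens -> AST of nested tuples, by recursive descent."""
    pos = 0

    def peek():
        return tokens[pos][0] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def expr():
        node = term()
        while peek() == "+":
            advance()
            node = ("+", node, term())
        return node

    def term():
        node = atom()
        while peek() == "*":
            advance()
            node = ("*", node, atom())
        return node

    def atom():
        if peek() == "NUM":
            return ("num", advance()[1])
        if peek() == "(":
            advance()
            node = expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return node
        raise SyntaxError("expected a number or '('")

    tree = expr()
    if pos != len(tokens):
        raise SyntaxError("unexpected trailing tokens")
    return tree

def interpret(tree):
    """INTERPRETER: evaluate the AST directly."""
    if tree[0] == "num":
        return tree[1]
    left, right = interpret(tree[1]), interpret(tree[2])
    return left + right if tree[0] == "+" else left * right

def translate(tree):
    """TRANSLATOR: AST -> stack-machine bytecode (post-order walk)."""
    if tree[0] == "num":
        return [("PUSH", tree[1])]
    op = "ADD" if tree[0] == "+" else "MUL"
    return translate(tree[1]) + translate(tree[2]) + [(op, None)]

def run_vm(code):
    """A tiny stack VM for the hypothetical PUSH/ADD/MUL instruction set."""
    stack = []
    for instr, arg in code:
        if instr == "PUSH":
            stack.append(arg)
        else:
            right, left = stack.pop(), stack.pop()
            stack.append(left + right if instr == "ADD" else left * right)
    return stack.pop()

tree = parse(lex("2 * (3 + 4)"))
print(tree)                     # ('*', ('num', 2), ('+', ('num', 3), ('num', 4)))
print(interpret(tree))          # 14
print(run_vm(translate(tree)))  # 14
```

Both back ends should agree on every input: the interpreter walks the AST directly, while the translator emits operands before their operator, which is exactly the order a stack machine needs.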
Common Theme: functions

Course Admin

Grading:
- 45% problem sets
- 45% for 3 exams
- 10% class participation

Homework is submitted via Canvas.

Tools

F#, Sublime Text or Emacs, make. Brief demo of each.

Mathematical Preliminaries

Nomenclature: Gamma (Γ), gamma (γ), alpha (α), beta (β), delta (δ), epsilon (ε), lambda (λ)

Set Theory: sets, relations, orders and maps