CS366 Programming Languages
Spring 2015
Boston College

Prof. Muller

Lecture: 1

A language is a system that enables the expression of ideas.

A programming language is a system that enables the expression of
algorithms.  Generally speaking, the programmer is expressing the
computational steps to be carried out by a computing device.

This course is concerned with the design, specification and
implementation of programming languages.

Major Take-aways from this course: 

  1. A deeper understanding of computer software that will yield benefits
  for many years.

     - You will be able to master new programming languages much
       faster than most people.
     - You will be able to design and implement new languages when
       you need to.

  2. Introduction to an interesting and increasingly important style
  of programming.

Language Design

There are thousands of different programming languages with varying
purposes and user communities. Some, such as C and Java, are general
purpose; some have large user communities, some small. Others are
special purpose, such as PostScript or PDF.

Almost all programming languages are text-based and support the
manipulation of multiple types of data. Almost all provide for the
introduction and management of symbolic NAMES that are meaningful to
humans.

For the purposes of this course, we will consider general purpose
programming languages, their essential properties and design
desiderata.

Almost all general purpose programming languages are centered around
the idea of a function (procedure, method or routine). For the
purposes of this course, we will emphasize the interests of the
software CONSUMER.

Consider the relationship between:
 1. the consumer,
 2. the programmer,
 3. the compiler developer and
 4. the language designer.

For the purposes of this class, the design goal that we seek to achieve is:

To design programming languages so that the compiler developer can
implement them in a way that lets software execute using reasonable
resources, and so that the software consumer is justified in being
confident that the software does what the programmer thinks it does.

                    Efficiency

                    /         \

        Reliability     ---    Ease of Development & Maintenance

The core idea of this course is to design and develop a sequence of
increasingly realistic languages, with variations, introducing
one key feature at a time and seeing how the new feature impacts the
language. We will progress through 5 or 6 programming languages.

Reliability

Typed vs. Untyped Languages

A TYPE is an annotation for a variable.  

A language in which variables can be consistently associated with
types is called a TYPED LANGUAGE.  Otherwise the language is UNTYPED.

Explicitly typed vs. implicitly typed languages.

A TYPE SYSTEM is that part of a typed language that keeps track of the
types associated with variables and expressions.

Execution Errors

  Trapped Errors
    software trap (e.g., divide by zero)
    hardware trap (e.g., overflow)

  Untrapped Errors
    Array index out of bounds

A language is SAFE if its implementation does not allow untrapped errors.

An untyped language can enforce safety by performing run-time checks.

Typed languages may enforce safety by statically rejecting all
programs that are potentially unsafe.

Typed languages may also use a mixture of run time and static checks.

Typed languages usually also aim to rule out large classes of trapped
errors, along with the untrapped ones.

Execution errors and well-behaved programs

For any given language, we may designate a subset of the possible
execution errors as FORBIDDEN errors. The forbidden errors should
include all of the untrapped errors, plus a subset of the trapped
errors.

A program fragment is said to be WELL-BEHAVED if it does not cause
any forbidden error to occur.

A well-behaved fragment is safe. A language where all of the (legal)
programs have good behavior is called STRONGLY CHECKED.

Thus, with respect to a given type system, the following holds for a
strongly checked language:

• No untrapped errors occur (safety guarantee).

• None of the trapped errors designated as forbidden errors occur.

• Other trapped errors may occur; it is the programmer’s
  responsibility to avoid them.

Typed languages can enforce good behavior (including safety) by
performing static (i.e., compile time) checks to prevent unsafe and
ill-behaved programs from ever running.

These languages are statically checked; the checking process is called
typechecking, and the algorithm that performs this checking is called
the typechecker.

A program that passes the typechecker is said to be well-typed;
otherwise, it is ill-typed, which may mean that it is actually
ill-behaved, or simply that it could not be guaranteed to be
well-behaved.
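
As a rough illustration of static checking, here is a minimal sketch of
a typechecker in Python. The mini-language (integer and boolean
literals, addition, and if-expressions encoded as nested tuples) and
the names typecheck, is_well_typed and IllTyped are assumptions made
for this example; they are not the languages developed in the course.

```python
# Hypothetical mini-language, encoded as nested tuples:
#   ("int", 3)  ("bool", True)  ("add", e1, e2)  ("if", cond, e1, e2)

class IllTyped(Exception):
    """Raised when an expression cannot be given a type."""

def typecheck(e):
    """Return "int" or "bool" for expression e, or raise IllTyped."""
    tag = e[0]
    if tag == "int":
        return "int"
    if tag == "bool":
        return "bool"
    if tag == "add":
        _, left, right = e
        if typecheck(left) == "int" and typecheck(right) == "int":
            return "int"
        raise IllTyped("both operands of + must be int")
    if tag == "if":
        _, cond, then_branch, else_branch = e
        if typecheck(cond) != "bool":
            raise IllTyped("condition must be bool")
        t_then = typecheck(then_branch)
        t_else = typecheck(else_branch)
        if t_then != t_else:
            raise IllTyped("branches must have the same type")
        return t_then
    raise IllTyped("unknown expression form: %r" % (tag,))

def is_well_typed(e):
    """True if e passes the typechecker, False otherwise."""
    try:
        typecheck(e)
        return True
    except IllTyped:
        return False

good = ("add", ("int", 1), ("int", 2))

# The else branch could never run, yet the checker still rejects the
# whole program: ill-typed does not always mean ill-behaved.
conservative = ("if", ("bool", True), ("int", 1),
                ("add", ("int", 1), ("bool", True)))
```

Here is_well_typed(good) holds, while is_well_typed(conservative) does
not, even though evaluating conservative would never reach the faulty
addition. The checker simply cannot guarantee good behavior, so it
rejects the program.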

Examples of statically checked languages are ML and Pascal (with the
caveat that Pascal has some unsafe features).

Untyped languages can enforce good behavior (including safety) by
performing sufficiently detailed run time checks to rule out all
forbidden errors. (For example, they may check all array bounds, and
all division operations, generating recoverable exceptions when
forbidden errors would happen.) The checking process in these
languages is called dynamic checking; LISP is an example of such a
language. These languages are strongly checked even though they have
neither static checking, nor a type system.
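
Dynamic checking can be observed in any language that checks at run
time. The sketch below uses Python as a stand-in for a dynamically
checked language such as LISP; the helper try_run is hypothetical and
exists only to show that each forbidden error surfaces as a
recoverable exception.

```python
def try_run(thunk):
    """Run thunk() and report either its value or the name of the error."""
    try:
        return ("ok", thunk())
    except (IndexError, ZeroDivisionError, TypeError) as err:
        return ("error", type(err).__name__)

data = [10, 20, 30]

in_bounds = try_run(lambda: data[1])         # index is checked and valid
out_of_bounds = try_run(lambda: data[7])     # bounds check fails at run time
div_by_zero = try_run(lambda: 1 // 0)        # division is checked at run time
bad_operands = try_run(lambda: 1 + "two")    # operand check fails at run time
```

In each failing case the run-time system detects the problem and
raises an exception the program can handle, rather than reading
arbitrary memory or continuing with a garbage value.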

Even statically checked languages usually need to perform tests at run
time to achieve safety. For example, array bounds must in general be
tested dynamically. The fact that a language is statically checked
does not necessarily mean that execution can proceed entirely blindly.

Polymorphism: the ability to reuse code

Untyped vs. Typed Languages
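
As a small illustration of code reuse, the sketch below writes one
function and uses it at several element types. It is written in Python
with optional type annotations; the name map_list and the type
variables A and B are choices made for this example. In an untyped
setting the annotations could simply be dropped, while a typed
language generally needs some form of type parameter to accept the
same reuse.

```python
from typing import Callable, List, TypeVar

A = TypeVar("A")
B = TypeVar("B")

def map_list(f: Callable[[A], B], xs: List[A]) -> List[B]:
    """Apply f to every element; the same code works for any A and B."""
    return [f(x) for x in xs]

# One definition, reused at two different element types.
lengths = map_list(len, ["a", "bc", "def"])       # strings to ints
doubled = map_list(lambda n: n * 2, [1, 2, 3])    # ints to ints
```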

Specification

  PL = Syntax + Semantics

    Syntax

      Grammars and Parsers

      We can use the theory of grammars to automate much of the
      processing of the syntactic form of a language
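
      To make this concrete, here is a minimal sketch of a lexer and a
      hand-written recursive-descent parser in Python. The grammar
      (numbers, + and *, parentheses) and the nested-tuple AST are
      assumptions made for this example, not the course's languages:

        expr   ::= term ('+' term)*
        term   ::= factor ('*' factor)*
        factor ::= NUMBER | '(' expr ')'

```python
import re

# One token: optional leading whitespace, then a number or a symbol.
TOKEN = re.compile(r"\s*(\d+|[+*()])")

def lex(src):
    """Split source text into a list of token strings."""
    tokens = []
    pos = 0
    src = src.rstrip()
    while pos < len(src):
        m = TOKEN.match(src, pos)
        if m is None:
            raise SyntaxError("unexpected character at position %d" % pos)
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

def parse(tokens):
    """Parse a token list into a nested-tuple AST, or raise SyntaxError."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def expr():
        node = term()
        while peek() == "+":
            advance()
            node = ("add", node, term())
        return node

    def term():
        node = factor()
        while peek() == "*":
            advance()
            node = ("mul", node, factor())
        return node

    def factor():
        tok = advance()
        if tok is not None and tok.isdigit():
            return ("num", int(tok))
        if tok == "(":
            node = expr()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError("unexpected token %r" % (tok,))

    ast = expr()
    if peek() is not None:
        raise SyntaxError("trailing input at token %r" % (peek(),))
    return ast

ast = parse(lex("1 + 2 * 3"))
```

      Each grammar rule becomes one function, and the layering of
      expr over term over factor is what makes * bind tighter than +.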

    Semantics

      Human utterances have meaning. If I tell you to "Please turn it
      down." you know what I mean and (maybe) take action.

      What does a program mean? A really really fascinating
      area. (Programming Language Semantics).

      We will use so-called "Natural Semantics"
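
      Natural semantics (also called big-step operational semantics)
      defines an evaluation judgment by inference rules. As a sketch,
      assume a tiny language of integer literals n and sums e1 + e2,
      and write e ⇓ n for "expression e evaluates to the integer n".
      The rule names Num and Add are chosen for this example:

```latex
\[
\frac{}{n \Downarrow n}\;(\textsc{Num})
\qquad
\frac{e_1 \Downarrow n_1 \quad e_2 \Downarrow n_2 \quad n = n_1 + n_2}
     {e_1 + e_2 \Downarrow n}\;(\textsc{Add})
\]
```

      Read each rule from top to bottom: if the premises above the
      line hold, so does the conclusion below it. A literal evaluates
      to itself, and a sum evaluates to the sum of the values of its
      two operands.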

Implementation

  We will use the programming language F# to write interpreters and
  mini-compilers.

  Our interpreters will interpret ASTs

  Our compilers will translate from ASTs to byte code for stack
  machines.
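
  The difference between the two approaches can be sketched in a few
  lines. The example below is in Python rather than F#, and the AST
  encoding (nested tuples), the instruction names PUSH, ADD and MUL,
  and the function names are all assumptions made for illustration.

```python
def interpret(ast):
    """Evaluate an AST directly by recursion on its structure."""
    tag = ast[0]
    if tag == "num":
        return ast[1]
    if tag == "add":
        return interpret(ast[1]) + interpret(ast[2])
    if tag == "mul":
        return interpret(ast[1]) * interpret(ast[2])
    raise ValueError("unknown AST node: %r" % (tag,))

def compile_ast(ast):
    """Translate an AST into a list of stack-machine instructions."""
    tag = ast[0]
    if tag == "num":
        return [("PUSH", ast[1])]
    if tag in ("add", "mul"):
        op = "ADD" if tag == "add" else "MUL"
        # Code for both operands first, then the operator (postfix order).
        return compile_ast(ast[1]) + compile_ast(ast[2]) + [(op,)]
    raise ValueError("unknown AST node: %r" % (tag,))

def run(code):
    """Execute stack-machine code and return the value left on the stack."""
    stack = []
    for instr in code:
        if instr[0] == "PUSH":
            stack.append(instr[1])
        elif instr[0] == "ADD":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif instr[0] == "MUL":
            right = stack.pop()
            left = stack.pop()
            stack.append(left * right)
        else:
            raise ValueError("unknown instruction: %r" % (instr,))
    return stack.pop()

# AST for 1 + 2 * 3
example = ("add", ("num", 1), ("mul", ("num", 2), ("num", 3)))
code = compile_ast(example)
```

  The interpreter computes a value straight from the tree, while the
  compiler produces instructions that a separate machine executes
  later. Running the compiled code should give the same answer as
  interpreting the tree.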

  We'll use either Sublime Text or the Emacs editor, together with some
  F# customization code and make, as our IDE.

  The structure of a simple language implementation:

  pgm -> LEXER -> token stream -> PARSER -> ast -> INTERPRETER

  or

  pgm -> LEXER -> token stream -> PARSER -> ast -> TRANSLATOR -> bytecode

  where the byte code is in the language of a given VIRTUAL MACHINE
  (VM) e.g., JVM, .NET, ...

  or

  pgm -> LEXER -> token stream -> PARSER -> ast -> TRANSLATOR -> ast
  -> OPTIMIZER -> machinecode

  where the machinecode is in the native language of a particular
  computing device.

History of PLs

  Assembly Language, Fortran (1953, Backus), Algol (1958, Backus et
  al.), COBOL (1959, Hopper), Simula (1962, Dahl), Pascal (1968, Wirth),
  C (1969, Ritchie), C++ (1979, Stroustrup), Java (1991, Gosling et al.),
  ...

  LISP (1960, McCarthy), ISWIM (1965, Landin), Scheme (1974, Steele),
  ML (1970s, Milner), Smalltalk (1980, Kay), Python (1989, van Rossum),
  Haskell (1992, Peyton Jones et al.), ..., JavaScript (1995, Eich),
  OCaml (1996, Leroy et al.), Ruby (1990s, Matsumoto), ...

Common Theme:

  functions

Course Admin

  45% problem sets
  45% for 3 exams
  10% class participation

  Homework submitted via Canvas

Tools

  F#, sublime or emacs, make

  Brief demo of each

Mathematical Preliminaries  

  Nomenclature

    Gamma, gamma, alpha, beta, delta, epsilon, lambda

    Set Theory - sets, relations, orders and maps