For this example, a new language called SimpleScala is introduced. SimpleScala is a hybrid between Scala and Haskell, with the following features of note:
- Top-level function definitions with
def, which require explicit parameter types (like Scala). Unlike Scala, an explicit return type is also required. - Algebraic data types, like Haskell.
New algebraic data types are introduced with the
algebraickeyword. - A restricted version of pattern matching, which does not support catch-all patterns or nested patterns.
- A special placeholder type, which, despite being in the syntax, cannot be used directly by programmers.
- Function arguments, local variables, and generic type specializations are completely inferred
- Tuples, but the only way to get values out of tuples is via pattern-matching. This makes inference easier, since it forces us to always know exactly how many fields a tuple has when we access it.
- String concatenation is a separate operator from integer addition.
This makes inference easier, as otherwise we would have nondeterminism during typechecking (i.e.,
e + ecould either return a string OR an integer, not both) - Higher-order functions always take exactly one parameter, for simplicity. If more parameters are desired, a tuple can be taken. If fewer parameters are desired, unit can be passed as a dummy value.
- Constructors always take exactly one parameter, for simplicity. The same workarounds as for higher-order functions apply to constructors, as well.
x is a variable
str is a string
b is a boolean
i is an integer
n is a natural number
fn is a function name
cn is a constructor name
un is a user-defined type name
T is a type variable
p ::= placeholder n // used internally during type inference; these denote
// types which may or may not be known at a given moment
type ::= string | boolean | integer // usuals
| unitType // common to functional programming; only one value of
// unitType exists, namely unit
| type => type // higher-order functions
| (type*) // tuples
| un[type*] // user-defined types
| T
| p
op ::= + | - | * | `/` // typical arithmetic operators
| && | `||` // typical boolean operators
| < | <= // typical integer comparison operators
| ++ // string concatenation
val ::= `val` x = exp // local variable definition
exp ::= x | str | b | i // the usuals
| unit // only value of type unitType
| exp op exp
| x => exp // higher-order function creation
| exp(exp) // calling a higher-order function
| fn(exp) // calling a top-level function
| if (exp) exp else exp
| {val* e} // blocks
| (exp*) // tuple creation
| cn(exp) // algebraic data type constructor
| exp match {case*} // pattern matching
case ::= `case` cn(x) => exp // constructor pattern
| `case` (x*) => exp // tuple pattern
tdef ::= algebraic un[T*] = cdef* // algebraic data type declaration
cdef ::= cn(type) // constructor declaration
def ::= `def` fn[T*](x: type): type = exp // top-level function definition
program ::= tdef* def* exp
A mathematical formalism of how this typechecker works has been provided in formalism.pdf.
It is not expected that you will be able to understand this in one sitting, but it's incredibly concise relative to the code.
The typechecker for SimpleScala has been implemented in both Scala and Java. The Java-based implementation is a very direct translation from the Scala-based implementation, and may be difficult to understand as a result. A s-expression-based parser has also been provided for both implementations. For the typechecker code, see:
- Scala:
scala/src/main/scala/polytyped/typechecker.scala - Java:
java/simplescala_experimentation/src/main/java/simplescala_experimentation/typechecker/Typechecker.java
The code was originally tested via an experimental automated approach. While it mostly lacks a traditional unit test suite, I am highly confident that it is correct.
With type inference, the basic idea is that we can infer the type of values used, based on context clues from the program. For example, consider the following code snippet in Scala, which features type inference:
val x = 42This snippet defines a local variable x, and assigns it the value 42.
The Scala compiler knows that x is of type Int (i.e., x is an integer), because 42 is an integer.
There is no need to explicitly say that x is an integer; the Scala compiler infers it.
As another example, consider the following:
// Java-like class definition
class Foo<A> {
public final A a;
public Foo(final A a) {
this.a = a;
}
}
...
// Scala-like variable definition
val f = new Foo(7)
In this case, the Scala compiler knows that f is of type Foo<Int>; there is no need to provide any annotations.
As humans, we're pretty good at figuring out the context clues. However, the compiler has to take a more incremental, stepwise approach. To do this, the compiler introduces the notion of a type which isn't known yet. In SimpleScala, this is what placeholders are for: they maintain that something is a type, but we might not know exactly what type yet. As values with placeholder types are used, we effectively "fill-in" the placeholder with information as we find it.
To show an example, let's consider again the following code snippet:
val f = new Foo(7)Going from left to right, the compiler does the following:
- At
val f, it knows thatfis of some yet-unknown type. It givesfa placeholder type, which will be filled in later. For discussion, we will call this placeholderp0. - At
=, it knows that the type offmust be the same as the type of the expression to the right of=. As such, the compiler states that bothfand the expression are of the same placeholder type, i.e., both are of typep0. - The compiler sees
new Foo. This expression creates something of typeFoo<p1>, wherep1is a different placeholder. The fact thatp1is used comes from the definition ofFoo, which took a type variable. We don't know exactly whatFoois going to take at this point, so we use a placeholder here. Putting it all together, from the prior reasoning with=, the compiler learns thatp0 = Foo<p1>. - The compiler sees
7. From the definition ofFoo, the parameter to the constructor has the same type as the generic type held inFoo(i.e., they both take something of typeA). As such,p1must be of the same type as whatever was passed in the constructor, sop1 = Int. - Putting everything together, the type of
fisp0,p0 = Foo<p1>, andp1 = Int. Substituting variables for their values, we ultimately get thatfis of typeFoo<Int>.
This setup makes use of unification, which is more commonly seen in logic programming languages like Prolog. Each placeholder acts as a logical variable, and these logical variables/placeholders are filled in later with type information. As with unification, once a variable gets a value, it can never be replaced by a different value.
Type errors can still occur when type inference is in play. For example, consider the following:
// Java-like class definition
class Bar<A> {
public Bar(A first, A second) { ... }
}
...
// Scala-like variable definition
val b = new Bar(7, "foo")
As humans, we can see a problem: Bar is supposed to take two values of the same type, but here we take two values of different types.
This causes a type error.
From the compiler's standpoint, it will figure out there is a type error via the following stepwise process:
- From
val b, the compiler doesn't know what typebis yet. It assigns it a placeholder type, sobis of typep0. - From
=, it knows that the expression must be of the same type asb. - From
new Bar, it knows that there is something of typeBar<p1>, wherep1is a new placeholder. Putting this all together,p0 = Bar<p1>. - From
7, it knows thatp1 = Int. - From
"foo", the compiler wants to assignp1 = String. However,p1already has a type, and that type isn'tString. As such, the compiler fails to typecheck at this point.
This situation plays out differently if no type error were present, like so:
val b = new Bar(7, 8)Steps 1-4 are the same as before.
However, at step 5, the compiler sees 8 instead of "foo".
The compiler sees that p1 = Int, and it's trying to assign Int again to p1.
While there is already a type, that type is the same as the type we attempt to put into it, which is ok as far as unification is concerned.
As such, this program snippet is well-typed, and it ultimately determines that b is of type Bar<Int> (b is of type p0, p0 = Bar<p1>, and p1 = Int).