Haskell function type declarations

In this chapter, we describe the syntax and informal semantics of Haskell declarations .

module	→	module modid [ exports ] where body
\|	body
body	→
\|
\|
topdecls	→	topdecl ₁ ; … ; topdecl _n	( n ≥ 1)
topdecl	→	type simpletype = type
\|	data [ context => ] simpletype [ = constrs ] [ deriving ]
\|	newtype [ context => ] simpletype = newconstr [ deriving ]
\|	class [ scontext => ] tycls tyvar [ where cdecls ]
\|	instance [ scontext => ] qtycls inst [ where idecls ]
\|	default ( type ₁ , … , type _n )	( n ≥ 0)
\|	foreign fdecl
\|	decl
decls	→	< decl ₁ ; … ; decl _n >	( n ≥ 0)
decl	→	gendecl
\|	( funlhs \| pat ) rhs
cdecls	→	< cdecl ₁ ; … ; cdecl _n >	( n ≥ 0)
cdecl	→	gendecl
\|	( funlhs \| var ) rhs
idecls	→	< idecl ₁ ; … ; idecl _n >	( n ≥ 0)
idecl	→	( funlhs \| var ) rhs
\|	( empty )
gendecl	→	vars :: [ context => ] type	( type signature )
\|	fixity [ integer ] ops	( fixity declaration )
\|	( empty declaration )
ops	→	op ₁ , … , op _n	( n ≥ 1)
vars	→	var ₁ , … , var _n	( n ≥ 1)
fixity	→	infixl \| infixr \| infix

The declarations in the syntactic category topdecls are only allowed at the top level of a Haskell module (see Chapter 5 ), whereas decls may be used either at the top level or in nested scopes (i.e. those within a let or where construct).

For exposition, we divide the declarations into three groups: user-defined datatypes, consisting of type , newtype , and data declarations (Section 4.2 ); type classes and overloading, consisting of class , instance , and default declarations (Section 4.3 ); and nested declarations, consisting of value bindings, type signatures, and fixity declarations (Section 4.4 ).

Haskell has several primitive datatypes that are “hard-wired” (such as integers and floating-point numbers), but most “built-in” datatypes are defined with normal Haskell code, using normal type and data declarations. These “built-in” datatypes are described in detail in Section 6.1 .

4.1 Overview of Types and Classes

Haskell uses a traditional Hindley-Milner polymorphic type system to provide a static type semantics [4, 6] , but the type system has been extended with type classes (or just classes ) that provide a structured way to introduce overloaded functions.

A class declaration (Section 4.3.1 ) introduces a new type class and the overloaded operations that must be supported by any type that is an instance of that class. An instance declaration (Section 4.3.2 ) declares that a type is an instance of a class and includes the definitions of the overloaded operations—called class methods —instantiated on the named type.

For example, suppose we wish to overload the operations (+) and negate on types Int and Float . We introduce a new type class called Num :

class Num a where -- simplified class declaration for Num
(+) :: a -> a -> a -- (Num is defined in the Prelude)
negate :: a -> a

This declaration may be read “a type a is an instance of the class Num if there are class methods (+) and negate , of the given types, defined on it.”

We may then declare Int and Float to be instances of this class:

instance Num Int where -- simplified instance of Num Int
x + y = addInt x y
negate x = negateInt x

instance Num Float where -- simplified instance of Num Float
x + y = addFloat x y
negate x = negateFloat x

where addInt , negateInt , addFloat , and negateFloat are assumed in this case to be primitive functions, but in general could be any user-defined function. The first declaration above may be read “ Int is an instance of the class Num as witnessed by these definitions (i.e. class methods) for (+) and negate .”

More examples of type classes can be found in the papers by Jones [8] or Wadler and Blott [13] . The term ‘type class’ was used to describe the original Haskell 1.0 type system; ‘constructor class’ was used to describe an extension to the original type classes. There is no longer any reason to use two different terms: in this report, ‘type class’ includes both the original Haskell type classes and the constructor classes introduced by Jones.

4.1.1 Kinds

The symbol ∗ represents the kind of all nullary type constructors.
If κ ₁ and κ ₂ are kinds, then κ ₁ → κ ₂ is the kind of types that take a type of kind κ ₁ and return a type of kind κ ₂ .

Kind inference checks the validity of type expressions in a similar way that type inference checks the validity of value expressions. However, unlike types, kinds are entirely implicit and are not a visible part of the language. Kind inference is discussed in Section 4.6 .

4.1.2 Syntax of Types

type	→	btype [ -> type ]	( function type )
btype	→	[ btype ] atype	( type application )
atype	→	gtycon
\|	tyvar
\|	( type ₁ , … , type _k )	( tuple type , k ≥ 2)
\|	[ type ]	( list type )
\|	( type )	( parenthesised constructor )
gtycon	→	qtycon
\|	()	( unit type )
\|	[]	( list constructor )
\|	(->)	( function constructor )
\|	(, < , >)	( tupling constructors )

The syntax for Haskell type expressions is given above. Just as data values are built using data constructors, type values are built from type constructors . As with data constructors, the names of type constructors start with uppercase letters. Unlike data constructors, infix type constructors are not allowed (other than (->) ).

Type variables, written as identifiers beginning with a lowercase letter. The kind of a variable is determined implicitly by the context in which it appears.
Type constructors. Most type constructors are written as an identifier beginning with an uppercase letter. For example:
- Char , Int , Integer , Float , Double and Bool are type constants with kind ∗ .
- Maybe and IO are unary type constructors, and treated as types with kind ∗→∗ .
- The declarations data T . or newtype T . add the type constructor T to the type vocabulary. The kind of T is determined by kind inference.

The trivial type is written as () and has kind ∗ . It denotes the “nullary tuple” type, and has exactly one value, also written () (see Sections 3.9 and 6.1.5 ).
The function type is written as (->) and has kind ∗→∗→∗ .
The list type is written as [] and has kind ∗→∗ .
The tuple types are written as (,) , (,,) , and so on. Their kinds are ∗→∗→∗ , ∗→∗→∗→ ∗ , and so on.

For example, the type expression IO a can be understood as the application of a constant, IO , to the variable a . Since the IO type constructor has kind ∗→∗ , it follows that both the variable a and the whole expression, IO a , must have kind ∗ . In general, a process of kind inference (see Section 4.6 ) is needed to determine appropriate kinds for user-defined datatypes, type synonyms, and classes.

A function type has the form t ₁ -> t ₂ , which is equivalent to the type (->) t ₁ t ₂ . Function arrows associate to the right. For example, Int -> Int -> Float means Int -> (Int -> Float) .
A tuple type has the form ( t ₁ , … , t _k ) where k ≥ 2 , which is equivalent to the type (, … ,) t ₁ … t _k where there are k − 1 commas between the parenthesis. It denotes the type of k -tuples with the first component of type t ₁ , the second component of type t ₂ , and so on (see Sections 3.8 and 6.1.4 ).
A list type has the form [ t ] , which is equivalent to the type [] t . It denotes the type of lists with elements of type t (see Sections 3.7 and 6.1.3 ).

These special syntactic forms always denote the built-in type constructors for functions, tuples, and lists, regardless of what is in scope. In a similar way, the prefix type constructors (->) , [] , () , (,) , and so on, always denote the built-in type constructors; they cannot be qualified, nor mentioned in import or export lists (Chapter 5 ). (Hence the special production, “gtycon”, above.)

Although the list and tuple types have special syntax, their semantics is the same as the equivalent user-defined algebraic data types.

Notice that expressions and types have a consistent syntax. If t _i is the type of expression or pattern e _i , then the expressions (\ e ₁ -> e ₂ ) , [ e ₁ ] , and ( e ₁ ,e ₂ ) have the types ( t ₁ -> t ₂ ) , [ t ₁ ] , and ( t ₁ ,t ₂ ) , respectively.

With one exception (that of the distinguished type variable in a class declaration (Section 4.3.1 )), the type variables in a Haskell type expression are all assumed to be universally quantified; there is no explicit syntax for universal quantification [4] . For example, the type expression a -> a denotes the type ∀ a. a → a . For clarity, however, we often write quantification explicitly when discussing the types of Haskell programs. When we write an explicitly quantified type, the scope of the ∀ extends as far to the right as possible; for example, ∀ a. a → a means ∀ a. ( a → a ) .

4.1.3 Syntax of Class Assertions and Contexts

context	→	class
\|	( class ₁ , … , class _n )	( n ≥ 0)
class	→	qtycls tyvar
\|	qtycls ( tyvar atype ₁ … atype _n )	( n ≥ 1)
qtycls	→	[ modid . ] tycls
tycls	→	conid
tyvar	→	varid

A class assertion has form qtycls tyvar , and indicates the membership of the type tyvar in the class qtycls . A class identifier begins with an uppercase letter. A context consists of zero or more class assertions, and has the general form ( C ₁ u ₁ , … , C _n u _n ) where C ₁ , … , C _n are class identifiers, and each of the u ₁ , … , u _n is either a type variable, or the application of type variable to one or more types. The outer parentheses may be omitted when n = 1 . In general, we use cx to denote a context and we write cx => t to indicate the type t restricted by the context cx . The context cx must only contain type variables referenced in t . For convenience, we write cx => t even if the context cx is empty, although in this case the concrete syntax contains no => .

4.1.4 Semantics of Types and Classes

In this section, we provide informal details of the type system. (Wadler and Blott [13] and Jones [8] discuss type and constructor classes, respectively, in more detail.)

The Haskell type system attributes a type to each expression in the program. In general, a type is of the form ∀ u . cx ⇒ t , where u is a set of type variables u ₁ , … , u _n . In any such type, any of the universally-quantified type variables u _i that are free in cx must also be free in t . Furthermore, the context cx must be of the form given above in Section 4.1.3 . For example, here are some valid types:

Eq a => a -> a
(Eq a, Show a, Eq b) => [a] -> [b] -> String
(Eq (f a), Functor f) => (a -> b) -> f a -> f b -> Bool

In the third type, the constraint Eq (f a) cannot be made simpler because f is universally quantified.

The type of an expression e depends on a type environment that gives types for the free variables in e , and a class environment that declares which types are instances of which classes (a type becomes an instance of a class only via the presence of an instance declaration or a deriving clause).

Types are related by a generalization preorder (specified below); the most general type, up to the equivalence induced by the generalization preorder, that can be assigned to a particular expression (in a given environment) is called its principal type . Haskell’s extended Hindley-Milner type system can infer the principal type of all expressions, including the proper use of overloaded class methods (although certain ambiguous overloadings could arise, as described in Section 4.3.4 ). Therefore, explicit typings (called type signatures ) are usually optional (see Sections 3.16 and 4.4.1 ).

t ₂ is identical to S ( t ₁ ) .
Whenever cx ₂ holds in the class environment, S ( cx ₁ ) also holds.

A value of type ∀ u . cx ⇒ t , may be instantiated at types s if and only if the context cx [ s ∕ u ] holds. For example, consider the function double :

double x = x + x

The most general type of double is ∀ a. Num a ⇒ a → a . double may be applied to values of type Int (instantiating a to Int ), since Num Int holds, because Int is an instance of the class Num . However, double may not normally be applied to values of type Char , because Char is not normally an instance of class Num . The user may choose to declare such an instance, in which case double may indeed be applied to a Char .

4.2 User-Defined Datatypes

In this section, we describe algebraic datatypes ( data declarations), renamed datatypes ( newtype declarations), and type synonyms ( type declarations). These declarations may only appear at the top level of a module.

4.2.1 Algebraic Datatype Declarations

topdecl	→	data [ context => ] simpletype [ = constrs ] [ deriving ]
simpletype	→	tycon tyvar ₁ … tyvar _k	( k ≥ 0)
constrs	→	constr ₁ \| … \| constr _n	( n ≥ 1)
constr	→	con [ ! ] atype ₁ … [ ! ] atype _k	( arity con = k, k ≥ 0)
\|	( btype \| ! atype ) conop ( btype \| ! atype )	( infix conop )
\|	con < fielddecl ₁ , … , fielddecl _n >	( n ≥ 0)
fielddecl	→	vars :: ( type \| ! atype )
deriving	→	deriving ( dclass \| ( dclass ₁ , … , dclass _n ) )	( n ≥ 0)
dclass	→	qtycls

The precedence for constr is the same as that for expressions—normal constructor application has higher precedence than infix constructor application (thus a : Foo a parses as a : (Foo a) ).

⋅⋅⋅

An algebraic datatype declaration has the form: data cx => T u ₁ … u _k = K ₁ t ₁₁ … t _{1 k ₁} | | K _n t _{n 1} … t _{nk _n} where cx is a context. This declaration introduces a new type constructor T with zero or more constituent data constructors K ₁ , … , K _n . In this Report, the unqualified term “constructor” always means “data constructor”.

⋅⋅⋅

The types of the data constructors are given by: K _i :: ∀ u ₁ … u _k . cx _i ⇒ t _{i 1} → → t _{ik _i} → ( T u ₁ … u _k ) where cx _i is the largest subset of cx that constrains only those type variables free in the types t _{i 1} , … , t _{ik _i}. The type variables u ₁ through u _k must be distinct and may appear in cx and the t _ij ; it is a static error for any other type variable to appear in cx or on the right-hand-side. The new type constant T has a kind of the form κ ₁ → … → κ _k →∗ where the kinds κ _i of the argument variables u _i are determined by kind inference as described in Section 4.6 . This means that T may be used in type expressions with anywhere between 0 and k arguments.

For example, the declaration

data Eq a => Set a = NilSet | ConsSet a (Set a)

introduces a type constructor Set of kind ∗→∗ , and constructors NilSet and ConsSet with types

:: ∀ a. Eq a ⇒ a → Set a → Set a

In the example given, the overloaded type for ConsSet ensures that ConsSet can only be applied to values whose type is an instance of the class Eq . Pattern matching against ConsSet also gives rise to an Eq a constraint. For example:

f (ConsSet a s) = a

the function f has inferred type Eq a => Set a -> a . The context in the data declaration has no other effect whatsoever.

The visibility of a datatype’s constructors (i.e. the “abstractness” of the datatype) outside of the module in which the datatype is defined is controlled by the form of the datatype’s name in the export list as described in Section 5.8 .

The optional deriving part of a data declaration has to do with derived instances , and is described in Section 4.3.3 .

Labelled Fields A data constructor of arity k creates an object with k components. These components are normally accessed positionally as arguments to the constructor in expressions or patterns. For large datatypes it is useful to assign field labels to the components of a data object. This allows a specific field to be referenced independently of its location within the constructor.

A constructor definition in a data declaration may assign labels to the fields of the constructor, using the record syntax ( C < . >). Constructors using field labels may be freely mixed with constructors without them. A constructor with associated field labels may still be used as an ordinary constructor; features using labels are simply a shorthand for operations using an underlying positional constructor. The arguments to the positional constructor occur in the same order as the labeled fields. For example, the declaration

defines a type and constructor identical to the one produced by

data C = F Int Int Bool

Operations using field labels are described in Section 3.15 . A data declaration may use the same field label in multiple constructors as long as the typing of the field is the same in all cases after type synonym expansion. A label cannot be shared by more than one type in scope. Field names share the top level namespace with ordinary variables and class methods and must not conflict with other top level names in scope.

The pattern F <> matches any value built with constructor F , whether or not F was declared with record syntax .

Strictness Flags Whenever a data constructor is applied, each argument to the constructor is evaluated if and only if the corresponding type in the algebraic datatype declaration has a strictness flag, denoted by an exclamation point, “ ! ”. Lexically, “ ! ” is an ordinary varsym not a reservedop ; it has special significance only in the context of the argument types of a data declaration.

Translation: A declaration of the form data cx => T u ₁ … u _k = … | K s ₁ … s _n | … where each s _i is either of the form ! t _i or t _i , replaces every occurrence of K in an expression by (\ x ₁ … x _n -> ( (( K op ₁ x ₁ ) op ₂ x ₂ ) … ) op _n x _n ) where op _i is the non-strict apply function $ if s _i is of the form t _i , and op _i is the strict apply function $! (see Section 6.2 ) if s _i is of the form ! t _i . Pattern matching on K is not affected by strictness flags.

4.2.2 Type Synonym Declarations

topdecl	→	type simpletype = type
simpletype	→	tycon tyvar ₁ … tyvar _k	( k ≥ 0)

A type synonym declaration introduces a new type that is equivalent to an old type. It has the form type T u ₁ … u _k = t which introduces a new type constructor, T . The type ( T t ₁ … t _k ) is equivalent to the type t [ t ₁ ∕u ₁ , … , t _k ∕u _k ] . The type variables u ₁ through u _k must be distinct and are scoped only over t ; it is a static error for any other type variable to appear in t . The kind of the new type constructor T is of the form κ ₁ → … → κ _k → κ where the kinds κ _i of the arguments u _i and κ of the right hand side t are determined by kind inference as described in Section 4.6 . For example, the following definition can be used to provide an alternative way of writing the list type constructor:

Type constructor symbols T introduced by type synonym declarations cannot be partially applied; it is a static error to use T without the full number of arguments.

Although recursive and mutually recursive datatypes are allowed, this is not so for type synonyms, unless an algebraic datatype intervenes . For example,

type Rec a = [Circ a]
data Circ a = Tag [Rec a]

is allowed, whereas

type Rec a = [Circ a] -- invalid
type Circ a = [Rec a] -- invalid

is not. Similarly, type Rec a = [Rec a] is not allowed.

Type synonyms are a convenient, but strictly syntactic, mechanism to make type signatures more readable. A synonym and its definition are completely interchangeable, except in the instance type of an instance declaration (Section 4.3.2 ).

4.2.3 Datatype Renamings

topdecl	→	newtype [ context => ] simpletype = newconstr [ deriving ]
newconstr	→	con atype
\|	con
simpletype	→	tycon tyvar ₁ … tyvar _k	( k ≥ 0)

A declaration of the form newtype cx => T u ₁ … u _k = N t introduces a new type whose representation is the same as an existing type. The type ( T u ₁ … u _k ) renames the datatype t . It differs from a type synonym in that it creates a distinct type that must be explicitly coerced to or from the original type. Also, unlike type synonyms, newtype may be used to define recursive types. The constructor N in an expression coerces a value from type t to type ( T u ₁ … u _k ) . Using N in a pattern coerces a value from type ( T u ₁ … u _k ) to type t . These coercions may be implemented without execution time overhead; newtype does not change the underlying representation of an object.

New instances (see Section 4.3.2 ) can be defined for a type defined by newtype but may not be defined for a type synonym. A type created by newtype differs from an algebraic datatype in that the representation of an algebraic datatype has an extra level of indirection. This difference may make access to the representation less efficient. The difference is reflected in different rules for pattern matching (see Section 3.17 ). Unlike algebraic datatypes, the newtype constructor N is unlifted , so that N ⊥ is the same as ⊥ .

The following examples clarify the differences between data (algebraic datatypes), type (type synonyms), and newtype (renaming types.) Given the declarations

data D1 = D1 Int
data D2 = D2 !Int
type S = Int
newtype N = N Int
d1 (D1 i) = 42
d2 (D2 i) = 42
s i = 42
n (N i) = 42

the expressions (d1 ⊥ ) , (d2 ⊥ ) and (d2 (D2 ⊥ )) are all equivalent to ⊥ , whereas (n ⊥ ) , (n (N ⊥ )) , (d1 (D1 ⊥ )) and (s ⊥ ) are all equivalent to 42 . In particular, (N ⊥ ) is equivalent to ⊥ while (D1 ⊥ ) is not equivalent to ⊥ .

The optional deriving part of a newtype declaration is treated in the same way as the deriving component of a data declaration; see Section 4.3.3 .

A newtype declaration may use field-naming syntax, though of course there may only be one field. Thus:

newtype Age = Age

brings into scope both a constructor and a de-constructor: