Haskell
I originally chose Erlang as the first functional programming language to attempt learning. However, the combination of different programming paradigm, obscure Prolog-like syntax, and unfamiliar concurrency paradigm made it particularly difficult for me to see the big picture of functional programming.
I’m actually familiar with Haskell now. This very site is written in Haskell, and I have worked on a few other Haskell projects. That said, Haskell is also a programming language theory (PLT) playground. To that end, GHC contains various Haskell language extensions which I’m not entirely familiar with. This page of notes will therefore not necessarily cover the more basic aspects and instead will cover language extensions, popular libraries, idiomatic patterns, modules, parallelism and concurrency, and so on.
User Guide
It’s possible to build an EPUB eBook of the GHC User Guide. You’ll need to have the required dependencies to build the documentation. In my experience I ended up having to boot an Ubuntu VM because the xsltproc packages on arch were too new. It’s necessary to build the documentation at least once to generate the necessary files, which unfortunately ends up building GHC itself it seems. See this IBM article for more information on building eBooks with docbook.
$ ./configure
$ cd docs/users_guide
$ make html stage=0 FAST=YES
$ cd ..
$ /usr/bin/xsltproc \
> --stringparam base.dir docs/users_guide/users_guide/ \
> --stringparam use.id.as.filename 1 \
> --stringparam html.stylesheet fptools.css \
> --nonet \
> --stringparam toc.section.depth 3 \
> --stringparam section.autolabel 1 \
> --stringparam section.label.includes.component.label 1 \
> http://docbook.sourceforge.net/release/xsl/current/epub3/chunk.xsl \
> docs/users_guide/users_guide.xml
$ cd docs/users_guide
$ rm *.html
$ echo "application/epub+zip" > mimetype
$ zip -0Xq ghc.epub mimetype
$ zip -Xr9D ghc.epub *
$ stat ghc.epub
Type Classes
Many types can be reasoned about with a core set of type classes. This section goes over the typeclassopedia.
Functors
A functor represents a computational context. The functor laws ensure that the fmap
operation doesn’t change the structure of the container, only its elements. The functor laws aren’t enforced at the type level. Actually, satisfying the first law automatically satisfies the second.
class Functor f where
fmap :: (a -> b) -> f a -> f b
-- functor laws
fmap id = id
fmap (g . h) = (fmap g) . (fmap h)
A way of thinking of fmap
is that it lifts a function into one that operates over contexts. This is evident from the type signature that explicitly emphasizes the fact that it is partially applied.
fmap :: (a -> b) -> f a -> f b
fmap :: (a -> b) -> (f a -> f b)
g :: a -> b
fmap g :: f a -> f b
Applicative Functors
Applicative functors lie somewhere between functors and monads in expressivity. While functors allow the lifting of a normal function to a function on a computational context, applicative functors allow for the application of a function within a computational context. The type class includes a function pure
for embedding values in a default, effect free context. Applicatives are functors by definition.
class Functor f => Applicative f where
pure :: a -> f a
(<*>) :: f (a -> b) -> f a -> f b
The <*>
function is essentially function application within a computational context.
($) :: (a -> b) -> a -> b
(<*>) :: f (a -> b) -> f a -> f b
It’s straightforward enough to create a convenient infix function that can be used like the regular function application $
by embedding a regular function in a computational context before applying it to the parameter:
(<$>) :: (a -> b) -> f a -> f b
(<$>) g x = pure g <*> x
However, this is what fmap
already does based on its type. For this reason, the <$>
convenience function is simply an infix alias for fmap
.
fmap :: (a -> b) -> f a -> f b
(<$>) :: (a -> b) -> f a -> f b
(<$>) = fmap
Note that in a monadic context, the effects are applied sequentially, from left to right. This is makes sense because function application is left-associative, and each application of <*>
, which for monads is ap
which itself is liftM2 id
, has to extract the pure function and value from their respective monad ‘container’ 1, thereby performing the monad’s effects.
There also exists functions *>
and <*
that sequence actions while discarding the value of one of the arguments: left and right respectively. These are immensely useful for monadic parser combinators such as those found in the Parsec library. For example, *>
is useful in Parsec to consume the first parser but return the second, similar to >>
in a monadic context.
(*>) :: f a -> f b -> f b
u *> v = pure (const id) <*> u <*> v
(<*) :: f a -> f b -> f a
u <* v = pure const <*> u <*> v
Finally, there’s a function that replaces all of the locations in the input context with the provided value. This is useful in Parsec when we want to parse some input, and if the input is successfully consumed, return something else, such as parsing a URL-encoded space with ' ' <$ char '+'
.
(<$) :: a -> f b -> f a
(<$) = fmap . const
-- (fmap . const) x y
-- ((fmap . const) x) y
-- (fmap (const x)) y <- (f . g) x = f (g x)
-- fmap (const x) y
-- can also be thought of as
x <$ y = pure x <* y
Parallelism
Amdahl’s law places an upper bound on potential speedup that may be available by adding more processors. The expected speedup $S$
can be described as a function of the number of processors $N$
and the percentage of runtime that can be parallelized $P$
. The implications are that the potential speedup increasingly becomes negligible with the increase in processors, but more importantly that most programs have a theoretical maximum amount of parallelism.
ThreadScope
The ThreadScope tool helps visualize parallel execution, particularly for profiling. To use ThreadScope, the program should be compiled with the -eventlog
GHC option and then run with the +RTS -l
option to generate the eventlog. ThreadScope is then run on this event log:
$ ghc -O2 program.hs -threaded -rtsopts -eventlog
$ ./program +RTS -N2 -l
$ threadscope program.eventlog
Eval Monad
The Eval
monad from Control.Parallel.Strategies expresses parallelism naturally. The rpar
combinator expresses that the argument can be evaluated in parallel, and rseq
forces sequential evaluation. Evaluation is to weak head normal form (WHNF), i.e. the outermost type constructor is evaluated. Notice that the monad is completely pure, with no need for the IO
monad.
data Eval a
instance Monad Eval
runEval :: Eval a -> a
rpar :: a -> Eval a
rseq :: a -> Eval a
For example, the following evaluates two thunks in parallel then waits for both to be evaluated before returning:
runEval $ do
a <- rpar $ f x
b <- rpar $ f y
rseq a
rseq b
return (a, b)
Consider the following parallelized map
implementation:
parMap :: (a -> b) -> [a] -> Eval [b]
parMap f [] = return []
parMap f (a:as) = do
b <- rpar (f a)
bs <- parMap f as
return (b:bs)
runEval $ parMap f xs
Sparks
An argument to rpar
is called a spark. The runtime collects sparks in a pool and distributes them to available processors using a technique known as work stealing. These sparks are to be converted, i.e. evaluated in parallel, though this may fail in a variety of ways. The +RTS -s
option provides information about the sparks and what became of them during the execution of the program in the following form:
SPARKS: 100 (100 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)
Term | Definition |
---|---|
converted | evaluated in parallel |
overflowed | didn’t fit in the spark pool |
dud | already evaluated, ignored |
GC’d | unused, garbage collected |
fizzled | evaluated some place else |
Deepseq
The Control.Deepseq module contains various utilities for forcing the evaluation of a thunk. It defines the NFData
, i.e. normal-form data, type class. This type class has only one method, rnf
, i.e. reduce to normal-form, which defines how the particular type may be evaluated to normal form, returning ()
afterward:
class NFData a where
rnf :: a -> ()
rnf a = a `seq` ()
The module also defines a deepseq
function which performs a deep evaluation to normal form, similar to seq
which performs shallow evaluation to WHNF. There’s also a more convenient force
function which evaluates the argument to normal form and then returns the result.
deepseq :: NFData a => a -> b -> b
deepseq a b = rnf a `seq` b
force :: NFData a => a -> a
force x = x `deepseq` x
Evaluation Strategies
Evaluation strategies decouple algorithms from parallelism, allowing for parallelizing in different ways by substituting a different strategy. A Strategy
is a function in Eval
that returns its argument.
type Strategy a = a -> Eval a
A strategy would take a data structure as input which is then traversed and evaluated with parallelism and returns the original value. For example a strategy for evaluating a pair tuple could look like:
parPair :: Strategy (a, b)
parPair (a, b) = do
a' <- rpar a
b' <- rpar b
return (a', b')
This strategy could then be used either directly or using the using
combinator, which reads as use this expression by evaluating it with this strategy:
using :: a -> Strategy a -> a
x `using` s = runEval (s x)
runEval . parPair $ (fib 35, fib 36)
(fib 35, fib 36) `using` parPair
Parameterized Strategies
The parPair
always evaluates the pair components in parallel and always to WHNF. A parameterized strategy could take as arguments strategies to apply to the type’s components. The following evalPair
function is so called because it no longer assume parallelism, instead delegating that decision to the passed in strategy.
evalPair :: Strategy a -> Strategy b -> Strategy (a, b)
evalPair sa sb (a, b) = do
a' <- sa a
b' <- sb b
return (a', b')
It’s then possible to define a parPair
function in terms of evalPair
that evaluates the pair’s components in parallel. However, the rpar
strategy only evaluates to WHNF, restricting the evaluation strategy. We could instead use rdeepseq
—a strategy to evaluate to normal form—by wrapping rdeepseq
with rpar
. The rparWith
combinator allows the wrapping of strategies in this manner:
rdeepseq :: NFData a => Strategy a
rdeepseq x = rseq (force x)
rparWith :: Strategy a -> Strategy a
parPair :: Strategy a -> Strategy b -> Strategy (a, b)
parPair sa sb = evalPair (rparWith sa) (rparWith sb)
This then allows us to use parPair
to create a strategy that evaluates a pair tuple to normal form, which can then be used with the using
combinator:
(fib 35, fib 36) `using` parPair rdeepseq rdeepseq
Sometimes it may be required to not evaluate certain type components, which can be accomplished using the r0
strategy. For example, the following evaluates only the first components of a pair of pairs, i.e. a
and c
:
r0 :: Strategy a
r0 x = return x
evalPair (evalPair rpar r0) (evalPair rpar r0) :: Strategy ((a, b), (c, d))
The parMap
function can be defined in terms of evaluation strategies. The parList
function is a strategy that evaluates list elements in parallel. Defining parList
can take the same approach as before: define a parameterized strategy on lists called evalList
and then define a parameterized function parList
that performs evalList
in parallel. Both of these functions are already defined in Control.Parallel.Strategies:
evalList :: Strategy a -> Strategy [a]
evalList strat [] = return []
evalList strat (x:xs) = do
x' <- strat x
xs' <- evalList strat xs
return (x':xs')
parList :: Strategy a -> Strategy [a]
parList strat = evalList $ rparWith strat
parMap :: (a -> b) -> [a] -> [b]
parMap f xs = map f xs `using` parList rseq
Documentation
Haddock is the documentation system that is most prevalent in the Haskell community. Documentation can be generated using the haddock
command or more commonly cabal haddock
.
Declarations can be annotated by beginning comments with -- |
, which applies the documentation to the following declaration in the source file. It’s also possible to place annotations after a given declaration, in which case the caret ^
is used instead of the |
to denote an annotation.
data T a b
-- | the C1 constructor is used for foo
= C1 a b
-- | the C2 constructor is used for bar
| C2 a b
data T a b
= C1 a b -- ^ the C1 constructor is used for foo
| C2 a b -- ^ the C2 constructor is used for bar
Annotations can span multiple lines until the first non-comment line is encountered. It’s also possible to use multi-line comments by opening them with {-|
.
-- | this is a comment
-- that spans multiple lines
f :: Int -- ^ the int
-> Float -- ^ the float
-> IO () -- ^ the return value
Chunks of documentation can be given a name with $name
and then included elsewhere.
module Foo (
-- $doc
)
-- $doc
-- this is a large chunk of documentation
-- that spans many lines
Haddock Markup
One or more blank lines separates two paragraphs. Emphasis is denoted by surrounding text with a forward-slash /
, whereas bold text is denoted by surrounding the text with two underscores __
Monospace text is denoted by surrounding it with @
. Other markup is valid inside each of these, for example, @'f'
will hyperlink the identifier f
within the monospace text.
Links can be inserted using <url label>
syntax, although Haddock automatically links free-standing URLs. It’s also possible to link to other parts of the same page with #anchor#
syntax.
Images can be embedded with <<path.png title>>
syntax.
It’s possible to link to Haskell identifiers that are types, classes, constructors, or functions by surrounding them with single quotes. If the target is not in the scope, they may be referenced by fully qualifying them.
-- | This module defines the type 'T'
-- It has nothing to do with 'M.T'
Alternatively, it’s possible to link to a module entirely by surrounding the name with double quotes.
-- | This is a reference to the "Foo" module
Code blocks may be inserted by surrounding the paragraph with @
signs, where its content is interpreted as normal markup. Alternatively, it’s possible to do so by preceding each line with a >
, in which case the text is interpreted literally.
-- | this is some documentation that includes code
--
-- @
-- f x = x + x
-- @
--
-- > g x = x * 42
It’s possible to denote REPL examples with >>>
, followed by the result.
-- | demonstrating the REPL example syntax
--
-- >>> fib 10
-- 55
Unordered lists are possible by simply preceding the paragraph with a *
or -
. Ordered lists are possible by preceding each item with (n)
or n.
.
Haddock Options
Haddock accepts some comma-separated list of options that affect how it generates documentation for that module, much like LANGUAGE
pragmas in Haskell.
{-# OPTIONS_HADDOCK hide, prune #-}
Option | Effect |
---|---|
hide | omits module; doesn’t affect re-exported definitions |
prune | omits definitions with no annotations |
ignore-exports | ignore export list; all top-level declarations are exported |
not-home | module shouldn’t be considered to be home module |
show-extensions | include all language extensions used in the module |
GHC Extensions
Haskell is a PLT playground, and as a result GHC has available a multitude of language extensions. I found a series of articles that cover some of the more popular extensions.
Named-Field Puns
Record puns allows for the easy creation of bindings of the same name as their field.
{-# LANGUAGE NamedFieldPuns #-}
greet IndividualR { person = PersonR { firstName = fn } } = "Hi, " ++ fn
greet IndividualR { person = Person { firstName } } = "Hi, " ++ firstName
Record WildCards
This extension allows the use of two dots ..
to automatically create bindings for all fields at that location, named the same as the fields they bind.
{-# LANGUAGE RecordWildCards #-}
greet IndividualR { person = PersonR { .. } } = "Hi, " ++ firstName
Tuple Sections
Tuple sections are a straightforward extension that allows tuples to be used in sections.
{-# LANGUAGE TupleSections #-}
(1, "hello",, Just (),) == \x y -> (1, "hello", x, Just (), y)
Package Imports
This allows one to explicitly specify the package from which to import a module, which is useful when needed for disambiguation purposes.
{-# LANGUAGE PackageImports #-}
import "package-one-0.1.01" Data.Module.X
import qualified "containers" Data.Map as Map
Overloaded Strings
This allows string literals to be polymorphic over the IsString
type class which is implemented by types like Text
and ByteString
, making it less of a pain to construct values of those types.
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Text.IO as T
main = T.putStrLn "Hello as Text!"
Lambda Case
This provides a shorthand for a lambda containing a case match.
{-# LANGUAGE LambdaCase #-}
\x -> case x of == \case
Multi-Way If
This allows guard syntax to be used with if
expressions to avoid excessive nesting and repeating of the if
expression.
{-# LANGUAGE MultiWayIf #-}
if | x == 1 -> "a"
| y < 2 -> "b"
| otherwise -> "d"
Bang Patterns
This allows to evaluate to a value to weak head normal form before matching it.
{-# LANGUAGE BangPatterns #-}
strictFunc !v = ()
View Patterns
A view pattern e -> p
applies the view e
to the argument that would be matched, yielding a result which is matched against p
.
{-# LANGUAGE ViewPatterns #-}
eitherEndIsZero :: [Int] -> Bool
eitherEndIsZero (head -> 0) = True
eitherEndIsZero (last -> 0) = True
eitherEndIsZero _ = False
Pattern Guards
Pattern guards are a generalization of guards which allow pattern guardlets aside from the more common boolean guardlets. Pattern guardlets are of the form p <- e
which is fulfilled (i.e. accepted) when e
matches against p
. Further guardlets can be chained with commas. As usual, guardlets are tested in order from top to bottom before the guard as a whole is accepted.
{-# LANGUAGE PatternGuards #-}
func :: [Int] -> Ordering
func xs | 7 <- sum xs -- if the sum is 7
, n <- length xs -- bind the length to n for further checks
, n >= 5 -- and the length is >= 5
, n <= 20 -- and <= 20
= EQ -- then evaluate to EQ
| otherwise
= LT
Explicit Universal Quantification
This allows to explicitly annotate universal quantification in polymorphic type signatures. On its own this extension isn’t very useful. Instead, it’s required by many other extensions.
{-# LANGUAGE ExplicitForAll #-}
map :: forall a b. (a -> b) -> [a] -> [b]
Scoped Type Variables
This allows type variables to have scope so that nested scopes’ type signatures can close over them and refer to them. Without the extension, the a
in the type signature of sorted
and nubbed
would not only be different from func
’s type signature, but also different from themselves.
{-# LANGUAGE ScopedTypeVariables #-}
func :: forall a. Ord a => [a] -> [(Char, a, a)]
func xs = zip3 ['a' .. 'z'] sorted nubbed
where sorted :: [a]
sorted = sort xs
nubbed :: [a]
nubbed = nub xs
Liberal Type Synonyms
This relaxes restrictions on type synonyms so that type signatures are checked until after all type synonyms have been expanded, so that something like this is possible:
{-# LANGUAGE LiberalTypeSynonyms #-}
type Const a b = a
type Id a = a
type NatApp f g i = f i -> g i
func :: NatApp Id (Const Int) Char
-- Id Char -> Const Int Char
-- Char -> Int
func = ord
Rank-N Types
This allows the nesting of explicit universal quantifications within function types and data definitions. In the following example, the difference between the signatures is that the first signature specifically applies the universal quantification to the passed in function, whereas the second signature applies the universal quantification to the whole signature.
What this means is that the second signature would accept any function from n -> n
that applies for some Num n
, but the first signature requires a function from n -> n
that applies for every Num n
.
For example, (+1)
would be valid for both signatures, since every Num
defines (+)
. However, (/5)
would only be valid for the second signature because it’s a function from n -> n
that only applies for some Num n
, particularly the subset of Num n
that also implements Fractional
. Since not every Num n
implements Fractional
, that function is not valid for the first signature.
Note: This extension deprecates the less general extensions Rank2Types
and PolymorphicComponents
.
{-# LANGUAGE RankNTypes #-}
rankN :: (forall n. Num n => n -> n) -> (Int, Double)
rankN f = (f 1, f 1.0)
-- compare to
rankN :: forall n. Num n => (n -> n) -> (Int, Double)
When this extension is used in conjunction with LiberalTypeSynonyms
, it allows the universal quantification and/or constraints within type synonyms as well as the application or partial application of a type synonym to a type containing universal quantification and/or constraints.
type Const a b = a
type Id a = a
type Indexed f g = forall i. f i -> g i
type ShowIndexed f g = forall i. (Show i) => f i -> g i
type ShowConstrained f a = (Show a) => f a
type FunctionTo a b = b -> a
func1 :: Indexed Id (Const Int)
-- forall i. Id i -> Const Int i
-- forall i. i -> Int
func1 _ = 2
func2 :: ShowIndexed Id (Const Int)
-- forall i. (Show i) => Id i -> Const Int i
-- forall i. (Show i) => i -> Int
func2 = length . show
func3 :: forall a. ShowConstrained (FunctionTo Char) a
-- forall a. (Show a) => FunctionTo Char a
-- forall a. (Show a) => a -> Char
func3 = head . show
type ShowConstrained2 f = forall a. (Show a) => f a
type EnumFunctionTo b a = (Enum a) => a -> b
func :: ShowConstrained2 (EnumFunctionTo Char)
-- forall a. (Show a) => EnumFunctionTo Char a
-- forall a. (Show a, Enum a) => a -> Char
func = head . show . succ
Empty Data Declarations
This allows the definition of types with no constructors, which is useful for use as a phantom parameter to some other type.
data Empty
data EmptyWithPhantom x
type EmptyWithEmpty = EmptyWithPhantom Empty
Phantom Types
Phantom types are parameterized types where some of the parameters only appear on the RHS. They are often used to encode information at the type level to make code much more strict, usually paired with empty data types.
Consider an API for sending encrypted messages. It’s possible to encode—at the type-system level—that only encrypted messages may be sent using the send
function, thus preventing plain-text messages from being sent. This is further enforced by making the Message
constructor private and exposing a single constructor to build a plain-text message, such as newMessage
.
data Message a = Message String
data Encrypted
data PlainText
send :: Message Encrypted -> Recipient -> IO ()
encrypt :: Message PlainText -> Message Encrypted
decrypt :: Message Encrypted -> Message PlainText
newMessage :: String -> Message PlainText
newMessage s = Message s
-- this would not type-check
send (newMessage "test") "recipient@server"
Resources
- Categories from Scratch
- Category Theory on Youtube
- What I Wish I Knew
- Learning Haskell
- CS 194
- Performance profiling with ghc-events-analyze
- Haskell Wikibooks - Category Theory
- Getting it Done with Haskell