Scala
I’ve been meaning to learn Scala for some time. Haskell has really left me wanting to program in a very functional style, and it seems like Scala is used more in the industry than Haskell. Scala seems to provide a decent compromise in a language that mixes object-oriented programming with functional programming characteristics. Furthermore, the JVM is battle-tested and probably the most robust virtual machine of any language out there at the moment.
sbt
sbt is the Scala Build Tool. A project’s configuration is defined in the file build.sbt. A project/ directory can contain helper objects and one-off plugins, such as Dependencies.scala.
The sbt version can be specified in a project/build.properties file:
sbt.version=1.2.8
A given build definition may look like:
lazy val root = (project in file("."))
.settings(
name := "Hello",
scalaVersion := "2.12.7"
)
Settings
Settings consist of setting expressions which are of the form someKey := { expression }
. Keys can be:
Type | Purpose |
---|---|
SettingKey[T] |
computed once on subproject load |
TaskKey[T] |
(aka task) has to be recomputed each time |
InputKey[T] |
key for task with CLI args as input |
In the shell, typing the name of a setting key prints its value, while typing the name of a task key runs that task but doesn’t print its return value, although this can be achieved with show taskName
. The inspect taskName
command can be used to display information about a task key.
TaskKey[T]
defines a task, which is an operation such as compile
and package
which may return Unit
, or for example package
is of type TaskKey[File]
because it returns the path to the JAR file it creates.
Built-in keys are fields from an object sbt.Keys
whose fields are implicitly imported with import sbt.Keys._
so that e.g. sbt.Keys.name
can be referred to as name
. There is also an implicit import sbt._
.
Custom keys can be defined with corresponding methods settingKey
, taskKey
, inputKey
, and they each require an explicit type of the value associated with the key and a description:
lazy val hello = taskKey[Unit]("An example task")
After defining the task or setting, it can be given a value through a setting either on a project or at the top-level, aka “bare style.”
lazy val hello = taskKey[Unit]("An example task")
lazy val root = (project in file("."))
.settings(
hello := { println("Hello!") }
)
// "Bare style"
ThisBuild / version := "1.0"
Note that lazy val
s are typically used to avoid initialization order problems.
val
s and def
s in an .sbt file are evaluated before settings regardless of where they are in a file.
Placing a tilde ~
before a command causes it to re-execute on file changes.
The reload
command causes build.sbt
to be re-read and its settings applied.
The ~testQuick
command can be used to run incremental tests continuously. Press Enter to stop.
Subprojects can be listed with projects
, and a specific project can be invoked for example with someproject/compile
.
If a subproject is added to a parent project’s aggregate
, then any command sent to the parent project will be broadcast to the subproject.
One subproject can depend on another via .dependsOn
.
Batch mode allows passing sbt commands from the command line, such as:
$ sbt clean "testOnly HelloSpec"
Some sbt operators include:
+=
appends to a key’s old value%
constructs an Ivy module ID
Projects
A project is defined with a lazy val
of type Project
. The name of the project’s base directory can be omitted if it’s the same as the val
name.
lazy val util = (project in file("util"))
Common settings can be set through the ThisBuild
top-level “bare style” configuration, or by defining a Seq
of the settings and passing it to the Project.settings
method.
One project can depend on another through the .dependsOn
method. This implies that the compile
task in one depends on the compile
task in another. This can be written explicitly with a dependency specifier "taskA->taskB"
meaning taskA in the current project depends on taskB in the dependency. Omitting the right-hand task dependency implies "compile"
. Multiple dependencies can be separated by semicolons.
lazy val core = project.dependsOn(util)
// Equivalent:
lazy val core = project.dependsOn(util % "compile->compile")
// core "test" depends on util "compile"
lazy val core = project.dependsOn(util % "test")
// core "test" depends on util "test"
// Enables placing test utilities in util
lazy val core = project.dependsOn(util % "test->test")
// Multiple dependencies
lazy val core = project.dependsOn(util % "test->test;compile->compile")
Dependencies
Dependencies take the form:
libraryDependencies += groupID % artifactID % revision
// Or:
libraryDependencies += groupID % artifactID % revision % configuration
libraryDependencies += groupID % artifactID % revision % "test"
libraryDependencies += groupID % artifactID % revision % Test
// Adds Scala version via %%:
libraryDependencies groupID %% artifactID % revision
// Equivalent when running Scala 2.11.1:
libraryDependencies += "org.scala-tools" % "scala-stm_2.11" % "0.3"
libraryDependencies += "org.scala-tools" %% "scala-stm" % "0.3"
Basics
The syntax for a function definition is:
def max(a: Int, y: Int): Int = if (x > y) x else y
If a function only takes one parameter, it can be called without parentheses. Functions can use special characters such as +
and -
, and can transform:
1 + 2
(1).+(2)
In fact, any function that takes at least two parameters can be called in infix operator notation. If it takes more than two parameters, the right hand side should apply those within parentheses.
A unary operator can be defined by defining a method with unary_
prepending the name.
If the function ends with :
, then the code is transformed in reverse:
a :: b
b.::(a)
A class can be defined as:
class MyClass(index: Int, name: String)
This also defines a constructor that takes integer and string parameters.
Values can be defined in two ways: val
are immutable values and var
are mutable variables.
Arrays are indexed using parentheses, not brackets as in other languages. They’re defined as:
val strings = new Array[String](3)
strings(0) = "Hello"
strings(1) = "World"
for (i <- 0 to 2)
print(strings(i))
Lambdas are of the form:
args.foreach((arg: String) => println(arg))
Though Scala allows partial function application, so the above can be shortened to:
args.foreach(println)
When parentheses are applied to a variable with one or more parameters, Scala transforms the code into an invocation of the apply
method on that variable:
strings(i)
strings.apply(i)
On the other hand, when a variable with applied parentheses and arguments is assigned to, the code is transformed to a call to the update
method:
strings(i) = "assignment"
strings.update(i, "assignment")
In Scala, Arrays are mutable while Lists are immutable. With Lists, ::
is cons and :::
is concatenation.
Tuples are declared in the usual form, as in Haskell. However, elements are accessed using _n
where n is the 1-based number of the element.
The ->
method returns a two-element tuple of the left and right parameters, and is used for example to insert items into a Map
:
val treasureMap = Map[Int, String]()
treasureMap += (1 -> "Go to island")
The Unit
type is equivalent to Haskell’s ()
or void
in C++.
Scala has symbols similar to Lisp’s, defined as:
val someKey = 'symbol
The yield
keyword can be used in conjunction with for
loops to generate new collections:
val shouts = for (arg <- args) yield arg + "!"
Raw strings are possible by using three successive double quote delimiters:
val rawstr = """\d+"""
Classes and Objects
Objects have members consisting of fields (data) and methods (code). Fields can be declared private.
If you assign an object to a val
, you can mutate the object but not reassign the val
to another object. Unit methods can leave off the result type and the equals sign:
def add(b: Byte): Unit = sum += b
def add(b: Byte) { sum += b }
The semicolon insertion rules state that a semicolon is inserted at the end of a line if:
- the line ends in a word that would not be legal as the end of a statement (e.g. period or infix operator)
- next line begins with a word that cannot start a statement
- line ends while inside parentheses or brackets, because these cannot contain multiple statements anyway
Singleton objects are defined using the object
keyword and their members are as if static
members of C++ classes. These cannot be instantiated.
object Singleton {
private val num = 0
}
Since an object
is a “stable path”, its members can be imported with the import
statement.
When a singleton object is named after an existing class, it is referred to as the class’ companion object 1. They must both be defined in the same source file. They can access each other’s private members.
A class is defined using the class keyword, and it can take parameters:
class Rational(num: Int, den: Int)
Any code inside the class body will be put inside the primary constructor.
Fields, accessible publicly, are created by defining class-level values.
class Rational(num: Int, den: Int) {
val numer: Int = num
val denom: Int = den
}
Fields and methods can be defined private using the private
keyword.
Auxiliary constructors are additional constructors that simply directly or indirectly delegate object construction to the primary constructor. They are named after this
, and their first action must be invoking another constructor:
class Rational(num: Int, den: Int) {
def this(n: Int) = this(n, 1)
}
Literal identifiers are ones that are enclosed in backticks, so that any string that would be accepted by the runtime will result in an identifier.
Implicit conversions can be defined so that values of a certain type are implicitly converted to another type in order for an operation to go through. For example, if there’s an addition function for Rational
that takes a Rational
and an Integer, we can do r + 1
but not 1 + r
since that would attempt to invoke the function on the Integer itself. To solve this, we can define an implicit conversion to Rational
on Integer:
implicit def intToRational(x: Int) = new Rational(x)
Functions
Partial Application
It’s not possible to assign methods or nested functions to values, or pass them as arguments to other functions. If this is intended, one can use the explicit partial application syntax:
val partial = some.method _
This syntax creates an ephemeral class 2 that defines an apply
method that takes the appropriate amount of arguments. When the partially applied value is called, the arguments are forwarded to the underlying function and its result is returned.
This can also be done for specific arguments, but the type must be stated explicitly, presumably to disambiguate overloads:
val partial = func(1, _ : Int, 3)
If passing a partially applied function, where all parameters are unapplied, as an argument to another function, partial application can be written implicitly:
someNumbers.foreach(println _)
someNumbers.foreach(println)
Closures capture values by reference.
Variable arguments can be specified with an asterisk *
, and are treated as an Array
inside the function:
def echo(args: String*) = for (arg <- args) println(arg)
echo("one", "two", "three")
Conversely, an Array
can be expanded into multiple arguments using the _*
symbol:
val arr = Array("one", "two", "three")
echo(arr: _*)
It’s possible to pass named parameters to functions:
def speed(distance: Float, time: Float): Float = distance / time
speed(time = 10, distance = 100)
It’s possible to define default parameter values:
def printTime(out: java.io.PrintStream = System.out) =
out.println("time = " + System.currentTimeMillis())
When used in combination with named parameters, it’s possible to leave out all parameters except one.
Scala supports tail recursion.
Currying can be made explicit in a function definition by adding multiple parameter lists. This makes it possible to curry functions without having to specify the type of the holes as in partial application above.
def curriedSum(x: Int)(y: Int) = x + y
val three = curriedSum(1)(2)
def equivalentSum(x: Int) = (y: Int) => x = y
val three_ = equivalentSum(1)(2)
This also makes it possible for the compiler to infer the type of a function’s parameters as in the case of:
def foldLeft[B](z: B)(op: (B, A) => B): B
// Types of m and n are needed here
numbers.foldLeft(0, (m: Int, n: Int) => m + n)
// Compiler can infer parameters because 0 infers type param B
// and _ refers to B
numbers.foldLeft(0)(_ + _)
Explicitly defining multiple parameter lists allows using braces for the last parameter:
val braces = curriedSum(1) { 2 }
Parameters can be passed “by-name” so that they can be passed to a function and not evaluated until explicitly used within the function. The syntax is simply to prepend the parameter type with =>
3.
def byNameAssert(predicate: => Boolean) =
if (assertionsEnabled && !predicate)
throw new AssertionError
byNameAssert(5 > 3)
Composition and Inheritance
Abstract classes are defined using the abstract keyword:
abstract class Element {
def contents: Array[String]
def height: Int = contents.length
def width: Int = if (height == 0) 0 else contents(0).length
}
Methods themselves are abstract if they have no defined implementation.
Parameter-less methods are those that take no parameters and their parentheses are omitted.
The convention is to omit the parentheses when there are no parameters and the method accesses mutable state only by reading its class’ fields.
Empty parentheses in function calls can be omitted. The convention is to only omit them if they have no side-effects. I/O calls for example should always explicitly contain the parentheses.
Classes can be extended by other classes. This allows the sub-class to inherit all non-private members of the parent class, and makes the sub-class a sub-type of the parent type.
All classes implicitly extend from scala.AnyRef
which is equivalent to Java’s java.lang.Object
:
class ArrayElement(conts: Array[String]) extends Element {
def contents: Array[String] = conts
}
Class fields and methods belong to the same namespace, which makes it possible for a subclass to override one with the other:
class ArrayElement(conts: Array[String]) extends Element {
val contents: Array[String] = conts
}
Parametric fields are fields that are initialized through the primary constructor:
class ArrayElement(
val contents: Array[String]
) extends Element
It’s possible to invoke superclass constructors by simply supplying the parameter in the extends section:
class LineElement(s: String) extends ArrayElement(Array(s)) {
override def width = s.length
override def height = 1
}
Overriding must be explicitly denoted with the override
keyword.
The final
keyword can be used to prevent subclass overriding on a particular method:
class A extends B {
final override def method() { println "and that's final" }
}
The final
keyword can also be used to prevent subclassing of a particular class entirely:
final class A extends B {
override def method() { println "final class" }
}
Hierarchy
Every class is a direct or indirect subclass of Any
, which defines a variety of “universal methods” available to all subclasses.
The ==
method is final and simply calls equals
, which is the method that subclasses should override.
Any
has two direct subclasses. AnyVal
is the parent class of every built-in value class in Scala: Byte
, Short
, Char
, Int
, Long
, Float
, Double
, Boolean
, and Unit
. These can’t be instantiated with new
because they are defined as final and abstract. AnyRef
is the base class of all reference classes, it’s just an alias for java.lang.Object
.
There are two “bottom” types, which are subclasses of every kind of type. Null
is the type of the null
reference and it inherits from any class that inherits from AnyRef
. Nothing
is a type that simply signifies abnormal termination. For example, the error
function is of type Nothing
. Since it’s a subclass of any class, the error
function can be called from within any other function regardless of its type.
Traits
Traits encapsulate methods and fields which can be mixed into other classes 4. Traits are defined as:
trait SomeTrait {
def someMethod() {
println("printing")
}
}
A trait can subclass a certain class, which essentially specifies a constraint on the trait such that it can only be mixed into classes that are or subclass the extended class.
Traits can be mixed in using either the extends
or with
keywords, multiple mixins are simply chained using with
. They also define types, so a value of a certain trait type can be set to any object whose class mixes-in the particular trait.
Traits can override methods implemented in the classes they’re mixed into. This is accomplished with the abstract override
directive. This combination of specifiers conveys the fact that it overrides a method that it’s mixed into (hence override
), and therefore that method must be defined in the class it’s mixed into (hence abstract
). Methods with these qualifiers can use super
to access the class they’re mixed into, specifically the same method they’re overriding:
class BasicIntQueue extends IntQueue {
private val buf = new ArrayBuffer[Int]
def get() = buf.remove(0)
def put(x: Int) { buf += x }
}
trait Doubling extends IntQueue {
abstract override def put(x: Int) { super.put(2 * x) }
}
Multiple mixins can be “stacked,” in which case the same function is called in reverse order of the mixin list. That is, the right-most trait is treated as if a “superclass” of the one to its left, and super
has that effect.
Packages
The package
statement is used as in Java to specify that what follows is to be part of the specified package. However, it can also be used with braces to only insert the contained code within the package.
package users {
package administrators {
class NormalUser
}
package normalusers {
class NormalUser
}
class OtherUser
}
The import
statement can be used to import packages and symbols in different ways:
def showFruit(fruit: Fruit) {
import fruit._
println(name + "s are " + color)
}
// only Apple and Orange
import Fruits.{Apple, Orange}
// everything, but rename Apple to McIntosh
import Fruits.{Apple => McIntosh, _}
// everything except Pear
import Fruits.{Pear => _, _}
Every Scala file implicitly imports the following packages:
import java.lang._
import scala._
import Predef._
Testing
Assertions can be made with assert
and enabled in the JVM using the command options -ea
(enable assertions) and -da
.
An alternative function is ensuring
, which also takes a predicate. The predicate is a function that takes a so-called “result type” and returns a Boolean. If the predicate returns true, then ensuring
returns the “result type,” but if the predicate returns false, then ensuring
throws an AssertionError
.
Unit testing is possible with tools such as ScalaTest and ScalaCheck. ScalaTest tests are functions with names prefixed by test
inside classes that extend Suite
:
import org.scalatest.Suite
import Element.elem
class ElementSuite extends Suite {
def testUniformElement() {
val ele = elem('x', 2, 3)
assert(ele.width == 2)
}
}
ScalaTest has another style of testing with the FunSuite package:
import org.scalatest.FunSuite
import Element.elem
class ElementSuite extends FunSuite {
test("elem result should have passed width") {
val ele = elem('x', 2, 3)
assert(ele.width == 2)
}
}
The assertions used in the previous two ScalaTest suites would only show an error that the assertion failed, without showing what the two values were. ScalaTest also provides a ===
operator that is more descriptive:
assert(ele.width === 2) // "3 did not equal 2"
There is also an expect
method that differentiates between the two values:
expect(2) {
ele.width
} // "Expected 2, but got 3"
Method intercept
can verify that an exception is thrown:
intercept[IllegalArgumentException] {
elem('x', -2, 3)
}
Behavior-Driven Development
The FlatSpec
is one of many traits, such as FunSpec
, that allow for behavior-driven development (BDD). These traits can be mixed with other traits such as ShouldMatchers
which allow the use of the should
function to write tests very naturally as in RSpec:
import org.scalatest.FlatSpec
import org.scalatest.matchers.ShouldMatchers
import Element.elem
class ElementSpec extends FlatSpec with ShouldMatchers {
"A UniformElement" should "have a width equal to the passed value" in {
val ele = elem('x', 2, 3)
ele.width should be (2)
}
it should "have a height equal to the passed value" in {
val ele = elem('x', 2, 3)
ele.height should be (3)
}
it should "throw an IAE if passed a negative width" in {
evaluating {
elem('x', 2, 3)
} should produce [IllegalArgumentException]
}
}
Property Testing
ScalaCheck is another testing tool that is similar to Haskell’s QuickCheck, which essentially randomly generates test input data to test with. There is an implication operator ==>
that takes a function that places a constraint on the test data and a condition that must hold true for all data that fits that constraint:
import org.scalatest.WordSpec
import org.scalatest.prop.Checkers
import org.scalacheck.Prop._
import Element.elem
class ElementSpec extends WordSpec with Checkers {
"elem result" must {
"have passed width" in {
check((w: Int) => w > 0 ==> (elem('x', w, 3).width == w))
}
}
}
Pattern Matching
Case Classes are like stub classes which contain and are mainly about publicly accessible data. They’re very similar to Haskell’s algebraic data types (ADT). Consider the following domain specific language (DSL):
abstract class Expr
case class Var(name: String) extends Expr
case class Number(num: Double) extends Expr
case class UnOp(operator: String, arg: Expr) extends Expr
case class BinOp(operator: String, left: Expr, right: Expr) extends Expr
These case classes define:
- factory methods to avoid having to explicitly write
new
, through anapply
method - fields for the data
- simple
toString
implementations, - a
copy
method that is very similar to Haskell’s record syntax
For example, the following creates a copy of the op
BinOp
with the operator
field changed to "-"
:
op.copy(operator = "-")
This is equivalent to Haskell’s record syntax:
op { operator = "-" }
The main use of case classes is pattern matching. For example, a mathematical expression expressed using the Expr
DSL constructed above can be simplified as follows:
def simplifyTop(expr: Expr): Expr =
expr match {
case UnOp("-", UnOp("-", e)) => e
case BinOp("+", e, Number(0)) => e
case BinOp("*", e, Number(1)) => e
case _ => expr
}
This is very similar to other pattern matching features available in Haskell for example.
case
clauses can use:
-
constructor patterns
-
constant patterns
-
variable patterns, which begin with lower case characters, unless surrounded with backticks
-
sequence patterns, where the
_*
operator can be used to specify “the rest of the sequence,” like so:expr match { case List(0, _*) => println("found it") case _ => }
-
type patterns to match the type of the expression:
def generalSize(x: Any) = x match { case s: String => s.length case m: Map[_, _] => m.size case _ => -1 }
Type Erasure means that information about type arguments is not maintained at runtime. This is an artifact of the erasure model of generics that Java uses.
As a result, it’s not possible to pattern match on the type of Map
, and the following code will return true
for any Map
:
def isIntToIntMap(x: Any) = x match {
case m: Map[Int, Int] => true
case _ => false
}
It’s possible to use @
for variable binding in pattern matching, just as in Haskell:
expr match {
case UnOp("abs", e @ UnOp("abs", _)) => e
case _ =>
}
Pattern guards are additional conditions placed on pattern matches. For example, to simplify an addition expression with identical operands to a multiplication by two, we can use:
def simplifyAdd(e: Expr) = e match {
case BinOp("+", x, y) if x == y =>
BinOp("*", x, Number(2))
case _ => e
}
A sealed class is one that restricts subclassing of that class to the file it’s defined in. This way the compiler can guarantee and enforce exhaustive pattern matching, which it couldn’t do otherwise because the class could be extended in another file:
sealed abstract class Expr
If there’s a function that is known to only handle a specific subset of subtypes, a catchall case can be added, or the @unchecked
annotation can be used:
def describe(e: Expr): String = (e: @unchecked) match {
case Number(_) => "a number"
case Var(_) => "a variable"
}
The Option
type is equivalent to Haskell’s Maybe
type. It can take on either a parameterized Some
value or None
.
A case sequence is a function literal specifically defined as a pattern match where each case is an entry point to the function:
val withDefault: Option[Int] => Int = {
case Some(x) => x
case None => 0
}
This is why it’s possible to pass a pattern match directly to the react
function in the actors library:
react {
case (name: String, actor: Actor) => {
actor ! getip(name)
act()
}
case msg => {
println("Unhandled message: " + msg)
act()
}
}
If a case sequence doesn’t provide exhaustive patterns, it is considered a partial function since it doesn’t provide an output for every input. If it’s applied to a value that it can’t match, it throws a run-time exception.
The parameterized type PartialFunction
can represent partial functions:
val second: PartialFunction[List[Int],Int] = {
case x :: y :: _ => y
}
The function isDefinedAt
can then be used to determine if the function is defined for a particular value:
scala> second.isDefinedAt(List(5, 6, 7))
res0: Boolean = true
scala> second.isDefinedAt(List())
res0: Boolean = false
The partial function above gets translated to:
new PartialFunction[List[Int],Int] {
def apply(xs: List[Int]) = xs match {
case x :: y :: _ => y
}
def isDefinedAt(xs: List[Int]) = xs match {
case x :: y :: _ => true
case _ => false
}
}
The actors library’s react
function for example uses partial a partial argument function, since it’s defined only for the messages the caller wants to handle.
Tuples
Tuples are the same as in Haskell, C++11, Python etc. They are indexed using ._n
where n
it the nth tuple element. A difference from something like Python is that the following:
val word, index = longest
Where longest
is a tuple, ends up assigning the tuple to both word
and index
, contrary to what happens in Python. To assign the correct elements as in C++11’s std::tie
, use parentheses to pattern match as in Haskell.
Stateful Objects
Scala has support for something similar to C# properties. When a non-private var
is defined in a class, a pair of getter and setters is automatically generated for that variable. For example, given the following declaration:
class Time {
var hour = 12
var minute = 0
}
Changes the variable to be private[this]
so that it’s only accessible from the object itself, and the getters and setters take on the visibility of the original variable, this in effect restricts all external access to its generated getters and setters. The setter takes the form x_=(arg)
5.
class Time {
private[this] var h = 12
private[this] var m = 0
def hour: Int = h
def hour_=(x: Int) { h = x }
def minute: Int = m
def minute_=(x: Int) { m = x }
}
These getters and setters can be defined explicitly in order to encode things such as input validation:
class Time {
private[this] var h = 12
private[this] var m = 0
def hour: Int = h
def hour_=(x: Int) {
require(0 <= x && x < 24)
h = x
}
def minute: Int = m
def minute_=(x: Int) {
require(0 <= x && x < 60)
m = x
}
}
It’s also possible to define getters and setters that aren’t backed by a variable, which is particularly useful for converter functions, for example. Note that in the following example, _
is used to give the celsius
variable a default value (0 for numeric types) 6.
class Thermometer {
var celsius: Float = _
def fahrenheit = celsius * 9 / 5 + 32
def fahrenheit_=(f: Float) {
celsius = (f - 32) * 5 / 9
}
override def toString = fahrenheight + "F/" + celsius + "C"
}
Type Parameterization
It’s possible to hide the primary constructor by making it private. This makes its type usable but not its constructor, which can only be used from the class itself, such as through an auxiliary constructor, or its companion object.
class Queue[T] private (
private val leading: List[T],
private val trailing: List[T]
)
For example, a factory method can be created in the companion object:
object Queue {
def apply[T](xs: T*) = new Queue[T](xs.toList, Nil)
}
Another way is to define a generic trait, i.e. one that is parameterized, and hide the implementation inside the companion object, along with a factory method in the companion object:
trait Queue[T] {
def head: T
def tail: Queue[T]
def enqueue(x: T): Queue[T]
}
object Queue {
def apply[T](xs: T*): Queue[T] = new QueueImpl[T](xs.toList, Nil)
private class QueueImpl[T](
private val leading: List[T],
private val trailing: List[T]
) extends Queue[T] {
def mirror = ...
def tail = ...
def enqueue = ...
}
}
Variance
Variance refers to inheritance relationships of parameterized types, such as whether Set[String]
is a subtype of Set[AnyRef]
.
For example, if S
is a subtype of T
and Queue[S]
is considered a subtype of Queue[T]
, then Queue
is covariant in its type parameter T
. This would mean for example that we could pass Queue[String]
to a method that accepted types Queue[AnyRef]
.
However, generic types are nonvariant by default, meaning that there would be no such subtype relationship. It’s possible to annotate the type parameter as being covariant by prepending the type parameter with +
:
trait Queue[+T] { ... }
Contravariance would mean that if T
is a subtype of S
, then Queue[S]
is a subtype of Queue[T]
. It’s also possible to annotate the type parameter to be contravariant by using the -
annotation:
trait Queue[-T] { ... }
In order to make Queue
covariant, it’s necessary to specify a lower bound on the enqueue
method and make it polymorphic as well. The lower bound enforces the requirement that U
be a supertype of T
.
class Queue[+T] (
private val leading: List[T],
private val trailing: List[T]
) {
def enqueue[U :> T](x: U) = new Queue[U](leading, x :: trailing)
}
This means that given supertype Fruit
and subtypes Apple
and Orange
, an Orange
can be appended to a Queue[Apple]
, yielding a Queue[Fruit]
result.
There are also upper bounds which enforce the requirement that a type be a subtype of another, in the example below, it means that T
must be a subtype of Ordered[T]
:
def orderedMergeSort[T <: Ordered[T]](xs: List[T]): List[T] = ...
Abstract Members
A member of a class or trait is abstract if it doesn’t have a complete definition within the class. The implementations are meant to be defined in subclasses. Unlike other object-oriented languages, it’s possible to declare abstract fields of val
and var
, methods, and even abstract types:
trait Abstract {
type T
def transform(x: T): T
val initial: T
var current: T
}
This can then be implemented in a subclass:
class Concrete extends Abstract {
type T = String
def transform(x: String) = x + x
val initial = "hi"
var current = initial
}
It’s possible to override abstract methods with a concrete val
, since the val
will yield the same value every time, whereas the reverse can’t be guaranteed.
Abstract var
s provide implicit getters and setters for certain values.
Class parameter arguments are evaluated before being passed to the class constructor, but concrete val
definitions are evaluated after the superclass is initialized. The following yields an error for failing to satisfy the requirement, since at the time the requirement is checked, the value of denomArg
is still 0, since it’s defined in the subclass.
trait RationalTrait {
val numerArg: Int
val denomArg: Int
require(denomArg != 0)
}
val x = 2
// defines an anonymous class that mixes in RationalTrait
// with its definition enclosed
new RationalTrait {
val numerArg = 1 * x
val denomArg = 2 * x
}
// yields java.lang.IllegalArgumentException: requirement failed
Pre-initialized fields are one way to solve this problem by defining certain subclass fields before the superclass is called:
// anonymous class
new {
val numerArg = 1 * x
val denomArg = 2 * x
} with RationalTrait
// object definition
object twoThirds extends {
val numerArg = 2
val denomArg = 3
} with RationalTrait
The other way to solve this problem is using lazy val
s, which defers the evaluation of the val
’s expression until the first time it the val
is used.
trait LazyRationalTrait {
val numerArg: Int
val denomArg: Int
lazy val numer = numerArg / g
lazy val denom = denomArg / g
private lazy val g = {
require(denomArg != 0)
gcd(numerArg, denomArg)
}
}
Abstract types are useful when there are abstract methods which take on parameters of types specific to the subclass. For example, it wouldn’t make sense to have the following because it would mean that we could pass any type of Food
to any Animal
subclass.
class Food
abstract class Animal {
def eat(food: Food)
}
Instead of defining the eat
method as taking a Food
parameter, we could define it to take some abstract type with an upper bound enforcing that it is a subclass of Food
. This way, a subclass of Animal
could explicitly specify the type of Food
it eats.
class Food
abstract class Animal {
type SuitableFood <: Food
def eat(food: SuitableFood)
}
A path-dependent type is one that depends on its path, i.e. some.object.Type
. Such a type can be instantiated using that syntax, since it implies a reference to Type
’s outer object, particularly object
. The syntax Outer#Inner
can’t be used to instantiate Inner
since it doesn’t refer to any instance of Outer
.
Structural Subtyping
Structural subtyping is when two types get a subtyping relationship because they have the same members 7. This is contrasted to the more traditional nominal subtyping, where each type has a name they have an explicitly declared subtyping relationship. Structural subtyping is achieved in Scala using refinement types.
For example, to create a Pasture
class full of Animal
s that eat Grass
, we can define:
Animal { type SuitableFood = Grass }
class Pasture {
var animals: List[Animal { type SuitableFood = Grass }] = Nil
}
It can also be used to generalize a using
method, such as Python’s with
syntax, so that it works on any object that has a close
method. The first attempt at generalization wouldn’t work because T
could be any type, even one that doesn’t have a close
method. This can be fixed by specifying an upper bound consisting of a refinement type. Note that if no base type is specified, like Animal
preceding the braces above, then Scala uses AnyRef
def using[T, S](obj: T)(operation: T => S) = {
val result = operation(obj)
obj.close() // type error
result
}
def using[T <: { def close(): Unit }, S](obj: T)(operation: T => S) = {
val result = operation(obj)
obj.close()
result
}
Enumerations
Enumerations in Scala aren’t defined at the language-level. Instead there is a class scala.Enumeration
that can be used to define enumerations, which works due to path-dependent types. This means that Color.Value
would be different from Direction.Value
because their parts differ:
object Color extends Enumeration {
val Red, Green, Blue = Value
}
object Direction extends Enumeration {
val North = Value("North")
val East = Value("East")
val South = Value("South")
val West = Value("West")
}
for (d <- Direction.values) print (d + " ")
Direction.East.id == 1
Direction(1) == Direction.East
Implicit Conversions and Parameters
Implicit conversions 8 can be used to make two independent libraries interoperate in a simple manner. For example, using Swing in Scala would look something like this:
val button = new JButton
button.addActionListener(
new ActionListener {
def actionPerformed(event: ActionEvent) {
println("pressed!")
}
}
)
However, code written this way reflects Java’s limitations and is therefore not idiomatic Scala. Using implicit conversions it can be possible to write it something like this:
button.addActionListener(
(_: ActionEvent) => println("pressed!")
)
This is done using an implicit conversion function like this one. What happens is that Scala compiles it normally and encounters a type error in the above code, so it checks if there’s an implicit conversion function of the correct type, ActionEvent => Unit
, and if it works then it continues compilation:
implicit def function2ActionListener(f: ActionEvent => Unit) =
new ActionListener {
def actionPerformed(event: ActionEvent) = f(event)
}
There are a variety of rules concerning implicit definitions.
- Only functions marked as
implicit
are tried. - The implicit conversion must be in the scope as a single identifier, i.e. not
some.convert
. This is why some libraries include aPreamble
object which often contains useful implicit conversions which can be imported withimport Preamble._
- The exception to this rule is that the compiler also looks inside the companion object of the source or target types of the conversion.
- The compiler only attempts one implicit conversion, i.e. it won’t attempt converting
x + y
intoconvert1(convert2(x)) + y
.
Implicit conversions are also used on the receiver of a selection. For example, "abc".exists
is converted to stringWrapper("abc").exists
.
Implicit conversions are often used for simulating new syntax, such as the ->
in a Map
, which is defined as:
package scala
object Predef {
class ArrowAssoc[A](x: A) {
def -> [B](y: B): Tuple2[A, B] = Tuple2(x, y)
}
implicit def any2ArrowAssoc[A](x: A): ArrowAssoc[A] =
new ArrowAssoc(x)
}
Implicit Parameters
Implicit parameters are those that can optionally be provided by the compiler. Note that the implicit
applies to the entire last parameter list. Also, we didn’t use a direct String
since the compiler selects implicit parameters based on their types, so this should lower the chances that another type is used to fulfill the implicit parameter. Finally, implicit parameters must be available as single identifiers, which is why they are usually declared in an object which is imported.
class PreferredPrompt(val preference: String)
class PreferredDrink(val preference: String)
object Greeter {
def greet(name: String)(implicit prompt: PreferredPrompt, drink: PreferredDrink) {
println("Welcome " + name + ", have some " + drink.preference)
println(prompt.preference)
}
}
object Preferences {
implicit val prompt = new PreferredPrompt("$ ")
implicit val drink = new PreferredDrink("tea")
}
import Preferences._
Greeter.greet("Bob")(prompt, drink)
Greeter.greet("Bob") // or implicitly
Implicit parameters are often used to provide information about a type in a preceding, explicit parameter list. For example, it can be used to pass a compare function to a function that yields the largest element in a list. Scala actually provides many orderer
functions in the standard library, which makes this function usable with many standard types without explicitly providing an orderer
. Note that again, types are made as specific as possible to reduce ambiguity to the developer and to restrict the options available to the compiler:
def maxList[T](elements: List[T])(implicit orderer: T => Ordered[T]): T =
elements match {
case List() => throw new IllegalArgumentException("empty")
case List(x) => x
case x :: rest =>
val maxRest = maxList(rest) // (ordered) is implicit
if (x > maxRest) x // ordered(x) is implicit
else maxRest
}
Also note that the implicit parameter can be used as an implicit parameter and conversion in the body, as a result, ordered
doesn’t appear anywhere in the function body. This is a very common thing to do, and since the name of the implicit parameter isn’t used anywhere, it’s possible to use a view bound, denoted by <%
.
For example, the following code essentially enforces the requirement that T
can be treated as an Ordered[T]
, where treated would mean that there is an implicit conversion available. If T
is already an Ordered[T]
, then an identity function is used as the implicit conversion:
def maxList[T <% Ordered[T]](elements: List[T]): T = ...
If multiple conversions apply for an implicit conversion, Scala generally refuses to insert a conversion. However, since Scala 2.8, it now does something similar to C++ where it’ll choose the most specific conversion available, where being more specific entails:
- the argument type is a subtype of another conversion’s argument type
- both conversions are methods and the enclosing class extends the other conversion’s enclosing class
Lists
The left fold is possible with /:
and foldLeft
and the right fold with :\
and foldRight
. The /:
and :\
names represent the way the fold tree leans.
def sum(xs: List[Int]): Int = (0 /: xs) (_ + _)
def flattenRight[T](xss: List[List[T]]) =
(xss :\ List[T]()) (_ ::: _)
An efficient way to append lists in constant time is to use ListBuffers
9:
import scala.collection.mutable.ListBuffer
val buf = new ListBuffer[Int]
buf += 1
3 +=: buf
buf.toList
An ArrayBuffer
is similar to an std::vector
in that it automatically resizes itself to fit its contents.
Lists are implemented as a covariant, abstract class List
for which there are subclasses ::
and Nil
. The covariant property allows a List[Int]
to be assigned to a List[Any]
:
package scala
abstract class List[+T]
val xs: List[Any] = List(1, 2, 3)
The Nil
object inherits from List[Nothing]
so that Nil
can be assigned to any List
. The methods isEmpty
, head
, and tail
are implemented for Nil
, where the first returns true
and the latter two throw an exception, as in Haskell.
case object Nil extends List[Nothing] {
override def isEmpty = true
def head: Nothing = throw ...
def tail: List[Nothing] = throw ...
}
The ::
, “cons” class is defined so that x :: xs
is treated as ::(x, xs)
where ::
is a case class defined as:
final case class ::[T](head: T, tail: List[T]) extends List[T] {
override def isEmpty: Boolean = false
}
List operations are defined so that the result type is “widened” to accommodate the types of all list elements. This is done by placing a lower bound
def ::[U >: T](x: U): List[U] = new scala.::(x, this)
abstract class Fruit
class Apple extends Fruit
class Orange extends Fruit
val apples = new Apple :: Nil
val fruits = new Orange :: apples
It turns out that the cons class is actually defined such that the tail is a mutable var, but only accessible within the scala
package:
final case class ::[U](hd: U, private[scala] var tl: List[U]) extends List[U] {
def head = hd
def tail = tl
override def isEmpty: Boolean = false
}
A ListBuffer
then works by directly modifying the tail of the last cons cell in the list. This means that the tails in the following two variables are shared to avoid copying 10:
val ys = 1 :: xs
val zs = 2 :: xs
For Expressions
All for
expressions that yield
are translated by the compiler into combinations of map
, flatMap
, and withFilter
, and those that don’t yield
into combinations of withFilter
and foreach
. This means that the following are equivalent:
persons withFilter (p => !p.isMale) flatMap (p =>
(p.children map (c => (p.name, c.name))))
for (p <- persons; if !p.isMale; c <- p.children)
yield (p.name, c.name)
for
expressions are of the following form, where seq is a sequence of generators, definitions, and filters delimited by semicolons:
for (seq) yield expr
// for example: seq = generator; definition; filter
for (p <- persons; n = p.name; if (n startsWith "To"))
yield n
Generators are of the form pat <- expr
where the pattern pat
is matched for each element in the list. If the match succeeds, the variables are bound to the pattern components. If the match fails, the element is discarded from iteration.
The following are examples of how for
expressions are translated into combinations of map
, flatMap
, and withFilter
.
// case 1: one generator
for (x <- expr1) yield expr2
expr1.map(x => expr2)
// case 2: generator and filter
for (x <- expr1 if expr2) yield expr3
// transform into case 1
for (x <- expr1.withFilter(x => expr2)) yield expr3
expr1.withFilter(x => expr2).map(x => expr3)
// case 3: two generators
for (x <- expr1; y <- expr2; seq) yield expr3
expr1.flatMap(x => for (y <- expr2; seq) yield expr3)
// case 4: tuple pattern
for ((x1, ..., xn) <- expr1) yield expr2
expr1.map { case (x1, ..., xn) => expr2 }
// case 5: arbitrary pattern
// first filter by successful match, then map.
// guarantees no MatchError exception
for (pat <- expr1) yield expr2
expr1.withFilter {
case pat => true
case _ => false
}.map {
case pat => expr2
}
// case 6: contains definitions
// expr2 is evaluated each time x is generated
for (x <- expr1; y = expr2; seq) yield expr3
for ((x, y) <- for (x <- expr1) yield (x, expr2); seq)
yield expr3
// case 7: side-effect loop
for (x <- expr1) body
expr1.foreach(x => body)
for (x <- expr1; if expr2; y <- expr3) body
expr1.withFilter(x => expr2).foreach(x =>
expr3.foreach(y => body))
Generalized For Expressions
It’s possible to add support for for
expressions to any data type by defining map
, flatMap
, withFilter
, and foreach
. Depending on which of these functions are implemented, the following features of for
expressions become available:
map
: expressions with a single generatorflatMap
andmap
: expressions with multiple generatorsforeach
: loops with single/multiple generatorswithFilter
: filter expressions
Type checking is performed after translation occurs. Given a parameterized type C
denoting a collection, the following type signatures are generally used for the required methods. A standard technique to optimize withFilter
is to not return an entire new object but to return a wrapper object that remembers that elements need to be filtered when they’re processed later:
abstract class C[A] {
// monad methods
def map[B](f: A => B): C[B]
def flatMap[B](f: A => C[B]): C[B]
def withFilter(p: A => Boolean): C[A]
def foreach(b: A => Unit): Unit
}
Collections API
The Traversable
trait is at the top of the collections hierarchy and defines the following abstract operation, which is meant to traverse all elements of a collection and apply the operation f
to each element:
def foreach[U](f: Elem => U)
The Iterable
trait defines an abstract iterator
method, which Iterable
uses to define foreach
of Traversable
it extends from:
def foreach[U](f: Elem => U): Unit = {
val it = iterator
while (it.hasNext) f(it.next())
}
Sequences
The sequence traits Seq
, IndexedSeq
, and LinearSeq
represent iterables that have a length and whose elements have fixed indexed positions starting from 0.
Buffers are a sub-category of sequences that allow element insertions, removals, and appending operations. Common buffers are ListBuffer
and ArrayBuffer
.
Sets
Sets are iterables with no duplicate elements. There are two subtraits SortedSet
and BitSet
. Ordering is preserved in SortedSet
, and is backed by an ordered binary tree, with immutable.TreeSet
being a red-black tree that keeps the tree balanced. BitSet
uses an array of Long
values to efficiently represent a set of packed bits, much like C++’s bitset.
Maps
Maps’ get
method returns an Option[Value]
, like lookup
would return a Maybe a
in Haskell. The getOrElseUpdate
function facilitates the use of Maps as caches.
Streams
Streams have elements that are computed lazily. The #::
function is used to construct streams. Notice that only the head has been computed so far:
val str = 1 #:: 2 #:: 3 #:: Stream.empty
// Stream(1, ?)
def fibonacciFrom(a: Int, b: Int): Stream[Int] =
a #:: fibonacciFrom(b, a + b)
val fibs = fibonacciFrom(1, 1).take(7).toList
// List(1, 1, 2, 3, 5, 8, 13)
Vectors
Vectors are effectively constant time, random access sequences represented as broad, shallow trees where every tree node contains up to 32 elements of the vector or 32 other tree nodes. The updated
method can be used to update particular elements and is also effectively constant time, since only the node that contains the element and every node that points to it must be copied. These are currently the default implementation of immutable indexed sequences, collection.immutable.IndexedSeq
.
Ranges
Ranges can be defined as follows:
1 to 3 // => Range(1, 2, 3)
5 to 14 by 3 // => Range(5, 8, 11, 14)
1 until 3 // => Range(1, 2,)
Arrays
Scala arrays correspond to Java arrays such that Array[T]
in Scala is a T[]
in Java. Scala arrays are compatible with sequences, and provide all operations that sequences provide. This is facilitated through implicit conversions to scala.collection.mutable.WrappedArray
. There’s also an implicit conversion to ArrayOps
which supports various methods available to sequences, without actually turning the array into a sequence.
Java doesn’t allow generic arrays T[]
, but this is made possible in Scala by creating an array of Objects
. Creating generic arrays in Scala through Array[T]
requires a run-time hint, since the information about the type T
gets erased at runtime. This is done with a class manifest of type scala.reflect.ClassManifest
, which is a type descriptor object that describes the top-level class of a type. The compiler can be instructed to generate code to construct and pass a class manifest, this is done via an implicit parameter. This way, the compiler looks for an implicit value of type ClassManifest[T]
so that the correct type of array can be constructed at run-time:
def someMethod[T](xs: Vector[T])(implicit m: ClassManifest[T]): Array[T] = ...
// or with a context bound
def someMethod[T: ClassManifest](xs: Vector[T]): Array[T] = ...
Views
Transformer methods are ones such as map
and filter
and come in strict and non-strict varieties. A strict transformer constructs a new collection on the spot. A non-strict transformer construct a proxy for the result collection such that its elements are constructed on demand.
def lazyMap[T, U](coll: Iterable[T], f: T => U) =
new Iterable[U] {
def iterator = coll.iterator map f
}
Most collections are strict by default in their transformers except for Stream
. It’s possible to turn a collection into a lazy one and vice versa through collection views, which represent a base collection with the difference that the transformers are lazy. The view
method is used to make the transformers lazy, and force
is used to go back to strict transformers.
(v.view.map(_ + 1).map(_ * 2)).force
Views can provide a simple way to optimize otherwise costly operations 11:
def isPalindrome(x: String) = x == x.reverse
def findPalindrome(s: Seq[String]) = s find isPalindrome
// creates 1000000 element sequence
findPalindrome(words take 1000000)
// creates no copy
findPalindrome(words.view take 1000000)
Views can also be used as subwindows into mutable sequences 12.
val subarr = arr.view.slice(3, 6)
// can now perform operations on that slice only
// i.e. negate all elements in the slice
for (i <- 0 until subarr.length) subarr(i) = -subarr(i)
Iterators
Iterators are affected by operations on them, such that for example a map
called on an iterator leaves the iterator’s position at the end of the sequence, so that an extra call to next
will throw a NoSuchElementException
. This is mitigated by duplicating the iterator with duplicate
13.
A BufferedIterator
provides an extra method head
that returns its first element without advancing the iterator.
Scala provides implicit conversions for the major collection types in Java through the JavaConversions
object.
Most of the collection operations are implemented in terms of traversals and builders.
Builders
Builders are in charge of building new collections. The result
method yields the collection that has been constructed thus far, and the builder can be reset to a clean slate to construct another collection with the clear
method. The mapResult
method can be used to return a result of a different type.
package scala.collection.generic
class Builder[-Elem, +To] {
def +=(elem: Elem): this.type
def result(): To
def clear()
def mapResult(f: To => NewTo): Builder[Elem, NewTo] = ...
}
Implementation Traits
The code is kept DRY by using implementation traits which are named with a Like
suffix, such as TraversableLike
. These traits implement concrete methods and are parameterized using the collection’s element type and its representation type, i.e. Seq[I]
or List[I]
. For example, filter
is implemented here such that it creates a new builder for the representation type and appends elements to it if they satisfy the predicate, then the builder’s result is returned.
package scala.collection
class TraversableLike[+Elem, +Repr] {
def newBuilder: Builder[Elem, Repr]
def foreach[U](f: Elem => U)
...
def filder(p: Elem => Boolean): Repr = {
val b = newBuilder
foreach { elem => if (p(elem)) b += elem }
b.result
}
}
A problem presents itself when we want to return a different type of sequence from the one that is being operator one, for example:
Map("a" -> 1, "b" -> 2) map { case (x, y) => (y, x) }
// Map(1 -> "a", 2 -> "b")
Map("a" -> 1, "b" -> 2) map { case (x, y) => y }
// List(1, 2)
BitSet(1, 2, 3) map (_.toFloat)
// Set(1.0, 2.0, 3.0)
This is mitigated with an implicit parameter on the function so that a builder factory may produce the correct type of builder by passing it the implicit parameter.
def map[B, That](p: Elem => B)
(implicit bf: CanBuildFrom[B, That, This]): That = {
val b = bf(this)
for (x <- this) b += f(x)
b.result
}
package scala.collection.generic
trait CanBuildFrom[-From, -Elem, +To] {
def apply(from: From): Builder[Elem, To]
}
For example, in the case of a BitSet
, the companion object for BitSet
would define a builder factory of type CanBuildFrom[BitSet, Int, BitSet]
, with a more general fallback that converts to a regular Set
with CanBuildFrom[Set[_], A, Set[A]]
. Scala will then choose the more specific one when choosing the implicit instance.
Finally, collections are kept the same dynamic type. This is achieved through virtual dispatch by passing the source collection to the builder factory, which forwards the call to a genericBuilder
method available on all generic, non-leaf classes, which itself calls the builder that belongs to the collection on which it is defined.
Creating Collections
Creating a new collection type T
can be done by using traits IndexedSeq[Elem]
and IndexedSeqLike[Elem, T]
. The latter requires the implementation of newBuilder
. It’s also necessary to implement appropriate CanBuildFrom
implicits.
Extractors
Extractors provide a way to define patterns that are decoupled from an object’s representation. This is done using an unapply
method that matches a value and deconstructs it:
object EMail {
def unapply(str: String): Option[(String, String)] = {
val parts = str split "@"
if (parts.length == 2) Some(parts(0), parts(1)) else None
}
}
This is analogous to an apply
method that constructs an object:
object EMail {
def apply(user: String, domain: String) = user + "@" + domain
}
This can be made more explicit by inheriting from the function type:
// equivalent to extends Function2[String, String, String]
object EMail extends ((String, String) => String) { ... }
Pattern matching in Scala checks for an unapply
method to deconstruct the object. If unapply
returns None
, then the pattern doesn’t match and it moves on to the next pattern, throwing a MatchError
if there are no other patterns that match. Remember that a single-tuple parameter can omit the set of parentheses, i.e. func((a, b)) -> func(a, b)
.
selectorString match { case EMail(user, domain) => ... }
It’s generally a good idea to ensure that the following property holds:
Obj.unapply(Obj.apply(a, b)) == Some(a, b)
In the case of a single pattern variable, that single variable is wrapped in an Option
value.
It’s also possible for the pattern to not bind any variables, in which case a Boolean
is returned to indicate whether the pattern matched:
object UpperCase {
def unapply(s: String): Boolean = s.toUpperCase == s
}
It’s also possible to define variable argument extractors, to be able to match on an arbitrary number of variables. This is done using the unapplySeq
function:
object Domain {
def unapplySeq(whole: String): Option[Seq[String]] =
Some(whole.split("\\.").reverse)
}
This allows matching like this:
domain match {
case Domain("org", "acm") => println("acm.org")
case Domain("net", _*) => println("some .net domain")
}
Compared to Case Classes
Modifying the case classes has an effect on client code, whereas extractors provide a layer of indirection between the data representation and the way it’s viewed by clients. However, case classes to have advantages over extractors. They’re easier and simpler to define. Case classes can also end up generating more efficient pattern matches because the compiler can optimize them since they’re defined at the language level. Finally, case classes derived from a sealed
base class can allow the compiler to check for pattern match exhaustiveness.
Regular Expressions
Regular expressions can be constructed using the Regex
constructor or with a r
method on a string. It’s possible to pattern match on a regular expression match using a predefined extractor. The matching is done by binding every matched group. If a group didn’t match, it’ll bind the variable to null
:
val Decimal = """(-)?(\d+)(\.\d*)?""".r
val Decimal(sign, integerpart, decimalpart) = "-1.23"
// sign = "-", integerpart = "1", decimalpart = ".23"
Annotations
Annotations are like meaningful comments for the compiler. They can support generation of documentation (Scaladoc), for example. Annotations can be placed on any kind of declaration or definition, as well as expressions:
@deprecated def oldMethod() = ...
(expr: @unchecked) match {
// non-exhaustive cases
}
The general form of annotations is as follows. The @
prefixed to an annotation can read as new
since underneath the compiler is simply instantiating a class named after the annotation. Passing an annotation as argument to another has the consequence that it must use new
instead of @
.
@annotation(exp1, exp2, ...)
The @deprecated
annotation can mark something as deprecated, which elicits a compiler warning if some code uses it. It accepts an optional message:
@deprecated("use newOne() instead")
def oldOne() = ...
The @volatile
annotation informs the compiler that the variable in question will be used by multiple threads. This makes reads/writes slower, but accesses from multiple threads behave more predictably.
The @serializable
annotation marks a class as serializable. The @SerialVersionUID(id)
annotation marks the current version of a class. The @transient
annotation marks fields that should not be serialized.
The @scala.reflect.BeanProperty
annotation generates automatic get and set methods, i.e. age
generates getAge
and setAge
. This is useful for Java-centric frameworks. The generated methods are available only after compilation, i.e. not within one’s own code.
The @tailrec
annotation designates that a method should be tail-recursion optimized. If the optimization cannot be performed, a warning is emitted.
The @unchecked
annotation tells the compiler not to check for exhaustive match cases.
The @native
annotation means that the method’s implementation is supplied by the runtime, via the Java Native Interface (JNI), rather than in Scala.
@native
def nativeMethod() { /* empty body required */ }
XML
Scala allows XML literals anywhere that an expression is valid. The type of an XML literal is Elem
. Class Node
is the abstract superclass of all XML node classes. Class Text
is a node holding just text. Class NodeSeq
corresponds to a sequence of nodes, in fact, Node
extends from NodeSeq
, so a Node
can be thought of as a one-element NodeSeq
.
It’s possible to interpolate Scala code within XML literals using braces {}
. The interpolated code can itself contain XML literals, effectively allowing arbitrary nesting of XML code. Two braces in a row are used to print literal braces.
var res = <a> {3 + 4} </a>
// scala.xml.Elem = <a> 7 </a>
var yearMade = 1955
var res2 = <a> { if (yearMade < 2000) <old>{yearMade}</old>
else xml.NodeSeq.Empty} </a>
// <a> <old>1955</old> </a>
Scala’s XML support can be leveraged to provide object serialization:
abstract class SomeClass {
val description: String
val year: Int
def toXML =
<someclass>
<description>{description}</description>
<year>{year}</year>
</someclass>
}
There are various ways to deconstruct XML in Scala. The text
method retrieves all of the text from the entire element tree, flatMap
style. Scala also supports XPath through the methods \
(for sub-elements) and \\
(for recursive/deep search). Attributes can be extracted using the @
sign before the attribute.
var res = <a><b><c>hello</c></b></a> \ "b"
// <b><c>hello</c></b>
var res2 = <a><b><c>hello</c></b></a> \\ "c"
// <c>hello</c>
val employee = <employee name="Joe" rank="officer" serial="IG-88"/>
val res3 = employee \ "@name"
// "Joe"
It’s also possible to define a fromXML
method to deserialize an object.
object SomeClass {
def fromXML(node: scala.xml.Node): SomeClass =
new SomeClass {
val description = (node \ "description").text
val year = (node \ "year").text.toInt
}
}
It’s possible to save a file from an XML node using, and conversely, load it:
xml.XML.save("serialized.xml", node)
val loadednode = xml.XML.loadFile("serialized.xml")
It’s possible to pattern match on XML. This is done by using XML literals where instead of interpolating expressions, variables are interpolated in order to bind to the matched data. In the context of XML, the _*
is interpreted as matching any sequence of nodes down the XML tree.
def proc(node: scala.xml.Node): String =
node match {
case <a>{contents @ _*}</a> => "It's an a: " + contents
case <b>{contents @ _*}</b> => "It's a b: " + contents
case _ => "Something else."
}
// proc(<a>a <em>b</em> c</a>) => "It's an a: ArrayBuffer(a, <em>b</em>, c)"
Modular Programming
When a module grows too large for a single file, it may be useful to split the module up into separate traits defined in separate files, which can then be mixed into the original module.
A problem can arise when one such compartmentalized trait wants to refer to another trait that ultimately gets mixed into the same class. This can be circumvented by specifying the self type which essentially defines the value of type of this
for whenever it’s referred to in the class. For example, assuming trait Second
needs to refer to trait First
’s method count
, we can do:
trait First {
def count = 5
}
trait Second {
this: First =>
def printCount { println(count()) }
}
It’s also possible to select types at runtime:
val db: Database =
if(args(0) == "student")
StudentDatabase
else
SimpleDatabase
Sometimes the compiler won’t be able to determine that two types are the same, as in the following case:
object Obj {
val db: Database = SomeDatabase
object browser extends Browser {
// compiler doesn't know `database` is
// same type as `db`
val database = db
}
// error: type mismatch
// found: db.SomeCategory
// required: browser.database.SomeCategory
for (category <- db.allCategories)
browser.displayCategory(category)
}
This can be resolved by using a singleton type through the type
property, which is an extremely specific type that refers to the type of the specific object 14:
object browser extends Browser {
val database: db.type = db
}
Object Equality
To recap, Scala’s equality operators are equivalent to Java’s equals
methods, that is, unlike Java’s equality operators which refer to object identity for reference types. Object identity is available through the eq
method though it’s rarely used.
There are four main pitfalls when defining equals methods:
- wrong signature
- changing it without also changing
hashCode
- defining it in terms of mutable fields
- failing to define it as an equivalent relation
Wrong Signature
Given a simple type like the following, the accompanying equals
method is too naive. What happens is that when it’s added to a collection, it’s static type becomes Any
, which would trigger the equals
method for Any
since overloading in Scala and Java is based on the static type. It turns out that the equals
method in Any
is simply object identity.
class Point(val x: Int, val y: Int) {
def equals(other: Point): Boolean =
this.x == other.x && this.y == other.y
}
A more robust equals
operator would assume that the parameter is of static type Any
, and then perform a pattern match for its dynamic type:
override def equals(other: Any) = other match {
case that: Point => this.x == that.x && this.y == that.y
case _ => false
}
It’s also a common mistake to want to define the ==
operator directly, which is impossible since it’s defined as final
:
final def == (that: Any): Boolean =
if (null eq this) {null eq that} else {this equals that}
However, if this method is defined with a different parameter type, the compiler would regard it as an overloaded variant, and would allow the definition to occur, but the same problem would occur as above, where the parameter type isn’t Any
.
Hash Code
When the equals
method is redefined, it’s also logically necessary to redefine the hashCode
method, which by default is defined in AnyRef
to be some transformation of the object’s address. In fact, if two objects are determined to be equal as per the equals
method, then they must return the same hashCode
.
The hashCode
method can be defined for Point
so that it only references fields that were used in equals
for equality determination.
override def hashCode = 41 * (41 + x) + y
Mutable Fields
Making the hashCode
and as a result the equals
method depend on mutable fields can have bad consequences. Adding an item to a collection and then changing that item’s mutable field, which itself aids in equivalence determination, will mean that the collection will no longer find the object, since the collection (a HashSet
) will keep looking in the wrong bucket, where it would expect to find the item based on its new data representation. Instead it may have been better to avoid redefining hashCode
and create a separate comparison function such as equalContents
.
Equivalence Relation
The equals
method should implement an equivalence relation on non-null objects, that is:
-
it’s reflexive: for any non-null value x
x.equals(x) == true
-
it’s symmetric: for non-null values x and y
x.equals(y) == true == y.equals(x)
-
it’s transitive: for non-null values x, y, and z
x.equals(y) && y.equals(z) == true == x.equals(z)
-
it’s consistent: for non-null values x and y, multiple invocations of
x.equals(y)
should consistently return the same value provided no information used in equality determination is modified -
x.equals(null)
should return false
Subtype Equality
There’s a mistake that can be made given a subtype that redefines its equals
method. If a supertype is compared to a subtype and appears on the left hand side, it would invoke the supertype’s equals
method which would only compare those properties common to both types, meaning that the equality would not be symmetric.
class ColoredPoint(x: Int, y: Int, val color: Color.Value)
extends Point(x, y) {
override def equals(other: Any) = other match {
case that: ColoredPoint =>
this.color == that.color && super.equals(that)
case _ => false
}
}
val p = new Point(1, 2)
val cp = new ColoredPoint(1, 2, Color.Red)
p equals cp == true
cp equals p == false
A way to solve this is to define a canEqual
method in classes that override equals
and hashCode
, which determines whether or not an object can be compared for equality with the given class. This allows subclasses of the class that redefined canEqual
to continue to compare for equality with objects of the superclass. This is especially important for the anonymous class instantiation syntax:
class Point(val x: Int, val y: Int) {
override def hashCode = 41 * (41 + x) + y
override def equals(other: Any) = other match {
case that: Point =>
(that canEqual this) &&
(this.x == that.x) && (this.y == that.y)
case _ =>
false
}
def canEqual(other: Any) = other.isInstanceOf[Point]
}
class ColoredPoint(x: Int, y: Int, val color: Color.value)
extends Point(x, y) {
override def hashCode = 41 * super.hashCode + color.hashCode
override def equals(other: Any) = other match {
case that: ColoredPoint =>
(that canEqual this) &&
super.equals(that) && this.color == that.color
case _ =>
false
}
override def canEqual(other: Any) =
other.isInstanceOf[ColoredPoint]
}
Java Interop
The Scala compiler will attempt to use direct mappings to Java value types, such as an Int
. However, sometimes this assumption can’t be made, as in a collection, in which case it’ll use wrapper classes like java.lang.Integer
.
Singleton objects
the compiler creates a Java class with suffix $
which contains all of the methods and fields of the Scala singleton object as well as a MODULE$
static field that holds the singleton instance created at runtime. If the singleton object has no companion class then a class without the $
suffix is created which has a static forwarder method for each method in the singleton object.
Traits get compiled down to Java interfaces if they only contain abstract methods.
Existential Types
Existential types in Scala are mainly used to facilitate Java’s wildcard types and raw types. They take the form:
type forSome { declarations }
For example, the following is equivalent to Java’s Iterator <?>
and reads as it being an Iterator
of T
’s for some type T
. It’s also possible to use placeholder syntax which is similar to the one used in patterns. For each underscore present in the type a type parameter is added to the forSome
clause:
Iterator[T] forSome { type T }
// equivalently: placeholder syntax
Iterator[_]
Similarly, Java’s Iterator<? extends Component>
can be represented as follows in Scala, where it reads as being an Iterator
of T
for some type T
that is a subtype of Component
:
Iterator[T] forSome { type T <: Component }
// equivalently: placeholder syntax
Iterator[_ <: Component]
Given a Java class with wildcards like this:
public class Wild {
Collection<?> contents() {
Collection<String> stuff = new Vector<String>();
stuff.add("a");
stuff.add("b");
stuff.add("see");
return stuff;
}
}
If we want to create a new Set
from the contents of this collection, we’ll have trouble naming the type:
import scala.collection.mutable.Set
val iter = (new Wild).contents.iterator
val set = Set.empty[???] // what type?
while (iter.hasMore)
set += iter.next()
The solution to this is to create an abstract class with methods for each of the types in the forSome
clause:
abstract class SetAndType {
type Elem // required for defining `set`
val set: set[Elem]
}
def java2scala[T](jset: Collection[T]): SetAndType = {
val sset = Set.empty[T] // T can be used, inferred from parameter
val iter = jset.iterator
while (iter.hasNext)
sset += iter.next()
return new SetAndType {
type Elem = T
val set = sset
}
}
Compiling Heterogenous Projects
Usually one would compile the Java code and add the output to the classpath when building the Scala code. However, this wouldn’t work if the Java code references the Scala code, or vice versa. For this reason, Scala allows processing of Java source files as if they were Scala files. They won’t be compiled, but they’ll be scanned to determine what they contain. This is done with this sequence of commands:
$ scalac -d bin FileAnalysis.scala FileItem.java File.java
$ javac -cp bin -d bin File.java FileItem.java FileManagement.java
$ scala -cp bin FileManagement
Actors and Concurrency
Scala uses actors similar to Erlang’s as its primary concurrency primitive. Note that this actors library is deprecated; Akka actors will be covered later and are the default actors library since Scala 2.10.
import scala.actors._
object PrintActor extends Actor {
def act() {
for (i <- 1 to 5) {
println("acting")
Thread.sleep(1000)
}
}
}
SillyActor.start() // prints "acting" five times
// equivalent: .start()s as soon as it's defined
val PrintActor2 = actor {
for (i <- 1 to 5) {
println("acting")
Thread.sleep(1000)
}
}
Actors can be sent messages using the !
method. The following actor prints out the messages it receives:
val echoActor = actor {
while (true) {
receive {
case msg => println("received: " + msg)
}
}
}
echoActor ! "testing"
// received: testing
The way this works is that the receive
method accepts a partial function, which in this case is specified as a partial function literal case sequence. The actor will block until a message is received that matches a pattern in the partial function, and non-matching messages are silently ignored.
It’s also possible to treat “native” threads as actors through Actor.self
, which yields an actor representing the current thread. This is useful for debugging, by specifically setting a receive
function:
import scala.actors.Actor._
self ! "hello"
self.receive { case x => x}
// res1: Any = hello
// 1 second timeout to avoid blocking repl
self.receiveWithin(1000) { case x => x }
Every actor is given its own thread so that they can each perform the act
method. This is resource heavy, so a react
method can be used that is similar to receive
. The react
method doesn’t return after it finds and processes a message, instead it only evaluates the message handler. This means that the implementation doesn’t need to preserve the call stack of the current thread, allowing it to reuse the thread for the next actor that wakes up.
Since react
doesn’t return, it’s usually common to return the information (if any) by sending the message to another actor, then recursing the message handler act
to continue to process messages. This recursion pattern is common enough that there’s a method loop
that can wrap around the react
method so that it loops infinitely.
object Resolver extends Actor {
import java.net.{InetAddress, UnknownHostException}
def act() {
react {
case (name: String, receiver: Actor) =>
receiver ! getIp(name)
act()
case "EXIT" =>
println("exiting") // doesn't recurse again
case msg =>
println("unhandled msg: " + msg)
act()
}
}
def getIp(name: String): Option[InetAddress] = {
try {
Some(InetAddress.getByName(name))
} catch {
case _:UnknownException => None
}
}
}
Resolver.start()
Resolver ! ("www.scala-lang.org", self)
self.receiveWithin(0) { case x => x }
// Some(ww.scala-lang.org/<some-ip>)
Actor’s shouldn’t block, such as while processing a message, since this can even cause deadlocks while actors wait on each other. If an actor needs to perform an operation that blocks in order to process a message, a separate actor should be created that performs the blocking operation and then notifies the original actor when its work is complete. In the following example, the actor will continue to accept requests even while it continues to perform blocking operations, which it delegates to a sub-actor.
val someActor = actor {
def emoteLater() {
val mainActor = self
actor {
Thread.sleep(1000)
mainActor ! "Emote"
}
}
var emoted = 0
emoteLater()
loop {
react {
case "Emote" =>
println("acting")
emoted += 1
if (emoted < 5)
emoteLater()
case msg =>
println("received: " + msg)
}
}
}
/*
acting
acting
someActor ! "hello"
Received: hello
acting
acting
*/
It’s very important to communicate with actors only via messages. However, unlike Erlang’s actors, Scala allows mixing the traditional shared data & locks model with actors. For example, if multiple actors were to share a common mutable map, there are two approaches that can be taken to achieve this. The first would entail creating an actor that owns the map, defining messages for every kind of map operation that would be required such as getting and setting values. An alternative approach, however, would be to pass a thread-safe map such as ConcurrentHashMap
.
It’s also a good idea to keep actors self-contained. A good idea is to send contextual information about the request (if not the request itself) along with the response, to the original actor, since some time may have passed since the actor performed the request. It’s also more readable to create case classes when possible:
case class LookupIP(name: String, respondTo: Actor)
case class LookupResult(name: String, address: Option[InetAddress])
object Resolver extends Actor {
def act() {
loop {
react {
case LookupIP(name, actor) =>
actor ! LookupResult(name, getIp(name))
}
}
}
def getIp = ...
}
-
Reminds me of Ruby’s ‘EigenClasses’, but I’m not quite sure yet if it’s indeed similar, or if companion objects truly are just a separation for specifying class-wide values/methods. ↩︎
-
Reminds me of
std::bind
in C++11, which does the same thing by creating a functor, or function object (not to be confused with the category theory Functor more common in Haskell). ↩︎ -
The mnemonic I use is that it reminds me of the pattern in Rust to accomplish something similar by passing a closure to be called to evaluate the actual parameter to use. ↩︎
-
Reminds me of Ruby’s modules that can be included. ↩︎
-
This is of course very similar to setters in Ruby, which take the form
attr=(arg)
. ↩︎ -
This reminds me of a Monoid’s identity,
mempty
in the Haskell typeclass, but I doubt that_
is backed by a Monoid typeclass because a Monoid would also require an associative binary operation. Perhaps it’s simply more like the default package’s default typeclass. ↩︎ -
This reminds me of Go’s interfaces. ↩︎
-
Now that I’ve learned what implicit conversions are in Scala, I have formed an opinion about them. Implicit conversions are one of the parts that make C++ very complex. Scala’s implicit conversions don’t seem as complex as C++’s, here they’re implicit at the site of use, but explicit at the site of definition. In C++ it feels that they’re implicit on both ends, since you can have overloaded conversion operators and conversion constructors on either type, which is further made ambiguous with arithmetic type conversions. ↩︎
-
I wonder if these are similar to Haskell’s difference lists. ↩︎
-
It seems like Scala uses mutability to leave optimization up to the developer. Contrast this with Haskell, where everything is immutable at the language level and the optimizations are done underneath at compile time or at the runtime level. The cons
:
in Haskell for example would also reuse the tail, creating a persistent linked list, but the developer doesn’t have to worry about how to best implement something like this for efficiency. Scala of course affords one more flexibility in how they implement something, but it expects that every developer be mindful of how best to implement things in the unusual functional and imperative environment. ↩︎ -
Something like this would be implicit in Haskell due to its non-strict nature. ↩︎
-
This is of course very much like Go’s slices and Rust’s slices. ↩︎
-
This reminds me of open file descriptions which record the file offset and status flags. Duplicate file descriptors share this information. To avoid this, it’s necessary to perform a separate
open
call. ↩︎ -
This definitely reminds me of Ruby’s EigenClasses. ↩︎