While developing Flowdock, a real-time environment that lets teams work together seamlessly, we needed a database that could scale horizontally to handle a huge number of write operations. Cassandra’s data model and performance characteristics made it a perfect fit for our needs. It is a distributed database with a data model somewhat like Google’s BigTable. Cassandra was originally developed by Facebook to run their in-site messaging system, Inbox. If you want to learn more about Cassandra, Evan Weaver’s Up and Running with Cassandra is a good place to start.

When we started to use Cassandra at the beginning of July, there weren’t any usable bindings for Scala, which we use for the Flowdock back end, so we decided to develop our own. Two months later, the Thrift API has changed almost beyond recognition, but it is still a bag of hurt to use from Scala. Others have noticed this too and developed their own Scala bindings, each with a slightly different approach.

In Flowdock, we use Scalandra for three tasks: to wrap the Thrift API in a less painful interface, to manage connections efficiently using a pool, and to serialize and deserialize the byte arrays that Cassandra uses to store data. Scalandra offers three levels of serialization to support all possible combinations of data types in the Cassandra data model.
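Since Cassandra stores everything as byte arrays, a serializer is essentially a pair of functions between a Scala type and Array[Byte]. The following is a minimal sketch of that idea, not Scalandra’s actual code: the Serializer trait and the Utf8StringSerializer and BigEndianLongSerializer objects are hypothetical names made up for illustration.

```scala
import java.nio.ByteBuffer
import java.nio.charset.StandardCharsets

// Hypothetical trait for illustration; Scalandra's own serializer types may differ.
trait Serializer[T] {
  def serialize(value: T): Array[Byte]
  def deserialize(bytes: Array[Byte]): T
}

// Strings are stored as their UTF-8 bytes.
object Utf8StringSerializer extends Serializer[String] {
  def serialize(value: String): Array[Byte] =
    value.getBytes(StandardCharsets.UTF_8)

  def deserialize(bytes: Array[Byte]): String =
    new String(bytes, StandardCharsets.UTF_8)
}

// Longs are stored as 8 bytes, most significant byte first (ByteBuffer's default order).
object BigEndianLongSerializer extends Serializer[Long] {
  def serialize(value: Long): Array[Byte] =
    ByteBuffer.allocate(8).putLong(value).array()

  def deserialize(bytes: Array[Byte]): Long =
    ByteBuffer.wrap(bytes).getLong()
}
```

Whatever the concrete types are, the property that matters is that deserialize(serialize(x)) gives back x, so the same serializer has to be used for writing and reading a given column name or value.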

Scalandra has two levels of interaction. At the lower level, it provides an API similar to the Thrift interface, with commands for get, slice, insert, remove and count operations. The client API is meant to be used together with connection pooling, which ensures that only a reasonable number of sockets are opened. Unfortunately for our purposes, Scala doesn’t have RAII, so connections have to be managed more explicitly.
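Without RAII, a common Scala substitute is the loan pattern: the pool lends a connection to a block of code and takes it back in a finally clause, which is the shape of the pool { connection => ... } calls used in this post. Here is a rough sketch of how such a pool could work. FakeConnection and SimpleStackPool are hypothetical stand-ins, and Scalandra’s StackPool may well be implemented differently.

```scala
// Hypothetical stand-in for a socket-backed connection.
final class FakeConnection(val id: Int) {
  var open: Boolean = true
  def close(): Unit = { open = false }
}

// Minimal stack-based pool: reuse the most recently returned connection,
// lend it to a block, and always take it back afterwards.
final class SimpleStackPool(maxIdle: Int, create: () => FakeConnection) {
  private var idle: List[FakeConnection] = Nil

  def apply[T](body: FakeConnection => T): T = {
    // Borrow: pop an idle connection if there is one, otherwise open a new one.
    val connection = synchronized {
      idle match {
        case head :: tail => idle = tail; head
        case Nil          => create()
      }
    }
    try body(connection)
    finally synchronized {
      // Return: runs even if body throws, so the connection never leaks.
      if (idle.size < maxIdle) idle = connection :: idle
      else connection.close()
    }
  }

  def idleCount: Int = synchronized(idle.size)
}
```

Because the return step sits in finally, the caller cannot forget to hand the connection back, and the maxIdle limit keeps the number of sockets held open by the pool bounded.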

Assuming that a Cassandra instance is running at localhost and is configured with a standard column family “Users” in keyspace “Keyspace1”, we can use the Scalandra API to insert and read data as in the following example.

import com.nodeta.scalandra._
import com.nodeta.scalandra.serializer.StringSerializer
import com.nodeta.scalandra.pool.StackPool

val pool = StackPool(ConnectionProvider("localhost", 9160))

pool { connection =>
  // Let's create a client which assumes everything is a String
  val client = Client(connection, "Keyspace1", StringSerializer)

  // Let's insert some data into Cassandra
  // (insertNormal is an assumed method name; see the Scaladocs for the exact call)
  client.insertNormal(
    ColumnParent[String]("Users", "jsmith"),
    Map("first" -> "John", "last" -> "Smith")
  )

  // Slice all results from the column parent
  // The None parameters are the lower and upper bounds for the slice
  // (the trailing order argument is assumed; see the Scaladocs)
  client.slice(
    ColumnParent[String]("Users", "jsmith"),
    None, None, Ascending
  )
  // Returns data from the previous insert

  client.get(ColumnPath("Users", "jsmith", "first"))
  // Returns Some("John")

  client.get(ColumnPath("Users", "jsmith", "nonexistent"))
  // Returns None
}

As the Cassandra data model is basically a multi-dimensional map, we’ve built a Map-like interface to Cassandra. It is not very polished in its current state. Mutations work, but they are not always intuitive because they might require a Scalandra Map object as a parameter. Using the previously inserted data, the map model could be used like this:

import com.nodeta.scalandra.ColumnParent

class User(protected val connection : Connection, key : String
) extends StandardRecord[String, String] {
  // "Users" is the column family that holds the data inserted earlier
  protected val path = ColumnParent[Any]("Users", key)
  protected val keyspace = "Keyspace1"
  protected val columnSerializer = StringSerializer
  protected val valueSerializer = StringSerializer
}

pool { connection =>
  val user = new User(connection, "jsmith")

  // User is basically the same map as inserted before
  user("first") // returns "John"

  // But slice operations exist as well
  user.slice("bar", "foo")
  // Returns Map("first" -> "John")

  // Insert a value
  user("age") = "53"

  // John doesn't need to have a last name, so let's remove it
  user -= "last"
}
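To see why slicing from "bar" to "foo" returns only the first column, it can help to model the data as nested sorted maps. This is only an in-memory sketch under the assumption that column names compare as plain strings and that slice bounds are inclusive. RowModel and its type aliases are hypothetical and not part of Scalandra.

```scala
import scala.collection.immutable.SortedMap

object RowModel {
  // One row: column name -> value, kept sorted by column name.
  type Row = SortedMap[String, String]
  // One column family: row key -> row.
  type ColumnFamily = Map[String, Row]
  // One keyspace: column family name -> column family.
  type Keyspace = Map[String, ColumnFamily]

  // Inclusive slice between two column names, using plain string ordering.
  def slice(row: Row, start: String, finish: String): Row =
    row.filter { case (name, _) => name >= start && name <= finish }
}
```

With the row for jsmith holding the columns first and last, "first" sorts between "bar" and "foo" while "last" sorts after "foo", so only first falls inside the slice.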

The Scalandra Git repository is hosted on GitHub. We’ve also generated Scaladocs to help with usage.

If you have any comments, or if you use Scalandra, please drop us a line. Patches are welcome, too :)