
Why Flowdock migrated from Cassandra to MongoDB

Otto Hilska July 26th, 2010

Flowdock is a modern web-based team messenger. All software developers should be using it instead of their Campfires, Skype Chats, IRCs, etc. because it better supports their actual workflow.

Last weekend we completed a transition from Flowdock’s database of choice, Cassandra, to another NoSQL alternative, MongoDB. Since our technology stack has always generated some interest, I’ll now try to justify our decision in public.

Some of our users might remember this:

Twitter screenshot: having some database problems

At some point we started to have stability issues with Cassandra. All nodes would go into an infinite loop, running GC and trying to compact the data files, occasionally falling off the cluster. We were unable to solve the problem, although restarting and then compacting a node usually settled it down for a while. Other people had reported similar problems. Over the last couple of weeks, our Cassandra nodes ate all the resources they were given, slowing down Flowdock.

This was not the first time we had run into problems because of our bleeding-edge database choice. When upgrading from 0.4 to 0.5, we had to shut down the cluster, only to find out that it hadn’t flushed everything to disk (even though we had explicitly flushed it, as instructed). As a result, we lost a couple of minutes of discussions, and our custom-built indices were miserably out of date and needed to be rebuilt. I think it was 4 AM when we finally got to leave the office.

The NoSQL scene has evolved since we made our original decision to go with Cassandra. MongoDB is changing rapidly, and the recent addition of auto-sharding and replica sets has made it a compelling alternative to Cassandra. So I decided to give it a try.

It took me a day to write a conversion script for our data. Within a week or so we were able to run Flowdock purely on MongoDB. It was tested internally for a couple of weeks before it was deployed to production.

Now that we have made the change, I’m happy to see that, in addition to better performance and reliability, we gained some benefits that are taken for granted in most databases:

  1. Smart (multikey) indices. Manually maintaining indices was always tedious, and MongoDB can index everything we need out-of-the-box. For example, our messages have tags, implying a document format like this:
    { content: "Write a blog post about #mongodb.",
      workspace: 'myflow',
      tags: ["mongodb", "todo", "@Otto"] }
    

    Now when looking for my own tasks, the Flowdock backend only needs to run this query:

    db.messages.find({
      workspace: 'myflow',
      tags: { $all: ["todo", "@Otto"] }
    })
    
  2. Queries. No matter how simple your data model is, every once in a while you need to perform a query that you didn’t plan in advance. MongoDB lets you construct complex queries directly from the console, pretty much like an SQL database. If no index covers the query, it will perform a sequential scan, which is still much faster and more convenient than processing millions of rows manually on the client side.
  3. Map-Reduce. It’s great for stuff like analytics. MongoDB’s Map-Reduce support is not perfect, but at least it’s easy to use.
  4. GridFS makes storing our files very easy. The storage capabilities expand together with the rest of our MongoDB cluster.
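
To make the `$all` semantics from the first item concrete, here is a minimal in-memory sketch in JavaScript. It does not talk to MongoDB or use an index; it only illustrates which documents the query above would match. The sample messages (other than the first one) and the `findByTags` helper are hypothetical names made up for this illustration.

```javascript
// Hypothetical in-memory stand-in for the messages collection.
const messages = [
  { content: "Write a blog post about #mongodb.", workspace: "myflow", tags: ["mongodb", "todo", "@Otto"] },
  { content: "Review the migration script.", workspace: "myflow", tags: ["todo", "@Ville"] },
  { content: "Deployed to production.", workspace: "myflow", tags: ["mongodb"] },
  { content: "Order more coffee.", workspace: "otherflow", tags: ["todo", "@Otto"] },
];

// Mimics { workspace: ..., tags: { $all: [...] } }: a document matches only
// if it is in the given workspace and its tags array contains every
// requested tag (extra tags on the document are allowed).
function findByTags(docs, workspace, requiredTags) {
  return docs.filter(
    (doc) =>
      doc.workspace === workspace &&
      requiredTags.every((tag) => doc.tags.includes(tag))
  );
}

const myTasks = findByTags(messages, "myflow", ["todo", "@Otto"]);

// Only the first message is in "myflow" and carries both tags.
console.log(myTasks.map((doc) => doc.content));
```

In the real database the same matching is done server-side, and a multikey index on the array field is what keeps it from scanning every message.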

We have faced only some minor limitations:

  1. We found a bug in JSON parsing that got fixed in 10 minutes.
  2. Dots are not allowed in BSON document keys. That is typically not a problem, but we had to work around it in our data migration.
  3. Document size is limited to 4 megabytes. It’s not a problem with our data model, but since MongoDB supports fantastic atomic in-place updates, you have to be careful not to grow your documents above this limit.
  4. Adding new nodes is not as easy as it is with Cassandra. However, Cassandra has its own problems load-balancing them.
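
As an illustration of the dotted-key limitation in the second item, here is a minimal sketch of one possible workaround: renaming keys before the data is written. This is an assumption about how such a workaround could look, not a description of the actual migration script. The `sanitizeKeys` helper, the `_` replacement and the sample row are all hypothetical.

```javascript
// Recursively rename keys so that none of them contains a dot.
// The replacement string is an arbitrary choice for this sketch.
function sanitizeKeys(value, replacement = "_") {
  if (Array.isArray(value)) {
    return value.map((item) => sanitizeKeys(item, replacement));
  }
  if (value !== null && typeof value === "object") {
    const result = {};
    for (const [key, child] of Object.entries(value)) {
      result[key.split(".").join(replacement)] = sanitizeKeys(child, replacement);
    }
    return result;
  }
  // Strings, numbers, booleans and null pass through unchanged.
  return value;
}

// Hypothetical legacy row with dots in its keys.
const legacyRow = { "user.name": "Otto", links: [{ "example.com": 3 }] };
const cleaned = sanitizeKeys(legacyRow);

// The keys become user_name and example_com; the values are untouched.
console.log(cleaned);
```

The sketch only handles plain JSON-like data, and two keys such as `a.b` and `a_b` would collide after renaming, so a real migration would need to pick a replacement that cannot occur in existing keys.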

So far it’s been a very smooth ride. Development and database administration just got a whole lot easier.

» Now give the new and better Flowdock a try!

Flowdock On The Desktop

Mikael Roos March 16th, 2010

Flowdock, the messenger for teams, has been built with some bleeding-edge technologies: Cassandra in the back end, HTML5 on the front end and Comet in between. Because it is built to run in the browser, you can use Flowdock on any modern computer. However, there are things some might consider shortcomings when it comes to using Flowdock in the browser. That’s why we’ve provided a new guide for running Flowdock on the desktop as a site-specific browser (SSB) using Fluid or Prism.

» Jump to the Guide (OS X, Windows, Linux)

So, what exactly are the benefits of running Flowdock on the desktop?

Notifications

When used on the desktop, Flowdock uses native desktop notifications to show you what is happening in your flow. This way, you stay up-to-date with no delay.

App of its own

You get a cool Flowdock icon on your desktop. With Fluid on OS X, Flowdock also supports unread badges, which are a cool way of visualizing what you’ve missed.

Meet Scalandra: Scala wrapper for Cassandra

Ville Lautanala August 28th, 2009

While developing Flowdock, a real-time environment that allows teams to work together seamlessly, we needed a database that could scale horizontally to handle a huge number of write operations. Cassandra’s data model and performance characteristics made it a perfect fit for our needs. It is a distributed database with a data model somewhat like Google’s BigTable. Cassandra was originally developed by Facebook to run their in-site messaging system, Inbox. If you want to learn more about Cassandra, Evan Weaver’s “Up and running with Cassandra” is a good place to start.

At the beginning of July, when we started to use Cassandra, there weren’t any usable bindings for Scala, which we use for the Flowdock back end, so we decided to develop our own. Two months later, the Thrift API is barely recognizable, but it is still a bag of hurt in Scala. Others have also noticed this and developed their own bindings for Scala, but they all take slightly different approaches.

In Flowdock we use Scalandra to handle three tasks: wrapping the Thrift API in a more painless interface, managing connections efficiently with a pool, and serializing and deserializing the byte arrays that Cassandra uses to store data. Scalandra supports three levels of serialization, which cover all possible combinations of data types in the Cassandra data model.
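
The idea of a serializer, a pair of functions that converts a typed value to a byte array and back, can be sketched in a few lines. The example below is in JavaScript, like the MongoDB shell snippets in the post above, and is not Scalandra's Scala implementation. The names `stringSerializer`, `int32Serializer` and `encodeColumn` are hypothetical.

```javascript
// A serializer is a pair of functions: value -> bytes and bytes -> value.
const stringSerializer = {
  serialize: (value) => Buffer.from(value, "utf8"),
  deserialize: (bytes) => bytes.toString("utf8"),
};

const int32Serializer = {
  serialize: (value) => {
    const bytes = Buffer.alloc(4);
    bytes.writeInt32BE(value, 0); // big-endian, 4 bytes
    return bytes;
  },
  deserialize: (bytes) => bytes.readInt32BE(0),
};

// Encode a column with independent serializers for its name and its value,
// so that, for example, string names can be combined with integer values.
function encodeColumn(name, value, nameSerializer, valueSerializer) {
  return {
    name: nameSerializer.serialize(name),
    value: valueSerializer.serialize(value),
  };
}

const column = encodeColumn("age", 53, stringSerializer, int32Serializer);

// Decoding with the matching serializers restores the original pair.
console.log(
  stringSerializer.deserialize(column.name),
  int32Serializer.deserialize(column.value)
);
```

Keeping the name and value serializers separate is what lets one client mix data types; a client that assumes everything is a string simply passes the same serializer for both.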

Scalandra has two levels of interaction. At the lower level, it provides an API similar to the Thrift interface, with commands for get, slice, insert, remove and count operations. The client API is meant to be used together with connection pooling to make sure that only a reasonable number of sockets are opened. Unfortunately for our purposes, Scala doesn’t have RAII, so connections have to be managed more explicitly.
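
Without RAII, a common way to manage pooled connections explicitly is to lend a connection to a callback and take it back in a `finally` block. The sketch below shows that pattern in JavaScript, the language of the MongoDB shell snippets in the post above. It is an illustration under assumptions, not Scalandra's pool: `LoanPool`, `withConnection` and the fake connection objects are hypothetical.

```javascript
// A tiny synchronous pool that lends connections to a callback.
class LoanPool {
  constructor(createConnection, maxOpen) {
    this.createConnection = createConnection;
    this.maxOpen = maxOpen; // upper bound on connections ever opened
    this.idle = [];         // stack of connections ready for reuse
    this.opened = 0;
  }

  acquire() {
    if (this.idle.length > 0) return this.idle.pop();
    if (this.opened >= this.maxOpen) throw new Error("connection pool exhausted");
    this.opened += 1;
    return this.createConnection();
  }

  // Lend a connection to the callback and always return it to the pool,
  // even when the callback throws.
  withConnection(useConnection) {
    const connection = this.acquire();
    try {
      return useConnection(connection);
    } finally {
      this.idle.push(connection);
    }
  }
}

// Fake connections stand in for real sockets in this sketch.
let nextId = 1;
const samplePool = new LoanPool(() => ({ id: nextId++ }), 2);

const firstId = samplePool.withConnection((connection) => connection.id);
const secondId = samplePool.withConnection((connection) => connection.id);

// Both calls used the same connection, so only one was ever opened.
console.log(firstId, secondId, samplePool.opened);
```

The callback style keeps the acquire and release steps in one place, so calling code cannot forget to hand a connection back.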

Assuming that we have a Cassandra instance running at localhost, configured to contain a standard column family “Users” in keyspace “Keyspace1”, we could use the Scalandra API to insert and read data as in the following example.

import com.nodeta.scalandra._
import com.nodeta.scalandra.serializer.StringSerializer
import com.nodeta.scalandra.pool.StackPool

val pool = StackPool(ConnectionProvider("localhost", 9160))

pool { connection =>
  // Let's create a client which assumes everything is a String
  val client = Client(connection, "Keyspace1", StringSerializer)

  // Let's insert some data to Cassandra
  client.insertNormal(
    ColumnParent[String]("Users", "jsmith"),
    Map("first" -> "John", "last" -> "Smith")
  )

  // Slice all results from Column Parent
  // None parameters are lower and upper bounds for slice
  client.slice(
    ColumnParent[String]("Users", "jsmith"),
    None,
    None,
    Ascending
  )
  // Returns data from previous insert

  client.get(ColumnPath("Users", "jsmith", "first"))
  // Returns Some("John")

  client.get(ColumnPath("Users", "jsmith", "nonexistent"))
  // Returns None
}

As the Cassandra data model is basically a multi-dimensional map, we’ve built a Map-like interface to Cassandra. It is not very polished in its current state. Mutations work, but they are not always intuitive because they might require a Scalandra Map object as a parameter. Using the previously inserted data, the map model could be used like this:

import com.nodeta.scalandra.map.StandardRecord
import com.nodeta.scalandra.ColumnParent

class User(protected val connection : Connection, key : String
 ) extends StandardRecord[String, String] {
  // "Users" is the column family the earlier insert example wrote to
  protected val path = ColumnParent[Any]("Users", key)
  protected val keyspace = "Keyspace1"
  protected val columnSerializer = StringSerializer
  protected val valueSerializer = StringSerializer
}

pool { connection =>
  val user = new User(connection, "jsmith")

  // User is basically the same map as inserted before
  user("first") // returns "John"

  // But also, slice actions exist
  user.slice("bar", "foo")
  // Returns Map("first" -> "John")

  // Insert value
  user("age") = "53"

  // John doesn't need to have a last name so let's remove it
  user -= "last"
}

The Scalandra Git repository is hosted on GitHub. We’ve also generated Scaladocs to help with usage information.

If you have any comments, or if you use Scalandra, please drop us a line. Also, patches are welcome :)