In Why You Should Never Use MongoDB, Sarah Mei explores the problem of querying activity streams:

Users have friends, friends have posts, posts have comments and likes, each comment has one commenter and each like has one liker. Relationship-wise, it’s not a whole lot more complicated than TV shows. And just like with TV shows, we want to pull all this data at once, right after the user logs in. Furthermore, in a relational store, with the data fully normalized, it would be a seven-table join to get everything out.

Seven-table joins. Ugh. Suddenly storing each user’s activity stream as one big denormalized nested data structure, rather than doing that join every time, seems pretty attractive.

Even in a relational store, aggregating social data becomes prohibitively slow very quickly. (In MongoDB it was even worse.) All the data you need is spread across numerous documents, and pulling them together takes time. If we could pull together all the related information into a single item – that is, denormalize it – we could query it in one operation rather than seven or more.

Common approaches to denormalization involve just duplicating data, which can get mighty messy when anything changes:

Updating a user’s data means walking through all the activity streams that they appear in to change the data in all those different places. This is very error-prone, and often leads to inconsistent data and mysterious errors, particularly when dealing with deletions.

Instead, let’s use a normalized schema as the authoritative representation of our data, and let workers denormalize it in the background for querying. Then you never have to deal with data duplication, and you can still query for complex objects like activity streams in one request. How?



orc-denorm is a Node.js utility that continually denormalizes the contents of an Orchestrate collection into a different but related collection. It examines every item in the collection for fields named like [collection]_key, uses their value to find the item they refer to, replaces those [collection]_key fields with the item itself, and then creates a new document in a collection prefixed with denorm_. Now, automatically, you can query complex objects in one request.

As a demonstration, consider this document from a like collection:

    {
        user_key: '…',
        post_key: '…'
    }

By default, orc-denorm will turn it into this:

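    {
        user: { /* the user item that user_key referred to */ },
        post: { /* the post item that post_key referred to */ }
    }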

… with the same key as the original, but in the denorm_like collection. By keeping the collections separate, we get the benefits of normalization and denormalization without the troubles of data duplication.
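To make the default behavior concrete, here is a minimal sketch of the field-replacement step. It is not orc-denorm's actual source: `denormalize`, `denormCollection`, and the in-memory `lookup` function are hypothetical names standing in for real database reads, and the sketch assumes a missing related item should become null.

```javascript
// Replace every `<collection>_key` field with the item it points to.
// `lookup(collection, key)` is a hypothetical stand-in for a database read;
// it returns the item's value, or undefined when the item does not exist.
function denormalize(item, lookup) {
    var result = {};
    Object.keys(item).forEach(function (field) {
        var match = /^(.+)_key$/.exec(field);
        if (!match) {
            // ordinary field: copy it across unchanged
            result[field] = item[field];
            return;
        }
        var collection = match[1];
        var related = lookup(collection, item[field]);
        // assumption: a missing related item becomes null
        result[collection] = related === undefined ? null : related;
    });
    return result;
}

// name of the collection that receives the denormalized copy
function denormCollection(collection) {
    return ['denorm', collection].join('_');
}

// usage with an in-memory "database"
var data = {
    user: { u1: { name: 'Ada' } },
    post: { p1: { title: 'Hello' } }
};
function lookup(collection, key) {
    return (data[collection] || {})[key];
}

var like = { user_key: 'u1', post_key: 'p1', created: 123 };
console.log(denormCollection('like'), denormalize(like, lookup));
```

The original item is never modified, which mirrors the idea that the normalized collection stays authoritative while the denormalized copy is disposable.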

To get orc-denorm running, you’ll need Node.js. Then, just install and run:

npm install -g orc-denorm
orc-denorm -u YOUR_API_KEY -c COLLECTION

You can also customize how it denormalizes data. If you use Graph Relations to relate likes and posts, and Events to store comments, you could configure orc-denorm to get those items:

var orc_denorm = require('orc-denorm')();
var kew = require('kew'); // for working with promises

// the custom denormalization function
orc_denorm.denormalize = function (db, path, item) {
    // db is an authenticated orchestrate.js client
    // path == { collection: '...', key: '...', ref: '...'}
    // item == { /* the item's value */ }

    // add a post's comments and likes to the post object
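    var promises = [
        // comments are stored as events on the post
        // (reader calls assumed from the orchestrate.js client; 'comment' is an assumed event type)
        db.newEventReader()
            .from(path.collection, path.key)
            .type('comment'),
        // likes are graph relations from the post ('likes' is an assumed relation name)
        db.newGraphReader()
            .get()
            .from(path.collection, path.key)
            .related('likes')
    ];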

    return kew.all(promises)
    .then(function (results) {
        item.comments = results[0].body.results;
        item.likes = results[1].body.results;
        return item;
    })
    // save the denormalized post
    .then(function (item) {
        var collection = ['denorm', path.collection].join('_');
        return db.put(collection, path.key, item);
    })
    .then(function () {
        return item;
    });
};

// run orc-denorm's CLI (assuming the module exposes it as cli())
orc_denorm.cli();
// or just start the process with
// orc_denorm.start({ collection: '...', api_key: '...' });

Now the denorm_post collection contains posts and all their likes and comments. To get someone’s activity stream, then, is just one search query on the denormalized collection:

var friends = [/* a list of user keys */];
db.search('denorm_post', 'user.path.key:(' + friends.join(' OR ') + ')')
.then(function (res) {
    var activity_stream = res.body.results;
});

What originally would have taken numerous queries now takes only one.
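If you build that query string in more than one place, it may be worth pulling it into a small helper. The sketch below is only an illustration: `activityQuery` is a hypothetical name, and it assumes the user keys contain no characters that are special in the search query syntax.

```javascript
// Build the search query for a list of friends' user keys.
// Assumes keys need no escaping; returns null when there is nothing to search for.
function activityQuery(friends) {
    if (friends.length === 0) {
        return null;
    }
    return 'user.path.key:(' + friends.join(' OR ') + ')';
}

console.log(activityQuery(['alice', 'bob']));
// prints: user.path.key:(alice OR bob)
```

Returning null for an empty friends list lets the caller skip the search entirely instead of sending a query with an empty group.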

Stayin’ Alive

The trouble with this approach is, if your orc-denorm process ever dies, then all your denormalized data stops getting updated. orc-denorm does its best to not let errors kill the party. For example:

  • Related object is missing? That field in the denormalized item is now null, while scanning continues.
  • Related object yields some other wacky error? Skip denormalizing that object, while scanning continues.
  • Retrieving collection listing yields some 4xx or 5xx error? Scanning doesn’t even flinch.
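The three rules above can be sketched as a single scanning pass. This is an illustration under stated assumptions rather than orc-denorm's internals: `scanOnce`, `listItems`, and `denormalizeOne` are hypothetical names, and the sketch is synchronous for brevity where the real tool works with promises.

```javascript
// One pass over a collection that tolerates the failures listed above.
// `listItems()` returns [{ key, value }, ...] or throws (say, on a 5xx);
// `denormalizeOne(value)` returns the denormalized value or throws.
function scanOnce(listItems, denormalizeOne) {
    var report = { written: [], skipped: [], listingFailed: false };
    var items;
    try {
        items = listItems();
    } catch (err) {
        // listing failed: give up on this pass only, the next pass retries
        report.listingFailed = true;
        return report;
    }
    items.forEach(function (entry) {
        try {
            report.written.push({ key: entry.key, value: denormalizeOne(entry.value) });
        } catch (err) {
            // unexpected error: skip this item and keep scanning
            report.skipped.push({ key: entry.key, reason: err.message });
        }
    });
    return report;
}

// usage: one good item, one with a missing relation, one that blows up
var users = { u1: { name: 'Ada' } };
function denormalizeOne(value) {
    if (value.user_key === 'boom') {
        throw new Error('wacky error');
    }
    var user = users[value.user_key];
    // a missing related object becomes null
    return { user: user === undefined ? null : user };
}

var report = scanOnce(function () {
    return [
        { key: 'a', value: { user_key: 'u1' } },
        { key: 'b', value: { user_key: 'gone' } },
        { key: 'c', value: { user_key: 'boom' } }
    ];
}, denormalizeOne);
console.log(report);
```

The point of the design is that no single bad item or failed request can end the loop; the worst outcome of any one failure is a stale or null field until the next pass.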

orc-denorm works hard to stay alive, but to make sure orc-denorm really never dies, use forever:

forever orc-denorm -u YOUR_API_KEY -c COLLECTION

This will restart orc-denorm if it ever halts unexpectedly.

Coming Soon

orc-denorm solves querying problems I’ve seen pop up for users of numerous databases, but as a very young project, there’s plenty left to be done. Particularly, I’ll be optimizing the way orc-denorm watches collections, so it processes larger datasets more efficiently. If you want to contribute, or if you run into problems, check out the issues page!

Happy coding!