The growing trend of companies running multiple databases in production was the inspiration for Orchestrate. This polyglot persistence suggests choosing utility over simplicity. One example we’ve followed from early on is Klout. The application runs many different types of databases to store, analyze and query its social media data. However, Klout doesn’t just throw in new databases without good reason, according to Director of Software Engineering/Operations Ian Kallen.
Above is a glimpse into the complexity of Klout’s architecture. The company ingests 250 TB of data every day, with a 150-node MapReduce cluster that is busy 24 hours a day.
Klout’s technology stack includes Node.js, Redis, Play Framework, HBase, MySQL, MongoDB, Memcached, Elasticsearch, Hadoop MapReduce, HDFS, Hive, Sqoop, Kafka, RabbitMQ, Spark and Storm. Combined, they produce a simple Klout score for each user, tag users with topics, offer content suggestions, and support the greater Klout platform.
“Each individual database has separate and different management concerns,” Kallen said. “Having the right mix of administrative talent is difficult.”
Kallen says the range of database options, both the ones Klout already runs and the ones it doesn’t, causes cognitive overhead for developers. Young developers, especially, aren’t sure what to use. For example, one might naively “run a bunch of MongoDB replica sets,” Kallen said. “Most of those are the wrong tool for the job because MongoDB scales very poorly. ‘Just shard it and buy more’ doesn’t work.”
Even experienced developers can have a tough time deciding on the right tool for the job, Kallen said:
“We have arguably too many tools in the tool belt. We could drive ourselves crazy trying to get every specialized tool for every specialized use case. There are only so many backends we can support.”
Klout’s infrastructure is necessarily complex, built to bring in lots of data, crunch it and create insights. Despite the long list of technologies Klout supports, it still feels the impact of adding another. Any proposal to add one would meet considerable pushback, Kallen said, so there would need to be a really good reason. The right tool for the job must be more than the sum of its parts.
At Orchestrate, we strongly identify with the complexity of operating multiple databases. The technologies we use are similar to Klout’s on the database front. We’ve taken the best of NoSQL and exposed the queries via an API that’s scalable and easy to use. Sign up today and get 50,000 free queries per month.