Long before I had ever met a “distributed database,” I already had a great relationship with my relational database. I loved it for its B-tree indexes, and I admired it for its UNIONs and JOINs. But the feature I used most often was probably the humble auto-incrementing primary key: a quick, simple way to guarantee unique keys for any table.
These days, with most of my work now using distributed databases, I miss having those automatically generated, guaranteed-unique, ascending keys.
If you’ve worked with distributed databases, you’ve probably scratched your head more than once trying to create system-wide unique primary keys for all your objects. Are natural keys better than synthetic keys? And how do you guarantee uniqueness without locking up the whole cluster?
UUIDs are tempting, but they can be so bulky. Can we get the same benefit of uniqueness in half the size?
In situations like this, some developers may end up using “probably-unique” randomized keys. For example, a sixteen-character randomized string using the characters A–Z and 0–9 would have 36^16 (about 7.95 x 10^24) possible unique values. The chances of a collision are pretty slim for any individual object, but what if the system needs to start generating billions of objects with these kinds of keys? How long before the inevitable collision? And how would you even know that a collision had occurred?
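To get a feel for the numbers, here is a small sketch using the standard birthday-bound approximation, p ≈ 1 − e^(−n²/2d), for the probability of at least one collision among n random keys drawn from a keyspace of size d. The function name and the sample key counts are ours, for illustration only:

```python
import math

def collision_probability(n_keys: int, keyspace: int) -> float:
    """Approximate probability of at least one collision among n_keys
    uniformly random keys drawn from `keyspace` possible values
    (birthday-bound approximation)."""
    return 1.0 - math.exp(-n_keys * (n_keys - 1) / (2.0 * keyspace))

KEYSPACE = 36 ** 16  # ~7.95e24 possible 16-char alphanumeric keys

for n in (10**6, 10**9, 10**12):
    print(f"{n:>15,} keys -> collision probability {collision_probability(n, KEYSPACE):.2e}")
```

At a billion keys the odds are still tiny, but by a trillion keys the collision probability climbs to several percent: rare, not impossible.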
It’s easy to make collisions rare. It’s very hard to make them impossible.
That’s why we’re introducing Server Generated Keys as a new feature of the Orchestrate API. Now you can submit new objects into your collections, and Orchestrate will generate a 64-bit key that’s guaranteed to be unique across our entire cluster.
Not probably unique. Guaranteed.
To use this new functionality, execute an HTTP POST request directly against your collection URL:
curl -i "https://api.orchestrate.io/v0/$collection" \
  -XPOST \
  -H "Content-Type: application/json" \
  -u "$api_key:" \
  -d "$json"
The server will respond with a Location header, containing both the newly generated ID and the canonical ref for the newly inserted object:

Location: /v0/$collection/036ea872f9011a7c/refs/fab82eac8414ded3

In this example, the value 036ea872f9011a7c is the newly generated ID, and fab82eac8414ded3 is the canonical ref for this particular version of the object. Of course, if we update the object later, the updated object will keep the same ID but get a new ref value.
By contrast, when you don’t need the server to generate a key, you can continue to use the existing REST endpoint. Just execute an HTTP PUT request, with your key already included in the URL path:
curl -i "https://api.orchestrate.io/v0/$collection/$key" \
  -XPUT \
  -H "Content-Type: application/json" \
  -u "$api_key:" \
  -d "$json"
The new API should be easy and convenient to use with any of your Orchestrate collections.
For those of you interested in what’s going on behind the scenes, here’s how it works:
Each of our API servers has its own unique ID internally. When you request a new Server Generated Key, the server combines its own ID with a millisecond-granularity timestamp and a sequence number. Since no two machines have the same ID, it’s impossible to generate duplicate IDs anywhere in the cluster, even during the same millisecond.
The best part about these IDs is that they have a natural sort-order according to their underlying timestamp. For example, here are a few IDs generated this morning (during the same millisecond), from two different servers:
[Table: example IDs from MACHINE 0 and MACHINE 1]
In this 64-bit structure, the first 40 bits (the first 10 hex characters) are used for the timestamp, the next 12 bits (3 chars) are used for the machine ID, and the final 12 bits (3 chars) are used for the sequence number. (By the way, if this sounds a lot like Twitter Snowflake, that’s because we took a lot of inspiration from their work when we designed and built our solution.)
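The layout above can be sketched with a little bit-packing. This is our own illustration, not Orchestrate's actual implementation; the function names, and the assumption that the epoch is exactly 2014-01-01 UTC, are ours:

```python
# Sketch of the 64-bit layout described above:
# 40-bit millisecond timestamp | 12-bit machine ID | 12-bit sequence number.

EPOCH_MS = 1388534400000  # 2014-01-01T00:00:00Z in Unix milliseconds (assumed epoch)

def make_id(timestamp_ms: int, machine_id: int, sequence: int) -> int:
    """Pack the three fields into a single 64-bit key."""
    assert 0 <= machine_id < 4096 and 0 <= sequence < 4096
    return ((timestamp_ms - EPOCH_MS) << 24) | (machine_id << 12) | sequence

def split_id(key: int) -> tuple[int, int, int]:
    """Unpack a key back into (unix_timestamp_ms, machine_id, sequence)."""
    return (key >> 24) + EPOCH_MS, (key >> 12) & 0xFFF, key & 0xFFF
```

Because the timestamp occupies the high-order bits, formatting a key as a zero-padded 16-character hex string (f"{key:016x}") yields strings whose lexicographic order matches the numeric, and hence roughly chronological, order.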
We use a different epoch than the standard Unix epoch (our timestamps begin in 2014 instead of 1970, because we need the extra bits for machine IDs and sequence numbers), so you probably shouldn’t try to convert these values back into standard timestamps. But you can rely on the timestamps to create approximate lexicographical ordering, according to their insertion chronology.
Any individual API server can guarantee correct ordering down to a single millisecond (with approximate ordering at the sub-millisecond level). But since each API call might be load-balanced to a different server in our cluster, and since the hardware clock on those individual servers can drift, relative to one another, by a few milliseconds in either direction, it’s important not to rely on strict ordering of these ID values.
So don’t use the ordering of these IDs as a critical component of your new high-frequency-trading app. But their approximate chronological ordering makes them great for blog posts or chat messages or friend requests or game events, or a zillion other useful things.
We hope you love the new functionality and that it helps grease the wheels as you develop your next breakout application!
And, as always, let us know what you think by dropping us a line at UserVoice. This feature grew directly from user feedback, and your feedback will continue to drive our development priorities.