So you’ve grown up with SQL and learned how to organize your data in a sane, relational fashion. Now there’s all this NoSQL business talking about brave new worlds of horizontal scaling and fault tolerance. Does what you know about data normalization, robust schemas, and optimizing queries still stand?
The short answer: yes. The long answer follows.
In a database like PostgreSQL or MySQL, tables contain rows, and you have one table for each schema your rows exhibit, containing all the rows that fit that schema. With Orchestrate’s NoSQL database API, collections contain items, but it’s the same! Say you’ve got a blog, with users, articles, and comments. In a relational database, you’d have one table for each. In Orchestrate? Three collections! It’s the same principle: separate items into collections by schema.
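To make the blog example concrete, here is a rough sketch of what items in those three collections might look like as plain Python dictionaries. The keys and field names (`name`, `author`, `article`, and so on) are made up for illustration, and `same_shape` is a hypothetical helper, not part of any client library. It only checks the principle above: every item in a collection shares one set of field names.

```python
# Hypothetical items, keyed the way a collection would key them.
# Field names and keys are illustrative, not taken from a real schema.
users = {
    'ada': {'name': 'Ada Lovelace', 'email': 'ada@example.com'},
}
articles = {
    'hello-world': {'author': 'ada', 'title': 'Hello, world', 'body': '...'},
}
comments = {
    'c1': {'author': 'ada', 'article': 'hello-world', 'text': 'First!'},
}

def same_shape(collection):
    """True if every item in the collection has the same set of field names."""
    shapes = {frozenset(item) for item in collection.values()}
    return len(shapes) <= 1

# A collection that mixes users and comments would fail the check.
mixed = {'ada': users['ada'], 'c1': comments['c1']}
print(same_shape(users), same_shape(mixed))
```

Items point at each other by key (`author`, `article`) instead of carrying copies, which is the same normalization habit you already have from SQL.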
Who determines that schema? You! How? However your heart fancies! Or, nearly. Here are some tips:
- Don’t use variables as field names. Keep field names static! You can’t query against field names, so a dynamic field name like email@example.com leaves that data unable to be queried. Instead, use static field names like email.
- Only use alphanumeric characters (A-Z, a-z, 0-9) in field names. Characters like : and . will make those fields more difficult to query.
- Normalize your data! Documents with different schemas should go in different collections. Later, once you know what kinds of queries your users are making together, you can denormalize to optimize for those requests.
- If you need to ensure an item is unique, give it a key related to its contents. If usernames are unique, for example, use usernames as keys to avoid redundancy of user items. If you want to make sure the item isn’t overwriting anything when you store it, use the If-None-Match header to store the item only if nothing is already using that key.
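A couple of these tips can be sketched in code. The block below is a minimal illustration, not an official client: `bad_field_names` and `create_only_request` are hypothetical helpers, "alphanumeric" is read strictly as ASCII letters and digits, and the URL layout and the `"*"` value for If-None-Match are assumptions you should check against the API docs. The request is only built, never sent, and authentication is left out.

```python
import json
import re
import urllib.request

# Assumption: "alphanumeric" means ASCII letters and digits only.
FIELD_NAME = re.compile(r'[A-Za-z0-9]+')

def bad_field_names(item, prefix=''):
    """Return paths of field names that break the alphanumeric tip."""
    bad = []
    for name, value in item.items():
        path = prefix + name
        if not FIELD_NAME.fullmatch(name):
            bad.append(path)
        if isinstance(value, dict):
            # check nested objects too, reporting them as parent/child
            bad.extend(bad_field_names(value, path + '/'))
    return bad

def create_only_request(collection, key, item):
    """Build (but do not send) a PUT meant to fail if the key is taken.

    Assumptions: the URL layout and the '"*"' header value are illustrative.
    """
    url = 'https://api.orchestrate.io/v0/%s/%s' % (collection, key)
    return urllib.request.Request(
        url,
        data=json.dumps(item).encode('utf-8'),
        headers={'Content-Type': 'application/json', 'If-None-Match': '"*"'},
        method='PUT',
    )

# Dynamic field name: the address is the field name, so it can't be queried.
dynamic = {'email@example.com': {'name': 'Ada'}}
# Static field names: the address is a value under a fixed name.
static = {'email': 'ada@example.com', 'name': 'Ada'}

print(bad_field_names(dynamic))
print(bad_field_names(static))

# Usernames are unique, so the username doubles as the key.
req = create_only_request('users', 'ada', static)
print(req.get_method(), req.full_url)
```

Running a check like this before storing an item is cheap, and using the username as the key means a second signup for the same name collides instead of quietly creating a duplicate.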
We manage the schema of your items behind the scenes to automatically index them for querying, while you still have complete freedom to organize and describe your data however you please.
How do we relate data together? Y’know, one-to-one, one-to-many, many-to-many — how do we do that? Relations!
Say you’re working in Python, using porc, and you want to create a comment on a blog article. Let’s see how we would relate the comment to a given article:
    import porc

    # create our client
    client = porc.Client(API_KEY)

    # create the comment
    collection = client.collection('comments')
    res = collection.post(COMMENT).result()
    comment_key = res.path['key']

    # relate to the article
    key = client.collection('articles').key(ARTICLE_KEY)
    relation = key.relation()
    response = relation.put('commented_by', 'comments', comment_key).result()
Then, to retrieve all comments for a given article:
    import porc

    # create our client
    client = porc.Client(API_KEY)

    # get all comments
    comments = client.collection('articles').key(ARTICLE_KEY).relation().get('commented_by').result()
There you go! All comments, one request.
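If it helps to picture what a relation is, here is a toy in-memory model. It is only a sketch: `ToyGraph` is a hypothetical class, not how Orchestrate stores anything, and the `comments_on` relation name is invented. It also assumes relations point one way, so a reverse lookup needs its own edge.

```python
from collections import defaultdict

class ToyGraph:
    """In-memory stand-in for relations; not how any real service stores them."""

    def __init__(self):
        # (collection, key, kind) -> list of (collection, key) targets
        self._edges = defaultdict(list)

    def put(self, source, kind, target):
        """Relate source to target; adding the same edge twice is a no-op."""
        targets = self._edges[source + (kind,)]
        if target not in targets:
            targets.append(target)

    def get(self, source, kind):
        """Return every target related to source by this kind."""
        return list(self._edges.get(source + (kind,), []))

graph = ToyGraph()
article = ('articles', 'hello-world')

# one-to-many: one article, many comments
graph.put(article, 'commented_by', ('comments', 'c1'))
graph.put(article, 'commented_by', ('comments', 'c2'))

# edges point one way in this toy, so the reverse lookup needs its own edge
graph.put(('comments', 'c1'), 'comments_on', article)

print(graph.get(article, 'commented_by'))
```

One-to-one is the same thing with a single edge, and many-to-many is just edges going out from many items on both sides.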
Say you want every article and comment by a particular user. That’s two different collections; how do we get it in a timely fashion? Parallelism!
By performing requests in parallel, waiting for every one of them to finish only takes as long as the slowest request, which would have had to run anyway. Let’s see it:
    import porc

    # create our client
    client = porc.Client(API_KEY)

    # begin requests in parallel
    article_future = client.collection('articles').search(q='author:%s' % AUTHOR)
    comment_future = client.collection('comments').search(q='author:%s' % AUTHOR)

    # wait for results
    article_res = article_future.result()
    comment_res = comment_future.result()
Two requests as fast as the slowest of the two. Yay parallelism!
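You can see the timing claim for yourself without touching the network. This sketch uses the standard library's `concurrent.futures` with a hypothetical `fake_search` that just sleeps; the delays are invented, and nothing here says how any particular client implements its futures.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_search(collection, seconds):
    """Stand-in for a network request: sleep, then return a label."""
    time.sleep(seconds)
    return '%s results' % collection

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=2) as pool:
    # both "requests" start right away
    article_future = pool.submit(fake_search, 'articles', 0.3)
    comment_future = pool.submit(fake_search, 'comments', 0.5)
    # wait for results
    article_res = article_future.result()
    comment_res = comment_future.result()
elapsed = time.perf_counter() - start

print('took %.2fs for 0.8s of total waiting' % elapsed)
```

The total should land near the slower of the two delays rather than their sum, which is the whole trick.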
That said, we recognize that as powerful as item and graph searches are, they leave much to be desired in comparison to the power and sophistication of SQL, and we’re working on it. We at Orchestrate want to make databases effortless, and if you have opinions about how to do that, let us know! We prioritize features based on your feedback, so ask and it shall be given :D