Making the transition from relational databases to NoSQL can feel a little weird. While there are some similarities it’s a fairly big shift in thinking. Having grown up on the relational model I’ve been a bit resistant to give NoSQL solutions a fair shake but in the past few years it’s become difficult to ignore.

Being a professional software developer, it didn’t take long before I was curious. Can I use NoSQL solutions in place of relational ones? Where do I put the foreign keys? Is the seemingly total lack of data structure manageable? Let’s find out.

Define the Problem

To illustrate the difference between SQL and NoSQL approaches to modeling data, let’s build a movie catalog. We will start with a small data set and add complexity as we go. Initially our goal is to maintain a catalogue of movies with a title, description, genre tags, lead actors and year released.

  • title: Tremors
  • released: 1990
  • description: Natives of a small isolated town defend themselves against strange underground creatures which are killing them one by one.
  • genres: comedy, horror, sci-fi
  • actors: Kevin Bacon, Fred Ward, Finn Carter
  • title: A Few Good Men
  • released: 1992
  • description: Neo military lawyer Kaffee defends Marines accused of murder; they contend they were acting under orders.
  • genres: crime, drama, mystery
  • actors: Tom Cruise, Jack Nicholson, Kevin Bacon
  • title: Apollo 13
  • released: 1995
  • description: Three astronauts must devise a strategy to return to Earth safely after their spacecraft undergoes massive internal damage.
  • genres: adventure, drama, history
  • actors: Tom Hanks, Bill Paxton, Kevin Bacon

Relational Solution

Taking the simplest approach we could write a schema might look like this:

movie
--------
movie_id    int auto_increment
title       varchar(50)
description text
released    date
genres      varchar(255)
actors      varchar(255)

It stores all the data we need and fulfills our movie catalogue requirements. However, for anyone that’s written software professionally you’re probably already shaking your head. What happens when an actor changes their name? What if we decide we want to view all movies by a specific actor or genre? While it’s possible to accomplish some of those goals the solutions are far from ideal. Using a normalized approach, a more flexible schema might look like this:

movie
--------
movie_id    int auto_increment
title       varchar(50)
description text
released    date

genre
-----
genre_id    int auto_increment
genre       varchar(20)

actor
-----
actor_id    int auto_increment
name        varchar(255)

genre_movie
-----------
movie_id    int
genre_id    int

actor_movie
-----------
actor_id    int
movie_id    int

While slightly more complicated this allows us to access our data in a variety of different ways and allows for changes to an actors name without having to update every instance in the movie table.

We can find every movie by Tom Hanks:

SELECT movie.title
FROM movie
JOIN actor_movie ON (movie.movie_id = actor_movie.movie_id)
JOIN actor ON (actor.actor_id = actor_movie.actor_id)
WHERE actor.name='Tom Hanks';

We can select every actor by genre:

SELECT actor.name
FROM actor
JOIN actor_movie ON (actor.actor_id = actor_movie.actor_id)
JOIN movie ON (movie.movie_id = actor_movie.movie_id)
JOIN genre_movie ON (movie.movie_id = genre_movie.movie_id)
JOIN genre ON (genre.genre_id = genre_movie.genre_id)
WHERE genre.genre = ‘history’

In order to build flexibility into our application we had to take our tables from 1 to 5. Everything is normalized but what happens when we want to create a list of related movies for each movie?

NoSQL Solution

Let’s use that same data set and take a NoSQL approach. There are a number of different types of NoSQL solutions the most common being a key/value document store. You create a document using XML, YMAL, JSON, etc. and store it in the database under a key or id. For the sake of this article we’ll be using JSON.

  • collection: movie
  • key: 1
  • value:
{
    "title": "Tremors",
    "released": "1990",
    "description": "Natives of a small isolated town defend themselves against strange underground creatures which are killing them one by one.",
    "genres": [
        "comedy",
        "horror",
        "sci-fi"
    ],
    "actors": [
        "Kevin Bacon",
        "Fred Ward",
        "Finn Carter"
    ]
}

There is no common interface for NoSQL solutions so let’s just assume we’ve stored our value as a JSON document with a unique key in a collection (AKA table). Also assume we can recall that document by its key.

Now, at first glance it might not make a whole lot of sense. There is no schema, and no tables to create. You just say I want to store this value, in this collection using this key. In a lot of cases you don’t even have to make the collection ahead of time. The big shift here is the database is no longer responsible for what your data looks like. You can add and remove properties of your JSON document programmatically. You don’t have to save or manage a schema because your application models do that for you. Still with me?

So, we can put data in, but how do we get this data back out and how can we query for the same relationships like we did above? To answer that, let’s use Orchestrate.

Orchestrate Solution

As previously mentioned, there are a number of NoSQL solutions each one tackling different problems. What’s great about Orchestrate is we get all of those solutions rolled up into one API. Before we get too far down that road, let’s put our data into Orchestrate (if you haven’t yet set up a free account).

You can access Orchestrate via it’s RESTful API or a language specific library. For this article we’ll just stick with curl. First, sign in and create a new application called [your_username]_movie. Then, to put our sample data into Orchestrate, run do this:

export OI_TOKEN=<your orchestrate token here>

curl https://api.orchestrate.io/v0/movies/1 \
-X PUT \
-i \
-H 'Content-Type:application/json' \
-u "$OI_TOKEN:" \
-d '{"title": "Tremors","released": "1990","description": "Natives of a small isolated town defend themselves against strange underground creatures which are killing them one by one.", "genres": ["comedy","horror","sci-fi"],"actors": ["Kevin Bacon","Fred Ward","Finn Carter"]}'

curl https://api.orchestrate.io/v0/movies/2 \
-X PUT \
-i \
-H 'Content-Type:application/json' \
-u "$OI_TOKEN:" \
-d '{"title": "A Few Good Men","released": "1992","description": "Neo military lawyer Kaffee defends Marines accused of murder; they contend they were acting under orders.","genres": ["crime","drama","mystery"],"actors": ["Tom Cruise","Jack Nicholson","Kevin Bacon"]}'

curl https://api.orchestrate.io/v0/movies/3 \
-X PUT \
-i \
-H 'Content-Type:application/json' \
-u "$OI_TOKEN:" \
-d '{ "title": "Apollo 13", "released": "1995", "description": "Three astronauts must devise a strategy to return to Earth safely after their spacecraft undergoes massive internal damage.", "genres": [ "adventure", "drama", "history" ], "actors": [ "Tom Hanks", "Bill Paxton", "Kevin Bacon" ] }'

Because Orchestrate incorporates Lucene text search we can now query for every movie featuring Tom Hanks:

curl https://api.orchestrate.io/v0/movies?query=actors:"Tom%20Hanks" \
-X GET \
-i \
-u "$OI_TOKEN:"

Using a similar query you can get a list of movies by genre and programmatically extract the actors.

This simple example of NoSQL with Orchestrate puts us nearly on par with the more complex relational solution. We’re still vulnerable to updating every entry in our app if an actor changes their name and how do we solve that problem of related movies?

Graph

A graph database is another type of NoSQL solution designed to keep track of relationships between objects. Thankfully Orchestrate comes with a graph solution built right in. Let’s update our application data to remove the actors property from the movies document and create a new collection for actors:

export OI_TOKEN=<your orchestrate token here>

curl https://api.orchestrate.io/v0/movies/1 \
-X PUT \
-i \
-H 'Content-Type:application/json' \
-u "$OI_TOKEN:" \
-d '{"title": "Tremors","released": "1990","description": "Natives of a small isolated town defend themselves against strange underground creatures which are killing them one by one.", "genres": ["comedy","horror","sci-fi"]}'

curl https://api.orchestrate.io/v0/movies/2 \
-X PUT \
-i \
-H 'Content-Type:application/json' \
-u "$OI_TOKEN:" \
-d '{"title": "A Few Good Men","released": "1992","description": "Neo military lawyer Kaffee defends Marines accused of murder; they contend they were acting under orders.","genres": ["crime","drama","mystery"]}'

curl https://api.orchestrate.io/v0/movies/3 \
-X PUT \
-i \
-H 'Content-Type:application/json' \
-u "$OI_TOKEN:" \
-d '{ "title": "Apollo 13", "released": "1995", "description": "Three astronauts must devise a strategy to return to Earth safely after their spacecraft undergoes massive internal damage.", "genres": [ "adventure", "drama", "history" ]}'

You’ll notice we used the exact same request. With NoSQL you post the entire document as you want to see it instead of picking and choosing which columns to update.

Next, let’s add our actors:

export OI_TOKEN=<your orchestrate token here>

curl https://api.orchestrate.io/v0/actors/1 \
-X PUT \
-i \
-H 'Content-Type:application/json' \
-u "$OI_TOKEN:" \
-d '{"name": "Kevin Bacon"}'

curl https://api.orchestrate.io/v0/actors/2 \
-X PUT \
-i \
-H 'Content-Type:application/json' \
-u "$OI_TOKEN:" \
-d '{"name": "Fred Ward"}'

curl https://api.orchestrate.io/v0/actors/3 \
-X PUT \
-i \
-H 'Content-Type:application/json' \
-u "$OI_TOKEN:" \
-d '{"name": "Finn Carter"}'

curl https://api.orchestrate.io/v0/actors/4 \
-X PUT \
-i \
-H 'Content-Type:application/json' \
-u "$OI_TOKEN:" \
-d '{"name": "Tom Cruise"}'

curl https://api.orchestrate.io/v0/actors/5 \
-X PUT \
-i \
-H 'Content-Type:application/json' \
-u "$OI_TOKEN:" \
-d '{"name": "Jack Nicholson"}'

curl https://api.orchestrate.io/v0/actors/6 \
-X PUT \
-i \
-H 'Content-Type:application/json' \
-u "$OI_TOKEN:" \
-d '{"name": "Tom Hanks"}'

curl https://api.orchestrate.io/v0/actors/7 \
-X PUT \
-i \
-H 'Content-Type:application/json' \
-u "$OI_TOKEN:" \
-d '{"name": "Bill Paxton"}'

Then, we’ll create relationships between our actors and movies:

curl -i "https://api.orchestrate.io/v0/movies/1/relation/actor/actors/1" -u "$OI_TOKEN:" -X PUT
curl -i "https://api.orchestrate.io/v0/movies/1/relation/actor/actors/2" -u "$OI_TOKEN:" -X PUT
curl -i "https://api.orchestrate.io/v0/movies/1/relation/actor/actors/3" -u "$OI_TOKEN:" -X PUT
curl -i "https://api.orchestrate.io/v0/movies/2/relation/actor/actors/1" -u "$OI_TOKEN:" -X PUT
curl -i "https://api.orchestrate.io/v0/movies/2/relation/actor/actors/4" -u "$OI_TOKEN:" -X PUT
curl -i "https://api.orchestrate.io/v0/movies/2/relation/actor/actors/5" -u "$OI_TOKEN:" -X PUT
curl -i "https://api.orchestrate.io/v0/movies/3/relation/actor/actors/1" -u "$OI_TOKEN:" -X PUT
curl -i "https://api.orchestrate.io/v0/movies/3/relation/actor/actors/6" -u "$OI_TOKEN:" -X PUT
curl -i "https://api.orchestrate.io/v0/movies/3/relation/actor/actors/7" -u "$OI_TOKEN:" -X PUT

With that done we’ve created a link between movies, and actors. You can get a list of actors for a movie like this:

curl -i "https://api.orchestrate.io/v0/movies/1/relations/actor" -u "$OI_TOKEN:"

To get the reverse relationship (actor to movies) you would need to create another relationship. A similar process can be used for tags.

Conclusion

Using Orchestrate’s all-in-one NoSQL solution we were able to recreate our relational example with fewer collections. In fact, because of the graph API we can create deep relationships between any of the objects in our application.

One of the major differences between NoSQL solutions and relational is you’re moving the burden of work from the database to the application. The applications data models perform validation, manage relationships between objects, enforce constraints and can add and remove properties/columns; no more CREATE statements, or juggling ALTERs. With this shift in responsibility, it’s up to the developer to think ahead and understand their data.

NoSQL isn’t the solution to every problem, but I’m finding it very easy to work with and keeping it in mind when I start my new projects.