There are estimated to be over 295 exabytes of data stored in the world. That’s 295 billion gigabytes. To create order out of that chaos, we need to capture the relationships between pieces of data. There are standalone services you can use to graph relationships in your data, but Orchestrate, a database-as-a-service (DBaaS) provider, offers graph queries alongside full-text search, events, and key/value storage. Orchestrate’s Graph API focuses on simplicity and ease of use.

Let me show you.

Let’s say you have a collection called “movies” and a collection called “users”, and you want to record that the user “bob” has “watched” the movie “The Amazing Spiderman”. Using orchestrate.js, creating that relationship looks like this:

// token is your Orchestrate API key
var db = require('orchestrate')(token);

db.newGraphBuilder()
    .create()
    .from('users', 'bob')
    .related('watched')
    .to('movies', 'The Amazing Spiderman');

That creates a relationship from the user bob to the movie The Amazing Spiderman. Relationships are directional, so if you also want to say that the movie was watched by Bob, swap the from and to parameters and use a relation name that reads in the reverse direction:

db.newGraphBuilder()
    .create()
    .from('movies', 'The Amazing Spiderman')
    .related('watched_by')
    .to('users', 'bob');
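To make the directionality concrete, here is a toy, in-memory model of directed relations. It is not the orchestrate.js API and not how Orchestrate stores data; the names `nodeId`, `relate`, `related`, and `relateBothWays` are hypothetical. It only shows why a “watched” relation from bob does not, by itself, give you a “watched_by” relation from the movie.

```javascript
// Toy in-memory model of directed relations; NOT the orchestrate.js API.
// Nodes are addressed as "collection/key", mirroring .from(collection, key).
function nodeId(collection, key) {
  return collection + '/' + key;
}

// graph: Map<sourceId, Map<kind, Set<targetId>>>
function relate(graph, from, kind, to) {
  const src = nodeId(from[0], from[1]);
  if (!graph.has(src)) graph.set(src, new Map());
  const kinds = graph.get(src);
  if (!kinds.has(kind)) kinds.set(kind, new Set());
  kinds.get(kind).add(nodeId(to[0], to[1]));
}

// Follow one kind of edge outward from a node; returns target ids.
function related(graph, from, kind) {
  const kinds = graph.get(nodeId(from[0], from[1]));
  const targets = kinds && kinds.get(kind);
  return targets ? Array.from(targets) : [];
}

// Hypothetical helper: write a relation and its inverse together.
function relateBothWays(graph, a, kind, inverseKind, b) {
  relate(graph, a, kind, b);
  relate(graph, b, inverseKind, a);
}

const graph = new Map();
const bob = ['users', 'bob'];
const movie = ['movies', 'The Amazing Spiderman'];

relate(graph, bob, 'watched', movie);
console.log(related(graph, bob, 'watched'));      // [ 'movies/The Amazing Spiderman' ]
console.log(related(graph, movie, 'watched_by')); // [] (no reverse edge yet)

relateBothWays(graph, bob, 'watched', 'watched_by', movie);
console.log(related(graph, movie, 'watched_by')); // [ 'users/bob' ]
```

With orchestrate.js, the equivalent of `relateBothWays` would simply be the two builder calls above, one per direction.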

Now the two items are linked to each other. That’s only useful if you can read the links back, so how do we do that? Like this:

db.newGraphReader()
    .get()
    .from('users', 'bob')
    .related('watched')
    .then(function (results) {
      // the related items are in the response body
      console.log(results.body);
    });

That is how simple graphing is with Orchestrate. Now let’s look at graphing relationships in the Enron emails, the large public collection of email from the energy company Enron. With all of that information, how could we organize it? The answer is simple: we graph who sent each email and who received it. While importing the data, I linked each sender’s address to each recipient’s address. Let’s walk through how.

First off, what’s in our data? Each object looks something like this:

{
   "Content-Transfer-Encoding":"7bit",
   "From":"[email protected]",
   "X-Folder":"\\Beck, Sally\\Beck, Sally\\Inbox",
   "Cc":[
      "[email protected]"
   ],
   "X-bcc":"",
   "X-Origin":"BECK-S",
   "Bcc":[
      "[email protected]"
   ],
   "X-cc":"Abel, Chris\r\n </O=ENRON/OU=NA/CN=RECIPIENTS/CN=NOTESADDR/CN=47C1B49-A28949D7-8625676F-7566DE>",
   "To":[
      "[email protected]",
      "[email protected]",
      "[email protected]",
      "[email protected]",
      "[email protected]",
      "[email protected]",
      "[email protected]",
      "[email protected]",
      "[email protected]",
      "[email protected]"
   ],
   "parts":[
      {
         "content":"Attached is the ERMS Discounting Analysis as of May 25, 2001.\r\n\r\nPlease call Chris Abel at X33102 or Anita Luong at X-36753 if you have questions.\r\n\r\nThanks\r\n\r\n \r\n\r\n",
         "contentType":"text/plain"
      }
   ],
   "X-FileName":"Beck, Sally.pst",
   "Mime-Version":"1.0",
   "X-From":"Luong, Anita\r\n </O=ENRON/OU=NA/CN=RECIPIENTS/CN=NOTESADDR/CN=93DC14F0-DCA47D04-862564CD-6EEA4D>",
   "Date":{
      "$date":991409909000
   },
   "X-To":"Beck, Sally </O=ENRON/OU=NA/CN=RECIPIENTS/CN=SBECK>, Earnest,\r\n Scott </O=ENRON/OU=NA/CN=RECIPIENTS/CN=SEARNES>, Fondren,\r\n Mark </O=ENRON/OU=NA/CN=RECIPIENTS/CN=MFONDRE>, Hickerson,\r\n Gary </O=ENRON/OU=NA/CN=RECIPIENTS/CN=GHICKER>, Hodges,\r\n Georganne </O=ENRON/OU=NA/CN=RECIPIENTS/CN=GHODGES>, Glover, Sheila\r\n </O=ENRON/OU=NA/CN=RECIPIENTS/CN=NOTESADDR/CN=B82501A-33153D8A-86256499-6C6463>,\r\n Loibl, Kori </O=ENRON/OU=NA/CN=RECIPIENTS/CN=KLOIBL>, Ruffer, Mary Lynne\r\n </O=ENRON/OU=NA/CN=RECIPIENTS/CN=NOTESADDR/CN=4CA96A86-FF0622D1-862569D1-7961C3>,\r\n Carrington, Clara </O=ENRON/OU=NA/CN=RECIPIENTS/CN=CCARRI1>, Prejean,\r\n Frank </O=ENRON/OU=NA/CN=RECIPIENTS/CN=FPREJEA>",
   "Message-ID":"<[email protected]>",
   "Content-Type":"text/plain; charset=us-ascii",
   "Subject":"ERMS Discount Memo as of May 25, 2001"
}
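The import only needs two things from each record: the `From` address and the set of unique addresses across `To`, `Cc`, and `Bcc` (in the sample above, the same address can appear in both `Cc` and `Bcc`). A minimal sketch of that extraction, with a hypothetical function name and placeholder `example.com` addresses, assuming a record may omit some recipient fields:

```javascript
// Sketch: pull the sender and a de-duplicated recipient list out of one
// parsed email record shaped like the example above.
function senderAndRecipients(email) {
  const recipients = new Set();
  for (const field of ['To', 'Cc', 'Bcc']) {
    // Assumption: a record may omit a field, so fall back to an empty array.
    for (const address of email[field] || []) {
      if (address) recipients.add(address);
    }
  }
  return { from: email.From, to: Array.from(recipients) };
}

// Placeholder addresses, not real Enron data.
const sample = {
  From: 'anita@example.com',
  To: ['sally@example.com', 'chris@example.com'],
  Cc: ['chris@example.com'],
  Bcc: ['chris@example.com']
};
const parsed = senderAndRecipients(sample);
console.log(parsed.to); // [ 'sally@example.com', 'chris@example.com' ]
```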

Since the “To”, “Cc”, and “Bcc” fields are all arrays, we can loop over each one and create a relationship from the sender to every unique recipient. We’ll use the kew library to manage the promises that orchestrate.js returns:

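var kew = require('kew');
var fs = require('fs');
// token is your Orchestrate API key
var db = require('orchestrate')(token);

var promises = [];
// one JSON-encoded email per line
var data = fs.readFileSync('./enron2001-06.json', 'utf8');

data.split('\n').forEach(function (line) {
  if (line.trim() === '') return; // skip blank lines

  var email = JSON.parse(line);
  var from = email.From; // the field is capitalized in the data
  var to = {};           // used as a set to de-duplicate recipients

  ['Bcc', 'To', 'Cc'].forEach(function (field) {
    // some messages have no Cc or Bcc, so default to an empty array
    (email[field] || []).forEach(function (name) {
      to[name] = true;
    });
  });

  // create one sent_email_to relation per unique recipient
  Object.keys(to).forEach(function (name) {
    var promise = db.newGraphBuilder()
      .create()
      .from('enron-email-address', from)
      .related('sent_email_to')
      .to('enron-email-address', name);
    promises.push(promise);
  });
});

kew.all(promises)
  .then(function (results) {
    // everything worked!
    console.log('Created ' + results.length + ' relations');
  })
  .fail(function (err) {
    // something went wrong!
    console.error(err);
  });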
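The loop above starts every request at once, which is part of why the import takes so long and can put a lot of load on the service. One alternative, sketched here with native Promises only (`runInBatches` and `fakeWrite` are hypothetical names, and the fake tasks stand in for graph writes), is to start a fixed number of requests at a time:

```javascript
// Sketch: run promise-returning tasks a fixed number at a time instead of
// starting every request at once. Uses native Promises only.
async function runInBatches(tasks, batchSize) {
  const results = [];
  for (let i = 0; i < tasks.length; i += batchSize) {
    // Start this batch's tasks; each task is a function returning a promise.
    const batch = tasks.slice(i, i + batchSize).map((task) => task());
    // Wait for the whole batch before starting the next one.
    results.push(...(await Promise.all(batch)));
  }
  return results;
}

// Demo with fake tasks that record how many are running concurrently.
let inFlight = 0;
let maxInFlight = 0;
const fakeWrite = (n) => () => {
  inFlight++;
  maxInFlight = Math.max(maxInFlight, inFlight);
  return new Promise((resolve) => setTimeout(() => { inFlight--; resolve(n); }, 5));
};

const tasks = Array.from({ length: 10 }, (_, n) => fakeWrite(n));
runInBatches(tasks, 3).then((results) => {
  console.log(results.length, maxInFlight); // 10 3
});
```

To use it with the import, you would push functions such as `function () { return db.newGraphBuilder().create()...to('enron-email-address', name); }` instead of the promises themselves, so that nothing starts until its batch runs. `Promise.all` accepts the kew promises orchestrate.js returns because they are thenables.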

After you have run this, and waited for (and survived) the zombie apocalypse, the relationships are stored and graphed in Orchestrate. Now we can create some Express routes to make this data usable. Let’s make a route at “/emails” that returns the response from Orchestrate.

Here is what that looks like. Note that I am only responding with 50 emails for demonstration purposes, and that I named my collection ‘enron-email-address’:

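var express = require('express');
var router = express.Router();

router.get('/emails', function (req, res) {
  db.newSearchBuilder()
    .collection('enron-email-address')
    .limit(50)
    .offset(0)
    .query('*')
    .then(function (result) {
      res.json(result.body);
    })
    .fail(function (err) {
      res.status(500).json({ error: 'search failed' });
    });
});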
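The route above always returns the first 50 results. If you wanted to page through the collection, one option is to derive the `.limit()` and `.offset()` values from the request. A minimal sketch, assuming a hypothetical `page` query parameter that is not part of the original app:

```javascript
// Sketch: turn an optional ?page= query value into the limit/offset pair
// that the search passes to .limit() and .offset().
const PAGE_SIZE = 50; // matches the demo's cap of 50 results

function pageParams(query) {
  const page = Number.parseInt(query.page, 10);
  // Fall back to the first page for missing, non-numeric, or non-positive values.
  const safePage = Number.isInteger(page) && page > 0 ? page : 1;
  return { limit: PAGE_SIZE, offset: (safePage - 1) * PAGE_SIZE };
}

console.log(pageParams({}));            // { limit: 50, offset: 0 }
console.log(pageParams({ page: '3' })); // { limit: 50, offset: 100 }
console.log(pageParams({ page: 'x' })); // { limit: 50, offset: 0 }
```

Inside the route you would call `var p = pageParams(req.query);` and then chain `.limit(p.limit).offset(p.offset)`.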

Before we hook the front end up to the route, let’s set up the route that fetches the addresses a given email address has sent mail to. I called it “/sent_to”. I decided to pass the email address in a query string, but you could make it part of the route instead: “/sent_to/:email”.
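If you want the route to accept both styles, you can read Express’s `req.params` (filled by a `/sent_to/:email` route) and fall back to `req.query` (filled by `?email=...`). A small sketch with a hypothetical helper name; plain objects stand in for Express request objects so it runs on its own:

```javascript
// Sketch: accept the address either as a route parameter (/sent_to/:email)
// or as a query string (/sent_to?email=...).
function emailFromRequest(req) {
  const raw = (req.params && req.params.email) || (req.query && req.query.email) || '';
  // Trim stray whitespace; an empty string means no address was supplied.
  return String(raw).trim();
}

console.log(emailFromRequest({ params: { email: 'a@example.com' }, query: {} })); // a@example.com
console.log(emailFromRequest({ params: {}, query: { email: ' b@example.com ' } })); // b@example.com
console.log(emailFromRequest({ params: {}, query: {} }) === ''); // true
```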

Next, we use orchestrate.js to fetch the graph results from the ‘enron-email-address’ collection for the provided email address. I used a visualization library on the front end, so I added a field, emailFrom, to the response body to make sure it had the information it needed.

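router.get('/sent_to', function (req, res) {
  // req.param() is deprecated in Express 4; read the query string directly
  var email = req.query.email;
  if (!email) {
    return res.status(400).json({ error: 'email is required' });
  }

  db.newGraphReader()
    .get()
    .from('enron-email-address', email)
    .related('sent_email_to')
    .then(function (result) {
      // tell the front end which address these results belong to
      result.body.emailFrom = email;
      res.json(result.body);
    })
    .fail(function (err) {
      // on error (e.g. an unknown address), return an empty result set
      res.json({ results: [] });
    });
});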
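The post doesn’t name its visualization library, but many force-directed layouts (d3-force, for example) work from a list of nodes and a list of source/target links. Here is a sketch of reshaping the “/sent_to” response into that form. The function name is hypothetical, and it assumes that each entry in `body.results` carries the related item’s key at `result.path.key`; check that against the responses you actually receive.

```javascript
// Sketch: reshape the /sent_to JSON into a nodes/links structure.
// ASSUMPTION: each entry in body.results has the related key at result.path.key.
function toNodesAndLinks(body) {
  const source = body.emailFrom;
  const nodes = [{ id: source }];
  const links = [];
  const seen = new Set([source]);
  for (const result of body.results || []) {
    const target = result.path && result.path.key;
    if (!target) continue; // skip entries without a usable key
    if (!seen.has(target)) {
      seen.add(target);
      nodes.push({ id: target });
    }
    links.push({ source, target });
  }
  return { nodes, links };
}

// Placeholder addresses, not real Enron data.
const out = toNodesAndLinks({
  emailFrom: 'anita@example.com',
  results: [{ path: { key: 'sally@example.com' } }, { path: { key: 'chris@example.com' } }]
});
console.log(out.nodes.length, out.links.length); // 3 2
```

The error branch of the route returns `{ results: [] }` without `emailFrom`, so a front end using this would get a single node with an undefined id and no links in that case.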

With these routes returning the data, the application’s front end handles the rest. You can check out this data, visualized for your viewing pleasure, here. Enjoy!