I love CouchDB. It’s a NoSQL database that, like Orchestrate, lets you work with your data over HTTP. Many data sets, like the entirety of NPM’s metadata and every law in Massachusetts, are stored in CouchDB. With the release of orchestrate-couchdb, you can now replicate CouchDB databases into Orchestrate collections. Why would you do that? Why, maybe you want to take advantage of Orchestrate’s Full-Text Search and Graph queries, which automatically index data for querying. No need to manually write indexes. Let’s see it in action.
Adding comprehensive full-text search to bodies of law lets legal firms and advocacy groups alike uncover information critical to campaigns and clients. Language nerds like me can map the evolution of different legal terms over time, witnessing the birth of new concepts as they’re incorporated into law. Today, we’ll just be importing the data itself. (Tomorrow, the world!)
To start, use NPM to install orchestrate-couchdb:
sudo npm install -g orchestrate-couchdb
Then, we’ll add our settings:
export ORCHESTRATE_API_KEY=YOUR_ORCHESTRATE_API_KEY
export COUCHDB_URL="http://macode.org"
export COUCHDB_DATABASE=api
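For the curious, those three variables are all a worker needs to find the feed it will follow. Here is a minimal sketch of how a Node process might read them and derive the URL of CouchDB’s changes feed. The helper name readSettings and the shape of its return value are hypothetical, and this is not taken from the orchestrate-couchdb source; the only thing it leans on is CouchDB’s standard layout of server, then database, then _changes.

```javascript
// Hypothetical helper: read the three settings from an environment object
// and derive the URL of CouchDB's changes feed for that database.
function readSettings(env) {
  const required = ['ORCHESTRATE_API_KEY', 'COUCHDB_URL', 'COUCHDB_DATABASE'];
  const missing = required.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error('Missing settings: ' + missing.join(', '));
  }
  // Trim trailing slashes so the joined URL has exactly one separator.
  const base = env.COUCHDB_URL.replace(/\/+$/, '');
  return {
    apiKey: env.ORCHESTRATE_API_KEY,
    changesUrl: base + '/' + encodeURIComponent(env.COUCHDB_DATABASE) + '/_changes',
  };
}

// Same values as the exports above; a real process would pass process.env.
const settings = readSettings({
  ORCHESTRATE_API_KEY: 'YOUR_ORCHESTRATE_API_KEY',
  COUCHDB_URL: 'http://macode.org',
  COUCHDB_DATABASE: 'api',
});
console.log(settings.changesUrl); // http://macode.org/api/_changes
```

Failing fast on a missing variable is a design choice of this sketch: it surfaces a forgotten export before any network traffic happens.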
Then, start the daemon!
…That’s it! You’re now importing all of Massachusetts’s laws into Orchestrate. :D
Since it’ll take a while to import all of Massachusetts’s laws, let’s talk about how the importer works.
How It Works
CouchDB exposes a feed of every change that’s occurred to each database. Using follow, we just replay those changes onto Orchestrate, so we get both the current state of every document and its version history as refs. Since follow works hard to never die, our daemon can continue to follow changes forever. It doesn’t stop when it runs out of changes to process; it just waits for more. This lets you stay up to date with a CouchDB dataset that’s still accepting reads and writes.
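To make the replay idea concrete, here is a minimal sketch of how one row of the changes feed could be turned into a write against Orchestrate. It is not the orchestrate-couchdb source. The function name changeToRequest, the returned request shape, the /v0/collection/key path layout, and the decision to strip _id and _rev are all assumptions made for illustration. The row fields it reads (id, deleted, doc) are what CouchDB emits when the feed is requested with include_docs=true.

```javascript
// Hypothetical helper: translate one CouchDB change row into a description
// of the HTTP request a worker might send to Orchestrate.
// Assumes the feed was requested with include_docs=true so `change.doc` is set.
function changeToRequest(collection, change) {
  // Assumed path layout; check the Orchestrate API docs before relying on it.
  const path = '/v0/' + encodeURIComponent(collection) +
               '/' + encodeURIComponent(change.id);

  // CouchDB marks removed documents with `deleted: true`.
  if (change.deleted) {
    return { method: 'DELETE', path: path };
  }

  // Drop CouchDB's bookkeeping fields (_id, _rev) before storing the body.
  // This is a choice made by the sketch, not necessarily by the real worker.
  const body = {};
  for (const key of Object.keys(change.doc)) {
    if (key !== '_id' && key !== '_rev') body[key] = change.doc[key];
  }
  return { method: 'PUT', path: path, body: body };
}

// Example rows shaped like CouchDB's _changes output (values are made up).
const updated = { seq: 1, id: 'law-1', changes: [{ rev: '1-abc' }],
                  doc: { _id: 'law-1', _rev: '1-abc', title: 'Example' } };
const removed = { seq: 2, id: 'law-1', changes: [{ rev: '2-def' }], deleted: true };

console.log(changeToRequest('laws', updated));
console.log(changeToRequest('laws', removed));
```

Keeping the translation a pure function, separate from the code that listens to the feed and sends requests, makes it easy to test without a CouchDB server or an Orchestrate key.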
The worker I demonstrated above runs on your local machine, but say you want to offload it to another machine so you don’t have to babysit it. So, let’s deploy our worker to Heroku. For this, we’ll need to download the source code:
git clone git@github.com:orchestrate-io/orchestrate-couchdb.git
cd orchestrate-couchdb
heroku create
Then, let’s set our environment variables on Heroku:
heroku config:set ORCHESTRATE_API_KEY=YOUR_ORCHESTRATE_API_KEY
heroku config:set COUCHDB_URL="http://macode.org"
heroku config:set COUCHDB_DATABASE=api
Now, let’s push the project to Heroku, and spin up a Dyno for it:
git push heroku master
heroku ps:scale worker=1 web=0
Bam! Your worker is now importing Massachusetts law from Heroku. Any time our source CouchDB adds more documents, our worker will sync them to Orchestrate. Easy, huh?
This dataset was, in many ways, small fry. The documents we’re importing from CouchDB have a consistent schema, and there are only 25k of them. Next time, we’ll import the entirety of NPM’s metadata and look at how to handle data that doesn’t have a consistent schema.
Lastly, much love to Calvin Metcalf for making Massachusetts law available in a CouchDB database. CouchDB is an amazing piece of tech, and its developer community is even lovelier.