While discussing some operational topics with a friend recently, one subject sparked particular interest: how do you perform seamless restarts of Unicorn under Upstart? He thought the answer was novel and recommended that I put it up somewhere on the internet for others to see.

For some background, Orchestrate uses Ubuntu on our servers, so we naturally fell into using Upstart for process management. Upstart is, until the next release at least, the core process manager and init replacement for Ubuntu. One of our applications is a Ruby on Rails app that runs under Unicorn. Unicorn is an HTTP server for Rack applications, including Rails, that manages a pool of backend worker processes.

Upstart and Unicorn use different models for process management. Upstart starts a process and then watches that pid to see when the service has terminated. If the process dies, Upstart automatically restarts it to ensure that it keeps running. Unicorn can do seamless restarts by starting a new master, replacing the workers one by one, and eventually terminating the old master. When Unicorn kills the old master, Upstart thinks the whole application is dead, so it attempts to start it again. Since the new master already holds the listening port, the freshly started copy cannot bind to it and crashes, which prompts Upstart to try again, and so on.
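To make the crash loop concrete, here is a minimal sketch of the underlying failure. It is not Unicorn's code, just plain Ruby sockets, and the helper name second_bind_error is made up for illustration. It shows what happens when a second listener tries to bind a port that another listener already holds:

```ruby
require "socket"

# Hypothetical helper: returns the error class raised when a second listener
# tries to bind a port that is already in use, or nil if the bind succeeds.
def second_bind_error
  first = TCPServer.new("127.0.0.1", 0) # port 0: let the OS pick a free port
  port = first.addr[1]
  begin
    second = TCPServer.new("127.0.0.1", port)
    second.close
    nil
  rescue SystemCallError => e
    e.class
  ensure
    first.close
  end
end
```

On Linux the second bind should fail with Errno::EADDRINUSE, which is the same kind of error the respawned process runs into while the new master is still listening.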

The easiest solution to this is to use a shell process as the Upstart job, which in turn starts the dashboard job (our Unicorn application). This is not complete though: after the first seamless reload the original master exits, which returns control to the shell process. Typically, the solution at this point is to loop with a sleep, watching the pid file and checking that the expected process still exists. This is problematic as it adds latency to Upstart's ability to detect a failure and correct it. It also spends CPU cycles repeatedly executing a process and checking the result.
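For comparison, the polling approach looks roughly like the sketch below. It is written in Ruby rather than shell to match the rest of the examples here, and the helper names process_alive? and wait_for_exit, as well as the interval parameter, are assumptions made for illustration:

```ruby
# Hypothetical helper: true if a process with this pid currently exists.
def process_alive?(pid)
  Process.kill(0, pid) # signal 0 only checks for existence, nothing is delivered
  true
rescue Errno::ESRCH
  false # no such process
rescue Errno::EPERM
  true  # the process exists but belongs to another user
end

# Hypothetical helper: block until the pid recorded in pid_file is gone.
# Returns the number of checks that were needed.
def wait_for_exit(pid_file, interval: 1.0)
  checks = 0
  loop do
    checks += 1
    pid = File.exist?(pid_file) ? File.read(pid_file).to_i : 0
    # A missing or empty pid file is treated as "nothing is running".
    return checks if pid <= 0 || !process_alive?(pid)
    sleep interval
  end
end
```

The trade-off is visible in the interval: a failure can go unnoticed for up to one full interval, and shortening the interval means waking up and checking more often.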

A better solution is to use file locking to be notified when all the Unicorn processes have terminated. This is cheap, requires no polling, and adds no latency via a sleep. In our unicorn.conf.rb file we added the following to the before_fork block:

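before_fork do |server, worker|
  # Open the lock file and take a shared lock on it. Workers inherit the open
  # file across fork, so the lock stays held for as long as the master or any
  # worker is running, which lets Upstart detect when they have all exited.
  # The handle is kept in a global so it is not garbage collected (and the
  # lock silently dropped) once this block returns.
  $unicorn_lock ||= File.open("#{server.config[:pid]}.lock", 'w')
  exit unless $unicorn_lock.flock(File::LOCK_SH)
end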

In our Upstart script we then attempt to obtain an exclusive lock on the lock file. The running Unicorn processes hold a shared lock on the file, so that call blocks until all of the Unicorns have exited. This is a very cheap solution that eliminates polling in the Upstart script altogether.

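script
    # Runs in the foreground until the original master exits, for example
    # when a seamless restart hands over to a new master.
    bundle exec unicorn -c unicorn.conf.rb -E production
    # Block until every Unicorn process has released its shared lock.
    flock -x 0 < /path/to/pidfile.lock
end script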
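The locking behaviour this relies on can be checked in isolation. The sketch below is a standalone Ruby demo, not part of our configuration. It forks a child that stands in for a Unicorn process holding a shared lock, then checks whether an exclusive lock can be taken while the child runs and again after it exits. The helper names lock_free? and lock_lifecycle and the temporary file name are made up for illustration:

```ruby
require "tmpdir"

# Hypothetical helper: true if an exclusive lock on path can be taken right now.
def lock_free?(path)
  File.open(path, "r") do |f|
    # With LOCK_NB, flock returns false instead of blocking when the lock is held.
    f.flock(File::LOCK_EX | File::LOCK_NB) != false
  end # closing the file releases any lock we just took
end

# Forks a stand-in for a Unicorn process that holds a shared lock, and reports
# whether the lock file was free [while it ran, after it exited].
def lock_lifecycle
  path = File.join(Dir.mktmpdir, "demo.pid.lock")
  ready_r, ready_w = IO.pipe
  child = fork do
    ready_r.close
    held = File.open(path, "w")
    held.flock(File::LOCK_SH)
    ready_w.puts "locked" # tell the parent the lock is now held
    sleep                 # keep holding the lock until killed
  end
  ready_w.close
  ready_r.gets # wait until the child holds the lock
  ready_r.close
  while_running = lock_free?(path)
  Process.kill("KILL", child)
  Process.wait(child)
  [while_running, lock_free?(path)]
end
```

Calling lock_lifecycle should return [false, true]: the exclusive lock is unavailable while the child lives and becomes available as soon as it is gone, with no pid bookkeeping involved. The Upstart script does the same thing with a blocking flock, so it simply sleeps in the kernel until the last holder exits.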