In our last post from this series, we covered why we chose ASP.NET 5 for Tinman, the core application managing CenturyLink's Bare Metal servers. In this post, we'll share how we failed to realize a wild dream: running a .NET web application inside Docker.
The Beginning: Minds. Blown.
Like any good story of failure, this one starts with excitement and optimism. We started the Bare Metal effort, created our new ASP.NET 5 project, and started having crazy thoughts.
Up until this point, .NET development was tightly coupled to Windows. If you were writing .NET code, you were running it on Windows. But things were different now. With the new DNX runtime, Microsoft demonstrated a serious commitment to cross-platform .NET. Windows was no longer the only option for .NET applications.
This was music to our ears. We prefer to run services on Linux. We also like C#. Linux and C# were no longer mutually exclusive; we could have both! We started asking questions that previously made no sense. What if we ran Tinman on Linux? And even crazier, what if we ran it inside a Docker container? Great ops and C#? Does not compute!
So we tried it. At the time we were on beta3, so Mono was the only option for Linux. We looked around online and found Mark Rendle's precursor to the Dockerfiles listed in Microsoft's aspnet Docker repo. We built our own container on Mark's file, and within a few days we had the Tinman API running in Docker and servicing requests.
We were pretty honking excited. This was a brave new world!
The Middle: Something's Afoot
The first few weeks with Docker seemed great. We enjoyed how easy Docker made deployments. We liked writing C#. We implemented zero-downtime deployments using NGINX + Docker. All appeared to be well. Except for one thing.
Sometimes, albeit infrequently, Tinman would die. When a problem is infrequent and difficult to troubleshoot, the temptation is to chalk it up to random events and move on. We unfortunately succumbed to that temptation for a little while, but eventually recognized how serious the problem was. After figuring out how to get at the logs inside our Docker containers (a somewhat frustrating side effect of working in Docker), we discovered that Mono itself was crashing. Worse, there was no managed stack trace for the problem. It looked something like this:
Stacktrace:

Native stacktrace:
    mono() [0x4b23dc]
    mono() [0x508a0e]
    mono() [0x428fad]
    /lib/x86_64-linux-gnu/libpthread.so.0(+0x10340) [0x7f2487658340]
    [0x7f2484155b30]

Debug info from gdb:

=================================================================
Got a SIGSEGV while executing native code. This usually indicates
a fatal error in the mono runtime or one of the native libraries
used by your application.
=================================================================

Aborted (core dumped)
This looked serious. A managed stack trace would have pointed at Tinman, or at worst at something in the .NET framework. This was lower-level: Mono's own guts were crashing.
The End: Adios, Mono
Our next step was to reproduce the problem. Our working theory was that HAProxy's regular health checks against the API were overloading Mono. The theory seemed absurd because the health checks generate negligible load, but we didn't have any better ideas. So we pointed Apache Benchmark at Tinman using a tiny load of 5 parallel requests a second. The results were disheartening.
Average response times for a lightweight health check were between 500ms and 1000ms. After running the tool for only a short time, Tinman would die. We could now consistently kill Tinman by running Apache Benchmark with a tiny amount of load. The inevitable question came up: should we move back to Windows?
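We used Apache Benchmark for this test. For anyone who wants to reproduce the same kind of probe without it, here is a minimal Python sketch, assuming a plain HTTP health endpoint: it sends a fixed number of GET requests with a bounded number in flight (roughly what ab's concurrency setting does), then reports the average latency and how many requests failed. The function names and any URL you pass in are hypothetical; this is not the tooling we actually ran against Tinman.

```python
import http.client
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def timed_get(url: str, timeout: float = 5.0) -> float:
    """Issue one GET request and return its latency in milliseconds."""
    start = time.perf_counter()
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        resp.read()  # drain the body so the timing covers the full response
    return (time.perf_counter() - start) * 1000.0


def run_load(url: str, total: int = 100, concurrency: int = 5) -> dict:
    """Send `total` GETs to `url`, with at most `concurrency` in flight at once.

    Failures (refused connections, timeouts, HTTP errors, dropped sockets)
    are counted rather than raised: a dying server is exactly what we
    want this probe to reveal.
    """
    latencies = []
    failures = 0
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = [pool.submit(timed_get, url) for _ in range(total)]
        for fut in futures:
            try:
                latencies.append(fut.result())
            except (OSError, http.client.HTTPException):
                failures += 1
    avg = sum(latencies) / len(latencies) if latencies else float("nan")
    return {"ok": len(latencies), "failed": failures, "avg_ms": avg}
```

A call such as run_load("http://localhost:5000/health", total=200, concurrency=5) (a hypothetical address) returns a small summary dict. A climbing "failed" count is the signal we were chasing; a thread pool keeps the sketch dependency-free, though a dedicated tool like ab gives far richer statistics.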
To answer that question, we deployed Tinman to a Windows machine and ran the same benchmarks. It wasn't even a contest. We quadrupled the load and saw requests returned in half the time. We ran it for a long time with no hint of crashing. The decision was obvious: Tinman should run on Windows.
We had gotten spoiled by our zero-downtime deployments on Linux, so we used Ansible to implement zero-downtime deployments on Windows. Our next post will cover how we accomplished this.
Epilogue: Will We Try Again?
A question we still ask is whether we'll try Linux again. We eventually want to run Tinman on Linux, but we have a few conditions. We won't run on Mono given its track record. We will wait for Microsoft's production-ready, cross-platform CoreCLR (Release Candidate set for November 2015). Since we will only run on CoreCLR, we also need our third-party libraries to run on CoreCLR. That will take time, so we will be running on Windows for at least another six months.
Dreams are Delayed, but Never Dashed!