We at CenturyLink Cloud never like to see fellow cloud providers experience downtime, as it hurts the reputation of our industry. Following outages at a couple of major cloud providers last week, many pundits came out of the woodwork to scold the customers of these cloud services that experienced corresponding downtime. Why? It’s become “common knowledge” that if a user of cloud services experiences downtime, then they haven’t properly architected their apps for the cloud. I wonder why we assume that every business has the engineering prowess of cloud pioneers like Netflix. Cloud users are rightly encouraged to build and deploy distributed applications that can withstand the failure of any component(s), but the reality is that this doesn’t always happen, for one or more of these reasons:
They Don't Know Any Better
While many of us have spent years in the cloud, it’s easy to forget that this is an entirely new domain for the vast majority of enterprise customers. To be sure, principles of good architecture and highly available systems have been around for decades, but we recognize that cloud computing introduces its own wrinkles to those existing patterns. It’s up to all of those in the industry to help educate others on the right architecture, tools, and infrastructure that are needed to build truly cloud-scale applications.
Guess what? Very few organizations have the in-house architects, developers and operations pros to plan and build multi-tier, globally distributed cloud applications. Doing this requires advanced knowledge of modern web technologies, database repositories, storage systems, and networking configuration. So in some cases, this awesome rush to the cloud has left organizations without the technologists they need to build highly available, scalable cloud apps.
Also, enterprise IT shops have data centers full of modern and legacy commercial-off-the-shelf (COTS) software that is not built for the cloud. While nearly any credible COTS product has a reference architecture for a highly available deployment, we see plenty of such products that (a) still have single points of failure, (b) only operate efficiently when housed physically together in the same data center, and (c) have complex disaster recovery procedures that don’t easily support an instant failover. These cloud customers may simply not be able to refactor their existing systems to survive an outage in the data center that hosts them.
It’s often said that cloud customers can get whatever availability they want to pay for. That is almost certainly true, but it brings to the surface a point that often seems lost on those who criticize businesses that go offline in an outage: a comprehensive DR plan isn’t cheap.
Some businesses go offline during a cloud outage because they’ve made the conscious choice to run that risk. Running a hot backup that is a complete mirror of production requires constant synchronization at (often) double the overall cost. Many organizations choose to incur this cost because uptime is their top priority, while other businesses accept downtime as an occasional fact of life. Just because someone actively chooses to save money and tolerate downtime doesn’t mean that they don’t “get the cloud.”
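To make that trade-off concrete, here is a rough back-of-the-envelope sketch in Python. Every name and number in it is a hypothetical assumption for illustration (the function `cheaper_option`, the monthly cost, the outage hours, the hourly cost of downtime); none of it comes from real pricing or outage data. It simply compares the extra yearly spend on a hot standby with the expected yearly cost of being offline.

```python
def cheaper_option(base_monthly_cost, standby_multiplier,
                   expected_outage_hours_per_year, downtime_cost_per_hour):
    """Return which choice is cheaper over a year (illustrative model only).

    standby_multiplier is the total cost relative to running production
    alone, e.g. 2.0 for a full hot mirror at double the overall cost.
    """
    # Extra spend per year caused by running the standby environment.
    standby_extra_per_year = base_monthly_cost * (standby_multiplier - 1) * 12
    # Expected loss per year if the business simply rides out outages.
    expected_downtime_cost = expected_outage_hours_per_year * downtime_cost_per_hour

    if standby_extra_per_year < expected_downtime_cost:
        return "hot standby"
    return "accept downtime"


# Hypothetical inputs: $10,000/month production, full mirror, 4 outage hours/year.
low_impact = cheaper_option(10_000, 2.0, 4, 5_000)    # downtime costs $5,000/hour
high_impact = cheaper_option(10_000, 2.0, 4, 50_000)  # downtime costs $50,000/hour
```

With these made-up inputs, the business that loses $5,000 per hour is better off tolerating the outage, while the one that loses $50,000 per hour is better off paying for the mirror. A real decision would also weigh reputation, contractual penalties, and data loss, which this simple model ignores.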
While CenturyLink Cloud has strong SLAs based on the reliability of our platform, we can’t protect users against their own design decisions. But we can abstract a lot of the complexity away from the customer, so that many architecture best practices “come for free” with our platform. We have made a strategic choice to engineer a platform that makes life a little easier for enterprises that don’t have the resources or types of applications that are a perfect fit for cloud computing.
How do we help organizations that may not have the personnel skills or types of applications that are cloud-ready?
- We run on enterprise-class hardware. While most cloud vendors freely advertise that they run commodity hardware that may fail unexpectedly, CenturyLink Cloud has invested in powerful hardware at each layer of our stack. While no infrastructure is infallible and failures WILL happen, our infrastructure investments have proven to give our customers a more reliable experience. This is especially true for their applications that cannot scale on dozens of cheap commodity servers.
- Services like load balancing are built-in at no additional cost. We strive to make it as easy as possible for enterprises to avoid single points of failure, and redundancy is pervasive in the CenturyLink Cloud architecture. We surface some of these capabilities up to our customers, including free access to our load balancing software. This makes it simpler to design and deploy highly available web software.
- Customers get built-in backup and recovery services for virtual machines at no additional cost. Every CenturyLink Cloud customer gets VM-level snapshots taken automatically on a daily basis and stored for up to 14 days. The snapshots for any given data center are stored in an alternate data center to ensure that customers can quickly stand up a new environment that mirrors the previous one (if one didn’t exist already). While customers can lose up to a day’s worth of data by relying solely on our automated snapshots, CenturyLink Cloud still provides a level of protection against unexpected failures.
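The "up to a day's worth of data" point in the last item can be illustrated with a small Python sketch. It is not CenturyLink Cloud's API or implementation; the function names and the sample timestamps are hypothetical, and only the daily interval and 14-day retention come from the description above. Given a list of snapshot times and a failure time, it finds the newest snapshot still within retention and reports how much data would be lost by restoring from it.

```python
from datetime import datetime, timedelta

# Values taken from the description above: daily snapshots kept for up to 14 days.
SNAPSHOT_INTERVAL = timedelta(days=1)
RETENTION = timedelta(days=14)


def latest_usable_snapshot(snapshot_times, failure_time):
    """Newest snapshot taken at or before the failure and still within retention."""
    candidates = [
        t for t in snapshot_times
        if t <= failure_time and failure_time - t <= RETENTION
    ]
    return max(candidates) if candidates else None


def data_loss_window(snapshot_times, failure_time):
    """How much recent data is lost when restoring from the latest usable snapshot."""
    snapshot = latest_usable_snapshot(snapshot_times, failure_time)
    return None if snapshot is None else failure_time - snapshot


# Hypothetical example: snapshots at midnight on Jan 1 through Jan 10, 2024.
snapshots = [datetime(2024, 1, day) for day in range(1, 11)]
failure = datetime(2024, 1, 10, 18, 30)

restore_point = latest_usable_snapshot(snapshots, failure)
lost = data_loss_window(snapshots, failure)
```

In this made-up case the restore point is the Jan 10 midnight snapshot, so 18.5 hours of changes are lost. As long as the daily schedule holds, the loss stays under `SNAPSHOT_INTERVAL`, which is why applications that cannot tolerate that gap need their own replication on top of automated snapshots.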
Organizations building cloud apps should demand that developers carefully design fault-tolerant software that can take advantage of the scale and distributed nature of the cloud. Likewise, you need an operations staff with the automation in place to regularly test their cloud infrastructure and quickly recover from failures. But CenturyLink Cloud thinks it should be radically easier to reduce your risk when hiccups do occur. We’re not at a point where every enterprise has such capabilities, and CenturyLink Cloud is here to make that transition to cloud software easier.