So far in this series of articles, we’ve looked at how a software provider can deliver their product in a Software-as-a-Service (SaaS) manner using the CenturyLink Cloud Enterprise Cloud Platform. While provisioning and deployment of solutions is an exciting topic, the majority of an application’s life will be spent in maintenance mode. In this article, we will look at how a CenturyLink Cloud user can efficiently manage and monitor their SaaS environment.

Defining Customer Capacity Thresholds

You may recall from the last article that our fictitious SaaS application is targeted at candidates for political office. In this scenario, the application developer chose to create individual pods of servers for each customer instead of co-locating the customers on the same application or database server.

Each pod of servers goes into a CenturyLink Cloud Group, which creates a logical segmentation of servers. Each Group can have its own permissions, maintenance schedule, performance monitors and much more. From the CenturyLink Cloud Control Portal, we can browse the individual server groups and have at-a-glance visibility into the resources being used by each server.

In an upcoming article we will look at how to allow SaaS customers to increase server resources to handle greater loads, but what if the SaaS provider wanted to limit how much capacity each customer could consume? Each Group has settings that control its capacity thresholds, and software providers (or their customers) can use these settings to ensure that they don’t provision more resources than they are willing to pay for. The Group Capacity settings, which are inherited from the parent group by default, define the maximum CPU, memory and storage for a given group.

Depending on whether the SaaS provider is passing resource utilization costs directly to their customers (or simply charging a single recurring fee for the software), it can be very useful to put these governor limits in place to prevent consumption from growing too rapidly.
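To make the idea of inherited capacity limits concrete, here is a minimal sketch in Python. It is not the Control Portal's implementation or API. The `Capacity` and `Group` classes, their field names, and the rule that a group without its own limit uses the nearest ancestor's limit are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Capacity:
    """Hypothetical resource bundle: CPU cores, memory and storage."""
    cpu: int
    memory_gb: int
    storage_gb: int


@dataclass
class Group:
    """Hypothetical model of a server group with optional capacity limits."""
    name: str
    parent: Optional["Group"] = None
    limit: Optional[Capacity] = None  # None means "inherit from the parent"
    used: Capacity = field(default_factory=lambda: Capacity(0, 0, 0))

    def effective_limit(self) -> Optional[Capacity]:
        # Walk up the hierarchy until a group with an explicit limit is found.
        group: Optional[Group] = self
        while group is not None:
            if group.limit is not None:
                return group.limit
            group = group.parent
        return None  # no limit set anywhere in the chain

    def can_provision(self, request: Capacity) -> bool:
        # A request is allowed only if every resource stays within the limit.
        limit = self.effective_limit()
        if limit is None:
            return True
        return (
            self.used.cpu + request.cpu <= limit.cpu
            and self.used.memory_gb + request.memory_gb <= limit.memory_gb
            and self.used.storage_gb + request.storage_gb <= limit.storage_gb
        )
```

In this sketch, a customer pod that sets no limit of its own is checked against whatever its parent group defines, which mirrors the "inherited by default" behavior described above.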

Monitoring Performance and Scaling Environments

While we hope to initially provision servers that can withstand whatever load we send their way, there often comes a time to reassess the resources that have been assigned to the application. The CenturyLink Cloud Enterprise Cloud Platform offers a range of performance monitors that paint a picture of a server’s health. Monitors can be set at both the Group and individual server level. Let us consider our scenario and assume that our web servers should not exceed 90% CPU for a long duration. From the Control Portal, our SaaS administrator can view the monitors for an individual web server and override the inherited settings that come from the group.

The administrator can then change the alert threshold to 90% and define which user in the account will receive the email notification when the threshold is exceeded for an extended duration. While alert messages are used for proactive notification, an administrator can also view reports that track usage compared to monitor thresholds. The Reports view is available at both the Group and server level. Here, users get a visual representation of their server performance and can observe historical activity for up to one year.
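The phrase "exceeded for an extended duration" is worth illustrating, since a single spike should not trigger an alert. The following is a rough sketch of one way such a check could work, not a description of how the platform's monitors are implemented. The function name, the use of consecutive samples as the measure of duration, and the sample values are all assumptions.

```python
from typing import Sequence


def sustained_breach(
    samples: Sequence[float], threshold: float, min_consecutive: int
) -> bool:
    """Return True if `min_consecutive` samples in a row exceed `threshold`."""
    if min_consecutive < 1:
        raise ValueError("min_consecutive must be at least 1")
    run = 0  # length of the current streak of over-threshold samples
    for value in samples:
        run = run + 1 if value > threshold else 0
        if run >= min_consecutive:
            return True
    return False


# Hypothetical CPU readings (percent), one per monitoring interval.
cpu_readings = [85.0, 92.0, 95.0, 91.0, 88.0]
should_alert = sustained_breach(cpu_readings, threshold=90.0, min_consecutive=3)
```

Requiring several consecutive readings above the threshold keeps brief bursts from flooding the administrator's inbox.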

If the SaaS administrator observes a sustained spike in usage and needs to expand the application footprint, they have multiple options. First, there are two manual ways to increase capacity. An administrator has the option of scaling horizontally by adding new servers to the Group. This can be done via the typical server provisioning process or through an existing blueprint. The other manual option involves scaling the server vertically by expanding the server’s available resources. For some operating systems, like Windows Server 2008 R2, this capacity adjustment will occur without taking the server offline.

Both of these manual options are available to administrators and can relieve the pressure from an overtaxed application. However, if we assume that this SaaS application becomes extremely popular, our SaaS administrator will become overtaxed in their attempt to monitor and scale each customer’s environment! That sort of model would be unsustainable. Fortunately, CenturyLink Cloud will soon be releasing an intelligent Auto Scale capability which can do both real-time and predictive scaling of the servers in a Group. The real-time scaling occurs when usage load exceeds a particular threshold for a sustained period. More capacity is automatically added to the server and the customer is notified of this change. The predictive scaling uses historical trends to foresee potential problems and scale the server before an overload actually occurs. In all cases, the SaaS administrator has full control over how scaling will occur and can prevent runaway provisioning by setting the maximum (and minimum) number of servers that a group can have.

How does it work? In the case of group or application scaling, the administrator sets preferences for when scaling occurs (i.e. the “aggression level”) and the preferred method (i.e. “horizontal” or “vertical”). For horizontal scaling, the administrator provides a template that defines the type of server to add to the cluster.
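As a thought experiment, the group-level behavior described above can be sketched as a small decision function. This is not the Auto Scale feature itself, which the article describes only at a high level. The `ScalePolicy` fields, the one-server-at-a-time step, and the thresholds are assumptions chosen to show how minimum and maximum server counts prevent runaway provisioning.

```python
from dataclasses import dataclass
from enum import Enum


class Method(Enum):
    HORIZONTAL = "horizontal"
    VERTICAL = "vertical"


@dataclass(frozen=True)
class ScalePolicy:
    """Hypothetical scaling preferences for one group."""
    method: Method
    scale_up_above: float    # sustained CPU % that triggers growth
    scale_down_below: float  # sustained CPU % that triggers shrinking
    min_servers: int
    max_servers: int


def target_server_count(
    policy: ScalePolicy, current_servers: int, sustained_cpu_percent: float
) -> int:
    """Return how many servers the group should have after this evaluation."""
    if policy.method is not Method.HORIZONTAL:
        # Vertical scaling changes server size, not server count.
        return current_servers
    desired = current_servers
    if sustained_cpu_percent > policy.scale_up_above:
        desired = current_servers + 1
    elif sustained_cpu_percent < policy.scale_down_below:
        desired = current_servers - 1
    # Clamp to the administrator's bounds to avoid runaway provisioning.
    return max(policy.min_servers, min(policy.max_servers, desired))
```

A lower `scale_up_above` value would correspond to a more aggressive policy in this sketch, since the group would grow earlier under load.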

Individual servers can also have their own Auto Scale policies that define resource ranges and approved maintenance windows for applying the new resource allocations.

If a SaaS provider decides to use a limited multi-tenancy model and dedicate environments to each customer, then they absolutely need a way to create a self-healing environment that can ebb and flow automatically based on utilization.

Taking Server Snapshots

Server snapshots provide a way to capture the current state (i.e. memory, disk content) of a server and have it available for restoration in the case of failed upgrades or corrupted configurations. CenturyLink Cloud offers this capability for application owners who want a way to perform configuration changes with the confidence that they can quickly revert back to a particular point in time if necessary. To be sure, snapshots are NOT a substitute for a backup and shouldn’t be used as such. Rather, they are best for short term rollback of changes to a server.

If our SaaS provider decides to try upgrading a server to the latest version of the web application, they could choose to first snapshot the web server in case the update fails. Snapshots may be applied at the individual server level.

However, it is also useful to be able to perform this action in bulk from the Group’s perspective.

Once a snapshot is created and saved, an administrator can restore to a snapshot via the CenturyLink Cloud API or by opening a ticket with our expert Network Operations Center (NOC). These capabilities help application owners safely and consistently apply changes to their environments without risk of causing irreversible damage.
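The snapshot-then-upgrade pattern from this section can be expressed as a short Python sketch. The three callables are placeholders that the caller supplies; they do not correspond to actual CenturyLink Cloud API operations, and how a snapshot is taken or restored in practice is outside the scope of this example.

```python
from typing import Callable


def upgrade_with_rollback(
    take_snapshot: Callable[[], str],
    apply_upgrade: Callable[[], None],
    restore_snapshot: Callable[[str], None],
) -> bool:
    """Snapshot first, then upgrade; restore the snapshot if the upgrade fails.

    Returns True when the upgrade succeeded and False when it was rolled back.
    """
    snapshot_id = take_snapshot()  # capture a known-good state before changing anything
    try:
        apply_upgrade()
    except Exception:
        # Any failure during the upgrade triggers a return to the snapshot.
        restore_snapshot(snapshot_id)
        return False
    return True
```

Passing the operations in as functions keeps the sketch independent of any particular API and makes the rollback logic easy to test with stand-ins.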

If you would like to understand more about how snapshots work, please take a look at this overview from VMware.

Performing Data Backups

One of the biggest fears of an application owner is experiencing a major crash and realizing that critical data is lost and unrecoverable. Given that in our scenario, each customer has their own environment, the need for a consistent and comprehensive backup strategy is even more imperative.

All servers in the CenturyLink Cloud Enterprise Cloud platform automatically undergo a nightly backup procedure. These backups contain the full state and data of the server. Depending on the service level a customer has requested, backups can be retained on a rolling fourteen-day basis. All of this ensures that customers can rest easy knowing that they face only minimal data loss in the case of a catastrophic event.
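For readers unfamiliar with rolling retention, the following sketch shows the arithmetic behind a fourteen-day window of nightly backups. It illustrates the concept only and does not describe how the platform stores or prunes backups. The function name and the example dates are assumptions.

```python
from datetime import date, timedelta
from typing import Iterable, List


def backups_to_prune(
    backup_dates: Iterable[date], today: date, retention_days: int = 14
) -> List[date]:
    """Return the backup dates that have fallen out of the rolling window.

    With nightly backups, the window keeps today's backup and the
    `retention_days - 1` nights before it.
    """
    cutoff = today - timedelta(days=retention_days)
    return sorted(d for d in backup_dates if d <= cutoff)


# Hypothetical example: twenty consecutive nightly backups.
nightly = [date(2024, 1, 1) + timedelta(days=i) for i in range(20)]
expired = backups_to_prune(nightly, today=date(2024, 1, 20))
kept = [d for d in nightly if d not in expired]
```

In this example the fourteen most recent nightly backups remain available for restoration, and anything older is eligible for removal.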

For many modern web applications, the application servers themselves maintain little or no state and can be added and taken offline with little impact. The real heart of most applications is the data repository. If a CenturyLink Cloud customer wants to do database-level backups, they could choose to manually log into the database server and back up the critical data. However, that is extremely time consuming and inefficient. Instead, customers can have custom database backups configured for them by the CenturyLink Cloud NOC, or create backups through the robust CenturyLink Cloud web API.

Summary

While operational support is not the most thrilling aspect of the application lifecycle, it is most certainly the longest! If your cloud hosting provider cannot provide a strong set of automated management controls, then administration of a SaaS application will be an overwhelming and never-ending set of tasks. CenturyLink Cloud includes an ever-growing set of self-service mechanisms for governing, monitoring, scaling and providing durability for your applications. Next up, we will take a look at how to use the CenturyLink Cloud API to build an application management application for SaaS customers.