Businesses and their employees demand seamless IT experiences – the ability to have as close to 100% availability as possible of key systems and applications. Such availability is crucial for enterprises today, which count on IT to be both an enabler of worker productivity and a competitive differentiator that sets a business apart from its rivals.
Yet everyday IT management issues make such availability hard to achieve – upgrades, overloads, hardware and software issues, system crashes and other common hitches make some downtime a reality. The key is to minimize it as much as possible. For instance, according to vendor CloudEndure’s annual disaster recovery survey, 77% of companies have a goal of 99.9% availability – a high standard. Yet 57% of respondents had at least one outage in the past three months, with a typical cost to the organization of more than $10,000 per day.
What can enterprises do to avoid unplanned downtime? One route more businesses are employing is to take advantage of the cloud and the concepts of on-demand, streaming IT to help reach their IT availability goals. While cloud availability stats can be hard to come by, the Gartner-owned CloudHarmony tracks outages at the major public cloud providers and found high availability at most providers – led by Amazon Web Services with just 2 hours and 30 minutes of downtime in 2015.
While the very nature of the cloud itself – and the strong incentive for cloud providers to deliver strong service quality – can generally guarantee high levels of availability, there are additional steps IT admins can take to maximize the uptime of their cloud-delivered systems and applications.
The first key to attaining high availability in the cloud is understanding exactly what that means. IT departments always strive for the highest levels of system and application availability – it’s the reason they exist in the first place. In traditional IT environments, however, attaining high levels of availability can be challenging and costly – the answer to dealing with downtime is often to throw more hardware and software (i.e., money) and people (i.e., resources) at the challenge.
A managed cloud environment can share those costs and resources across an entire client base and maintain a keen focus on keeping systems up and running. Indeed, for a cloud service to attain true high availability, it has to not only be always-on, but guarantee that via a service level agreement (SLA), typically at a multiple-nines level – i.e., 99.99% uptime, which allows about 4 minutes of downtime a month, or less than one hour per year. Not every business or application demands – or can afford – such stringent uptime requirements, but working out a satisfactory SLA with your cloud service provider is crucial, as is redress in case commitments are not met. But don’t count on your SLA alone.
Remember that $10,000 per day cost-per-outage figure? Credits for cloud outages may not be enough to cover the true cost to the business. Even in the cloud, IT must take availability into its own hands.
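The arithmetic behind a multiple-nines target is easy to check. The following is a minimal Python sketch, not part of any provider's tooling; the function name and the 365-day year are assumptions made for illustration.

```python
HOURS_PER_YEAR = 365 * 24            # assumption: non-leap year
HOURS_PER_MONTH = HOURS_PER_YEAR / 12  # assumption: average month


def downtime_budget_minutes(availability_pct: float, period_hours: float) -> float:
    """Return the maximum downtime, in minutes, that an availability
    target allows over a period of the given length."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability_pct must be between 0 and 100")
    return (1 - availability_pct / 100) * period_hours * 60


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99, 99.999):
        per_month = downtime_budget_minutes(target, HOURS_PER_MONTH)
        per_year = downtime_budget_minutes(target, HOURS_PER_YEAR)
        print(f"{target}%: {per_month:.1f} min/month, {per_year:.1f} min/year")
```

For a 99.99% target this works out to roughly 4.4 minutes a month and 52.6 minutes a year, which matches the figures quoted above.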
One of the great advantages of the cloud is the ability to more dynamically distribute and manage workloads across servers, distinct data centers (both on premises and in the cloud) and even among different cloud providers. Replicated servers, fail-over clusters and even redundant energy sources are all crucial to maintaining high system availability – and much cheaper to achieve and simpler to manage in a cloud environment.
Indeed, that sort of bulletproof infrastructure was traditionally only available to the very largest enterprises; with the cloud, accessing such high-availability infrastructure is just a button-push away, with much of the IT grunt work of managing system uptime left to the cloud service provider. Such cloud platforms not only deliver built-in redundancy and replication, but have additional hardware and software resources standing by to be dynamically and elastically spun up to meet new IT demands.
While the cloud itself can help address many availability challenges, the cloud and the systems and applications housed there don’t “run themselves.” IT departments have an active and crucial role to play in maintaining cloud availability. Indeed, one of the most important and longest-standing practices for maintaining high availability and ensuring rapid disaster recovery is simply to accept that outages are going to happen – and to have a firm plan to anticipate downtime and automate the process of getting back online.
Applications must be designed in a stateless fashion to handle the inevitable server crashes, reboots or resets. They must also be designed to be able to seamlessly scale to peak usage demands. More significant downtime (e.g., as caused by power or network outages) must be planned for, with servers and data replicated across different locations or zones (and if possible, even across different managed service providers – a process made much easier when leveraging cloud infrastructure).
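What "stateless" means in practice can be sketched in a few lines of Python. This is an illustration only: SharedStore is a hypothetical in-memory stand-in for an external, replicated store (in a real deployment, a database or cache service), and the class and method names are invented for the example.

```python
class SharedStore:
    """Hypothetical stand-in for an external, replicated data store."""

    def __init__(self):
        self._data = {}

    def get(self, key, default=None):
        return self._data.get(key, default)

    def put(self, key, value):
        self._data[key] = value


class StatelessServer:
    """Application server that keeps no session data on the instance itself."""

    def __init__(self, name, store):
        self.name = name
        self.store = store  # all session state lives in the shared store

    def add_to_cart(self, session_id, item):
        # Read the current state, update it, and write it straight back,
        # so nothing is lost if this instance dies after the call returns.
        cart = list(self.store.get(session_id, []))
        cart.append(item)
        self.store.put(session_id, cart)
        return cart


store = SharedStore()
server_a = StatelessServer("a", store)
server_b = StatelessServer("b", store)

server_a.add_to_cart("session-1", "book")
server_a = None  # simulate server "a" crashing
cart = server_b.add_to_cart("session-1", "pen")
print(cart)  # the session continues on server "b" with the earlier item intact
```

Because no instance holds anything that cannot be rebuilt from the store, servers can crash, reboot or be added at peak times without users losing their sessions.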
Managing infrastructure distributed across multiple data centers presents its own challenges, including managing DNS (Domain Name System) resources to dynamically route and balance traffic across multiple physical locations – and away from trouble when necessary. IT admins should take advantage of increasingly sophisticated cloud management systems that make it easier to visualize and design a high-availability infrastructure, and that provide tools to test and monitor a setup and automate crucial backup and recovery processes in case of inevitable – even in the cloud – system downtime.
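The idea of routing traffic across locations, and away from trouble, can be sketched as follows. This is a simplified Python illustration and not how any particular DNS or load-balancing service works; the Endpoint type, the pick_endpoint function, the region names and the addresses (taken from ranges reserved for documentation) are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Endpoint:
    region: str
    address: str
    healthy: bool = True  # in practice this would be set by a health check


def pick_endpoint(endpoints, request_number):
    """Rotate requests across healthy endpoints only, skipping failed ones."""
    candidates = [e for e in endpoints if e.healthy]
    if not candidates:
        raise RuntimeError("no healthy endpoints available")
    return candidates[request_number % len(candidates)]


endpoints = [
    Endpoint("east", "192.0.2.10"),
    Endpoint("west", "198.51.100.10"),
]

# Normal operation: traffic alternates between the two locations.
normal = [pick_endpoint(endpoints, n).region for n in range(4)]

# The east location fails its health check: all traffic moves to west.
endpoints[0].healthy = False
degraded = [pick_endpoint(endpoints, n).region for n in range(4)]

print(normal)
print(degraded)
```

A production setup would add health-check intervals, DNS record time-to-live values and weighting by capacity or latency, all of which this sketch leaves out.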
A well-constructed availability and recovery strategy not only anticipates unplanned failures but also accounts for and aims to minimize routine disruptions, such as for system upgrades or patches.
Improved availability and fewer outages are among the biggest benefits of moving to cloud-based infrastructure and applications. The inherent elastic and managed nature of the cloud can immediately solve many of the headaches of guaranteeing availability, while cloud-savvy system and application design can improve the odds of staying up even further. Availability has been – and always will be – a cost and resource issue; only with an infinite budget would uptime cease to be a worry. Managed cloud services and savvy cloud management best practices bring high availability within reach of more enterprises than ever.