The risk of a Cloud shutdown
October 31, 2021
I often see people and companies moving their workloads to the cloud. When I speak with them, they explain that the cloud is cheaper, more flexible, and more reliable than their current infrastructure. To further increase their return on investment, they often commit to a single cloud provider to reduce management costs and complexity.
By itself, this trend seems very reasonable. The problem is that people sometimes do not consider the less immediate risks of this move. There are many of them, and if there is an appetite for it, I will write about the others in the future, but for now I would like to focus on a specific one: the risk of a shutdown.
I define the “risk of a cloud shutdown” as the risk that all of a single customer’s resources go offline at once. Major events, such as a data center going down or a global BGP failure, can also cause downtime, but I am not considering them here, since they affect far more than a single customer.
The first way a cloud shutdown can occur is through the customer’s own automation going wrong: a faulty automation script can remove every resource in a cloud account. Traditional environments are heavily automated as well, so a similar risk exists there, but it is much smaller, since an automation script cannot make physical servers disappear. In a cloud environment, on the other hand, it is not hard to delete every resource, backups included. To mitigate this risk, you can split the infrastructure across multiple accounts, each with its own credentials, so that an automation failure only affects one section. A typical example is having one account for the active infrastructure and another for the backups, so that at least one of the two survives such an event.
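To make the idea of limiting the blast radius more concrete, here is a minimal sketch in plain Python. It does not use any real cloud API: `Account`, `Credential`, `delete_all`, and `runaway_automation` are hypothetical names that only model the principle that a credential scoped to one account cannot touch another, so a runaway script holding the production credentials leaves the backup account intact.

```python
from dataclasses import dataclass, field


@dataclass
class Account:
    """Hypothetical cloud account holding a set of named resources."""
    name: str
    resources: set = field(default_factory=set)


@dataclass(frozen=True)
class Credential:
    """Hypothetical credential that is only valid for a single account."""
    account_name: str


def delete_all(account: Account, cred: Credential) -> int:
    """Simulate a buggy script deleting everything in an account.

    Deletion only succeeds when the credential belongs to the target account,
    mimicking how separate accounts with separate credentials limit the damage.
    """
    if cred.account_name != account.name:
        raise PermissionError(
            f"credential for {cred.account_name!r} cannot access {account.name!r}"
        )
    removed = len(account.resources)
    account.resources.clear()
    return removed


def runaway_automation(accounts: list, cred: Credential) -> dict:
    """Try to wipe every account and report how many resources each one lost."""
    report = {}
    for acct in accounts:
        try:
            report[acct.name] = delete_all(acct, cred)
        except PermissionError:
            report[acct.name] = 0  # access denied: this section survives
    return report


production = Account("production", {"vm-1", "vm-2", "db-1"})
backups = Account("backups", {"db-1-snapshot", "vm-images"})

report = runaway_automation([production, backups], Credential("production"))
print(report)                     # {'production': 3, 'backups': 0}
print(sorted(backups.resources))  # ['db-1-snapshot', 'vm-images']
```

In a real setup, the same separation would come from the provider's own account and identity mechanisms. The point of the sketch is only that the automation for the active infrastructure never holds credentials that reach the backups.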
Another major cause of a cloud shutdown is the cloud provider deciding that the customer is doing something nefarious. Traditional infrastructure meant running your own data centers, renting colocation space, or leasing servers, all of which involved long, complex contracts and months of negotiation. The cloud simplified the whole process: now you can open the portal of your public cloud of choice, create an account with just some basic invoicing and credit card details, and start right away. From one point of view, this change has been a considerable improvement. Still, it also created a potential issue: cloud providers have no direct knowledge of their customers and their customers’ businesses. The providers have recognized that this can be a problem, and they all now offer support managers or similar points of contact. These services do not entirely solve the problem, because they usually come at a price and are either unavailable or prohibitively expensive for smaller customers. To mitigate this risk, you can use multiple clouds, so that if one provider blocks your account, your services can still be delivered from another.
The third and last cause I want to discuss is payment issues. Going back to the previous point: since the cloud provider knows very little about you, what happens if a payment slips? You should always know when your cloud provider will bill you, and you should always promptly check that the payment went through, but sometimes a payment still fails. Nowadays, providers are a little more lenient on this point if you have a good payment history, but a failed payment can still become critical if your organization is not organized enough to catch it in time.
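Checking that each bill went through can be partly automated. The following is a minimal sketch under stated assumptions: the provider bills on a fixed day of each month (between 1 and 28), you can obtain the date of the last confirmed payment from somewhere (for example a billing export), and a three-day grace period is acceptable. The function names and the grace period are hypothetical and do not correspond to any provider's API.

```python
from datetime import date, timedelta
from typing import Optional


def last_billing_date(today: date, billing_day: int) -> date:
    """Return the most recent billing date on or before today.

    Assumes a hypothetical fixed monthly billing day between 1 and 28,
    so that the day exists in every month.
    """
    if today.day >= billing_day:
        return today.replace(day=billing_day)
    # Billing day not reached yet this month: use the previous month's.
    previous_month_end = today.replace(day=1) - timedelta(days=1)
    return previous_month_end.replace(day=billing_day)


def payment_needs_attention(
    today: date,
    billing_day: int,
    last_confirmed_payment: Optional[date],
    grace_days: int = 3,  # hypothetical tolerance before raising an alert
) -> bool:
    """Flag a bill that is past the grace period with no confirmed payment."""
    billed_on = last_billing_date(today, billing_day)
    paid = last_confirmed_payment is not None and last_confirmed_payment >= billed_on
    overdue = today - billed_on > timedelta(days=grace_days)
    return overdue and not paid


# Billed on October 25, last confirmed payment was the September one.
print(payment_needs_attention(date(2021, 10, 31), 25, date(2021, 9, 25)))  # True
# The October payment was confirmed on October 26.
print(payment_needs_attention(date(2021, 10, 31), 25, date(2021, 10, 26)))  # False
```

Running a check like this on a schedule, and alerting a person rather than retrying silently, is what turns "promptly check that everything went smoothly" into a habit that does not depend on someone remembering the billing date.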
Overall, I suggest always running your workloads on multiple clouds in a way that keeps all the services you provide up and running even if one of those clouds goes offline. If you couple this with properly segmented automation and periodic billing checks, you will be far less exposed on all these fronts.