Editor’s Note: We’re delighted to feature a guest author on Cloud Wars today. Steve Schechter, an IT Director based in Hong Kong, shares the second in his 2-part series on taming runaway cloud costs. You can read Part 1 here.
How much are you spending on cloud every month?
Is it within the budget you set?
Are you able to understand the invoices you receive from your cloud providers? And reliably predict what you’ll spend in the future?
It’s easy enough to answer the first question. But for many companies today, the answer to the others is probably, “No.”
In my previous article, Taming Runaway Cloud Expenses: How I Cut Cloud-Hosting Costs by 32%, I shared the specific steps I took to cut one client’s cloud hosting bill by almost one-third. But every company is different and what worked for one company may not work for another. In order to help you understand how to keep your cloud bill under control, it’s important to know the reasons why things get out of hand in the first place.
Too many companies today are unaware that public cloud requires a different style of governance than traditional infrastructure does. The lack of proper cloud governance leads to results that are the stuff of nightmares.
More specifically, industry analysts estimate that of the $40 billion spent on public cloud IaaS in 2019, 30% or more is wasted. Gartner estimates that public cloud bills are often two to three times higher than expected and that as many as 80% of companies surveyed report that they consistently go over budget on their IaaS spend.
The situation is bad, yet cloud governance isn’t receiving the attention it deserves. Plenty of attention is given to cloud architecture, cloud migration and cloud security, but cloud governance is rarely mentioned at all. The last cloud conference I attended had dozens of break-out sessions covering popular topics like security, hybrid cloud and multi-cloud, and just one poorly attended 45-minute talk on governance.
1. The Number-One Reason Your Cloud Spending Is Out of Control
The main reason that cloud spending gets out of control is the way resources are requested and approved. It comes down to capital expense (CapEx) versus operating expense (OpEx).
Think about how you manage your traditional data-center resources. Buying or leasing almost anything for your data center or colocation service represents a commitment of thousands of dollars—often many, many thousands of dollars. These are generally viewed as capital expenses and they usually require several layers of approvals. The workflow probably looks like this:
- Someone needs a new server
- They contact procurement to get a quote from the vendor
- They submit a purchase request for approval
- Depending upon the amount, several levels of approvals might be needed
- Once all approvals are obtained, a purchase order is issued and sent to the vendor
But in the brave new world of public cloud, a compute resource costs just pennies per hour. Pay as you go, right? Just a few dollars here, a few pennies there. It’s not a capital expense, it’s an operating expense, so the workflow might look more like this:
- Someone thinks they need a new server
- They log into the cloud portal, click a few boxes and the server is up and running
However, along with that compute resource they’ve also added some disk space, snapshots and backups, perhaps replication for disaster-recovery purposes, perhaps several servers with load balancing, a new VNet or subnets, new rules for a web application firewall and so on. Before you know it, this person has created cloud resources that will cost thousands of dollars per year, all without obtaining any approvals or sign-offs and almost certainly without notifying the CTO or CFO about this new financial commitment.
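To see how “pennies per hour” becomes “thousands of dollars per year,” here is a minimal arithmetic sketch. The resource names and hourly rates are hypothetical placeholders chosen for illustration, not real prices from any cloud provider, and the calculation assumes every resource runs around the clock.

```python
HOURS_PER_YEAR = 24 * 365  # assumes resources run 24/7 all year

def annual_cost(hourly_rates):
    """Sum per-hour charges and project them over a full year."""
    return sum(hourly_rates.values()) * HOURS_PER_YEAR

# Hypothetical hourly rates in dollars, for illustration only.
new_deployment = {
    "virtual_server": 0.10,
    "disk_and_snapshots": 0.03,
    "load_balancer": 0.03,
    "dr_replication": 0.05,
    "web_app_firewall": 0.04,
}

if __name__ == "__main__":
    print(f"Projected annual cost: ${annual_cost(new_deployment):,.2f}")
```

Under these assumed rates, a deployment that costs 25 cents per hour in total projects to more than $2,000 per year, which is the kind of commitment that would normally have gone through an approval chain.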
How Cloud Governance Fixes This
Cloud governance is a recent paradigm that recognizes that traditional infrastructure governance doesn’t work for public cloud. It’s a framework that can cover cost optimization, roles and responsibilities, resiliency and possibly security and compliance.
This framework generally strikes a balance between fiscal responsibility and agile innovation. It should recognize that decisions and approvals may no longer be centralized. It may include:
- A new change-management process
- Segmentation of roles and responsibilities
- A global resource-tagging scheme
- A “book” of resource types that can be used
To be effective, cloud governance is best managed using a cloud management platform. While cloud vendors offer increasingly sophisticated tools in this area, third-party solutions are much further ahead in terms of features and, as you might expect, work equally well whether you’re using a single cloud vendor or all of them.
The most important function of such a platform is its ability to provide automated alerts, and in some cases automated remediations, when policy violations occur, as they invariably will.
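As a rough illustration of what an automated policy check might look like, the sketch below audits a list of resources against a tagging scheme and a “book” of approved resource types. Everything in it is an assumption made for the example: the `Resource` class, the tag names and the type names are hypothetical, and a real cloud management platform would pull this inventory from the provider rather than from hand-built objects.

```python
from dataclasses import dataclass, field

# Hypothetical policy values; each organization would define its own.
REQUIRED_TAGS = {"owner", "cost-center", "environment"}
APPROVED_TYPES = {"small-vm", "medium-vm", "managed-db"}

@dataclass
class Resource:
    name: str
    type: str
    tags: dict = field(default_factory=dict)

def find_violations(resource):
    """Return human-readable policy violations for one resource."""
    violations = []
    missing = REQUIRED_TAGS - set(resource.tags)
    if missing:
        violations.append(f"{resource.name}: missing tags {sorted(missing)}")
    if resource.type not in APPROVED_TYPES:
        violations.append(
            f"{resource.name}: type '{resource.type}' is not in the approved book"
        )
    return violations

def audit(resources):
    """Collect violations across an inventory; these would feed the alerts."""
    alerts = []
    for resource in resources:
        alerts.extend(find_violations(resource))
    return alerts

if __name__ == "__main__":
    inventory = [
        Resource("web-1", "small-vm",
                 {"owner": "ops", "cost-center": "1001", "environment": "prod"}),
        Resource("experiment-7", "gpu-xl", {"owner": "dev"}),
    ]
    for alert in audit(inventory):
        print("ALERT:", alert)
```

The point of the sketch is the shape of the check, not the specific rules: policy is written down as data, every resource is compared against it, and anything that fails produces an alert that someone (or some automated remediation) can act on.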
2. The Second Reason: If It’s Too Big, Make It Smaller
Another area where you will frequently encounter waste has to do with the sizing of servers. When you buy a physical server, you expect it to last for at least five years. You may know how big that server needs to be today, but do you know how big it will need to be three years from now? Five years? Probably not. So you buy the largest server you can afford, cross your fingers and hope for the best.
To some extent this mindset has carried over to the cloud as well. If no one is watching over this properly, DevOps is probably spinning up huge virtual servers with far more vCPU and vRAM than is needed. This overlooks one of the most fundamental tenets of cloud: cloud is elastic. Spin up what you know you need today and it’s an almost trivial matter to migrate to something larger (or smaller) in the future.
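One way to put “spin up what you need today” into practice is to compare each server’s size against its observed utilization. The function below is a minimal sketch of that idea; the 30% and 80% thresholds and the halve-or-double rule are illustrative assumptions, not a recommendation from any vendor, and a real review would also look at memory, disk and network usage.

```python
def rightsizing_advice(vcpus, peak_cpu_pct, low=30.0, high=80.0):
    """Suggest a vCPU count from observed peak CPU utilization.

    The thresholds and the halve/double steps are assumptions
    made for this example.
    """
    if peak_cpu_pct < low and vcpus > 1:
        return max(1, vcpus // 2)   # oversized: step down
    if peak_cpu_pct > high:
        return vcpus * 2            # undersized: step up
    return vcpus                    # within the target band: leave as-is

if __name__ == "__main__":
    # A 16-vCPU server that never exceeds 12% CPU is a downsizing candidate.
    print(rightsizing_advice(16, 12.0))
```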
3. The Third Reason Your Cloud Spending Is Out of Control
People still don’t “think cloud.” They don’t understand the on-demand nature of cloud. As a result, a lot of savings opportunities are missed.
Here’s the simplest example. Your developers all go home by 7 PM or midnight or whenever. They don’t turn up until 10 AM the next morning. They go home on weekends and holidays. They’re not working, and their cloud servers aren’t working. The servers are just sitting there, not gathering dust but gathering charges on your bill.
In the “good old days,” an idle machine was just a server sitting in your data center, costing you only a few pennies extra each month for electricity. In the “brave new world” of cloud, idle servers keep running up charges, but it’s easy to power them down outside working hours and grab those savings.
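To get a feel for how much time is on the table, here is a small sketch that works out what fraction of the week a development server sits idle. The 10 AM to 7 PM, five-day schedule is taken loosely from the example above and is an assumption; holidays are ignored. Note also that powering a server down typically stops only its compute charges, so items such as attached storage may continue to bill.

```python
HOURS_PER_WEEK = 24 * 7

def weekly_on_hours(start_hour, stop_hour, workdays=5):
    """Hours per week a server needs to be running."""
    return (stop_hour - start_hour) * workdays

def idle_fraction(start_hour, stop_hour, workdays=5):
    """Share of the week during which the server could be powered down."""
    return 1 - weekly_on_hours(start_hour, stop_hour, workdays) / HOURS_PER_WEEK

if __name__ == "__main__":
    # Assumed schedule: developers work 10:00 to 19:00, Monday to Friday.
    print(f"Needed: {weekly_on_hours(10, 19)} of {HOURS_PER_WEEK} hours per week")
    print(f"Idle:   {idle_fraction(10, 19):.0%} of the week")
```

Under that assumed schedule the server is needed for 45 of the 168 hours in a week, which means it is idle for roughly 73% of the time it is currently being billed.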
4. The Last Reason: Make Everyone Care!
I can remember sitting in a client’s meeting room with their Cloud Ops team. The client had just published their annual report, a 50-page 4-color glossy booklet that not only reported on the previous year’s results but also laid out the company’s vision for the future. In a nice touch, the report had lots of photos of the company’s staff hard at work.
“How many people in this room have read the company’s latest annual report?” I asked. I looked at the 10 other people sitting around the table, and not a single hand went up. Not one person there had bothered to look at the report, because their perception was that it had nothing to do with their daily jobs.
The best system administrators and DevOps people can often operate with blinders on. They know the tasks they have to do and are intensely focused on them, sometimes to the exclusion of everything else. But, and this is important, this is not their fault.
You might be the greatest manager in the world, but if all you’re doing is assigning tasks, checking progress and putting out status reports, you’ve only done half your job.
It’s the manager’s job to make people care, to communicate the passion of the company’s mission and make every person on the team understand not just the vision but also their role in achieving it.
A big piece of this communication is making teams understand the budget impact of their actions, and how saving money not only benefits the company but will directly benefit them as well. After all, more profits mean higher bonuses and possibly even a greater headcount to share the workload.
Optimizing cloud costs is everyone’s job. Communicating that consistently to the team, making everyone understand their role and their role’s importance, makes it easier for you to do your job, because now everyone around you will make it a priority to contribute to the cost-savings effort.