
Taming Runaway Cloud Expenses: How I Cut Cloud-Hosting Costs by 32%


Editor’s Note: We are delighted to feature a guest author on Cloud Wars today. This is the first in a 2-part series by Steve Schechter, an IT Director based in Hong Kong. Part 2 will run on Saturday, March 21.

At a recent client engagement, I was told to cut cloud-hosting costs by 30%. That was a tall order, but since most businesses haven’t yet figured out how to optimize their cloud spending, I figured it could be done.

Then, just to make it even more interesting, the CTO gave me two restrictions. First, I couldn’t commit to long-term deals (“reserved instances” or “enterprise agreements”) that would give the company a nice discount but also lock it in with one vendor. And second, I couldn’t downsize underutilized servers (rightsizing).

While those restrictions would force me to get more creative, I eagerly accepted the engagement because this is just the kind of challenge I enjoy. In the end, I cut the hosting bill by 32%. Here’s how we did it. 

How I Cut One Company’s Hosting Bill By 32%

The company thought they were in good shape because they had “cloud management,” but they had no cloud governance. The result was that they weren’t managing their cloud; their cloud was managing them. Every month they received a 50-page invoice from AWS that they couldn’t understand and had no option other than to pay. They had no idea how much they were spending on hosting individual applications and, for each of those applications, no idea how their spend on production environments compared to their spend on non-production environments.

Before I could start down the road to savings nirvana, I had to take two important preliminary steps.

First, I brought in a relatively new kind of third-party software, a Cloud Management Platform (CMP). CMPs allow a cloud FinOps analyst to conduct detailed cloud-resource inventory reviews much more quickly and more accurately than using the cloud vendors’ portals. I chose a product called CloudHealth, which is now owned by VMware.

Next, I introduced a global cloud-resource tagging scheme. In the past, this client had left tagging up to individual developers. Going forward, this global approach made it easier by orders of magnitude to understand each individual cloud resource, how it was used and who owned it. 
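To make the idea of a global tagging scheme concrete, here is a minimal sketch of a tag check. The required tag keys and the sample inventory are assumptions for illustration only; the article does not list the actual scheme, and this is not how CloudHealth itself is configured.

```python
# Hypothetical required tag keys; the real scheme is not given in the article.
REQUIRED_TAGS = ("application", "environment", "owner")


def missing_tags(resource_tags, required=REQUIRED_TAGS):
    """Return the required tag keys that are absent or blank on a resource."""
    return [
        key for key in required
        if not str(resource_tags.get(key, "")).strip()
    ]


# Illustrative inventory records (made-up IDs and values, not real data).
inventory = {
    "vol-001": {"application": "billing", "environment": "prod", "owner": "team-a"},
    "i-002": {"application": "billing", "owner": ""},
}

# Map each non-compliant resource to the tags it still needs.
untagged = {
    rid: missing_tags(tags)
    for rid, tags in inventory.items()
    if missing_tags(tags)
}
print(untagged)  # {'i-002': ['environment', 'owner']}
```

A report like this, run against an exported inventory, shows at a glance which resources cannot yet be attributed to an application, an environment or an owner.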

Finally, I was ready to dig into the cloud-resource inventory and identify the savings opportunities.

Four Major Steps to Bring This Out-of-Control Cloud-Expense Monster to Heel
1. Apocalypse for the Zombies.

I believe in always beginning with low-hanging fruit, something to deliver a quick win and show management you’re on the right track. So I looked for “zombie assets”—cloud resources that were unused and forgotten but still active, frequently because the people who’d created them were no longer with the company. There were dozens of these. Identifying them, archiving them where appropriate and then killing them represented a path of least resistance, because no one in their right mind would have argued in favor of keeping them. This didn’t represent massive savings—perhaps just a few percentage points off the monthly bill—but it gave me the quick win I wanted.
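As a rough illustration of how a zombie hunt could be scripted against an exported inventory, the sketch below lists resources that have been idle beyond a threshold and notes whether their owner has left. The `Resource` fields, the 90-day threshold and the sample records are all hypothetical, not details from this engagement.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    resource_id: str
    owner: str                 # value of a hypothetical "owner" tag
    days_since_last_use: int   # assumed to come from an inventory export


def zombie_candidates(resources, current_staff, idle_days=90):
    """Return (resource_id, owner_departed) for resources idle >= idle_days.

    A departed owner is treated only as a supporting signal, since not every
    forgotten resource belongs to someone who has left.
    """
    candidates = []
    for r in resources:
        if r.days_since_last_use >= idle_days:
            candidates.append((r.resource_id, r.owner not in current_staff))
    return candidates


# Made-up sample data for illustration.
resources = [
    Resource("i-aaa", "alice", 2),
    Resource("i-bbb", "bob", 200),
    Resource("vol-ccc", "carol", 120),
]
current_staff = {"alice", "carol"}

print(zombie_candidates(resources, current_staff))
# [('i-bbb', True), ('vol-ccc', False)]
```

The output is a review list, not a kill list: each candidate still needs a human decision on whether to archive it before deletion.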

2. Backup Madness!

Next, I examined backup policies and data retention. To my complete lack of surprise, this was out of control. Production systems had daily backups stretching back more than three months, and in some cases more than six months. It turned out that no one in IT had ever looked at client contracts to see that they only needed two weeks’ retention. Turning this around required coordination with product owners, project managers and legal. After all necessary approvals were secured, cleaning this up released petabytes of storage. This resulted in savings of almost 30%. (For this task, it was critical to not just delete the unwanted backups, but also to drive changes to backup configurations for each server, to keep everything reined in for the future.)
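The retention rule itself is simple to express. The sketch below, which assumes a hypothetical mapping of backup IDs to creation times, picks out backups older than the two-week retention the client contracts required. It only selects candidates; actual deletion, and the matching change to each server's backup configuration, would happen in the backup tooling after approvals.

```python
from datetime import datetime, timedelta, timezone

# Two weeks' retention, per the client contracts described above.
RETENTION = timedelta(days=14)


def expired_backups(backups, now, retention=RETENTION):
    """Return IDs of backups created before the retention cutoff."""
    cutoff = now - retention
    return [bid for bid, created in backups.items() if created < cutoff]


# Made-up sample data: backup ID -> creation time.
now = datetime(2020, 3, 1, tzinfo=timezone.utc)
backups = {
    "snap-old": now - timedelta(days=95),
    "snap-edge": now - timedelta(days=14),  # exactly at the cutoff, so kept
    "snap-new": now - timedelta(days=3),
}

print(expired_backups(backups, now))  # ['snap-old']
```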

3. And More Backup Madness!

Staying with backups, I found that on the non-production side, no one had ever thought to question backing up (and retaining – for months!) every non-production workstation, even though developers were checking their code into a central repository every night and QA was pulling their test cases and test data daily from a centralized repository. Cleaning this up saved another 15%.

4. Paying for Servers to Do Nothing.

Developers don’t work 24 hours a day and, in the case of this company, rarely worked weekends or holidays. Yet non-production servers were up and running 24 hours a day, 7 days a week, 365 days a year, sitting idle for two-thirds of the time. Using CloudHealth, we were able to automate shutting down these servers at the end of the day and on weekends. This change brought another 10% savings.
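The "two-thirds idle" figure is easy to sanity-check. The sketch below assumes a hypothetical 12-hour weekday working window (the article does not give the actual schedule) and shows both the on/off decision and the resulting idle fraction. It ignores holidays and is not the CloudHealth configuration used in the engagement.

```python
from datetime import datetime

# Hypothetical working window in local hours: 08:00 to 20:00 on weekdays.
WORK_START, WORK_END = 8, 20


def should_be_running(when):
    """True if a non-production server should be up at the given local time."""
    is_weekday = when.weekday() < 5  # Monday=0 ... Friday=4
    return is_weekday and WORK_START <= when.hour < WORK_END


# Hours per week the servers need to be up under this assumed window.
hours_up = 5 * (WORK_END - WORK_START)       # 60 of 168 hours
idle_fraction = 1 - hours_up / (7 * 24)

print(should_be_running(datetime(2020, 3, 18, 10)))  # Wednesday morning: True
print(should_be_running(datetime(2020, 3, 21, 10)))  # Saturday morning: False
print(round(idle_fraction, 2))                       # 0.64
```

Under that assumed window, the servers are needed for 60 of the 168 hours in a week, which leaves them idle roughly 64% of the time, in line with the two-thirds estimate.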

Those were not the only changes we made, but they were the most impactful. And just 6 months after we got CloudHealth up and running, I was able to go to the CEO and tell him that his target was achieved. During that time, we were successful in running deep inventory analyses, obtaining validation and approvals on my proposed changes, coordinating across multiple departments, and then driving DevOps and CloudOps to make the needed changes.

Once You’re In Control, How Do You Keep It That Way?

The most important component we put in place was a new change-management procedure that required business justification for any new cloud-resource requests. This procedure included a “fast track” method that could obtain all necessary approvals within an hour, allowing the company’s IT department to remain more than sufficiently agile. 

It was almost equally important to have consistent and clear communications throughout the IT department. Cloud governance works best when it becomes part of a company’s DNA. Everyone needed to understand the starting baseline, how the unexpected large expenses were impacting the company’s bottom line, the reasons for the changes to be made, and each person’s specific role in governance. 

Finally, we configured the cloud-management platform, CloudHealth, to provide automated notifications of policy violations, such as new resources that were not properly tagged, unattached EBS volumes and snapshots over a certain age.

While it felt good to clean up the AWS bill on a one-time basis, that approach simply isn’t enough. You need to establish a Cloud Governance Framework that ensures your cloud usage includes the “No Surprises” option for the future.

Although, I must admit, the CTO and CEO were pleasantly surprised when I told them that for this one-time cleanup, we not only met the target of 30% expense reduction on cloud hosting, but actually came in with a final reduction of 32%.

Steve Schechter is the Director of Cloud Services at Velocity Technology.
