Reflecting on One Year of Cloud Cost Optimization

For the past year I held the unelected position of “Cloud Cost Czar.” I have written about the duties such a role entails in “A Day in the Life of a Cloud Cost Czar.” Recently I handed over the cost czar responsibility to a colleague who will carry on the daily routines and continue to improve our cloud cost management efforts. During the handoff, almost a year to the day after assuming the czar’s responsibilities, I reflected on the previous twelve months and everything the company accomplished as a united team to “tame the cloud.”

I created a graph to visualize the dramatic change over one calendar year. To the right is an area graph that shows subscriber seats (in green) overlaid on subscriber costs (blue, orange and red; our principal costs are cloud compute and two types of cloud storage). As the subscriber count grew, costs went up, peaked, and then went down over the course of the year. The rise, peak, and subsequent decline all map to various cost-cutting efforts initiated by Sonian’s engineering and support groups.

Throughout the year we got smarter about how to “purchase” compute time for less than retail, how to store more customer data while consuming less cloud storage, and how to process more customer data using fewer CPU hours. With each improvement (and each high-five) we re-affirmed that in the cloud, we are in control of our cost destiny. This is where the phrase “infrastructure as code” takes on real meaning.

In contemplating the past year I feel a sense of accomplishment and relief, and I am still energized to carry on, since there is so much more we can achieve by “playing the cloud” toward mutually beneficial outcomes. The cloud promises a “grand bargain” between the infrastructure providers (Amazon, Rackspace, IBM, etc.), the cloud application builders (Sonian and others) and the end customers. Customers are told the cloud is less expensive, and that cost savings are one of the primary reasons to move from on-premises infrastructure to the cloud. The grand bargain is that cost efficiencies should be shared at all levels: when a cloud vendor finds a better way to manage storage, the application vendor gets a lower unit price, and the end customer should share in the savings too. Otherwise the trust between the layers breaks down and the main reason to “cloudify” becomes less compelling. If the cloud is to succeed, that breakdown cannot be allowed to happen.

The graph tells the Sonian story in numbers, lines and colors. But what’s not visible is the behind-the-scenes work required to give this story a happy ending. You see, Sonian’s success requires us to manage cloud costs with an enthusiasm never before seen in enterprise SaaS. When we started five years ago, we didn’t know how difficult cloud cost control would be. Fortunately we learned quickly enough to survive and then thrive. We’re now enjoying the benefits of five years of platform investment and of thinking differently (in a positive way) about how cloud-centric application stacks need to be architected and coded to operate at the lowest cost without sacrificing customer experience.

Sonian’s business model sells a fixed-price service to customers who bring a variable amount of data to the cloud. Success requires continually tuning the system to operate at peak efficiency and investing engineering effort to find savings. The strategy has worked well so far. It hasn’t hurt that cloud providers like Amazon have consistently lowered their prices, and that we have been agile enough to re-design system components to take advantage of these savings. A cloud cost saving might come as a direct price cut, in which case we recognize it immediately, but more often than not, the savings require us to make changes. In all cases the one-time engineering work is worth it because the savings are realized over the lifetime of operation.

The graph shows our key metrics: subscriber counts and the unit costs to service those subscribers. We’re SaaS. We look at everything as “units per month,” which means both costs and subscribers. The graph is a trailing one-year view of these metrics. The actual dollar values and counts have been removed, since the point is the relative change over time. Starting mid-year, Sonian’s incredible Engineering, Support and DevOps teams implemented technical innovations that built momentum, decreasing expenses month over month even as new subscribers were added.
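Because everything is viewed as “units per month,” it may help to see how raw monthly bills turn into a per-subscriber unit cost. The sketch below is hypothetical: the `MonthlySnapshot` type, the `unit_costs` and `relative_change` functions, the category names and all figures are illustrative assumptions, not Sonian’s tooling or data. The cost categories loosely mirror the graph’s series (compute plus two kinds of storage).

```python
from dataclasses import dataclass, field


@dataclass
class MonthlySnapshot:
    """Hypothetical monthly totals; cost categories mirror the graph's series
    (cloud compute plus two types of cloud storage)."""
    month: str
    subscribers: int
    costs: dict[str, float] = field(default_factory=dict)  # dollars per category


def unit_costs(snapshot: MonthlySnapshot) -> dict[str, float]:
    """Cost per subscriber for the month, per category plus a 'total' entry."""
    if snapshot.subscribers <= 0:
        raise ValueError("subscriber count must be positive")
    per_unit = {name: total / snapshot.subscribers
                for name, total in snapshot.costs.items()}
    per_unit["total"] = sum(per_unit.values())
    return per_unit


def relative_change(old: float, new: float) -> float:
    """Fractional change from old to new; -0.5 means the value halved."""
    return (new - old) / old


if __name__ == "__main__":
    # Illustrative numbers only; not Sonian data.
    start = MonthlySnapshot("month 1", 1000,
                            {"compute": 5000.0, "storage_a": 3000.0, "storage_b": 2000.0})
    end = MonthlySnapshot("month 12", 2000,
                          {"compute": 3000.0, "storage_a": 1500.0, "storage_b": 500.0})
    before, after = unit_costs(start)["total"], unit_costs(end)["total"]
    print(f"unit cost: {before:.2f} -> {after:.2f} ({relative_change(before, after):+.0%})")
```

With these made-up figures the total bill halves while subscribers double, so the unit cost falls by 75%. That compounding is why a unit view shows progress more clearly than the raw monthly bill.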

So how did we cut costs in half while nearly doubling our subscribers? Here are five lessons learned in the “Year of Cloud Cost Cutting”:

0. Before even considering the cloud, you need to have the “right use case.”

  1. The entire product team (product management and engineering) practices “cost aware engineering.”
  2. Implement the right core architecture to take advantage of various cloud pricing mechanisms (spot, reserved, retail).
  3. Measure cloud costs with sufficient granularity to understand near-term and long-term trends. This includes calculating an “hourly” expense rate, comparing current values with historical trends, and being able to measure compute, storage and API costs separately. Ignoring API costs can
  4. Create, maintain and follow a reference architecture that must include cost estimates for a production system.
  5. Use a design that allows incremental infrastructure increases so that the system operates at peak efficiency. Idle cloud infrastructure
  6. Bonus: Buy reserved instances. Build a financial model to show how a reasonable upfront cash outlay will lower compute costs by up to 30% (and the more you spend upfront, the greater the savings).
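The financial model behind the reserved-instance bonus can start as a simple break-even comparison: a reservation trades an upfront payment for a lower hourly rate, so the question is how many hours of use it takes to recover the upfront cost. This is a minimal sketch under stated assumptions: the `ReservedOffer` type and function names are hypothetical, the prices are placeholders rather than any provider’s actual rates, and the model assumes the instance runs 24x7 for the whole term with no discounting of future cash.

```python
from dataclasses import dataclass

HOURS_PER_YEAR = 24 * 365  # 8,760 hours; ignores leap years


@dataclass
class ReservedOffer:
    """Hypothetical reserved-instance offer: pay upfront, then a lower hourly rate."""
    upfront: float    # one-time payment, dollars
    hourly: float     # discounted hourly rate, dollars
    term_years: int   # length of the reservation


def break_even_hours(on_demand_hourly: float, offer: ReservedOffer) -> float:
    """Hours of use after which the reservation beats on-demand ("retail") pricing."""
    hourly_saving = on_demand_hourly - offer.hourly
    if hourly_saving <= 0:
        return float("inf")  # the reservation never pays for itself
    return offer.upfront / hourly_saving


def term_savings(on_demand_hourly: float, offer: ReservedOffer) -> float:
    """Fractional saving over the full term, assuming the instance runs 24x7."""
    hours = HOURS_PER_YEAR * offer.term_years
    retail_cost = on_demand_hourly * hours
    reserved_cost = offer.upfront + offer.hourly * hours
    return 1 - reserved_cost / retail_cost


if __name__ == "__main__":
    # Placeholder prices, not actual provider rates.
    offer = ReservedOffer(upfront=1000.0, hourly=0.25, term_years=1)
    print(f"break-even after {break_even_hours(0.50, offer):.0f} hours")
    print(f"saving over the term: {term_savings(0.50, offer):.1%}")
```

With these placeholder prices the reservation pays for itself after 4,000 hours (roughly five and a half months of continuous use) and saves about 27% over a one-year term. Trying a larger upfront payment paired with a lower hourly rate shows the trade-off the bonus item describes, provided the instance actually runs for most of the term.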

I’ll write more about how Sonian “games the cloud” to find cost savings for us and our customers.