Blog Series: Gaming the Cloud

Cloud Success Requires Cost-aware Engineering

This is a true story from the “Cloud Cost Czar Chronicles.”

Our S3 “penny per one thousand” API costs started to rise rapidly in the second half of the cloud infrastructure billing period. We had seen this behavior before and knew the increase could be attributed to higher usage, a new defect, or a design flaw that rears its head at a scaling tipping point. My job as “cost czar” is to raise the alarm and work with the team to figure out what is going wrong. At the observed rate of increase, the excess charges would push the monthly bill beyond the budget. One thing we have learned in the cloud is that costs can rise quickly but take a while to come back down: the effort required to decelerate spending can be out of all proportion to how fast it accelerated, especially when you try to correct course within a single billing period.

When we started using Amazon Web Services S3 (a PaaS object store) back in 2007, we were acutely aware of the three pricing vectors in effect: storage consumed, the price of API calls to store and list data, and the price of API calls to read and delete data. We’ve been using S3 heavily for five years, and we tried to model the “all-in” costs as accurately as possible. But “guesstimating” costs beyond the raw storage was a stretch. PaaS services have an intrinsic “social engineering” element: if you color outside the lines, the financial penalty can be significant, but if you master the pricing game, the rewards are equally significant. So five years ago we thought that as long as we pointed in the right general direction, “we’ll figure it out as we go along.” Some assumptions proved to be positive surprises: raw storage costs went down. Other surprises were not so pleasant: continually wrangling the API usage fees, especially the transactions that cost a penny per thousand, proved to be a constant challenge.

But I still like my options with S3 compared to buying storage from a hardware vendor and incurring the administrative overhead. With S3 we can lower our costs through smarter engineering. With storage hardware, the only way to lower costs is to wrangle a better deal from an EMC salesperson.

As one of the original “cloud pioneers,” Sonian is not alone in this effort, and it’s been a real eye-opener for software designers to have to think about how their code consumes cloud resources (and expense) at scale. Because whether it’s a penny per thousand or a penny per ten thousand, when you are processing hundreds of millions of transactions a month, any miscalculation suddenly brings a dark cloud raining over your project.
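
To make the stakes concrete, here is a back-of-the-envelope sketch of how those request fees add up at this kind of volume. The rates match the “penny per thousand” and “penny per ten thousand” tiers mentioned above; the monthly request counts are purely illustrative, not our actual traffic.

```python
# Illustrative S3 request-fee arithmetic. Rates follow the tiers cited in
# the post ($0.01 per 1,000 writes/lists, $0.01 per 10,000 reads); the
# request volumes below are made-up examples, not Sonian's real numbers.
put_requests = 300_000_000        # writes/lists in a month
get_requests = 500_000_000        # reads in a month

put_fees = (put_requests / 1_000) * 0.01     # $3,000.00
get_fees = (get_requests / 10_000) * 0.01    # $500.00

print(f"PUT/LIST fees: ${put_fees:,.2f}  GET fees: ${get_fees:,.2f}")
```

At those volumes, even a small defect that doubles the write rate adds thousands of dollars to the bill before the billing period closes.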

Read more…

A 2007 Multi-Cloud Fantasy Becomes a 2012 Reality

Five years ago I wrote a business plan that described an archiving SaaS project built on cloud computing. In 2007 it was an uphill battle to convince prospective investors that “the cloud was the future.” And at that time there was really only one cloud, from the e-commerce giant Amazon. Amazon Web Services really started the modern cloud movement. No existing IT provider (IBM, HP, Microsoft, Dell, etc.) would have had the gumption to upset their current business model with a “disruptively priced” cloud option. For the past four years those IT giants fought the cloud momentum until they had credible clouds of their own. But for a lean start-up getting funded five years ago, it wasn’t a stretch to assume other clouds would appear to take on Amazon.

The graphic above was my crude way to visualize how a cloud-powered digital archive, anticipating someday living on multiple clouds, could in essence become a “cloud of clouds.” A lot of positive breakthroughs would need to occur to successfully operate a single reference-architecture software stack across more than one cloud. There was no terminology to describe this desire. We weren’t using terms like “Big Data” or “DevOps,” nor many of the acronyms that today are common lingo in our modern cloud-enabled world. The business plan depicted a system designed to manage lots of data, and being an enterprise document archive, the data itself was both large in size and numerous in quantity. We probably started one of the world’s first cloud big-data projects.

In the beginning the multi-cloud goal was a fantasy dream, a placeholder for a future that seemed possible, but the actual crawl, walk, run steps were not precisely defined because we didn’t yet know “what we didn’t know.”

So why in 2008 were we thinking about “multi-cloud?” The answer is we wanted to avoid single-vendor lock-in and maintain a modicum of control over our infrastructure costs. The notion of an evolving multi-cloud strategy meant the ability to seek the lowest cost of goods from multiple cloud vendors. In the pre-cloud IT world, when services were built on actual hardware, pricing flexibility was derived by negotiating better deals with hardware vendors. Customers didn’t know or care that their SaaS app might be powered by an HP server one day or a Dell 1U box the next. Those decisions were left to the discretion of the SaaS provider, which got the best infrastructure value by shopping vendors. But in a single cloud, when there is only one choice, there’s no ability to negotiate between multiple vendors, unless you have multi-cloud dexterity.

Multi-cloud capable means the necessary infrastructure and abstraction layer are available to run a single common reference architecture on different clouds at the same time, with one master operator console. Multi-cloud is almost like, but not exactly, the concept of running a common program across IBM, DEC, and Control Data mainframes. Today’s clouds somewhat resemble the massive time-sharing mainframes of previous decades.
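
To make “abstraction layer” a little less abstract, here is a minimal sketch of the idea: application code talks to one storage interface, and each cloud gets its own backend behind it. The class and method names are hypothetical illustrations, not Sonian’s actual architecture.

```python
# Hypothetical sketch of a multi-cloud storage abstraction layer.
from abc import ABC, abstractmethod


class ObjectStore(ABC):
    """One interface the application depends on, regardless of cloud."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> bytes: ...


class S3Store(ObjectStore):
    def put(self, key: str, data: bytes) -> None:
        ...  # call the AWS S3 API here

    def get(self, key: str) -> bytes:
        ...  # call the AWS S3 API here


class OtherCloudStore(ObjectStore):
    def put(self, key: str, data: bytes) -> None:
        ...  # call a second provider's object-storage API here

    def get(self, key: str) -> bytes:
        ...  # call a second provider's object-storage API here


def archive_message(store: ObjectStore, key: str, blob: bytes) -> None:
    # The archive logic never names a vendor, so the same reference
    # architecture can be deployed on more than one cloud at once.
    store.put(key, blob)
```

The hard part, of course, is everything the sketch hides: authentication, consistency semantics, and the pricing differences between backends.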

Our early start five years ago, and all the hard lessons learned since, allow us to more easily assume a commanding position in multi-cloud deployments. Engineering teams just now starting their “cloud journeys” will learn from us pioneers, but there is an old saying: “until you’ve walked a mile in my shoes, don’t claim to know otherwise.”

Read more…

The Problem with PaaS Pricing: Total Cost Uncertainty at Scale


Highlights of this post:

  • PaaS costs are difficult to predict at scale
  • IaaS costs are going down due to improved operational proficiency
  • Admin cost differences between IaaS and PaaS are negligible
  • PaaS should be less expensive to get better market traction


Here’s a handy decoder ring for all the acronyms in this post:

  • IaaS = Infrastructure as a Service. On-demand compute and storage typically available as an API call.
  • PaaS = Platform as a Service. On-demand, turn-key web service that abstracts away scaling and reliability, available as an API call.
  • AWS = Amazon Web Services
  • EC2 = AWS’s Elastic Compute web service
  • DIY = Do it Yourself
  • DevOps = Developer Operations… a new category for cloud systems management
  • GB-month = A pricing unit for cloud storage: the gigabytes stored, prorated by how long they were stored during the billing period, multiplied by the per-GB monthly rate (see the worked example after this list).
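
Here is a minimal worked example of the GB-month calculation, using an illustrative rate of $0.15 per GB-month; the volumes are made up for the sake of the arithmetic.

```python
# Worked GB-month example with assumed numbers: 600 GB stored for the whole
# month, plus 200 GB added halfway through a 720-hour billing period.
HOURS_IN_MONTH = 720
RATE_PER_GB_MONTH = 0.15                 # illustrative rate, not a price quote

gb_hours = 600 * 720 + 200 * 360         # prorate each chunk by hours stored
gb_months = gb_hours / HOURS_IN_MONTH    # 700 GB-months
storage_cost = gb_months * RATE_PER_GB_MONTH

print(f"{gb_months:.0f} GB-months -> ${storage_cost:.2f}")  # 700 -> $105.00
```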

In the past I have written about the pros and cons facing cloud architects when choosing between an IaaS or PaaS solution for critical application infrastructure. Take a moment and read that post, Balancing Infrastructure as a Service (IaaS) versus Platform as a Service (PaaS), which focuses on the trade-offs between IaaS flexibility and PaaS vendor lock-in. There I briefly mention PaaS pricing challenges, so I wanted to expand on that topic with a point of view on how current PaaS pricing schemes hinder adoption.

I’ll put my main theme right out here: Most PaaS solutions have a fundamental problem estimating operating costs at production scale.

There’s an implied “grand bargain” for cloud customers, who expect an economic advantage from choosing a cloud PaaS service over a comparable cloud IaaS equivalent. Anecdotally that seems true. When using PaaS you expect lower people and development costs. PaaS is supposed to provide a price advantage because extensive operational efficiencies lower costs: massive physical and human expense is spread across many, many customers. It’s a textbook example of economies of scale.

But widespread PaaS adoption is being hindered because cloud architects can’t wrap their minds around reliable cost estimation. Cost calculators, without real-world at-scale metrics, give a false sense of economic security.

Read more…

“Playing the Cloud” is the positive alternative to “Gaming the Cloud”

What’s the difference between these two commonplace bumper sticker slogans?

– War is not the answer…. End War
– Peace is the answer…. Make Peace

Both statements have the same good intention, except “end war” puts the emphasis on the relatively negative word “war,” while “make peace” puts the emphasis on a very positive thought. Words matter; just ask any political pollster how they craft their surveys and slogans, and you’ll learn the persuasive word science of Frank Luntz. Written words are the results of thought, thoughts are the results of “energy,” and putting energy out to the Universe creates results.

Internally at Sonian I coined the phrase “gaming the cloud” as a rallying cry to describe how we manage the cloud to our benefit. We’re not doing anything unsavory or nefarious, just architecting software and creating processes that take advantage of all the positive attributes of cloud computing. We’re innovating exactly as cloud infrastructure vendors hoped would happen when they welcomed ISVs onto their platforms. It’s the best of times to be creating cloud-powered software as a service.

So when I say “gaming the cloud,” what I really mean is seeking our best economic advantage. When we make the right software design decisions, we’re not only getting the best cost of goods, we’re also getting the best reliability. But I realize the phrase “gaming the cloud” carries an unintended negative pall. The mental image is not in the true spirit of our mission.

So what’s the positive alternative to “Gaming the Cloud?” …. Playing the Cloud.

“Playing the Cloud” to create superior economic advantage!

Deconstructing the new phrase, the pivotal word is “playing” as the antidote to “gaming.” Gaming has a negative connotation, while “playing” is neutral at worst, and can be used in a variety of ways as a play on words. In an economic sense there is “playing the market,” or “playing the ponies,” and in a mission sense there is “playing to win.”

The cloud is a system of pricing rules. Prior to the cloud, system architects thought in terms of “servers” as their building blocks. In the cloud, the building blocks are compute units and API calls. Servers have costs that are fairly easy to understand, since hardware was the reference standard for the past twenty years; architects could determine overall system cost by knowing how many servers they needed. In the cloud, compute units and API requests also have costs, but they are priced very differently from a piece of hardware, and cloud architects have a more difficult time figuring out total infrastructure expense.
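
The sketch below contrasts the two mental models: one fixed number for a server versus a sum of usage-priced building blocks. Every rate and volume is an illustrative assumption, not a quote from any price list.

```python
# Assumed numbers only: one amortized server cost versus the same month
# expressed as usage-priced cloud building blocks.
server_monthly = 1_200.00                 # amortized hardware + co-lo, known up front

compute_hours = 4 * 24 * 30               # four always-on instances for a month
compute_cost = compute_hours * 0.10       # $288.00

api_requests = 50_000_000
api_cost = (api_requests / 1_000) * 0.01  # $500.00

storage_gb_months = 800
storage_cost = storage_gb_months * 0.15   # $120.00

cloud_monthly = compute_cost + api_cost + storage_cost
print(f"server: ${server_monthly:,.2f}  cloud: ${cloud_monthly:,.2f}")
```

The server number stays fixed whether the system is busy or idle; the cloud number moves with every request the software makes, which is exactly why architects have to model usage, not just capacity.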

The future looks great: because the cloud is all about “infrastructure as code,” cloud-powered systems can be made self-aware of their own internal operating costs. That’s a dramatic paradigm shift from the old co-location days, and we’re witnessing this shift in real time as cloud adoption rises.

So when you hear the phrase “gaming the cloud” don’t imagine the dark-side… think about the positive alternative “Playing the Cloud” for superior economic advantage.

Gaming the Cloud: Balancing IaaS versus PaaS

This is the third post in the Gaming the Cloud series. In the first two posts I wrote about having the right use case and the need for cost-aware applications in order to “win in the cloud.” These are important initial steps, at a conceptual level, to adopting a cloud computing model. This post dives deeper into the tools available and how to balance cost versus control and cloud vendor “lock-in.”

A robust discussion about cloud computing should cover the two ways of consuming the cloud: Infrastructure as a Service (IaaS) and Platform as a Service (PaaS). Let’s define these now and then compare and contrast the optimal way to utilize IaaS and PaaS in our pursuit to game the cloud.

Cloud computing in 2011 is mostly used to support large-scale web applications. Over time the cloud will start to power traditional IT, but for now big SaaS apps are where the innovation is occurring. In terms of the software and infrastructure required to power software as a service (SaaS) systems, state-of-the-art thinking has evolved as we have watched SaaS delivered from non-cloud co-location morph into cloud IaaS and then finally into cloud PaaS. Cloud PaaS is at odds with human nature’s desire for more control. But having more control also comes at a cost, and the industry is collectively reconciling the most efficient way to balance cost versus control. The pros and cons of Platform as a Service are at the center of this debate.

Certainly the cloud is now seen as a credible provider of IT services, and the sniping between the no-cloud “Co-Lo” naysayers and cloud supporters is subsiding. But now that we’re firmly in the pure-cloud world, another skirmish is brewing: IaaS versus PaaS. IaaS today occupies the position co-location held just five years ago. Accelerating technology evolution will make PaaS the new accepted standard in the next few years, and the industry will look back on 2011 and wonder what all the fretting over IaaS versus PaaS was about. The debate can be summarized as:

If all things are equal in terms of raw material costs, and there is no fear of vendor “lock-in,” then what’s the best choice to maximize time and effort?

Comparing IaaS versus PaaS with respect to cost, control and lock-in, along the same continuum, looks like this:

One advantage IaaS has over PaaS is more predictable infrastructure cost estimating. Compute and storage are easier to model when the building blocks are CPU hours and gigabytes consumed per month. Extrapolating PaaS costs is a more challenging exercise because the cost models are multi-dimensional. Over time multi-dimensional pricing will become a benefit, since software written specifically for PaaS systems can operate more efficiently. With PaaS there is a one-time learning curve to master the APIs and operational characteristics; that one-time investment will pay dividends for years. IaaS also has a learning curve, but it’s less steep than PaaS’s, and IaaS carries long-term operational costs that never go away.
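
To illustrate the difference in dimensionality, here is a hedged sketch: the IaaS estimate collapses to two inputs, while the PaaS estimate needs several. The PaaS dimensions and every rate shown are hypothetical stand-ins, not any vendor’s actual price sheet.

```python
# IaaS: two dimensions that capacity planning can forecast directly.
def iaas_estimate(instance_hours: float, gb_stored: float,
                  hourly_rate: float = 0.10, gb_rate: float = 0.15) -> float:
    return instance_hours * hourly_rate + gb_stored * gb_rate


# PaaS: multi-dimensional, and several inputs depend on workload behavior
# you may not know until you are running at production scale.
def paas_estimate(requests: int, gb_stored: float, gb_transferred: float,
                  indexed_items: int, per_1k_requests: float = 0.01,
                  gb_rate: float = 0.125, transfer_rate: float = 0.12,
                  per_item: float = 0.00001) -> float:
    return ((requests / 1_000) * per_1k_requests
            + gb_stored * gb_rate
            + gb_transferred * transfer_rate
            + indexed_items * per_item)
```

The IaaS inputs come straight from a capacity plan; the PaaS inputs are properties of how the application behaves under real load, which is exactly where cost calculators fall short.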

PaaS allows teams to focus on core domain expertise and not get bogged down “fighting yesterday’s fires.”

Within a specific cloud ecosystem, Amazon Web Services for example, we can game each of the building-block components (S3, EC2, EBS, RDS, SDB) to achieve the best cost/performance advantage.

Read more…

Gaming the Cloud: You Need “Cost Aware” Applications

This is the second post in my “Gaming the Cloud” series. You can read the first post here.

There are two primary reasons to adopt cloud computing for SaaS applications: one, save money, and two, be more reliable (and the great aspect of the cloud is that both can be achieved with the same engineering effort). There is no reason to use cloud computing unless you have a “cost aware” application. If you use the cloud to power traditional enterprise software you won’t save money, and you will probably be less reliable too. See my previous post in the “Gaming the Cloud” series about having the right use case. Go read it now and come back. Do you have the right use case? If so, let’s continue. After validating your use case, the next step in your cloud journey is to design your application to be elastic and at the same time “cost aware.”

So what exactly is a cost aware application and why should you care?

In the old world of SaaS, using traditional co-located data centers and co-mingled hardware, it was nearly impossible to figure out at a granular level how much each software component cost to run. With the cloud this all changes in a very positive way. Because compute and storage are consumed in small units, and each of these units has a cost (for example, compute at ten cents an hour or storage at fifteen cents a gigabyte), we are now required to think about software designs that focus on operational efficiency, because we can measure costs at the atomic level. When I started Sonian in 2007, with a mandate to be purely cloud focused, we had access to one CPU type. Very quickly, our cloud provider Amazon offered more CPU variety, and our reference architecture matured in real time as we were able to optimize the software to match virtual compute units that had more memory, more cores, or both.
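
Here is a minimal sketch of that “atomic” accounting, using the unit prices mentioned above as illustrative constants. The component and its usage numbers are invented for the example.

```python
# Per-component cost attribution, with the illustrative unit prices from
# the post: compute at $0.10/hour, storage at $0.15/GB-month.
COMPUTE_PER_HOUR = 0.10
STORAGE_PER_GB_MONTH = 0.15


def component_cost(cpu_hours: float, gb_stored: float) -> float:
    """Estimated monthly cost attributable to one software component."""
    return cpu_hours * COMPUTE_PER_HOUR + gb_stored * STORAGE_PER_GB_MONTH


# Example: a hypothetical indexing component that burns 1,500 CPU-hours
# and keeps 2,000 GB of data costs roughly $450 for the month.
print(f"${component_cost(1_500, 2_000):,.2f}")
```

In a co-located data center the same question ("what does the indexer cost us per month?") had no crisp answer, because the hardware was shared and the invoice arrived as one lump sum.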

A cost aware application is software with an inherent design to “game the cloud” and be ultra-efficient on every transaction. In essence this means granular workload management: the ability to right-size the CPU profile for each task and to take advantage of the long-term cost-management features offered by the cloud infrastructure providers.
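
As a small illustration of right-sizing, the sketch below picks the cheapest instance profile that satisfies a task’s requirements. The instance names and hourly prices are made up for the example; they are not a real AWS price list.

```python
# Hypothetical instance catalog; names and rates are illustrative only.
INSTANCE_TYPES = [
    {"name": "small",    "cores": 1, "ram_gb": 2,  "hourly": 0.08},
    {"name": "high-cpu", "cores": 8, "ram_gb": 7,  "hourly": 0.66},
    {"name": "high-mem", "cores": 4, "ram_gb": 34, "hourly": 0.50},
]


def right_size(min_cores: int, min_ram_gb: int) -> dict:
    """Return the cheapest profile that meets the task's requirements."""
    candidates = [t for t in INSTANCE_TYPES
                  if t["cores"] >= min_cores and t["ram_gb"] >= min_ram_gb]
    return min(candidates, key=lambda t: t["hourly"])


# A memory-hungry indexing task lands on "high-mem"; a parse-heavy task
# that needs cores but little RAM lands on "high-cpu".
print(right_size(min_cores=2, min_ram_gb=16)["name"])  # high-mem
print(right_size(min_cores=8, min_ram_gb=4)["name"])   # high-cpu
```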

Read more…

Gaming the Cloud: Start with the Right “Cloud” Use Case

The other day I met with a group of Boston start-up CTOs to share ideas on technology and team building. Typically these discussions veer into helping each other with the technical challenges of scaling SaaS software. All of the companies present are pushing the envelope with big data, large subscriber bases, and lots of cloud infrastructure to manage. Many great ideas were exchanged, but one major theme resonated: the cloud may not be the best place for every kind of software stack. The group represents many different use cases: big-data archiving, analytics, social networking for niche audiences, video encoding, application performance analysis, advertising sales networks, and others.

To understand how we got to this place today, let’s step into the time-machine elevator and press the button for “Year 2008.” Down, down we go, and the doors open onto “the cloud” (and specifically Amazon Web Services) just coming onto the tech scene. All of a sudden software architects and developers could control their own infrastructure (and not have to hassle with hardware sales reps or CFOs and their purchase orders). We technologists with big ideas, myself included, “projected” our hopes and dreams onto the cloud as the panacea for all our infrastructure needs. We understood, for the most part, that utilizing the cloud would require new ways of thinking about software architectures. We also had real-world time-to-market pressures to get our service running quickly and start proving out business viability. The cloud has tremendous potential to help accomplish great things, but used incorrectly, the cloud can cause a lot of harm, both financial and to technical stability.

In the “gold rush” to the cloud, we (the nascent start-ups pioneering in the cloud) were all learning the same “lessons” at roughly the same time: how and when to scale, how to analyze costs and efficiencies, the need for different kinds of monitoring and alerting, how to deploy software updates to a cloud-based environment; the list goes on. With many lessons learned, the theme of mastering the cloud tilted more toward “let’s figure out how to game the cloud.” Because there’s no reason to use the cloud unless you can make the economics work, and making the economics work requires a mindset to “game the cloud”: process more data and transactions at the lowest cost. Easy to write, but very technically challenging, and rewarding, to pull off.

Read more…