Cloud Success Requires Cost-aware Engineering

This is a true story from the “Cloud Cost Czar Chronicles.”

Our S3 “penny per one thousand” API costs started to rise rapidly in the second half of the cloud infrastructure billing period. We had seen this behavior before and knew it could be attributed to increased usage, a new defect, or a design flaw that rears its head at a scaling tipping point. My job as “cost czar” is to raise the alarm and work with the team to figure out what is going wrong. At the observed rate of increase, the excess charges would push the monthly bill beyond the budget. One thing we have learned in the cloud is that costs can rise quickly but take a while to come down: when you are trying to manage expense within a single billing period, the deceleration can lag far behind the acceleration.

When we started using Amazon Web Services S3 (a PaaS object store) back in 2007, we were acutely aware of the three pricing vectors in effect: storage consumed, the price of API calls to store and list data, and the price of API calls to read and delete data. We’ve been using S3 heavily for five years, and we tried to model the “all-in” costs as accurately as possible. But “guesstimating” costs beyond the raw storage was a stretch. PaaS services have an intrinsic “social engineering” element. If you color outside the lines, the financial penalty can be significant. But if you master the pricing game, the rewards are equally significant. So five years ago we thought that as long as we pointed in the right general direction, “we’ll figure it out as we go along.” Some assumptions proved a positive surprise: raw storage costs went down. Other surprises were not so pleasant: continually wrangling the API usage fees, especially the transactions that cost a penny per thousand, proved to be a constant challenge. But I still like my options with S3 compared to buying storage from a hardware vendor and incurring the administrative overhead. With S3 we can lower our costs through smarter engineering. With storage hardware, the only way to lower costs is to wrangle a better deal from an EMC salesperson. As one of the original “cloud pioneers,” Sonian is not alone in this effort, and it’s been a real eye-opener for software designers to have to think about how their code consumes cloud resources (and expense) at scale. Whether it’s a penny per thousand or a penny per ten thousand, when you are processing hundreds of millions of transactions a month, any miscalculation quickly brings a dark cloud raining over your project.
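
The three pricing vectors can be folded into a simple monthly estimate. The sketch below is a minimal Python model, not Sonian’s actual tooling: the names (`Rates`, `Usage`, `monthly_bill`) are hypothetical, and the default rates are illustrative assumptions (fifteen cents per GB-month, a penny per thousand store/list calls, a penny per ten thousand read/delete calls), so substitute the provider’s current published prices.

```python
from dataclasses import dataclass


@dataclass
class Rates:
    """Illustrative unit prices in US dollars (assumed, not current list prices)."""
    storage_per_gb_month: float = 0.15
    store_list_per_1000: float = 0.01     # "penny per one thousand"
    read_delete_per_10000: float = 0.01   # assumed "penny per ten thousand"


@dataclass
class Usage:
    gb_months: float
    store_list_calls: int
    read_delete_calls: int


def monthly_bill(usage: Usage, rates: Rates = Rates()) -> dict:
    """Break one billing period's cost down by pricing vector."""
    storage = usage.gb_months * rates.storage_per_gb_month
    store_list = usage.store_list_calls / 1_000 * rates.store_list_per_1000
    read_delete = usage.read_delete_calls / 10_000 * rates.read_delete_per_10000
    return {
        "storage": storage,
        "store_list": store_list,
        "read_delete": read_delete,
        "total": storage + store_list + read_delete,
    }


if __name__ == "__main__":
    # Hypothetical month: 100,000 GB-months stored, 300M store/list calls, 500M reads.
    bill = monthly_bill(Usage(gb_months=100_000, store_list_calls=300_000_000,
                              read_delete_calls=500_000_000))
    for vector, dollars in bill.items():
        print(f"{vector:>12}: ${dollars:,.2f}")
```

With these made-up volumes the request vectors add about $3,500 to a $15,000 storage charge, and they scale with transaction count rather than bytes stored, which is why a defect that multiplies API calls can move the bill without any growth in data.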


The Problem with PaaS Pricing: Total Cost Uncertainty at Scale

Highlights of this post:

  • PaaS costs are difficult to predict at scale
  • IaaS costs are going down due to improved operational proficiency
  • Admin cost differences between IaaS and PaaS are negligible
  • PaaS should be less expensive to get better market traction

Here’s a handy decoder ring for all the acronyms in this post:

  • IaaS = Infrastructure as a Service. On-demand compute and storage typically available as an API call.
  • PaaS = Platform as a Service. On-demand turnkey web service that abstracts scaling and reliability, available as an API call.
  • AWS = Amazon Web Services
  • EC2 = AWS’s Elastic Compute Cloud web service
  • DIY = Do it Yourself
  • DevOps = Development and Operations: a new category for cloud systems management
  • GB-month = A pricing mechanism for cloud storage: the average amount stored during the billing period (gigabytes stored multiplied by hours stored, divided by the hours in the month), which is then multiplied by the unit cost per GB-month.
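
A short sketch of the GB-month arithmetic, assuming usage is sampled once per hour and the bill charges the month’s average storage (verify the exact metering rules against your provider’s billing documentation). The function names and the fifteen-cent rate are illustrative.

```python
def gb_months(hourly_gb, hours_in_month):
    """Average storage over the billing period, in GB-months.

    hourly_gb holds the gigabytes stored during each hour of the month;
    the GB-hours are summed and divided by the hours in the month.
    """
    if len(hourly_gb) > hours_in_month:
        raise ValueError("more hourly samples than hours in the month")
    return sum(hourly_gb) / hours_in_month


def storage_charge(hourly_gb, hours_in_month, price_per_gb_month):
    """Storage cost for the month: GB-months times the unit price."""
    return gb_months(hourly_gb, hours_in_month) * price_per_gb_month


if __name__ == "__main__":
    hours = 30 * 24  # a 30-day month
    # 1,000 GB for the first 15 days, then 2,000 GB for the last 15 days.
    samples = [1_000] * (hours // 2) + [2_000] * (hours // 2)
    print(gb_months(samples, hours))              # 1500.0
    print(storage_charge(samples, hours, 0.15))   # illustrative rate
```

Doubling the stored data halfway through the month bills as 1,500 GB-months, not 2,000, because the unit measures time-weighted storage.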

In the past I have written about the pros and cons facing cloud architects when choosing between an IaaS and a PaaS solution for critical application infrastructure. Take a moment to read that post, Balancing Infrastructure as a Service (IaaS) versus Platform as a Service (PaaS), which focuses on the trade-offs between IaaS flexibility and PaaS vendor lock-in. There I briefly mentioned PaaS pricing challenges, so I wanted to expand on that topic with a point of view on how current PaaS pricing schemes hinder adoption.

I’ll put my main theme right up front: with most PaaS solutions, it is fundamentally difficult to estimate operating costs at production scale.

There’s an implied “grand bargain” for cloud customers, who expect an economic advantage from choosing a cloud PaaS service over a comparable IaaS equivalent. Anecdotally, that seems true. When using PaaS you expect lower people and development costs. PaaS is supposed to deliver a price advantage because its extensive operational efficiencies lower costs: massive physical and human expenses are spread across many, many customers. It’s a textbook example of “economies of scale.”

But widespread PaaS adoption is being hindered because cloud architects can’t wrap their minds around reliable cost estimation. Cost calculators without real-world, at-scale metrics give a false sense of economic security.
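
To see why a calculator can give false comfort, consider a single uncertain input. The hypothetical sketch below assumes request fees of a penny per thousand calls and varies only the number of billable calls each stored object ends up generating; the object count and multipliers are made up for illustration.

```python
def request_fees(objects_per_month, calls_per_object, price_per_1000=0.01):
    """Monthly API fees given how many billable calls each object triggers."""
    return objects_per_month * calls_per_object / 1_000 * price_per_1000


if __name__ == "__main__":
    objects = 100_000_000  # hypothetical monthly ingest
    # A calculator asks for one number; production behavior (retries, listings,
    # metadata rewrites) decides the real multiplier.
    for calls in (1, 3, 10):
        print(f"{calls:>2} calls/object -> ${request_fees(objects, calls):,.0f}/month")
```

The estimate moves tenfold, from $1,000 to $10,000 a month, on an input that is hard to know before the system runs at scale.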


The Evolution of Purchasing Cloud Compute

In the beginning, as it were, just five years ago, purchasing cloud compute was simple because the rules were easy to understand and there were no choices. There was one cloud compute instance type that cost ten cents an hour. And a credit card was the only way to pay your monthly cloud compute bill.

Today there are myriad compute instance types to choose from and multiple ways to pay for your cloud CPU time.

“Time…”

It’s the key word in the previous statement. The paradigm shift away from the old dedicated co-lo world toward cloud computing is bringing back the concept of purchasing compute “time.” Seasoned IT folks know there is nothing new about “buying” computer time. In the mainframe era (would it be hurtful to call it the IT Jurassic age?), computer time-sharing was the norm, and developers had to be mindful of how much computer time their programs consumed because mainframes were very expensive: the equivalent of hundreds of dollars per hour in today’s dollars.

Steve Wozniak, in his autobiography iWoz, tells a funny-turned-serious anecdote from his freshman year at the University of Colorado at Boulder. He couldn’t return for his sophomore year because a program he ran on the university’s time-share mainframe consumed excessive shared compute resources. The fees charged to his computer science department were astronomical for the era: more than $10,000.

Today’s cloud has the same gotchas: Watch out for excessive consumption. Once an hour has been consumed, there is no “return policy” to get your money back.

The cloud is pulling us back to that “time-sharing” mindset. We’re not buying 1U servers anymore. Instead, we’re buying virtual compute processing time based on how many hours in a month a CPU runs, regardless of how much work is performed.

In the cloud, time is the unit of consumption and the month is the billing period.
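
Because the meter counts hours rather than work, how jobs are scheduled affects the bill. A minimal sketch, assuming ten cents per instance-hour and that any partial hour is billed as a full hour (the per-hour model of the period described here; billing granularity varies by provider and has changed over time, so check current terms). The function names are hypothetical.

```python
import math


def billed_hours(runtime_minutes: float) -> int:
    """Instance-hours charged for one run, assuming partial hours bill as full hours."""
    return math.ceil(runtime_minutes / 60)


def compute_cost(run_minutes, hourly_rate=0.10):
    """Total charge for a list of instance runs; work performed does not enter into it."""
    return sum(billed_hours(m) for m in run_minutes) * hourly_rate


if __name__ == "__main__":
    # Ten separate 61-minute jobs versus the same work batched on one instance.
    print(compute_cost([61] * 10))   # 20 billed hours
    print(compute_cost([610]))       # 11 billed hours
```

Under these assumptions the same 610 minutes of work costs $2.00 as ten short jobs and $1.10 when batched, because each 61-minute job pays for two hours.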


Emerging BRIC Companies Will Leap to Cloud IT

I’ve been thinking about penning this post for a while. Amazon’s new South American cloud region spurred me to finally sit down and write it.

The basic theme is this: The trend we saw with world-wide mobile phone adoption is about to repeat for global enterprise IT.

In the 1980s the United States led mass mobile telephone adoption with the rollout of analog service across the country. Long-forgotten providers like Cellular One, McCaw Cellular and others invested billions building networks of towers connected by backhaul data lines. Motorola sold analog handsets (remember the “bag phone”?) to use with these new networks. The rest of the world experimented and watched the US rollout.

In the mid-1990s Europe and Japan began implementing widespread mobile telephone network infrastructure. But instead of using analog technology, they leaped directly to digital. They didn’t have a legacy of analog investment to amortize, and were able to take advantage of new digital features like texting and multimedia messaging on “day one” of their new cell phone services.

The US endured a protracted digital rollout because so much had been invested in the prior generation’s technology. For many years US mobile handsets were tri-mode so they could roam onto analog networks while the digital networks matured. This is a classic example of the “fast follower” leaping over “the pioneer.”


Cloud Innovation Acceleration Effect: Now Releasing 100 Stories

Cross-posting here a two-part essay I wrote for the Sonian blog on how Sonian is benefiting from, and contributing to (by amplification), the innovation cadence in cloud computing.

I’ve been working in enterprise software since the late 1980s, and what I am witnessing as a participant in “the cloud” is that the pace of cloud technology innovation over the past five years blows away the previous two decades.

There is a real, noticeable trend here. We didn’t see this in SaaS powered by co-location hosting. What we are seeing with the cloud, and with the ISVs that adopted the cloud five years ago, is truly amazing. Sonian is moving to a release cadence that updates production systems with substantial new features every month.

Cloud Innovation – Part 1

  • Innovation history of Amazon Web Services 2005-2007
  • How Sonian amplifies cloud innovation
  • Sonian as an example of the “perfect” cloud ISV

Cloud Innovation – Part 2

  • Innovation history of Amazon Web Services 2008-2011
  • Comments about Gov Cloud

A Brief History: Cloud CPU Costs Over the Past 5 Years

When I started Sonian in 2007, one of the driving forces for beginning what would become my third start-up journey was the allure of all-you-can-consume “ten-cent-per-hour” cloud computing. Amazon Web Services was the new IT game changer in town, and the on-demand compute platform it launched in August 2006 brought cloud computing to the masses overnight. For the past five years I have been studying “cloud costs” in different ways, and this weekend I looked back at the compute pricing history and uncovered some interesting trends.

Before I continue with this post, here’s a brief history of my experience with previous “clouds,” which illustrates why in 2006 I was ready to take a big leap into the AWS cloud as an early adopter.

In 2004 I was involved with another SaaS information archiving project, and I worked with a team at SUN Microsystems to create a reference architecture for our archive software stack to live on the “SUN Utility Compute Grid.” At the time we were hosting the archiving software on dedicated co-located hardware racks and planning a large capital expenditure to increase capacity. In the spirit of “there has to be a better way!” we entertained the idea of moving our software to SUN’s “cloud.” (In 2004 the term cloud computing was pretty alien; the common term for this type of shared virtual computing was “utility computing.”) SUN offered the promise of true utility computing, but after six months of effort we could not make the underlying cost structures work. SUN was charging one dollar per CPU hour and one dollar per gigabyte per month for storage. We ended up adding more capacity to our existing co-located hardware plant because our “all-in” internal unit costs were less than what SUN was willing to sell its compute grid for.

Now back to the purpose of this post… a historical analysis of the cost of cloud CPU from 2006 through 2011.

Beginning in August 2006, Amazon Web Services’ new Elastic Compute Cloud (EC2) service introduced the concept of the EC2 Compute Unit (ECU): a standardized way to define a unit of cloud computing, the associated characteristics of that unit (processor speed and memory), and a revolutionary hourly cost model requiring no up-front expense. Amazon achieved what SUN, IBM and others had been talking about for years but could never bring to market. In 2006, 1 EC2 Compute Unit could be rented for ten cents per hour with no up-front costs. A single ECU was defined as equivalent to a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor with 1.7 GB of RAM. This 1 ECU reference is still in effect today.
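
The two price lists can be compared directly with a little arithmetic. This sketch assumes one SUN CPU-hour is comparable to one EC2 Compute Unit hour (a simplification, since the underlying hardware differed) and uses S3’s launch storage price of fifteen cents per GB-month; the workload and the function name are hypothetical.

```python
HOURS_PER_MONTH = 730  # 8,760 hours in a year divided by 12


def utility_bill(cpu_hours, gb_months, cpu_rate, storage_rate):
    """Monthly cost under a simple per-CPU-hour plus per-GB-month price list."""
    return cpu_hours * cpu_rate + gb_months * storage_rate


if __name__ == "__main__":
    cpu_hours = 10 * HOURS_PER_MONTH   # hypothetical: ten CPUs busy all month
    stored_gb = 1_000                  # hypothetical: 1,000 GB kept all month
    sun_2004 = utility_bill(cpu_hours, stored_gb, cpu_rate=1.00, storage_rate=1.00)
    aws_2006 = utility_bill(cpu_hours, stored_gb, cpu_rate=0.10, storage_rate=0.15)
    print(f"SUN grid pricing: ${sun_2004:,.0f}/month")
    print(f"EC2 + S3 pricing: ${aws_2006:,.0f}/month")
```

For this made-up workload the result is $8,300 versus $880 a month, roughly a ninefold gap before counting data transfer or request fees.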

State of Cloud Computing in Europe

I have just returned from a week in the United Kingdom meeting Sonian customers and business partners. The purpose of the trip was to expand Sonian relationships, but an added benefit was the opportunity to glean perspectives on, and attitudes toward, cloud computing adoption in the greater EU market. Sonian is in a unique position to observe cloud adoption trends since our SaaS service is powered by true cloud computing infrastructure, and conversations with EU business and technology leaders revealed their true thoughts about the state of cloud computing and indicators of adoption curves in 2012 and beyond.

The Buying Market

The EU market is not one single cohesive market, but rather smaller subsets that share some ideas and diverge on others. The UK and Ireland are (as you would expect) similarly aligned, and appear to share more in common with the Scandinavian countries than with Germany and France, which have their own country-centric view of the cloud. The French language institute can’t even agree on what to call “cloud computing” in France, settling on “informatique en nuage” as a placeholder while still searching for a uniquely French term that doesn’t break its rules on language purity and consistency. (Other examples: software development is called “software addition,” and the people who create software are called “software editors,” since they literally “edit” source code files.) From my observations, the UK, Ireland and the Scandinavian countries share more in common with US thinking about the cloud than Germany, France, Italy and Spain, which diverge on a number of key issues around data locality.

The total addressable information technology market in the EU is roughly equal to that of the US. But instead of a single national set of business rules, the EU market is fractured into separate countries, languages, tax systems and local business customs. This fragmentation dramatically reduces the business efficiency of technology providers attempting to serve the EU community. The cloud could be seen as an antidote to that inefficiency. Imagine an “EU Cloud” operating in a locality that pleases all consumers and is the trusted provider. But expecting a single EU cloud to be accepted feels like a stretch goal given the current barriers to a cohesive EU business strategy.

The Role of Government

The role of government in EU countries is more pronounced than in the United States, but there is no evidence yet that EU governments are pushing cloud computing as a generic trend onto the private sector. The UK government recently established its “G-Cloud” initiative, which looks similar to the US government’s “Cloud.gov” and Data.gov initiatives. This trend could be described as “lead by example,” with central government adopting cloud computing as proof that it’s safe, cost-effective and viable for the private sector. Meanwhile, a myriad of data handling regulations seek to enforce “privacy” and “resiliency” to ensure citizens are protected from unauthorized access to personal information.


The Secret Life of a Cloud Cost Control Czar

It’s a little after 7 in the morning and I tap the space bar to wake my MacBook Air from its slumber. I click a tab in Chrome, hit refresh, and with a slight pang of “what will I see,” look at the balance for our October cloud infrastructure bill. You may know this feeling: think about a recent time you opened the credit card bill dreading an unwanted surprise.

“I don’t think I overspent this month, but… there was that steak dinner in New York City …”

Monitoring the rate of spend to make sure we’re not going to break the budget is one task in my routine as the “cloud cost czar.” It’s a daily task to track the trend lines and sound the alarm if expenses start to creep off plan.
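
The daily check can be expressed as a run-rate projection. This is a minimal sketch, not the actual Sonian dashboard: it assumes daily spend figures are available (for example, exported from the provider’s billing data), and `project_month_end`, `budget_alert`, the seven-day window, and the dollar amounts are all hypothetical.

```python
import calendar
from datetime import date


def project_month_end(daily_spend, today, window=7):
    """Project month-end spend: month-to-date plus a trailing daily run rate.

    daily_spend holds one dollar figure per elapsed day (day 1 through today).
    A trailing window reacts to a mid-month acceleration sooner than the
    month-to-date average would.
    """
    if len(daily_spend) != today.day:
        raise ValueError("expected one spend figure per elapsed day")
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    recent = daily_spend[-window:]
    run_rate = sum(recent) / len(recent)
    return sum(daily_spend) + run_rate * (days_in_month - today.day)


def budget_alert(daily_spend, today, monthly_budget, window=7):
    """Return an alarm message if the projection exceeds the budget, else None."""
    projection = project_month_end(daily_spend, today, window)
    if projection > monthly_budget:
        return (f"ALERT: projected ${projection:,.0f} "
                f"exceeds budget ${monthly_budget:,.0f}")
    return None


if __name__ == "__main__":
    # Hypothetical October: $1,000/day for 13 days, then $1,500/day for 7 days.
    spend = [1_000] * 13 + [1_500] * 7
    print(budget_alert(spend, date(2011, 10, 20), monthly_budget=35_000))
```

In this made-up month the projection lands at $40,000 against a $35,000 budget, so the alarm fires on day 20 while eleven days remain to react.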

The “czar” is the human gas pedal, modulating the enormous pulsing “cloud software engine” as we process half a terabyte of data a day.

But I am not a solo act in this cloud pageant. Making the cloud work from an economic perspective is a total team effort. It starts in the engineering group with cloud-appropriate core architecture designs, and continues with quality testing and then with the team that manages daily operations. Everyone plays a role and has responsibility for our prime directive: process the most data at the least cost, without sacrificing customer satisfaction.

“Obsessively” managing cost is one of our three design requirements, alongside reliability and performance. “Gaming the cloud,” our internal slang for everything we do to maximize efficiency, is a multi-disciplinary effort the engineering and service delivery teams rally around. But there has to be at least one person who focuses on the trends, from the 30,000-foot view down to sea level: the Cost Czar.


Gaming the Cloud: Balancing IaaS versus PaaS

This is the third post in the Gaming the Cloud series. In the first two posts I wrote about having the right use case and the need for cost-aware applications in order to “win in the cloud.” These are important initial steps, at a conceptual level, to adopting a cloud computing model. This post dives deeper into the tools available and how to balance cost versus control and cloud vendor “lock-in.”

A robust discussion about cloud computing should cover the two ways of consuming the cloud: Infrastructure as a Service (IaaS) and Platform as a Service (PaaS). Let’s define these now and then compare and contrast the optimal way to utilize IaaS and PaaS in our pursuit to game the cloud.

Cloud computing in 2011 is mostly used to support large-scale web applications. Over time the cloud will start to power traditional IT, but for now big SaaS apps are where the innovation is occurring. In terms of the software and infrastructure required to power software as a service (SaaS) systems, state-of-the-art thinking has evolved: SaaS delivered from non-cloud co-location morphed into cloud in IaaS livery, and then finally into cloud with PaaS. Cloud PaaS is at odds with the human desire for more control. But more control also comes at a cost, and the industry is collectively working out the most efficient way to balance cost versus control. The pros and cons of Platform as a Service are at the center of this debate.

Certainly the cloud is now seen as a credible provider of IT services, and the sniping between the no-cloud “Co-Lo” naysayers and cloud supporters is subsiding. But now that we’re firmly in the pure cloud world, another skirmish is brewing: IaaS versus PaaS. IaaS today occupies the place Co-Lo held just five years ago. Accelerating technology evolution will make PaaS the new accepted standard in the next few years, and the industry will look back on 2011 with the attitude “what was all the fretting over IaaS versus PaaS?” The debate about IaaS versus PaaS can be summarized as:

If all things are equal in terms of raw material costs, and there is no fear of vendor “lock-in,” then what’s the best choice to maximize time and effort?

Comparing IaaS versus PaaS with respect to cost, control and lock-in, along the same continuum, looks like this:

One advantage IaaS has over PaaS is more predictable infrastructure cost estimation. Compute and storage are easier to model when the building blocks are CPU hours and gigabytes consumed per month. Extrapolating PaaS costs is a more challenging exercise because the cost models are multi-dimensional. Over time multi-dimensional pricing will be a benefit, since software written specifically for PaaS systems can operate more efficiently. With PaaS there is a one-time learning curve to master the APIs and operational characteristics, and this one-time investment pays dividends forever. IaaS also has a learning curve, but it’s less steep than PaaS, and IaaS also carries long-term operational costs that do not go away.
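
The learning-curve argument is a payback calculation. A minimal sketch, assuming each option can be summarized by a one-time learning cost and a steady monthly cost that includes people and operations; the function name and every figure are hypothetical, and, as the paragraph above notes, the PaaS monthly number is the hardest one to pin down.

```python
def breakeven_month(paas_upfront, paas_monthly, iaas_upfront, iaas_monthly, horizon=60):
    """First month in which cumulative PaaS spend is at or below cumulative IaaS spend.

    *_upfront: one-time learning-curve cost; *_monthly: recurring platform plus
    operations cost. Returns None if PaaS does not catch up within the horizon.
    """
    for month in range(1, horizon + 1):
        paas_total = paas_upfront + paas_monthly * month
        iaas_total = iaas_upfront + iaas_monthly * month
        if paas_total <= iaas_total:
            return month
    return None


if __name__ == "__main__":
    # Hypothetical: PaaS costs $40k more to learn but saves $8k/month in operations.
    print(breakeven_month(paas_upfront=60_000, paas_monthly=20_000,
                          iaas_upfront=20_000, iaas_monthly=28_000))
```

With these inputs the extra PaaS learning cost is recovered in month 5; if the PaaS monthly figure is underestimated, the break-even point recedes or disappears entirely.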

PaaS allows teams to focus on core domain expertise and not get bogged down “fighting yesterday’s fires.”

Within a specific cloud ecosystem, Amazon Web Services for example, we can game each of the building block components (S3, EC2, EBS, RDS, SDB) to achieve the best cost / performance advantage.

Gaming the Cloud: You Need “Cost Aware” Applications

This is the second post in my “Gaming the Cloud” series. You can read the first post here.

There are two primary reasons to adopt cloud computing for SaaS applications: one, save money, and two, be more reliable (and the great aspect of the cloud is that both can be achieved with the same engineering effort). There is no reason to use cloud computing unless you have a “cost aware” application. If you use the cloud to power traditional enterprise software you won’t save money, and will probably be less reliable too. See my previous post in the “Gaming the Cloud” series about having the right use case. Go read it now and come back. Do you have the right use case? If so, let’s continue. After validating your use case, the next step in your cloud journey is to design your application to be elastic and, at the same time, “cost aware.”

So what exactly is a cost aware application and why should you care?

In the old world of SaaS, using traditional co-located data centers and co-mingled hardware, it was nearly impossible to figure out at a granular level how much each software component cost to run. With the cloud this all changes in a very positive way. Compute and storage are consumed in small units, and each unit has a cost (for example, compute at ten cents an hour or storage at fifteen cents per gigabyte per month). Because we can now measure costs at the atomic level, software designs must focus on operational efficiency. When I started Sonian in 2007, with a mandate to be purely cloud focused, we had access to one CPU type. Very quickly, our cloud provider Amazon offered more CPU variety, and our reference architecture matured in real time as we optimized the software to match virtual compute units with more memory, more cores, or both.
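
Metering at the atomic level makes per-component accounting straightforward. The sketch below is hypothetical: it assumes usage records already carry a component label (for example, from resource tags), and the component names and quantities are invented; the ten-cents-per-hour and fifteen-cents-per-GB-month prices come from the paragraph above.

```python
from collections import defaultdict

# Unit prices from the text: ten cents per instance-hour, fifteen cents per GB-month.
PRICES = {"instance_hours": 0.10, "gb_months": 0.15}


def cost_by_component(usage_records):
    """Price metered usage and total it per software component.

    usage_records: iterable of (component, resource, quantity) tuples, where
    resource is a key of PRICES.
    """
    totals = defaultdict(float)
    for component, resource, quantity in usage_records:
        totals[component] += quantity * PRICES[resource]
    return dict(totals)


if __name__ == "__main__":
    records = [
        ("indexer", "instance_hours", 7_300),   # hypothetical components
        ("indexer", "gb_months", 500),
        ("archiver", "instance_hours", 1_460),
        ("archiver", "gb_months", 20_000),
    ]
    costs = cost_by_component(records)
    gb_processed = 15_000  # roughly half a terabyte a day for a month
    for component, dollars in sorted(costs.items()):
        print(f"{component}: ${dollars:,.2f} (${dollars / gb_processed:.4f} per GB processed)")
```

Dividing each component’s cost by the work it performed turns the monthly bill into a unit cost that engineering changes can be measured against.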

A cost aware application is software with an inherent design to “game the cloud” and be ultra efficient on every transaction. This in essence means granular workload management, the ability to right-size the CPU profile for the task, and take advantage of several long-term cost management features offered by the cloud infrastructure providers.
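
Right-sizing can be framed as choosing the instance type with the lowest cost per task rather than the lowest hourly rate. A minimal sketch under stated assumptions: the catalog (names, prices, memory, speeds) is hypothetical, runtime is assumed to scale linearly with compute units, and partial hours are assumed to bill as full hours.

```python
import math
from dataclasses import dataclass


@dataclass
class InstanceType:
    name: str
    hourly_rate: float      # dollars per instance-hour
    memory_gb: float
    relative_speed: float   # throughput relative to a 1-ECU baseline


def cheapest_fit(catalog, baseline_hours, min_memory_gb):
    """Return (name, cost) of the cheapest instance type that can run the task.

    Assumes runtime shrinks linearly with relative_speed and that partial
    hours are billed as full hours.
    """
    best = None
    for itype in catalog:
        if itype.memory_gb < min_memory_gb:
            continue  # not enough memory for this task
        hours = math.ceil(baseline_hours / itype.relative_speed)
        cost = hours * itype.hourly_rate
        if best is None or cost < best[1]:
            best = (itype.name, cost)
    return best


if __name__ == "__main__":
    catalog = [  # hypothetical catalog
        InstanceType("small", 0.10, 1.7, 1.0),
        InstanceType("large", 0.40, 7.5, 4.0),
        InstanceType("high-cpu", 0.20, 1.7, 5.0),
    ]
    print(cheapest_fit(catalog, baseline_hours=20, min_memory_gb=1.5))
    print(cheapest_fit(catalog, baseline_hours=20, min_memory_gb=4.0))
```

For a CPU-bound task that fits in 1.5 GB, the pricier-per-hour “high-cpu” type wins ($0.80 versus $2.00 for the others); once the task needs 4 GB, only “large” qualifies.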
