This is the third post in the Gaming the Cloud series. In the first two posts I wrote about having the right use case and the need for cost-aware applications in order to “win in the cloud.” These are important initial steps, at a conceptual level, to adopting a cloud computing model. This post dives deeper into the tools available and how to balance cost versus control and cloud vendor “lock-in.”
A robust discussion about cloud computing should cover the two ways of consuming the cloud: Infrastructure as a Service (IaaS) and Platform as a Service (PaaS). Let’s define these now and then compare and contrast the optimal way to utilize IaaS and PaaS in our pursuit to game the cloud.
Cloud computing in 2011 is mostly used to support large-scale web applications. Over time the cloud will start to power traditional IT, but for now big SaaS apps are where the innovation is occurring. In terms of the software and infrastructure required to power software as a service (SaaS) systems, state-of-the-art thinking has evolved in stages: SaaS delivered from non-cloud co-location morphed into cloud in IaaS livery, and then finally into cloud with PaaS. Cloud PaaS is at odds with human nature’s desire for more control. But having more control also comes at a cost, and the industry is collectively reconciling the most efficient way to balance cost versus control. The pros and cons of Platform as a Service are at the center of this debate.
Certainly the cloud is now seen as a credible provider of IT services, and the sniping between the no-cloud “Co-Lo” naysayers and cloud supporters is subsiding. But now that we’re firmly in the pure cloud world, another skirmish is brewing: IaaS versus PaaS. IaaS today occupies the place Co-Lo held in our thinking just five years ago. Accelerating technology evolution will make PaaS the new acceptable standard in the next few years, and the industry will look back on 2011 with the attitude “what was all the fretting over IaaS versus PaaS?” The debate about IaaS versus PaaS can be summarized as:
If all things are equal in terms of raw material costs, and there is no fear of vendor “lock-in,” then what’s the best choice to maximize time and effort?
Comparing IaaS versus PaaS with respect to cost, control and lock-in, along the same continuum, looks like this:
One advantage IaaS has over PaaS is more predictable infrastructure cost estimating. Compute and storage are easier to model when the building blocks are CPU hours and gigabytes consumed per month. Extrapolating PaaS costs is a more challenging exercise because the cost models are multi-dimensional. Over time multi-dimensional pricing will be a benefit, since software written specifically for PaaS systems can operate more efficiently. With PaaS there is a one-time learning curve to master the APIs and operational characteristics, and that one-time investment pays dividends for the life of the application. IaaS also has a learning curve, but it’s less steep than PaaS, and IaaS carries long-term operational costs that do not go away.
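To make the modeling difference concrete, here is a minimal Python sketch of the two cost shapes described above. Every name, rate, and quantity in it is a hypothetical placeholder, not an actual Amazon price. The only point is that the IaaS estimate has two inputs, while the PaaS estimate adds a third (request volume) that is harder to forecast.

```python
from dataclasses import dataclass

HOURS_PER_MONTH = 730  # assumption: average number of hours in a month


@dataclass
class IaasRates:
    """Hypothetical IaaS price list (placeholder values only)."""
    per_instance_hour: float
    per_gb_month: float


@dataclass
class PaasRates:
    """Hypothetical PaaS price list with an extra per-request dimension."""
    per_compute_hour: float
    per_gb_month: float
    per_million_requests: float


def iaas_monthly_cost(rates: IaasRates, instances: int, storage_gb: float) -> float:
    # Instances run around the clock, so compute is simply instances x hours x rate.
    compute = instances * HOURS_PER_MONTH * rates.per_instance_hour
    storage = storage_gb * rates.per_gb_month
    return compute + storage


def paas_monthly_cost(rates: PaasRates, compute_hours: float,
                      storage_gb: float, requests: int) -> float:
    # Compute is billed only for hours actually consumed, and requests are billed too.
    compute = compute_hours * rates.per_compute_hour
    storage = storage_gb * rates.per_gb_month
    traffic = requests / 1_000_000 * rates.per_million_requests
    return compute + storage + traffic
```

With IaaS, the estimate is fixed once you know the instance count and storage size. With PaaS, `compute_hours` and `requests` depend on how the application behaves, which is why the estimate is harder to pin down in advance.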
PaaS allows teams to focus on core domain expertise and not get bogged down “fighting yesterday’s fires.”
Within a specific cloud ecosystem, Amazon Web Services for example, we can game each of the building-block components (S3, EC2, EBS, RDS, SDB) to achieve the best cost/performance advantage.
IaaS – Infrastructure as a Service
IaaS provides the basic building blocks for a cloud application. “Basic” as in the raw materials such as compute, storage, and networking. A “Do it yourself” (DIY) mindset is at play when one is using IaaS. (As an aside, migrating from co-location hosting to cloud-powered IaaS is an admission that DIY isn’t always the best route, but weighing the loss of total control against the cost of letting another entity manage a service is a hurdle some teams can’t get past.)
But even in the cloud there are gradations between DIY and outsourcing the equivalent functionality to a PaaS service. The decision of IaaS or PaaS comes down to costs, domain expertise, the best use of people’s talent, and where to focus innovation efforts.
Infrastructure as a Service is the next logical step from co-location hosting in the evolution of adopting a cloud computing mindset. Moving to the cloud, even if only to utilize IaaS and not PaaS, means an evolved thinking that the DIY approach is not always the best way to purchase compute infrastructure.
PaaS – Platform as a Service
PaaS is the concept that common application building blocks such as database, queueing and file sharing (among others) should be consumed as commodity services, not managed as stand-alone systems that eat up the team’s time. The idea is that the cloud provider can operate a database system more reliably and for less cost than doing it yourself. The same overarching argument about cloud versus non-cloud is now heard at the basic building-block level, with the same pros and cons.
Most architectures use a mix and match scheme blending some IaaS and some PaaS because each functional area has different strategic importance to the project. For example a web application that has light database needs but intense queueing could easily use a PaaS database solution and an IaaS queueing system. Choosing PaaS or IaaS is a separate analysis for each major sub-system in the application’s software stack.
Within many technical teams the usage of PaaS fights against the tinkerer’s urge to want total control. Face it, geeks like to tinker and PaaS has less obvious tinkering ability. But really this is a falsehood. While IaaS offers more knobs and dials (and the subsequent long-term ownership responsibility), PaaS offers new greenfield ways to focus on the most important value-add efforts (research) for your project. When PaaS becomes the base level from which we operate, a whole bunch of new innovation will occur “gaming PaaS” the same way we think about gaming the cloud now. PaaS has a lower base cost (for most use cases), so any effort spent figuring out how to process more transactions for less money with PaaS will be time well spent. Spending time supporting IaaS is doing the same old thing over and over, at increasing opportunity cost.
Below is an IaaS versus PaaS cheat sheet:
Let’s compare three popular services that can be delivered with either IaaS or PaaS approaches. In these examples the IaaS and PaaS platforms are both powered by Amazon Web Services.
Comparison: Relational Databases
MySQL is the default SQL database for nearly every web framework (with a shout out to PostgreSQL too, but for this example let’s focus on MySQL). Every project needs to figure out how to get reliable, economical SQL instances running in the cloud. You can run your own MySQL instances or use Amazon’s RDS (Relational Database Service) for SQL functionality. Running your own MySQL is the do-it-yourself IaaS method and using RDS is the PaaS approach. Each has pros and cons, but it’s good to have a choice, right? The cloud is all about choice and having multiple options to solve a problem.
The first part of an IaaS MySQL versus PaaS RDS analysis should be cost. Below is a cost comparison for two common implementation scenarios: a small instance managing a 100 GB database and a large instance for a 500 GB database. In both scenarios there is a resiliency requirement, so the calculations are for a multi-zone implementation in both IaaS and PaaS modes.
The cost estimates speak for themselves. For many use cases the RDS option is less expensive than do-it-yourself. For the small multi-zone database, RDS is 15% less expensive. The savings get better for the large database example, with RDS 19% less than IaaS MySQL. Not factored into the calculation is a dollar amount for the distraction factor of running MySQL yourself. If we could get an honest number, it would show RDS to be even more cost effective.
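As a rough illustration of how the savings percentage and the “distraction factor” interact, here is a small Python sketch. The dollar amounts, staff hours, and hourly rate are hypothetical placeholders picked only so the first result lands on the 15% figure quoted above; they are not the numbers from the original cost comparison.

```python
def savings_pct(iaas_cost: float, paas_cost: float) -> float:
    """Percentage by which the PaaS option is cheaper than the IaaS option."""
    return (iaas_cost - paas_cost) / iaas_cost * 100


def with_ops_overhead(infra_cost: float, ops_hours: float, hourly_rate: float) -> float:
    """Add the cost of staff time spent operating the service to the infrastructure bill."""
    return infra_cost + ops_hours * hourly_rate


# Hypothetical monthly infrastructure bills (placeholders, not real quotes).
iaas_infra = 200.0
paas_infra = 170.0
infra_only = savings_pct(iaas_infra, paas_infra)

# Assumption: self-managed MySQL takes 5 staff hours a month, RDS takes 1, at $50/hour.
iaas_total = with_ops_overhead(iaas_infra, ops_hours=5, hourly_rate=50.0)
paas_total = with_ops_overhead(paas_infra, ops_hours=1, hourly_rate=50.0)
all_in = savings_pct(iaas_total, paas_total)

print(f"infrastructure only: {infra_only:.1f}% cheaper")
print(f"including staff time: {all_in:.1f}% cheaper")
```

Under these assumed inputs the gap widens once staff time is counted, which is the argument the paragraph above makes. How far it widens depends entirely on the hours and rate you plug in.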
For example, to use Riak instead of SDB/S3 requires different programming APIs and an application designed around how Riak behaves. The same for SDB/S3. By the way, the reason it’s both SDB and S3 as a comparison to Riak is because Riak is both a key value store and distributed file system. For the Amazon equivalent, SDB is the key value store and S3 is the distributed file system.
A fair cost comparison between IaaS Riak and PaaS SDB/S3 requires a multi-dimensional analysis of compute, storage, and number of transactions. Riak is easier to model because its cost is compute and storage only, whereas SDB/S3 charges for compute, storage AND transactions. Knowing how many transactions occur in a fast-evolving software stack is difficult to quantify. But it is safe to say that SDB/S3 will be less expensive up to a certain tipping point of transaction volume, beyond which Riak could be the less expensive option.
The number one barrier preventing greater PaaS adoption is the fear of the unknown: not being able to accurately predict transaction volume and its associated costs.
In the cost comparison below, the Riak configuration is a 6-node cluster with 500 GB of attached storage per node. This would be considered a medium-case architecture to handle an average load for a growing, enterprise-focused web application. Riak can be as small as a single node, or much larger with fifteen to twenty nodes per cluster. The more nodes, the more aggregate storage, the faster the processing time, and the higher the costs.
The Amazon SDB/S3 configuration doesn’t require pre-sizing or specifying a number of compute nodes, so costing is “pay as you go.” Instead Amazon charges for the amount of data stored in SDB and the compute time for inserting data and running queries against the stored data. To make a fair comparison, the SDB scenario has 6 client instances making the equivalent of 24×7 query execution to match Riak’s 6-node 24×7 cluster. And herein is the difference between PaaS and IaaS for the NoSQL use case. The Riak cluster needs to run all 6 nodes 24×7, even if there is no activity. With the SDB option you are only charged for each query execution. But let’s assume there is some query running 24 hours a day to make this a fair comparison.
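The tipping point can be estimated with simple break-even arithmetic. The sketch below is a simplification that treats the Riak cluster as a flat monthly cost and the SDB/S3 bill as a flat part plus a charge per transaction. The function name and all figures are hypothetical, not real prices.

```python
def break_even_transactions(iaas_fixed: float, paas_fixed: float,
                            paas_per_txn: float) -> float:
    """Monthly transaction count at which PaaS and IaaS cost the same.

    iaas_fixed:   flat monthly cost of the self-run cluster (compute + storage)
    paas_fixed:   flat monthly part of the PaaS bill (e.g. storage)
    paas_per_txn: PaaS charge per transaction
    """
    if paas_per_txn <= 0:
        raise ValueError("per-transaction charge must be positive")
    # If PaaS is already more expensive before any traffic, the break-even is zero.
    gap = max(iaas_fixed - paas_fixed, 0.0)
    return gap / paas_per_txn


# Hypothetical figures: $1,000/month cluster, $100/month PaaS storage,
# $0.0001 per PaaS transaction.
tipping_point = break_even_transactions(1000.0, 100.0, 0.0001)
print(f"PaaS stays cheaper below about {tipping_point:,.0f} transactions per month")
```

Below the break-even volume the pay-per-transaction model wins, and above it the fixed-cost cluster wins. The hard part in practice is not the division but forecasting the transaction count.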
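The always-on versus pay-per-use difference can be sketched in a few lines of Python. This is a deliberate simplification of usage-based billing, not Amazon’s actual pricing formula, and `busy_fraction` is a hypothetical parameter for the share of each day that queries are actually executing.

```python
HOURS_PER_DAY = 24


def always_on_node_hours(nodes: int, days: int) -> int:
    """Node-hours billed for a cluster that runs around the clock, busy or idle."""
    return nodes * HOURS_PER_DAY * days


def pay_per_use_hours(clients: int, days: int, busy_fraction: float) -> float:
    """Compute hours billed when only query execution time is charged."""
    if not 0.0 <= busy_fraction <= 1.0:
        raise ValueError("busy_fraction must be between 0 and 1")
    return clients * HOURS_PER_DAY * days * busy_fraction


# The 6-node cluster is billed the same number of hours regardless of load.
cluster_hours = always_on_node_hours(nodes=6, days=30)

# Six clients querying nonstop match the cluster, which is the "fair comparison" case.
saturated_hours = pay_per_use_hours(clients=6, days=30, busy_fraction=1.0)

# If queries only run a quarter of the time, the billed hours shrink with them.
light_hours = pay_per_use_hours(clients=6, days=30, busy_fraction=0.25)

print(cluster_hours, saturated_hours, light_hours)
```

Assuming constant query activity removes the pay-per-use advantage from the comparison, so any real workload with idle periods would tilt further toward the usage-billed option.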




