The Problem with PaaS Pricing: Total Cost Uncertainty at Scale

 

Highlights of this post:

  • PaaS costs are difficult to predict at scale
  • IaaS costs are going down due to improved operational proficiency
  • Admin cost differences between IaaS and PaaS are negligible
  • PaaS should be less expensive to get better market traction

 

Here’s a handy decoder ring for all the acronyms in this post:

  • IaaS = Infrastructure as a Service. On-demand compute and storage typically available as an API call.
  • PaaS = Platform as a Service. On-demand turn-key web service that abstracts scaling and reliability, available as an API call.
  • AWS = Amazon Web Services
  • EC2 = AWS’s Elastic Compute web service
  • DIY = Do it Yourself
  • DevOps = Development and Operations… a new category for cloud systems management
  • GB-month = A pricing unit for cloud storage. The amount stored multiplied by the fraction of the month it was stored; the charge for a billing period is that figure multiplied by the monthly per-GB rate.
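
As a rough illustration of the GB-month arithmetic, here is a minimal sketch. It assumes the provider sums GB-hours over the billing period and divides by the hours in the month. The function names, the 720-hour month, the $0.10 rate and the sample usage are all illustrative assumptions, not any vendor's published billing logic.

```python
def gb_months(usage, hours_in_month=720):
    """Convert (gigabytes, hours_stored) pairs into GB-months.

    Assumes billing sums GB-hours and divides by the hours in the month.
    """
    gb_hours = sum(gb * hours for gb, hours in usage)
    return gb_hours / hours_in_month


def storage_charge(usage, rate_per_gb_month, hours_in_month=720):
    """Storage charge for the period: GB-months times the monthly per-GB rate."""
    return gb_months(usage, hours_in_month) * rate_per_gb_month


# Hypothetical usage: 100 GB for the first half of the month, 200 GB for the second.
usage = [(100, 360), (200, 360)]
print(gb_months(usage))
print(storage_charge(usage, rate_per_gb_month=0.10))
```

The point is that storage is the easy dimension: it depends only on how much you keep and for how long.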

In the past I have written about the pros and cons facing cloud architects when choosing between an IaaS or PaaS solution for critical application infrastructure. Take a moment and read this post, Balancing Infrastructure as a Service (IaaS) versus Platform as a Service (PaaS), which focuses on the trade-offs between IaaS flexibility and PaaS’s vendor lock-in. There I briefly mention PaaS pricing challenges, so I wanted to expand on that topic with a point of view on how current PaaS pricing schemes hinder adoption.

I’ll put my main theme right out here: Most PaaS solutions have a fundamental problem estimating operating costs at production scale.

There’s an implied “grand bargain” for cloud customers, who expect an economic advantage from choosing a cloud PaaS service over a comparable cloud IaaS equivalent. Anecdotally, that seems true. When using PaaS you expect lower people and development costs. PaaS is supposed to provide a price advantage because the vendor’s operational efficiencies lower costs: massive physical and human expenses are spread across many customers. It’s a textbook example of “economies of scale.”

But widespread PaaS adoption is being hindered because cloud architects can’t estimate costs reliably. Cost calculators, without real-world metrics at scale, give a false sense of economic security.

Imagine this scenario:

It’s the beginning of a software project and the architects determine a document database is needed. The choice is between using IaaS to host your own software, or using a turn-key PaaS solution and basically programming to its APIs. In the agile way, quick iteration is expected, starting with an immature design. But the architects need to put a stake in the ground, as time is of the essence. Since this is a cloud project, costs are easily modeled for an IaaS approach with this formula: IaaS costs = compute hours + storage hours + software. Estimating PaaS costs is more difficult. There are cost calculators, but the inputs needed to get a valid answer are only *guesstimates.* A PaaS costing formula looks like this: PaaS costs = storage hours + (number of API calls * price per API call) + duration of queries. The problem is that PaaS has multiple cost drivers. IaaS has costs accruing over time. PaaS has costs accruing over time, but also two to three other components that contribute to the monthly total. It’s the difference between regular chess and three-dimensional chess. Spock might be able to figure it out; the rest of us find it more challenging.
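
To make the difference concrete, here is a minimal sketch of the two formulas from the scenario. Every rate and volume in it is a made-up placeholder, and the function and parameter names are my own for illustration. It only shows how the width of the PaaS estimate depends on inputs that are guesses early in a project.

```python
def iaas_monthly_cost(compute_hours, compute_rate, storage_gb, storage_rate, software=0.0):
    """IaaS: every term accrues with time, so it extrapolates from node count."""
    return compute_hours * compute_rate + storage_gb * storage_rate + software


def paas_monthly_cost(storage_gb, storage_rate, api_calls, price_per_call,
                      query_hours, query_rate):
    """PaaS: storage accrues with time, but calls and query time track usage."""
    return (storage_gb * storage_rate
            + api_calls * price_per_call
            + query_hours * query_rate)


# All rates and volumes below are hypothetical placeholders.
iaas = iaas_monthly_cost(compute_hours=2 * 720, compute_rate=0.50,
                         storage_gb=500, storage_rate=0.10)

# Early in a project the API-call and query-time inputs are guesses,
# so evaluate a low and a high guess to see how wide the estimate is.
low = paas_monthly_cost(500, 0.10, api_calls=1_000_000, price_per_call=0.00001,
                        query_hours=10, query_rate=0.15)
high = paas_monthly_cost(500, 0.10, api_calls=100_000_000, price_per_call=0.00001,
                         query_hours=400, query_rate=0.15)

print(f"IaaS estimate: ${iaas:,.2f}")
print(f"PaaS estimate: ${low:,.2f} to ${high:,.2f}")
```

With these placeholder numbers the IaaS figure is a single value, while the PaaS figure is a range whose upper end is many times its lower end. That spread, not the absolute price, is what makes architects uneasy.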

The project architects choose the IaaS approach because it feels more comfortable. PaaS pricing was too difficult to model with no real-world metrics.

IaaS Operational Costs Compared to PaaS

One big advantage PaaS services tout is the reduced operational costs compared to IaaS.

IaaS and PaaS solutions, in comparison to the on-premises DIY alternative, forced the IT industry to have an honest conversation about calculating “all-in costs.” Prior to IaaS and PaaS, costs were based on hardware expense, software license fees, co-location charges and the very intangible people expense of supporting a full DIY system. IaaS came before PaaS, and with IaaS it became much easier to tabulate the price difference between DIY and IaaS. But with IaaS there is still the people expense, along with the same challenges. PaaS extends the efficiencies realized with IaaS and also starts to absorb the people costs.

It’s safe to say DIY has higher people costs than IaaS, and IaaS has higher people costs than PaaS. But the difference between DIY and IaaS is dramatic, while the difference between IaaS and PaaS is smaller, almost to the point of diminishing returns. This is because DevOps practice in the cloud has matured: IaaS automation has improved so much in the last few years that the operational cost difference between an IaaS solution and PaaS is shrinking.

Mature DevOps teams are now managing IaaS very efficiently. Teams of five are managing thousands of nodes and petabytes of storage in the cloud. Implementing our theoretical document database in the cloud using IaaS will not add much burden to a fully automated DevOps team. PaaS solutions claim to lower operating expense, but I’m not sure the decrease is as dramatic as the PaaS vendors want us to believe.

Examining Several PaaS Pricing Models

I was spurred to write this post by Amazon’s new database-as-a-service offering, DynamoDB, announced this week. DynamoDB (DDB) joins SimpleDB (SDB) and Relational Database Service (RDS) to make three PaaS offerings, each appealing to different use cases. They also have different cost models. And for each PaaS offering there is an “IaaS equivalent.” Let’s look at these PaaS databases and compare each to its IaaS equivalent.

DynamoDB (DDB)

DDB is the PaaS equivalent to running CouchDB, Cassandra, Riak etc. on your own cloud compute (EC2) instances. There are many published reference architectures that show how to deploy and operate a successful system. Each architecture can show accurate monthly costs, where the scale boundaries exist, etc. The IaaS to PaaS tradeoff is that you will need to supply more operational overhead for CouchDB compared to DynamoDB. But as we have seen, DevOps teams are getting pretty good at managing these cloud databases, and the operational overhead doesn’t feel as onerous as it did just a few years ago.

DDB is priced in the following ways:

Write (storing) Data: $0.01 per hour for every 10 units of Write Capacity
Read Data: $0.01 per hour for every 50 units of Read Capacity
Storage: $1.00 per GB-month

Figuring out the true operating costs at scale requires detailed metrics from the application that will be using the APIs. For many teams, the alternative IaaS approach will feel more comfortable because costs can be extrapolated. From my own experience, even a mature (two-year-old) software stack has challenges getting good metrics for cost estimation. This is an example of the three-dimensional price structure, and a classic chicken-and-egg scenario: we can’t take a leap of faith that the PaaS all-in costs will work themselves out at scale, and we don’t have good metrics up front to know how many “read or write units per hour” will be consumed.
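
Here is a minimal sketch of the DDB arithmetic using the three prices listed above. The 730-hour month, the linear proration of capacity units (rather than rounding up to whole blocks), and the optimistic and pessimistic capacity guesses are my assumptions for illustration, not Amazon's billing rules.

```python
HOURS_PER_MONTH = 730  # assumption: average hours in a calendar month

WRITE_PRICE_PER_HOUR = 0.01  # per 10 units of write capacity
READ_PRICE_PER_HOUR = 0.01   # per 50 units of read capacity
STORAGE_PRICE = 1.00         # per GB-month


def ddb_monthly_cost(write_units, read_units, storage_gb, hours=HOURS_PER_MONTH):
    """Estimate a monthly DDB bill from provisioned capacity and storage."""
    # Assumption: capacity is prorated linearly rather than rounded up to blocks.
    write_cost = write_units / 10 * WRITE_PRICE_PER_HOUR * hours
    read_cost = read_units / 50 * READ_PRICE_PER_HOUR * hours
    return write_cost + read_cost + storage_gb * STORAGE_PRICE


# Hypothetical capacity guesses for the same application.
for label, writes, reads in [("optimistic", 100, 500), ("pessimistic", 1000, 5000)]:
    cost = ddb_monthly_cost(writes, reads, storage_gb=50)
    print(f"{label}: ${cost:,.2f} per month")
```

Storage is the same in both guesses, so the whole difference between them comes from the read and write capacity, which is exactly the input a new project cannot yet measure.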

SimpleDB (SDB)

For years I have looked at SimpleDB and tried to figure out how to make it work for my projects, and always reverted to a SQL or NoSQL solution on IaaS. The fear with SDB was not knowing true costs at scale. I could model the storage costs, but was uncomfortable estimating the machine hour expense. It was easier to implement a NoSQL solution on EC2 and pay the one-time DevOps tax for automation.

SDB is priced as follows:

Storage: $0.250 per GB-month
Compute: $0.140 per SDB Machine Hour (the compute time used to execute commands)
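
A minimal sketch of the SDB arithmetic, using the two prices above. The function names and sample figures are hypothetical. The second function turns the question around: given a budget and a known storage footprint, how many machine hours can the application consume?

```python
SDB_STORAGE_PRICE = 0.25       # per GB-month
SDB_MACHINE_HOUR_PRICE = 0.14  # per SDB machine hour


def sdb_monthly_cost(storage_gb, machine_hours):
    """Estimate a monthly SDB bill from storage and consumed machine hours."""
    return storage_gb * SDB_STORAGE_PRICE + machine_hours * SDB_MACHINE_HOUR_PRICE


def machine_hours_within_budget(budget, storage_gb):
    """Machine hours left after paying for storage; zero if storage exceeds budget."""
    remaining = budget - storage_gb * SDB_STORAGE_PRICE
    return max(0.0, remaining / SDB_MACHINE_HOUR_PRICE)


# Hypothetical workload: 100 GB stored, 200 machine hours consumed.
print(sdb_monthly_cost(storage_gb=100, machine_hours=200))
print(machine_hours_within_budget(budget=100.0, storage_gb=100))
```

The storage term can be planned in advance. The machine hours depend on how expensive each query turns out to be, which is only known once the application runs against real data.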

Relational Database Service (RDS)

RDS is the easiest PaaS database to model because its pricing model looks more like IaaS than DDB or SDB does. Consequently there are a lot of RDS users, because RDS is easy to cost-estimate.

RDS has this price model:

DB Instance Type: Hourly rate for small, medium or large
Storage: $0.10 per GB-month
I/O Rate: $0.10 per 1 million requests

RDS is interesting because even though there are three price dimensions, the I/O rate can be estimated by using a MySQL reference standard for the same application. And if RDS somehow proves to be cost-prohibitive at scale, the alternative is a relatively easy swap to self-managed MySQL.
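
A minimal sketch of that estimate, using the storage and I/O prices above. The instance hourly rate is left as a parameter because it depends on the instance type; the $0.50 figure, the 730-hour month and the 100 requests per second measured on a reference MySQL server are all hypothetical.

```python
RDS_STORAGE_PRICE = 0.10  # per GB-month
RDS_IO_PRICE = 0.10       # per 1 million I/O requests


def monthly_io_requests(avg_requests_per_second, hours=730):
    """Extrapolate monthly I/O from a rate measured on a reference MySQL server."""
    return avg_requests_per_second * 3600 * hours


def rds_monthly_cost(instance_hourly_rate, storage_gb, io_requests, hours=730):
    """Estimate a monthly RDS bill from instance hours, storage and I/O requests."""
    instance_cost = instance_hourly_rate * hours
    storage_cost = storage_gb * RDS_STORAGE_PRICE
    io_cost = io_requests / 1_000_000 * RDS_IO_PRICE
    return instance_cost + storage_cost + io_cost


# Hypothetical inputs: $0.50/hour instance, 100 GB, 100 I/O requests per second.
io = monthly_io_requests(avg_requests_per_second=100)
print(f"Estimated I/O requests: {io:,}")
print(f"Estimated monthly cost: ${rds_monthly_cost(0.50, 100, io):,.2f}")
```

Two of the three inputs are time-based, as in IaaS, and the third can be measured on an existing MySQL deployment before committing. That is why this model is easier to trust than the DDB or SDB ones.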

RDS also has the concept of reserved instance discounts that can lower costs by up to 49%. RDS is a nice compromise between IaaS and PaaS. The other PaaS database offerings need to move in that direction.

Summary

With IaaS we know with more certainty how much CPU and storage will be needed. This means costs can be accurately estimated.

PaaS systems with charges based on API calls, storage, compute time, etc. mean starting a journey without knowing the end point. It’s a leap of faith.

PaaS solutions will need to be significantly less expensive to become attractive to engineering teams that have a propensity to want to “own more of the stack.” When the economics change, then PaaS will excel.