Blog Series

Cloud Cost Savings In Action

This morning Amazon Web Services notified its cloud customers that a new instance configuration is available in all regions. The new instance type, hi1.4xlarge, is significant in a number of ways. Amazon heard from customers that a high-I/O, low-latency configuration would be ideal for applications like relational and NoSQL databases. It’s also the first EC2 instance type to use SSD storage. Netflix, like Sonian a beacon of cloud success, has already shared a great benchmark study showing how this new instance will improve performance and lower costs.

Wow… more performance… and lower costs. This trend tracks back to a previous post I wrote about active and passive cloud cost savings. The introduction of this new instance type creates an “optimization opportunity.” If we cloud customers are willing to invest engineering resources to optimize our software around a new instance type, that is an example of “active savings”: we have to apply effort to realize a cost reduction. On the other hand, if AWS simply lowers the price of an existing instance type, that is an example of “passive savings”: it just happens automatically.
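The active-versus-passive distinction boils down to simple payback math. As a sketch (all dollar figures below are hypothetical, not AWS prices), the question for an “active savings” project is how quickly the engineering investment is recouped:

```python
# Hypothetical numbers -- a sketch of weighing "active" savings
# (engineering work to adopt a new instance type) against doing
# nothing. None of these figures come from AWS pricing.

def payback_months(current_monthly_cost, optimized_monthly_cost, engineering_cost):
    """Months until the one-time engineering investment pays for itself."""
    monthly_savings = current_monthly_cost - optimized_monthly_cost
    if monthly_savings <= 0:
        return None  # the optimization never pays back
    return engineering_cost / monthly_savings

# Example: $40k/month today, $28k/month after porting to the new
# instance type, at a one-time cost of $30k of engineering time.
months = payback_months(40_000, 28_000, 30_000)
print(f"Active savings pay back in {months:.1f} months")
```

Passive savings are the degenerate case: the engineering cost is zero, so the payback is immediate.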

This is the cloud’s grand bargain. Cost efficiencies flow from the infrastructure provider, through the application layer, to the end customer.

Reflecting on One Year of Cloud Cost Optimization

For the past year I held the unelected position of “Cloud Cost Czar.” I have written about the duties such a role entails in A Day in the Life of a Cloud Cost Czar. Recently I handed the cost czar responsibility over to a colleague who will carry on the daily routines and continue to improve our cloud cost management endeavors. In the handoff process, almost a year to the day after assuming the czar’s responsibilities, I reflected on the previous twelve months and all the accomplishments the company made as a united team to “tame the cloud.”

I created a graph to visualize the dramatic change over one calendar year. To the right is an area graph showing subscriber seats (in green) overlaid on subscriber costs (blue, orange, and red; our principal costs are cloud compute and two types of cloud storage). As subscriber growth increased, costs went up, peaked, and then went down over the course of the year. The rise, peak, and subsequent decline all map to cost-cutting efforts initiated by Sonian engineering and support groups.

Throughout the year we got smarter about how to “purchase” compute time for less than retail, how to store more customer data while consuming less cloud storage, and how to process more customer data using fewer CPU hours. With each improvement we re-affirmed, with a high-five, that in the cloud we were in control of our cost destiny. This is when the phrase “infrastructure as code” really means something.
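“Purchasing compute for less than retail” is ultimately a blended-rate calculation across purchasing options such as reserved, spot, and on-demand capacity. A minimal sketch, with hypothetical hourly rates rather than real AWS prices:

```python
# A sketch of the "pay less than retail" math: blend reserved,
# spot, and on-demand hours into one effective hourly rate.
# All prices here are hypothetical, not actual AWS rates.

def blended_rate(hours_and_rates):
    """hours_and_rates: list of (hours, dollars_per_hour) tuples."""
    total_hours = sum(h for h, _ in hours_and_rates)
    total_cost = sum(h * r for h, r in hours_and_rates)
    return total_cost / total_hours

fleet = [
    (700, 0.20),  # reserved-instance hours bought at an up-front discount
    (200, 0.08),  # spot-market hours won well below retail
    (100, 0.45),  # on-demand hours at full retail
]
print(f"Effective rate: ${blended_rate(fleet):.3f}/hour")
```

The more load that can be shifted toward the discounted rows, the further the effective rate drops below retail.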

Read more…

Comparing 6 Cloud App Marketplaces

Enterprise application marketplaces are sprouting up like springtime daffodils. The latest entrant is Amazon Web Services’ AWS Marketplace. Amazon the e-tailer is no stranger to broad e-commerce initiatives, having conquered books, home goods, electronics, digital media, and most recently mobile. (Aside: all indications show Amazon’s new Android app store is off to a great start after a somewhat lukewarm industry reception.)

Many of the newest cloud apps are launched in the AWS cloud. AWS has done a great job courting startups onto their cloud platform. With the AWS Marketplace, Amazon is helping its customers be more successful by giving visibility to both small and large companies who choose AWS for their cloud infrastructure. The AWS Marketplace will also further cement customers into the AWS cloud, since Marketplace participation requires an AWS account; you can’t sell a non-AWS-hosted application in the AWS Marketplace. Recently AWS has been publicly advocating the idea of “take your data/app with you,” but in reality moving a complicated SaaS application with a large data footprint from one cloud to another is no small feat. The AWS Marketplace is one more glue point between ISVs and AWS.

Apple’s extremely successful iOS App Store, along with iTunes, paved the way for the current marketplaces targeting enterprise customers. Salesforce’s AppExchange is the poster child for business application marketplace success.

I found six “cloud” themed, business-oriented marketplaces, which are described below in alphabetical order. Across these six marketplaces we see a recurring theme: marketplaces are tied to their underlying technical platforms, and none supports a “cross-platform” environment. Google, Box and Salesforce each allow the others to sell into their customer base, but all require a technical hook into an API or account.

  • AWS Marketplace
  • OneBox
  • Chrome Web Store
  • Google Apps Marketplace
  • AppExchange
  • Zoho

1. AWS Marketplace

What is it?

The AWS Marketplace aggregates and curates thousands of applications powered by the AWS cloud.

Amazon has powerful e-commerce tools for subscription management, billing, shopping carts, and customer ratings, which AWS customers can use to get more third-party customer traction. The AWS Marketplace complements DevPay and paid AMIs with a robust e-tailer user experience.


Requirements?
  • AWS Account
  • Application must be running within the AWS cloud

Pricing Model?

Application publishers choose their own price. Currently ISVs can sell a paid AMI, in which case Amazon generates revenue from the EC2 costs when the application is running on an EC2 instance. For turnkey SaaS applications, the AWS Marketplace acts like a referral business, and the revenue to AWS is indirect.

Interesting note:

The AWS Marketplace and the Amazon Partner Network launched within days of each other. Amazon is accelerating innovation on multiple fronts for its juggernaut cloud platform. The startup community is pretty much locked in; now the goal is to expand to the enterprise, and the Partner Network and Marketplace are two steps toward that goal.

Read more…

A Tale of Two Cloud Search Engines

Sonian Cloud Search and Amazon CloudSearch: their names may sound the same, but they couldn’t be further apart in terms of what they cost to operate and their intended use cases.

Sonian is a veteran “Cloud Search” pioneer. In 2008 we launched the first version of search in the cloud, and today the service operates simultaneously across multiple public clouds using a single reference architecture.

Over the past four years we have perfected cloud search scaling and cost efficiency. It has been a steep learning curve, but well worth the effort. Today there are over seven billion documents indexed, with fifteen million new documents added each day. Daily index and retrieval volumes rise as new customers sign up for the service.

The secret to Sonian’s cloud search mastery is a combination of open source, IP developed in-house, and detailed metrics that show us cost and performance information. Every few months improvements are deployed to lower costs and increase reliability. We’ve driven per-document unit costs down to fractions of a cent.
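The figures in this post make the unit-cost arithmetic easy to sketch. Only the document counts come from the text; the monthly infrastructure cost below is invented for illustration:

```python
# Back-of-envelope unit economics from the figures in the post:
# ~7 billion documents indexed, ~15 million added per day. The
# monthly infrastructure cost is a hypothetical placeholder.

DOCS_INDEXED = 7_000_000_000
DOCS_PER_DAY = 15_000_000
MONTHLY_INFRA_COST = 120_000  # hypothetical dollars

monthly_docs = DOCS_PER_DAY * 30
cost_per_new_doc_cents = MONTHLY_INFRA_COST / monthly_docs * 100
print(f"{cost_per_new_doc_cents:.3f} cents per newly indexed document")
```

Even with a six-figure monthly bill, the per-document cost lands at a small fraction of a cent, which is why tracking it as a unit metric (rather than the raw bill) is what reveals regressions.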

Read more…

FISMA Chronicles: Prologue – Quick Immersion into a New World

In December 2002 the US Congress passed the Federal Information Security Management Act (FISMA). FISMA requires each government agency to implement policies, procedures, and documentation for information security. This includes internal and external government-run systems, as well as systems provided by third-party providers.

A flourishing information security practices industry has developed in FISMA’s wake to help guide the government and vendors through the numerous, byzantine certification activities.

The FISMA mission statement is to:

Protect the Nation’s Critical Information Infrastructure

FISMA defines three assessment levels and risk profiles:

  • Low – Procedures for managing public-facing government websites.
  • Moderate – Best practices for managing sensitive data and personally identifiable information, such as credit card numbers, Social Security numbers, etc.
  • High – Strict policies for managing military, intelligence and classified information.

The majority of internal applications require FISMA Moderate. The moderate certification process is the focus of this series.

The moderate risk profile means addressing over three hundred controls, ranging from information handling and physical media management to threat assessments. The controls are categorized into the following “control families”:

  1. Access Control
  2. Awareness and Training
  3. Audit and Accountability
  4. Security Assessment and Authorization
  5. Configuration Management
  6. Contingency Planning
  7. Identification & Authentication
  8. Incident Response
  9. Maintenance
  10. Planning
  11. Personnel Security
  12. Risk Assessment
  13. System and Communication Protection
  14. System and Information Integrity

Many start-ups address the above with various levels of completeness, but may not necessarily have all the supporting documentation to prove compliance. For SaaS systems operating in a cloud environment, the challenge is to describe the control boundaries between the cloud provider and the application layer. For example, FISMA requires a policy for physical media disposal. The app layer (i.e., the cloud customer) doesn’t have access to physical media in a cloud environment, so that control is the responsibility of the cloud provider and the app layer inherits it. Conversely, the cloud infrastructure has no control over the app layer, so the FISMA requirement to support two-factor web-app authentication is the responsibility of the app layer, not the cloud provider.
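The boundary-mapping exercise described above is often captured as a simple responsibility matrix. A sketch in Python, with paraphrased control names rather than official NIST identifiers:

```python
# A sketch of the control-boundary mapping: for each control,
# record whether it is inherited from the cloud provider or
# owned by the application layer. Control names are paraphrased
# examples, not official NIST control IDs.

RESPONSIBILITY = {
    "physical media disposal":       "cloud provider",     # app layer inherits
    "data center physical access":   "cloud provider",
    "two-factor web-app auth":       "application layer",
    "application audit logging":     "application layer",
    "network encryption in transit": "shared",
}

def inherited_controls(mapping):
    """Controls the app layer inherits from the cloud provider."""
    return [name for name, owner in mapping.items() if owner == "cloud provider"]

print(inherited_controls(RESPONSIBILITY))
```

A real assessment covers hundreds of controls, but the structure is the same: every control gets exactly one accountable owner, and the inherited set is what the cloud provider must document on your behalf.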

FISMA wasn’t designed for a world with cloud computing. Its heritage dates back to 2002, a world of hardware-centric design principles and best practices. Sonian and others are pioneering FISMA Moderate certification in a cloud environment.

Topics I will cover in upcoming issues of the FISMA Chronicles:

  • The impact cloud computing has on FISMA
  • How “agile” start-ups manage ongoing FISMA compliance requirements
  • FedRAMP as the next step toward consistent FISMA-like accreditation



Creating Differentiated IP in a Cloudy World

Increasingly, cloud-based start-up intellectual property is a mixture of proprietary technology and “IP” created by tuning open source modules to work in various cloud environments. This includes configuration settings, tuning parameters, architectural designs, automated deployment scripts, and a best-practices “run book.” The combination creates trade secrets in the form of code and operational best practices for Amazon Web Services, IBM SmartCloud, OpenStack, and other clouds.



A fusion of recent meetings with investor audiences:

  • Investor: “What’s the secret sauce here?”
  • Cloud Start-Up CTO: “It’s a combination of open source and home built technology.” [Projects pretty diagram of all the platform components and their function.]
  • Investor: “What’s the protectable IP?”
  • Entrepreneur: “It’s not a single patentable idea. It’s a collection of best practices, proprietary code, and of course, open source.”
  • Investor: “What keeps a competitor from copying you?”
  • Lean Start-Up CTO: “Hard to say. We have five years accumulated experience that you can only get by living through five years of growing up with the modern public cloud.”
  • Investor: “How long would it take for someone else in their garage to build the same thing?”
  • Curious CTO: “Probably three years. A hypothetical team on the same mission would benefit from roughly two years of industry learning that we had to earn the hard way over our journey. Do you have any other cloud-themed investments?”
  • Investor: “What’s the secret sauce here?”
  • Exasperated CTO: “I think we already covered this?”
  • Investor: “I’m still not getting it. What’s the proprietary IP?”
  • Frustrated CTO: “Let me try to explain. There is no off-the-shelf “playbook” or CliffsNotes for what we are doing in the cloud. Best practices are learned in real time, just in time. Each cloud has its own unique operating characteristics. Think of each cloud as having its own laws of physics. There are some similarities, but also many important differences. And understanding these differences determines success or failure.”
  • Investor: “I want to be part of this project. But I don’t quite get what’s special here, and what can keep competitors from catching up. Help me get to…. Yes.”
  • Enlightened CTO: “In the cloud, differentiated IP is created by combining proprietary code, open source, and the tuning that makes the entire system work simultaneously on many cloud environments using one reference architecture. It’s our trade secrets expressed as a system run-book and operational best practices for Amazon Web Services, IBM SmartCloud, OpenStack, and the other clouds.”



New Cloud Rules: Replace Instead of Fix

Here’s an all too common scenario from the “cloud chronicles.” A virtual machine that has been operating just fine for days, and has 50 identical twins with the same configuration, starts to exhibit problems: slow virtual disk performance, network brown-outs, disconnecting and reconnecting within its functional cluster. Monitoring systems alert on degrading performance, and the knee-jerk response is to jump on the box (née VM) and start troubleshooting. The problem is, spending any time troubleshooting an anomaly in the cloud is the wrong reaction. In the cloud, the first response when a node starts to exhibit erratic behavior should be to replace, not fix.

Replacing, instead of fixing, goes against the ingrained habits of more than two decades of entrenched IT best practices. In the pre-cloud world, when real hardware was the base, we had to “fix IT” because replacing was too expensive and impractical. There was no endless pile of spares lying about to support a “replace IT” mindset.

But in the cloud, with (in theory) nearly infinite capacity, the remediation for an errant node should be to replace it immediately and move on.
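The replace-not-fix rule is easy to express in code. In this minimal sketch the cloud API call is stubbed out with a local function; a real implementation would launch a fresh instance from the same image via the provider’s API:

```python
# A minimal sketch of "replace, don't fix": unhealthy nodes are
# terminated and re-provisioned from the same image, never
# debugged in place. provision_node() stands in for a real
# cloud API call that would launch a fresh VM.

import itertools

_ids = itertools.count(100)

def provision_node():
    """Stand-in for a cloud API call that launches a fresh VM."""
    return {"id": next(_ids), "healthy": True}

def remediate(cluster):
    """Replace every unhealthy node; never troubleshoot in place."""
    return [node if node["healthy"] else provision_node() for node in cluster]

cluster = [{"id": 1, "healthy": True},
           {"id": 2, "healthy": False},   # slow disk, network brown-outs...
           {"id": 3, "healthy": True}]
cluster = remediate(cluster)
print([n["id"] for n in cluster])  # node 2 has been swapped for a fresh one
```

Note what is absent: there is no diagnostic branch. The only decision the automation makes about a sick node is to stop using it.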

Why Is This?

Because in a cloud environment there are too many causes beyond our control at the OS level. Think of the cloud as living in a high-rise building. Each unit in the building, just like each cloud customer, can have whatever interior it wants, but there are also massive shared resources in the building. So while our interior may be a candidate for the next Architectural Digest cover, our neighbor could “kill our chill” with a too-loud boom box. The cloud suffers from the noisy-neighbor problem just like our theoretical high-rise. But in the cloud, we can choose to move and jump back into the random lottery for a new unit. We can’t change the building, but we can change our location within it.

Of course, you need the right cloud-centric architecture to be able to simply “replace IT” instead of “fix IT.” Having this cloud dexterity is critical to operating a successful cloud deployment.

The cloud requires us to “un-learn” the best practices of the past and embrace a new way of thinking about “break fix.” While replacing instead of fixing may seem wasteful, it really isn’t. The time spent troubleshooting a random problem will not yield significant insights and could be better spent on higher-value projects. Usually, after extensive diagnosis, the only recourse is to replace the node anyway, since the original problem was an outlier.



Cloud Success Requires Cost-aware Engineering

This is a true story from the “Cloud Cost Czar Chronicles.”

Our S3 “penny per one thousand” API costs started to rise rapidly in the second half of the cloud infrastructure billing period. We have seen this behavior before, and knew it could be attributed to increased usage, a new defect, or a design flaw rearing its head at a scaling tipping point. My job as “cost czar” is to raise the alarm and work with the team to figure out what is going wrong. At the observed rate of increase, the excess charges would push the monthly bill beyond the budget. One thing we have learned in the cloud is that costs can rise quickly but take a while to come down; when you try to rein in expense within a single billing period, the deceleration is far slower than the acceleration.

When we started using Amazon Web Services S3 (a PaaS object store) back in 2007, we were acutely aware of the three pricing vectors in effect: storage consumed, the price of API calls to store and list data, and the price of API calls to read and delete data. We have been using S3 heavily for five years, and we tried to model the “all-in” costs as accurately as possible. But “guesstimating” costs beyond the raw storage was a stretch.

PaaS services have an intrinsic “social engineering” element. If you color outside the lines, the financial penalty can be significant. But if you master the pricing game, the rewards are equally significant. So five years ago we thought that as long as we pointed in the right general direction, “we’ll figure it out as we go along.” Some assumptions proved a positive surprise: raw storage costs went down. Some surprises were not so pleasant: continually wrangling the API usage fees, especially the transactions that cost a penny per thousand, proved to be a constant challenge.

I still like my options with S3 compared to buying storage from a hardware vendor and incurring the administrative overhead. With S3 we can lower our costs through smarter engineering. With storage hardware, the only way to lower costs is to wrangle a better deal from an EMC salesperson. As one of the original “cloud pioneers,” Sonian is not alone in this effort, and it has been a real eye-opener for software designers to have to think about how their code consumes cloud resources (and expense) at scale. Whether a penny per thousand or a penny per ten thousand, when processing hundreds of millions of transactions a month, any miscalculation suddenly brings a dark cloud raining over your project.
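The “penny per one thousand” arithmetic is worth making concrete. The rates below mirror the tiered shape described here (one rate for writes and lists, a cheaper one for reads) but are illustrative, not actual current prices:

```python
# The "penny per thousand" math. Rates mirror the tiered shape
# the post describes (writes/lists priced higher than reads) but
# are illustrative, not real AWS prices.

PUT_RATE = 0.01 / 1_000   # dollars per store/list request
GET_RATE = 0.01 / 10_000  # dollars per read request

def api_bill(puts, gets):
    """Monthly API charges, before any storage costs."""
    return puts * PUT_RATE + gets * GET_RATE

# A few hundred million transactions a month adds up fast:
print(f"${api_bill(300_000_000, 500_000_000):,.2f}")
```

This is why a code change that, say, doubles the number of small writes per document can blow the budget even though it stores no additional bytes.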

Read more…

Cost Transparency in the Cloud

Today Amazon Web Services lowered S3 “standard” pricing for storage volumes of less than 450 terabytes. The standard (STD) service carries the very reliable “eleven nines” durability SLA, the original “gold standard” for cloud-based object storage, and S3 is a great example of Platform as a Service (PaaS) storage. This price decrease is interesting. In the past, instead of lowering the price of the standard service, Amazon created a new class of storage, Reduced Redundancy Storage (RRS), with a different SLA and a different price. RRS offers “four nines” of durability at a lower price, for data that doesn’t need “eleven nines.”

RRS was the recognition that cloud customers didn’t need “one size fits all” storage, but would instead benefit from different types of building blocks with lower price points and varying service qualities. But to realize the lower price of RRS, AWS customers needed to write code or change behaviors. So today, AWS gave us a gift: the same reliable service we used yesterday is five to ten percent less expensive today, with no work. We didn’t have to write one line of code.
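The difference between the two paths is easy to quantify. A sketch with hypothetical per-gigabyte-month prices (not real AWS rates), contrasting the price cut that arrives for free with the RRS migration that requires engineering work:

```python
# Passive vs. active storage savings. All $/GB-month prices are
# hypothetical, chosen only to illustrate the two paths.

STD_OLD = 0.125   # standard storage before the price cut
STD_NEW = 0.115   # same SLA after the cut: passive savings, zero work
RRS     = 0.093   # reduced redundancy: cheaper still, but needs code changes

gb_stored = 500_000  # half a petabyte

passive_savings = (STD_OLD - STD_NEW) * gb_stored   # arrives automatically
active_savings  = (STD_NEW - RRS) * gb_stored       # requires engineering effort
print(f"passive: ${passive_savings:,.0f}/month  active: ${active_savings:,.0f}/month")
```

At this (hypothetical) scale the active path is worth more per month, but only after paying the one-time engineering cost and accepting the weaker durability SLA.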

This very public price reduction got me thinking about cloud cost transparency: in the cloud, all customers in the distribution chain know the underlying costs. Our customers know how much we are paying for storage. So do our competitors. What this means is that the cloud, unlike the old co-located world, propels a new era of transparency and a healthy system of checks and balances.

Price transparency forces each value-add layer in the cloud to amplify the innovation.

Amazon clearly found a better way to manage data and passed some or all of the savings to the customer. In turn, Sonian, as a good cloud customer, will amplify the savings with a positive change to our flagship archiving service.

A 2007 Multi-Cloud Fantasy Becomes a 2012 Reality

Five years ago I wrote a business plan that described an archiving SaaS product built on cloud computing. In 2007 it was an uphill battle to convince prospective investors that “the cloud was the future.” At that time there was really only one cloud, from the e-commerce giant Amazon. Amazon Web Services really started the modern cloud movement; no existing IT provider (IBM, HP, Microsoft, Dell, etc.) would have had the appetite to upset its current business model with a “disruptively priced” cloud option. For the past four years those IT giants fought the cloud momentum until they had credible clouds themselves. But for a lean start-up getting funded five years ago, it wasn’t a stretch to assume other clouds would appear to take on Amazon.

The graphic above was my crude way to visualize how a cloud-powered digital archive, anticipating someday living on multiple clouds, could in essence become a “cloud of clouds.” A lot of positive breakthroughs would need to occur to successfully operate a single reference architecture software stack across more than one cloud. There was no terminology to describe this desire; we weren’t using terms like “Big Data” or “DevOps,” nor many of the acronyms that are common lingo in our modern cloud-enabled world. The business plan depicted a system designed to manage lots of data, and being an enterprise document archive, the data was both large in size and numerous in quantity. We probably started one of the world’s first cloud big-data projects.

In the beginning the multi-cloud goal was a fantasy, a placeholder for a future that seemed possible; the actual crawl, walk, run steps were not precisely defined because we didn’t yet know “what we didn’t know.”

So why in 2008 were we thinking about “multi-cloud?” The answer is that we wanted to avoid single-vendor lock-in and maintain a modicum of control over our infrastructure costs. An evolving multi-cloud strategy meant the ability to seek the lowest cost of goods from multiple cloud vendors. In the pre-cloud IT world, when services were built on actual hardware, pricing flexibility came from negotiating better deals with hardware vendors. Customers didn’t know or care whether their SaaS app was powered by an HP server one day or a Dell 1U box the next; those decisions were at the discretion of the SaaS provider, which shopped vendors for the best infrastructure value. But in a single cloud, where there is only one choice, there is no ability to negotiate between multiple vendors, unless you have multi-cloud dexterity.

Multi-cloud capable means having the necessary infrastructure and abstraction layer to run a single common reference architecture on different clouds at the same time, with one master operator console. Multi-cloud is almost like, but not exactly, the concept of running a common program across IBM, DEC, and Control Data mainframes. Today’s clouds somewhat resemble the massive time-sharing mainframes of previous decades.
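The abstraction layer described above can be sketched as a provider interface with per-cloud adapters behind it, so the reference architecture is coded once. Class and method names here are invented for illustration:

```python
# A sketch of a multi-cloud abstraction layer: one reference
# architecture coded against a provider interface, with thin
# per-cloud adapters behind it. All names are illustrative.

from abc import ABC, abstractmethod

class CloudProvider(ABC):
    @abstractmethod
    def launch(self, role: str) -> str:
        """Launch a node for the given role; return its identifier."""

class AwsProvider(CloudProvider):
    def launch(self, role):
        return f"aws:{role}"         # a real adapter would call the EC2 API

class SmartCloudProvider(CloudProvider):
    def launch(self, role):
        return f"smartcloud:{role}"  # a real adapter would call SmartCloud's API

def deploy_reference_architecture(provider: CloudProvider):
    """Same topology on every cloud: the 'single reference architecture'."""
    return [provider.launch(role) for role in ("indexer", "search", "store")]

print(deploy_reference_architecture(AwsProvider()))
```

Swapping `AwsProvider()` for `SmartCloudProvider()` deploys the identical topology on a different cloud, which is the whole point: the architecture stays constant while the adapter absorbs each cloud’s “laws of physics.”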

Our early start five years ago, and all the hard lessons learned since, allow us to assume a commanding position in multi-cloud deployments. Engineering teams just now starting their “cloud journeys” will learn from us pioneers, but as the old saying goes: “until you’ve walked a mile in my shoes, don’t claim to know otherwise.”

Read more…