Blog Series: Cloud Masters

The Evolution of Purchasing Cloud Compute

In the beginning, as it were, just five years ago, purchasing cloud compute was simple because the rules were easy to understand and there were no choices. There was one cloud compute instance type that cost ten cents an hour. And a credit card was the only way to pay your monthly cloud compute bill.

Today there are a myriad of compute instances to choose from and multiple ways to pay for your cloud CPU time.

“Time…”

It’s the key word in the previous statement. The paradigm shift toward cloud computing and away from the old dedicated co-lo world is bringing back the concept of purchasing compute “time.” Seasoned IT folks know there is nothing new about “buying” computer time. In the mainframe era (would it be hurtful to call it the IT Jurassic age?), computer time-sharing was the norm, and developers had to be mindful of how much computer time their programs consumed because mainframes were very expensive: in today’s dollars, the equivalent of hundreds of dollars per hour.

Steve Wozniak, in his autobiography iWoz, tells a funny-turned-serious anecdote from his freshman year at the University of Colorado at Boulder. He couldn’t return for his sophomore year because a program he ran on the university’s timeshare mainframe consumed an excessive share of compute resources. The fees charged to his computer science department were astronomical for the era: more than $10,000.

Today’s cloud has the same gotchas: Watch out for excessive consumption. Once an hour has been consumed, there is no “return policy” to get your money back.

The cloud is pulling us back to that “time-sharing” mindset. We’re not buying 1U servers anymore. Instead, we’re buying virtual compute processing time based on how many hours in a month a CPU runs, regardless of how much work was performed.

In the cloud, time is the unit of consumption and the month is the billing period.
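
To make the arithmetic concrete, here is a minimal sketch of hour-based billing. It assumes the ten-cents-an-hour rate mentioned above and an instance that runs around the clock; the function name and the 30-day default are illustrative choices, not anything a provider defines.

```python
HOURS_PER_DAY = 24


def monthly_compute_cost_cents(instance_count, cents_per_hour, days_in_month=30):
    """Cost in cents of running instances around the clock for one billing month.

    Billing is by elapsed hours, so an idle instance costs the same as a busy one.
    """
    billable_hours = instance_count * HOURS_PER_DAY * days_in_month
    return billable_hours * cents_per_hour


# One always-on instance at ten cents an hour, converted to dollars.
print(monthly_compute_cost_cents(1, 10) / 100)
```

Working in whole cents keeps the sums exact, which matters once many small hourly charges are added together.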

Read more…

AWS EC2 Fleet Upgrade Tests our “Cloud Abilities”

This is an essay that was published to the Sonian cloud compute blog. Cross posting here for this audience.

In the past I have written about the secret to successful cloud deployments and how to architect for the cloud. Being successful requires a “designed-for-the-cloud” architecture, best operational practices and DevOps on steroids.

A couple of weeks ago Amazon notified a majority of their customers about an upcoming event that we early-to-the-cloud pioneers hadn’t seen before: a forced reboot of the host operating system, on a massive scale. For Sonian, 72% of our currently running EC2 instances will need to be restarted before Amazon’s deadline. There is no reprieve. There is no deferment. Welcome to Infrastructure as a Service!

Our AWS business development contact gave us an early heads-up, and Twitter lit up when the first email notices started to arrive for the US-West region. Something big was afoot, and there were a lot of groans from the EC2 user community. First, let me state flat out that Amazon did a pretty good job getting the word out and provided several ways to find out which EC2 instances would need to be restarted: an email was sent with the list, the EC2 Management Console displays the information, and the EC2 API’s ‘ec2-describe-instance-status’ call returns it. Fortunately Joe Kinsella (@joekinsella) enhanced our Cloud Control Viewer and provided a report showing the exact instances and their reboot schedule.

Of the various reboot types, the most invasive is the one that moves the virtual host to new hardware. That forces a change of IP address, and ephemeral storage is lost. This activity will certainly shake out any bugs in automated deployments, hard-coded settings, and sloppy shortcuts.

We had to scramble to assess the impact. All we learned from the email notice was that a portion of our EC2 instances would need to be restarted. Actually there were two types of restarts: an operating system reboot, which would preserve the non-persistent ephemeral storage, and a more invasive full instance restart (meaning the hardware under the hypervisor would power-cycle), which would not.
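
For readers who want to build a similar report, here is a rough sketch of pulling the reboot schedule out of an instance-status response. It is not the Cloud Control Viewer code. It assumes the response shape returned by today's boto3 `describe_instance_status` call rather than the 2011-era tooling, the sample instance IDs and dates are invented, and how the two event codes map onto the two restart types described above is also an assumption.

```python
from datetime import datetime

# Event codes AWS uses for scheduled reboots (assumed to correspond to the
# restarts discussed in the post; codes such as "instance-retirement" are ignored).
REBOOT_CODES = {"instance-reboot", "system-reboot"}


def scheduled_reboots(response):
    """Return (instance_id, event_code, not_before) tuples, earliest first."""
    found = []
    for status in response.get("InstanceStatuses", []):
        for event in status.get("Events", []):
            if event.get("Code") in REBOOT_CODES:
                found.append((status["InstanceId"], event["Code"], event["NotBefore"]))
    return sorted(found, key=lambda item: item[2])


# Invented sample data in the shape of a describe_instance_status response.
# A live response would come from something like:
#   boto3.client("ec2").describe_instance_status(IncludeAllInstances=True)
sample_response = {
    "InstanceStatuses": [
        {"InstanceId": "i-aaaa1111", "Events": [
            {"Code": "system-reboot", "NotBefore": datetime(2011, 12, 10, 4, 0)},
        ]},
        {"InstanceId": "i-bbbb2222", "Events": []},
        {"InstanceId": "i-cccc3333", "Events": [
            {"Code": "instance-reboot", "NotBefore": datetime(2011, 12, 8, 2, 0)},
        ]},
    ]
}

for instance_id, code, not_before in scheduled_reboots(sample_response):
    print(f"{not_before:%Y-%m-%d %H:%M}  {instance_id}  {code}")
```

Sorting by start time puts the most urgent restarts at the top of the report, which is the order a team would want to work through them.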

One of the major mistakes cloud customers can make is to get complacent and treat the cloud like traditional co-located hosting. The cloud has different operating characteristics, what one could call the “cloud laws of physics,” and this forced restart is a good example of this principle in action. It’s also a wake-up call to not get lazy. A large-scale forced restart is like an earthquake drill. Practice makes perfect, and if this were an actual unscheduled emergency, then we would be scrambling.

Despite the headache, this event has some positive spins. First, it’s encouraging that there is an “EC2 fleet upgrade.” This means newer underlying hardware, perhaps faster NICs in the hosts. And for companies like Sonian that started in the cloud circa 2007, some of our original instances that had been running for more than a year needed a “freshening.” This event reminds us there is a “hardware” center to every amorphous cloud. Amazon just does a great job of letting us not think about that too often, except at times like these. A stale part of the cloud gets a refresh.

The second “benefit” is the forced fire drill. I know, there’s never a good time for the fire drill. But this type of event has similar qualities to an unexpected outage. There is some luxury to pre-planning, but the shake-out will be the same. Something will be discovered in your architecture or deployment practices that will get improved by this reboot activity. Clusters may be too hard-coded. Config settings may be too restrictive. Reboot scripts may not work as you think.

Sonian survives unscathed due to our maniacal focus on 100% automated deployments, 100% commitment to “infrastructure as code,” and an investment in cloud control tools that allowed us to triage the situation and develop an action plan relatively quickly. We also employ the best darn DevOps team the cloud has seen.

Cloud Innovation Acceleration Effect: Now Releasing 100 Stories

Cross-posting here a two-part essay I wrote for the Sonian blog on how Sonian is benefiting from, and contributing to (by amplification), the innovation cadence in cloud computing.

I’ve been working in enterprise software since the late 1980s, and what I am witnessing as a participant in “the cloud” is that the pace of cloud technology innovation over the past five years blows away the previous two decades.

There is a real, noticeable trend here. We didn’t see this in SaaS powered by co-location hosting. What we are seeing with the cloud, and the ISVs that adopted the cloud five years ago, is truly amazing. Sonian is entering a release cadence of updating production systems with substantial new features every month.

Cloud Innovation – Part 1

  • Innovation history of Amazon Web Services 2005-2007
  • How Sonian amplifies cloud innovation
  • Sonian as an example of the “perfect” cloud ISV

Cloud Innovation – Part 2

  • Innovation history of Amazon Web Services 2008-2011
  • Comments about Gov Cloud

A Brief History: Cloud CPU Costs Over the Past 5 Years

When I started Sonian in 2007, one of the driving forces for beginning what would become my third start-up journey was the allure of all-you-can-consume “ten-cent-per-hour” cloud computing. Amazon Web Services was the new IT game changer in town, and the on-demand compute platform they launched in August 2006 literally brought cloud computing to the masses overnight. These past five years I have been studying “cloud costs” in different ways, and this weekend I looked back at the compute pricing history and uncovered some interesting trends.

Before I continue with this post, here’s a brief history of my experience with previous “clouds,” which illustrates why in 2006 I was ready to take a big leap into the AWS cloud as an early adopter.

In 2004 I was involved with another SaaS information archiving project, and I worked with a team at Sun Microsystems to create a reference architecture for our archive software stack to live on the “Sun Utility Compute Grid.” At the time we were hosting the archiving software on dedicated co-located hardware racks, and planning a large capital expenditure to increase capacity. In the spirit of “there has to be a better way!” we entertained the idea of moving our software to Sun’s “cloud.” (In 2004 the term cloud computing was pretty alien… the common term for this type of shared virtual computing was “utility computing.”) Sun offered the promise of true utility computing, but at the end of six months of effort, we could not make the underlying cost structures work. Sun was charging one dollar per CPU hour and one dollar per gigabyte per month for storage. We ended up adding more capacity to our existing co-located hardware plant because our “all-in” internal unit costs were less than what Sun was willing to sell their compute grid for.

Now back to the purpose of this post… a historical analysis of the cost of cloud CPU from 2006 through 2011.

Beginning in August 2006, Amazon Web Services’ new Elastic Compute Cloud (EC2) service introduced the concept of the EC2 Compute Unit (ECU)… a standardized way to define a unit of cloud computing, the associated characteristics of that unit (processor speed and memory), and a revolutionary hourly cost model requiring no up-front expense. Amazon achieved what Sun, IBM and others had been talking about for years, but could never bring to market. In 2006, for ten cents per hour, 1 EC2 Compute Unit could be rented with no up-front costs. A single ECU was defined as equivalent to a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor with 1.7 GB of RAM. This 1 ECU reference is still in effect today.

Read more…

The Secret Life of a Cloud Cost Control Czar

It’s a little after 7 in the morning and I tap the space bar to wake my MacBook Air from its slumber. I click a tab in Chrome, hit refresh, and with a slight pang of “what will I see,” look at the balance for our October cloud infrastructure bill. You may know this feeling… think about a recent time opening the credit card bill and dreading an unwanted surprise.

“I don’t think I overspent this month, but… there was that steak dinner in New York City …”

Monitoring the rate of spend to make sure we’re not going to break the budget is one task in the routine as the “cloud cost czar.” It’s a daily task to track the trend lines and sound the alarm if expenses start to creep off plan.
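
The daily check can be reduced to a small calculation. The sketch below is one simple way to do it, not the tooling Sonian uses: it projects month-end spend from the month-to-date balance by straight-line extrapolation and flags the month when the projection exceeds the budget by more than a tolerance. The function names, the 5% tolerance, and the sample dollar figures are all assumptions made for illustration.

```python
def projected_month_end_spend(spend_to_date, day_of_month, days_in_month):
    """Extrapolate the month-end bill from the average daily spend so far."""
    daily_rate = spend_to_date / day_of_month
    return daily_rate * days_in_month


def is_off_plan(spend_to_date, day_of_month, days_in_month, budget, tolerance=0.05):
    """True when the projected bill exceeds the budget by more than the tolerance."""
    projected = projected_month_end_spend(spend_to_date, day_of_month, days_in_month)
    return projected > budget * (1 + tolerance)


# Ten days into a 31-day month, with invented figures.
print(projected_month_end_spend(5000, 10, 31))
print(is_off_plan(6000, 10, 31, budget=15000))
```

A straight-line projection is crude, since cloud spend often ramps during the month, but it is enough to sound an early alarm.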

The “czar” is the human gas pedal – modulating the enormous pulsing “cloud software engine” as we process half a terabyte of data a day.

But I am not a solo act in this cloud pageant. Making the cloud work from an economic perspective is a total team effort. It all starts in the engineering group with cloud-appropriate core architecture designs, continues with quality testing, and then moves to the team that manages daily operations. Everyone plays a role and has responsibility for our prime directive: process the most data at the least cost, without sacrificing customer satisfaction.

“Obsessively” managing cost is one of the three design requirements, alongside reliability and performance. “Gaming the cloud,” our internal slang for all we do to maximize efficiency, is a multi-disciplined effort the engineering and service delivery teams rally around. But there has to be at least one person who focuses on the trends, from the 30,000-foot view down to sea level: The Cost Czar.

Read more…

Cloud Eyes Wide Open

The oft-used axiom “hindsight is twenty-twenty” is proven once again. With 20/20 vision looking back over the previous four years, 2008 through 2011, perspectives on cloud computing come into sharper focus. There is no question “cloud computing” is a revolutionary advance in the way businesses and consumers utilize computing resources. And all the technology “revolutions” that preceded the cloud were required in order to make cloud computing a reality. By previous revolutions I mean the Internet/Web, open source, cheap and reliable bandwidth, and commodity-priced hardware. If back in 2004 one had an astute crystal ball peering forward into 2012, the natural leading-edge thinking would have seen that cloud computing was the inevitable conclusion of all the aforementioned “computing revolutions.” Plus the crystal ball would have given us a glimpse of Amazon’s Jeff Bezos and Werner Vogels plotting their disruptive cloud mission.

In 2008 SaaS startups were “glamoured” by the cloud, especially startups whose founders had previously created SaaS applications that required building dedicated hosting infrastructures at great expense and distraction. The cloud looked to be a panacea for all the hassles of operating a data center. Need more compute? Just make an API call and a new virtual computer comes to life. Need more storage? Make an API call and terabytes of quality file systems are yours for as long as you need them. But what was unknown about the cloud, and not clearly visible in the 2008 crystal ball peering into 2011, is that the easy parts of the cloud started to work against our collective best interests. With dedicated hosting, the simple act of “adding more infrastructure” had an established purchasing approval workflow: budget with the CFO, negotiate price with the vendors, track UPS shipments, and pay an invoice 30 days later. That’s a lot of friction in a fast-paced environment, but the purchasing controls (in hindsight) created an accountability layer the cloud lacked.

For the Sonian project, we needed to create purpose-built tools to help manage costs and reduce complexity. Some of these tools will be open sourced in a “pay it forward” contribution to the community. Adding to this, the industry is starting to see startups emerge that focus on cloud management systems. The cloud solves many pain points, but the cloud itself has pain points too. The ecosystem around cloud computing is innovating quickly and it will be exciting to see what comes next.

Despite the learning curve, mastering the cloud for the right use case is worthy of any and all efforts. The cloud, combined with the right use case, and now the right tools, puts all the right incentives in place to deliver customers “more for less.” Who doesn’t want that in this current economic climate?

Gaming the Cloud: You Need “Cost Aware” Applications

This is the second post in my “Gaming the Cloud” series. You can read the first post here.

There are two primary reasons to adopt cloud computing for SaaS applications: one, to save money, and two, to be more reliable (and the great aspect of the cloud is that both can be achieved with the same engineering effort). There is no reason to use cloud computing unless you have a “cost aware” application. If you use the cloud to power traditional enterprise software you won’t save money, and will probably be less reliable too. See my previous post in the “Gaming the Cloud” series about having the right use case. Go read it now and come back. Do you have the right use case? If so, let’s continue. After validating your use case, the next step in your cloud journey is to design your application to be elastic and at the same time “cost aware.”

So what exactly is a cost aware application and why should you care?

In the old world of SaaS, using traditional co-located data centers and co-mingled hardware, it was nearly impossible to figure out at a granular level how much each software component cost to run. With the cloud this all changes in a very positive way. As compute and storage are consumed in small units, and each of these units has a cost (for example compute at ten cents an hour or storage at fifteen cents a gigabyte), it’s a requirement to think about software designs that focus on operational efficiency because we can now measure costs at the atomic level. When I started Sonian in 2007, with a mandate to be purely cloud focused, we had access to 1 CPU type. Very quickly, our cloud provider Amazon offered more CPU variety and our reference architecture matured in real time as we were able to optimize the software to match virtual compute units that had more memory, more cores, or both.

A cost aware application is software with an inherent design to “game the cloud” and be ultra efficient on every transaction. This in essence means granular workload management, the ability to right-size the CPU profile for the task, and the ability to take advantage of several long-term cost management features offered by the cloud infrastructure providers.
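
As a rough illustration of right-sizing, the sketch below picks the cheapest instance type that satisfies a task's core and memory needs and then works out a per-unit cost. Everything in the catalog is hypothetical: the type names, sizes, and prices are invented for the example (only the ten-cent, 1.7 GB entry echoes figures used in these posts) and are not Amazon's price list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceType:
    name: str
    cores: int
    memory_gb: float
    cents_per_hour: int


# Hypothetical catalog, invented for illustration only.
CATALOG = [
    InstanceType("small", 1, 1.7, 10),
    InstanceType("large", 2, 7.5, 40),
    InstanceType("high-mem", 2, 17.0, 50),
    InstanceType("xlarge", 4, 15.0, 80),
]


def right_size(min_cores, min_memory_gb, catalog=CATALOG):
    """Return the cheapest instance type that meets the task's requirements."""
    candidates = [
        t for t in catalog
        if t.cores >= min_cores and t.memory_gb >= min_memory_gb
    ]
    if not candidates:
        raise ValueError("no instance type satisfies the requirements")
    return min(candidates, key=lambda t: t.cents_per_hour)


def cents_per_thousand_items(instance, items_per_hour):
    """Unit cost: what it costs to process 1,000 items on this instance type."""
    return instance.cents_per_hour * 1000 / items_per_hour


choice = right_size(min_cores=2, min_memory_gb=16)
print(choice.name, cents_per_thousand_items(choice, items_per_hour=20000))
```

The unit-cost function is the part that makes an application “cost aware”: once each workload reports cost per item, two designs can be compared on dollars rather than intuition.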

Read more…

Gaming the Cloud: Start with the Right “Cloud” Use Case

The other day I met with a group of Boston start-up CTOs to share ideas on technology and team building. Typically these discussions veer into helping each other with technical challenges scaling SaaS software. All of the companies present are pushing the envelope with big data, large subscriber bases, and lots of cloud infrastructure to manage. Many great ideas were exchanged, but one major theme resonated: The cloud may not be the best place for every kind of software stack. The group represents many different use cases: big data archiving, analytics, social networking for niche audiences, video encoding, application performance analysis, advertising sales networks, and others.

To understand how we got to this place today, let’s step into the time machine elevator and press the button for “Year 2008.” Down, down we go and the doors open to “the cloud” (and specifically Amazon Web Services) just coming onto the tech scene. All of a sudden software architects and developers could control their own infrastructure (and not have to hassle with hardware sales reps or CFOs and their purchase orders). We technologists, myself included, with big ideas “projected” our hopes and dreams onto the cloud as the panacea for all our infrastructure needs. We understood for the most part that utilizing the cloud would require new ways of thinking about software architectures. We also had “real world” time-to-market pressures to get our service running quickly and start to prove out business viability. The cloud has tremendous potential to help accomplish great things, but used incorrectly, the cloud could cause a lot of harm, both financially and to technical stability.

In the “gold rush” to the cloud we (the nascent start-ups pioneering in the cloud) were all learning the same “lessons” at roughly the same time: how and when to scale, how to analyze costs and efficiencies, the need for different kinds of monitoring and alerting, how to deploy software updates to a cloud-based environment; the list goes on. Many lessons were learned, and the theme of mastering the cloud tilted more toward “let’s figure out how to game the cloud.” There’s no reason to use the cloud unless you can make the economics work, and making the economics work requires a mindset to “game the cloud”: process more data and transactions at the least cost. Easy to write, but very technically challenging and rewarding to pull off.

Read more…

Three Secrets to Cloud Computing Harmony

A Haiku for the big data cloud:

cloud promises quick

beware hidden expenses

elastic apps fix

In 2006 Amazon Web Services flashed brilliance with a “light bulb moment” that sparked the imaginations of leading-edge technologists and entrepreneurs. Literally overnight “The Cloud” had arrived. The cloud offered the ability to create, launch and operate SaaS applications in a way never before possible. Using simple and secure APIs a software engineer could harness vast quantities of compute and storage services, on-demand and with no up-front costs, without touching a single physical atom. The cloud allowed small, efficient teams to build an application that could serve a very large audience.

Within a couple of years of launch, Amazon Web Services, a.k.a. “the cloud,” brought cloud computing to the technical masses. Thus was coined the phrase “infrastructure as code,” along with a new reckoning by the old-guard enterprise software, hardware and hosting companies that times were changing (and in 2011 hindsight, the times would be changing rather quickly).

For a technologist, cloud concepts seemed deceptively simple: Just design a software architecture that is tuned for cloud computing operating characteristics. The cloud offered the capability to automatically scale up and scale down. The cloud offered cost efficiencies. The cloud offered incredible reliability. And all these fine attributes were possible without sacrificing one for the other. For software architects used to the traditional co-located hosting design patterns, the cloud was something entirely new to comprehend. The differences are many, but the reward for adopting the cloud and succeeding was greater than the hardship to change our collective thinking.

The core requirements for every SaaS application are scale-up, reliability and efficient infrastructure utilization. Scaling in the cloud means harnessing the on-demand capabilities. Reliability in the cloud means designing for failure by making software mirror what “physical” hardware used to supply in the co-located world. Operating cost-efficiently means “gaming” the cloud to find every place where you can process more work with fewer compute resources.

But in reality, taming the cloud is not a trivial pursuit. In this new “cloudy” world infrastructure is code, and invoking more API calls can launch more infrastructure, so we thought we were dining at the “all you can eat buffet.” Heartburn ensues. The best guidance is that before you start to build a cloud-enabled application, you need the scaffolding in place to “raise the app” with all the necessary supporting infrastructure required to operate a dynamic cloud-based SaaS system. Retrofitting the management framework after the fact is the wrong approach for the cloud. That approach worked in the old world with software designed for dedicated hosting, but the cloud is such a different environment that old-world thinking does not transfer well.

Below are three prime areas to focus on when planning to build a cloud-based software stack.

1. Effective Budgeting with a Cost Control System

Compared to a traditional dedicated data center environment, it’s way too easy to spend money in the cloud. “Purchasing” in the cloud is psychologically different: two mindsets (using purchase orders to buy everything up front versus consuming small bits at a time) have to be reconciled, and they reflect the vastly different operating styles of dedicated hosting and the cloud. In the dedicated environment, big capital expenditures get multiple approvals and are on many people’s radar. But in the cloud, most teams start their cloud relationship with a credit card and pay monthly for the previous 30 days of micro-charges for gigabytes of storage and hours of CPU time consumed.

Read more…

Security in the Big Data Cloud

(ed. A version of this post appears at the Sonian Big Data Cloud blog)

A cloud software company’s worst nightmare came true for Dropbox this past weekend when a software bug allowed anyone to log in to any account (over a four-hour period) using any password. It’s unknown whether, or how many, accounts were accessed inappropriately. So far there are no reports of data breaches.

This recent occurrence, coupled with other non-cloud but seemingly similarly themed data breaches as reported by Citibank, Sony and LulzSec, has moved the “can the cloud be secure” conversation into the spotlight. The short answer is yes, the cloud is secure, and here is why.

Defining Cloud Security

Data security in the cloud is a set of “inherited responsibilities” shared among the cloud infrastructure provider (Amazon, Rackspace, SoftLayer, etc.), the independent software vendor (an ISV, e.g. Dropbox), and the customer.

Data security in the cloud is really two components: resiliency and privacy. Resiliency means when a customer stores data in the cloud, the cloud vendor should not lose that data. Privacy means nobody but the customer should be able to “see” the data stored in the cloud.

The cloud vendor is responsible for data resiliency. Cloud vendors provide Service Level Agreements (SLAs) that give a measure of resiliency so that customers can compare one cloud with another. For example, Amazon Web Services provides “eleven-nines” of cloud storage resiliency, while SoftLayer offers “five-nines.” These SLAs are far better than what a typical enterprise can achieve in its own data center.
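
To get a feel for what the “nines” mean, here is a small worked calculation. It rests on assumptions the vendors' wording may not support: that both figures describe the chance of keeping a single stored object over one year, and that objects are lost independently. The function name and the ten-million-object example are illustrative.

```python
from fractions import Fraction


def expected_annual_losses(object_count, nines):
    """Expected number of objects lost per year at a durability of `nines` nines.

    Assumes per-object, per-year durability and independent losses.
    Fractions keep the arithmetic exact for very small probabilities.
    """
    loss_probability = Fraction(1, 10 ** nines)
    return object_count * loss_probability


stored_objects = 10_000_000
for nines in (5, 11):
    losses = expected_annual_losses(stored_objects, nines)
    print(f"{nines} nines: about {float(losses)} objects lost per year")
```

Under these assumptions, ten million objects at five nines means roughly 100 lost per year, while at eleven nines it means roughly one lost every 10,000 years.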

Read more…