Archive for the ‘Archiving’ Category

Reflecting on One Year of Cloud Cost Optimization

For the past year I held the unelected position of “Cloud Cost Czar.” I have written about the duties such a role entails in A Day in the Life of a Cloud Cost Czar. Recently I handed over the cost czar responsibility to a colleague who will carry on the daily routines and continue to improve our cloud cost management endeavors. In the handoff process, almost a year to the day of assuming the czar’s responsibilities, I reflected on the previous twelve months and all the accomplishments the company made as a united team to “tame the cloud.”

I created a graph to visualize the dramatic change over one calendar year. To the right is an area graph that shows subscriber seats (in green) overlaid on subscriber costs (blue, orange and red; our principal costs are cloud compute and two types of cloud storage). As the subscriber count grew, costs went up, peaked, and then went down over the course of one year. The rise, peak, and subsequent decline all map to various cost-cutting efforts initiated by Sonian engineering and support groups.

Throughout the year we got smarter about how to “purchase” compute time for less than retail, how to store more customer data while consuming less cloud storage, and how to process more customer data using fewer CPU hours. With a high-five on each improvement, we re-affirmed that in the cloud we were in control of our cost destiny. This is when the phrase “infrastructure as code” really means something.

Read more…

A Tale of Two Cloud Search Engines

Sonian Cloud Search and Amazon Cloud Search. Their names may sound the same, but they couldn’t be further apart in terms of how much they cost to operate and their intended use cases.

Sonian is a veteran “Cloud Search” pioneer. In 2008 we launched the first version of search in the cloud, and today the service operates simultaneously across multiple public clouds using a single reference architecture.

Over the past four years we have perfected cloud search scaling and cost efficiencies. It’s been a steep learning curve, but well worth the effort. Today there are over seven billion documents indexed, with fifteen million new documents added each day. Daily index and retrieval volumes rise as new customers sign up for the service.

The secret to Sonian Cloud Search mastery is a combination of open source software, IP developed in-house, and detailed metrics that show us cost and performance. Every few months improvements are deployed to lower costs and increase reliability. We’ve driven per-document unit costs down to fractions of a cent.

Read more…

Stolen MacBook and iPads Re-affirm Cloud Backup Strategy

A recent computer theft highlights the critical difference between “backup” and protecting local & cloud storage from identity theft.

I wasn’t planning to write about this topic for my weekly post, but then “life happens” and this subject is at the top of my mind. My hope is you will learn from my mistakes and save yourself a lot of grief.

Two weeks ago the family computer, a MacBook, a couple of iPads, and an iPod were stolen from my part-time residence. The detective, while dusting for fingerprints and examining the bent window frame where the thief (or thieves) entered, muttered “typical B & E [breaking and entering], smash and grab, you won’t see your stuff again unless we’re really lucky.” Unfortunately the burglar alarm wasn’t enabled because at the time of the robbery there was a fierce Santa Ana wind storm, and the over-sized glass windows, offering mountain views, flex with the wind and set off the motion detectors. The robbery took place during a quick run to the store and was probably in progress when the car pulled back into the driveway. I know this because a pile of other stuff the robber was gathering remained in the middle of the kitchen floor, abandoned because of a sudden exit. If not for arriving at that time, more valuable stuff would have been stolen. A sliding door at the rear of the house was ajar, and police assume that was the exit path.

Event Timeline


10pm – In-progress robbery thwarted, police called.

10:10pm – Police arrive and assess the crime scene. Fingerprint dusting, etc.

For about an hour – In shock, figuring out what was stolen, and trying to repair the broken window to keep the wind out. Realized all the computer equipment was gone and started to think about protecting the data.

Midnight – Police leave and ask for serial numbers so they can log the computer equipment into a database for pawn shops to check.

Read more…

“Cloud Killed the (SaaS) Rock Star”

“Cloud Killed the (SaaS) Rock Star”…

… well, not literally, but definitely in a figurative sense.

The press release below is the all-points bulletin heralding that the cloud has “won.” Why do I say this? Because LiveOffice, a non-cloud SaaS start-up, couldn’t compete against the new generation of SaaS start-ups, like Sonian, that are powered by true public cloud computing.



LiveOffice was the rock star of SaaS archiving. Ten years in business and they deserve the credit as one of the pioneers to legitimize the SaaS market. When LiveOffice launched a decade ago, they had to operate their own data centers. (This is called “Co-located Powered SaaS.”) But during the past five years, the world changed underneath them. Usually, market dynamics cause this kind of disruption, but the SaaS archiving market size didn’t get smaller, rather it’s bigger than ever. What changed starting in 2007? The advent of the public cloud. Suddenly, any SaaS company running their own data center became vulnerable to competitors able to harness the cloud. This is the beginning of the cloud-powered SaaS era.

Seriously, I wish all the best to the LiveOffice team. Sonian and LiveOffice competed vigorously from 2008 to 2011. Symantec acquired a great team, and the fit between LiveOffice and Symantec makes a ton of sense, and it’s understandable why Symantec made the acquisition.

Although LiveOffice called themselves a “cloud archiving” company, that was stretching the truth. The cloud moniker is so overused at this point that the public is deceived into believing they are using a cloud service when, in fact, it’s really just the same old SaaS re-packaged with a new label.

Why Did This Happen?

Operating a SaaS infrastructure on a pure cloud environment is vastly different from operating a co-located system; it’s the reason we’re going to see more old-world SaaS companies change control or fade away. It will be exceedingly difficult to re-tool a co-located hosted SaaS business to use the cloud. Not impossible, but very difficult. The whole architecture would need to change. I say this having lived in both worlds, with the cloud battle scars to prove it.

Read more…

Cloud Innovation Acceleration Effect: Now Releasing 100 Stories

Cross-posting here a two-part essay I wrote for the Sonian blog on how Sonian is benefiting from, and contributing to (by amplification), the innovation cadence in cloud computing.

I’ve been working in enterprise software since the late 1980s, and what I am witnessing as a participant in “the cloud” is that the pace of cloud technology innovation over the past five years blows away the previous two decades.

There is a real, noticeable trend here. We didn’t see this in SaaS powered by co-location hosting. What we are seeing with the cloud, and the ISVs that adopted the cloud five years ago, is truly amazing. Sonian is entering a release cadence of updating production systems with substantial new features every month.

Cloud Innovation – Part 1

  • Innovation history of Amazon Web Services 2005-2007
  • How Sonian amplifies cloud innovation
  • Sonian as an example of the “perfect” cloud ISV

Cloud Innovation – Part 2

  • Innovation history of Amazon Web Services 2008-2011
  • Comments about Gov Cloud




Cloud-powered “Time Machine” Creates Corporate Timelines

With much fanfare, Facebook announced a new “Timeline” feature at its F8 developer conference this week. This feature takes advantage of the enormous amount of information (photos, status updates, location) we all store in Facebook. The Timeline is accessible to Facebook application developers as well as the half a billion folks who use the social network. With increased competitive pressure from Google+ and Twitter, Timeline will be an important differentiator between Facebook and its competition. Timeline also shows there is an interest in melding the past with the present. Timeline wouldn’t have been possible or relevant until Facebook achieved significant adoption and large amounts of data under management. The “network effect” of big data stored in one cloud-computing environment gives Facebook unparalleled access to information that was never before possible with any other online system. Perhaps only AOL or CompuServe had this opportunity, but they didn’t have “the cloud” or sophisticated tools like Hadoop or NoSQL to make their data useful.

(As an aside, I expect some Facebook users might be startled with a “creepiness” factor when they see their Timeline presented back to them. Facebook will have the unique ability to remind us with a visceral visual recollection of past people, places and events.)

What’s interesting about Timeline is the way events, photos, postings, and news feeds are visually presented. There is a lot of machine learning computational effort required behind the scenes to create relevant and compelling timelines for five hundred million accounts. This is an example of cloud computing, big data, and analytics combining to create a pleasing consumer experience.

Read more…

Seeking Cloud Search Engine Technologist

Sonian has a fantastic opportunity for someone with Java, distributed systems, “cloud,” and Lucene skills to join our team as a Principal Software Engineer to work on our “Cloud Search Engine.”

Ideal candidate is Boston-based, but we will consider any location. This is a unique opportunity, combining two hot trends: cloud computing and “big data search.”

Sonian is designing for petabyte scale data volumes using leading edge warehouse technologies designed for cloud computing infrastructures, distributed systems and full-text search.

Ultimate “Mix-In”: Enterprise Data + Public Data Sets

There is an incredible number of free and low cost data sets available to anyone with sufficient compute power to perform ETL (extract, transform and load) operations (cue “cloud computing mantra theme song”).

For example, Microsoft Azure Public Data Sets, Amazon Public Data Sets, Infochimps, Guardian Datastore, Google Freebase and others offer access to public (and private) data sets that will be useful to an enterprise audience. The advent of cloud computing, secure APIs and data formats like JSON and XML creates the environment to let these public data sets exist and be useful in infinite combinations with enterprise data.

Imagine the ability to “mix in” traditional enterprise content (email, IM, SharePoint, files, CRM) with public data sets. A mix-in means the enterprise data remains private and secure, but is stored in facilities that support the ability to link facets with public data sets ranging from health data and geolocation to census, weather, and more.

There are over 3,000 free public data sets available, including:

  • Supreme Court Justices and Their Decisions
  • Weather statistics for every zip code from the past 50 years
  • Election data for local, state and national races

An example mix-in scenario could be an e-discovery situation where email records are being used to support a civil complaint and the subject matter refers to the weather conditions from 3 years ago. The e-discovery system (enhanced with the ability to access the weather-base) can now access the public weather data set and retrieve the exact conditions for a physical location.

Data sets accessible over an API, or pre-packaged in JSON, dramatically simplify the ETL function and allow the data to be useful in ways not possible prior to cloud computing and open standards.
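The e-discovery mix-in described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the weather records, their field names (`zip`, `date`, `conditions`, `high_f`), the email records, and both helper functions are hypothetical and do not come from any real public data set or from Sonian's system. The point is that once a public data set arrives as JSON, the "ETL" step reduces to loading it and joining on a shared key, while the private records stay untouched.

```python
import json
from datetime import date

# Hypothetical public weather data set, pre-packaged as JSON and keyed by
# zip code and ISO date. The field names are invented for illustration.
PUBLIC_WEATHER_JSON = """
[
  {"zip": "92262", "date": "2009-01-14", "conditions": "high wind", "high_f": 71},
  {"zip": "02139", "date": "2009-01-14", "conditions": "snow", "high_f": 28}
]
"""

# Hypothetical private enterprise records (e.g. archived email metadata).
EMAIL_RECORDS = [
    {"id": "msg-001", "sent": date(2009, 1, 14), "zip": "02139",
     "subject": "Delivery delayed by the storm"},
    {"id": "msg-002", "sent": date(2009, 1, 15), "zip": "02139",
     "subject": "Rescheduling"},
]


def build_weather_index(raw_json):
    """Load the public data set into a dict keyed by (zip, ISO date)."""
    return {(row["zip"], row["date"]): row for row in json.loads(raw_json)}


def mix_in_weather(emails, weather_index):
    """Return copies of the email records annotated with matching weather.

    The private records are only read, never modified, and nothing is
    written back to the public data set. Records with no matching
    weather entry get None.
    """
    enriched = []
    for email in emails:
        key = (email["zip"], email["sent"].isoformat())
        enriched.append({**email, "weather": weather_index.get(key)})
    return enriched


if __name__ == "__main__":
    index = build_weather_index(PUBLIC_WEATHER_JSON)
    for record in mix_in_weather(EMAIL_RECORDS, index):
        print(record["id"], record["weather"])
```

In a real deployment the JSON would presumably be fetched over the provider's API rather than embedded as a string, and the join key would depend on what the chosen data set actually exposes.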

2010, The Year of Hybrid IT


2010 is the year “Hybrid IT” will help enterprises harness cloud computing. Hybrid IT is a new paradigm for IT management, enabling organizations to benefit from cloud computing’s low cost, infinite storage and powerful and flexible CPU platforms. All these positive capabilities are accessible while still preserving the idea of maintaining control of data as if it were stored on-premises. Hybrid IT is a “win/win” scenario for the IT manager and the chief financial officer.

From small to mid-size enterprises (SMEs) to the Fortune 1000, organizations will move parts of their IT infrastructure to the Cloud. For the SMEs, Cloud-powered hybrid IT will level the playing field by enabling organizations to adopt strategic hierarchical storage management (HSM) and Information Lifecycle Management (ILM) features without breaking the IT budget. For larger enterprises, Hybrid IT will introduce new cost savings by allowing them to allocate portions of their storage environments to the cloud and eliminate the hardware, software and management costs previously associated with keeping those activities on site.

New software-as-a-service (SaaS) applications built specifically for the Cloud will act essentially as an extension of the customer’s own data center. These Cloud-powered SaaS services enable organizations to strategically and tactically meld cloud-powered SaaS functions with on-premises servers.

As a result, organizations will leverage the Cloud’s on-demand CPU to power through terabytes of content for large searches or deep analytics. With better business intelligence culled from employee generated content silos across the organization, businesses will unlock the previously inaccessible value of their data and gain new insights into their customers and their operations. Cloud-powered archiving will help organizations focus on long-term growth initiatives.

Hybrid IT is ushering in a new wave of computing that will be as profound as the change from mainframes to minicomputers or client-server to the web. Leveraging the cloud requires new software development skills and technology frameworks in order to build SaaS applications on a radically different stack as well as new IT management skills for the individuals who use these applications. The payoff for end users will be enormous.

I’m in the business of managing digital bytes on virtual atoms

Just thinking… Sonian uses cloud computing, specifically Amazon Web Services, to deliver a digital content archiving service for businesses.

Another way to say what we do is we’re managing bytes of information on virtual atoms.