A Tale of Two Cloud Search Engines

Sonian Cloud Search and Amazon Cloud Search: their names may sound the same, but they couldn’t be further apart in what they cost to operate and in their intended use cases.

Sonian is a veteran “Cloud Search” pioneer. In 2008 we launched the first version of search in the cloud, and today the service operates simultaneously across multiple public clouds using a single reference architecture.

Over the past four years we have perfected cloud search scaling and cost efficiencies. It’s been a steep learning curve, but well worth the effort. Today there are over seven billion documents indexed, with fifteen million new documents added each day. Daily index and retrieval volumes rise as new customers sign up for the service.

The secret to Sonian Cloud Search mastery is a combination of open source software, IP developed in-house, and detailed metrics that show us cost and performance. Every few months we deploy improvements that lower costs and increase reliability. We’ve driven per-document unit costs down to fractions of a cent.

Amazon’s new Cloud Search service is a natural evolution for the cloud computing leader. As more data streamed into Amazon S3, a web service was needed to make that information searchable. AWS Cloud Search is available in beta now and is designed for developers who need a full text index for data typically stored in a relational database. This means data sets in the megabyte to low gigabyte range, and indexes that are more transient than persistent. AWS prices this web service at the higher end of the spectrum because index files are stored in memory on cluster compute instances using SSD storage. This is Amazon’s more expensive cloud compute infrastructure, and AWS Cloud Search pricing reflects that underlying cost.

AWS Cloud Search in its current form is not intended to index terabyte or petabyte document archive workloads. This service would not be cost effective for an email archive SaaS product.

Sonian Cloud Search is designed for long-term index storage that requires low operating costs without sacrificing search retrieval performance. This is a different use case from the one AWS built its new service for.

A typical Sonian customer workload is 100 million documents with 6 TB of index storage. Our internal cost to support this data volume is $1,300 a month. The comparable cost to index the same amount of data with AWS Cloud Search would be $14,000 a month. This is not a ding against AWS Cloud Search, but rather an example of a PaaS service being used for a purpose it was not designed for.
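To put those figures on a per-document basis, here is a minimal sketch in Python. It uses only the three numbers quoted above; the function and variable names are illustrative and are not part of any Sonian or AWS tooling.

```python
# Figures quoted in this post; all names below are illustrative.
DOCUMENTS = 100_000_000        # documents in a typical customer workload
SONIAN_MONTHLY_USD = 1_300     # internal monthly cost quoted for Sonian
AWS_MONTHLY_USD = 14_000       # comparable monthly cost quoted for AWS Cloud Search


def cost_per_document(monthly_usd: float, documents: int) -> float:
    """Return the monthly cost of a single indexed document, in US dollars."""
    return monthly_usd / documents


sonian_per_doc = cost_per_document(SONIAN_MONTHLY_USD, DOCUMENTS)
aws_per_doc = cost_per_document(AWS_MONTHLY_USD, DOCUMENTS)
ratio = AWS_MONTHLY_USD / SONIAN_MONTHLY_USD

# Report in cents, since the unit costs are far below one dollar.
print(f"Sonian: {sonian_per_doc * 100:.4f} cents per document per month")
print(f"AWS:    {aws_per_doc * 100:.4f} cents per document per month")
print(f"AWS / Sonian cost ratio: {ratio:.1f}x")
```

With these inputs the monthly unit cost works out to roughly 0.0013 cents per document for Sonian and 0.014 cents for AWS Cloud Search, a difference of about 11x for this archive-style workload.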

Cloud success requires variations on the same theme. There is already plenty of variety in compute and storage. Now there are search technology choices as well, so we can pick the best tool for the problem at hand.

n.b. A version of this post is published at http://blog.sonian.com/cloud-buzz-blog/
