(ed. A version of this post appears on the Sonian Big Data Cloud blog)
A cloud software company’s worst nightmare came true for Dropbox this past weekend, when a software bug allowed anyone to log in to an account using any password over a four-hour period. It is unknown whether any accounts were accessed inappropriately, or how many. So far there are no reports of data breaches.
This incident, coupled with other recent breaches that did not involve the cloud but seem similar in theme (such as those disclosed by Citibank and Sony, and the attacks claimed by LulzSec), has moved the “can the cloud be secure?” conversation into the spotlight. The short answer is yes, the cloud is secure, and here is why.
Defining Cloud Security
Data security in the cloud is a combination of “inherited responsibilities” shared among the cloud infrastructure provider (Amazon, Rackspace, SoftLayer, etc.), the independent software vendor (ISV, e.g. Dropbox), and the customer.
Data security in the cloud has two components: resiliency and privacy. Resiliency means that when a customer stores data in the cloud, the cloud vendor should not lose it. Privacy means that nobody but the customer should be able to “see” the data stored in the cloud.
First, the cloud infrastructure provider is responsible for data resiliency. Cloud vendors publish Service Level Agreements (SLAs) that quantify resiliency, so customers can compare one cloud with another. For example, Amazon Web Services advertises “eleven nines” (99.999999999%) of durability for its cloud storage, while SoftLayer offers “five nines” (99.999%). These figures are far better than what a typical enterprise can achieve in its own data center.
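To make the “nines” concrete, the sketch below converts a count of nines into the failure fraction it implies and applies that fraction two ways: as expected object loss per year (a durability reading) and as downtime per year (an availability reading). Which reading applies to a particular vendor's figure is an assumption to check against that vendor's own terms; the function names and the 10-million-object example are illustrative.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # ignores leap years


def failure_fraction(nines: int) -> float:
    """Failure fraction implied by "n nines": 5 nines (99.999%) -> 1e-5."""
    return 10.0 ** -nines


def expected_objects_lost_per_year(objects: int, nines: int) -> float:
    """Durability reading: expected number of stored objects lost per year."""
    return objects * failure_fraction(nines)


def downtime_minutes_per_year(nines: int) -> float:
    """Availability reading: expected minutes per year the service is unreachable."""
    return MINUTES_PER_YEAR * failure_fraction(nines)


if __name__ == "__main__":
    # 10 million stored objects at eleven nines of annual durability.
    print(expected_objects_lost_per_year(10_000_000, 11))
    # Five nines of availability, expressed as minutes of downtime per year.
    print(round(downtime_minutes_per_year(5), 2))
```

Eleven nines over 10 million objects works out to about 0.0001 objects lost per year, or roughly one object every 10,000 years; five nines of availability allows about 5.26 minutes of downtime per year.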
Second, the ISV is responsible for data privacy. This can involve any number of methods for encrypting or obfuscating the contents of the data (a photo, a word processing file, etc.), and different methods suit different use cases. In some cases, data is encrypted before it is uploaded to the cloud. This method works well for “backup to the cloud,” where the cloud software vendor never needs to open the contents. The data is then a “black box” to both the software and the cloud infrastructure. Black-box data is of little use beyond a backup system; making the data more useful in the cloud requires a multi-layered security approach.
An alternative implementation uses two or more encryption layers. As data moves “over the wire” from the customer to the cloud, Secure Sockets Layer (SSL) encryption protects the contents as the data packets travel across the Internet. This is the same protection used for credit card numbers, online banking and e-commerce. Once the data has been securely transmitted to the cloud, a second encryption layer takes over and secures the data “at rest.” Because the software layer can still decrypt the data when it needs to, this two-stage approach keeps the data useful in the cloud while offering a high degree of privacy.
Cloud computing’s great potential is to apply “on-demand” scale to big data crunching. Having “trusted access” to contents means being able to build full-text indexes on terabytes of data, analyze hundreds of gigabytes for business intelligence, or run sophisticated analytics on thousands of virtual computers at once. The potential uses for data stored in the cloud are vast, but each method of securing that data carries an implicit trade-off. For data to be useful beyond simple backup and restore, a multi-layered approach is required.
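For the “over the wire” layer, a client can rely on Python's standard `ssl` module, which today negotiates TLS, the successor to SSL. This sketch only builds the client-side configuration: certificate and hostname verification on, and nothing older than TLS 1.2 accepted. The host name passed to `open_upload_connection` is a hypothetical placeholder, and the at-rest layer is not shown.

```python
import http.client
import ssl


def make_client_tls_context() -> ssl.SSLContext:
    """Client-side settings for protecting data in transit.

    create_default_context() loads the system CA certificates and enables
    certificate and hostname verification; we also refuse anything older
    than TLS 1.2.
    """
    context = ssl.create_default_context()
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


def open_upload_connection(host: str) -> http.client.HTTPSConnection:
    """Prepare an HTTPS connection to a (hypothetical) upload host.

    No network traffic happens here: the TLS handshake runs on the first
    request and fails if the server's certificate cannot be verified.
    """
    return http.client.HTTPSConnection(
        host, 443, timeout=30, context=make_client_tls_context()
    )
```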
Finally, the customer has an important role to play in cloud security: choosing “strong” passwords. It is a simple request, but surprisingly many cloud customers do not take the time to create a strong, unique password for their accounts.
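Password rules of the kind a service might enforce can be sketched as a small policy check. The specific rules and `PasswordPolicy` defaults below are illustrative assumptions, not any vendor's actual policy. `generate_password` draws from Python's `secrets` module, which is intended for security-sensitive randomness, and resamples until every required character class appears.

```python
import secrets
import string
from dataclasses import dataclass


@dataclass(frozen=True)
class PasswordPolicy:
    """Hypothetical per-customer password rules."""
    min_length: int = 12
    require_upper: bool = True
    require_lower: bool = True
    require_digit: bool = True
    require_symbol: bool = True


def policy_violations(password: str, policy: PasswordPolicy = PasswordPolicy()) -> list[str]:
    """Return the rules a password breaks; an empty list means it passes."""
    problems = []
    if len(password) < policy.min_length:
        problems.append(f"shorter than {policy.min_length} characters")
    if policy.require_upper and not any(c.isupper() for c in password):
        problems.append("no uppercase letter")
    if policy.require_lower and not any(c.islower() for c in password):
        problems.append("no lowercase letter")
    if policy.require_digit and not any(c.isdigit() for c in password):
        problems.append("no digit")
    if policy.require_symbol and not any(c in string.punctuation for c in password):
        problems.append("no symbol")
    return problems


def generate_password(length: int = 16) -> str:
    """Generate a random password of `length` that covers every character class."""
    if length < 4:
        raise ValueError("need at least 4 characters to cover all character classes")
    alphabet = string.ascii_letters + string.digits + string.punctuation
    policy = PasswordPolicy(min_length=length)
    while True:  # rejection sampling: resample until all rules pass
        candidate = "".join(secrets.choice(alphabet) for _ in range(length))
        if not policy_violations(candidate, policy):
            return candidate


if __name__ == "__main__":
    print(policy_violations("password"))
    # ['shorter than 12 characters', 'no uppercase letter', 'no digit', 'no symbol']
```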
How Sonian secures the Big Data Cloud
Sonian implements the multi-layered encryption method for archiving content to the cloud: SSL is used for data “over the wire” and AES-256 for data “at rest.” In addition, the Sonian web application enforces strong passwords and lets each customer enforce its own password rules. Sonian also uses per-customer encryption keys and a method of securing data that allows for key rotation, key deprecation and rapid re-encryption. Each customer has its own unique login URL in addition to a username and password. All logins and user interface interactions are logged and audited, and every customer can view its own audit trail to confirm that no unauthorized access has occurred.
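The text above does not describe how Sonian derives or stores its per-customer keys. One common way to get per-customer, versioned keys, shown here as a stand-in, is to derive each key from a master secret with HKDF (RFC 5869), using the customer ID and a key version as context. Rotation then means bumping the version, and deprecation means refusing to hand out an old version once its data has been re-encrypted. `CustomerKeyRing` and its methods are hypothetical, and the AES-256 encryption itself is not shown because it needs a cryptography library outside Python's standard library.

```python
import hashlib
import hmac
import json


def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    """HKDF (RFC 5869) with SHA-256: extract a pseudorandom key, then expand it."""
    hash_len = hashlib.sha256().digest_size
    if length > 255 * hash_len:
        raise ValueError("HKDF cannot produce more than 255 blocks of output")
    prk = hmac.new(salt or b"\x00" * hash_len, ikm, hashlib.sha256).digest()  # extract
    okm, block, counter = b"", b"", 1
    while len(okm) < length:  # expand: T(i) = HMAC(PRK, T(i-1) | info | i)
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


class CustomerKeyRing:
    """Hypothetical per-customer key ring with versioned keys for rotation."""

    def __init__(self, master_secret: bytes, customer_id: str) -> None:
        self._master = master_secret
        self._customer = customer_id
        self.current_version = 1
        self._deprecated: set[int] = set()

    def key(self, version: int) -> bytes:
        """Derive the 32-byte (AES-256-sized) key for this customer and key version."""
        if not 1 <= version <= self.current_version:
            raise KeyError(f"unknown key version {version}")
        if version in self._deprecated:
            raise KeyError(f"key version {version} has been deprecated")
        # JSON encoding keeps (customer, version) pairs unambiguous.
        info = json.dumps(["customer-data-key", self._customer, version]).encode()
        return hkdf_sha256(self._master, salt=b"", info=info)

    def rotate(self) -> int:
        """Switch new writes to a fresh key version; older versions still decrypt."""
        self.current_version += 1
        return self.current_version

    def deprecate(self, version: int) -> None:
        """Retire a version once everything under it has been re-encrypted."""
        if version == self.current_version:
            raise ValueError("cannot deprecate the active key version")
        self._deprecated.add(version)
```

Deriving keys rather than storing each one keeps the key store small, but it also means a single master secret protects every customer, so in practice that secret would need to be guarded far more carefully than any individual key.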
Future Innovations in Big Data Cloud Security
The next wave of security innovation will make the multi-layered approach even stronger. Look for software that supports multi-factor authentication (MFA) and customer-owned encryption keystores. MFA means that access to data requires two or more authentication factors, for example a password plus a one-time code from a time-based hardware token. This approach greatly reduces the potential for “hacking,” because a stolen password alone is not enough: the physical device must also be present at login.
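MFA with a time-based token can be sketched with the open TOTP standard (RFC 6238), which builds on HOTP (RFC 4226). The text does not name a token algorithm, so TOTP is a stand-in here, and `verify_login` with its parameters is hypothetical. The token and the server both compute a code from a shared secret and the current 30-second time step, so a password alone does not get an attacker in.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-based one-time password (RFC 4226) using HMAC-SHA1."""
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # dynamic truncation: low 4 bits pick a 4-byte window
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret: bytes, timestamp: Optional[float] = None, step: int = 30, digits: int = 6) -> str:
    """Time-based one-time password (RFC 6238): HOTP over the current time step."""
    if timestamp is None:
        timestamp = time.time()
    return hotp(secret, int(timestamp // step), digits)


def verify_login(password_ok: bool, secret: bytes, submitted_code: str,
                 timestamp: Optional[float] = None, step: int = 30,
                 digits: int = 6, window: int = 1) -> bool:
    """Require both factors; accept codes from +/- `window` steps to tolerate clock drift."""
    if not password_ok:
        return False
    if timestamp is None:
        timestamp = time.time()
    current = int(timestamp // step)
    return any(
        # Constant-time comparison avoids leaking how many digits matched.
        hmac.compare_digest(hotp(secret, current + drift, digits).encode(), submitted_code.encode())
        for drift in range(-window, window + 1)
        if current + drift >= 0
    )
```

The digit count is fixed on the server side rather than taken from the submitted code; otherwise an attacker could submit a one-digit code and guess it one time in ten.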
The second innovation will let customers manage their own keystore, so that they can “revoke” the software and infrastructure layers’ access to their data in the cloud. The customer maintains a small file containing encryption key information, and the software must have access to this file in order to open the contents for indexing and other value-added activities. If the customer loses confidence in its cloud vendor(s), it can cut off access to the remote keystore.
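The customer-owned keystore can be modeled as a component that the vendor's software must ask for a key every time it needs to open customer content. This is only a sketch of the access-control idea: `CustomerKeystore`, `IndexingService` and `AccessRevoked` are hypothetical names, and a real keystore would sit on the customer's own infrastructure behind an authenticated network API rather than in the same process.

```python
import secrets


class AccessRevoked(Exception):
    """Raised when the customer has cut off a service's access to its keys."""


class CustomerKeystore:
    """Hypothetical keystore controlled by the customer, not the cloud vendor."""

    def __init__(self, data_key: bytes) -> None:
        self._data_key = data_key
        self._authorized: set[str] = set()

    def grant(self, service: str) -> None:
        self._authorized.add(service)

    def revoke(self, service: str) -> None:
        self._authorized.discard(service)

    def fetch_key(self, service: str) -> bytes:
        if service not in self._authorized:
            raise AccessRevoked(f"{service} is not allowed to use this customer's keys")
        return self._data_key


class IndexingService:
    """Hypothetical vendor-side service that needs the customer's key to read content."""

    def __init__(self, keystore: CustomerKeystore, name: str = "indexer") -> None:
        self._keystore = keystore
        self.name = name

    def index(self, document_id: str) -> str:
        # Ask the keystore on every job instead of caching the key, so a
        # revocation takes effect on the very next request.
        key = self._keystore.fetch_key(self.name)
        # A real service would decrypt the document with `key` and index it here.
        return f"indexed {document_id} with a {len(key) * 8}-bit key"


if __name__ == "__main__":
    store = CustomerKeystore(secrets.token_bytes(32))
    store.grant("indexer")
    service = IndexingService(store)
    print(service.index("doc-1"))  # succeeds while access is granted
    store.revoke("indexer")
    try:
        service.index("doc-2")
    except AccessRevoked as err:
        print("blocked:", err)
```

The design choice that matters is that the service does not cache the key; if it did, revocation would only take effect once the cache expired.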
Summary: Security in the Big Data Cloud
- Data in the cloud can be more secure than on-premises
- Data security = resiliency and privacy
- Cloud vendors, software developers and customers all play a role in data security
- Multi-layered security allows cloud ISVs to make data useful beyond backup while keeping it private
- New technologies will dramatically improve the cloud security model

