Why the cloud can't be trusted
Late on the night of June 19th I was chatting with one of my developers and he mentioned that our hosted source control provider, Codespaces, was down. I checked the website, and sure enough, nothing. At first, it didn’t seem like anything new. From time to time Codespaces would go down (more often than I'd like), but it would eventually come back up and everything would be back to normal.
I figured I'd check their Twitter feed to see if there were any details on what was happening and an ETA for it to be back online.
The Codespaces website they were pointing to was down so I couldn't get any more information there, but some quick googling revealed that the nightmare cloud security scenario had just happened to my hosted source code provider. From reading various articles online it became clear that all or most of Codespaces' data had been deleted. Ouch.
An attacker orchestrated a DDoS attack, gained control of Codespaces' AWS account and tried to extort money. When Codespaces didn't pay, the attacker started deleting assets in the AWS account. That included all the machine images, EBS volumes, backups, customer source code repositories and snapshots. Almost everything was deleted, essentially wiping out the business along with all of their customers' precious data.
Fortunately I was one of the lucky ones. By chance, my repository was hosted on one of their older nodes that survived the attack, and I was able to get them to send me a link to download a dump of the repository. It was an intense 48 hours while I contemplated next steps in case my source code repository containing 3.5 years' worth of development history was lost.
As a side note, I recall an email from Codespaces some time ago with an offer to upgrade to their new SVN hosting. It didn’t seem super important and I never got around to it. I'm normally all about the upgrades, but that's one upgrade I'm very glad I didn’t do :)
During the 48 hours when I was contemplating the worst-case scenario for Shindigg, I realized that it wasn't actually that bad. The latest version of the code is available locally on our development and build machines, so at worst we'd lose our history but still have the latest version. Although being able to look back in time at source code history isn't something that you need every day, it's important to have when you need it. Losing history is a hassle, but not crippling.
So, what can we learn from this?
Lesson 1 - Do Not Trust Your Data To Any Single Cloud Provider
The first and biggest lesson is that you should never trust your data to any one provider. No matter who the provider is or what kind of redundancy they have in place, never trust your data to a single provider.
Whether you're an individual, a business using hosted services, or a business providing hosted services to others (and probably using someone else's hosted services in the process), you've got to take responsibility for your own backups.
I made a mistake trusting our source code repository data to Codespaces and relying on them to properly operate their business and back up their data, which they failed to do. Our most important data, the latest version of the source, was replicated in several places, so it was safe, but the repository (and therefore history) was vulnerable.
I fixed that by choosing a new hosted source code control provider that has a feature to automatically create backups and send them to my S3 account, thus giving me redundancy across 2 different providers. If my source code control provider gets wiped out, I've got a backup on a completely different system.
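To make the idea concrete, here is a minimal sketch of copying one backup to several independent destinations and checking each copy against a checksum. It is not how my provider does it. The function names are made up for illustration, and each destination is just a local directory standing in for a separate provider; a real setup would call each provider's own upload API instead.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def replicate_backup(backup: Path, destinations: list[Path]) -> dict[str, bool]:
    """Copy one backup file into every destination and verify each copy.

    Assumption: each destination directory stands in for a different
    provider. Returns destination path -> True if the copy's checksum
    matches the original.
    """
    expected = sha256_of(backup)
    results = {}
    for destination in destinations:
        destination.mkdir(parents=True, exist_ok=True)
        copy = destination / backup.name
        shutil.copy2(backup, copy)
        # Re-read the copy so a truncated or corrupted transfer is caught.
        results[str(destination)] = sha256_of(copy) == expected
    return results
```

Calling replicate_backup(Path("repo-dump.tar.gz"), [Path("provider_a"), Path("provider_b")]) returns one True or False per destination. The verification step matters because a backup that was never checked is only a hope, not a backup.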
At Shindigg, we use a combination of self-hosting and a variety of other services including Azure, Cloudinary, Stripe and Mandrill. The most important data for us is our core database. It gets backed up every 10 minutes, and those backups are immediately uploaded offsite to a completely different provider. Which brings me to my next point…
Lesson 2 - Offsite Means At a Different Cloud Provider
Codespaces claimed to have offsite backups. Technically, the data was probably replicated to different AWS data centers, but it was still all in AWS, and it was all accessible through a single AWS account. That's not really "offsite" when it comes to the cloud.
In the cloud, offsite doesn't just mean relying on the cloud provider to have the data in a different physical location - critical data needs to be backed up to a separate account at the very least, and ideally to a completely different provider.
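The distinction can be written down as a small classification. This is only a sketch: the Location type, the isolation_level function and the labels it returns are names I made up for illustration, not anything a cloud provider exposes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    """Where a copy of the data lives (all values are illustrative labels)."""
    provider: str
    account: str
    region: str


def isolation_level(primary: Location, backup: Location) -> str:
    """Describe how independent a backup copy is from the primary copy."""
    if backup.provider != primary.provider:
        return "different provider"  # survives loss of the whole provider account
    if backup.account != primary.account:
        return "separate account"  # survives compromise of the primary account
    if backup.region != primary.region:
        return "same account, different region"  # one login can still delete both
    return "same location"
```

By this measure, a copy in another data center under the same account only rates "same account, different region", which is the arrangement described above as not really offsite.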
At a minimum, Codespaces should have pushed their backups to a separate AWS account, and ideally sent them to Azure, Rackspace or any other cloud storage provider. That way when their main AWS account was compromised, they wouldn't have lost everything. It may have taken them a couple of days to recover from backups, which would have been bad, but having almost all of your customer data and therefore your business irrevocably wiped out is significantly worse. Which brings me to lesson 3...
Lesson 3 - Think About The Worst Case Scenario
Spend some time thinking about the worst case scenario and taking steps to mitigate the risk - what happens if a part of your infrastructure gets compromised? Security is all about layers.
For most small to mid-size websites, the reality is a determined attacker can probably get in if they want to. Nothing is 100% secure, but you need to be taking reasonable steps to protect yourself and harden your site against attack. Even large websites or companies with huge teams dedicated to security get compromised from time to time (Google, Adobe, Target etc...).
Ask yourself: what happens if someone does get into part of your system? What's the worst that can happen? If for any one component of your infrastructure the answer is "we're completely screwed", then take steps to fix it and make it harder for your business to get wiped out. The answer needs to be more like "if A happened and B happened and C happened and D happened and E happened and F happened, then we'd be totally screwed".
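One rough way to run this exercise is to write down where every copy of each dataset lives and then ask what is gone for good if a given set of accounts is wiped. The sketch below does that. The inventory, its dataset names and the account labels are all hypothetical, and it assumes every listed copy is a full, restorable one.

```python
# Hypothetical inventory: dataset -> set of (provider, account) pairs
# that each hold a full copy of that dataset.
INVENTORY = {
    "core database": {("azure", "prod"), ("aws", "backups")},
    "source history": {("aws", "main")},
    "user uploads": {("aws", "main"), ("aws", "main-replica")},
}


def lost_datasets(inventory, compromised):
    """Return datasets with no surviving copy once `compromised` accounts are wiped."""
    return sorted(
        dataset
        for dataset, copies in inventory.items()
        if copies <= compromised  # every copy sits in a compromised account
    )


def accounts_to_lose(inventory):
    """Return, per dataset, how many separate accounts must fall to lose it."""
    return {dataset: len(copies) for dataset, copies in inventory.items()}


def single_points_of_failure(inventory):
    """Return datasets that one compromised account is enough to destroy."""
    return sorted(d for d, n in accounts_to_lose(inventory).items() if n == 1)
```

With this made-up inventory, single_points_of_failure(INVENTORY) flags "source history", since one account holds its only copy. Anything on that list is a "we're completely screwed" answer waiting to happen.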
Different applications require different degrees of redundancy and security, and building all of this costs time and money. You need to figure out what makes sense for your business.
If Codespaces had asked themselves this simple question, they could have taken steps to mitigate the risk by having a backup at a different provider.
Incidentally, Codespaces could have suffered the same loss of data through operator error (i.e. accidentally deleting something from the AWS account), a massive AWS failure or a disgruntled employee. That's far too many ways to completely wipe out their business.
Years ago at university, I remember a lecturer talking about how the digitization of information makes it not only more accessible, but easier to destroy. He used the analogy of a pallet full of papers vs a CD-ROM (back before the days of flash storage :)). Destroying a stack of papers takes effort. Sure, you could rip them, burn them, shred them, blow them up. But any of those options requires a decent amount of time, effort or equipment to execute. Compare that with destroying the same amount of data on a CD, which you could just snap with your hands in less than 1 second. The cloud exacerbates this problem. Now it's possible to destroy a pallet load of CDs' worth of data in seconds.
Lesson 4 - Follow Best Practices
There are plenty of resources on best practices for security. It's hard to know where to begin when analyzing Codespaces' security failures, but one thing that likely wasn't in place and should have been was multi-factor authentication. AWS has it, Azure has it, Google has it, Facebook has it, Twitter has it. Pretty much all major sites have it these days and encourage people to use it even for personal accounts.
If thousands of customers are trusting you with their data, do yourself a favor and enable it. It probably would have saved Codespaces' ass.
Conclusion
Just because your data is in the cloud doesn't mean it's safe. As an individual or a business, never trust your data to a single provider, and make sure you've got backups in multiple places. As a business, think about the worst-case scenario and make sure it's not too easy to have your data (and consequently your business) wiped out. Security can be complicated and expensive, but there are a lot of simple, cheap things you can do to make your app more secure. If Codespaces had followed these practices, they'd probably still be in business.