The Sydney Morning Herald has reported (http://www.smh.com.au/technology/security/4800-aussie-sites-evaporate-after-hack-20110621-1gd1h.html) that Distribute.IT has irretrievably lost the data for 4,800 web sites hosted on its infrastructure.
I have no idea if this is accurate, but we can easily examine some possibilities...
There are many ways to back up these days, and a popular one is to use "snapshots". The initial snapshot captures a copy of the data; each subsequent snapshot records only the delta (the changes since the previous one). This is very quick and involves no downtime. Many snapshots can be kept, and restores can be near-instant. Many storage vendors, such as NetApp, offer snapshots that work along these lines. The primary backups are on spinning disc, which makes them fast and convenient.
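To make the "full copy first, deltas afterwards" idea concrete, here is a toy sketch in Python. It is not how NetApp or any real array implements snapshots (real systems typically use block pointers and copy-on-write rather than copying data). The class name `SnapshotChain` is hypothetical, and the model assumes a fixed set of blocks that are changed but never deleted.

```python
from typing import Dict, List

Volume = Dict[int, bytes]  # block number -> block contents


class SnapshotChain:
    """Hypothetical toy model: the first snapshot is a full copy,
    later snapshots store only blocks that changed. Assumes blocks
    are never deleted from the volume."""

    def __init__(self) -> None:
        self.snapshots: List[Volume] = []
        self._last: Volume = {}

    def take(self, volume: Volume) -> int:
        """Record a snapshot and return how many blocks it stored."""
        if not self.snapshots:
            delta = dict(volume)  # initial snapshot: copy every block
        else:
            # subsequent snapshots: only blocks that are new or changed
            delta = {n: data for n, data in volume.items()
                     if self._last.get(n) != data}
        self.snapshots.append(delta)
        self._last = dict(volume)
        return len(delta)

    def restore(self, index: int) -> Volume:
        """Rebuild the volume as it was at snapshot `index` by
        replaying the full copy plus every delta up to that point."""
        volume: Volume = {}
        for delta in self.snapshots[: index + 1]:
            volume.update(delta)
        return volume


chain = SnapshotChain()
chain.take({0: b"aa", 1: b"bb"})  # full copy: stores 2 blocks
chain.take({0: b"aa", 1: b"cc"})  # delta: stores only block 1
```

Note that every snapshot in this model lives on the same store as the live data, which is exactly why snapshots alone don't help if an attacker wipes the whole array.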
However, prudence suggests that a belt and braces approach is best. Normally an enterprise will have a primary data centre and a redundant data centre at a separate physical location. If one centre loses power or collapses in a heap due to storage or networking issues, the redundant centre comes online. Because the two centres are on separate subnets and are likely to be firewalled from each other, a hack on one centre is unlikely to affect the other. Controls can also be put in place so that if more than a certain percentage of the data changes, those changes are not mirrored automatically but wait for human intervention.
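A minimal sketch of that last control, assuming replication is driven by a count of changed blocks. The 5% threshold, the `ReplicationBatch` type and the `replication_decision` function are all hypothetical illustrations, not features of any particular replication product.

```python
from dataclasses import dataclass


@dataclass
class ReplicationBatch:
    changed_blocks: int  # blocks modified since the last mirror to the DR site
    total_blocks: int    # size of the volume in blocks


# Hypothetical policy value: hold any batch touching more than 5% of the volume.
MAX_AUTO_CHANGE_RATIO = 0.05


def replication_decision(batch: ReplicationBatch,
                         max_ratio: float = MAX_AUTO_CHANGE_RATIO) -> str:
    """Return "mirror" for routine change volumes, or "hold_for_approval"
    when the change is large enough to look like a mass wipe or corruption."""
    if batch.total_blocks <= 0:
        raise ValueError("total_blocks must be positive")
    ratio = batch.changed_blocks / batch.total_blocks
    if ratio <= max_ratio:
        return "mirror"
    # A mass change (e.g. an attacker trashing the volume) waits for an operator.
    return "hold_for_approval"


print(replication_decision(ReplicationBatch(1_000, 100_000)))   # 1% changed
print(replication_decision(ReplicationBatch(60_000, 100_000)))  # 60% changed
```

The point of the design is that the redundant centre keeps its last good copy while a human decides whether a sudden, sweeping change is legitimate.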
So, snapshots provide a great first line of defence, but there is no substitute for disconnected storage. Even Google uses tape to back up data, as some recent Gmail outages have shown. It doesn't have to be tape, but whatever the medium, you want disconnected storage, kept off-site, that can't be affected by a hack or a fire. A weekly offline backup combined with two-hourly snapshots would seem to be an enterprise-grade approach to Disaster Recovery.
What's Disaster Recovery? Just that. If you have a disaster, you can recover. What's an example of a disaster? Let's see, maybe a hacker getting in, trashing your servers, your SAN and your snapshots? What's your plan to recover from that? My corporation does very real, very detailed DR tests and they are audited. DR is a real problem, and there are real solutions.
If you take your tape or other offline backup weekly, you may lose up to a week's work, but that's better, way better, than nothing.
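The arithmetic behind that trade-off can be sketched in a few lines of Python. It assumes the schedule suggested in this post (snapshots every two hours, an offline backup once a week) and ignores restore time; the function name `worst_case_loss` is hypothetical.

```python
from datetime import timedelta
from typing import Optional

# Assumed schedule: two-hourly snapshots plus a weekly offline backup.
SNAPSHOT_INTERVAL = timedelta(hours=2)
OFFLINE_INTERVAL = timedelta(weeks=1)


def worst_case_loss(snapshots_intact: bool,
                    offline_intact: bool) -> Optional[timedelta]:
    """Maximum work lost: the interval of the most frequent backup layer
    that survived the disaster, or None if nothing survived."""
    if snapshots_intact:
        return SNAPSHOT_INTERVAL  # e.g. a failed server, SAN intact
    if offline_intact:
        return OFFLINE_INTERVAL   # e.g. a hacker trashed servers, SAN and snapshots
    return None                   # no offline copy: everything is gone


for scenario in [(True, True), (False, True), (False, False)]:
    print(scenario, worst_case_loss(*scenario))
```

The third case is the one the Distribute.IT customers reportedly faced: without a disconnected copy, there is no bound on the loss at all.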
Now we also need to consider the terms of service between Distribute.IT and its clients. I have not seen it, but it is common to see any or all of the following clauses:
- disclaimer for any indirect or consequential loss arising out of system unavailability;
- limitation of liability to the equivalent of one year's hosting fees or re-supply of the services;
- disclaimer for any direct losses.
Together, these mean that customers will have minimal recourse against the web host, and even less if the host goes out of business. Check whether your web host is appropriately insured for events like this, and have a chat with your broker about your own insurance. Business interruption insurance may not cover something like this, so treat your web site as a core business asset and insure it just as you would your factory or buildings.