My son just turned 20 months old. It’s funny how at that age we measure age in so much detail, rather than just saying he’s almost two. Nevertheless, at this age his little brain is developing at an amazing pace. As such, he’s reached what some have described to me as the most terrible phase of life… the “What’s that?” and “Why?” stage. Everywhere we go and everything he sees, he’ll inevitably ask me, “Dada, what’s that?” and if my explanation doesn’t satisfy him, he follows up with a “Why, Dada?”

The lesson I’ve learned from him is to take a minute before I answer and think about how to explain, in language he understands, what things are and, most of the time, why he shouldn’t play with them.

I’ve taken this same lesson to heart in my day job as well.

I’ve done a lot of work in disaster recovery. As such, I tend to take for granted that people will care about things like recovery time objectives (RTO), recovery point objectives (RPO), or replication frequency. The reality is that most people do not, because most people don’t understand these things. In a sense, they’re like my son.

So when it comes to disaster recovery, I thought I’d put the topic in better context with something that everyone understands: numbers.

Here are a couple of big ones: according to Forrester, in 2009 disasters were at the root of almost $41B of economic loss. Wow. The US alone accounted for almost $11B of that number.

So at a macro level, clearly disasters happen, and disaster recovery probably deserves some examination.

At an administrator level, I think most IT people have a gut feeling for how valuable or how important the different areas of the datacenters they manage are. But has this been communicated to, or quantified by, the users and application owners?

Allowable Downtime for Workloads:

Required Availability    Required Uptime (hours/year)    Allowable Downtime (per year)
90% (0.9)                7,889.4                         36.525 days
99% (0.99)               8,678.34                        3.6525 days
99.9% (0.999)            8,757.234                       8.766 hours
99.99% (0.9999)          8,765.1234                      52.596 minutes
99.999% (0.99999)        8,765.91234                     5.2596 minutes
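
The figures in the chart follow from simple arithmetic on a 365.25-day (8,766-hour) year. Here is a minimal Python sketch of that calculation; the function names are my own, chosen only for illustration.

```python
# Hours in an average year; the chart's figures imply a 365.25-day year.
HOURS_PER_YEAR = 365.25 * 24  # 8766 hours


def allowable_downtime_hours(availability: float) -> float:
    """Return the hours per year a workload may be down at this availability."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be a fraction between 0 and 1")
    return (1.0 - availability) * HOURS_PER_YEAR


def format_downtime(hours: float) -> str:
    """Express a downtime in days, hours, or minutes, whichever reads best."""
    if hours >= 24:
        return f"{hours / 24:.4f} days"
    if hours >= 1:
        return f"{hours:.3f} hours"
    return f"{hours * 60:.4f} minutes"


if __name__ == "__main__":
    for availability in (0.9, 0.99, 0.999, 0.9999, 0.99999):
        downtime = allowable_downtime_hours(availability)
        print(f"{availability:<8} {format_downtime(downtime)}")
```

Swapping in your own availability targets is just a matter of changing the tuple in the loop.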

Using this simple uptime/downtime chart, are your users able to tell you the dollar cost these service levels might translate into? If your order processing application, for example, were classified as having three ‘9s’ of importance, how many transactions would you expect to be lost or delayed, and what would that cost, if you allowed for the roughly 8.8 hours of downtime a year that level permits?
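
To make that conversation concrete, here is a rough Python sketch of how a downtime allowance might be turned into a dollar figure. Everything in it is an assumption for illustration: the function name, the idea that every transaction during an outage is lost or delayed, and the sample rate and value, which are made-up numbers rather than data from any real system.

```python
# Hours in an average (365.25-day) year.
HOURS_PER_YEAR = 365.25 * 24  # 8766 hours


def estimated_downtime_cost(
    availability: float,
    transactions_per_hour: float,
    value_per_transaction: float,
) -> float:
    """Estimate the yearly cost of the downtime an availability level allows.

    Simplifying assumption: every transaction that would have occurred
    during the outage is lost or delayed at its full value.
    """
    downtime_hours = (1.0 - availability) * HOURS_PER_YEAR
    affected_transactions = downtime_hours * transactions_per_hour
    return affected_transactions * value_per_transaction


if __name__ == "__main__":
    # Hypothetical order processing workload: 100 orders/hour at $50 each.
    cost = estimated_downtime_cost(0.999, 100, 50.0)
    print(f"Estimated yearly exposure at three 9s: ${cost:,.2f}")
```

The real inputs have to come from the application owners, which is exactly the point of asking them.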

I think these are important conversations to have.

Maybe that chart can give you a start.



By: jasondea
Feb 11, 2011
7:41 am