I am just about to visit a client's IT department and was preparing a list of questions to ask them. I thought I would share it with the readers of this blog.
These are the questions I think all Business Continuity Managers (BCMs) should be able to answer about their own IT systems. I believe you don’t need to know the finer details of how the technology works, but you do need a good understanding of the following points.
Data centres and IT hardware
- Where your main data centre or data centres are physically located.
- Is there anywhere else data is stored, such as local servers (team and individual drives and e-mail servers) co-located with their users, or servers which serve all the people in one building
- If you have a data centre and a backup data centre, do they have the same capacity, or what is the ratio of live to backup capacity
- If two (or more) data centres are mirrored or employ virtualisation across the two sites, how good is the network between them and how much data could be lost if one data centre was lost
- Are there any known risks to the data centres or are they located in a risky area
- What has been done to protect them against power failure
- Are they manned 24 hours a day, or do they have alarms to warn staff of burst pipes or the centre overheating
- If VOIP telephony is used, where are the servers located and what capacity could be lost under different disaster scenarios
- If cloud computing is used, where are the cloud data centre(s) located, which companies are involved in running them, and what are the backup plans and the potential data loss if a data centre is lost
- Are there third-party contracts for disaster recovery and what do they cover. Is the provision tested regularly
- Ask for a network diagram and look at single points of failure
- Is the network in a loop enabling data to feed both ways or is the network a single strand
- Look for locations housing network nodes which, if lost, would also bring down the network at other connected locations
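The last three points can be checked mechanically once you have the network diagram. What follows is a minimal sketch in Python, not a real network tool: the site names and links are made up for illustration, and it simply takes each site out in turn and tests whether the remaining sites can still reach each other.

```python
from collections import deque


def build_links(pairs):
    """Turn a list of (site, site) connections into a lookup of neighbours."""
    links = {}
    for a, b in pairs:
        links.setdefault(a, set()).add(b)
        links.setdefault(b, set()).add(a)
    return links


def reachable(links, start, removed):
    """Return the sites that can be reached from `start` while `removed` is down."""
    seen = {start}
    queue = deque([start])
    while queue:
        site = queue.popleft()
        for neighbour in links[site]:
            if neighbour != removed and neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


def single_points_of_failure(links):
    """List sites whose loss would cut off at least one other surviving site."""
    critical = []
    for site in links:
        survivors = [s for s in links if s != site]
        if len(survivors) < 2:
            continue
        # If one survivor cannot reach all the others, the network has split.
        if len(reachable(links, survivors[0], site)) < len(survivors):
            critical.append(site)
    return sorted(critical)


# Hypothetical sites, invented for this example.
strand = build_links([("HQ", "Depot"), ("Depot", "Branch"), ("Branch", "Warehouse")])
loop = build_links([("HQ", "Depot"), ("Depot", "Branch"),
                    ("Branch", "Warehouse"), ("Warehouse", "HQ")])

print(single_points_of_failure(strand))  # ['Branch', 'Depot']
print(single_points_of_failure(loop))    # []
```

In the single strand the two middle sites each take other locations down with them, while closing the loop removes both weak points. A real diagram will also have single points of failure inside a site (one router, one cable duct), which this sketch does not model.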
Backup and restoring
- As part of the "understanding the organisation" process, the critical systems for the organisation should be established
- For each of the systems the backup regime should be known by the BCM
- The present Recovery Point Objective (RPO) should be known for each system. This is the amount of data which could be lost if there were a catastrophic failure and the system had to be restored from backup. This can vary from days or weeks if you don’t back up regularly, to 24 hours if your backup is a nightly tape backup, to no data loss if mirroring or similar technologies are used.
- The time taken to restore systems under a catastrophic failure / worst case scenario should be known, along with the order in which systems will be recovered. This is looking at the loss of a data centre rather than the loss of individual systems.
- The recovery time for individual systems should be known if they are critical to the organisation or they underpin activities with short RTOs (Recovery Time Objectives).
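To show why the data centre scenario needs looking at separately from individual systems, here is a rough sketch in Python. The system names and all the figures are invented, and it makes two simplifying assumptions: the worst-case data loss equals the gap between backups, and after losing a data centre the systems are restored one after another, shortest RTO first.

```python
from dataclasses import dataclass


@dataclass
class System:
    name: str
    backup_interval_hours: float  # 0 means mirrored, so no data loss is assumed
    restore_hours: float          # time to restore this system on its own
    rto_hours: float              # Recovery Time Objective agreed with the business


def worst_case_data_loss_hours(system):
    """Assumption: the failure happens just before the next backup is due."""
    return system.backup_interval_hours


def recovery_schedule(systems):
    """Restore systems one at a time, shortest RTO first.

    Returns a list of (name, hours until restored, whether the RTO is met).
    """
    schedule = []
    clock = 0.0
    for system in sorted(systems, key=lambda s: s.rto_hours):
        clock += system.restore_hours
        schedule.append((system.name, clock, clock <= system.rto_hours))
    return schedule


# Hypothetical systems and figures, for illustration only.
systems = [
    System("payroll", backup_interval_hours=24, restore_hours=6, rto_hours=72),
    System("email", backup_interval_hours=0, restore_hours=2, rto_hours=4),
    System("orders", backup_interval_hours=24, restore_hours=8, rto_hours=8),
]

for name, finished, meets_rto in recovery_schedule(systems):
    print(name, finished, meets_rto)
# email 2.0 True
# orders 10.0 False
# payroll 16.0 True
```

On its own the made-up "orders" system can be restored within its 8 hour RTO, but when it has to queue behind e-mail after the loss of a data centre it misses it. That gap is what the worst case question is meant to expose.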
IT department’s plans
- Does the IT department have disaster recovery plans in place and what do they cover
- Are their plans purely technical or do they cover incident management and decision making
- How often are the plans tested and to what level do they test them
If you can think of any other questions I am happy to add them to the list.