You seldom get a say in when change strikes your life, but you can at least manage how much of an impact you allow it to have. In IT, a mismanaged change can trigger outages that damage productivity and the company's reputation. So in an article for TechRepublic, Scott Matteson describes 10 steps to avoid crisis and mitigate the risk of change:
- Plan the change.
- Estimate risk and which hosts or services will be affected.
- Include verification of success.
- Formulate a back-out plan.
- Test the process.
- Establish a dedicated change time window.
- Assign staff responsibilities.
- Document the change process via a request.
- Leverage multiple sets of eyes for review and approval.
- Conduct a post-mortem if an outage occurs.
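Taken together, these steps amount to a structured change request. As a loose illustration (the record and its field names are my own, not from Matteson's article or any real ITSM tool), the checklist could be captured like this:

```python
from dataclasses import dataclass, field

# Hypothetical change-request record mirroring the checklist above.
# Field names are illustrative assumptions, not a standard schema.
@dataclass
class ChangeRequest:
    description: str
    affected_systems: list    # hosts/services the change touches
    verification_steps: list  # how success will be confirmed
    backout_plan: str         # how to reverse (or rebuild) if it fails
    tested: bool = False      # has the process been rehearsed?
    window: str = ""          # dedicated change time window
    owners: list = field(default_factory=list)     # staff responsibilities
    approvals: list = field(default_factory=list)  # reviewers who signed off

    def ready(self) -> bool:
        """A change is ready only when tested, scheduled, owned, and approved."""
        return (self.tested and bool(self.window)
                and bool(self.owners) and len(self.approvals) >= 2)
```

Requiring at least two approvals in `ready()` is one way to encode "multiple sets of eyes" as a hard gate rather than a suggestion.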
First, determine who will make the change and how it will be done. Ensure the intended process is well understood before committing to it, and document all the details in a request. Estimating the risk of a change is largely a matter of gauging how many related systems will be impacted, and how severe those impacts will be. Matteson gives the example, “If you’re upgrading a router and need to engage technical support, will your ability to do so be hindered if the router fails?”
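The article frames risk roughly as breadth of impact combined with severity. A toy calculation under that framing (the scoring formula and the 1–5 severity scale are my own assumptions, not Matteson's):

```python
# Toy risk estimate: breadth (number of impacted systems) multiplied by
# the worst-case severity among them. The 1-5 scale is an assumption.
def risk_score(impacted):
    """impacted: dict mapping system name -> severity (1 = minor, 5 = outage)."""
    if not impacted:
        return 0
    return len(impacted) * max(impacted.values())

# A router upgrade that also threatens the VPN and the support line
# scores far higher than an isolated single-host change.
print(risk_score({"core-router": 5, "vpn": 4, "support-line": 3}))  # -> 15
```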
To verify a change has been successful, treat it like a nuclear submarine launch—administrators and end users alike should have to confirm that things are well on their ends. And if things go wrong, you need a strategy to back out of the change, which may or may not be as simple as a quick undo:
Some changes are of the “fail forward” type where they cannot be reversed, so the backout plan may be as complex as rebuilding the entire server after an irreversible code change. Consider worst-case scenario situations and increase the odds in your favor by determining whether you need spare equipment or alternate systems on hand, or whether you can take a snapshot of virtual machines involved in the change (if applicable) to quickly revert to their prior state. Even copying critical files elsewhere for safekeeping or performing a full system backup can be a lifesaver.
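The cheapest insurance the excerpt mentions—copying critical files elsewhere before touching anything—takes only a few lines. A minimal sketch (the function name, stash path, and naming scheme are my own assumptions):

```python
import shutil
import time
from pathlib import Path

def stash_before_change(paths, stash_root="/var/backups/pre-change"):
    """Copy critical files into a timestamped stash directory so a failed
    change can be reverted by copying them back. Paths are illustrative."""
    stash = Path(stash_root) / time.strftime("%Y%m%d-%H%M%S")
    stash.mkdir(parents=True, exist_ok=True)
    for p in map(Path, paths):
        shutil.copy2(p, stash / p.name)  # copy2 preserves timestamps/metadata
    return stash
```

For anything beyond a handful of files, a full system backup or a VM snapshot is the sturdier version of the same idea.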
You will of course use test environments before setting a change loose in production, but be mindful that test environments typically cannot account for every quirk of production. Likewise, invite others to review your intended change; they might catch problems that completely slipped your mind. Even after all of these steps, executing the change could still cause an outage. If it does, conduct a post-mortem to find out what happened. The damage is already done, but you can ensure it never strikes again.
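The verification point earlier—administrators and end users both turning their keys—can be sketched as a gate that passes only when every check on both sides comes back clean. The function and its parameters are my own illustration, not anything from the article:

```python
# Hypothetical "two-key" verification gate: the change is declared
# successful only when administrators AND end users both confirm.
def change_verified(admin_checks, user_checks):
    """Each argument is a dict of check name -> bool result.
    A missing side (empty dict) fails the gate outright."""
    if not admin_checks or not user_checks:
        return False
    return all(admin_checks.values()) and all(user_checks.values())
```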
For additional thoughts on all of these points, you can view the original article here: http://www.techrepublic.com/article/10-essential-elements-of-change-control-management/