Businesses approach data conversion projects with apprehension, and perhaps rightfully so. But the Practicing IT Project Manager, Dave Gordon, has written a new book aimed at demystifying data conversion for all roles involved. We interviewed Dave about his book and his reliable, repeatable process for data conversion. Here’s what he had to say.
AITS: Your book, The Data Conversion Cycle, describes a generalized approach to data conversion that can be applied by nearly anyone involved in maintaining data records. Can you briefly discuss the steps that compose the data conversion cycle?
Dave Gordon: The first step is to define the scope of the conversion—in other words, what record types and range of records will be moved from the current system of record to the target system that will replace it. This step is normally only done once, unless the overall scope of the project changes.
Once the team has a finite list of record types to move, they can map the legacy record types to the record types in the target system. Note that this usually involves transforming the representation of the data in some of the legacy system record fields, so it can be accepted by the target system. One of the points that I make in the book is that data assumes the shape of its container, so you will likely need to change the values stored in fields such as status codes and possibly make other abstractions.
Next, it is necessary to develop the processes that will extract the data from the legacy system. This might involve database queries, reports, or even programs, but the goal is to create a set of processes that can be run repeatedly, as needed. The field value transformations are made in these processes, so it is important to include a means to identify unexpected values or combinations in the logs.
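As a rough illustration of that last point, the following Python sketch applies a field value transformation during extraction and logs any value it does not recognize. It is not taken from the book. The `STATUS_MAP` table, the `status` and `id` field names, and both function names are hypothetical, chosen only to make the idea concrete.

```python
import logging

logger = logging.getLogger("extraction")

# Hypothetical mapping from legacy status codes to target-system values.
STATUS_MAP = {"A": "ACTIVE", "I": "INACTIVE", "T": "TERMINATED"}


def transform_record(record, unexpected):
    """Return a transformed copy of the record, or None if its status is unmapped."""
    code = record.get("status")
    if code not in STATUS_MAP:
        # Log the unexpected value and set the record aside for review
        # instead of silently dropping it or guessing at a mapping.
        logger.warning("Unexpected status %r on record %s", code, record.get("id"))
        unexpected.append(record)
        return None
    return {**record, "status": STATUS_MAP[code]}


def transform_all(records):
    """Transform every record; return (transformed, unexpected)."""
    unexpected = []
    transformed = []
    for record in records:
        result = transform_record(record, unexpected)
        if result is not None:
            transformed.append(result)
    return transformed, unexpected
```

Because the function takes plain records and returns plain lists, it can be rerun on every new static copy, and the list of unexpected records gives the team something specific to review between iterations.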
When they are ready, the team can make a static copy of the production system and run the extraction processes. Never run the extraction processes against production, as records are constantly being added and updated. You won’t be able to get the same results twice, and in fact you may not be able to get a usable set of extracted data if related records are out of sync.
Once the extracted and transformed records are available, they can be loaded to a candidate version of the target system and finally validated against the source. It is important to capture all the errors and rejections so you can update the mapping, transformations, and extraction processes, and even make corrections to records in production.
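One simple way to picture the validation step is a key-by-key comparison between the records that were expected to load and the records actually found in the target. The Python sketch below is only an assumption about how such a check might look, not the method described in the book. The `validate_load` function and the `id` key field are hypothetical.

```python
def validate_load(expected_records, loaded_records, key="id"):
    """Compare expected records with loaded records and report the differences."""
    expected = {r[key]: r for r in expected_records}
    loaded = {r[key]: r for r in loaded_records}

    # Records that were extracted but never arrived in the target (rejections).
    missing = sorted(expected.keys() - loaded.keys())
    # Records in the target that have no counterpart in the extract.
    extra = sorted(loaded.keys() - expected.keys())
    # Records present on both sides whose field values differ.
    mismatched = sorted(
        k for k in expected.keys() & loaded.keys() if expected[k] != loaded[k]
    )
    return {"missing": missing, "extra": extra, "mismatched": mismatched}
```

A report like this gives the team a concrete list to feed back into the mapping, the extraction processes, or the production cleanup before the next iteration.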
These steps are iterated as often as necessary, based on the project needs for design, development, testing, and the eventual cutover to the target system.
AITS: Which stage in the data conversion cycle incurs the most risk, and why is that?
Gordon: Developing the legacy system extraction processes tends to be the riskiest, because it involves a lot of logical transformations that may not be completely understood. The wise development manager works closely with the database administrators who run the load processes and the data custodians who validate the data loaded to the target system to identify needed corrections before the next iteration.
AITS: In your experience, what is the most common mistake teams make in the data conversion process?
Gordon: I’ve seen many project teams fail to reach agreement with the data custodians on the process for reporting records that need to be corrected in production before the next copy is taken. As a result, they end up dealing with the same rejections, repeatedly. Therefore, I highly recommend that the data custodians work to clean up the legacy system records in production before the project even begins. I’ve also seen teams try to extract directly from production, and learn the hard way why they should always work with a static copy. And of course, I’ve seen problems noted in the extraction, load, and validation steps that never get fed back to the developers of the extraction processes.
AITS: Is there any one aspect of data conversion that is actually less complex than people might imagine? Likewise, is there any particular aspect that is significantly more complex than people might expect?
Gordon: The loading process is usually well-understood for modern enterprise software systems, so it tends to be less of a complicating factor. But incorporating the needed mappings and transformations into the extraction processes is usually a lot harder than the team expects.
AITS: Until now, very few have written at book length on the subject of data conversion—in spite of its growing importance. Why do you suppose that is, and what do you hope to change with this book?
Gordon: I suspect that most experienced IT managers believe every conversion is different, which is true. When I was preparing to write this book, I found only one other book on the subject, from 2005. However, following an iterative process that everyone on the team understands, from business SMEs to programmers, with everyone using the same vocabulary to talk about what they’re doing, can make the project a lot more manageable.
AITS: Are you planning to write another book?
Gordon: I’m taking notes for a book that will help people who don’t normally manage projects lead small teams to accomplish small-scope projects on a short timeline. The working title is, “Managing Itty-Bitty Projects Using Excel: A guide for smart people who need to know how.”
I also have two fiction books in first draft, and an advanced draft of the second edition of my book, “Microsoft Project Hacks.” All coming soon to a Kindle near you!
You can’t say I haven’t been warned! Thanks for your time, Dave!
The Data Conversion Cycle is available on Amazon in print and for Kindle.
Dave Gordon is also a member of the AITS Blogging Alliance.