Developing Legacy System Data Extraction Processes

This post was extensively updated and incorporated into my new book, The Data Conversion Cycle: A guide to migrating transactions and other records for system implementation teams. Now available in both Kindle format for $4.49 and paperback for $6.99.

Last week, I continued this series on the data conversion cycle with an examination of mapping data records for conversion.  This week, I’ll address considerations in developing processes for extracting and translating legacy system records, in preparation for loading them into the new system.  This all assumes that the new system has known and well-documented tools for loading records in bulk, so that they don’t have to be re-entered manually.  With that proviso in mind, and armed with our conversion scope and the results of our mapping exercise, let’s consider what it takes to develop a reliable and reproducible data extraction process.

First, work from a static copy of the production system.  You can’t have records being created and updated while you are extracting them, or you risk losing referential integrity.  So schedule a copy to be taken at an appropriate point in the operating cycle, and shut off access to the copy for everyone but the team members who need it for the conversion process.  If possible, set their access to read-only.  This will ensure a stable source for conversion.
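As a minimal sketch of the idea, assuming the legacy copy is a SQLite database (the file names and table are illustrative, not from any particular system), the frozen copy can be opened read-only so that conversion work cannot alter it:

```python
import sqlite3
import tempfile
from pathlib import Path

work = Path(tempfile.mkdtemp())

# Stand-in for the production system.
src = sqlite3.connect(work / "production.db")
src.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
src.commit()

# SQLite's backup API yields a consistent point-in-time copy.
copy = sqlite3.connect(work / "legacy_copy.db")
with copy:
    src.backup(copy)
copy.close()
src.close()

# Reopen the copy in read-only mode (URI syntax); any write now fails.
frozen = sqlite3.connect(f"file:{work / 'legacy_copy.db'}?mode=ro", uri=True)
try:
    frozen.execute("INSERT INTO customers (name) VALUES ('x')")
except sqlite3.OperationalError as exc:
    print("write blocked:", exc)
```

Whatever the actual platform, the same principle applies: the copy mechanism should guarantee a consistent point-in-time snapshot, and the access controls should make accidental modification impossible rather than merely discouraged.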

Next, review the mapping documents for dependencies, to identify the logical sequence required to prepare the load files.  It might be that certain keys need to be established and re-used for related records.  Or it might be that the business rules for certain translations require examination of multiple records.  In any case, it is likely that there is an optimum order for execution of the processes you’ll develop.  Once you have the logical sequence, you can then decompose the overall conversion map into a series of extractions and format conversions.
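The sequencing step is essentially a topological sort of the dependencies found in the mapping documents.  A minimal sketch, using Python's standard-library `graphlib` and a hypothetical dependency map (the entity names are illustrative):

```python
from graphlib import TopologicalSorter

# Each entity lists the entities whose keys must already exist
# before its own load file can be prepared.
dependencies = {
    "customers": set(),
    "accounts": {"customers"},        # accounts reference customer keys
    "transactions": {"accounts"},     # transactions reference account keys
}

# static_order() yields a valid execution sequence: every entity
# appears after all of its prerequisites.
order = list(TopologicalSorter(dependencies).static_order())
print(order)  # ['customers', 'accounts', 'transactions']
```

If the sorter raises a cycle error, that is worth knowing early: it means the mapping documents describe circular dependencies that the conversion design has to break explicitly.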

At this point, you can define the record extraction queries.  Depending on the system and tools available, this might involve SQL queries, system reports, or any number of other methods.  But in all cases, you will specify the fields to retrieve as defined in the mapping document, and select records (the “where” clause in a SQL query) as defined in the scope document.  The combination of the two will determine what information is in the extracted data records.  Once you have tested these queries, and ensured that you are able to extract exactly what you need in order to prepare the load files, you can then create the translation and format conversion logic, as specified in the mapping document.  Finally, you can assemble the records into the format required for the loading tool.
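One extraction-and-translation step might look like the following sketch, assuming a SQLite copy and a CSV load format; the table, fields, selection rule, and status-code translation are all hypothetical stand-ins for what the mapping and scope documents would actually specify:

```python
import csv
import io
import sqlite3

# Stand-in for the static legacy copy.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT, status TEXT, region TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?, ?, ?)",
                 [(1, "Acme", "A", "US"), (2, "Old Co", "I", "US")])

# Fields come from the mapping document; the WHERE clause comes from
# the scope document (here: active customers only).
rows = conn.execute(
    "SELECT id, name, status FROM customers WHERE status = 'A'"
).fetchall()

# Translation rule from the (hypothetical) mapping document: legacy
# status codes become the new system's vocabulary.
STATUS_MAP = {"A": "ACTIVE", "I": "INACTIVE"}

# Assemble the records into the load-file format.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["cust_id", "cust_name", "cust_status"])
for cust_id, name, status in rows:
    writer.writerow([cust_id, name, STATUS_MAP[status]])
print(buf.getvalue())
```

Note how the three concerns stay separate: the query's field list implements the mapping, its WHERE clause implements the scope, and the translation table is applied afterward.  Keeping them separate makes each one independently testable when the documents change.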

Note that you will need to maintain an appropriate level of security for the data records, in transit and at rest.  This includes the copy of the production system, extracts and other work files, and the finished load files.  Also note that this overall process needs to be reproducible, from one conversion cycle to the next.  Based on the tools in use, develop scripts to ensure execution is consistent, so that lessons learned can be properly applied in the next iteration.
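One simple, scriptable aid to both concerns is a checksum manifest over the finished load files, so each cycle can verify that files were not altered in transit or at rest.  A minimal sketch, with illustrative directory and file names:

```python
import hashlib
import tempfile
from pathlib import Path

def write_manifest(load_dir: Path) -> Path:
    """Record a SHA-256 checksum for every CSV load file in load_dir."""
    manifest = load_dir / "MANIFEST.sha256"
    lines = []
    for f in sorted(load_dir.glob("*.csv")):
        digest = hashlib.sha256(f.read_bytes()).hexdigest()
        lines.append(f"{digest}  {f.name}")
    manifest.write_text("\n".join(lines) + "\n")
    return manifest

# Demonstration with a throwaway load file.
load_dir = Path(tempfile.mkdtemp())
(load_dir / "customers.csv").write_text("cust_id,cust_name\n1,Acme\n")
print(write_manifest(load_dir).read_text())
```

Run as part of the scripted pipeline, the manifest also doubles as a record of exactly which files a given conversion cycle produced, which helps when applying lessons learned in the next iteration.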

Next week, I’ll address actually extracting records from the legacy system, and loading the records to the target system.  In the meantime, if you have some lessons learned from past data conversion projects to share, please leave a comment.