Validating Data Conversion

This post was extensively updated and incorporated into my new book, The Data Conversion Cycle: A guide to migrating transactions and other records for system implementation teams. Now available on in both Kindle format for $4.49 and paperback for $6.99.

Continuing the series on the data conversion cycle: subsequent to any bulk load of data from another system, the accuracy and completeness of the data in the target system must be assessed.  This process is generally referred to as validation, and it has three objectives:

  1. Identify errors in mapping data elements between systems
  2. Identify errors in the source system extraction processes
  3. Identify errors in the source system records

As you can see from the diagram below, the validation process is the key feedback point in the data conversion cycle.

Data Conversion CycleValidation typically uses several techniques, selected based on the nature of the data being inspected.  They include:

  • Application level comparison – A user logs in to each system and compares the information in the source system with the information in the target system.  This is the most labor-intensive technique, but invaluable for confirming that the data elements were mapped to the correct fields in the target system (Objective 1).
  • Report level comparison – A user or analyst compares a report from the source system with a report containing the same records, prepared from the target system.  The reviewer looks for differences in the number of records, and incorrect or truncated values.  Once identified, a member of the conversion team should investigate whether the difference was caused by the extraction process (Objective 2) or resulted from issues in the source system (Objective 3).  This technique doesn’t require access to the system, or even knowledge of the data or subject matter.  However, it can be fairly labor intensive.  This technique is best used when reviewing data with a limited number of values, such as assignment to an organization, as arbitrary values are difficult to compare.
  • Excel vlookup automated comparison – An analyst prepares an Excel workbook using the data extracted from the source system in one worksheet, and the corresponding data extracted from the target system in another worksheet.  An automated comparison in a third worksheet is then possible using vlookup and other analytical functions within Excel.  This approach requires more preparation, but is usually fastest in execution, especially when inspecting very large numbers of records containing arbitrary values, such as strings (names and addresses), dates, and numerical values.  As with report level comparison, differences are investigated to determine whether the root cause was the extraction process (Objective 2) or an error in the source system (Objective 3).

Much like Chinese cooking, most of the labor in a data validation exercise is in the preparation.  To that end, validation planning should include an analysis of the data being loaded to the source system, to determine the following:

  • What user-accessible field mappings have been made?  It may be possible to identify one or two users with the access rights to check all of the mappings, using application level comparison.  Note that it is not necessary to check multiple records; the purpose is to ensure that the data is “landing on the correct runway.”
  • Based on the records types being loaded, are there delivered audit reports built into the target system that can facilitate a report level comparison with the source?  If there isn’t a complete correlation between the two systems, generally a custom report can be derived from an existing report, either on the source or target system, to facilitate the comparison.
  •  Which data elements would be difficult to compare visually, in a report level comparison?  It is useful to identify the specific fields requiring an automated comparison, so that custom reports can be created in advance to extract the records for loading into Excel.  It is also common practice to load the source records into the worksheets and prepare the comparison worksheet while the data is being loaded into the target system, to save time.
  • What dependencies exist?  Usually, there are no dependencies in the inspection process, because the data, once loaded, is static.  However, if there are issues that will impact the availability of conversion resources, they should be identified in advance.

A well-planned and coordinated validation process can proceed on a broad front, with a number of workers inspecting specific record types, using specified tools and techniques.  The team should have the goal of minimizing the time required to conduct validation, as it is the final step before a move to production.  This is critical, in that during the time between the final extract of data from the legacy system used as the source, to the move of the new system to production, transaction will continue to be processed.  These transactions must then be re-entered into the new system.

The validation process falls between loading the data to the target system and actually using it, in every cycle.  Whether that initial use is development, testing, or production, the process needs to be followed.  Note that validation findings of the final load before the move to production may have to be subjected to a triage process, in that any corrections will have to be made in that production system.  Consequently, the principal measure of success of validation in earlier cycles should be the elimination of sources of error.  A solid validation and correction process should reduce the number of corrections in production to nearly zero.

Next week, I’ll address creating a data conversion plan, based on the data conversion cycle, and integrating it into the overall project plan.