Mapping Records for Data Conversion

This post was extensively updated and incorporated into my new book, The Data Conversion Cycle: A guide to migrating transactions and other records for system implementation teams. Now available on in both Kindle format for $4.49 and paperback for $6.99.

Last week, I addressed defining the scope of a data conversion effort.  Note that such an effort is almost always part of a larger project to replace some legacy production system of record with a new solution.  Consequently, the existing current and historical records will almost certainly be in a different and likely incompatible format from the new system.  The challenge is then to properly translate, or map, the information in the current system to the locations and format required for loading to the new system.  There are several activities involved in mapping, including:

  • Identifying where the existing records will be stored in the data model of the new system.  This is a consideration for both the presentation layer, which is visible to the user, and for underlying business processes and interface processes.  Consequently, this task requires collaboration by people who understand how data is stored in each system, at a very detailed level.
  • Identifying whether key fields will need to be preserved or changed.  It is not uncommon for existing identifiers, such as employee or customer ID’s, department or account codes, or other object references to be replaced with newly assigned values.  In such situations, you’ll need to determine whether the old values will be stored in some non-key field for cross-reference with off-line records or external systems, and develop a strategy for preserving the relationship between the set of records and the object during and after conversion.  This may require experts such as database administrators (DBA’s) or administrators of other systems that contain records for the same objects.
  • Ensuring referential integrity is preserved between parent and child records.  The new data model may not have the same structure for one-to-many relationships.  In such cases, it may be necessary to create “linking” records that preserve the connections.  As data models have grown more complex, and applications have moved from purely relational databases to object models and other “No-SQL” storage methods, this can get complicated, and a DBA may be required to sort out the details.
  • Identifying any translations of data values required.  Required changes in format, such as the way date values need to be loaded, or conversion of string values from upper case to mixed case, need to be identified.  In some cases, a value substitution may be required, because the systems use different codes to represent the same condition; in other cases, it might actually require a computation or other logic to apply a business rule.  In complex situations, translation rules might require examination of multiple fields.
  • Establishing conventions for creating required data elements not present in the source.  In many cases, it will be enough to specify a valid value that applies to all records; in other cases, a value may have to be determined based on other data elements or even records from a related system.  It is not unheard of for records from multiple sources to be merged into a single load to a new system.

Keep in mind that the mapping exercise is commonly based on incomplete information.  It is normal for the initial load to identify gaps, incorrect assumptions, or outright errors, which are then updated prior to the next load.  Don’t feel you need to hold up the project in order to get the first load completely correct, especially if it will be followed by several iterations.  Also note that the validation process, subsequent to the actual load, will likely identify needed corrections to some records in production.  Conversion is nearly always structured as an iterative process, so the lessons learned from each conversion cycle can be applied to the next iteration.  Refer back to the data conversion cycle diagram to see all of the feedback loops.

Data Conversion CycleNext week, I’ll address considerations in the development of the processes used to extract data records from the legacy system.  In the meantime, please leave a comment if you have a data conversion experience you’d like to share.

This entry was posted in Theory and Practice and tagged , , , by Dave Gordon. Bookmark the permalink.

About Dave Gordon

Dave Gordon is a project manager with over twenty five years of experience in implementing human capital management and payroll systems, including SaaS solutions like Workday and premises-based ERP solutions like PeopleSoft and ADP Enterprise. He has an MS in IT with a concentration in project management, and a BS in Business. He also holds the project management professional (PMP) designation, as well as professional designations in human resources (GPHR and SPHR) and in benefits administration (CEBS). In addition to his articles and blog posts, he curates a weekly roundup of articles on project management, and he has authored or contributed to several books on project management.