A Risk Taxonomy for Data Conversion

This post was extensively updated and incorporated into my new book, The Data Conversion Cycle: A guide to migrating transactions and other records for system implementation teams. Now available on Amazon.com in both Kindle format for $4.49 and paperback for $6.99.

Continuing the series on the data conversion cycle.  In order to conduct a meaningful risk assessment of a planned data conversion effort, we need to consider not just the legacy source system and the target, but also the remainder of the project.  A risk taxonomy organizes the activities and resources of the effort into classes, and describes each activity in terms of its attributes.  The team conducting the risk assessment can then use the taxonomy as a source of questions to determine what is unknown.  In this installment, I’ll outline a risk taxonomy for data conversion that includes sources of risk from the larger project and operating environment, as well as related activities.  This holistic view of risk will allow the project team to better identify the unknowns that matter.

For example, the team members charged with defining the requirements for the conversion can use the taxonomy to identify and anticipate unknowns in each of the activities in the Requirements Class.  A leading practice is to phrase questions using the attributes of each activity: “Is the project scope stable enough to let us accurately define what records should be converted?”  “Is the target system configuration complete and stable enough for us to map existing records to it?”
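
To make the practice concrete, here is a minimal sketch in Python (purely illustrative; the class, activities, and attributes are taken from the Requirements Class described below, while the question template and function name are my own) of how a team might hold a slice of the taxonomy and generate assessment questions from it:

```python
# An illustrative slice of the taxonomy: class -> activity -> list of attributes.
# Entries mirror the Requirements Class below; everything else is hypothetical.
taxonomy = {
    "Requirements": {
        "Define scope": [
            "stability of the project scope",
            "completeness of the analysis",
            "accuracy of the detailed scope description",
        ],
        "Map data elements": [
            "stability of the target system configuration",
            "correctness of the mappings",
            "completeness of the requirements",
        ],
    },
}

def assessment_questions(taxonomy):
    """Turn each (class, activity, attribute) triple into a question for the risk workshop."""
    questions = []
    for cls, activities in taxonomy.items():
        for activity, attributes in activities.items():
            for attribute in attributes:
                questions.append(
                    f"[{cls}] {activity}: is the {attribute} well enough known, "
                    f"or is it an unknown to be assessed?"
                )
    return questions

for question in assessment_questions(taxonomy):
    print(question)
```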

The Requirements Class

This class includes the activities that define the requirements for the data to be converted and loaded to the target system.

  • Define scope. This is the analysis done to determine what records will be migrated.  Attributes include stability of the project scope, completeness of the analysis, and accuracy of the detailed scope description.
  • Map data elements.  This is the analysis done to determine where each data element in scope will be placed in the target system, and what changes or transformations must be performed on the source data in order for it to be meaningful.  Attributes include stability of the target system configuration, correctness of the mappings, and completeness of the requirements.
  • Correct invalid mappings.  This is the corrective action taken, based on the validation of a load, to update the mappings.  Attributes include version control, quality control, and updates to the validation plan for the next load.
  • Staff the activities.  This is the people management process that assigns team members to perform the tasks.  Attributes include availability of functional subject matter experts, source system experts, and target system experts.

The Source Extraction Class

This class includes the activities that produce data records, formatted in a way that will allow them to be loaded to the target system.

  • Develop legacy system extraction processes.  This is the development of queries, reports, and related artifacts that will select specific transactions, format them appropriately, and make them available for loading to the target system.  Attributes include software quality, availability of hardware and network resources, and performance.
  • Extract records from the legacy system.  This activity consists of executing the extraction processes from a snapshot of production and making the resulting data available to the team members who will load the target system.  Attributes include access to the system containing the snapshot, sequencing, performance, defect handling, and secure file transfer.
  • Correct invalid processes.  This is corrective action taken, based on the validation of the load, to update the extraction, re-formatting, and file transfer processes that produce the data and make it available to the Load team.  Attributes include version control, quality control, and updates to the validation plan for the next load.
  • Provide resources.  This is the management activity that assigns team members and provides them with access to server resources required to perform their tasks.  Attributes include availability of operations team members and developers, availability of disk space, program access to the source system, and availability of secure file storage.

The Load and Validate Class

This class includes the activities required to load the records extracted from the source to the target system, validate that they were loaded correctly, and initiate corrective actions.

  • Load data records to the target system.  This activity consists of executing the target system load processes on the data records provided by the Source extraction team.  Attributes include software quality, extracted data quality, availability of hardware and network resources, and performance.
  • Validate loaded records.  This is an inspection activity, designed to ensure that the data stored in the source system has been properly moved to the target system, and is ready to support planned operations in production.  Attributes include coverage, accuracy, tools, and access to the target system.
  • Correct invalid records in production.  This is a records management activity.  Attributes include accuracy of historical records, quality of the validation process, and access to the production system.
  • Provide resources.  This is the management activity that assigns team members cleared to access the data, and provides them with access to the loaded system required to perform their tasks.  Attributes include availability of qualified team members, security controls, and access to the loaded target system.

The Project and Environmental Factors Class

This class includes those project activities outside of the data conversion cycle processes that influence or drive the activities listed above.

  • Manage project scope.  This is a project management activity.  Attributes include change control, configuration management, and communication.
  • Manage system design.  This is a target system development activity.  The design of the system is a primary input to the conversion scope and mapping activities.  Attributes include completeness, validity, and maintainability.
  • Manage implementation.  This is a target system development activity.  The configuration of the system greatly influences the mapping and transformation of the source data to make it usable by the target system.  Attributes include completeness, stability, modularity, and traceability of changes.
  • Manage operations.  This is a data center activity.  Attributes include production system management, operating schedule, and network access.

Plainly, this list is just a starting point.  These classes capture a number of potential risks, but depending on the environment, the scope of the project, and the state of development, there may be many more.  The project team should review the list, asking questions as they go and identifying the unknowns as issues or potential risks requiring qualitative analysis.  Next week, I’ll conclude this series with a look at how we can measure progress in the iterative data conversion cycle.

Developing a Data Conversion Plan

Continuing the series on the data conversion cycle.  I’ve been describing a generalized methodology for migrating data records from a legacy system to some new system of record.  However, this methodology needs to mesh with the rest of the project.  In this installment, I’ll identify considerations for both managing the iterative conversion process and integrating the data conversion plan into the larger implementation project plan.

First, it is important to clearly identify the number of iterations of the data conversion cycle that will be required for the project, based on use cases.  Typically, one or more environments are built for development; there may also be a limited-scope environment built to support a design workshop.  You will also need one or more environments built for testing.  Of course, you’ll need to convert one final time, in order to move to production.  Each of these use cases will require delivery on a particular date, so you’ll need to plan your way back from delivery of the build to the extraction of data from the source system.  Note all dependencies and entry and exit criteria for each iteration of the cycle, and incorporate these links into the larger project plan.
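
As a minimal illustration of planning backward from a delivery date, here is a short Python sketch; the step names echo activities described earlier in the series, but the durations and the delivery date are hypothetical:

```python
from datetime import date, timedelta

# Hypothetical durations (in calendar days) for one iteration of the cycle.
step_durations = [
    ("Validate loaded records", 5),
    ("Load data records to the target system", 3),
    ("Extract records from the legacy system", 2),
]

def plan_backwards(delivery, steps):
    """Walk back from the delivery date to find when each step must start and finish."""
    schedule = []
    finish = delivery
    for name, days in steps:
        start = finish - timedelta(days=days)
        schedule.append((name, start, finish))
        finish = start
    return list(reversed(schedule))  # earliest step first

for name, start, finish in plan_backwards(date(2025, 6, 30), step_durations):
    print(f"{name}: {start} -> {finish}")
```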

Next, consider the needs of each user group.  Will they require a full data load?  Or will a subset be good enough?  There’s no point in loading records that won’t be used.  Also, especially in the first one or two cycles, you may not have all of the design and scope decisions in place, so have metrics for completeness and quality that can communicate what will be delivered.  In our Workday implementation projects, we typically expect to load 80% of the record types in the first full prototype, and over 95% in the second prototype.  But for the integrations developers, we usually just provide enough transactions to facilitate development and unit testing.  Know what records are actually needed, so you don’t spend more time on a build than you really need.  And know what quality level is required, so you don’t waste time “in pursuit of an unappreciated perfection.”
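
For instance, a record-type completeness metric of the kind mentioned above can be computed in a few lines of Python; the record-type names here are hypothetical, and the 80% figure simply echoes the first-prototype expectation noted in the paragraph:

```python
# Record types in scope for the conversion, and those actually delivered in this build.
in_scope = {"workers", "positions", "compensation", "benefits", "absence_balances"}
loaded = {"workers", "positions", "compensation", "benefits"}

completeness = len(loaded & in_scope) / len(in_scope)
print(f"Record-type completeness: {completeness:.0%}")   # 80%
print(f"Missing record types: {sorted(in_scope - loaded)}")
```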

Finally, consider the availability of resources.  Few projects have enough of the right people assigned to do every task in exactly the right time frame.  If you have people working on conversion as well as other related project tasks, consider making adjustments to the schedule to keep conversion in sync with the other activities.  There’s no point in delivering a build and then waiting two weeks for the team to catch up.  And if you’re going to need additional folks at some point, know about it in advance.

Each iteration of the data conversion cycle should both improve the quality of the data in the build and reduce the time required to complete the build.  As part of your conversion plan, establish metrics and procedures for collecting them.  Your goal should be to have the conversion process completely known and predictable by the time you are ready to make the final build for the move to production.  Next week, I’ll outline a risk taxonomy for data conversion, to help you identify risks specific to your conversion.

Validating Data Conversion

Continuing the series on the data conversion cycle: subsequent to any bulk load of data from another system, the accuracy and completeness of the data in the target system must be assessed.  This process is generally referred to as validation, and it has three objectives:

  1. Identify errors in mapping data elements between systems
  2. Identify errors in the source system extraction processes
  3. Identify errors in the source system records

As you can see from the diagram below, the validation process is the key feedback point in the data conversion cycle.

[Diagram: The Data Conversion Cycle]

Validation typically uses several techniques, selected based on the nature of the data being inspected.  They include:

  • Application level comparison – A user logs in to each system and compares the information in the source system with the information in the target system.  This is the most labor-intensive technique, but invaluable for confirming that the data elements were mapped to the correct fields in the target system (Objective 1).
  • Report level comparison – A user or analyst compares a report from the source system with a report containing the same records, prepared from the target system.  The reviewer looks for differences in the number of records, and incorrect or truncated values.  Once identified, a member of the conversion team should investigate whether the difference was caused by the extraction process (Objective 2) or resulted from issues in the source system (Objective 3).  This technique doesn’t require access to the system, or even knowledge of the data or subject matter; however, it can be fairly labor-intensive.  It is best used when reviewing data with a limited number of values, such as assignment to an organization, as arbitrary values are difficult to compare.
  • Excel vlookup automated comparison – An analyst prepares an Excel workbook using the data extracted from the source system in one worksheet, and the corresponding data extracted from the target system in another worksheet.  An automated comparison in a third worksheet is then possible using vlookup and other analytical functions within Excel.  This approach requires more preparation, but is usually fastest in execution, especially when inspecting very large numbers of records containing arbitrary values, such as strings (names and addresses), dates, and numerical values.  As with report level comparison, differences are investigated to determine whether the root cause was the extraction process (Objective 2) or an error in the source system (Objective 3).
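
The post describes this automated comparison in Excel; purely as an illustrative alternative, the same key-based lookup and comparison can be sketched in Python with pandas (the file names, key column, and compared field below are hypothetical):

```python
import pandas as pd

# Hypothetical extracts: one CSV from the source system, one from the target system,
# both keyed on an employee_id column and carrying a hire_date field to compare.
source = pd.read_csv("source_extract.csv")
target = pd.read_csv("target_extract.csv")

# Equivalent of a VLOOKUP: join the target values onto the source rows by key.
merged = source.merge(target, on="employee_id", how="left",
                      suffixes=("_source", "_target"))

# Source rows with no target value to compare, and rows where the compared value differs.
missing = merged[merged["hire_date_target"].isna()]
mismatched = merged[
    merged["hire_date_target"].notna()
    & (merged["hire_date_source"] != merged["hire_date_target"])
]

print(f"{len(missing)} records have no hire date in the target system")
print(f"{len(mismatched)} records have differing hire dates")
```

As with the Excel approach, any missing or mismatched records found this way still have to be traced back to the extraction process (Objective 2) or to errors in the source system records (Objective 3).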

Much like Chinese cooking, most of the labor in a data validation exercise is in the preparation.  To that end, validation planning should include an analysis of the data being loaded to the target system, to determine the following:

  • What user-accessible field mappings have been made?  It may be possible to identify one or two users with the access rights to check all of the mappings, using application level comparison.  Note that it is not necessary to check multiple records; the purpose is to ensure that the data is “landing on the correct runway.”
  • Based on the record types being loaded, are there delivered audit reports built into the target system that can facilitate a report level comparison with the source?  If there isn’t a complete correlation between the two systems, generally a custom report can be derived from an existing report, either on the source or target system, to facilitate the comparison.
  • Which data elements would be difficult to compare visually, in a report level comparison?  It is useful to identify the specific fields requiring an automated comparison, so that custom reports can be created in advance to extract the records for loading into Excel.  It is also common practice to load the source records into the worksheets and prepare the comparison worksheet while the data is being loaded into the target system, to save time.
  • What dependencies exist?  Usually, there are no dependencies in the inspection process, because the data, once loaded, is static.  However, if there are issues that will impact the availability of conversion resources, they should be identified in advance.

A well-planned and coordinated validation process can proceed on a broad front, with a number of workers inspecting specific record types, using specified tools and techniques.  The team should have the goal of minimizing the time required to conduct validation, as it is the final step before a move to production.  This is critical because, during the time between the final extract of data from the legacy source system and the move of the new system to production, transactions will continue to be processed.  These transactions must then be re-entered into the new system.

The validation process falls between loading the data to the target system and actually using it, in every cycle.  Whether that initial use is development, testing, or production, the process needs to be followed.  Note that validation findings from the final load before the move to production may have to be subjected to a triage process, since any corrections will have to be made in the production system.  Consequently, the principal measure of success of validation in earlier cycles should be the elimination of sources of error.  A solid validation and correction process should reduce the number of corrections in production to nearly zero.

Next week, I’ll address creating a data conversion plan, based on the data conversion cycle, and integrating it into the overall project plan.