Validating Data Conversion

Continuing the series on the data conversion cycle: after any bulk load of data from another system, the accuracy and completeness of the data in the target system must be assessed.  This process is generally referred to as validation, and it has three objectives:

  1. Identify errors in mapping data elements between systems
  2. Identify errors in the source system extraction processes
  3. Identify errors in the source system records

As you can see from the diagram below, the validation process is the key feedback point in the data conversion cycle.

[Diagram: Data Conversion Cycle]

Validation typically uses several techniques, selected based on the nature of the data being inspected.  They include:

  • Application level comparison – A user logs in to each system and compares the information in the source system with the information in the target system.  This is the most labor-intensive technique, but invaluable for confirming that the data elements were mapped to the correct fields in the target system (Objective 1).
  • Report level comparison – A user or analyst compares a report from the source system with a report containing the same records, prepared from the target system.  The reviewer looks for differences in the number of records, and incorrect or truncated values.  Once identified, a member of the conversion team should investigate whether the difference was caused by the extraction process (Objective 2) or resulted from issues in the source system (Objective 3).  This technique doesn’t require access to the system, or even knowledge of the data or subject matter.  However, it can be fairly labor intensive.  This technique is best used when reviewing data with a limited number of values, such as assignment to an organization, as arbitrary values are difficult to compare.
  • Excel VLOOKUP automated comparison – An analyst prepares an Excel workbook using the data extracted from the source system in one worksheet, and the corresponding data extracted from the target system in another worksheet.  An automated comparison in a third worksheet is then possible using VLOOKUP and other analytical functions within Excel.  This approach requires more preparation, but is usually fastest in execution, especially when inspecting very large numbers of records containing arbitrary values, such as strings (names and addresses), dates, and numerical values.  As with report level comparison, differences are investigated to determine whether the root cause was the extraction process (Objective 2) or an error in the source system (Objective 3).
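The same lookup-and-compare logic can also be scripted outside of Excel.  The sketch below is purely illustrative – the record layout, field names, and sample values are hypothetical, not taken from any actual conversion – but it shows the essence of the technique: match records by key, flag records that never arrived in the target, and list field-level differences for the conversion team to investigate.

```python
# Illustrative sketch of an automated source/target comparison, equivalent
# in spirit to the Excel VLOOKUP technique described above.  Record layout
# and field names are hypothetical.

def compare_records(source, target, key="id"):
    """Return (missing_keys, mismatches) between two record sets."""
    target_by_key = {rec[key]: rec for rec in target}
    missing = []      # keys extracted from the source but absent in the target
    mismatches = []   # (key, field, source_value, target_value)
    for rec in source:
        match = target_by_key.get(rec[key])
        if match is None:
            missing.append(rec[key])
            continue
        for field, value in rec.items():
            if field != key and match.get(field) != value:
                mismatches.append((rec[key], field, value, match.get(field)))
    return missing, mismatches

source = [
    {"id": 1, "name": "Ann Smith", "org": "Finance"},
    {"id": 2, "name": "Bob Jones", "org": "HR"},
    {"id": 3, "name": "Cy Park",   "org": "IT"},
]
target = [
    {"id": 1, "name": "Ann Smith", "org": "Finance"},
    {"id": 2, "name": "Bob Jone",  "org": "HR"},   # truncated value
]

missing, mismatches = compare_records(source, target)
print(missing)      # -> [3]
print(mismatches)   # -> [(2, 'name', 'Bob Jones', 'Bob Jone')]
```

As in the Excel approach, the output is a worklist, not a verdict: each missing key or mismatched value still has to be traced back to the extraction process (Objective 2) or to the source system itself (Objective 3).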

Much like Chinese cooking, most of the labor in a data validation exercise is in the preparation.  To that end, validation planning should include an analysis of the data being loaded to the target system, to determine the following:

  • What user-accessible field mappings have been made?  It may be possible to identify one or two users with the access rights to check all of the mappings, using application level comparison.  Note that it is not necessary to check multiple records; the purpose is to ensure that the data is “landing on the correct runway.”
  • Based on the record types being loaded, are there delivered audit reports built into the target system that can facilitate a report level comparison with the source?  If there isn’t a complete correlation between the two systems, a custom report can generally be derived from an existing report, on either the source or target system, to facilitate the comparison.
  • Which data elements would be difficult to compare visually, in a report level comparison?  It is useful to identify the specific fields requiring an automated comparison, so that custom reports can be created in advance to extract the records for loading into Excel.  It is also common practice to load the source records into the worksheets and prepare the comparison worksheet while the data is being loaded into the target system, to save time.
  • What dependencies exist?  Usually, there are no dependencies in the inspection process, because the data, once loaded, is static.  However, if there are issues that will impact the availability of conversion resources, they should be identified in advance.

A well-planned and coordinated validation process can proceed on a broad front, with a number of workers inspecting specific record types, using specified tools and techniques.  The team should have the goal of minimizing the time required to conduct validation, as it is the final step before a move to production.  This is critical: during the time between the final extract of data from the legacy source system and the move of the new system to production, transactions will continue to be processed.  These transactions must then be re-entered into the new system.

The validation process falls between loading the data to the target system and actually using it, in every cycle.  Whether that initial use is development, testing, or production, the process needs to be followed.  Note that validation findings of the final load before the move to production may have to be subjected to a triage process, in that any corrections will have to be made in that production system.  Consequently, the principal measure of success of validation in earlier cycles should be the elimination of sources of error.  A solid validation and correction process should reduce the number of corrections in production to nearly zero.

Next week, I’ll address creating a data conversion plan, based on the data conversion cycle, and integrating it into the overall project plan.

New PM Articles for the Week of March 11 – 17

New project management articles published on the web during the week of March 11 – 17.  Dave and Sandra read all of this stuff so you don’t have to!  Recommended:

  • Andy Jordan reminds us that the way we deal with stressful situations is what defines us as leaders.
  • Elizabeth Harrin and Phil Peplow identify six barriers to customer centricity.  Extracted from their book, “Customer-centric Project Management.”
  • Peter Tarhanidis asserts that the customer mindset is always right.
  • Shim Marom presents a decision tree on “Should you attend that meeting?” courtesy of Elizabeth Grace Saunders.
  • Gary Laverty reviews Charles Tryon’s third edition of “Managing Organizational Knowledge.”
  • Frank Saladis has a new book out, “Positive Leadership in Project Management.”
  • Samad Aidane interviews Lina Echeverria, author of “Idea Agent,” 63 minutes.  Also, Geoff Trickey on risk personality types, 45 minutes.  Both safe for work.
  • John Simko shares specific criteria when doing nothing is the appropriate action.
  • Dave Prior continues his interview with Personal Kanban author Jim Benson.  Just 25 minutes, safe for work.
  • LeRoy Ward notes that, if your organization isn’t good at project management, Agile practices will likely make a bad situation worse.
  • Ben Ferris shares five tips for making better decisions.
  • Bruce Benson notes that finding the root cause of your current problems is the first step; looking at alternatives comes later.
  • Chuck Morton explains risk buffers, using the example of a daily commute.
  • APQC has released a best-practices study called “Effective Project Management Offices.”
  • Roz Baker lists the bare minimum fields required in a defect log.  Her definition of “PICNIC” is at the bottom, next to the smiley face.
  • Andrew Makar presents a great tutorial on how to create a custom status report within Project 2010.
  • Sue Cochran creates a Project Charter primer by defining the core charter.
  • Whitson Gordon sings the praises of Evernote.  (And so do I!)
  • Brett Beaubouef explains the basic equation of requirements management: What + Why = How.
  • Patrick Richard shares insights from a Harvard Business Review blog post, on why employers aren’t filling their open jobs.
  • Toni Bowers has a short list of meaningless phrases you should remove from your resume.  Like “responsible for …”
  • Chip Camden addresses the question of whether age-ism is a factor for us IT consultants.  And then he shaves off his beard!
  • Amanda Augustine has some pointers for IT types who need to perfect their pitch.  Elevator pitch, that is.
  • Srinivasa Rao has been observing the behavior of his Generation Y colleagues, and thinks they will change the way projects are managed.
  • Venkat Rao addresses the biological aspects of curiosity, and ranges from boredom to the destruction of the universe.  Long, but a worthwhile read.


Perfection is Over-rated

People who have known me any length of time have probably heard me say that perfection is over-rated.  Some folks think I’m joking, but I’m not.  The plain truth is that there is a certain point – call it “good enough” – where you should stop polishing.  Generally, the quality / cost curve is asymptotic: each additional unit of cost yields an ever smaller improvement.  Those who keep struggling for incremental improvements beyond that quality goal are simply wasting resources, especially time.

The challenge for the practicing IT project manager is to work with the stakeholders to get agreement on what is good enough – and then, once that target is reached, to get the perfectionists to move on.  Because perfection, delivered too late, is a complete waste.