Everybody has dirty data. If you are lucky, on a normal business day it isn’t noticed very much, if at all. But, during a data migration, it can cause more than an embarrassment. It can cause a whole implementation project to come to a stop. Or, in extreme cases, it can make the whole dataset unusable.
Business data is valuable. Data is wanted for the convenience it brings. And, data is needed for legal reporting requirements. We can’t just throw dirty data away. We must respect the body of knowledge.
With respect and a good strategy, we can engage in a process that will turn a troublesome legacy dataset into a new robust system of information. So. . . Riddle Me, Query Me. Let us Start Asking the Right Questions:
Riddle: noun a question or statement intentionally phrased so as to require ingenuity in ascertaining its answer or meaning, typically presented as a game.
Query: verb to ask a question about something, especially in order to express one’s doubts about it or to check its validity or accuracy.
The first set of questions involves identifying the full dataset. How many tables, how many fields, what type of information, where is it, how is it to be accessed, how or when can we make the necessary changes? Some of these questions involve looking for the answers in the dataset. However, the most import questions involve our ongoing relationship with the legacy dataset until it is imported into the new system.
If we are not careful in our planning, we may choose to rashly export the dataset and spend considerable time making the right corrections on a copy of the original version. During the normal course of business, the original dataset is still in active use in the legacy system. Independent changes are being made to the old records, new records are added and some old records are deleted as found appropriate to do so. Soon the body of knowledge is out of control as there are two separate datasets that are both valuable for different reasons.
It is best, to find a process that allows the data analyst to not only find the dirty data but can then modify the rules for how data is entered so that the same issues cannot happen again. Another option is to define the dataset, such as to “archive” the records that will be exported and cleansed. Certain types of records allow for this better than others. For example, during normal business activity there is little to no reason to modify sales order information for last month’s sales. Historical sales order information could be exported and a system of cleansing developed so that if necessary the next month’s process can go more quickly.
Customer records are not a good candidate for this process. Customer records are more likely to be randomly modified. Valuable information could be lost if customer information was exported, modified and then set aside as complete and ready for the new system. During normal business activity new information can be added to their record, such as a second shipping address. Old information could modified, such as a changed phone number. Or some customer records may be deleted as it may be found that there were duplicates, such as a customer with a former name and address.
At the first stage of Riddle Me, Query Me we must carefully ask questions about our ongoing relationship with the legacy dataset as we prepare it for its new home. Next month, we will continue Asking the Right Questions for developing our strategy against “dirty data”.
Cathy Goetz | Business Software Consultant
Cathy is responsible for supporting clients with their Microsoft Dynamics GP solutions. From the beginning, she helps her clients determine the scope of their project and provides software demonstrations. Her expertise in gathering requirements will produce a carefully planned system for successful implementation and configuration.
Cathy holds a Bachelor of Arts in English and Elementary Education from Jamestown College in Jamestown, ND. Cathy has over 28 years and a wide range of industry experience in Agriculture, Retail Logistics, Construction, Contracting, Education, Finance, Food, Gaming, Real Estate, Property Management, Legal, Government and Nonprofit. Her roles included management positions, database analyst, application developer, software consultant and trainer, business owner, founder, teacher, and senior accountant. She has been involved with several ERP implementation projects, including Dynamics GP. Her specialties include data migration, Microsoft Access application development, data integration, data cleansing, database normalizing, and developing relational databases from legacy systems. She is highly skilled in report writing, documentation, developing training materials, and presentations. Cathy is certified as a Microsoft Certified Master Instructor for Microsoft Office and has recently completed an accounting program at Frederick Community College. She is currently preparing for the CPA examination.