
Achieving "Proactivity?"

By David Loshin

Standardizing the approaches and methods used for reviewing data errors, performing root cause analysis, and designing and applying corrective or remedial measures all help ratchet an organization's data quality maturity up a notch or two. This is particularly effective when fixing the processes that allow data errors to be introduced in the first place eliminates those errors altogether.

In the cases where the root cause is not feasibly addressed, we still have another standardized approach: defining data validity rules that can be incorporated into probe points in the processes to monitor compliance with expectations and alert a data steward as early as possible when invalid data is recognized.
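As a rough illustration of a probe point, the following Python sketch checks each record against a set of validity rules as it passes through a process and notifies a steward as soon as a rule fails. The rule names, the record fields, and the `alert` callback are hypothetical, and holding invalid records back (rather than letting them continue downstream) is an assumption made for the example, not something the approach requires.

```python
from typing import Callable, Iterable

# A validity rule is a name plus a predicate that returns True for valid records.
Rule = tuple[str, Callable[[dict], bool]]

# Hypothetical rules for illustration; real rules come from business expectations.
RULES: list[Rule] = [
    ("customer_id is present", lambda r: bool(r.get("customer_id"))),
    ("quantity is a positive integer",
     lambda r: isinstance(r.get("quantity"), int) and r["quantity"] > 0),
]


def probe(records: Iterable[dict], rules: list[Rule],
          alert: Callable[[str, dict], None]):
    """Yield valid records onward; alert the steward about invalid ones."""
    for record in records:
        failed = [name for name, check in rules if not check(record)]
        if failed:
            for name in failed:
                alert(name, record)  # notify as early as possible
        else:
            yield record


# Stand-in for a real notification channel: collect alerts in a list.
alerts = []
incoming = [
    {"customer_id": "C1", "quantity": 3},
    {"customer_id": "", "quantity": 2},
    {"customer_id": "C3", "quantity": 0},
]
passed = list(probe(incoming, RULES, lambda name, rec: alerts.append((name, rec))))
```

In practice the `alert` callback would feed an incident management queue or a steward's inbox; the list used here only makes the behavior easy to inspect.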

This certainly reduces the "reactive culture" I discussed in one of the previous posts. Governing data stewardship activities by combining automated inspection tools such as data profiling, automated data correction and cleansing tools, and incident management also reduces replicated analysis efforts and repetitive fixes applied at different places and times. In fact, many organizations consider this level of maturity to be proactive data quality management, because you are anticipating the need to address issues that you already know about.

However, I take a somewhat contrarian view on this: to be truly proactive, you would have to go beyond anticipating what you already know. In this light, we might say that instituting controls that support inspection, monitoring, and notification is less about being proactive and more about being reactive much earlier in the process.

To really be proactive, it may be more worthwhile to try to anticipate the types of errors that you don't already know about. Instead of using profiling tools only to look for existing patterns and errors, you might use these analytical tools to understand the methods and channels through which potential errors could be introduced, and attempt to control the introduction of flawed data before it ever leads to any material impact!

Standardizing Your Approach to Monitoring the Quality of Data

By David Loshin

In my last post, I suggested three techniques for maturing your organizational approach to data quality management. The first recommendation was defining processes for evaluating errors when they are identified. These types of processes actually involve a few key techniques:

1) An approach to specifying data validity rules that can be used to determine whether a data instance or record has an error. This is more of a discipline that can be guided by formal representations of business or data rules. Often metadata management tools and data profiling tools have repositories for capturing defined rules, leading to our next technique...

2) A method for applying those rules to data. This will often take advantage of the operational aspects of a data profiling or monitoring tool to validate a data instance against a set of rules. It may also incorporate parsing and standardization rules to identify known error patterns.

3) A means for reporting errors to a data analyst or steward. Some data analysis and profiling tools can be configured to automatically notify a data steward when a data validity rule is violated. In other situations, the results of applying the validation rules can be accumulated in a repository, with a front-end reporting tool used to provide visualization and notification of errors.

4) An inventory of actions to take when specific errors occur. As your team becomes more knowledgeable about the types of errors that can occur, you will also become accustomed to the methods employed for analysis and remediation.
In time, the repeated use of tools and the corresponding actions for remediation can be evolved into standardized methods, which can be documented, published, and used as the basis for training data quality analysts.
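To make the four techniques concrete, here is a minimal Python sketch that strings them together: rules are specified as data, applied to records, accumulated as violations, and reported alongside a documented action. Everything in it is an assumption for illustration, including the `ValidityRule` class, the rule IDs, the field names, and the wording of the actions; a profiling or monitoring tool would normally supply the rule repository and the reporting front end.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ValidityRule:
    rule_id: str
    description: str
    is_valid: Callable[[dict], bool]


# 1) Specify the rules (hypothetical examples).
RULES = [
    ValidityRule(
        "R1", "postal_code is five digits",
        lambda r: re.fullmatch(r"\d{5}", str(r.get("postal_code", ""))) is not None,
    ),
    ValidityRule(
        "R2", "email contains exactly one @",
        lambda r: str(r.get("email", "")).count("@") == 1,
    ),
]

# 4) Inventory of actions, keyed by rule id (hypothetical wording).
ACTIONS = {
    "R1": "look up the postal code from the street address",
    "R2": "route to the data steward to confirm with the customer",
}


def validate(records, rules):
    """2) Apply every rule to every record and accumulate the violations."""
    violations = []
    for row, record in enumerate(records):
        for rule in rules:
            if not rule.is_valid(record):
                violations.append(
                    {"row": row, "rule_id": rule.rule_id,
                     "description": rule.description}
                )
    return violations


def report(violations, actions):
    """3) Produce one line per violation with its documented action."""
    return [
        f"row {v['row']}: violated '{v['description']}'; "
        f"action: {actions.get(v['rule_id'], 'no documented action yet')}"
        for v in violations
    ]


records = [
    {"postal_code": "20910", "email": "a@example.com"},
    {"postal_code": "2091", "email": "b@example.com"},
    {"postal_code": "10001", "email": "no-at-sign"},
]
violations = validate(records, RULES)
lines = report(violations, ACTIONS)
```

Keeping the action inventory as a separate lookup means a newly observed error type shows up as "no documented action yet", which is a natural prompt to extend the standardized methods.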


Four Pillars of Data Quality Improvement

By Elliot King

Almost all data quality management programs have four key elements that serve as the foundations for success: data profiling, data improvement, data integration, and data augmentation. Put another way, data quality programs must determine what is broken, fix what can be fixed, consolidate what can be consolidated, and enhance what needs to be enhanced. Sounds easy, right? If only.

Data profiling is the process of identifying which records are "broken." It consists of comparing your actual data to what you think you should have. Since data flows into organizations via so many routes, errors are inevitable. But if you never look for them, you won't know the data is flawed until something unexpected--usually unexpectedly bad--happens.
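A very small Python sketch of that comparison might look like the following. It summarizes one column and checks the summary against stated expectations. The column name, the sample rows, and the expectation thresholds are all made up for the example, and real profiling tools compute far richer statistics.

```python
from collections import Counter


def profile_column(rows, column):
    """Summarize one column: row count, missing values, distinct values."""
    values = [row.get(column) for row in rows]
    present = [v for v in values if v not in (None, "")]
    return {
        "count": len(values),
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
        "most_common": Counter(present).most_common(1),
    }


def compare_to_expectations(profile, expected):
    """Return a finding for each expectation the actual data does not meet."""
    findings = []
    if profile["missing"] > expected["max_missing"]:
        findings.append(
            f"{profile['missing']} missing values "
            f"(expected at most {expected['max_missing']})"
        )
    if profile["distinct"] > expected["max_distinct"]:
        findings.append(
            f"{profile['distinct']} distinct values "
            f"(expected at most {expected['max_distinct']})"
        )
    return findings


# Hypothetical data: only two states are expected, and no blanks.
rows = [{"state": "MD"}, {"state": "md"}, {"state": ""},
        {"state": "NY"}, {"state": "MD"}]
state_profile = profile_column(rows, "state")
findings = compare_to_expectations(
    state_profile, {"max_missing": 0, "max_distinct": 2}
)
```

Here the lowercase "md" inflates the distinct count, which is the kind of discrepancy between actual and expected data that profiling is meant to surface.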

Once you know what's wrong, you can set about fixing it. But like a house that is in disrepair, you don't have to do everything at once. You may not want to correct some errors at all if they do not have a significant impact. Some mistakes in the data may be so fundamental that you simply cannot risk using it at all. Sometimes, a record may be incomplete but adding a placeholder--a standard substitute value--may be enough. And all the other errors you find, well, you will probably want to fix them.
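One way to picture these choices is as a simple triage policy. The Python sketch below maps each error type to one of the four outcomes described above: leave it alone, refuse to use the record, fill in a placeholder, or send it for correction. The error type names, the policy table, and the placeholder value are hypothetical; the real assignments would depend on the business impact of each error.

```python
from enum import Enum


class Disposition(Enum):
    IGNORE = "ignore"            # impact too small to justify a correction
    REJECT = "reject"            # too fundamental to risk using the record
    PLACEHOLDER = "placeholder"  # a standard substitute value is good enough
    FIX = "fix"                  # everything else goes to correction


# Hypothetical policy: which disposition applies to which error type.
POLICY = {
    "trailing_whitespace_in_notes": Disposition.IGNORE,
    "missing_customer_id": Disposition.REJECT,
    "missing_middle_name": Disposition.PLACEHOLDER,
}

# Hypothetical placeholder values: error type -> (field, substitute value).
PLACEHOLDERS = {"missing_middle_name": ("middle_name", "UNKNOWN")}


def triage(record, error_type):
    """Return the disposition and the record to carry forward (or None)."""
    disposition = POLICY.get(error_type, Disposition.FIX)
    if disposition is Disposition.REJECT:
        return disposition, None
    if disposition is Disposition.PLACEHOLDER:
        field, value = PLACEHOLDERS[error_type]
        return disposition, {**record, field: value}
    return disposition, record


filled = triage({"customer_id": "C1", "middle_name": ""}, "missing_middle_name")
rejected = triage({"customer_id": ""}, "missing_customer_id")
queued = triage({"customer_id": "C2", "order_date": "31/02/2020"}, "bad_order_date")
```

Defaulting unknown error types to `FIX` mirrors the idea that anything not explicitly ignored, rejected, or filled in is something you will probably want to correct.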

The next two elements of data quality improvement programs go beyond finding and fixing what can be fixed. Many organizations have a boatload of redundant data--a single customer's name and address may be stored in numerous databases. Those records should be consolidated. The more places data is stored, the greater the odds that inconsistencies will be introduced, and inconsistencies inevitably lead to errors.
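As a simplified sketch of consolidation, the Python below groups customer records from different systems by a normalized name and address, then merges each group into a single surviving record. The field names and source systems are hypothetical, and exact matching on a normalized key is an assumption made to keep the example short; real matching usually has to tolerate misspellings and address variants.

```python
from collections import defaultdict


def match_key(record):
    """Normalize name and address so trivially different copies match."""
    name = " ".join(record["name"].lower().split())
    address = " ".join(record["address"].lower().replace(".", "").split())
    return (name, address)


def consolidate(records):
    """Merge records that share a match key into one surviving record."""
    groups = defaultdict(list)
    for record in records:
        groups[match_key(record)].append(record)

    merged = []
    for group in groups.values():
        survivor = {}
        for record in group:
            for field, value in record.items():
                if field == "source":
                    continue
                # Keep the first non-empty value seen for each field.
                if value and not survivor.get(field):
                    survivor[field] = value
        # Remember which systems contributed to the surviving record.
        survivor["sources"] = sorted(r["source"] for r in group)
        merged.append(survivor)
    return merged


records = [
    {"source": "billing", "name": "Pat Smith", "address": "12 Main St.", "phone": ""},
    {"source": "crm", "name": "pat  smith", "address": "12 main st", "phone": "555-0100"},
    {"source": "crm", "name": "Lee Wong", "address": "9 Oak Ave", "phone": ""},
]
customers = consolidate(records)
```

The two "Pat Smith" records collapse into one, and the phone number that only the second system held is carried into the surviving record.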

Finally, the data you have may not be sufficient to address your business needs. Good data quality improvement programs take steps to augment the existing corporate information. The more good information you have, the more value you can develop from it.

