Data Quality Problems are Predictable

By Elliot King

The idea that poor data quality is costly and hurts performance is about as old as science itself. The influential science writer Stephen Jay Gould wrote a whole book about how faulty data leads to faulty conclusions, often to the great detriment of society. And one of the most lasting aphorisms in computing has been "garbage in, garbage out."

Moreover, the problems and risks of poor data quality have been studied, described and quantified for decades. Data scientists have explored how to ameliorate data quality problems, software vendors have developed the needed tools, and companies have invested heavily in technology to rectify the shortcomings in their data.

So why do these problems persist, entirely predictably, despite efforts to correct them? The most obvious reason is that in most organizations, data quality issues are not top-line agenda items for those in a position to ensure that they are regularly addressed. Too often, nobody truly feels that they "own" specific data, so if there are problems, they assume that somebody else will fix them. Even worse, in some cases users may not even feel the need to address bad data.

But the most obvious reason is not always the most compelling. After all, employees can be trained to be alert to data problems. They can be educated about the impact of poor data quality on ongoing operations and company success. Safeguarding data quality can be assigned as part of their jobs.

All that will help immensely, but it won't completely prevent data quality challenges. The primary reason data quality problems continue to haunt us is that data flows are open. Data continually streams in from a wide variety of sources, many of which are only marginally controlled. Moreover, the internal use of data changes over time, which places new demands on it. Corporate data today should be thought of as organic: it grows and changes over time, as do its uses.

So you should come to terms with data quality issues and accept that they will always arise. Data quality improvement must be a continual process designed to limit the negative impact of bad data. But bad data will never be entirely eliminated.

