By Elliot King
In many cases, quality, like beauty, is in the eye of the beholder. The exact characteristics that define quality can be hard to describe. For example, a news report recently described a new, synthetic method for producing diamonds. Would those diamonds be of the same quality? And would they be as desirable as diamonds mined and refined in the regular way? The answer depends on whom you ask.
Fortunately, that is not the case with data quality. Beyond accuracy, high-quality data generally has three clear and measurable characteristics--consistency, completeness and compactness. Start with consistency. Because information systems are complex, the same "fact" is often represented inconsistently. Inconsistent, or dirty, data enters an information system when integrity constraints, domain constraints and data rules are not rigorously enforced.
A numerical representation of a month, for example, must fall between 1 and 12. If the system requires two digits in the month field, the representation must fall between 01 and 12. But because many organizations capture data through various methods, a month is too frequently represented in different ways.
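A domain constraint like this one is easy to express in code. The following is a minimal sketch in Python, not a prescribed implementation; the function name `normalize_month` and the choice to return `None` for invalid input are assumptions made for illustration.

```python
def normalize_month(raw):
    """Return the month as a two-digit string ("01" to "12"), or None if invalid.

    Hypothetical helper: it accepts either a number or a numeric string
    and enforces the 1-12 domain constraint.
    """
    text = str(raw).strip()
    # Reject anything that is not made up purely of ASCII digits.
    if not (text.isascii() and text.isdigit()):
        return None
    value = int(text)
    if 1 <= value <= 12:
        return f"{value:02d}"  # enforce the two-digit representation
    return None
```

With this sketch, inputs such as 7, "7" and " 07 " all normalize to the same two-digit value, while 0, 13 and "Mar" are rejected rather than stored.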
Another common source of inconsistent data is a failure to adhere to business rules. For example, an "order due" date should not be earlier than the corresponding "order placed" date. Inconsistent data can cause significant problems in downstream processing and analytics.
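The order-date rule can be checked the same way. This is a sketch under stated assumptions: the record layout and the field names `order_placed` and `order_due` are hypothetical, chosen only to mirror the example above.

```python
from datetime import date


def find_due_date_violations(orders):
    """Return the orders whose due date is earlier than their placed date."""
    return [o for o in orders if o["order_due"] < o["order_placed"]]


# Illustrative records; the values are made up for this example.
orders = [
    {"id": 1, "order_placed": date(2024, 3, 1), "order_due": date(2024, 3, 15)},
    {"id": 2, "order_placed": date(2024, 3, 10), "order_due": date(2024, 3, 2)},
]
violations = find_due_date_violations(orders)
```

Running a check like this before data reaches downstream systems is one way to keep a rule violation from turning into an analytics problem.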
The second characteristic of high-quality data is completeness. Different parts of an organization need different kinds of information, and data records should provide the information needed by all stakeholders. For example, the maintenance department of a new car dealership may want to link maintenance records to model type and owner. The sales department may be most interested in the number of visits each customer makes prior to closing a sale. The marketing department might be most interested in basic customer information. A good information system will capture all of that data.
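Completeness can be measured by listing the fields each stakeholder needs and checking which ones a record lacks. The sketch below assumes records are plain dictionaries and treats `None` and the empty string as missing; the field names are hypothetical.

```python
def missing_fields(record, required_fields):
    """Return the required fields that are absent or empty in the record."""
    return [f for f in required_fields if record.get(f) in (None, "")]


# Hypothetical fields needed by maintenance, sales and marketing.
REQUIRED = ["owner", "model", "visits_before_sale"]
```

A value of 0 counts as present here, so a customer who bought on the first contact is not flagged as incomplete.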
The third characteristic of high-quality data is compactness. Redundant data--multiple records reflecting the same person, for example--fuels significant data problems. Perhaps most damaging, redundant records can be very misleading. A company may overestimate the number of customers it has, or underestimate the value of an individual customer, if multiple records represent a single customer.
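Redundant records can often be surfaced by grouping on a normalized key. The following sketch assumes that a customer is identified by name and email, compared without regard to case or extra whitespace; real matching rules are usually more involved, and the field names here are hypothetical.

```python
from collections import defaultdict


def group_duplicates(records):
    """Group records that share the same normalized name and email."""
    groups = defaultdict(list)
    for r in records:
        # Collapse whitespace and case so trivial variations still match.
        name_key = " ".join(r["name"].lower().split())
        email_key = r["email"].strip().lower()
        groups[(name_key, email_key)].append(r)
    # Only groups with more than one record are duplicates.
    return [g for g in groups.values() if len(g) > 1]


# Illustrative records; the values are made up for this example.
records = [
    {"name": "Ann Lee", "email": "ann@example.com", "spend": 120.0},
    {"name": "ann  lee", "email": " Ann@Example.com", "spend": 80.0},
    {"name": "Bo Chen", "email": "bo@example.com", "spend": 50.0},
]
duplicate_groups = group_duplicates(records)
combined_spend = sum(r["spend"] for r in duplicate_groups[0])
```

In this made-up data, the three records describe only two customers, and the duplicated customer's value is the combined spend of both records rather than either figure alone.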
Consistency, completeness and compactness are essential characteristics of high-quality data. They can be identified, measured and, if needed, rectified. But doing so takes effort, attention and commitment.




