By David Loshin
One of the most frequently referenced dimensions of data quality is completeness. At a formal level, completeness implies rules specifying mandatory assignment of values to particular data elements. In layman's terms, that means rules that make sure critical attributes are populated with values.
There are a few things to think about regarding the critical nature of completeness rules for data validity, from both the data creation side and the data consumption side. Let's start with the consumption side and look at two different use cases, transaction processing and analytical processing, and consider the reasons behind completeness expectations for each. There should be little doubt about the need for completeness for transaction processing purposes: in most transactions, some data values are required for the transaction to complete successfully. For example, your online order won't complete if you don't provide a method of payment.
However, as more organizations examine how their business processes span different functions in the business, there is greater recognition of requirements for data values that might not be needed immediately but will eventually be used downstream. To continue our example, once your online order has been placed, the items can't be delivered to you if you did not provide a shipping address. That means the shipping address is required (and must be complete) when the order transaction takes place.
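To make the online order example concrete, here is a minimal sketch of a completeness rule applied when the order is placed. The `Order` record, its field names, and the `missing_fields` helper are hypothetical, chosen only for illustration; a real order system would define its own mandatory attributes.

```python
from dataclasses import dataclass
from typing import List, Optional


# Hypothetical order record; the field names are illustrative assumptions.
@dataclass
class Order:
    order_id: str
    payment_method: Optional[str] = None
    shipping_address: Optional[str] = None


# Fields that must be populated when the order is placed: one needed
# immediately (payment) and one needed downstream (shipping).
REQUIRED_FIELDS = ("payment_method", "shipping_address")


def missing_fields(order: Order) -> List[str]:
    """Return the names of required fields that are absent or blank."""
    missing = []
    for name in REQUIRED_FIELDS:
        value = getattr(order, name)
        if value is None or not value.strip():
            missing.append(name)
    return missing


order = Order(order_id="A-1001", payment_method="credit card")
print(missing_fields(order))  # ['shipping_address']
```

The point of the sketch is that the shipping address is checked at order time even though it is not used until fulfillment, which is the downstream requirement described above.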
From the analytical perspective, we also have data completeness expectations, and they become particularly pertinent for aggregations and roll-ups. Consider a report that combines measures for total sales and for average sales, but some of the records are missing sales amounts. The total will typically be understated, and the average will be skewed depending on whether the records with missing amounts are excluded from the count or counted as zero. Either way, both figures are going to be inaccurate as a result of the missing values.
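A small worked example shows how missing sales amounts distort a roll-up. This is only a sketch: the sales figures are made up, `None` is assumed to mark a missing amount, and the `summarize` helper is hypothetical.

```python
from typing import Dict, List, Optional


def summarize(sales: List[Optional[float]]) -> Dict[str, float]:
    """Compute total, two candidate averages, and a completeness ratio.

    None marks a record whose sales amount is missing (an assumption
    made for this illustration).
    """
    present = [s for s in sales if s is not None]
    total = sum(present)
    return {
        "total": total,
        # Average over only the records that have an amount.
        "avg_excluding_missing": total / len(present),
        # Average if missing amounts are silently treated as zero.
        "avg_missing_as_zero": total / len(sales),
        # Share of records with a populated sales amount.
        "completeness": len(present) / len(sales),
    }


# Hypothetical data: two of five records have no sales amount.
report = summarize([100.0, None, 200.0, None, 300.0])
print(report["total"])                  # 600.0
print(report["avg_excluding_missing"])  # 200.0
print(report["avg_missing_as_zero"])    # 120.0
print(report["completeness"])           # 0.6
```

Neither average is necessarily the true one; which is closer depends on what the missing amounts actually were, which is exactly why a completeness expectation matters for the report.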
In both usage scenarios, missing data is an issue, and our next set of entries will examine missing data in more detail.




