Data quality

Ensuring data quality should be seen as an ongoing improvement program that is managed throughout the data lifecycle.

The Data Management Association (DAMA) defines the common characteristics (dimensions) of data quality as:

  • accuracy
  • completeness
  • consistency
  • integrity
  • reasonability
  • timeliness
  • uniqueness/deduplication
  • validity.

Data quality management is a continuous process that involves managing data from its initial creation to its potential destruction. The quality of your agency's data should always be fit for purpose. You can support this by establishing a data quality strategy that facilitates proactive monitoring and management of data quality. Eg, you can embed data quality assessments in data migration activities.

A data quality strategy should link to your broader data and information governance environment, including your information governance framework.

Data quality assessment

A good data quality strategy defines appropriate standards, requirements and specifications for data quality controls. This includes developing data quality dimensions that are relevant to your business needs, so that you can monitor, measure and report on the quality levels of your data.

Data quality assessment tells you how effective data is in meeting your stakeholders' requirements and also helps you prioritise remediation on high value datasets.

Data quality is assessed by measuring specific dimensions of your data.

These dimensions provide a:

  • vocabulary for defining data requirements
  • way to determine data quality assessment results
  • metric for ongoing measurement and improvement. (DAMA, 2017)
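As an illustration of measuring dimensions, the following minimal Python sketch scores three of the dimensions listed earlier (completeness, uniqueness and validity) as simple ratios. The sample records, the field names and the four-digit postcode rule are assumptions made for this example only; they are not DAMA definitions, and your agency would define its own measures.

```python
import re

# Hypothetical sample records; field names are illustrative only.
records = [
    {"id": 1, "email": "a@example.gov.au", "postcode": "2600"},
    {"id": 2, "email": None, "postcode": "2601"},
    {"id": 2, "email": "b@example.gov.au", "postcode": "26O1"},  # letter O, not zero
    {"id": 3, "email": "c@example.gov.au", "postcode": "3000"},
]

def completeness(rows, field):
    """Share of rows where the field is present and non-empty."""
    filled = sum(1 for r in rows if r.get(field) not in (None, ""))
    return filled / len(rows)

def uniqueness(rows, field):
    """Share of distinct values among all values of the field."""
    values = [r[field] for r in rows]
    return len(set(values)) / len(values)

def validity(rows, field, pattern):
    """Share of rows whose field value fully matches the pattern."""
    ok = sum(1 for r in rows if re.fullmatch(pattern, str(r.get(field) or "")))
    return ok / len(rows)

scores = {
    "completeness(email)": completeness(records, "email"),
    "uniqueness(id)": uniqueness(records, "id"),
    "validity(postcode)": validity(records, "postcode", r"\d{4}"),
}
for name, score in scores.items():
    print(f"{name}: {score:.0%}")
```

Expressing each dimension as a ratio between 0 and 1 makes it straightforward to compare results against a target and to track them over time.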

There are different sets of dimensions that you can use to assess data quality, eg:

  • the common dimensions of data quality from DAMA's Body of Knowledge
  • the Australian Bureau of Statistics (ABS) dimensions, with ABS guidance on assessing against them to determine the quality of statistical data
  • ISO 8000, a global standard for data quality and enterprise master data, which you can use to inform your agency’s data quality standards.

Data quality tools

Tools can be used as a guide to understand the different dimensions of data quality and to generate data quality statements. An example is the NSW Government data quality reporting tool, which can be used to generate data quality statements in various document formats.

Tools that automate data profiling and cleansing are also available and can help your agency enhance large amounts of data.

These tools can:

  • profile, clean and monitor data quality over time
  • assist in the validation of data
  • provide statistics on your agency's data
  • help to identify patterns and provide direction on future data remediation.
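To give a sense of what data profiling involves, here is a minimal Python sketch that produces per-field statistics for a small set of records. It does not represent any particular tool; the `profile` function and the sample data are hypothetical and only illustrate the kind of statistics such tools report.

```python
from collections import Counter

def profile(rows):
    """Return simple per-field statistics for a list of dict records."""
    fields = sorted({key for row in rows for key in row})
    report = {}
    for field in fields:
        values = [row.get(field) for row in rows]
        present = [v for v in values if v not in (None, "")]
        most_common = Counter(present).most_common(1)
        report[field] = {
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
            "most_common": most_common[0][0] if most_common else None,
        }
    return report

# Hypothetical sample data with inconsistent casing and a blank value.
sample = [
    {"state": "NSW", "status": "active"},
    {"state": "nsw", "status": "active"},
    {"state": "", "status": "closed"},
    {"state": "VIC", "status": "active"},
]
report = profile(sample)
for field, stats in report.items():
    print(field, stats)
```

In this sample, the distinct count for `state` would flag that "NSW" and "nsw" are being treated as different values, which is the sort of pattern that can direct future remediation.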

TIP: Data remediation can be achieved through the use of ETL (extract, transform, load) software, which can process data based on business rules and transform the data into the required format.
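The rule-based transformation described in the tip can be sketched in a few lines of Python. This is an illustration only, not a substitute for ETL software: the two business rules (upper-case state codes, and dates converted from DD/MM/YYYY to YYYY-MM-DD) and the function names are assumptions made for the example.

```python
from datetime import datetime

def transform(row):
    """Apply two hypothetical business rules to one record."""
    cleaned = dict(row)
    # Rule 1 (assumed): state codes are trimmed and upper-cased.
    cleaned["state"] = row["state"].strip().upper()
    # Rule 2 (assumed): dates arrive as DD/MM/YYYY and are stored as YYYY-MM-DD.
    cleaned["created"] = (
        datetime.strptime(row["created"], "%d/%m/%Y").date().isoformat()
    )
    return cleaned

def run_etl(source_rows):
    """Transform each extracted row; set aside rows that break a rule."""
    loaded, rejected = [], []
    for row in source_rows:
        try:
            loaded.append(transform(row))
        except (KeyError, ValueError, AttributeError) as error:
            rejected.append((row, str(error)))
    return loaded, rejected

source = [
    {"state": " nsw ", "created": "03/02/2019"},
    {"state": "vic", "created": "2019-02-03"},  # wrong date format
]
loaded, rejected = run_etl(source)
print("loaded:", loaded)
print("rejected:", rejected)
```

Keeping rejected rows together with the reason for rejection, rather than discarding them, leaves a record that can feed later remediation work.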

Poor data quality

Common culprits for poor data quality, and their outcomes:

  • Incorrect data entry validation: invalid data is entered into the database
  • Change in business rules: new rules are not correctly propagated throughout existing data
  • Changes to the source data structure: third parties implement changes without notifying downstream users; business rules are not updated on systems following notification of changes
  • Requirement for uniqueness of instances: incorrect identifiers are created
  • Incorrect business rules being applied to data: loss of data
  • Incorrect temporal information: difficulty identifying the latest version of information and data, resulting in duplication.
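Where temporal information is reliable, identifying the latest version of a record becomes a simple comparison. The Python sketch below keeps only the most recently modified row for each identifier. The field names `record_id` and `modified`, and the use of ISO 8601 timestamps, are assumptions made for this example.

```python
from datetime import datetime

def latest_versions(rows, key="record_id", stamp="modified"):
    """Keep only the most recently modified row for each key."""
    latest = {}
    for row in rows:
        current = latest.get(row[key])
        if current is None or (
            datetime.fromisoformat(row[stamp])
            > datetime.fromisoformat(current[stamp])
        ):
            latest[row[key]] = row
    return list(latest.values())

# Hypothetical records: A1 exists in two versions.
rows = [
    {"record_id": "A1", "modified": "2019-01-10T09:00:00", "title": "Draft"},
    {"record_id": "A1", "modified": "2019-03-02T14:30:00", "title": "Final"},
    {"record_id": "B7", "modified": "2019-02-01T08:15:00", "title": "Policy"},
]
deduplicated = latest_versions(rows)
for row in deduplicated:
    print(row["record_id"], row["title"])
```

If the timestamps themselves are missing or wrong, this comparison cannot be trusted, which is why incorrect temporal information tends to result in duplication.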

Data quality and metadata

Good metadata is essential in understanding and assessing the quality of your data. Data quality assessments determine if your data meets the expectations of its consumers and metadata plays a key role in clarifying those expectations. Eg, you can look at a record’s metadata to see if it meets format requirements or if it has been updated according to business rules.

Metadata can also be used to record data quality assessments; this means metadata repositories can be used for storing and sharing data quality assessment results across your organisation.
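As a sketch of what recording an assessment as metadata might look like, the Python example below captures one result as a structured entry that could be stored in a metadata repository. The `QualityAssessment` fields, the dataset name and the threshold are hypothetical; a real repository would follow your agency's own metadata schema.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class QualityAssessment:
    """One data quality assessment result (illustrative fields only)."""
    dataset: str
    dimension: str
    score: float
    threshold: float
    assessed_on: str

    @property
    def passed(self):
        # The result passes when the score meets the agreed threshold.
        return self.score >= self.threshold

def to_metadata(assessment):
    """Convert an assessment into a plain dict suitable for storage."""
    entry = asdict(assessment)
    entry["passed"] = assessment.passed
    return entry

assessment = QualityAssessment(
    dataset="client_register",   # hypothetical dataset name
    dimension="completeness",
    score=0.92,
    threshold=0.95,
    assessed_on="2019-06-30",
)
metadata_entry = to_metadata(assessment)
print(json.dumps(metadata_entry, indent=2))
```

Storing the threshold alongside the score means anyone reading the entry later can see what was measured and what standard it was measured against.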

Your metadata and data quality teams can work closely together to develop these processes. Their combined expertise can ensure that business rules, measurements or issues related to data quality are documented, developed and managed as per your agency's data strategy.

Copyright National Archives of Australia 2019