Improving data quality in data warehousing applications

Li, Lin; Peng, Taoxin; Kennedy, Jessie

doi:10.5220/0002903903790382

Authors

Lin Li

Dr Taoxin Peng T.Peng@napier.ac.uk
Lecturer

Prof Jessie Kennedy J.Kennedy@napier.ac.uk
Emeritus Professor

Contributors

Joaquim Filipe
Editor

José Cordeiro
Editor

Abstract

There is a growing awareness that high-quality data is key to today's business success, and dirty data that exists within data sources is one of the causes of poor data quality. To ensure high quality, enterprises need processes, methodologies and resources to monitor and analyze the quality of their data, as well as methodologies for preventing and/or detecting and repairing dirty data. In practice, however, detecting and cleaning all the dirty data in all data sources is expensive and unrealistic, so most enterprises must take the cost of data cleaning into account. A conflict therefore arises when an organization intends to clean its data warehouse: how should it select the most important data to clean, given its business requirements? In this paper, business rules are used to classify dirty data types according to data quality dimensions. The proposed method is intended to help solve this problem by allowing users to select the appropriate group of dirty data types according to the priority of their business requirements. It also provides guidelines for measuring data quality with respect to different data quality dimensions and should be helpful for the development of data cleaning tools.

Citation

Li, L., Peng, T., & Kennedy, J. (2010, June). Improving data quality in data warehousing applications. Presented at Proceedings of the 12th International Conference on Enterprise Information Systems

Presentation Conference Type	Conference Paper (published)
Conference Name	Proceedings of the 12th International Conference on Enterprise Information Systems
Start Date	Jun 8, 2010
End Date	Jun 12, 2010
Publication Date	2010
Deposit Date	Feb 4, 2011
Publicly Available Date	May 16, 2017
Peer Reviewed	Peer Reviewed
Volume	1
Pages	379-382
Book Title	Proceedings of the 12th International Conference on Enterprise Information Systems
ISBN	9789898425041; 9789898425058; 9789898425065; 9789898425072; 9789898425089
DOI	https://doi.org/10.5220/0002903903790382
Keywords	Data quality; dirty data; data cleaning tools; data warehousing
Public URL	http://researchrepository.napier.ac.uk/id/eprint/3886
Contract Date	May 16, 2017


A comparison of techniques for name matching (2012)
Journal Article

A framework for data cleaning in data warehouses (2008)
Journal Article

An evaluation of name matching techniques. (2011)
Presentation / Conference Contribution

The VoIP intrusion detection through a LVQ-based neural network. (2009)
Presentation / Conference Contribution

Combining dimensional analysis and heuristics for causal ordering. (2006)
Book Chapter
