A rule based taxonomy of dirty data.

Li, Lin; Peng, Taoxin; Kennedy, Jessie

Authors

Lin Li

Dr Taoxin Peng T.Peng@napier.ac.uk
Lecturer

Prof Jessie Kennedy J.Kennedy@napier.ac.uk
Emeritus Professor

Abstract

There is a growing awareness that high-quality data is key to today’s business success and that dirty data in data sources is one of the causes of poor data quality. To ensure high-quality data, enterprises need processes, methodologies and resources to monitor and analyze data quality, as well as methodologies for preventing and/or detecting and repairing dirty data. Nevertheless, research shows that many enterprises do not pay adequate attention to the existence of dirty data and have not applied useful methodologies to ensure high-quality data for their applications. One of the reasons is a lack of appreciation of the types and extent of dirty data. In practice, detecting and cleaning all the dirty data that exists in all data sources is expensive and unrealistic, so most enterprises need to consider the cost of cleaning dirty data. This problem has not attracted enough attention from researchers. In this paper, a rule-based taxonomy of dirty data is developed. The proposed taxonomy not only provides a mechanism to deal with this problem but also includes more dirty data types than any existing taxonomy.

Citation

Li, L., Peng, T., & Kennedy, J. A rule based taxonomy of dirty data. Presented at Annual International Academic Conference on Data Analysis, Data Quality and Metadata Management

Presentation Conference Type	Conference Paper (published)
Conference Name	Annual International Academic Conference on Data Analysis, Data Quality and Metadata Management
Publication Date	2011
Deposit Date	Feb 4, 2011
Publicly Available Date	May 16, 2017
Journal	GSTF Journal on Computing
Print ISSN	2010-2283
Peer Reviewed	Peer Reviewed
Volume	1
Issue	2
Pages	140-148
Book Title	Proceedings of Annual International Academic Conference on Data Analysis, Data Quality and Metadata Management
ISBN	978-981-08-6308-1
Keywords	Data warehousing; dirty data; data cleansing; rule-based taxonomy
Public URL	http://researchrepository.napier.ac.uk/id/eprint/3887
Contract Date	May 16, 2017

Files

A rule based taxonomy of dirty data (239 Kb)
PDF
