Lin Li
A rule based taxonomy of dirty data.
Li, Lin; Peng, Taoxin; Kennedy, Jessie
Authors
Dr Taoxin Peng T.Peng@napier.ac.uk
Lecturer
Prof Jessie Kennedy J.Kennedy@napier.ac.uk
Enhanced Associate
Abstract
There is a growing awareness that high-quality data is key to today’s business success and that dirty data in data sources is one of the causes of poor data quality. To ensure high-quality data, enterprises need processes, methodologies and resources to monitor and analyze the quality of their data, as well as methodologies for preventing and/or detecting and repairing dirty data. Nevertheless, research shows that many enterprises do not pay adequate attention to the existence of dirty data and have not applied useful methodologies to ensure high-quality data for their applications. One reason is a lack of appreciation of the types and extent of dirty data. In practice, detecting and cleaning all the dirty data that exists in all data sources is expensive and unrealistic, so most enterprises need to consider the cost of cleaning dirty data. This problem has not attracted enough attention from researchers. In this paper, a rule-based taxonomy of dirty data is developed. The proposed taxonomy not only provides a mechanism to deal with this problem but also includes more dirty data types than any existing taxonomy.
Citation
Li, L., Peng, T., & Kennedy, J. (2011). A rule based taxonomy of dirty data. GSTF Journal on Computing, 1(2), 140-148.
Journal Article Type | Conference Paper |
---|---|
Conference Name | Annual International Academic Conference on Data Analysis, Data Quality and Metadata Management |
Publication Date | 2011 |
Deposit Date | Feb 4, 2011 |
Publicly Available Date | May 16, 2017 |
Journal | GSTF Journal on Computing |
Print ISSN | 2010-2283 |
Peer Reviewed | Peer Reviewed |
Volume | 1 |
Issue | 2 |
Pages | 140-148 |
Book Title | Proceedings of Annual International Academic Conference on Data Analysis, Data Quality and Metadata Management |
ISBN | 978-981-08-6308-1 |
Keywords | Data warehousing; dirty data; data cleansing; rule-based taxonomy |
Public URL | http://researchrepository.napier.ac.uk/id/eprint/3887 |
Files
A rule based taxonomy of dirty data
(239 Kb)
PDF