Dr Taoxin Peng T.Peng@napier.ac.uk
Lecturer
Achieving high data quality in data warehouses is a persistent challenge, and data cleaning is a crucial task in meeting it. A range of methods and tools has been developed for this purpose. However, at least two questions remain to be answered: how can the efficiency of data cleaning be improved, and how can its degree of automation be increased? This paper addresses these two questions by presenting a novel framework that manages data cleaning in data warehouses by focusing on the use of data quality dimensions and by decoupling the cleaning process into several sub-processes. An initial test run of the processes in the framework demonstrates that the approach is efficient and scalable for data cleaning in data warehouses.
Peng, T. (2008). A framework for data cleaning in data warehouses. Enterprise Information Systems, 473-478
Journal Article Type | Article |
---|---|
Publication Date | 2008 |
Deposit Date | Feb 8, 2010 |
Publicly Available Date | May 16, 2017 |
Print ISSN | 1751-7575 |
Publisher | Taylor & Francis |
Peer Reviewed | Peer Reviewed |
Pages | 473-478 |
Keywords | data cleaning; data warehouse; performance efficiency; automation; data quality; decoupling; scalable |
Public URL | http://researchrepository.napier.ac.uk/id/eprint/3467 |
A comparison of techniques for name matching
(2012)
Journal Article
An evaluation of name matching techniques.
(2011)
Presentation / Conference Contribution
The VoIP intrusion detection through a LVQ-based neural network.
(2009)
Presentation / Conference Contribution
Combining dimensional analysis and heuristics for causal ordering.
(2006)
Book Chapter
Towards a framework for dealing with data quality in data warehouses.
(2006)
Book Chapter