Dr Taoxin Peng T.Peng@napier.ac.uk
Lecturer
Towards a framework for dealing with data quality in data warehouses.
Authors
Peng, Taoxin
Contributors
P Petratos (Editor)
Abstract
The popularity of data warehouses (DWs) in recent years confirms the importance of data quality to today’s business success. It is estimated that as much as 75% of the effort spent on building a data warehouse can be attributed to back-end issues, such as readying the data and transporting it into the data warehouse. To improve the efficiency of building a data warehouse, data cleaning is a crucial task alongside design and implementation. For this task, at least two questions need to be answered: How can we reduce the time spent on data cleaning? How can we increase the degree of automation when performing data cleaning? This paper attempts to answer these two questions by presenting a novel framework, which provides an approach to managing data cleaning in data warehouses by focusing on the use of data quality factors and decoupling the cleaning process into several sub-processes. An initial test run of the processes in the framework demonstrates that the approach presented is efficient and scalable for cleaning data in data warehouses.
Publication Date | 2006 |
---|---|
Deposit Date | Mar 16, 2010 |
Peer Reviewed | Peer Reviewed |
Pages | 241-256 |
Book Title | Current Computing Developments in E-Commerce, Security, HCI, DB, Collaborative and Cooperative Systems |
ISBN | 960-6672-07-7 |
Keywords | Data warehouses; quality; cleaning; framework; scalable |
Public URL | http://researchrepository.napier.ac.uk/id/eprint/3429 |