Dr Taoxin Peng T.Peng@napier.ac.uk
Lecturer
Towards a Synthetic Data Generator for Matching Decision Trees
Peng, Taoxin; Hanke, Florian
Authors
Florian Hanke
Abstract
Real-world data is widely used to evaluate and teach data mining techniques. However, using real-world data for such purposes has some disadvantages. Firstly, real-world data in most domains is difficult to obtain for budgetary, technical, or ethical reasons. Secondly, the use of many real-world data sets is restricted, and in the case of data mining, those data sets either do not contain specific patterns that are easy to mine for teaching purposes, or the data needs special preparation and the algorithm needs very specific settings in order to find patterns in it. A possible solution is the generation of synthetic, “meaningful data” (data with intrinsic patterns). This paper presents a framework for such a data generator, which is able to generate data sets with intrinsic patterns, such as decision trees. A preliminary run of the prototype shows that the generation of such “meaningful data” is possible. The proposed approach could also be extended to generate synthetic data with other intrinsic patterns.
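As an illustration of what "data with intrinsic patterns" can mean for decision trees, the following is a minimal sketch and not the authors' framework or prototype. It assumes a small hand-written tree and labels randomly sampled records by walking that tree, so a decision tree learner run on the output should be able to recover the same splits. The tree, the attribute names (`age`, `income`), the value ranges, and the functions `classify` and `generate` are all hypothetical and chosen only for the example.

```python
import random

# Hypothetical target tree (an assumption for illustration only).
# Internal nodes test one numeric attribute against a threshold;
# leaves carry a class label.
TREE = {
    "attr": "age", "threshold": 30,
    "left": {"label": "no"},            # age < 30
    "right": {                          # age >= 30
        "attr": "income", "threshold": 50000,
        "left": {"label": "no"},        # income < 50000
        "right": {"label": "yes"},      # income >= 50000
    },
}

# Hypothetical value ranges (inclusive) for each attribute.
RANGES = {"age": (18, 80), "income": (10000, 100000)}


def classify(tree, record):
    """Walk the tree from the root and return the label of the leaf reached."""
    node = tree
    while "label" not in node:
        branch = "left" if record[node["attr"]] < node["threshold"] else "right"
        node = node[branch]
    return node["label"]


def generate(tree, ranges, n, seed=0):
    """Sample n records uniformly at random and label each one with the tree."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    rows = []
    for _ in range(n):
        record = {name: rng.randint(lo, hi) for name, (lo, hi) in ranges.items()}
        record["class"] = classify(tree, record)
        rows.append(record)
    return rows


if __name__ == "__main__":
    for row in generate(TREE, RANGES, 5):
        print(row)
```

Because every label is derived from the tree, the generated data set contains that tree as its intrinsic pattern by construction. A more realistic generator would presumably also need to control class balance and add noise, which this sketch leaves out.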
Citation
Peng, T., & Hanke, F. (2016). Towards a Synthetic Data Generator for Matching Decision Trees. In Proceedings of the 18th International Conference on Enterprise Information Systems (135-141). https://doi.org/10.5220/0005829001350141
Presentation Conference Type | Conference Paper (Published) |
---|---|
Conference Name | 18th International Conference on Enterprise Information Systems |
Start Date | Apr 25, 2016 |
End Date | Apr 28, 2016 |
Acceptance Date | Feb 20, 2016 |
Online Publication Date | Apr 25, 2016 |
Publication Date | Apr 25, 2016 |
Deposit Date | Dec 13, 2017 |
Publicly Available Date | Dec 15, 2017 |
Publisher | Scitepress Digital Library |
Pages | 135-141 |
Book Title | Proceedings of the 18th International Conference on Enterprise Information Systems |
Chapter Number | 135-141 |
ISBN | 978-989-758-187-8 |
DOI | https://doi.org/10.5220/0005829001350141 |
Keywords | Synthetic, Data Generator, Data Mining, Decision Trees, Classification, Pattern |
Public URL | http://researchrepository.napier.ac.uk/Output/947202 |
Publisher URL | http://www.scitepress.org/DigitalLibrary |
Contract Date | Dec 13, 2017 |