Dr Taoxin Peng T.Peng@napier.ac.uk
Lecturer
Real-world data is widely used to evaluate data mining techniques. However, using real-world data for this purpose has some disadvantages. Firstly, real-world data in most domains is difficult to obtain, for budgetary, technical or ethical reasons. Secondly, many real-world data sets are of limited use: either they do not contain specific patterns that are easy to mine, or the data needs special preparation and the algorithm needs very specific settings in order to find patterns in it. A possible solution is to generate synthetic "meaningful data" (data with intrinsic patterns). This paper presents a novel approach for generating synthetic data by developing a tool, comprising novel algorithms for specific data mining patterns and a user-friendly interface, which is able to create large data sets with predefined classification rules and multilinear regression patterns. A preliminary run of the prototype shows that generating large amounts of such "meaningful data" is possible. The proposed approach could also be extended to generate synthetic data with other intrinsic patterns.
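As a rough illustration of what "data with intrinsic patterns" can mean, the sketch below generates rows whose class label follows a predefined rule and whose target value follows a multilinear regression with small noise. It is not the tool or the algorithms described in the paper: the rule, the coefficients, the noise level and all names (`classify`, `generate`, `COEFFS`, `INTERCEPT`, `NOISE_STD`) are assumptions made up for this example.

```python
import random

# Hypothetical predefined patterns, chosen only for illustration
# (none of these values come from the paper).
COEFFS = (2.0, -1.5)   # regression weights for x1 and x2
INTERCEPT = 4.0
NOISE_STD = 0.1        # standard deviation of the Gaussian noise on y


def classify(x1, x2):
    """Hypothetical predefined classification rule."""
    return "A" if x1 > 0.5 and x2 <= 0.3 else "B"


def generate(n, seed=0):
    """Return n synthetic rows that embed both patterns."""
    rng = random.Random(seed)  # seeded so the data set is reproducible
    rows = []
    for _ in range(n):
        x1, x2 = rng.random(), rng.random()
        # Multilinear regression pattern plus Gaussian noise.
        y = (INTERCEPT + COEFFS[0] * x1 + COEFFS[1] * x2
             + rng.gauss(0, NOISE_STD))
        rows.append({"x1": x1, "x2": x2,
                     "label": classify(x1, x2), "y": y})
    return rows


if __name__ == "__main__":
    for row in generate(3, seed=42):
        print(row)
```

Because the rule and the coefficients are known in advance, a classifier or regression algorithm run on such data can be checked against the patterns that were planted in it.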
Peng, T., & Telle, A. (2018, October). A tool for generating synthetic data. Presented at DATA '18 First International Conference on Data Science, E-learning and Information Systems, Madrid, Spain.
Presentation Conference Type | Conference Paper (published) |
---|---|
Conference Name | DATA '18 First International Conference on Data Science, E-learning and Information Systems |
Start Date | Oct 1, 2018 |
End Date | Oct 2, 2018 |
Acceptance Date | Oct 1, 2018 |
Publication Date | 2018 |
Deposit Date | Nov 6, 2018 |
Publisher | Association for Computing Machinery (ACM) |
Book Title | DATA '18 Proceedings of the First International Conference on Data Science, E-learning and Information Systems |
ISBN | 9781450365369 |
DOI | https://doi.org/10.1145/3279996.3280018 |
Public URL | http://researchrepository.napier.ac.uk/Output/1342370 |
A comparison of techniques for name matching
(2012)
Journal Article
A framework for data cleaning in data warehouses
(2008)
Journal Article
An evaluation of name matching techniques.
(2011)
Presentation / Conference Contribution
The VoIP intrusion detection through a LVQ-based neural network.
(2009)
Presentation / Conference Contribution
Combining dimensional analysis and heuristics for causal ordering.
(2006)
Book Chapter