Reproducibility in Machine Learning-Based Studies: An Example of Text Mining

Olorisade, Babatunde K; Brereton, Pearl; Andras, Peter

Authors

Babatunde K Olorisade

Pearl Brereton

Prof Peter Andras P.Andras@napier.ac.uk
Dean of School of Computing Engineering and the Built Environment



Abstract

Reproducibility is an essential requirement for computational studies, including those based on machine learning techniques. However, many machine learning studies are either not reproducible or are difficult to reproduce.
In this paper, we consider what information about text mining studies is crucial to reproducing them successfully. We identify a set of factors that affect reproducibility, based on our experience of attempting to reproduce six studies that propose text mining techniques for automating the citation screening stage of the systematic review process. We then evaluated the reproducibility of 30 studies according to whether they report information relating to these factors.
While the studies provide useful reports of their results, they lack information on four points: access to the dataset in the form and order used in the original study (as opposed to the raw data), the software environment used, randomization control, and the implementation of the proposed techniques. To increase the chances of their work being reproduced, researchers should ensure that their reports provide details of these factors, or access to information about them.
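As an illustration of three of the factors named above (dataset form and order, software environment, and randomization control), the following is a minimal sketch of how a study might record them alongside its results. It is not taken from the paper. The function names `dataset_fingerprint`, `build_manifest`, and `shuffled_split`, the manifest fields, and the sample records are all hypothetical, and the sketch uses only the Python standard library.

```python
import hashlib
import platform
import random


def dataset_fingerprint(records):
    """Hash the records in the order given.

    Each record is length-prefixed so that record boundaries affect the digest.
    Reordering the records therefore produces a different fingerprint.
    """
    digest = hashlib.sha256()
    for record in records:
        encoded = record.encode("utf-8")
        digest.update(len(encoded).to_bytes(8, "big"))
        digest.update(encoded)
    return digest.hexdigest()


def build_manifest(records, seed):
    """Collect environment, seed, and dataset details (hypothetical field names)."""
    return {
        "python_version": platform.python_version(),
        "platform": platform.platform(),
        "seed": seed,
        "n_records": len(records),
        "dataset_sha256": dataset_fingerprint(records),
    }


def shuffled_split(records, seed, train_fraction=0.8):
    """Shuffle with a dedicated, seeded generator so the split can be repeated."""
    rng = random.Random(seed)
    shuffled = list(records)  # copy, so the caller's order is left untouched
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    # Placeholder records standing in for citation titles or abstracts.
    citations = [f"citation {i}" for i in range(10)]
    manifest = build_manifest(citations, seed=2017)
    train, test = shuffled_split(citations, seed=manifest["seed"])
    print(manifest)
    print(len(train), len(test))
```

Publishing such a manifest with the results would let a later reader check that they hold the same records in the same order and can repeat the same split. A real study would likely also need to record library versions and the seeds used by any machine learning framework, which this sketch does not cover.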

Presentation Conference Type: Conference Paper (Published)
Conference Name: ICML 2017 RML Workshop: Reproducibility in Machine Learning
Start Date: Aug 11, 2017
Publication Date: 2017
Deposit Date: Nov 9, 2021
Book Title: ICML 2017 RML Workshop: Reproducibility in Machine Learning
Keywords: Text mining, reproducibility, citation screening
Public URL: http://researchrepository.napier.ac.uk/Output/2809095
Publisher URL: https://openreview.net/forum?id=By4l2PbQ-