Babatunde K Olorisade
A critical analysis of studies that address the use of text mining for citation screening in systematic reviews
Olorisade, Babatunde K; de Quincey, Ed; Brereton, Pearl; Andras, Peter
Authors
Ed de Quincey
Pearl Brereton
Prof Peter Andras (P.Andras@napier.ac.uk), Dean of the School of Computing, Engineering and the Built Environment
Abstract
Background: Since the introduction of the systematic review process to Software Engineering in 2004, researchers have investigated a number of ways to reduce the effort and time required to filter through large volumes of literature.
Aim: This study aims to provide a critical analysis of text mining techniques used to support the citation screening stage of the systematic review process.
Method: We critically re-reviewed papers included in a previous systematic review that addressed the use of text mining methods to support the screening of papers for inclusion in a review. The previous review did not provide a detailed analysis of the text mining methods used. We focus on the availability in the papers of information about the text mining methods employed, including the description and explanation of the methods, parameter settings, assessment of the appropriateness of their application given the size and dimensionality of the data used, performance on training, testing and validation data sets, and further information that may support the reproducibility of the included studies.
Results: Support Vector Machines (SVM), Naïve Bayes (NB) and committees of classifiers (ensembles) are the most used classification algorithms. In all of the studies, features were represented with Bag-of-Words (BOW), using either binary features (28%) or term frequency (66%). Five studies experimented with n-grams for n between 2 and 4, but unigrams were the norm. χ², information gain and tf-idf were the most commonly used feature selection techniques. Feature extraction was rare, although LDA and topic modelling appeared in a few studies. Recall, precision, the F-measure and AUC were the most common metrics, and cross-validation was widely applied. More than half of the studies used corpora of fewer than 1,000 documents for their experiments, and around 80% used 3,000 or fewer. The main basis we found for comparing performance across independently replicated studies was the use of the same dataset, but a sound performance comparison could not be established because the studies had little else in common. In most of the studies, insufficient information was reported to enable independent replication. The studies analysed generally did not discuss the statistical appropriateness of the text mining methods they applied. In the case of applications of SVM, none of the studies report the number of support vectors found to indicate the complexity of the prediction engine, making it impossible to judge the extent to which over-fitting might account for the good performance results.
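To make the techniques named above concrete, here is a minimal pure-Python sketch (not taken from any of the reviewed studies) of BOW features with tf-idf weighting, together with the precision, recall and F-measure metrics used to evaluate citation screening, where "relevant to the review" is treated as the positive class:

```python
import math
from collections import Counter

def tf_idf(corpus):
    """Bag-of-words with tf-idf weighting: a term's weight grows with its
    frequency in a document and shrinks with its document frequency."""
    n = len(corpus)
    docs = [Counter(doc.lower().split()) for doc in corpus]
    df = Counter()                      # document frequency of each term
    for d in docs:
        df.update(d.keys())
    return [{t: tf * math.log(n / df[t]) for t, tf in d.items()} for d in docs]

def precision_recall_f1(actual, predicted):
    """Screening metrics over binary include/exclude decisions."""
    tp = sum(a and p for a, p in zip(actual, predicted))
    fp = sum((not a) and p for a, p in zip(actual, predicted))
    fn = sum(a and (not p) for a, p in zip(actual, predicted))
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)             # "sensitivity" in screening terms
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

For example, a term appearing in every abstract of the corpus (such as a shared keyword) receives a tf-idf weight of zero, while corpus-rare terms are up-weighted; in screening, recall is usually the critical metric, since a missed relevant paper is costlier than an extra one to read.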
Conclusions: Concrete evidence of the effectiveness of text mining algorithms for automating citation screening in systematic reviews is still lacking. The studies indicate that options are still being explored, but there is a need for better reporting, more explicit process details and access to datasets to facilitate study replication and strengthen the evidence. In general, the reader often gets the impression that text mining algorithms were applied as magic tools in the reviewed papers, relying on default settings or default optimisation of available machine learning toolboxes without an in-depth understanding of the statistical validity and appropriateness of such tools for text mining purposes.
| Presentation Conference Type | Conference Paper (Published) |
| --- | --- |
| Conference Name | EASE '16: 20th International Conference on Evaluation and Assessment in Software Engineering |
| Start Date | Jun 1, 2016 |
| End Date | Jun 3, 2016 |
| Online Publication Date | Jun 1, 2016 |
| Publication Date | 2016 |
| Deposit Date | Nov 10, 2021 |
| Publisher | Association for Computing Machinery (ACM) |
| Pages | 1-11 |
| Book Title | EASE '16: Proceedings of the 20th International Conference on Evaluation and Assessment in Software Engineering |
| ISBN | 978-1-4503-3691-8 |
| DOI | https://doi.org/10.1145/2915970.2915982 |
| Public URL | http://researchrepository.napier.ac.uk/Output/2809199 |