Improving Classification of Metamorphic Malware by Augmenting Training Data with a Diverse Set of Evolved Mutant Samples

Babaagba, Kehinde; Tan, Zhiyuan; Hart, Emma

doi:10.1109/CEC48606.2020.9185668

Authors

Dr Kehinde Babaagba K.Babaagba@napier.ac.uk
Lecturer

Dr Thomas Tan Z.Tan@napier.ac.uk
Associate Professor

Prof Emma Hart E.Hart@napier.ac.uk
Professor

Abstract

Detecting metamorphic malware poses a challenge to machine-learning models, as trained models might not generalise to future mutant variants of the malware. To address this, we explore whether machine-learning models can be improved by augmenting training datasets with samples of potential variants. These variants are generated using an evolutionary algorithm that evolves a behaviourally diverse set of mutants, optimised to avoid detection by a large set of existing detection engines. Using features calculated from the behavioural trace of a sample as input, we evaluate the ability of five machine-learning methods to detect the new variants. We show that including the new samples as training data considerably improves the detection rate, and that the classifiers still generalise over a range of malware. We then repeat this experiment using a sequence-based deep-learning method as the classifier, which is shown to outperform the feature-based classifiers.
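The augmentation step and the detection-rate measure described in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It rests on several assumptions: each sample is reduced to a numeric feature vector derived from its behavioural trace, label 1 means malware and label 0 means benign, and the names `augment` and `detection_rate` are hypothetical. The evolutionary algorithm that produces the mutants and the classifiers themselves are not shown.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical representation: (feature vector from a behavioural trace, label),
# where label 1 = malware and label 0 = benign.
Sample = Tuple[Sequence[float], int]


def augment(original: List[Sample],
            evolved_mutants: List[Sequence[float]]) -> List[Sample]:
    """Return a new training set: the original samples plus every evolved
    mutant, labelled as malware. The original list is left unchanged."""
    return original + [(features, 1) for features in evolved_mutants]


def detection_rate(predict: Callable[[Sequence[float]], int],
                   variants: List[Sequence[float]]) -> float:
    """Fraction of unseen mutant variants that the classifier flags as
    malware (predicted label 1)."""
    if not variants:
        raise ValueError("need at least one variant to evaluate")
    flagged = sum(1 for v in variants if predict(v) == 1)
    return flagged / len(variants)
```

In use, a classifier would be trained once on the original data and once on `augment(original, mutants)`, and `detection_rate` would then be compared on held-out variants. Passing `predict` in as a plain function keeps the sketch independent of any particular machine-learning library.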

Citation

Babaagba, K., Tan, Z., & Hart, E. (2020, July). Improving Classification of Metamorphic Malware by Augmenting Training Data with a Diverse Set of Evolved Mutant Samples. Presented at The 2020 IEEE Congress on Evolutionary Computation (IEEE CEC 2020), Glasgow, UK

Presentation Conference Type	Conference Paper (published)
Conference Name	The 2020 IEEE Congress on Evolutionary Computation (IEEE CEC 2020)
Start Date	Jul 19, 2020
End Date	Jul 24, 2020
Acceptance Date	Mar 20, 2020
Online Publication Date	Sep 3, 2020
Publication Date	Sep 3, 2020
Deposit Date	Apr 28, 2020
Publicly Available Date	Sep 3, 2020
Publisher	Institute of Electrical and Electronics Engineers
DOI	https://doi.org/10.1109/CEC48606.2020.9185668
Keywords	Machine-learning; Evolutionary computing; Malware and Computer security
Public URL	http://researchrepository.napier.ac.uk/Output/2656040