Detecting metamorphic malware provides a challenge to machine-learning models as trained models might not generalise to future mutant variants of the malware. To address this, we explore whether machine-learning models can be improved by augmenting training data-sets with samples of potential variants. These variants are generated using an evolutionary algorithm that evolves a behaviourally diverse set of mutants, optimised to avoid detection by a large set of existing detection-engines. Using features calculated from the behavioural trace of a sample as input, we evaluate the ability of five machine-learning methods to detect the new variants, show that the detection rate is considerably improved by including the new samples as training data, and that the classifiers still generalise over a range of malware. We then repeat this experiment using a sequence-based deep-learning method as the classifier, which is shown to out-perform the feature-based classifiers.
Babaagba, K., Tan, Z., & Hart, E. (2020). Improving Classification of Metamorphic Malware by Augmenting Training Data with a Diverse Set of Evolved Mutant Samples. . https://doi.org/10.1109/CEC48606.2020.9185668