Malware detection and analysis have become key issues in recent years. A dangerous class of malware is metamorphic malware, which can modify its own code and hide malicious instructions within normal program code. Current malware detectors are susceptible to metamorphic malware because they are pre-trained to recognize only predicted versions of code. However, if detectors could be trained on a larger set of data that included potential mutant variants, they could be more accurate. Finding new evasive variants is challenging, as many variants might exist.
In this research, a two-phase system is proposed. First, a mutation-only Evolutionary Algorithm (EA) is used to search for a diverse set of new malicious mutants that evade detection by existing detection algorithms. While this is shown to be successful, it requires multiple runs of the algorithm to produce multiple variants, with no explicit guarantee of diversity. To address this, a Quality Diversity (QD) algorithm, MAP-Elites, is then developed to return a large and diverse repertoire of solutions in a single run. MAP-Elites traverses a high-dimensional search space in search of the best solution at every point of a low-dimensional feature space. This method produces a larger and more diverse archive of solutions than the mutation-only EA and offers insight into the properties of a sample that make it undetectable by a suite of existing detection engines.
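The MAP-Elites loop described above can be illustrated with a minimal sketch on a toy numeric problem. This is not the thesis implementation: the genome encoding, `toy_fitness`, `toy_descriptor`, and all parameter values are hypothetical stand-ins chosen for illustration, whereas the research evolves malware mutants and scores them against detection engines.

```python
import random


def map_elites(fitness, descriptor, genome_len=10, bins=5,
               n_init=50, n_iters=2000, sigma=0.1, seed=0):
    """Toy MAP-Elites: keep the fittest genome found in each feature-space cell."""
    rng = random.Random(seed)
    archive = {}  # cell index (tuple of ints) -> (fitness, genome)

    def cell_of(genome):
        # Discretise each descriptor value in [0, 1] into one of `bins` bins.
        return tuple(min(int(d * bins), bins - 1) for d in descriptor(genome))

    def try_insert(genome):
        cell, fit = cell_of(genome), fitness(genome)
        # A candidate replaces the incumbent only if it is strictly fitter.
        if cell not in archive or fit > archive[cell][0]:
            archive[cell] = (fit, genome)

    # Seed the archive with random genomes.
    for _ in range(n_init):
        try_insert([rng.random() for _ in range(genome_len)])

    # Main loop: pick a random elite, mutate it (no crossover), place the child.
    for _ in range(n_iters):
        _, parent = rng.choice(list(archive.values()))
        child = [min(1.0, max(0.0, g + rng.gauss(0.0, sigma))) for g in parent]
        try_insert(child)

    return archive


# Hypothetical stand-ins for the real objective and feature descriptors.
def toy_fitness(genome):
    # Higher is better; the maximum is at 0.5 in every gene.
    return -sum((g - 0.5) ** 2 for g in genome)


def toy_descriptor(genome):
    # Two features in [0, 1]: the mean of each half of the genome.
    half = len(genome) // 2
    return (sum(genome[:half]) / half,
            sum(genome[half:]) / (len(genome) - half))


archive = map_elites(toy_fitness, toy_descriptor)
print(len(archive), "cells filled out of", 5 * 5)
```

Each archive entry is the best solution found so far for one cell of the two-dimensional feature grid, so a single run returns a repertoire of solutions rather than one optimum.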
Once a set of evasive and diverse variants has been created, detectors are trained using a set of classical classification methods (feature-based and sequence-based models). The results show that classification of metamorphic malware can be improved by augmenting training data with the diverse set of evolved variant samples. A pretrained Natural Language Processing (NLP) model is also used in a transfer learning setting, with the evolved variants as part of the training data, and likewise shows improved classification of metamorphic malware.
Babaagba, K. O. Application of evolutionary machine learning in metamorphic malware analysis and detection. (Thesis). Edinburgh Napier University. Retrieved from http://researchrepository.napier.ac.uk/Output/2801469