Category
Applied
Description
Accurately detecting malicious programs is an expanding area of machine learning (ML) research, and one novel approach uses a bytecode-to-image pipeline that renders software binaries as images. These images are fed to convolutional neural networks (CNNs), which scan them for indicators of malicious patterns. However, CNNs struggle to generalize these patterns while remaining robust to adversarial data, an issue this research addresses with adversarial training. In this paper, three distinct CNN architectures (a DBFS-MC-inspired baseline, MIRACLE, and PSP-CNN) are trained for binary classification on 15,000 benign and malicious software samples, encoded as images, drawn from the Android, Windows, and Linux operating systems. After benchmarking on these original samples, the CNNs are tested against 15,000 obfuscated versions of the same software to quantify their robustness. The architectures are then retrained on the obfuscated samples and retested on both the original and obfuscated test data. The DBFS-MC-inspired CNN performs best of the three, with initial F1 scores of 98.9 on plain malware and 61.1 on obfuscated malware; after retraining, its plain-malware F1 drops to 77.4 while its obfuscated-malware F1 rises to 95.0. MIRACLE and PSP-CNN performed well only on the original data initially, and only on the obfuscated data after retraining, falling well short of the DBFS-MC baseline's robustness. Although this research highlights the effectiveness of adversarial training for improving CNN robustness, the sharp decline in every CNN's ability to detect plain and obfuscated malware simultaneously strongly indicates that better methods are needed for cross-OS malware detection.
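The bytecode-to-image encoding described above can be sketched as follows. This is a minimal illustrative example, not the paper's exact pipeline: it assumes the common convention of treating each byte of a binary as one grayscale pixel and reshaping the sequence into rows of a fixed width (the function name and width choice are hypothetical).

```python
import numpy as np

def bytes_to_image(data: bytes, width: int = 64) -> np.ndarray:
    """Map a raw byte sequence to a 2-D grayscale image.

    Each byte (0-255) becomes one pixel intensity; the sequence is
    zero-padded so it fills complete rows of the chosen width.
    """
    arr = np.frombuffer(data, dtype=np.uint8)
    pad = (-len(arr)) % width  # zeros needed to complete the last row
    arr = np.pad(arr, (0, pad))
    return arr.reshape(-1, width)

# Synthetic "bytecode" standing in for a real binary.
sample = bytes(range(256)) * 10  # 2560 bytes
img = bytes_to_image(sample, width=64)
print(img.shape)  # (40, 64)
```

An image produced this way can be resized to a fixed resolution and passed to a CNN like any other single-channel input.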
The Texture of a Threat: Convolutional Neural Networks & Obfuscated Malware Detection
