Category

Applied

Description

Accurately detecting malicious programs is an expanding field of research for machine learning (ML), with a novel approach incorporating a bytecode-to-image pipeline that produces images representative of software. These images are provided to convolutional neural networks (CNNs) to be examined for malicious pattern indicators. However, CNNs struggle to generalize these patterns effectively while remaining robust against adversarial data, an issue that this research addresses with adversarial training. In this paper, three unique CNN architectures (a DBFS-MC-inspired baseline, MIRACLE, and PSP-CNN) are trained for binary classification on 15,000 benign and malicious software samples encoded into images, drawn from the Android, Windows, and Linux operating systems. After benchmarking their performance on these original samples, the CNNs are tested against 15,000 obfuscated versions of the same software to quantify their robustness. Next, the architectures are retrained with these obfuscated samples and retested on both the original and obfuscated test data. The DBFS-MC-inspired CNN performs best of the three, with initial F1 scores of 98.9 on plain malware and 61.1 on obfuscated malware; after the retraining step, the plain F1 drops to 77.4 while the obfuscated F1 rises to 95.0. MIRACLE and PSP-CNN performed well only on the original data initially and only on the obfuscated data after retraining, falling well short of the DBFS-MC-inspired baseline in robustness. Although this research highlights the effectiveness of adversarial training for CNN robustness, the dramatic decrease in the ability of all the CNNs to simultaneously detect both ordinary and obfuscated malware strongly indicates that better methods are needed for cross-OS malware detection.
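The bytecode-to-image step described above can be sketched as follows. This is a minimal illustration of the general technique, not the paper's actual pipeline: the fixed image width, zero-padding, and one-byte-per-pixel grayscale mapping are assumptions chosen for clarity.

```python
import numpy as np

def bytes_to_image(data: bytes, width: int = 64) -> np.ndarray:
    """Map a raw byte stream to a 2D grayscale array, one byte per pixel.

    Illustrative sketch: the stream is zero-padded to fill the last row,
    then reshaped so each byte value (0-255) becomes one pixel intensity.
    The width of 64 and zero-padding are example choices, not the
    settings used in the paper.
    """
    arr = np.frombuffer(data, dtype=np.uint8)
    rows = -(-len(arr) // width)              # ceiling division
    padded = np.zeros(rows * width, dtype=np.uint8)
    padded[: len(arr)] = arr
    return padded.reshape(rows, width)

# A 130-byte "binary" becomes a 3x64 image; the last row is zero-padded.
img = bytes_to_image(bytes(range(130)))
print(img.shape)  # (3, 64)
```

Arrays produced this way can be fed directly to a CNN's input layer (after resizing or normalizing), which is what allows image classifiers to be repurposed for malware detection.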

Apr 21st, 11:30 AM – 12:00 PM

The Texture of a Threat: Convolutional Neural Networks & Obfuscated Malware Detection
