Summary of TESSERACT: Eliminating Experimental Bias in Malware Classification Across Space and Time (Extended Version), by Zeliang Kan et al.
TESSERACT: Eliminating Experimental Bias in Malware Classification across Space and Time (Extended Version)
by Zeliang Kan, Shae McFadden, Daniel Arp, Feargus Pendlebury, Roberto Jordaney, Johannes Kinder, Fabio Pierazzi, Lorenzo Cavallaro
First submitted to arXiv on: 2 Feb 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Cryptography and Security (cs.CR); Performance (cs.PF)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | Read the original abstract here |
Medium | GrooveSquid.com (original content) | Machine learning models have achieved high F1-scores in detecting malicious software, but the problem is far from solved because operating systems and attack techniques evolve constantly. This paper argues that reported results are often inflated by experimental bias: spatial bias caused by non-representative data distributions and temporal bias introduced by incorrect time splits between training and test data. To address these biases, the authors propose constraints for fair experiment design, a new metric for classifier robustness over time (AUT; see the sketch below the table), and an algorithm to tune the training data. They also present TESSERACT, an open-source framework for realistic classifier comparison, and evaluate both traditional ML and deep learning methods on an extensive Android dataset, with case studies in the Windows PE and PDF domains. |
Low | GrooveSquid.com (original content) | Malware detectors are getting better at finding bad software, but they're not perfect yet. One reason is that the bad guys keep changing their tricks, so what worked last year might not work this year. This paper looks at why reported results for malware detection can be too good to be true and how to make them more realistic. The authors suggest rules to follow when designing experiments, a new way to measure how well a detector holds up over time (AUT), and an algorithm that tunes the training data so detectors keep performing well. They also share an open-source tool called TESSERACT that makes it easier to compare different detectors fairly. |
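The AUT metric mentioned in the summaries can be pictured as the area under a classifier's performance curve across consecutive test periods. The minimal sketch below assumes equally spaced periods and a per-period score such as monthly F1, and approximates that area with the trapezoidal rule, normalized so that a constant perfect score gives 1.0; the function name and the example values are illustrative assumptions, not code from the paper or the TESSERACT framework.

```python
# Illustrative sketch (not the paper's reference implementation): AUT as the
# normalized trapezoidal area under a time-indexed performance metric.
def area_under_time(scores):
    """Approximate AUT for a list of per-period scores (e.g., monthly F1).

    Assumes equally spaced test periods; a constant score of 1.0 yields AUT = 1.0.
    """
    n = len(scores)
    if n < 2:
        raise ValueError("Need at least two test periods")
    # Trapezoidal rule over the N periods, normalized by the number of intervals.
    return sum((scores[k] + scores[k + 1]) / 2 for k in range(n - 1)) / (n - 1)

# Example: F1 measured on six consecutive months of test data (made-up numbers).
monthly_f1 = [0.92, 0.88, 0.81, 0.74, 0.70, 0.65]
print(f"AUT(F1, 6 months) ~= {area_under_time(monthly_f1):.3f}")
```

A single AUT value lower than the average F1 on a randomly split test set is exactly the kind of gap the paper attributes to temporal and spatial bias: performance that looks strong in a biased evaluation decays once the test data is ordered in time.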
Keywords
- Artificial intelligence
- Deep learning
- Machine learning