Library and Function Identification by Optimized Pattern Matching on Compressed Databases: A close to perfect identification of known code snippets
The goal of library and function identification is to find the original library and function for a given machine-code snippet. Such snippets commonly arise in penetration tests against a remote executable, in static malware analysis, and in IP infringement investigations. Several tools address this task, but all of them appear to rely on some form of signature-based identification. In this paper, we argue that this approach is insufficient in many cases and propose the design and implementation of a multitool called KISS. KISS uses lossless compression and highly optimized pattern-matching algorithms to build a very compact yet comprehensive database of library versions. In practice, KISS compresses the database to below 30 percent of its original size while still allowing extremely fast (sublinear) snippet identification. Statistical tests show that its code-snippet recognition is highly successful, with a number of false positives close to the theoretical lower bound. Finally, we argue that our approach improves on the security of existing techniques. Because our design relies entirely on complete function-body verification, it prevents analysis-resilient malware from disguising itself as external, trusted library code. Recent work has shown this to be a problem for malware analysis with existing identification solutions.
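The abstract does not say which index structure KISS uses. For illustration only, the Python sketch below assumes one standard way to get sublinear pattern matching over a compressible database: an FM-index (Burrows-Wheeler transform plus rank counts) built over concatenated function bodies. Its backward search counts matches in time proportional to the snippet length, independent of database size. The sketch also illustrates the contrast the abstract draws between signature matching and complete function-body verification. The names `FMIndex`, `identify_function` and `signature_match`, the toy byte strings, and the 4-byte signature length are all hypothetical. Nothing here is taken from the KISS implementation.

```python
from bisect import bisect_right

SENTINEL = 0      # unique end-of-text symbol, smaller than every other symbol
SEPARATOR = 257   # function boundary; never occurs in a query (bytes map to 1..256)


class FMIndex:
    """Toy FM-index over concatenated function bodies (illustrative only)."""

    def __init__(self, functions):
        # functions: dict mapping a hypothetical "library-version:function" name to bytes
        self.names = list(functions)
        self.lengths = {n: len(functions[n]) for n in self.names}
        self.starts = []  # offset of each function body in the concatenated text
        text = []
        for name in self.names:
            self.starts.append(len(text))
            text.extend(b + 1 for b in functions[name])  # shift bytes to 1..256
            text.append(SEPARATOR)
        text.append(SENTINEL)
        # Naive suffix array; real indexes use linear-time construction.
        self.sa = sorted(range(len(text)), key=lambda i: text[i:])
        # BWT: the symbol preceding each sorted suffix (text[-1] is the sentinel).
        self.bwt = [text[i - 1] for i in self.sa]
        # C[c] = number of symbols in the text strictly smaller than c.
        counts = [0] * 258
        for c in text:
            counts[c] += 1
        self.C = [0] * 258
        for c in range(1, 258):
            self.C[c] = self.C[c - 1] + counts[c - 1]
        # occ[c][i] = occurrences of c in bwt[:i]. Stored uncompressed here;
        # compressed indexes keep sampled counts instead.
        self.occ = {}
        for c in set(self.bwt):
            row = [0]
            for s in self.bwt:
                row.append(row[-1] + (s == c))
            self.occ[c] = row

    def _interval(self, pattern):
        """Backward search: suffix-array interval [lo, hi) of suffixes starting with pattern."""
        lo, hi = 0, len(self.bwt)
        for b in reversed(pattern):
            c = b + 1
            if c not in self.occ:
                return 0, 0
            lo = self.C[c] + self.occ[c][lo]
            hi = self.C[c] + self.occ[c][hi]
            if lo >= hi:
                return 0, 0
        return lo, hi

    def locate(self, snippet):
        """Return (function name, offset) for every occurrence of snippet."""
        if not snippet:
            return []
        lo, hi = self._interval(snippet)
        hits = []
        for pos in sorted(self.sa[lo:hi]):
            k = bisect_right(self.starts, pos) - 1  # function containing pos
            hits.append((self.names[k], pos - self.starts[k]))
        return hits

    def identify_function(self, body):
        """Accept a name only if body equals that function's complete stored body."""
        return [name for name, off in self.locate(body)
                if off == 0 and len(body) == self.lengths[name]]


def signature_match(functions, body, k=4):
    """Hypothetical prefix-signature matcher, for contrast: checks only k bytes."""
    return [n for n, b in functions.items() if b[:k] == body[:k]]


db = {  # toy x86-64-style bytes, purely illustrative
    "libdemo-1.0:f_add": bytes([0x55, 0x48, 0x89, 0xE5, 0x01, 0xF8, 0x5D, 0xC3]),
    "libdemo-1.0:f_sub": bytes([0x55, 0x48, 0x89, 0xE5, 0x29, 0xF8, 0x5D, 0xC3]),
}
idx = FMIndex(db)
lookalike = bytes([0x55, 0x48, 0x89, 0xE5, 0x90, 0x90, 0x90, 0xC3])
print(idx.locate(bytes([0x5D, 0xC3])))          # both functions, offset 6
print(idx.identify_function(db["libdemo-1.0:f_add"]))
print(signature_match(db, lookalike))            # shared prefix fools the signature check
print(idx.identify_function(lookalike))          # full-body check rejects it: []
```

A production index would store the BWT in compressed form and keep only a sampled suffix array. This toy keeps both uncompressed so that the backward-search and full-body-verification logic stays visible.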
Proceedings of the 2nd Reversing and Offensive-oriented Trends Symposium
| Authors: | Maximilian Tschirschnitz |
| Year/month: | 2018/ |
| Booktitle: | Proceedings of the 2nd Reversing and Offensive-oriented Trends Symposium |
BibTeX:
@inproceedings{tschirschnitz2018library,
author = { Maximilian Tschirschnitz },
title = { Library and Function Identification by Optimized Pattern Matching on Compressed Databases: A close to perfect identification of known code snippets },
year = { 2018 },
booktitle = { Proceedings of the 2nd Reversing and Offensive-oriented Trends Symposium },
url = { https://dl.acm.org/doi/abs/10.1145/3289595.3289598 },
}
