Classification of Obfuscation Techniques in LLVM IR: Machine Learning on Vector Representations
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
I have some major points the authors should address in their revision:
1. Enforce program-level splits and report leave-program-out results. Also test robustness across compiler optimization levels (train on O0–O2, test on O3, and vice versa); see the sketch after this list.
2. Demonstrate generalizability. Add an open-set/OOD evaluation (held-out transformation families or a different obfuscator) and support an "unknown" rejection option with calibrated thresholds.
3. Strengthen baselines and ablations by comparing IR2Vec against simple IR features (opcode/CFG statistics) and by ablating the vector dimensionality.
4. Improve the class design and first-stage detection. Report a two-stage pipeline (obfuscated: yes/no, then which class) and address the low recall for non-obfuscated samples.
5. Quantify sensitivity to pass ordering and explain recurrent confusions using feature attributions or class-centroid similarity.
6. Release a reproducible package. Share code and data.
7. Clarify the novelty relative to prior work.
8. Add scalability evidence on a realistic corpus.
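To make points 1 and 2 concrete, the following is a minimal sketch of a leave-program-out evaluation with a simple "unknown" rejection threshold. It assumes scikit-learn and NumPy, with the IR2Vec embeddings, labels, and per-sample source-program groups already in memory; all names, the threshold, and the hyperparameters are illustrative, not the authors' setup, and a calibration step (e.g., CalibratedClassifierCV) could precede the thresholding.

```python
# Minimal sketch for points 1 and 2 (leave-program-out split + "unknown"
# rejection). X: IR2Vec embeddings, y: obfuscation labels (strings),
# groups: the source program each sample comes from. All names and the
# threshold are illustrative, not the authors' configuration.
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import LeaveOneGroupOut

def leave_program_out_eval(X, y, groups, reject_below=0.6):
    """Train on all programs except one, test on the held-out program;
    map low-confidence predictions to 'unknown' (open-set rejection)."""
    accs = []
    for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups):
        clf = ExtraTreesClassifier(n_estimators=300, random_state=0)
        clf.fit(X[train_idx], y[train_idx])
        proba = clf.predict_proba(X[test_idx])
        pred = clf.classes_[proba.argmax(axis=1)]
        pred = np.where(proba.max(axis=1) < reject_below, "unknown", pred)
        accs.append(float(np.mean(pred == y[test_idx])))
    return float(np.mean(accs)), float(np.std(accs))
```

The same loop, with O0–O2 modules in the training folds and O3 modules held out (and vice versa), would cover the cross-optimization-level test of point 1.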
The manuscript is generally understandable, but it needs a careful copy-edit to fix scattered typos, article/preposition usage, inconsistent capitalization of technical terms, and minor formatting issues in references/URLs.
Author Response
File attached.
Author Response File: Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
The paper “Classification of Obfuscation Techniques in LLVM IR: Machine Learning on Vector Representations” proposes a machine learning framework for identifying code obfuscations applied at the LLVM intermediate representation level. The method introduces two main components: the use of IR2Vec embeddings to capture structural and semantic properties of obfuscated IR code, and the application of ensemble classifiers (CatBoost and ExtraTrees) to recognize both single and layered obfuscations. Furthermore, a comprehensive dataset is constructed using Tigress obfuscations on handcrafted programs and GNU Coreutils. Experiments on this dataset demonstrate that the approach achieves over 90% classification accuracy, effectively distinguishing diverse obfuscation types and combinations.
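For reference, a minimal sketch of the pipeline as summarized above: precomputed IR2Vec embeddings fed to CatBoost and ExtraTrees classifiers. The embedding file name, its layout, and all hyperparameters are assumptions for illustration only and are not taken from the paper.

```python
# Sketch of the described pipeline: precomputed IR2Vec embeddings (one row
# per LLVM IR module, label in the last column) classified with CatBoost
# and ExtraTrees. File name, layout, and hyperparameters are assumed.
import numpy as np
from catboost import CatBoostClassifier
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import train_test_split

data = np.loadtxt("ir2vec_embeddings.csv", delimiter=",", dtype=str)
X = data[:, :-1].astype(float)   # IR2Vec embedding vectors
y = data[:, -1]                  # obfuscation class labels

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

for clf in (CatBoostClassifier(iterations=500, verbose=False),
            ExtraTreesClassifier(n_estimators=300, random_state=0)):
    clf.fit(X_tr, y_tr)
    acc = np.mean(clf.predict(X_te).ravel() == y_te)
    print(type(clf).__name__, "accuracy:", acc)
```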
- In the dataset construction, since the data is primarily derived from Tigress, could this single-source origin introduce bias, leading the classifiers to learn Tigress-specific artifacts rather than more generalizable patterns of obfuscation?
- The description of IR2Vec is overly brief and lacks sufficient detail. It would be beneficial to elaborate on the training principle of IR2Vec and explain how it captures instruction-level and semantic relationships, as well as to include a schematic illustration showing the transformation process from code snippets to IR2Vec embeddings.
- A more detailed description of the hardware and software environment is needed in the experimental setup, including specifications of the CPU/GPU configurations and the training parameters used for IR2Vec.
- Only CatBoost and ExtraTrees are compared; baseline methods and ablation studies are lacking.
- The experimental results are primarily reported in terms of accuracy, without reflecting statistical confidence. Please consider including measures such as statistical significance tests or error bars to demonstrate the stability of the results; a minimal bootstrap sketch follows these comments.
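A minimal sketch of the bootstrap variant mentioned in the last point, assuming the test labels and predictions are available as NumPy arrays; all names and sizes are illustrative.

```python
# Bootstrap confidence interval for test-set accuracy (illustrative only).
import numpy as np

def bootstrap_accuracy_ci(y_true, y_pred, n_boot=1000, alpha=0.05, seed=0):
    """Resample the test set with replacement; return the point accuracy
    and a (1 - alpha) percentile confidence interval."""
    rng = np.random.default_rng(seed)
    n = len(y_true)
    accs = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)      # bootstrap resample
        accs[b] = np.mean(y_true[idx] == y_pred[idx])
    low, high = np.quantile(accs, [alpha / 2, 1 - alpha / 2])
    return float(np.mean(y_true == y_pred)), (float(low), float(high))
```

Reporting such an interval (or repeating cross-validation with different seeds) would make the stability of the reported >90% accuracy explicit.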
Author Response
File attached.
Author Response File: Author Response.pdf
Round 2
Reviewer 1 Report
Comments and Suggestions for Authors
The authors have only partially answered the questions. However, they have improved the article enough for it to be published.
Reviewer 2 Report
Comments and Suggestions for Authors
I have no further comments.
