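A Unified Evaluation of Learning-Based Similarity Techniques for Malware Detection
1️⃣ One-sentence summary
This paper is the first to systematically compare multiple machine-learning-based malware similarity detection techniques under a unified experimental framework. It finds that no single method performs best on every dimension, so effective security analysis platforms need to combine several complementary techniques.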
Cryptographic digests (e.g., MD5, SHA-256) are designed to provide exact identity. Any single-bit change in the input produces a completely different hash, which is ideal for integrity verification but limits their usefulness in many real-world tasks like threat hunting, malware analysis and digital forensics, where adversaries routinely introduce minor transformations. Similarity-based techniques address this limitation by enabling approximate matching, allowing related byte sequences to produce measurably similar fingerprints. Modern enterprises manage tens of thousands of endpoints with billions of files, making the effectiveness and scalability of the proposed techniques more important than ever in security applications. Security researchers have proposed a range of approaches, including similarity digests and locality-sensitive hashes (e.g., ssdeep, sdhash, TLSH), as well as more recent machine-learning-based methods that generate embeddings from file features. However, these techniques have largely been evaluated in isolation, using disparate datasets and evaluation criteria. This paper presents a systematic comparison of learning-based classification and similarity methods using large, publicly available datasets. We evaluate each method under a unified experimental framework with industry-accepted metrics. To our knowledge, this is the first reproducible study to benchmark these diverse learning-based similarity techniques side by side for real-world security workloads. Our results show that no single approach performs well across all dimensions; instead, each exhibits distinct trade-offs, indicating that effective malware analysis and threat-hunting platforms must combine complementary classification and similarity techniques rather than rely on a single method.
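The contrast the abstract draws between exact-identity digests and approximate matching can be illustrated with a minimal sketch. It assumes a synthetic 1 KiB input and uses a toy byte n-gram Jaccard score as a stand-in for a similarity technique; this is not ssdeep, sdhash, TLSH, or any method evaluated in the paper. The helper names (`bit_difference`, `ngram_set`, `jaccard_similarity`) are hypothetical and exist only for this illustration.

```python
import hashlib


def bit_difference(a: bytes, b: bytes) -> int:
    """Count the bit positions at which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def ngram_set(data: bytes, n: int = 4) -> set:
    """Collect every overlapping n-byte window of the input."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def jaccard_similarity(a: bytes, b: bytes, n: int = 4) -> float:
    """Toy similarity score in [0, 1]: shared n-grams over all n-grams."""
    sa, sb = ngram_set(a, n), ngram_set(b, n)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


# Assumption: a synthetic 1 KiB buffer stands in for a real file.
original = bytes(range(256)) * 4
mutable = bytearray(original)
mutable[100] ^= 0x01  # flip a single bit, mimicking a minor transformation
modified = bytes(mutable)

digest_a = hashlib.sha256(original).digest()
digest_b = hashlib.sha256(modified).digest()

differing_bits = bit_difference(digest_a, digest_b)
similarity = jaccard_similarity(original, modified)

print(f"SHA-256 digests differ in {differing_bits} of 256 bits")
print(f"n-gram Jaccard similarity: {similarity:.3f}")
```

In this sketch the two SHA-256 digests are unrelated even though the inputs differ by one bit, while the n-gram score stays close to 1. Real similarity digests and learned embeddings are considerably more involved, but they aim at the same property: small input changes should produce small fingerprint changes.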
Source: arXiv: 2602.15376