Recently in WP4-Data enrichment and characterization Category

D21/D4.7 Consolidated report with evaluation results

This is the final deliverable for Workpackage 4 within the WOMBAT project. In this document we discuss the final extensions and improvements to our data collection and analysis techniques that were implemented as part of WOMBAT. Furthermore, we present some additional results obtained from the analysis of data collected within WOMBAT.

This deliverable is a final report on the experimental results obtained by using structural features to characterize executable code. It discusses and evaluates a number of techniques, based on these features, that have been developed in the context of the WOMBAT project and that aim to provide a deeper understanding of malicious code and of the relations between malicious code samples.

WOMBAT Deliverable D16/D4.2 Analysis Report of Behavioral Features

This deliverable discusses the features used to characterize the behavior of code and presents preliminary results of applying these features to a set of malicious code samples. It covers the project's results in behavior-based clustering, several approaches to malware detection at end hosts, system call analysis, and our work on shellcode behavior.


WOMBAT Deliverable D15/D4.5 Intermediate Report on Contextual Features

The objective of Workpackage 4 is to develop techniques to characterize the malicious code that is collected in the previous workpackage. The main idea is to enrich the collected code with metadata that might reveal insights into the origin of the code and the intentions of those who created, released or used it. This deliverable provides a preliminary discussion of possible contextual features of malware and, for each feature, an estimate of its effectiveness and of the difficulty of obtaining it. Some of these features can be used to analyze potential threats and to single out collected samples that are mere variations of already known threats.


This deliverable provides a preliminary discussion of structural features that can be used to characterize executable code. Furthermore, it discusses a number of techniques, based on these features, that are being developed in the context of the WOMBAT project and that aim to provide a deeper understanding of malicious code and of the relations between malicious code samples.


WOMBAT Deliverable D08/D4.1 Specification language for code behavior

This document provides a specification language to describe the behavior of code. Consistent with the requirements for an extensible, layered architecture for the behavioral analysis of malware, four different languages are defined, ranging from a complete, low-level description of the code's behavior to a high-level analysis report that is suitable for a human analyst. Furthermore, current approaches to behavioral malware analysis and detection within the WOMBAT project are discussed, most of which already take advantage (or can be extended to take advantage) of the provided specification language.


WOMBAT paper accepted at NDSS2009

The following paper has been accepted at the Network and Distributed System Security Symposium (NDSS) 2009:

Title: Scalable, Behavior-Based Malware Clustering
  • Ulrich Bayer, TUV
  • Paolo Milani Comparetti, TUV
  • Clemens Hlauschek, TUV
  • Christopher Kruegel, UCSB
  • Engin Kirda, Eurecom

Anti-malware companies receive thousands of malware samples every day. To process this large quantity, a number of automated analysis tools were developed. These tools execute a malicious program in a controlled environment and produce reports that summarize the program's actions. Of course, the problem of analyzing the reports still remains. Recently, researchers have started to explore automated clustering techniques that help to identify samples that exhibit similar behavior. This allows an analyst to discard reports of samples that have been seen before, while focusing on novel, interesting threats. Unfortunately, previous techniques do not scale well and frequently fail to generalize the observed activity well enough to recognize related malware.

In this paper, we propose a scalable clustering approach to identify and group malware samples that exhibit similar behavior. For this, we first perform dynamic analysis to obtain the execution traces of malware programs. These execution traces are then generalized into behavioral profiles, which characterize the activity of a program in more abstract terms. The profiles serve as input to an efficient clustering algorithm that allows us to handle sample sets that are an order of magnitude larger than previous approaches. We have applied our system to real-world malware collections. The results demonstrate that our technique is able to recognize and group malware programs that behave similarly, achieving a better precision than previous approaches. To underline the scalability of the system, we clustered a set of more than 75 thousand samples in less than three hours.
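
The pipeline described in the abstract (execution traces, generalized into behavioral profiles, then clustered) can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration and not the paper's actual method: the trace format, the sample names, the helper names (`behavioral_profile`, `jaccard`, `cluster`), the use of Jaccard similarity, the threshold value, and the simple single-linkage grouping.

```python
from itertools import combinations


def behavioral_profile(trace):
    """Generalize a raw trace into a set of abstract features.

    A trace is assumed to be a list of (operation, object) events.
    Order and repetition are discarded, so two runs that perform the
    same actions a different number of times get the same profile.
    """
    return frozenset(f"{op}:{obj}" for op, obj in trace)


def jaccard(a, b):
    """Jaccard similarity of two feature sets (1.0 for two empty sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster(profiles, threshold=0.7):
    """Group samples whose profiles are similar (single linkage).

    Uses union-find: any pair with similarity >= threshold is merged
    into the same cluster. Returns a sorted list of sorted name lists.
    """
    parent = {name: name for name in profiles}

    def find(x):
        # Follow parent links to the cluster representative,
        # shortening the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(profiles, 2):
        if jaccard(profiles[a], profiles[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in profiles:
        groups.setdefault(find(name), set()).add(name)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical traces, invented for this example.
traces = {
    "sample_a": [("create_file", "tmp/x.exe"), ("set_registry", "Run"),
                 ("connect", "irc")],
    "sample_b": [("create_file", "tmp/x.exe"), ("set_registry", "Run"),
                 ("connect", "irc"), ("connect", "irc")],
    "sample_c": [("read_file", "hosts"), ("connect", "smtp")],
}
profiles = {name: behavioral_profile(t) for name, t in traces.items()}
clusters = cluster(profiles)
print(clusters)
```

Here `sample_a` and `sample_b` end up in the same cluster because their profiles are identical once the repeated event is abstracted away, while `sample_c` stays on its own. Note that the all-pairs comparison in `cluster` is quadratic in the number of samples; this is precisely the kind of cost a scalable approach has to avoid, so the sketch shows the idea of profile-based grouping and not how scalability is achieved.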
