
Malwise - Malware Classification and Variant Detection


Malicious software presents a significant challenge to modern desktop computing. According to the Symantec Internet Threat Report, 499,811 new malware samples were received in the second half of 2007. F-Secure additionally reported, “As much malware [was] produced in 2007 as in the previous 20 years altogether”. Detecting malware before it adversely affects computer systems is highly desirable. Static detection of malware is still the dominant technique for securing computer networks and systems against untrusted executable content.

Detecting malware variants improves signature-based detection methods. The size of signature databases is growing exponentially, and detecting entire families of related malicious software with a single signature can prevent a blowout in the number of stored malware signatures.

Malwise is a system for detecting malware using a different kind of signature, one that is more powerful and robust. Instead of the traditional string signatures used in antivirus products, Malwise uses the structure of a program. Program structure doesn't change much when malware evolves or mutates, so it can effectively fingerprint an entire family of malware and detect new family members even if the system has never seen them before.

Technical background to malware classification for researchers.
A survey of static detection of malware.

Flowgraph-based signatures

Malwise's signature is based on the control flow in a program. The control flow describes the possible paths of execution through the program code. It is represented as a directed graph, which looks a bit like a network diagram. These graphs are known as flowgraphs. There are two types of control flow: control flow inside a procedure, which is represented by a control flow graph, and control flow between procedures, which is represented by a call graph.

The two types of control flow are shown in the figure above right.
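As a rough illustration of the two kinds of flowgraph, the sketch below stores each as a plain adjacency mapping. The toy program, the procedure and block names, and the helper functions are all hypothetical and are not taken from Malwise itself.

```python
from collections import deque

# Hypothetical toy program. Each procedure has its own control flow graph:
# a mapping from a basic-block label to the labels of its successor blocks.
cfgs = {
    "main": {
        "entry": ["check"],
        "check": ["then", "else"],  # conditional branch
        "then": ["exit"],
        "else": ["exit"],
        "exit": [],
    },
    "decrypt": {
        "entry": ["loop"],
        "loop": ["loop", "exit"],   # loop back-edge plus loop exit
        "exit": [],
    },
    "payload": {"entry": []},
}

# The call graph records control flow between procedures:
# a mapping from a caller to the procedures it calls.
call_graph = {
    "main": ["decrypt", "payload"],
    "decrypt": [],
    "payload": [],
}


def reachable(graph, start):
    """Return every node reachable from start (breadth-first traversal)."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for succ in graph[node]:
            if succ not in seen:
                seen.add(succ)
                queue.append(succ)
    return seen


def edge_count(graph):
    """Count the directed edges in a flowgraph."""
    return sum(len(successors) for successors in graph.values())
```

Both graph types share the same representation here, so the same traversal works on a single procedure's control flow graph and on the whole-program call graph.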

Malware unpacking

Malware is generally encrypted, compressed or obfuscated to hide its real content. This protective layer must be removed, or unpacked, before the real program structure is visible. The following clip demonstrates an analyst manually initiating Malwise's automated unpacking system. Before unpacking, only a couple of procedures are visible. After unpacking, a large number of procedures are visible, and the relationships between those procedures can be seen in the call graph.

A comprehensive academic survey of unpacking can be found in this article. For more information on the emulator Malwise uses to perform unpacking, read this article. We have also experimented with other approaches to unpacking.

Software similarity

Two programs can be compared by the control flow graphs they contain. The signature represents the birthmark of a program, which stays the same in evolved or mutated versions such as polymorphic and metamorphic malware. Similarity not only tells us whether two programs are the same, but also measures how similar they are. This is useful because it lets us tell which variants of malware are more closely related, even if they all belong to the same family. Software similarity is the basis for Malwise's malware variant detection.
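One way to picture such a comparison is sketched below. It assumes a deliberately simple fingerprint (edges renumbered in depth-first visit order, so that renaming blocks does not change it) and uses the Jaccard index over the two sets of fingerprints. Both choices, along with the function names and sample programs, are illustrative assumptions and not the signature or similarity measure Malwise actually uses.

```python
def cfg_signature(cfg, entry="entry"):
    """Hypothetical fingerprint of one control flow graph.

    Blocks are renumbered in depth-first visit order, so the result does not
    depend on block labels, only on the shape of the graph.
    """
    order = {}
    stack = [entry]
    while stack:
        node = stack.pop()
        if node in order:
            continue
        order[node] = len(order)
        # Push successors in reverse so the first successor is visited first.
        stack.extend(reversed(cfg[node]))
    return tuple(sorted((order[s], order[d]) for s in order for d in cfg[s]))


def program_similarity(prog_a, prog_b):
    """Jaccard index over the sets of CFG fingerprints of two programs."""
    sigs_a = {cfg_signature(cfg) for cfg in prog_a.values()}
    sigs_b = {cfg_signature(cfg) for cfg in prog_b.values()}
    union = sigs_a | sigs_b
    if not union:
        return 1.0  # two empty programs are treated as identical
    return len(sigs_a & sigs_b) / len(union)


# Hypothetical sample: three procedures.
original = {
    "f1": {"entry": ["a", "b"], "a": ["exit"], "b": ["exit"], "exit": []},
    "f2": {"entry": ["loop"], "loop": ["loop", "exit"], "exit": []},
    "f3": {"entry": []},
}

# Hypothetical variant: two procedures keep their shape under new labels,
# and the third procedure has changed.
variant = {
    "g1": {"entry": ["x", "y"], "x": ["z"], "y": ["z"], "z": []},
    "g2": {"entry": ["top"], "top": ["top", "done"], "done": []},
    "g3": {"entry": ["exit"], "exit": []},
}

score = program_similarity(original, variant)
```

In this example two of the four distinct fingerprints are shared, so the score is 0.5: the programs are related but not identical, which is the kind of graded answer a similarity measure gives and an exact-match signature cannot.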

Software similarity search

Malwise takes a sample of unknown status and performs a software similarity search over its database of malware. If the unknown sample is similar to a malware sample in the database, we know the sample is also malicious. The similarity search is based on a measure related to similarity: distance. Distance is a measure of dissimilarity, and if two programs are within a specific distance, or radius, of each other, they are considered variants of each other.
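The radius-based search can be sketched as follows. The sketch assumes each program has already been reduced to a set of opaque signature strings and takes distance to be one minus the Jaccard index. The distance function, the radius value, the sample names and the linear scan over the database are all assumptions made for brevity, not a description of Malwise's implementation.

```python
def distance(sigs_a, sigs_b):
    """Dissimilarity between two signature sets: 0.0 identical, 1.0 disjoint."""
    union = sigs_a | sigs_b
    if not union:
        return 0.0
    return 1.0 - len(sigs_a & sigs_b) / len(union)


def range_search(database, query, radius):
    """Return (distance, name) pairs for database entries within radius of query."""
    hits = []
    for name, sigs in database.items():
        d = distance(sigs, query)
        if d <= radius:
            hits.append((d, name))
    return sorted(hits)


# Hypothetical malware database: sample name -> set of flowgraph signatures.
database = {
    "FamilyA.v1": frozenset({"s1", "s2", "s3", "s4"}),
    "FamilyA.v2": frozenset({"s1", "s2", "s3", "s5"}),
    "FamilyB.v1": frozenset({"t1", "t2", "t3"}),
}

# Unknown sample sharing most of its signatures with FamilyA.
unknown = frozenset({"s1", "s2", "s3", "s6"})

matches = range_search(database, unknown, radius=0.5)
is_malicious = bool(matches)  # any hit within the radius flags the sample
```

Here the unknown sample lies within the radius of both FamilyA entries and far from FamilyB, so it is flagged as a variant of FamilyA. The radius trades detection rate against false positives: a larger radius catches more distant variants but risks matching unrelated programs.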




Publications

Theses

  1. Silvio Cesare, "Fast Automated Unpacking and Classification of Malware", Masters Thesis, Central Queensland University, 2010. [slides and thesis].

Journal Papers

2012

  1. Silvio Cesare, Yang Xiang, Wanlei Zhou, "Malwise - An Effective and Efficient Classification System for Packed and Polymorphic Malware", IEEE Transactions on Computers (TC), 2012. (to appear)

Refereed Conference Papers

2011

  1. Silvio Cesare, Yang Xiang, "Malware Variant Detection Using Similarity Search over Sets of Control Flow Graphs", IEEE Trustcom, IEEE, 2011. [slides and paper]

2010

  1. Silvio Cesare, Yang Xiang, "Classification of Malware using Structured Control Flow", 8th Australasian Symposium on Parallel and Distributed Computing (AusPDC 2010), 2010. [slides and paper]
  2. Silvio Cesare, Yang Xiang, "A Fast Flowgraph Based Classification System for Packed and Polymorphic Malware on the Endhost", IEEE 24th International Conference on Advanced Information Networking and Applications (AINA 2010), IEEE, 2010. [slides and paper]

Industry Conferences

  1. Silvio Cesare, Ruxcon, "Faster, More Effective Flowgraph-based Malware Classification", 2011. [slides]
  2. Silvio Cesare, Ruxcon, "Fast Automated Unpacking and Classification of Malware", 2010. [slides and thesis]
  3. Silvio Cesare, Ruxcon, "Security Applications for Emulation", 2008. [slides]

Media

  1. Risky Business #177 -- Silvio Cesare discusses his AV PhD, 2010.