Introducing the Simseer, Bugwise, and Clonewise Web Services

FooCodeChu is pleased to announce three new web services that are free for public use. These services can aid in malware classification, incident response, plagiarism detection, software theft detection, software quality assurance, and vulnerability research.

Simseer is a free online service that tells you how similar the programs you submit are to one another. It is built on the technology of Malwise. Knowing whether software is similar is useful in a number of applications, such as malware classification, incident response, plagiarism detection, and software theft detection.

Bugwise performs bug detection in Linux executable binaries. It does this using static program analysis, specifically decompilation and data flow analysis. Currently, the service checks for some double frees in sequential code that uses the libc allocator functions.
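To give a feel for what a double-free check over sequential code involves, here is a minimal sketch. It is not the service's implementation: it assumes the binary has already been decompiled into a flat list of operations, and the operation names ("alloc", "assign", "free") and the function find_double_frees are hypothetical, chosen only for illustration.

```python
def find_double_frees(ops):
    """Report frees of an already-freed object in straight-line code.

    ops is a list of tuples (a hypothetical intermediate form):
      ("alloc", var)        var = malloc(...)
      ("assign", dst, src)  dst = src  (pointer copy, creates an alias)
      ("free", var)         free(var)
    Returns a list of (operation index, variable) pairs.
    """
    points_to = {}  # variable -> abstract object, named by its allocation index
    freed = set()   # abstract objects that have already been freed
    reports = []
    for index, op in enumerate(ops):
        kind = op[0]
        if kind == "alloc":
            points_to[op[1]] = index
        elif kind == "assign":
            points_to[op[1]] = points_to.get(op[2])
        elif kind == "free":
            obj = points_to.get(op[1])
            if obj is None:
                continue  # unknown pointer: this sketch stays silent
            if obj in freed:
                reports.append((index, op[1]))
            else:
                freed.add(obj)
    return reports


if __name__ == "__main__":
    program = [
        ("alloc", "p"),
        ("assign", "q", "p"),  # q aliases p
        ("free", "p"),
        ("free", "q"),         # second free of the same object
    ]
    print(find_double_frees(program))
```

Tracking the abstract object rather than the variable name is what lets the sketch catch a double free through an alias while staying quiet when a variable is reallocated before its second free.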

Clonewise is an open source project to identify clones of packages embedded in the source of other software. Identifying package clones enables us to automatically infer outstanding vulnerabilities from out-of-date clones.

Survey in Static Detection of Malware

Survey in Static Detection of Malware is based on the literature review in my 2010 Masters thesis.

Abstract - Malware continues to be a significant problem facing computer use in today’s world. Historically Antivirus software has employed the use of static signatures to detect instances of known malware. Signature based detection has fallen out of favour to many, and detection techniques based on identifying malicious program behavior are now part of the Antivirus toolkit. However, static approaches to malware detection have been heavily researched and can employ modern fingerprints that significantly improve on the simple string signatures used in the past. Instance-based learning can allow the detection of an entire family of malware variants based on a single signature of static features. Statistical machine learning can turn the features extracted into a predictive Antivirus system able to detect novel and previously unseen malware samples. This paper surveys the approaches and techniques used in static malware detection.

Automated Static Unpacking Using Speculative Decompression

Automated Static Unpacking Using Speculative Decompression is some work I did towards the end of 2009 during my Masters degree. It is a small contribution and not strong enough for a full length conference paper. It does however present an interesting approach to automated unpacking.

Abstract - Malware is a significant problem on the internet. Automated and manual analysis of malware is important in detection and remediation. However, malware authors understand this process and try to hinder static analysis by introducing a malware transformation that hides their code and intent. This process is known as malware packing and must be reversed before an analyst or automated system can understand the intent of the malicious software. Automated unpacking attempts to solve this problem on a large scale and has been partly successful, but there is still much to be done. In this work we propose a system for automatically and statically unpacking some forms of packed code. We identify the compression algorithm used to pack the malware and then decompress the high entropy, compressed binary blob within the sample. This is effective for a small minority of malware samples in the wild.
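The two steps named in the abstract, locating a high entropy blob and speculatively decompressing it, can be sketched as follows. This is an illustration under stated assumptions, not the system from the paper: zlib stands in for whatever compression algorithms a real packer uses, and the window size, the entropy threshold, and all function names are hypothetical.

```python
import math
import zlib
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Entropy in bits per byte: 0.0 for constant data, 8.0 for uniform bytes."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in Counter(data).values())


def high_entropy_offsets(sample: bytes, window: int = 256, threshold: float = 7.0):
    """Offsets of fixed-size windows whose entropy suggests compressed data."""
    return [
        offset
        for offset in range(0, len(sample) - window + 1, window)
        if shannon_entropy(sample[offset:offset + window]) >= threshold
    ]


def try_decompress(sample: bytes, offset: int):
    """Speculatively treat sample[offset:] as a zlib stream (assumed codec).

    Returns the decompressed bytes, or None if the data is not a valid stream.
    """
    try:
        return zlib.decompressobj().decompress(sample[offset:])
    except zlib.error:
        return None


if __name__ == "__main__":
    hidden = b"hidden code " * 50
    sample = b"HEADER" + zlib.compress(hidden)
    print(try_decompress(sample, 0) is None)    # header bytes are not a stream
    print(try_decompress(sample, 6) == hidden)  # payload starts after the header
```

In practice the candidate offsets from the entropy scan would drive the decompression attempts, and a failed attempt simply moves on to the next candidate or the next codec.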

Survey of Unpacking Malware

"Survey of Unpacking Malware" is from my 2010 Masters thesis "Fast Automated Unpacking and Classification of Malware".

Abstract - Malware is a significant problem in distributed and host-based computing environments. Static detection and classification of malware serves as a useful mode of defense. To hinder detection and classification, code packing is used by malware authors to hide and obscure the malware’s real content. This paper surveys the removal of those malware obfuscations.

Malware Variant Detection Using Similarity Search over Sets of Control Flow Graphs

I have published a new paper on Malware variant detection, "Malware Variant Detection Using Similarity Search over Sets of Control Flow Graphs".

This work is built on top of Malwise, and is the basis for my Ruxcon talk, "Faster, More Effective Flowgraph-based Malware Classification".

Abstract - Static detection of polymorphic malware variants plays an important role to improve system security. Control flow has shown to be an effective characteristic that represents polymorphic malware instances. In our research, we propose a similarity search of malware using novel distance metrics of malware signatures. We describe a malware signature by the set of control flow graphs the malware contains. We propose two approaches and use the first to perform pre-filtering. Firstly, we use a distance metric based on the distance between feature vectors. The feature vector is a decomposition of the set of graphs into either fixed size k-subgraphs, or q-gram strings of the high-level source after decompilation. We also propose a more effective but less computationally efficient distance metric based on the minimum matching distance. The minimum matching distance uses the string edit distances between programs’ decompiled flow graphs, and the linear sum assignment problem to construct a minimum sum weight matching between two sets of graphs. We implement the distance metrics in a complete malware variant detection system. The evaluation shows that our approach is highly effective in terms of a limited false positive rate and our system detects more malware variants when compared to the detection rates of other algorithms.

[ slides and paper ]
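The minimum matching distance described in the abstract can be illustrated with a small sketch. It assumes each control flow graph has already been decompiled to a string. The assignment problem is solved by brute force over permutations, which is only workable for a handful of graphs; a real system would use a polynomial-time solver such as the Hungarian algorithm (SciPy's linear_sum_assignment is one option). Padding the smaller set with empty strings is an assumption of this sketch, not something taken from the paper.

```python
from itertools import permutations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance using a rolling row of the dynamic programming table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free if characters match
            ))
        prev = curr
    return prev[-1]


def minimum_matching_distance(graphs_a, graphs_b) -> int:
    """Minimum total edit distance over all one-to-one pairings of graph strings.

    The smaller set is padded with empty strings (an assumption of this sketch),
    so an unmatched graph costs its own length.
    """
    size = max(len(graphs_a), len(graphs_b))
    a = list(graphs_a) + [""] * (size - len(graphs_a))
    b = list(graphs_b) + [""] * (size - len(graphs_b))
    cost = [[edit_distance(x, y) for y in b] for x in a]
    # Brute force over every assignment of rows to columns: O(n!), small n only.
    return min(
        sum(cost[i][p[i]] for i in range(size))
        for p in permutations(range(size))
    )


if __name__ == "__main__":
    original = ["W|IEH}R", "I{B}E{R}"]   # made-up structured-code strings
    variant = ["I{B}E{RR}", "W|IEH}R"]   # same graphs, reordered and mutated
    print(minimum_matching_distance(original, variant))
```

Because the matching is computed over sets, reordering the procedures in a variant does not change the distance; only the edits within each matched pair of graphs contribute.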

Ruxcon 2011

I will be giving two presentations at Ruxcon this month. Ruxcon is an annual computer security conference here in Australia and is held over the weekend of the 19th and 20th of November in my home town of Melbourne. The Friday before the official Ruxcon conference will be a half day of talks for professional delegates. I will present in both the main conference and the professional delegates day.

The work I'm presenting covers some of my Ph.D. research on Clonewise and Malwise. This is the first talk I've given that looks at Clonewise. Clonewise is an open source project, and its results have been used by vendors for documentation and vulnerability fixes. The talk will be given on the Saturday at Ruxcon. This year's Malwise talk improves on the work from last Ruxcon with faster, more effective classification. The Malwise talk is only for professional delegates to the conference and will be given on the Friday, but once it is over the content will be made available to the general public.

Automated Detection of Software Bugs and Vulnerabilities in Linux

Abstract: Developers sometimes statically link libraries from 3rd party projects, maintain an internal copy of 3rd party software or fork development of an existing 3rd party project. This practice can lead to software vulnerabilities when the embedded code is not kept up to date with upstream sources. As a result, manual techniques have been applied by Linux vendors to track embedded code and identify vulnerabilities. In this talk, Silvio will release an automated solution to identify embedded packages without any prior knowledge of such relationships. This approach identifies similar source files based on file names and content to identify relationships between source packages. Graph theory is used to perform the analysis. Silvio's tool also automates identifying if embedded packages have outstanding vulnerabilities that have not been patched. Using this system, over 30 previously unknown vulnerabilities were identified in Linux distributions. These results are now starting to be used by vendors to track embedded packages.

This work is based on Clonewise.
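As a rough illustration of matching source files by name and content, the sketch below links two packages when enough identically named files also have similar content. It is not Clonewise's algorithm: the similarity measure (difflib's SequenceMatcher ratio), both thresholds, and the function names are assumptions made for the example, and the packages are toy in-memory dictionaries rather than real source trees.

```python
from difflib import SequenceMatcher


def shared_files(pkg_a, pkg_b, min_ratio=0.8):
    """Names of files present in both packages whose contents are similar.

    Each package is a dict mapping file name -> file content (toy model).
    """
    matches = []
    for name in sorted(pkg_a.keys() & pkg_b.keys()):
        ratio = SequenceMatcher(None, pkg_a[name], pkg_b[name]).ratio()
        if ratio >= min_ratio:
            matches.append(name)
    return matches


def clone_edges(packages, min_files=2):
    """Edges of an undirected graph linking packages that share enough files."""
    names = sorted(packages)
    edges = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            files = shared_files(packages[a], packages[b])
            if len(files) >= min_files:
                edges.append((a, b, files))
    return edges


if __name__ == "__main__":
    inflate = "int inflate(void) { return 0; }"
    adler = "unsigned adler32(void) { return 1; }"
    packages = {
        "zlib": {"inflate.c": inflate, "adler32.c": adler},
        "app": {"inflate.c": inflate, "adler32.c": adler,
                "main.c": "int main(void) { return 0; }"},
        "other": {"main.c": "int main(void) { return 2; }"},
    }
    print(clone_edges(packages))
```

The edges here are undirected, so the sketch says only that two packages share code. Deciding which package embeds the other, and whether the embedded copy is out of date, needs further information such as package sizes and version metadata.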

Faster, More Effective Flowgraph-based Malware Classification

Abstract: Static string signatures in Antivirus don't effectively fingerprint unknown malware variants. One approach which has seen some success is using the structural information of a program's control flow to build a signature. The control flow describes the possible flow of execution a program may take. It is represented by what's known as a directed graph - basically a network diagram of how execution moves from one set of instructions to another. Control flow doesn't change much in variants even if the byte level content changes like in polymorphic and metamorphic malware. A real advantage of using graphs is that we can compare these graphs to show if they are approximately similar. We can quantify how similar two programs are and set a threshold to identify related or mutated malware. I have implemented a system using these ideas to perform malware detection in real-time. The system improves previous work by performing more efficiently and detecting more variants. It replaces the classification system that I presented at Ruxcon 2010 and uses several new ideas that make it better. This presentation discusses how the system works, its implementation, and its evaluation.

This work is based on Malwise.
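The idea of quantifying similarity and applying a threshold can be shown with a short sketch. It assumes each program's control flow has been flattened into a string, and it uses a Dice coefficient over q-grams as a stand-in similarity measure. Both the measure and the 0.6 threshold are illustrative assumptions, not the values used in Malwise.

```python
from collections import Counter


def qgrams(text: str, q: int = 3) -> Counter:
    """Multiset of all length-q substrings of text."""
    return Counter(text[i:i + q] for i in range(len(text) - q + 1))


def similarity(a: str, b: str, q: int = 3) -> float:
    """Dice coefficient over q-gram multisets, between 0.0 and 1.0."""
    grams_a, grams_b = qgrams(a, q), qgrams(b, q)
    total = sum(grams_a.values()) + sum(grams_b.values())
    if total == 0:
        return 1.0  # both inputs too short to yield any q-grams
    shared = sum((grams_a & grams_b).values())  # multiset intersection
    return 2.0 * shared / total


def is_variant(a: str, b: str, threshold: float = 0.6) -> bool:
    """Flag two programs as related when similarity reaches the threshold."""
    return similarity(a, b) >= threshold


if __name__ == "__main__":
    known = "W|IEH}RI{B}E{R}"      # made-up flow graph string
    suspect = "W|IEH}RI{BB}E{R}"   # small structural mutation
    print(similarity(known, suspect), is_variant(known, suspect))
```

Raising the threshold trades missed variants for fewer false positives, which is the tuning decision any such classifier has to make.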

I'm looking forward to seeing everyone at the conference. I'm always happy to talk to individuals and vendors about this and other work I've done.