Wire: An Intermediate Language for Executable Objects


Wire is the intermediate language used in Malwise, Binalyze and Bugwise.

Static program analysis is a useful tool with many applications. In summary, static analysis identifies the runtime behaviour of software without executing the program. Applications include detecting plagiarism of software code, optimising code during compilation, verifying software by proving the absence of certain bug classes or, in a weakened form, identifying software bugs. Static analysis is generally performed at the source level, but there are applications where only low level object code is available. These include the analysis and detection of malware, detecting the theft of proprietary or licensed software, and detecting bugs in binaries that result from compilation or link-time conditions.

Malware analysis and detection is a major motivation for low level static analysis. Traditional static malware detection in commercial antivirus software has ignored program structure and semantics. Instead, pattern recognition on raw byte-level content has been the dominant technique in signature based detection. However, program structure, such as that exhibited by the static control and data flow of the malware, yields more robust and predictive characteristics. These characteristics, or fingerprints, are often invariant across large malware families and strains. Thus, by employing static analysis techniques, signature based detection becomes much more effective against variants such as polymorphic and metamorphic malware. Moreover, using program structure and semantics to extract robust features allows machine learning to detect novel samples that can be predicted to be malicious even though they do not belong to known malware families. Malware is almost always in binary form, so a low level static analysis system that examines the binary content of executables is required.
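The contrast between byte-level signatures and structural fingerprints can be illustrated with a toy sketch. It is not the technique used by Malwise; the function names, the made-up byte strings, and the hand-written control flow graphs are all assumptions chosen for illustration. Two hypothetical variants differ in their raw bytes and block labels but share the same control flow shape.

```python
import hashlib
from typing import Dict, List, Tuple

def byte_signature(code: bytes) -> str:
    """Byte-level signature: a hash of the raw content (illustrative only)."""
    return hashlib.sha256(code).hexdigest()

def cfg_fingerprint(cfg: Dict[str, List[str]], entry: str) -> Tuple[Tuple[int, int], ...]:
    """Toy structural fingerprint: relabel blocks in depth-first visit
    order from the entry block, then return the sorted edge list."""
    order: Dict[str, int] = {}
    stack = [entry]
    while stack:
        node = stack.pop()
        if node in order:
            continue
        order[node] = len(order)
        # Push successors reversed so the first successor is visited first.
        stack.extend(reversed(cfg.get(node, [])))
    return tuple(sorted((order[a], order[b]) for a in order for b in cfg.get(a, [])))

# Hypothetical variants: made-up bytes, same if/else diamond shape.
variant_a_bytes = b"\x01\x02\x03\x04"
variant_b_bytes = b"\x09\x01\x02\x03\x04"  # e.g. one junk byte prepended
variant_a_cfg = {"entry": ["then", "else"], "then": ["exit"], "else": ["exit"], "exit": []}
variant_b_cfg = {"b0": ["b1", "b2"], "b1": ["b3"], "b2": ["b3"], "b3": []}

same_bytes = byte_signature(variant_a_bytes) == byte_signature(variant_b_bytes)
same_shape = cfg_fingerprint(variant_a_cfg, "entry") == cfg_fingerprint(variant_b_cfg, "b0")
```

Under these assumptions the byte signatures differ while the structural fingerprints agree, which is the property that makes structure-based features attractive for variant detection. A practical system would need a fingerprint that tolerates small structural changes as well.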

Software theft detection is another motivation for low level static analysis. Detecting unauthorized use of software code is desirable to protect industry investment. As in the malware variant detection problem, software theft detection extracts program structure and semantics from the original software and identifies unauthorized copies by finding those same features in illegitimate software. It is therefore necessary to be able to examine closed source software using low level static analysis.

A further motivation is detecting software bugs in binaries. The purpose of this form of bug detection is not to replace traditional source level analysis but to complement it by providing an increased level of assurance. Source code is by definition an unfinished form of the software, lacking detail about how the program will be physically executed after assembly and linking. Bug detection in binaries has access to the final form of the program, after assembly and link-time editing have been performed. This also provides additional assurance that the compiler has done what it was designed to do. This type of assessment is useful not only for development and quality assurance; it also benefits system auditors who by requirement do not have access to the software source.

Analysing binaries is hard. Many seemingly simple problems, such as separating code from data, are undecidable. Our first motivation stems from the desire to represent a binary in a manner that makes analysis easier. The native assembly in a binary is unfavourable for analysis, for the following reasons:

  • Native CISC assemblies such as x86 have hundreds of instructions, which requires significant and duplicated effort to model for each class of static analysis.
  • Native assemblies have instructions with side effects, which force analyses to rely on hidden information and assumptions.
  • Native assemblies are platform dependent, which requires a separate static analysis implementation for each architecture.

This motivates us to use an intermediate language to represent native assembly. The intermediate language should be low level enough so that translation from assembly is not complex. It should also be high level enough so that traditional static analysis techniques can be applied.
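To make the point about side effects concrete, the following is a minimal sketch of lifting a single native instruction into explicit operations. It does not show Wire's actual syntax or instruction set, which this page does not specify; the `ILOp` format, the operation names, and the `lift_add` helper are hypothetical. The sketch assumes an x86-style two-operand `add dst, src` that implicitly updates the zero, sign and carry flags.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple

@dataclass(frozen=True)
class ILOp:
    """One explicit operation of the form dst = op(srcs...).
    Hypothetical format for illustration, not Wire's."""
    op: str
    dst: str
    srcs: Tuple[str, ...]

def lift_add(dst: str, src: str) -> List[ILOp]:
    """Lift a native `add dst, src` into explicit operations.

    The native instruction updates status flags as a hidden side effect;
    here every flag update appears as its own operation."""
    return [
        ILOp("add", "t0", (dst, src)),             # t0 = dst + src
        ILOp("cmp_eq", "ZF", ("t0", "0")),         # zero flag: t0 == 0
        ILOp("cmp_lt_signed", "SF", ("t0", "0")),  # sign flag: t0 < 0
        ILOp("carry_add", "CF", (dst, src)),       # carry out of dst + src
        ILOp("mov", dst, ("t0",)),                 # write the result back
    ]

def defined_names(ops: List[ILOp]) -> Set[str]:
    """Every location written by a sequence of operations."""
    return {op.dst for op in ops}

ops = lift_add("eax", "ebx")
written = defined_names(ops)
```

With the side effects spelled out, an analysis such as reaching definitions can read the written locations directly from the operation list instead of consulting a per-instruction table of implicit effects.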

Publications

Theses

  1. Silvio Cesare, "Fast Automated Unpacking and Classification of Malware", Masters Thesis, Central Queensland University, 2010. [slides and thesis].

Refereed Conference Papers

2011

  1. Silvio Cesare, Yang Xiang, "Malware Variant Detection Using Similarity Search over Sets of Control Flow Graphs", IEEE Trustcom, IEEE, 2011. [slides and paper]

2010

  1. Silvio Cesare, Yang Xiang, "Classification of Malware using Structured Control Flow", 8th Australasian Symposium on Parallel and Distributed Computing (AusPDC 2010), 2010. [slides and paper]
  2. Silvio Cesare, Yang Xiang, "A Fast Flowgraph Based Classification System for Packed and Polymorphic Malware on the Endhost", IEEE 24th International Conference on Advanced Information Networking and Application (AINA 2010), IEEE, 2010. [slides and paper]