
Graph theory for Malware Analysis

July 1, 2010

Thanks to Ero Carrera.

Over the last few days, I have been experimenting with applying graph theory to malware classification and analysis. The basic idea is to convert the disassembly of a sample into a directed graph and calculate the following four metrics:

  1. Cyclomatic Complexity
  2. Indegree of the procedures
  3. Outdegree of the procedures
  4. Histogram of the significant instructions

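The four metrics above can be sketched in plain Python. This is not the author's utility. It is a minimal illustration that assumes each procedure's control-flow graph is a dict mapping a basic-block id to its successor ids, and that the call graph maps a procedure name to the procedures it calls. Cyclomatic complexity uses McCabe's formula M = E - N + 2 for a single connected graph. Indegree and outdegree count distinct callers and callees. The set of "significant" mnemonics is a hypothetical choice, since the post does not say which instructions count.

```python
from collections import Counter


def cyclomatic_complexity(cfg):
    """McCabe's M = E - N + 2P for one connected control-flow graph (P = 1)."""
    nodes = len(cfg)
    edges = sum(len(succs) for succs in cfg.values())
    return edges - nodes + 2


def degrees(call_graph):
    """Return (indegree, outdegree) dicts, counting distinct callers and callees."""
    indeg = {name: 0 for name in call_graph}
    outdeg = {}
    for caller, callees in call_graph.items():
        unique = set(callees)
        outdeg[caller] = len(unique)
        for callee in unique:
            indeg[callee] = indeg.get(callee, 0) + 1
            outdeg.setdefault(callee, 0)  # callees with no entry of their own
    return indeg, outdeg


# Hypothetical selection of "significant" mnemonics; adjust to taste.
SIGNIFICANT = {"call", "xor", "cmp", "test", "jmp"}


def instruction_histogram(mnemonics, significant=SIGNIFICANT):
    """Count only the mnemonics considered significant."""
    return Counter(m for m in mnemonics if m in significant)


if __name__ == "__main__":
    # An if/else diamond: 4 blocks, 4 edges, so M = 2.
    print(cyclomatic_complexity({0: [1, 2], 1: [3], 2: [3], 3: []}))
    calls = {"start": ["decrypt", "payload"], "payload": ["decrypt", "send"],
             "decrypt": [], "send": []}
    print(degrees(calls))
```

With this representation, a procedure called from many places (high indegree) often turns out to be a helper such as a string decryptor, while the payload tends to show high outdegree.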
I have built a simple utility on top of IDAPython that queries the IDB file and creates an easily navigable HTML file, with interactive JavaScript for loading the procedures.
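The report side of such a utility could look roughly like the following. This is a hypothetical sketch, not the author's code: it leaves out the IDAPython queries and the interactive JavaScript, and simply assumes the per-procedure metrics have already been collected into a dict keyed by procedure name. Names are passed through `html.escape` because symbol names in malware can contain arbitrary characters.

```python
import html


def render_report(metrics):
    """Render a static HTML table of per-procedure metrics.

    metrics: dict mapping procedure name -> dict with the (hypothetical)
    keys 'cc', 'indegree' and 'outdegree'.
    """
    rows = []
    for name in sorted(metrics):
        m = metrics[name]
        rows.append("<tr><td>{}</td><td>{}</td><td>{}</td><td>{}</td></tr>".format(
            html.escape(name), m["cc"], m["indegree"], m["outdegree"]))
    return ("<!DOCTYPE html>\n<html><head><title>Procedure metrics</title></head><body>\n"
            "<table>\n<tr><th>Procedure</th><th>Cyclomatic complexity</th>"
            "<th>Indegree</th><th>Outdegree</th></tr>\n"
            + "\n".join(rows) + "\n</table>\n</body></html>\n")


if __name__ == "__main__":
    page = render_report({"sub_401000": {"cc": 3, "indegree": 1, "outdegree": 2}})
    with open("report.html", "w") as f:
        f.write(page)
```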

Disassembly of the malware


Once the auto-analysis is completed, run the utility.

When the utility finishes, it produces a DHTML file, which looks like this when first opened.

Output DHTML file


Selecting the core payload procedure and loading it then displays the plotted graph for that function.
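One way to get a plottable graph for a single function is to emit it in Graphviz DOT format. The sketch below assumes the same hypothetical dict-of-successors representation used for the metrics; it is not how the author's DHTML page draws its graphs. It does not escape quotes inside names, so it assumes plain identifiers such as IDA's `sub_XXXXXX` labels.

```python
def cfg_to_dot(name, cfg):
    """Render a control-flow graph (block id -> successor ids) as Graphviz DOT."""
    lines = ['digraph "{}" {{'.format(name)]
    for block in sorted(cfg):
        lines.append('  "{}";'.format(block))  # declare every block, even leaves
        for succ in cfg[block]:
            lines.append('  "{}" -> "{}";'.format(block, succ))
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(cfg_to_dot("payload", {0: [1, 2], 1: [3], 2: [3], 3: []}))
```

The output can be rendered with `dot -Tsvg payload.dot -o payload.svg` and embedded in an HTML page.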

Processing different variants (probably created by some Trojan development kits) yields the same metrics, regardless of differences in their structure and size.
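To compare variants, the per-procedure metrics can be reduced to a signature that ignores procedure names and addresses. The sketch below is an assumption on my part, not the method described in this post: it treats a sample as a multiset of (cyclomatic complexity, indegree, outdegree) tuples and scores instruction histograms with a weighted Jaccard similarity. Two builds from the same kit that only differ in layout or padding would then compare as equal.

```python
from collections import Counter


def signature(procedures):
    """Multiset of (cc, indegree, outdegree) tuples, one per procedure.

    Names and addresses are dropped, so relocated or renamed code
    yields the same signature.
    """
    return Counter(procedures)


def histogram_similarity(h1, h2):
    """Weighted Jaccard similarity of two instruction histograms (0.0 to 1.0)."""
    keys = set(h1) | set(h2)
    inter = sum(min(h1.get(k, 0), h2.get(k, 0)) for k in keys)
    union = sum(max(h1.get(k, 0), h2.get(k, 0)) for k in keys)
    if union == 0:
        return 1.0  # two empty histograms are treated as identical
    return inter / union


if __name__ == "__main__":
    a = signature([(2, 1, 2), (1, 2, 0), (5, 0, 3)])
    b = signature([(5, 0, 3), (2, 1, 2), (1, 2, 0)])
    print(a == b)
    print(histogram_similarity({"xor": 2, "call": 2}, {"xor": 1, "call": 3}))
```

A threshold on the similarity score (rather than exact equality) would tolerate small changes such as an extra junk instruction inserted by a packer.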

I am studying this approach further to arrive at more solid results.

Cheers 🙂