The sample here comes from a quick search, supplied by ANY.RUN (@anyrun_app), of #emotet-doc, which lets you filter quickly on the documents you want to look at. Twitter reference Here and the link to the file we are going to use Here.
One of the first things I always do is look at the file in a hex editor to verify what type of file I am dealing with. Never trust a file extension to tell you the type of file.
As we can see here, this is a “Zip” style file, so we can simply decompress it to get the contents out and look at them.
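If you want to script this check instead of eyeballing it in a hex editor, here is a minimal Python sketch. It only compares the leading "magic" bytes: PK\x03\x04 for a Zip archive (which is what .docx/.docm files are) and D0 CF 11 E0 A1 B1 1A E1 for the OLE compound file format that shows up again with vbaProject.bin. The function name sniff_type is just an illustration, not part of any tool mentioned in this post.

```python
# Magic bytes for the two container formats seen in this walk-through.
ZIP_MAGIC = b"PK\x03\x04"                          # Zip archive (Office Open XML documents)
OLE_MAGIC = b"\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1"   # OLE2 / Compound File Binary

def sniff_type(data: bytes) -> str:
    """Return a coarse file type from the leading bytes, ignoring the extension."""
    if data.startswith(ZIP_MAGIC):
        return "zip"
    if data.startswith(OLE_MAGIC):
        return "ole"
    return "unknown"
```

To use it on a sample, read the first 8 bytes of the file with open(path, "rb").read(8) and pass them in. Anything that comes back "unknown" deserves a closer look in the hex editor.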
One other thing we notice when we scroll down in the hex editor is that the script we are after can be seen clearly.
If we just search for “var” in the hex editor, it will take us right to the beginning of the script, and we can copy it out without even having to run or unzip the file.
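The same "search for var and copy it out" step can be done in a few lines of Python. This is a rough sketch, assuming the script sits in the raw bytes as a contiguous run of printable ASCII; it starts at the first “var” and stops at the first non-printable byte. The helper name carve_script is hypothetical, and a plain byte search can also hit “var” inside unrelated text, so check the result.

```python
import string

# Bytes treated as "script text": printable ASCII plus common whitespace.
PRINTABLE = set(string.printable.encode("ascii"))

def carve_script(data: bytes, marker: bytes = b"var") -> str:
    """Return the printable run that starts at the first `marker`, or "" if absent."""
    start = data.find(marker)
    if start == -1:
        return ""
    end = start
    # Walk forward until we hit a byte that is not printable text.
    while end < len(data) and data[end] in PRINTABLE:
        end += 1
    return data[start:end].decode("ascii")
```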
On a side note, “var” is a JavaScript keyword and would not normally be found in a VBA file.
Looking at ANY.RUN, we see this.
If the run looks like this, with the “jse” file, then it may be this type, unless they change it after seeing this blog post (it wouldn’t be the first time).
If we left-click on the first process, Winword.exe, and then click More Info, we see this.
Here we see the JSE file. Let's click on that.
One note about the “JSE” file here: this file extension is usually associated with JScript Encoded files, the kind produced by the scripting environment's own encoder, but this file is not encoded that way.
A real JScript Encoded file looks like this.
Ours is just plain JavaScript that has been obfuscated by a tool.
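A quick way to tell the two apart in a script: output from the Microsoft Script Encoder is wrapped in #@~^ and ^#~@ markers, while obfuscated-but-plain JavaScript like ours is not. This minimal Python sketch only looks for those markers; it does not decode anything, and the function name is illustrative.

```python
import re

# Microsoft Script Encoder output is wrapped in "#@~^" ... "^#~@" markers.
ENCODED_BLOCK = re.compile(r"#@~\^.*?\^#~@", re.DOTALL)

def is_script_encoded(text: str) -> bool:
    """True if the text contains a Script Encoder block (a real JScript.Encode file)."""
    return ENCODED_BLOCK.search(text) is not None
```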
Here is the script we will be working with, so let's download it.
Now let's look at one more way to extract the file.
After we unzip the file, we see this.
Select the “word” folder.
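The unzip step can also be done with Python's standard zipfile module. This sketch assumes the usual Word layout, where the macro project is stored as word/vbaProject.bin; the helper names are hypothetical.

```python
import io
import zipfile

def list_members(doc_bytes: bytes) -> list:
    """Names of everything inside the Office Open XML (Zip) container."""
    with zipfile.ZipFile(io.BytesIO(doc_bytes)) as zf:
        return zf.namelist()

def read_vba_project(doc_bytes: bytes) -> bytes:
    """Return the raw word/vbaProject.bin (an OLE compound file) from a Word document."""
    with zipfile.ZipFile(io.BytesIO(doc_bytes)) as zf:
        return zf.read("word/vbaProject.bin")
```

Typical use would be reading the sample with open(path, "rb").read() and passing the bytes to read_vba_project. Working on bytes in memory means nothing from the sample is written to disk unless you choose to.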
Now let's look at the vbaProject.bin file in a hex editor.
Here it has an “OLE” header, so this file is stored in the OLE compound file format.
If we try our search for “var” again, we see this.
Now we could stop here, copy the entire text section into a text editor, and clean it up to leave just the script.
Let's go one step further, in case this gets more obfuscated later.
We can use 7-Zip to extract the contents of vbaProject.bin.
Let's look in the UserForm1 folder.
Looking at the size of the “o” object, this looks promising. If a stream is not very large, there will not be much in it, but we still need to check each one to verify there is nothing useful in it.
This is a binary file, so we need to look at it in a hex editor.
This is the lowest level we can go to get the script out. We could also use Office and extract it from the text box or the properties box in the VBA tools.
There are also the Python tools from Decalage (@decalage2) Here, and a few others.
Here is what the script looks like normally.
That is too difficult to read to see what it is doing, so let's apply some JavaScript formatting to a “copy” to get a better view.
This is a very distinctive format that has been used for some time now. If you understand the basics, these can be very easy to decode, especially if you take the time to build tools to help with the boring parts.
There are basically three things we look for when we see this format. The first is the array at the top. It is usually either base64 or \x encoding, and I’ve even seen a modified base64 encoding in past samples that were not Emotet.
The next thing we look for is this function just below the array.
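When building tooling, it helps to guess which encoding an array uses before trying to decode it. This is a heuristic sketch only: short plain words can also pass as valid base64, and a modified base64 alphabet will simply fall through to "other". The function and pattern names are made up for illustration.

```python
import base64
import binascii
import re

BASE64_RE = re.compile(r"^[A-Za-z0-9+/]+={0,2}$")
HEX_ESCAPE_RE = re.compile(r"^(?:\\x[0-9a-fA-F]{2})+$")   # e.g. \x68\x69 as source text

def classify_entry(entry: str) -> str:
    """Guess how a string from the obfuscated array is encoded."""
    if HEX_ESCAPE_RE.match(entry):
        return "hex-escape"
    if BASE64_RE.match(entry) and len(entry) % 4 == 0:
        try:
            base64.b64decode(entry, validate=True)
            return "base64"
        except binascii.Error:
            pass
    return "other"   # plain text, or a modified base64 alphabet
```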
As you can see, it has a push/shift function and a value of 0xd3. This rotates the array one place, that many times, in a circular fashion: each push(shift()) moves the first element to the end. If that number happened to equal the number of elements in the array, the array would end up right back where it started; this is just an example.
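Emulating the push/shift loop in Python is a one-liner with collections.deque. This sketch assumes the loop body runs exactly `count` times; some variants pre-increment or pre-decrement the counter, so check the loop in your own sample. It also shows why rotating by the array length changes nothing: only count mod length matters.

```python
from collections import deque

def rotate_like_push_shift(arr, count):
    """Emulate `arr.push(arr.shift())` executed `count` times.

    Each push(shift()) moves the first element to the end, i.e. a left rotation by one.
    """
    d = deque(arr)
    d.rotate(-count)   # negative values rotate to the left
    return list(d)
```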
The last piece we need is the part with the index numbers.
This “b('0x0', 'ILb*')” call passes the index and a key used to decode the base64 string in the array.
If a call only has the index number, then either that entry does not need another level of decoding or the same key is used for everything, and you will have to locate and verify it.
Here is the function that the index value and the key are passed to.
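Pulling every index/key pair out of the script is a good job for a regular expression. This sketch assumes the decoder is called with quoted arguments like b('0x0', 'ILb*'), with the key optional; the decoder name "b" comes from this sample and is a parameter because other samples use names like _0x1234. Keys containing quote characters are not handled.

```python
import re

def find_calls(script: str, func: str = "b"):
    """Return (index, key) tuples for every func('0x..', 'key') call, in order.

    key is None when the call only passes an index.
    """
    pattern = re.compile(
        r"\b" + re.escape(func)
        + r"""\(\s*['"](0x[0-9a-fA-F]+)['"]\s*(?:,\s*['"]([^'"]*)['"]\s*)?\)"""
    )
    return [(int(idx, 16), key if key else None) for idx, key in pattern.findall(script)]
```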
Here we see the base64 decode with the atob() function, and then the RC4 decode below that. This version uses “mod 256”, but you might run across code that does the same wrap-around with a bitwise “AND” (& 255), so just be aware. (Also, 255 is used in some scripts.)
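Here is a minimal Python version of that decode: base64 first, then plain RC4 with the per-call key. It assumes there is no extra step between atob() and RC4; some versions of this obfuscator add a URI/UTF-8 decoding step there, so compare against the function in your sample. The % 256 could equally be written & 0xFF.

```python
import base64

def rc4(key: bytes, data: bytes) -> bytes:
    """Plain RC4: key scheduling, then XOR with the keystream (all arithmetic mod 256)."""
    s = list(range(256))
    j = 0
    for i in range(256):                      # key-scheduling algorithm
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    out = bytearray()
    i = j = 0
    for byte in data:                         # keystream generation + XOR
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) % 256])
    return bytes(out)

def decode_entry(b64_text: str, key: str) -> str:
    """atob() the array entry, then RC4-decrypt it with the per-call key."""
    raw = base64.b64decode(b64_text)
    # Assumption: no URI/UTF-8 decoding step between atob() and RC4 in this sample.
    return rc4(key.encode("latin-1"), raw).decode("latin-1")
```

Because RC4 is symmetric, the same rc4 function can be used to build test data: encrypt a known string, base64 it, and make sure decode_entry gives it back.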
Now that we have a pretty good understanding of how the decoding works, let's decode this.
Using a tool I had written for the Neutrino EK, we extract the whole command, and then just the index and key, and save each to a separate file.
This is where that value of 0xd3 comes in.
This tool takes the base64 string array, splits it, and rotates it into the proper order using that value. It then uses the list of indexes/keys from the last step to decode the array entries.
If all we want is the list of URLs, we could stop here.
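The "split and rotate" part of that step might look like this in Python. This is not the author's tool, just a sketch under a couple of assumptions: the array is the first var NAME = ['...', ...]; literal in the script, its items use single quotes, and the rotation is a left rotation by the given count.

```python
import re

def parse_string_array(script: str):
    """Pull the first `var NAME = ['...', ...];` array literal out of the script."""
    m = re.search(r"var\s+([\w$]+)\s*=\s*\[(.*?)\]\s*;", script, re.DOTALL)
    if m is None:
        raise ValueError("no string array found")
    name, body = m.group(1), m.group(2)
    items = re.findall(r"'((?:[^'\\]|\\.)*)'", body)   # single-quoted items only
    return name, items

def rotated_table(script: str, rotations: int):
    """Return the array name and its items rotated left `rotations` times."""
    name, items = parse_string_array(script)
    if not items:
        raise ValueError("string array is empty")
    k = rotations % len(items)
    return name, items[k:] + items[:k]
```

After this, entry number int('0x..', 16) of the rotated list is the string each b('0x..', 'key') call refers to.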
But let's see about doing the replacements in the rest of the script.
Here we have the decoded and encoded versions lined up.
Although we have done the replacements in the encoded file, there are actually more layers.
Notice the “\x” encoded characters; let's fix those.
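Doing the replacements in the script text can be sketched with re.sub and a callback. This assumes you already have a dictionary mapping (index, key) to the decoded string from the previous steps; calls missing from it are left as they are. Each call is swapped for a JSON-quoted string, which is also valid JavaScript string syntax. The function name is hypothetical.

```python
import json
import re

def replace_calls(script: str, decoded: dict, func: str = "b") -> str:
    """Swap each func('0x..', 'key') call for a quoted literal of its decoded value.

    `decoded` maps (index, key) -> plaintext; key is None for index-only calls.
    """
    pattern = re.compile(
        r"\b" + re.escape(func)
        + r"""\(\s*['"](0x[0-9a-fA-F]+)['"]\s*(?:,\s*['"]([^'"]*)['"]\s*)?\)"""
    )

    def _sub(m):
        lookup = (int(m.group(1), 16), m.group(2))
        if lookup not in decoded:
            return m.group(0)               # leave unknown calls untouched
        return json.dumps(decoded[lookup])  # JSON string syntax is valid JS

    return pattern.sub(_sub, script)
```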
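Fixing the \x escapes is a single substitution. This sketch turns each \xNN in the script text into the character it stands for; it does not special-case an escaped backslash followed by "x", which is rare in these samples but worth keeping in mind.

```python
import re

HEX_ESC = re.compile(r"\\x([0-9a-fA-F]{2})")

def unescape_hex(text: str) -> str:
    """Turn JavaScript \\xNN escapes into the characters they stand for."""
    return HEX_ESC.sub(lambda m: chr(int(m.group(1), 16)), text)
```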
Does that string look familiar now? It's the error box in ANY.RUN. (This screenshot was borrowed from the text report section and saved as a PNG.)
There is still one more trick used here that I have previously seen in the pages that led to Angler EK.
We have one more layer of separated values. As we can see here, we have a key/value pair where “eb” is an object holding the variables, the key is “imKcX”, and the value is “Not Supported File Format”.
We can verify that by looking at the screenshot above.
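Inlining that lookup table can also be automated. This is a heuristic sketch, assuming the table is filled either with eb['key'] = 'value' statements or with an object literal containing 'key': 'value' pairs, and that values are single-quoted strings. The object name eb comes from this sample. The object-literal pattern can also match unrelated code such as a ternary, so review the output.

```python
import re

def inline_lookup_object(script: str, obj: str = "eb") -> str:
    """Replace obj['key'] reads with the string literal assigned to that key."""
    o = r"\b" + re.escape(obj)
    table = {}
    # obj['key'] = 'value' assignments
    for k, v in re.findall(o + r"\[\s*'(\w+)'\s*\]\s*=\s*'([^']*)'", script):
        table[k] = v
    # 'key': 'value' pairs inside an object literal (heuristic)
    for k, v in re.findall(r"'(\w+)'\s*:\s*'([^']*)'", script):
        table.setdefault(k, v)

    def _sub(m):
        k = m.group(1)
        return "'" + table[k] + "'" if k in table else m.group(0)

    # Only rewrite reads: skip occurrences followed by a single "=" (an assignment).
    return re.sub(o + r"\[\s*'(\w+)'\s*\](?!\s*=[^=])", _sub, script)
```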
It does this in several places.
Anyone who has tried to step through one of these in a debugger knows what a long-winded pain they can be before they finally spit out the decoded page.
And that is if there are no debugger checks to throw you a curve.
One more note: I have seen samples in the past using this style that had been run through this kind of encoder twice. So you may need to look closely and repeat the process to decode them as fully as possible.
That is as far as I’m going with this. My challenge to you is to build your own tools so you can quickly decode these too.
Mine are too fragile to release with this post.
In conclusion, once you understand the layout of how these decode, you can apply that knowledge to the various “types” you may run across.
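For the double-encoded case, one simple approach is to keep applying your decoding pass until the output stops changing. This is a generic sketch: decode_pass stands for whatever combination of the earlier steps you chain together, and max_rounds is an arbitrary safety limit so a pass that never settles cannot loop forever.

```python
from typing import Callable

def decode_until_stable(text: str, decode_pass: Callable[[str], str], max_rounds: int = 10) -> str:
    """Apply decode_pass repeatedly until the output stops changing (or max_rounds is hit)."""
    for _ in range(max_rounds):
        new_text = decode_pass(text)
        if new_text == text:
            break
        text = new_text
    return text
```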
Learn it and help make this type of encoding obsolete.
If you have any questions, please contact me on Twitter at @Ledtech3.