Peeling away the layers of a Word document macro

I first heard about the sample in this post from a blog post by @HerbieZimmerman. The blog post is here: https://www.herbiez.com/?p=1028 and the doc file is here: https://www.hybrid-analysis.com/sample/0de3f4380b642e59d0cde5570ed13bfc727000b94a034ce10e1f87bfac3fac79?environmentId=100

This one piqued my interest because it had three obfuscated scripts plus a “ThisDocument” script, where I normally see only two.

There are several ways to extract the scripts. I have been trying OpenOffice, and it has worked fairly well for this type of document.

Here is what we see when we open the script that contains the “AutoOpen” sub.

autoopen-b

When you open the script in OpenOffice, its lines are disabled by default. Each one is prefixed with “Rem”, which comments it out so the code cannot run. That prefix also makes the code harder to read. To clean it up, open the script in Notepad++ or your favorite editor and find-and-replace “REM ” (including the trailing space) with nothing.
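If you would rather script this cleanup than do it in an editor, here is a minimal Python sketch of the same idea. It assumes the macro text has already been extracted to a string, and that every disabled line begins with “Rem ” in any case, optionally after some indentation. Unlike a plain editor replace-all, it only strips the prefix at the start of a line, so a literal “REM ” inside a string stays untouched.

```python
import re

# "Rem " (any case) at the start of a line, after optional spaces/tabs.
# [ \t]* is used instead of \s* so the match never spans line breaks.
REM_PREFIX = re.compile(r"^([ \t]*)rem ", re.IGNORECASE | re.MULTILINE)

def strip_rem(source: str) -> str:
    """Remove the 'Rem ' prefix that disables each line of the extracted macro."""
    # Keep the original indentation (group 1) and drop only the prefix.
    return REM_PREFIX.sub(r"\1", source)

if __name__ == "__main__":
    sample = 'Rem Sub AutoOpen()\nRem     x = "REM keep"\nRem End Sub\n'
    print(strip_rem(sample))
```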

We see “Sub AutoOpen()”, which makes this script run automatically when the document is opened.

As the notes show, “Shell” runs the command returned by the listed function. The window-style parameter is “0” (zero, vbHide), so the window is hidden. You can read about it here: https://msdn.microsoft.com/en-us/vba/language-reference-vba/articles/shell-function

If you look just under the green box you can see where the function starts.

If we keep scrolling down, we see where the value of the function gets set.

FuncArray

Here several strings get concatenated. The third item is “Chr(34)”, the double quote.

The other items are the names of functions. Each function is called to build the piece of the final string that goes in that position.
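Once you know what each function returns, evaluating such a line is just concatenation. Here is a small Python sketch, assuming every term is either a function name or a `Chr(n)` call. The `resolved` values in the test are hypothetical placeholders, not the sample's real strings. Splitting on “+” is safe only because no quoted literals appear in these terms.

```python
import re

CHR_CALL = re.compile(r"^Chr\((\d+)\)$", re.IGNORECASE)

def evaluate_concat(expression: str, resolved: dict[str, str]) -> str:
    """Evaluate a VBA 'A + B + Chr(34) + C' concatenation.

    `resolved` maps each function name to the string it returns.
    Raises KeyError for any name that has not been resolved yet.
    Assumes the terms contain no quoted string literals.
    """
    parts = []
    for term in expression.split("+"):
        term = term.strip()
        if not term:
            continue  # tolerate a trailing '+' such as 'A + B +'
        match = CHR_CALL.match(term)
        if match:
            parts.append(chr(int(match.group(1))))  # Chr(34) -> '"'
        else:
            parts.append(resolved[term])
    return "".join(parts)
```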

Let's take a closer look at the fourth item.

fourthItem-b

Here we see that the fourth item corresponds to a function defined above, which calls “Mid” on a string defined above that.

So this takes the string in “APwTzEijQ”, extracts 71 characters starting at character 19, and returns the result.
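When reproducing this outside of VBA, remember that VBA's `Mid` counts from 1, while Python slices count from 0. Here is a minimal Python sketch of `Mid(string, start[, length])`. The test strings are made up; the real content of “APwTzEijQ” is only in the screenshot.

```python
from typing import Optional

def vba_mid(text: str, start: int, length: Optional[int] = None) -> str:
    """Emulate VBA Mid(string, start[, length]).

    VBA positions are 1-based, so VBA character `start` is Python index
    `start - 1`. Like VBA, a start past the end simply yields "".
    """
    if start < 1:
        raise ValueError("VBA Mid start must be >= 1")
    begin = start - 1
    if length is None:
        return text[begin:]
    if length < 0:
        raise ValueError("VBA Mid length must be >= 0")
    return text[begin:begin + length]

# Mid(APwTzEijQ, 19, 71) would then be: vba_mid(APwTzEijQ_value, 19, 71)
```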

But as you can see, there are far too many of these to do by hand. You can either modify the script to echo the result of building each string, or build a new tool.

Tool-1

Using the tool and cleaning out almost everything we don't need, we end up with this.

FourthitemOnly

The output is the value that will get tacked on in this position.
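The tool in the screenshot is not reproduced here. As a rough, hypothetical Python sketch of the same idea, this version resolves every function whose body is `Name = Mid(Var, start, length)`, using single-line string assignments of the form `Var = "..."` found in the same script. Those two shapes are assumptions based on the screenshots. Multi-line string building, `Dim` declarations, and other patterns are not handled.

```python
import re

# Assumed shapes (from the screenshots, not a general VBA parser):
#   SomeVar = "long junk string"
#   Function Name() ... Name = Mid(SomeVar, 19, 71) ... End Function
STRING_ASSIGN = re.compile(
    r'^[ \t]*(\w+)[ \t]*=[ \t]*"((?:[^"\n]|"")*)"[ \t]*$', re.MULTILINE
)
MID_FUNCTION = re.compile(
    # Tempered dot: do not run past this function's own "End Function".
    r'Function\s+(\w+)\s*\(\s*\)(?:(?!End\s+Function).)*?'
    r'\b\1\s*=\s*Mid\(\s*(\w+)\s*,\s*(\d+)\s*,\s*(\d+)\s*\)',
    re.IGNORECASE | re.DOTALL,
)

def resolve_mid_functions(source: str) -> dict[str, str]:
    """Map each Mid-based function name to the substring it returns."""
    # VBA escapes a quote inside a literal as "", so turn "" back into ".
    strings = {name: value.replace('""', '"')
               for name, value in STRING_ASSIGN.findall(source)}
    results = {}
    for func, var, start, length in MID_FUNCTION.findall(source):
        if var in strings:  # skip functions whose string lives elsewhere
            begin = int(start) - 1  # VBA Mid is 1-based
            results[func] = strings[var][begin:begin + int(length)]
    return results
```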

After some trial and error, I added a section to my tool that reports which functions are not present in the current script.
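A missing-function check like that can be sketched in a few lines of Python. It assumes definitions look like `Function Name(` and that the concatenation line contains only names and `Chr(n)` calls. VBA identifiers are case-insensitive, so names are compared in lower case. The helper name is hypothetical.

```python
import re

def missing_functions(expression: str, source: str) -> list[str]:
    """Names used in an 'A + B + Chr(34) + C' line that have no
    'Function <name>(' definition in `source`, in first-seen order."""
    defined = {name.lower()
               for name in re.findall(r"\bFunction\s+(\w+)\s*\(", source, re.IGNORECASE)}
    missing = []
    for term in expression.split("+"):
        name = term.strip()
        if not name or re.fullmatch(r"Chr\(\d+\)", name, re.IGNORECASE):
            continue  # skip empty trailing terms and character literals
        if name.lower() not in defined and name not in missing:
            missing.append(name)
    return missing
```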

Look at the screenshot above and at this line:
“XZGDjkTXd = iXjTfDtwt + MfwkTAlmO + Chr(34) +”
The first two values concatenated into this new script, iXjTfDtwt and MfwkTAlmO, are not defined in this script.

Those will be found in the other 2 scripts.

The first value

Script1st

FirstDecoded

The second value

Script2nd

SecondDecoded

Now we have to piece this back together in the correct order.

FullSecondLevel

This is the full reassembled script. It is still hard to read, so let's format it to make it more readable.

You may notice the many ‘+’ and + concatenation operators in this part of the script. We will have to remove them before we start the string replacements.

When the script runs, PowerShell handles these concatenations itself. For the manual method we are using here, though, we have to remove them ourselves.
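For the quoted form, here is a minimal Python sketch of that cleanup. It assumes the pieces are plain single-quoted literals joined directly, as in `'pow'+'ershell'`. It removes each closing quote, “+”, and opening quote sequence so the pieces merge into one literal. Bare + operators joining variables are left for manual handling.

```python
import re

# A closing quote, a '+', and the next opening quote, with optional spaces:
# 'pow'+'ershell'  ->  'powershell'
JOIN = re.compile(r"'\s*\+\s*'")

def join_single_quoted(script: str) -> str:
    """Merge adjacent single-quoted PowerShell literals joined by '+'."""
    return JOIN.sub("", script)
```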

FullFormated

Let's take a closer look at the top of this.

Top

If you look closely at the “set” variables, the word “powershell” is split across several variables and reassembled in the last “set”.
The ComSpec reference at the end decodes to “IEX”, the PowerShell alias for Invoke-Expression.

ComSpec

We want the characters at (zero-based) offsets 4, 26 and 25, which build “IEX”.
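Both tricks use cmd.exe variable expansion. `%name%` inserts a variable, and `%name:~offset,length%` inserts a substring starting at a zero-based offset. Here is a minimal Python sketch of that expansion. It handles only non-negative offset/length pairs, not cmd's negative-offset forms. The variable names `a` and `b` are hypothetical, and the ComSpec value assumes the usual `C:\WINDOWS\system32\cmd.exe`.

```python
import re

# %name%  or  %name:~offset,length%  (cmd substring syntax, zero-based offset)
VAR_REF = re.compile(r"%(\w+)(?::~(\d+),(\d+))?%")

def expand(text: str, env: dict[str, str]) -> str:
    """Expand cmd-style variable references; `env` keys must be lower case
    because cmd variable names are case-insensitive."""
    def repl(m: re.Match) -> str:
        value = env.get(m.group(1).lower())
        if value is None:
            return m.group(0)  # unknown variable: leave it untouched
        if m.group(2) is None:
            return value
        offset, length = int(m.group(2)), int(m.group(3))
        return value[offset:offset + length]
    return VAR_REF.sub(repl, text)

env = {"comspec": r"C:\WINDOWS\system32\cmd.exe", "a": "pow", "b": "ershell"}
print(expand("%a%%b%", env))                                          # powershell
print(expand("%ComSpec:~4,1%%ComSpec:~26,1%%ComSpec:~25,1%", env))    # Iex
```

PowerShell command names are case-insensitive, so “Iex” runs the same as “IEX”.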

The bottom section is where we peel away more layers, in the form of string replacements.

The trick is to do it in the correct order.

Bottom

The highlighted area is the innermost section, and the last one left.

If you zoom in on this and the screenshot above, you will notice that the parentheses “( )” are highlighted in red. They show where each section begins and ends.

A normal nested function is evaluated starting from the innermost section and working outward.

Here we start at the outermost section and do the replacements working toward the middle.

As we found in my last article, “-cReplace” performs a case-sensitive replacement. It replaces only the exact string, even if that string appears in the middle of another string. The plain “-Replace” is case-insensitive, so it finds the string in upper, lower, or even random mixed case. Both operators treat the search string as a regular expression.

Note also that “[Char]” is already in the proper case, because I have already done a string replacement for the mixed-case versions. Now let's use a new tool to find out what all of the char codes evaluate to.
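A char-code decoder can be very small. Here is a minimal Python sketch that replaces each `[Char]NN` with the character it stands for. It matches “[Char]” case-insensitively, as PowerShell type names are, so the separate mixed-case normalisation step is optional here. The “+” signs between the codes are left in place.

```python
import re

# [Char]39 -> "'"; PowerShell type names are case-insensitive, so is this match.
CHAR_CODE = re.compile(r"\[char\]\s*(\d+)", re.IGNORECASE)

def decode_char_codes(script: str) -> str:
    """Replace every [Char]NN with the character whose code is NN."""
    return CHAR_CODE.sub(lambda m: chr(int(m.group(1))), script)
```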

CharRepl

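So here we have three items to replace. “tiQ” becomes “$” with the case-sensitive -cReplace. “3Gj” becomes a single quote, and “PBw” becomes “|”; both use the plain, case-insensitive -Replace. If you have trouble seeing it in the screenshot, “3Gj” really does get replaced by the single quote.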

After doing the replacements we move in another level.

Level-2

The highlighted area is the new string section we will be doing the string replacements with.

Just below are the strings that we will be working with.

Here is an issue we can run into while de-obfuscating by hand.

issue-b

In the earlier replacement we turned some characters into single quotes. In places this leaves doubled single quotes (''), which is how a literal single quote is written inside a PowerShell single-quoted string. The doubled quotes stop us from doing a proper replacement in Notepad++, so we have to collapse them before the final string replacement. Watch for this at every layer, or some characters will not get replaced properly.
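Here is a minimal Python sketch of that collapse. It assumes the layer is a single PowerShell single-quoted literal: it removes the outer quotes and turns each '' back into one quote.

```python
def unquote_ps_single(literal: str) -> str:
    """Turn a PowerShell single-quoted literal such as 'it''s' into its value: it's."""
    if len(literal) < 2 or literal[0] != "'" or literal[-1] != "'":
        raise ValueError("not a single-quoted literal")
    # Inside single quotes, '' is the escape for one literal quote.
    return literal[1:-1].replace("''", "'")
```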

ThirdLevel

You can see that we had to replace the doubled single quotes here as well.

And finally, we have:

FinalLayer

And here is a cleaned-up, formatted version.

Final-Formatted 

This definitely would not run as is, but you can see what it is doing now.

The part from $franc to the catch statement is the final decoded section. The rest of the screenshot is kept for reference, along with artifacts left over from decoding the other levels.

We can finally see that there are five different URLs from which it tries to download a file. It saves the file in the “Public” folder under a random numeric name from 1 to 3453245 plus “.exe”, and then attempts to run it.

If it hits an error, it writes the error to the console window, even though it was launched hidden.
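To capture the download sites as indicators, you can pull every URL out of the decoded text. Here is a minimal Python sketch. It stops each match at whitespace, quotes, commas, semicolons, or “@”, on the assumption that the URLs are stored as a delimited list. Adjust the excluded characters to the actual script. The example.invalid addresses in the test are placeholders, not the sample's real URLs.

```python
import re

# Stop at whitespace, quotes, commas, semicolons and '@' (assumed list delimiters).
URL = re.compile(r"https?://[^\s'\",;@]+", re.IGNORECASE)

def extract_urls(decoded: str) -> list[str]:
    """Return the unique URLs found in `decoded`, in first-seen order."""
    seen: list[str] = []
    for url in URL.findall(decoded):
        if url not in seen:
            seen.append(url)
    return seen
```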

One last important note: always use a “working” copy. If you mess something up, you don't have to go back and re-extract the scripts; you can just make a new copy and start over.

So why decode the script this way instead of just running it in a sandbox? For one, it could have a sandbox-checking routine that simply exits if it detects an environment it does not want to run in.

Also, the routine may not loop through every site when trying to download the file. In that case, running it could miss some of the possible download sites.

It also gives a better understanding of “how” the obfuscation works, and of its strengths and weaknesses.

Finally, it is another excuse to build more tools and hone my programming skills.

Well, that's it for this one. I hope we all learned a few things.


About pcsxcetrasupport3

In my part-time business, I mainly do system building and system repair. Over the last several years I have been building system utilities in VBScript, HTA applications, and VB.Net. They help me find the information I need to understand a system's problems, so I can get systems repaired and back to my customers quicker.
This entry was posted in Malware, PowerShell, security, VBScript.