A look at Stomped VBA code and the P-Code in a Word Document

This sample comes from a Twitter discussion here and a second part of the thread here on April 22 2019.

This discussion was started by “My Online Security @dvk01uk”.

Although it appears to have a VBA file in it, it didn’t work in a few different sandboxes, as mentioned by @dvk01uk.

Let’s take a closer look at the sample found here on ANY.RUN @anyrun_app.

If we look at the document in a hex editor we can see that it starts with “PK”, so this is the ZIP-based version of the document format and we can just decompress it and take a closer look.


After unzipping the document we see this folder layout.


Let’s look at the word folder.


We can see here that we do have a vbaProject.bin file. Let’s look at that.
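The same check can be scripted. Here is a minimal Python sketch that confirms the “PK” signature and lists any archive member ending in vbaProject.bin. The function name and the tiny stand-in archive are my own for illustration; they are not part of the sample.

```python
import io
import zipfile


def find_vba_parts(document_bytes):
    """Return the names of ZIP members that look like VBA project containers."""
    if not document_bytes.startswith(b"PK"):
        return []  # not a ZIP-based document
    with zipfile.ZipFile(io.BytesIO(document_bytes)) as archive:
        return [name for name in archive.namelist()
                if name.lower().endswith("vbaproject.bin")]


# Build a tiny stand-in archive (hypothetical content) to exercise the function.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as demo:
    demo.writestr("word/document.xml", "<w:document/>")
    demo.writestr("word/vbaProject.bin", b"\xd0\xcf\x11\xe0")

print(find_vba_parts(buffer.getvalue()))
```

For a real document you would pass in the bytes read from the file on disk instead of the in-memory demo archive.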


This is an OLE file, so we can extract its streams with 7-Zip.


Let’s take a closer look at Module1.


If we scroll down to the bottom of this file we can see that it appears to be Zeroed out.

If we look at the “ThisDocument” we can see the “Attribut” string which tells us it contains compressed VBA Code.


If you don’t have that string in the file then it does not have compressed VBA Code in it.

So how does this work then?

If we go back to the Twitter discussion, “Vess @VessOnSecurity” has a Python tool called pcodedmp that extracts the “P-Code” from the document, which can be found here.

This tool currently only requires the “Decalage @decalage2” oletools.

The command I ran was this.
"C:\Python27\python.exe" "C:\Users\Joe User\Desktop\pcodedmp\pcodedmp.py" "Opticsense New Order.doc"

In order to dump it to a file, just add " > DumpedPcode.txt" (or whatever name you want) to the end of the above command.

I have both versions of Python installed on this VM, so I have to use the full path to it. I also discovered the hard way that you have to put it in double quotes in order for it to work.

Since I didn’t use the pip install for the pcodedmp tool, I just downloaded it and used the full path to the script, and I also put double quotes around that path. The final parameter is the file name, in double quotes since it has a space in the name.

I just opened a cmd window in the folder where the document was and ran that command.

Here is what we see when we run  the command and dump it to a file.


This is the part we are most interested in at the moment.


If you can zoom in on that you will see a bunch of “Line #:” entries, so let’s clean those out and format this a bit better to be readable.

Here we find the AutoOpen Function.


The “Ld F_WH” appears to load the function above.


Although it is not really clear when looking at this for the first time, we can take an educated guess at what the mnemonics mean: “Ld” would be load, and “St” most likely means store rather than string, since it is the instruction that assigns a value to a variable.

So here it appears to take the string in “E_MO” and pass it to the function “B_RA”, and when it returns it will set the value of “F_DC” as an object.


So what this does is take the string of digits 3 at a time, convert each group to a number, subtract 0x1A (26) from it, and then convert the result to a character.
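To make that concrete, here is a small Python sketch of the routine as I understand it. The function names are mine, and the sample string is generated by the matching encoder rather than copied from the document, so treat the zero padding to 3 digits as an assumption.

```python
def decode_digits(encoded, key=26, width=3):
    """Read `width` digits at a time, subtract `key`, and turn each result into a character."""
    chars = []
    for i in range(0, len(encoded) - width + 1, width):
        chars.append(chr(int(encoded[i:i + width]) - key))
    return "".join(chars)


def encode_digits(text, key=26, width=3):
    """Inverse helper used only to build a test string (assumes zero padded groups)."""
    return "".join(str(ord(c) + key).zfill(width) for c in text)


sample = encode_digits("Wscript.Shell")
print(sample)
print(decode_digits(sample))
```

The same function can be pointed at the “SP_LL” value once it has been copied out of settings.xml.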

So after decoding the first string we see.


So the object that gets passed is “Wscript.Shell”.

The rest of the longer strings appear to be junk code until you get down to here.


Here we see it is getting the string “SP_LL” from the active document.  When we search for it we find it in the “settings.xml” .


So now we need to take this string and run it thru the same B_RA function and see what is output. It will then get executed after passing back to the AutoOpen function.


Now that the strings are decoded, we can go back to the AutoOpen function and continue on.


It will use WMI’s Win32_Process to load “Cmd.exe” and the rest of the script.

Let’s take a closer look at the decoded PowerShell script.


If we look at the highlighted area in the screenshot we can see above it that there were 3 variables “set”. This will rebuild the string “powershell”.

As we can see here this just downloads an exe from a site and runs it.

For anyone interested in getting a better understanding of the P-Code, I would suggest looking at the source code of pcodedmp and this file to get a better handle on how it works.

I would have liked to go deeper into how the P-Code works at the byte level, but I’m still learning that myself.

That’s it for this one.

Here is the full list of resources and a few extras not covered in the post.

Twitter threads for this sample:
Main thread Here , Second thread Here , Third Thread Here

Didier Stevens @DidierStevens ISC Diaries:
Here and Here

Vess @VessOnSecurity pcodedmp tool

Decalage @decalage2 oletools:

Derbycon 2018 talk “VBA Stomping – Advanced Malware Techniques”


A look at a bmp file with embedded shellcode

The sample today is from PaulM @melsonp

While watching his BSides Augusta talk from 2018 Here, I saw that at the end he shows a picture file that gets downloaded by a layered PowerShell script. He was kind enough to send me a copy of a similar one to take a closer look.

I originally thought it was one of the PowerShell only decoder scripts for picture files but here is what we first see. This is the first layer .


After Base64 Decoding this we get.


Here we can see this is Base64 –> decompress to get the next level. But they have one more trick.


Before we can Base64 decode –> decompress this, we first have to do a string replacement of “!” with “A” in order to get a proper Base64 encoded string.

After Decoding we get this.


This appears to be a normal Meterpreter PowerShell Shellcode loader but in this case it is only downloading a bmp file.

The other ones I have looked into have either had the Shellcode on this page base64 encoded or hex encoded or downloaded it as this has with the picture file.

After a discussion with Paul he was able to locate the pdf of the presentation of the builder for this here and I found the video for the presentation  here and the Github for the project is  here.

Here is what we see when we open the downloaded file.


The first 2 bytes are normal for the bmp file format. If we open the file as a picture it is indeed the default picture from the builder of a cat “flipping you off”. (Which I won’t show)

So let’s dig into the pdf to see how this works.

Note: I’m still learning how to read assembly. But we learn by doing.

On this page we see we have the 2 byte header “BM” (0x424D), then a jump instruction of 0xE9, then a 3 byte offset. According to This page there are more “jmp” instructions that could possibly be used.


In our file we have the offset in little-endian byte order as 0x30C403, and if we reverse that to 0x03C430 that is our offset to jump to.
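Here is a minimal Python sketch of reading that jump out of the header. One assumption to flag: the x86 `E9` opcode normally carries a 4 byte little-endian displacement measured from the end of the 5 byte instruction, so the sketch reads four bytes (a top byte of 00 would match the three bytes shown above) and also reports where execution would land. The function name is my own.

```python
import struct


def read_bmp_jump(data):
    """Return (displacement, landing_offset) for a 'BM' + E9 header, or None."""
    if len(data) < 7 or data[:2] != b"BM" or data[2:3] != b"\xE9":
        return None
    # E9 is followed by a signed 32-bit little-endian displacement.
    (displacement,) = struct.unpack_from("<i", data, 3)
    # The displacement is relative to the byte after the instruction (offset 7).
    return displacement, 7 + displacement


header = bytes.fromhex("424DE930C40300")  # bytes described above, padded with 00
displacement, landing = read_bmp_jump(header)
print(hex(displacement), hex(landing))
```

Since the later steps search for byte sequences after the jump, being a few bytes early or late on the landing offset does not change the result.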

If we jump to that offset we can see it is at the end of the file.


Now scrolling down the pdf a little bit more we see that they also attempted to obfuscate the decoding key.


What this is doing is setting ebx to Zero and then looping a counter until it matches the “Magic” value that was randomly generated on build.

After it matches, it reverses that hex value and will use that value to xor the first 4 bytes of the encoded data to produce a decoding key which will get reversed again for decoding the remainder of the bytes.

I first wrote a brute forcer to work like the function here but after looking at this longer and getting a better understanding of what was in the registers I finally realized that this entire brute force routine was a waste of time and CPU power. No matter what the Random “Magic value” turns out to be the index value will always end up equal to the “Magic value”.

So when building an offline decoder we can just bypass this and use the “Magic” value found in the file directly in our calculations, saving a lot of time and CPU cycles.

In order to figure this out I also had to take a closer look at the builder.

If we look in the source file of gen.py we can see the layout of the decoder bytes.


So let’s just use this CyberChef recipe Here to get the assembly for the bytes starting at the offset we jumped to in our downloaded file.

And we get this.


For me this is a little harder to understand, so let’s go back and put just the data from the start of the decoding routine to the end of the file into CyberChef and see what we have.

This looks a little different.


In order to get a better handle on what was in which registers I ran it thru Scdbg.


If we look closely at this report we see it fails at the opcode 0x0FC9, which is “BSWAP ECX”.

It was still enough to help me understand the values in the registers at the time.

I may not fully understand all of what the assembly is doing but I’m able to understand enough to work out how to decode it.

If you look at the above screenshot of the assembly you can see my notes on what I think I understand about how it works.

If we look back at the source code we can see it lines up with where I have commented as random.

Here are my notes on how the function works to decode the bytes.


Here I am just reversing the first 4 bytes of the encoded data instead of the “Magic” value as it appears in the assembly.

The next step is to build a tool to extract the shell code.

I first start by importing the entire bmp file into the tool. I then extract the offset and jump to it.

Next I extract the data from the offset to the end of the file. We no longer need the bytes before the offset.

I write all of my tools in vb.net and I have not found a good way to search for a byte array inside a byte array, so I convert these remaining bytes to a hex string and work with the data as a hex string.

Just a note: it is very resource intensive to convert the entire file to a hex string and try to parse it that way. (I tried)

Since I am now working with strings of hex I can now search for the unique byte sequence as a string instead of a byte array to do the compare with the byte code before the “Magic value” in order to find and extract it.


Since this sequence will be in every file we can do a search for it and then locate the Magic value in the hex string. Once we find that sequence before the “Magic” we can then extract the next  4 bytes (8 Chars) for the “Magic”.
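For comparison, here is how the same search could look in Python, where `bytes.find` works directly on the raw bytes. This is only a sketch: I am assuming the marker is the `31 D9 81 F9` sequence used in the Yara rule later in this post, and the test blob is synthetic, not the real sample.

```python
MARKER = bytes.fromhex("31D981F9")  # assumption: sequence that precedes the Magic value


def find_magic(blob, marker=MARKER):
    """Return (offset, hex string) of the 4 bytes that follow the marker, or None."""
    position = blob.find(marker)
    if position == -1:
        return None
    start = position + len(marker)
    magic = blob[start:start + 4]
    if len(magic) < 4:
        return None  # marker found too close to the end of the data
    return start, magic.hex().upper()


# Synthetic stand-in data: padding, marker, a made-up Magic value, one trailing byte.
blob = b"\x90" * 10 + MARKER + bytes.fromhex("DEADBEEF") + b"\x00"
print(find_magic(blob))
```

The hex string that comes back is in file byte order, so any reversing still has to be done afterwards.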

Next we have to locate the start of the encoded data. For that we can find what this function ends with.


You may also notice another value we could extract. The size of the encoded data. We could get that so there is not extra nonsense data in the decoded shellcode.

So after we put all of this together we end up with the new tool.


If we load the hex string shellcode into another tool I’m working on we get.


One thing to note. For this type of shellcode the first byte is always 0xFC and the second byte will vary depending on if it is a 32 bit or 64 bit shellcode.

So the question would be, how do you find a file encoded with this?

With a few pointers from Florian Roth @cyb3rops I was able to create this Yara rule.

rule DKMC_Picture_File {
   meta:
      description = "Detects DKMC encoded bmp file with shell code"
      author = "David Ledbetter @Ledtech3"
      reference = "https://github.com/Mr-Un1k0d3r/DKMC"
      date = "2019-27-02"
   strings:
      $my_hex_string1 = { 42 4D E9 }
      $my_hex_string2 = { 31 D9 81 F9 }
      $my_hex_string3 = { E8 B7 FF FF FF }
   condition:
      $my_hex_string1 at 0 and $my_hex_string2 and $my_hex_string3
}


After sending this to him he modified it to do the first 3 byte search as  UInteger.

Here is the modified version.

rule DKMC_Picture_File {
   meta:
      description = "Detects DKMC encoded bmp file with shell code"
      author = "David Ledbetter @Ledtech3"
      author = "Florian Roth @cyb3rops" // modified first 3 bytes to be detected as Uint.
      reference = "http://github.com/Mr-Un1k0d3r/DK …"
      date = "2019-27-02"
   strings:
      $my_hex_string2 = { 31 D9 81 F9 }
      $my_hex_string3 = { E8 B7 FF FF FF }
   condition:
      uint16(0) == 0x4d42 and uint8(2) == 0xE9 and
      $my_hex_string2 and $my_hex_string3
}

I’m not sure if it is faster or not but both do find the sample I have.
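If the `uint16(0) == 0x4d42` part looks backwards, it is because the value is read as a little-endian integer, so the bytes 42 4D (“BM”) become 0x4D42. Here is a short Python sketch of the same header test. It only mirrors the first half of the condition and is not a replacement for the rule.

```python
import struct


def matches_header(data):
    """Mirror of the uint16(0) / uint8(2) checks from the modified rule."""
    if len(data) < 3:
        return False
    (first_two,) = struct.unpack_from("<H", data, 0)  # little-endian 16-bit read
    return first_two == 0x4D42 and data[2] == 0xE9


print(hex(struct.unpack("<H", b"BM")[0]))
print(matches_header(bytes.fromhex("424DE9")))
```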

A Search on Hybrid Analysis didn’t find anything using  the yara rules.

A retro hunt by Florian Roth @cyb3rops On VirusTotal resulted in several hits for this rule.

Here is the Pastebin of the found hashes here .

Well, that is it for this time. I hope you learned as much as I did.


A deeper look into a wild VBA Macro

This sample comes from Brad Duncan @malware_traffic, from his SANS ISC Diary located Here and the files on his blog Here.

For this session I will be using “2019-01-23-example-of-attached-Word-doc-1-of-7” word document.

I ended up looking at this from different directions so that is what I want to try and show here.

The first thing I always do is to look at the file in a hex editor to verify what type of file I am dealing with. Never trust a file extension.


As we can see by the 8 byte file header, we are dealing with an OLE file rather than, say, the XML, zipped, or RTF form of a document.
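That first look at the header is easy to script as well. Below is a small Python sketch that compares the start of a file against a few well known signatures. The table is deliberately short and the labels are my own; an XML file that starts with a byte order mark, for example, would come back as unknown.

```python
SIGNATURES = [
    (bytes.fromhex("D0CF11E0A1B11AE1"), "OLE compound file"),
    (b"PK\x03\x04", "ZIP container (for example OOXML)"),
    (b"{\\rtf", "RTF"),
    (b"<?xml", "XML"),
]


def sniff(data):
    """Return a label for the first matching signature, or 'unknown'."""
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    return "unknown"


print(sniff(bytes.fromhex("D0CF11E0A1B11AE1") + b"\x00" * 8))
```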

My next step is usually to drop the file into LibreOffice to see if it will even open.

Here is what the Document looks like.


Next let’s look and see if there are any macros available. Sometimes no macros are detected using this program, so alternate methods / programs need to be tried to verify there are no macros.

So when this first loads even before the “AutoOpen()” Sub, it does a “GetTickCount” call to the Windows API.


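Since we are here, let’s take a closer look at this function.


The “#If VBA7 Then” is what caught my eye. According to This question on StackOverflow, VBA7 is true on Office 2010 and later, where the PtrSafe 64-bit compatible declarations are available; it is the separate Win64 constant that indicates 64-bit Office.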

Another Odd thing I noticed was when you click from the Module1 tab at the bottom to the ThisDocument tab then back the function name changes to the AutoOpen one.


So now we can use the “Save Basic” button to save this Module1 as a “.bas” file to take a closer look.

But let’s go further. Now that I have the Decalage @decalage2 and Didier Stevens @DidierStevens tools installed, let’s see what they tell us.

We start with olevba.


As we can see here it outputs the macro for us and also gives us more information about what happened when it was checking it including the decoded IP Address.

Not all of the Information in the box is “Always” correct. So you may need to verify.

Now let’s take a look with oledump.py. We start with the basic command to see what streams are in here.


We can see in stream 7 there is an upper case “M”. That lets us know that there is code in the macro. So let’s look at that.


That looks like the data is compressed, so let’s add the -v switch to decompress this stream.


Now that is much better. We can now output that to a text file and take a closer look in our favorite text/ code editor tool.

Let’s look at 1 more method before we dig in deeper to how the rest of the code works.

I’ll use 7-Zip to extract the document and we see the folder / file system.


Let’s dig into the Macros folder and see what we have.


We have files and a folder. In the VBA folder we have this.


Now here is what I’m looking for. Let’s take a look at module1 in a hex editor.


We can see here that there is some plain text but this “Stream” is compressed.

Before I learned how to use oledump.py I had wondered how you extract the data in this file /stream.

I had read This article in the past but didn’t understand everything it was telling me.

But using the code provided there and with some modifications I was able to build a tool to decompress the single stream. I wrote the tool mainly to “Try” and understand how the encoding/ compression worked.


So that now gives me 1 more way to extract the macro(s) from the document.

I also Installed and Ran Vipermonkey today to see how that worked since I have never tried it before.



As we can see here it also extracted the script but seemed to have a problem with the VBA7 code.

Here is a list of the commands I used for olevba and oledump.

All commands are run from opening a CMD prompt in the folder where the document was located. (Shift + right click on folder , select Open command window here)


Let’s dig into this code some more because it is crazy.

The first part of the code you can see in the screenshot of my tool above is just a large block of junk comment data.

If we start checking for references to the variables declared before the “AutoOpen()” we find that there are several that are never used, so they are most likely just junk filler to make it harder to read.


This code does a series of converting the “Val” and “Len” values all of the way thru this code.  Even once we convert those values we still have to do the math for each line.

So I wrote a tool to understand how the “Val” works. This Link will give you an idea.


As we can see, it will take that string as input and return the numerical value. Per the documentation, Val skips spaces, tabs, and line feeds and stops reading at the first character it can’t recognize as part of a number. The value could also have started with “&H” for hex or “&O” for octal.
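Here is a rough Python emulation of “Val”, written from the documented behavior rather than from my tool. It is a simplified sketch: it drops spaces, tabs, and line feeds, handles the “&H” and “&O” prefixes, and otherwise reads the leading number and stops at the first character that does not belong to it. Exponent notation and VBA’s signed handling of large hex values are left out on purpose.

```python
import re


def vba_val(text):
    """Simplified stand-in for VBA's Val function (no exponents, no type limits)."""
    cleaned = re.sub(r"[ \t\n]", "", text)  # Val ignores these characters
    match = re.match(r"&[hH]([0-9a-fA-F]+)", cleaned)
    if match:
        return int(match.group(1), 16)
    match = re.match(r"&[oO]([0-7]+)", cleaned)
    if match:
        return int(match.group(1), 8)
    match = re.match(r"[+-]?(\d+\.?\d*|\.\d+)", cleaned)
    return float(match.group(0)) if match else 0.0


print(vba_val("12abc34"))
print(vba_val("&HFF"))
```

With a helper like this, every `Val("...")` call in the macro can be replaced by its number before doing the math for each line.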

We know “Len” is the length of the string so somewhat easy. The hard part is to parse this code and do the replacements for the numeric value.

My tool still has a bug or 2 but will parse this well enough for us to get a better Idea of what this is doing.



Now that some of the extra obfuscation is out of the way we can look closer at what we have.

After going thru and doing the math by hand we see this. The part with “****” next to them is where the two main values  are reset to a new value.


At the end of the lines I also calculated the values for the “Left”, “Mid”, and “Right” values. These get used to get the sub string from those functions and the output gets appended onto the final string that gets run in the “Shell” command at the end.

If we zoom in on these values we can see they are only taking a few characters from each string.

The first number (green text) is the position to start taking from, and the second is the length to take.
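To check those substrings outside of Office, the three VBA functions can be sketched in Python like this. The one thing to watch is that “Mid” counts positions from 1 while Python slices count from 0. The example strings are made up, not taken from the macro.

```python
def left(text, count):
    """VBA Left: the first `count` characters."""
    return text[:max(count, 0)]


def right(text, count):
    """VBA Right: the last `count` characters."""
    return text[max(len(text) - max(count, 0), 0):]


def mid(text, start, count=None):
    """VBA Mid: `count` characters beginning at the 1-based position `start`."""
    if start < 1:
        raise ValueError("Mid positions start at 1")
    begin = start - 1
    return text[begin:] if count is None else text[begin:begin + count]


# Hypothetical junk strings, each contributing only a few characters.
pieces = [left("cmxyz", 2), mid("qqd.exq", 3, 4), right("zzzz/c", 2)]
print("".join(pieces))
```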


If we keep scrolling down we can see the IP that gets called out to.


We also see towards the bottom this interesting code.


We can see where it will possibly insert a break or clear formatting.

The GetTickCount to me seems like this might be some type of anti debugging or just another time waster. (Without verifying, you would think the tick count would always be greater. Tick count Explanation)

If it is less than 1.2 then it will change the output value to a garbage string, which will then get run by the “Shell” and fail.

Now for the “Shell”, which will run what got put back together.

The first part of that before the “+” is just junk code. It doesn’t do anything that I could find. In the Shell it is passing the rebuilt string and the numeric value that gets passed. (I didn’t do the math all of the way thru. )

Now that we have a real good idea of how this works how do we output this so we can see what the final string is before it gets executed ?

I tried to open it up in LibreOffice and modify the macro code but that didn’t work.

After building a new clean VM I installed a copy of Office Personal in there.

Let’s see what it looks like in the real Office.


We already have a pretty good idea of how this macro works, so let’s open this up, make some changes, and then save them.

I’m not sure if it would make any difference, but let’s comment out the section looking for is wow64.


Let’s also make a change to the GetTickCount to make sure it is not an issue.


We change the value to greater than the “1.2008” that gets checked later on.

For the final change, let’s replace the “Shell” with a MsgBox call instead.


And after saving the changes and clicking “Enable Content” we get this.


We can then left click on the MsgBox and hit Ctrl+C to copy the data and then paste it into Notepad.


One strange thing that happened was, when I clicked “OK”  the Document looked like this afterwards.


What Happened here ?


It looks like there is code here for clearing the formatting and the image.

Up higher it looks like this would work for an Excel sheet also.

And when we go to close it, it just asks us if we want to save the changes.


So the macro calls out to the IP with a random 7 character string and “.jpg”.

The function will choose a value between 97 and 122 which is the ASCII code range for lower case letters. For each random Number it will convert it to a lower case letter (ASCII Char code) and add that to the final value for a final length of 7.
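The name generation described here is only a few lines in any language. A minimal Python sketch of the same idea follows; the function name is mine and the macro’s own random number source is of course different.

```python
import random


def random_name(length=7, rng=random):
    """Pick `length` character codes in 97..122 and join them as lower case letters."""
    return "".join(chr(rng.randint(97, 122)) for _ in range(length))


print(random_name() + ".jpg")
```

Because the name is random on every run, the file name itself is useless as an indicator; only the 7 lower case letters plus “.jpg” pattern is stable.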

So that is that for the Decoding part.

The next problem was after enabling that content it would not “Un-Enable” no matter what the settings were.

So what is the Problem with that ?

After enabling the content once it now becomes a “Trusted Document” , the problem is how do you Un-Trust it again ?

We have to go to File –> Options –> Trust Center –-> Trust Center Settings (Button) –> Trusted Documents –> Clear all Trusted Documents …… (Button “Clear”)


I’m not finding a way to see a List of what is trusted or even that there are any trusted documents. Perhaps there is a screen I’m not seeing somewhere.

Also, I don’t know if this is a bug or not, but the “Allow documents on a network to be trusted” option seems to automatically recheck itself after I uncheck it, close the document, and reopen it.

So I used the Mantra “When in doubt run Procmon” to locate where these are.


I first set a filter for “Category is write” and then looked for the string “Trust”. Once I found this registry key I added a “Begins with” filter on the registry key, removed the other filter, and got the above view.

And If we look at it in the registry.


And if we Dump the Key.


I did a hash calc of the document after it was saved, so I have another question: what is the hash they are using?


I’ll also have to figure out the time format too.

Once we clear these (By clicking the Clear Button)  this key will be deleted and the Documents will no longer be trusted.

Well that’s it for this one. I hope you learned as much as I did.


A Look under the hood of a batch encrypted file

The sample in question today is thanks to a Twitter thread by Nick Carr @ItsReallyNick and Daniel Bohannon @danielhbohannon of FireEye located Here about this builder being used to encode batch scripts.

After downloading the sample from VirusBay @virusbay_io that Nick linked to, and after removing the first 2 bytes (byte order mark) from the file, I was able to open it up in Notepad++.

Here is what we are greeted with.


That is a lot to deal with so lets take a closer look.


Looking at this we can see several things that stand out. It is using environment variables  in the form of  “%os:~-4,1%” .

This is actually a 2 part operation. The left part, “%os%”, will get the expanded environment variable for the OS. The right part, separated by the “:”, is “~-4,1”, which gives the position to start getting characters at and the length. Notice here though that the first value is “-4”, so this means we start from the end of the expanded value, work back 4 characters, and then get 1 character.
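Here is a small Python sketch of that substring syntax, including a helper that expands every “%name:~start,length%” token in a line and passes plain text thru untouched. The function names and the one entry environment table are my own; “Windows_NT” is the usual value of %os% on Windows.

```python
import re

TOKEN = re.compile(r"%([^%:]+):~(-?\d+)(?:,(-?\d+))?%")


def batch_substring(value, start, length=None):
    """Emulate %var:~start,length%; negative numbers count from the end."""
    if start < 0:
        start = max(len(value) + start, 0)
    if length is None:
        return value[start:]
    if length < 0:
        return value[start:len(value) + length]
    return value[start:start + length]


def expand(text, env):
    """Replace each substring token whose variable is known; leave the rest as-is."""
    def replace(match):
        name, start, length = match.groups()
        value = env.get(name.lower())
        if value is None:
            return match.group(0)
        return batch_substring(value, int(start),
                               None if length is None else int(length))
    return TOKEN.sub(replace, text)


env = {"os": "Windows_NT"}
print(expand("%os:~-4,1%et", env))
```

The same helper works for the single quote variable once its value is known, by adding it to the `env` table under the key `'`.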

Lets see this in action on the command line.


So here we can see that the 4th value back is a “s”. We do the same for the others.

The other thing you may notice from the boxes in the screenshot above is some plain text in between the environment variables. When the text is encountered it will be passed onto the output as plain text and no need for other processing.

One final thing you may also notice is there is a very large block at the bottom of similar looking strings like the environment variable from above, but instead of something like “os” we have just a single quote (“'”). The only problem is we have to decode the top part to see if it tells us how to decode the bottom part.


So now we have enough information to build a tool to decode this based on the observations so far.

So after a day of building, testing and bug hunting we end up with this.


Here we can see the top seems to somewhat decode to something but the bottom part is just gibberish. So what is the problem and how do we figure it out?

Well thankfully Michael Bailey @mykill of the FireEye FLARE Team came out with a tool called “De-DOSfuscator” that works for this type and a blog post Here . After studying the blog post several different times I was noticing that the output of my tool was similar to what his tool output in Figure 7 was in his blog post . So I guess I’m on the right track but how to get the rest of this to decode.

If we take a closer look at the first part that gets decoded we can see that there is a set variable in 2 places  to set ‘ single quote  = [long string of characters] and then an “&” at the end.

We can see this better if we split all of the strings at a single “&” .



Now we can see how the value for working with the bottom part is set.

So after trying several variations of this string and no luck decoding any of those values in the bottom section I finally break down and install “De-DOSfuscator” on a VM. After several false starts and some help from @mykill I get it set up and running the way it is shown in the blog post.

By using this tool you don’t have to understand how cmd.exe parses the files as it lets the cmd.exe interpreter do the parsing and just logs the results.

You may notice that one of the commands in this decoded part of the script was a shutdown command. Upon running the tool and the batch file I was not disappointed and the VM started to shut down, but not before saving a log file thru  “De-DOSfuscator”  of what commands it had run up to then. Here is what I saw upon restarting the VM.


Although the output is very similar in my tool as it is here, something is different.


If you are able to zoom in, my tool output a much longer string. So what is the difference ?

As “De-DOSfuscator” intercepted the parsed values, cmd had already peeled off the extra “^” characters.

If we download the “DOSfuscation” White Paper from Here we can see some information about the use of the character “^” on page 13, and on page 18 we can see a screenshot of a script similar to what we are working on here.

So the next step is to hard code this value into my tool instead of extracting it and using the raw string and see if it will decode the remaining values correctly.


Great, it looks like it decoded part of it but the rest is still a mess so another new tool to just work with this part.

We now take the key/ string value from this tool and load it into the new tool along with the full section of remaining index values that start with the “%’:” (percent, single quote, colon.)


This tool will extract the decoded string thru to the final “echo “ and double quote  and then also return the remaining unused variable indexes.

Oh , this looks like there is another layer with a different index string/key.

While comparing the output from “De-DOSfuscator” to the decoded value from my first tool, I discovered that what I needed to do was a string replace of “^^^” with “^” to get the correct index string/key. I did it by hand the first time thru, then added an option to the tool to do this automatically.

So after multiple passes we get to level 11 and we can see that we have whittled down on the array values quite a bit.


And finally pass 12.


Further testing of this tool to figure out what the last remaining values were revealed a bug in the way it extracts the remaining values to output. The program had reached the end and wrapped back around to zero, so it output the entire input string instead of returning nothing. I’ll have to fix that.

A closer view.


This took 1 pass to get the original “key” and 11 passes to get the final decoded string.

So thanks Nick Carr @ItsReallyNick  for trolling Daniel Bohannon @danielhbohannon .

This was a very interesting learning experience.

That’s it for this one. I hope you learned as much as I did.

Thanks for reading.


Understanding Invoke- “X” Special Character Encoding

I say Invoke- “X” because it can be found in both Invoke-Obfuscation and in

We can find a reference to the encoding scheme in this Twitter thread Here where @danielhbohannon references the blog post from 2010 by @mutaguchi, which demonstrates a “Hello World” encoded string. I had to translate the post to view it. You can find the post here .

We can also find the link to the site in the Invoke-Obfuscation master folder in the script “Out-EncodedSpecialCharOnlyCommand.ps1”.

The script we are going to be working with today is from another Twitter thread on September 12 2018 located Here . It is a pastebin link from @James_inthe_box.

Here is what this script looks like.


And a smaller sample view.


Just looking at this it looks like total junk code.

After reading the other blog post we have a few ideas of how to work with this, so let’s clean this up a bit. The first thing we want to remember is that the character “;” is used as a command separator, so let’s separate these onto new lines to make it easier to read.


Now that we have the commands on their own lines we need to understand what the first one is doing.


What this first command is doing is creating a hash table to contain the values on the left side of the “= ++” to the hash table name of “${‘].}” on the right hand side.

As it goes down the list it will set the index position in the hash table equal to the value inside of “{ }” on the left.

What this will do next , or as it sets the values it will do a string replace or “lookup” of the value and the string like “${$}” will get replaced with the number 0 on the rest of the script.

Here we see what happens when we replace each value with the index number.


(I’ve restored our left hand values after doing the replacements. Always do replacements on a copy)

Now lets take a look at the next command and see what it is doing.


As we can see we now have some number inside of the “[]” like this “$(@{})”[  7  ]”

The best I understand is that this is taking the type name “System.Collections.Hashtable”, which is what the hash table turns into when it is converted to a string, and in this case taking the character at index 7 to build a string.

So if we take an indexed list of that string and get the character at index 7 we end up with “C”.


So we go thru and replace the 3 characters and then get to this one.


In Short the “$?” will evaluate to true or false if something succeeds or failed. In this case what it gives us is the string “True” and then we take the character at index 1 of that string which  = “r”.

So now that gives us. “${*@}  =  “[Char]”  ;” and we can do the replacements for that.


Our next line will do replacements similar to the one we just did so lets do those.


So now we can see that it decoded to the string “insert”. But the way it is called it will set the value of “${‘].}” to the signature of the function in the form of
“string Insert(int startIndex, string value)” and you can find a list here.

So now we have 2 “+” on this next line. The first 2 values are like the last 2 lines, so let’s do those replacements.


Now for the last value we are setting the value of  “${‘].}” to (“ie” + the Insert Sig.) character at index 27.


So index number 27 = “x”  so that makes our string now “iex”
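The index lookups from the last few steps can be double checked with plain string indexing. In this Python sketch the three strings are what I understand the PowerShell expressions to turn into as text; the exact signature text (no space before the parenthesis) is an assumption on my part, and it is the form that puts “x” at index 27.

```python
HASHTABLE_TEXT = "System.Collections.Hashtable"  # text form of "$(@{})"
TRUE_TEXT = "True"                               # text form of "$?" after a success
INSERT_SIGNATURE = "string Insert(int startIndex, string value)"  # assumed exact text

print(HASHTABLE_TEXT[7])      # character used for "[Char]"
print(TRUE_TEXT[1])           # character used for "[Char]"
print(INSERT_SIGNATURE[27])   # last letter of the command name

command = "ie" + INSERT_SIGNATURE[27]
print(command)
```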


So the last step for this level of encoding is to do the char code replacements.

There are multiple ways to get the char codes decoded from this point but I will go thru and format it so I can just run it thru my tool.
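One of those other ways is a short regular expression. This Python sketch pulls every “[Char]NN” out of a line and joins the characters; the sample line is made up to spell the command from above and is not copied from the script.

```python
import re

CHAR_CODE = re.compile(r"\[char\]\s*(\d+)", re.IGNORECASE)


def decode_char_codes(script):
    """Join the characters for every [Char]NN found in the text, in order."""
    return "".join(chr(int(code)) for code in CHAR_CODE.findall(script))


sample = "[Char]105 + [Char]101 + [Char]120"  # hypothetical input
print(decode_char_codes(sample))
```

Anything between the char codes (the “+” signs, spaces, quotes) is simply ignored, so the line does not need much cleanup before decoding.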


You may also notice that there is a “|” and the variable name for “iex” at the end here.


The Final Decode.


In the usual fashion, after going thru this by hand I like to build a program so I can just copy and paste the encoded string, click a button, and get the decoded value back.


As you can see from the output, the decoding for this piece of malware is far from complete, but this is as far as we will go with it in this post.

Thanks for reading.


What is in this file ?

The other day I was pinged about a very large .json file that appeared to contain a large Base 64 string that took up almost all of the file. There was a problem extracting the base64 string due to the size of the file. See the Twitter thread Here and the location for the json file Here on https://beta.virusbay.io/ .

When we first look at this file it is in fact a json file.


The size of this file is 27,151,069 (0x19E4ADD) bytes / chars. In my experience anything over 500K chars will take too long to render in a “Windowed” tool, which also explains why Notepad++ choked on this file and sent my CPU into overdrive trying to render the text. I do have some tools that I had to increase to 700K to handle the sample code, but they take a bit longer to render and refresh. For whatever reason the hex editor seems to handle very large files fairly quickly. This is one reason why I usually reach for a hex editor first to look at files.

After scrolling thru this file with a hex editor to verify that this is just one long base 64 string instead of many smaller ones, we need to make a working copy.

Once we have a copy of this file, we can’t mess up the original and can always start over in case we make a mistake. We need to open the copy in write mode, remove everything that is not part of the base 64 string, and save the file.

Next we need a tool that can input this file and then output a new file that is base 64 decoded.

I wrote a tool just for dealing with large base 64 strings like in this case. You could also use Python or even PowerShell or another language to do the base 64 decoding. Just stay away from a windowed tool for this file.
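For the Python route, here is a minimal sketch of a file-to-file decoder that never loads the whole string at once. It is not the tool mentioned above. It assumes the working copy contains only the base 64 text (plus optional line breaks), and the function name and file names are made up for the example.

```python
import base64

def decode_base64_file(src_path, dst_path, chunk_size=4 * 1024 * 1024):
    """Decode a Base64 text file into a binary file, one chunk at a time."""
    leftover = b""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            # Drop whitespace and line breaks, then prepend what was held back.
            data = leftover + b"".join(chunk.split())
            # Base64 decodes in groups of 4 characters; hold back the remainder.
            usable = len(data) - (len(data) % 4)
            dst.write(base64.b64decode(data[:usable]))
            leftover = data[usable:]
        if leftover:
            # Pad a trailing partial group so the decoder accepts it.
            dst.write(base64.b64decode(leftover + b"=" * (-len(leftover) % 4)))

# Example call (hypothetical file names):
# decode_base64_file("working_copy.b64", "decoded.bin")
```

Reading in fixed-size chunks keeps memory use flat no matter how large the input is, which is the whole point with a file this size.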

After we get the file base 64 decoded we can then open this file up in the hex editor again and we see this.


Going into this blind with no information to go on all we have to figure this out is the way this file “Looks”.

What is the first thing we notice ? There are several repeating strings.

So this is encoded somehow but by what method ?

If this were AES or RC4 or some other type of encryption, there should be more diffusion of the encoded characters.

What else could this be ? From experience my first guess would be Xor. It could also be adding or subtracting a value for the Char code.

So what happens if we take the first 4 bytes in the third line that repeats several times and Xor the first part of this file with those values?


At the top we see the “MZP”, and in the highlighted part we see the “PE” part of an executable file header.

So we are on the right track but don’t yet have the full key, so let’s take 12 bytes up to the first “FF” and see what we get.

That is worse, so let’s reduce the length until we get back to 4 bytes or we get something better. Well, that didn’t help, so let’s just go to 16 bytes and see what that does.

Well 16 bytes (just a random amount to try) gave us some more information on this file.


If we look at a normal file that starts with “MZP” we see this.


Since this appears to be what the decoded file looks like, we can take all of the bytes from the start to the end of the “Win32” and Xor them with the same number of bytes from the encoded file to possibly get the decoding key.


As we can see here we have a repeating pattern which is our decoding key.

Here we are doing a “known plaintext attack”: any known plaintext Xor’ed with its corresponding encoded text will reveal the pattern and thus the key.
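The idea can be sketched in Python. This is an illustration only: the key below is made up for the demo (it is not the key from this sample), the helper names are hypothetical, and the known plaintext is assumed to line up with the start of the encoded data.

```python
def recover_keystream(encoded, known_plaintext):
    """Xor known plaintext against the encoded bytes to expose the key stream."""
    return bytes(e ^ p for e, p in zip(encoded, known_plaintext))

def shortest_period(stream):
    """Return the shortest prefix that, repeated, reproduces the whole stream."""
    for size in range(1, len(stream) + 1):
        if all(stream[i] == stream[i % size] for i in range(len(stream))):
            return stream[:size]
    return stream

# Demo with a made-up 4-byte key and a stub string found in "MZP" files.
known = b"This program must be run under Win32"
demo_key = bytes.fromhex("1a2b3c4d")
encoded = bytes(b ^ demo_key[i % len(demo_key)] for i, b in enumerate(known))

stream = recover_keystream(encoded, known)
print(shortest_period(stream).hex())  # 1a2b3c4d
```

The longer the known plaintext, the more repeats of the key show up in the stream, which makes the period easier to trust.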


Our final step is to use a tool that will Xor the full file by the key we just extracted.

Again, due to the size of the file I would warn away from Windowed tools for this.
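A command line script handles this without any rendering at all. The following is a rough Python sketch of a repeating-key Xor over a whole file, not the tool used here; the function name, file names and key value are placeholders.

```python
def xor_file(src_path, dst_path, key, chunk_size=1024 * 1024):
    """Xor a file with a repeating key and write the result to a new file."""
    offset = 0  # absolute position in the file, keeps the key aligned
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            dst.write(bytes(
                b ^ key[(offset + i) % len(key)] for i, b in enumerate(chunk)
            ))
            offset += len(chunk)

# Example call (hypothetical file names and key):
# xor_file("decoded.bin", "final.bin", bytes.fromhex("1a2b3c4d"))
```

Tracking the absolute offset matters when the chunk size is not a multiple of the key length, otherwise the key would restart at the beginning of every chunk.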

Here we have the final decoded binary.


If you looked at the Twitter thread you may have noticed this turned out to be “banload” malware.

That’s it for this one. I’m sure there is an easier way to do this, but that is just what I tried in real time on the Twitter thread.

Going in with only the file to look at, we cannot be sure of the contents.

So take what we know and apply it to what we don’t know and keep learning.


A look at a Word document macro using Invoke-DOSfuscation

The sample for this one comes from Packet Wire @packet_Wire. Twitter thread here 

After getting the location of the Word document I downloaded it. The file name was “Auditor-of-State-Notification-of-EFT-Deposit”, with the following hash values.

Sha1: 4C7C8B1897CA22E4E477C361DAF676D471A4F4AF
Sha256: EBDA287F6B33A0C7A689E1D8FDE7ABC708C9DFBCA2759A56CD055868B2CC0911
MD5: 35756ECC87405E42F62DEEEEF18FD43A

Let’s dive into the macro.

Using LibreOffice we can see the code under Document objects –> vUwdwkwHZAwSRz –> Private Sub Document_open(). It will eventually launch VBA[.]Shell with a reassembled string.


Looking at the bottom we see it contains 3 scripts.


So we can now extract these 3 scripts and open them up in Notepad++ or your favorite text editor.

Before we leave the document we also want to check if there are any properties we may need for decoding, or any forms with pre-filled values that we may need later. In this case there are none.


The screenshot above goes over the various parts that make up this “Style” of encoding. I have decoders/string builders for 5 other versions and I am sure I have missed a few more in-between.

If we search for the two values at the beginning of the “VBA.Shell” line and to the left of the “CVar(“C”)”, there are no other hits on the names. Those are just junk that was inserted, and when they are not found they get handled by the “On Error Resume Next” routine.

If we take a look at the script we saved as “BlUkafEw” then we can see that this entire script contains junk code that does not get used except as a distraction/time waster.


One of the next things we may notice is the heavy use of  “CStr”.


Here we see that “CStr” wraps the “Chr” function to convert a numeric char code to a string. Searching for the other value names inside currently returns no other hits, so in this case they can safely be ignored.

That leaves us with the value of CStr(Chr(99)), which is “c”.

Next let us put both of the remaining scripts into 1 document to make searching for strings easier between the 2 scripts.

As we look and search for the string to get rebuilt we can get a count of how many times the string is found. If it is only found once then it is most likely junk. That does not mean that earlier in the code it does not decode to the string you are looking for.
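Counting hits is simple to script. This is a minimal Python sketch that assumes the two scripts have been pasted into one string; the VBA snippet and every name in it are invented for the example and do not come from the sample.

```python
import re

def count_hits(source, name):
    """Count whole-word hits of a name. VBA names are not case sensitive."""
    pattern = r"\b" + re.escape(name) + r"\b"
    return len(re.findall(pattern, source, re.IGNORECASE))

# Invented VBA-like text, only for the demo.
sample = """
Function qwZk()
qwZk = "c" + "m"
End Function
Sub Document_open()
x = jUnk1 + qwZk()
End Sub
"""

print(count_hits(sample, "qwZk"))   # defined, assigned and called
print(count_hits(sample, "jUnk1"))  # a single hit, so most likely junk
```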

In this case no. We still have junk as the second value after “CStr”.


The next value though has 3 hits.


So let’s take a closer look at this first function that will reassemble part of the final string.


Doing a few string searches we can see that this first function contains a lot of junk code, so let’s clean it up to get a better idea of how the function works.


Now we can have a better understanding of how this works. The original line of code will call each function name and then each function will reassemble a string and return that to be tacked onto the original call.

So one of the first things we will want to do is go thru all of the “CStr” functions and replace those with the char for that char code.

So another new tool. We input the combined 2 scripts and the string we are looking for “CStr” and it will find and replace all of them with the char.
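For anyone who wants to try this step without building a windowed tool, here is a rough Python sketch of the same find and replace. It is not my tool, and it assumes the calls have already been cleaned down to the plain `CStr(Chr(99))` form with no junk values left inside; the function name is made up.

```python
import re

# Matches CStr(Chr(99)), also with ChrW or Chr$ and optional spaces.
CHR_CALL = re.compile(
    r"CStr\(\s*Chr[W$]{0,2}\(\s*(\d+)\s*\)\s*\)", re.IGNORECASE
)

def replace_char_codes(vba_source):
    """Replace each CStr(Chr(N)) call with the quoted character it produces."""
    def to_literal(match):
        char = chr(int(match.group(1)))
        # A double quote inside a VBA string literal is written twice.
        return '"' + char.replace('"', '""') + '"'
    return CHR_CALL.sub(to_literal, vba_source)

print(replace_char_codes("x = CStr(Chr(99)) + CStr(Chr(109))"))
```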


Now that the char codes are replaced with the Char we can go onto the next tool that will reassemble the final string.



Now that the easy part of rebuilding this long encoded string is complete we now move on to the harder part, interpreting what this is doing.

After viewing this Twitter thread from Shiao Qu @ShiaoQu17 here, it got me looking at this in a different light. We need to break up the string into sections.


The first section that is highlighted will build the command for cmd.exe to launch the decoded part.

The next section is the Encoded string.

The final section is the directions how to decode the center part.

Here is part 1 cleaned up a bit. Although I still don’t fully understand “exactly” everything it does, I do have a real good idea how it works.

Part 1 Cleaned

In the second part, everything after the first equals sign to the closing parentheses “)” is the encoded text we will be working with.


In part 3 we will get an idea of how this string in section 2 gets decoded.


That is still hard to read, so let’s clean that up some.


I may have gotten carried away cleaning this string, but the important part for me is the 2 values highlighted, “2153, -6”.

What this does is take the encoded string and set an index value of “2153”, which is the string length, take the last character in the string, and then step backwards 6 chars to the next value. So our first char will be “p”, and then we count backwards every 6 chars to get our decoded string.

In my tool I just reverse the string and count forwards.
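Both ways of reading the string can be sketched in Python. This is only an illustration with a short made-up string, not the 2153 char sample; it assumes the start index points at the last character and that junk chars have already been cleaned out so the spacing is exact. The function names are hypothetical.

```python
def decode_backwards(encoded, start, step):
    """Start at an index and walk backwards, keeping every step-th character."""
    return encoded[start::-step]

def decode_reversed(encoded, step):
    """Same result the other way around: reverse first, then count forwards."""
    return encoded[::-1][::step]

# Build a small demo string: the message reversed, with 5 filler chars
# between each real character, so every 6th char from the end is real.
message = "powershell"
encoded = "#####".join(message[::-1])

print(decode_backwards(encoded, len(encoded) - 1, 6))
print(decode_reversed(encoded, 6))
```

Both calls return the same text, which is why reversing the string first is a handy shortcut when doing this by hand.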

I saw the video here by Karsten Hahn @struppigel for a different type, and he does a better job of explaining the way that the call works at the end for that type and returns the decoded value.

The trick here is that cmd.exe will automatically remove the extra characters as the command runs. If the string is not cleaned correctly when decoding by hand, the index will be off and return the wrong output value for the decoded script. One other thing I found: in PowerShell we generally get rid of the tick mark “`” as an escape char, but here it appeared to be paired with “^”, looking like it would be removed also. After more testing I only ended up removing the cmd escape char “^”. Or, in the case of my current tool, I do a string replace of “^`” with “*”, which is one character that was not used, in order to keep the proper length. The replacement really was not needed in this case.

Here is the decoded script and tool.


And the final decoded script formatted for better reading.


This looked very complicated from the start, especially if you already read the scripts of the builder here by Daniel Bohannon @danielhbohannon or read the white paper located here, but it did not turn out to be as difficult as it first appeared to beat the encoding.

If you don’t care to really understand how it works you can always drop the document on ANY.RUN @anyrun_app, like this sample was here .

Just scroll down the right side till you find the powershell.exe process and select that. Then view more information to get this screen.


I still have a lot more to learn about how some of the obfuscation works. I was still able to extract the final payload which was the whole point anyway.

That’s it for this one. I hope everyone learned as much as I did.
