Shellcode comes in various forms for different operating systems. Some can simply be dropped into a hex editor to understand what it is doing. Others require looking at the assembly code generated by a disassembler, or a specialized tool that understands the type of shellcode you are working with.
The one constant across the various samples I’ve looked at is that the shellcode is used as a form of obfuscation to download the final malware.
Here we will concentrate only on the Windows PowerShell versions.
Let’s start by taking a look at a “Daily Script” from December of 2017. Here is the Twitter reference for this sample.
First we need to convert the character codes to characters.
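As a rough sketch of this step (not the tool used in this post), the conversion takes only a few lines of Python. It assumes the codes are plain decimal numbers separated by commas or other non-digit characters. The function name and the sample input are made up for illustration.

```python
import re


def char_codes_to_text(blob: str) -> str:
    """Turn a run of decimal character codes into the text they spell."""
    # Assumption: every run of digits in the input is one character code.
    codes = re.findall(r"\d+", blob)
    return "".join(chr(int(code)) for code in codes)


sample = "73,69,88"  # made-up input, not taken from the sample
print(char_codes_to_text(sample))  # IEX
```

Because it grabs every run of digits, the same helper also copes with codes wrapped in other syntax, as long as no unrelated numbers are mixed in.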
After that we get a base64 string.
Decoding that gives us a Gzip-compressed PowerShell script. After we decompress that we see this.
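The base64 and Gzip layers can be peeled off with the Python standard library. This is a minimal sketch, assuming the layer really is Gzip (some scripts may use a different compression stream) and that the inner script is UTF-8 compatible text. The function name is my own.

```python
import base64
import gzip


def decode_gzip_layer(b64_text: str) -> str:
    """Base64 decode a string, then Gzip decompress the result to text."""
    compressed = base64.b64decode(b64_text)
    # Gzip data starts with the magic bytes 1F 8B; fail early if they are missing.
    if compressed[:2] != b"\x1f\x8b":
        raise ValueError("not a Gzip stream")
    # Assumption: the inner script is UTF-8 compatible text.
    return gzip.decompress(compressed).decode("utf-8", errors="replace")


# Round trip with a harmless made-up script to show the direction of each step.
layer = base64.b64encode(gzip.compress(b"Write-Host 'hi'")).decode("ascii")
print(decode_gzip_layer(layer))  # Write-Host 'hi'
```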
Here we see a base64 encoded string. This is our encoded shellcode. It will get loaded into virtual memory and run. The exact implementation may vary a little but this is what I mostly see.
That brings us to the shellcode, which is what we are after.
Now we can base64 decode it to hex.
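For reference, here is a small sketch of the base64-to-hex step in Python. The sample string is made up: it is just the five bytes FC 48 83 E4 F0 encoded as base64, not data from the sample.

```python
import base64


def b64_to_hex(b64_text: str) -> str:
    """Decode base64 and return the bytes as an uppercase hex string."""
    return base64.b64decode(b64_text).hex().upper()


print(b64_to_hex("/EiD5PA="))  # FC4883E4F0
```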
So now what do we do with it?
We drop the hex into a hex editor, and now we can see the URL it was calling out to. If we look higher up we can also see a User-Agent string.
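What the eye does in the hex editor can be approximated with a simple "strings" pass over the decoded bytes. This is only a sketch: the minimum length of 6 is an arbitrary choice of mine, and the test bytes are invented. It only finds single-byte ASCII text.

```python
import re


def printable_strings(data: bytes, min_len: int = 6) -> list:
    """Return runs of printable ASCII that are at least min_len bytes long."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [match.decode("ascii") for match in re.findall(pattern, data)]


# Invented bytes: some opcodes, a User-Agent style string, and a path.
blob = b"\xfc\x48\x83" + b"Mozilla/5.0 (compatible)" + b"\x00\x01" + b"/index.html" + b"\x00"
for text in printable_strings(blob):
    print(text)
```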
Next we look at this sample, found here on VirusTotal, from November of 2018.
Here we start with only a base64 encoded script.
Now we have a base64 encoded Gzip script.
Now we see the familiar base64 encoded shellcode, so let’s decode that to hex and drop it into a hex editor like last time.
Well, that’s not too helpful. Now what?
Let’s try CyberChef here and look at the assembly.
Well, that doesn’t look like much help either.
What else can we do? We have John Lambert’s “PyPowerShellXray” here, or we have scdbg, found here.
After working with both: psxray needs the PowerShell script that contains the shellcode, while scdbg needs only the cleaned hex of the shellcode, so you still have to base64 decode to hex before using scdbg. Let’s see what those two show us.
Here we can see some Windows API calls using psxray, but something doesn’t look quite right. The ws2_32 string, which gets pushed in reverse, is not shown in full. If we modify the Python script to use the 64-bit version of the backend API for this tool we get the full API name, but the rest of the values don’t look the same.
So what about scdbg then?
It didn’t find anything because scdbg only works on 32-bit shellcode and this is 64-bit.
So now what?
In order to save a step we can also just input the base64 string.
Looking at the way John Lambert’s tool parsed the hashed API calls, I wanted to be able to do the same thing, but as a copy and paste instead of having to run it through the VM/Python process.
Another new tool.
But how do we find these hashes?
As it turns out, psxray has a prebuilt list of hashes for the function calls. I had to convert those to individual dictionary items for each API to be able to use them in this new program. Due to the sheer number of them, I first had to build a program to do the conversion and generate the VB.NET code for me. Then I could use the generated code to search for the API calls.
If we take a closer look at the output of my tool we see “found at index”. This is the string index, not the byte index, so you would have to divide it by 2 if you were searching for the byte offset in a hex editor. Another thing you will notice is that the order in which the hash is found in the file is reversed compared to how it appears in the assembly or in the tool’s database.
That is why I put both the normal order found in the file and the “ASM Order” in the output.
Another odd thing I ran across in one sample was a hash value found at an odd offset. Closer inspection of the assembly and the found value showed it was a false positive. All of the normal offsets are divisible by 2, so a hit at any odd offset may be false.
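The index, byte order, and odd-offset points above can be shown with a short Python sketch. This is not the VB.NET tool, just the same idea. The one-entry table is illustrative: 0x0726774C is the commonly published Metasploit hash for kernel32.dll!LoadLibraryA, and a real table would come from the psxray list or the hash database. The field names are my own.

```python
# Hashes keyed in "ASM order", the way they read in a disassembly.
KNOWN_HASHES = {
    "0726774C": "kernel32.dll!LoadLibraryA",  # illustrative single entry
}


def find_api_hashes(hex_text: str, known=KNOWN_HASHES) -> list:
    """Search a hex string for known hashes stored in file (byte-reversed) order."""
    hex_text = hex_text.upper()
    hits = []
    for asm_order, api in known.items():
        # The bytes sit in the file in the reverse order of the assembly operand.
        file_order = bytes.fromhex(asm_order)[::-1].hex().upper()
        start = hex_text.find(file_order)
        while start != -1:
            hits.append({
                "api": api,
                "file_order": file_order,
                "asm_order": asm_order,
                "string_index": start,        # index into the hex text
                "byte_offset": start // 2,    # offset to use in a hex editor
                "suspect": start % 2 == 1,    # odd index straddles two bytes
            })
            start = hex_text.find(file_order, start + 1)
    return hits


for hit in find_api_hashes("FC684C772607FFD5"):
    print(hit)
```

In the made-up input the hash bytes 4C 77 26 07 start at string index 4, which is byte offset 2. A hit at an odd string index cannot line up with whole bytes, so it is flagged as suspect.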
While investigating how the hashed API names worked for my Office Equation blog post here, I found a FireEye post from 2012 here about using precalculated string hashes, with instructions on how to generate your own SQLite database of known hashing algorithms and values. I will include the ones I generated as a lookup database for unknown hashes.
I was able to use this database to generate the remaining code for the tool above, covering hashes I had run across that the list from John Lambert’s tool didn’t include.
This sample, found on VirusTotal here, was a strange one. It was originally found on Pastebin by Paul Melson’s (PaulM @pmelson) ScumBots bot (@ScumBots) and uploaded to VirusTotal.
When we first look at this script, one thing we notice is that it starts with a very large base64 string. The second thing is that it is broken up with ‘+’ strings to trip up automated base64 decoders that can’t put the string back together, so we need to remove those first.
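Here is a hedged sketch of that cleanup in Python. It assumes the breaks are quote, plus, quote joins (PowerShell string concatenation), which is my reading of the sample and may not match every script. A bare ‘+’ cannot simply be stripped, because ‘+’ is also a valid base64 character. The function name is hypothetical.

```python
import re


def clean_split_base64(text: str) -> str:
    """Rejoin a base64 string that was split into concatenated quoted pieces."""
    # Assumption: pieces are joined like 'AAAA'+'BBBB' (spaces allowed around +).
    joined = re.sub(r"""['"]\s*\+\s*['"]""", "", text)
    # Drop the remaining outer quotes and any whitespace.
    return re.sub(r"""['"\s]""", "", joined)


print(clean_split_base64("'SQBF' + 'AFgA'"))  # SQBFAFgA
```

A ‘+’ that is part of the base64 data itself is left alone, since it is not wrapped in quotes.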
After we clean up the base64 string and base64 decode we see this.
NOTE: I have tested this in psxray and it will fail to parse this type.
If you zoom in on this picture you can see that this has a base64 encoded executable file embedded in it. Let’s extract that and take a quick look at it first.
It looks like the script will load this DLL, an AMSI bypass method, which will then load the shellcode.
Now let’s take a closer look at this shellcode. It doesn’t start with the normal “0xFC”.
That’s hard to read, so let’s format it a little to get a better view of what is happening.
Looking where the blue dot is we can see that this shellcode has been split apart into arrays and will get reassembled at run time.
So let’s reassemble it. (New tool)
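The reassembly idea can be sketched in Python as follows. Everything about the layout here is an assumption: the array names, the 0xNN literal format, and the join order are hypothetical stand-ins, since the real script defines its own variables and order.

```python
import re


def parse_byte_array(text: str) -> bytes:
    """Collect every 0xNN literal in the text, in order, as bytes."""
    tokens = re.findall(r"0x([0-9A-Fa-f]{1,2})", text)
    return bytes(int(token, 16) for token in tokens)


def reassemble(chunks: dict, order: list) -> bytes:
    """Join the named chunks in the order the script would at run time."""
    return b"".join(parse_byte_array(chunks[name]) for name in order)


# Hypothetical chunks; names and contents are made up for illustration.
chunks = {"a": "0x48,0x83", "b": "0xe4,0xf0"}
print(reassemble(chunks, ["a", "b"]).hex().upper())  # 4883E4F0
```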
Now that it is reassembled we can input it into our tool to get the IP/URL.
And also the API calls. I created this tool to give more insight into what gets called, so you get a better understanding of what the shellcode is doing, not just the IP or URL that would show up from running it in a sandbox.
One other thing to note is that I have a checkbox for each API that gets parsed so the ones that show up as “No Hashes Found” can be unchecked and then you can rerun it to get a cleaner output.
This is another strange sample. As of this writing it is still on Pastebin here, and it is another sample found by Paul Melson’s (PaulM @pmelson) ScumBots bot.
We start out like normal with PowerShell and a large base64 string.
After base64 decoding we now have a base64 Gzip string.
Now that we have decompressed this level we can just take the base64 encoded shellcode and drop it in our tool to extract the IP/URL.
OK, so what is “Shikata Ga Nai encoded shellcode”? This one had me stumped for a bit because there were no really clear explanations of how it decodes at the byte level without using other tools.
Note: psxray has the function to decode this type of shellcode. scdbg does not work for this type.
This article here was the closest thing I found that helped me work this encoding out. The encoder itself is in the “metasploit-framework”, found here.
Its description is “polymorphic XOR additive feedback encoder”. Yeah, that description really helps.
After reviewing the article and anything else you can find online about it, let’s drop the cleaned hex shellcode into our friend CyberChef. You will also notice a difference between the CyberChef output and what psxray outputs.
(This screenshot is from my original research.)
The CyberChef output is before decoding and the psxray output is after it is decoded.
Here are my notes on how this decodes. It starts out with an XOR key, which changes from sample to sample, and an addition value that gets added each round.
You add the decoded value to the current key to get the 32-bit value for the next key.
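These notes translate into a short loop. The following Python is a minimal sketch under my own assumptions, not the Metasploit source or psxray's code: it works on 4-byte little-endian blocks, takes the starting key as an argument, and ignores any trailing bytes that do not fill a block. It also does not locate the key or the start of the encoded data for you.

```python
import struct


def xor_additive_feedback_decode(encoded: bytes, key: int) -> bytes:
    """Decode 4-byte blocks: XOR with the key, then add the result to the key."""
    out = bytearray()
    usable = len(encoded) - (len(encoded) % 4)  # whole 4-byte blocks only
    for offset in range(0, usable, 4):
        (block,) = struct.unpack_from("<I", encoded, offset)
        plain = block ^ key
        out += struct.pack("<I", plain)
        # Feedback step: next key = current key + decoded value, kept to 32 bits.
        key = (key + plain) & 0xFFFFFFFF
    return bytes(out)
```

Because each key depends on the previous decoded block, starting at the wrong byte or with the wrong key produces garbage for everything that follows.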
The next thing that needs to be figured out is where the encoded data starts. In this case, the difference screenshot shows you where it starts: it is the point where the two outputs begin to differ.
Another way is to look for odd or messed-up assembly instructions at the beginning of the CyberChef assembly.
Now we can just drop the decoded shellcode back into our IP/URL parser tool.
One other thing to note: if you cannot figure out where the encoded shellcode starts, just drop the entire shellcode after the key into the decoder and decode it. Then remove 1 byte (2 characters) at a time from the beginning until you see this value, or more plain text, show up in the output.
That is the string representation of “LoadLibraryA”
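The trial-and-error search for the start can be automated along these lines. This is a sketch only: the decode helper repeats the XOR-and-add loop from my notes under the same assumptions (4-byte little-endian blocks, key supplied by you), and the marker is whatever value you are watching for. The `max_skip` limit and all names are my own choices.

```python
import struct


def decode_from(encoded: bytes, key: int) -> bytes:
    """XOR each 4-byte block with the key, then add the decoded value to the key."""
    out = bytearray()
    for offset in range(0, len(encoded) - len(encoded) % 4, 4):
        (block,) = struct.unpack_from("<I", encoded, offset)
        plain = block ^ key
        out += struct.pack("<I", plain)
        key = (key + plain) & 0xFFFFFFFF
    return bytes(out)


def find_start(blob: bytes, key: int, marker: bytes, max_skip: int = 64):
    """Drop one leading byte at a time until the marker shows up in the output."""
    for skip in range(min(max_skip, len(blob)) + 1):
        if marker in decode_from(blob[skip:], key):
            return skip  # number of bytes to remove from the front
    return None  # marker never appeared within max_skip bytes
```

The return value is the number of bytes to strip from the front, so double it to get the number of hex characters to delete.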
There are some more strange types I would like to go through, but this is starting to get long.
Here is a list of the tools I am including in the release.
All of these tools have been used in the decoding and extraction of the shellcode.
In the base64 decode tool there are 2 buttons on the left: decode as UTF-8 and decode as Unicode. Most PowerShell scripts that are base64 encoded will need the Unicode button.
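To show why the choice matters, here is a small Python sketch of the two decode modes. It treats “Unicode” as UTF-16LE, which is what that name means in .NET. The sample string is made up and the function is not part of the released tools.

```python
import base64


def decode_b64_text(b64_text: str, unicode_mode: bool = True) -> str:
    """Base64 decode, then read the bytes as UTF-16LE ("Unicode") or UTF-8."""
    raw = base64.b64decode(b64_text)
    encoding = "utf-16-le" if unicode_mode else "utf-8"
    return raw.decode(encoding, errors="replace")


sample = "SQBFAFgA"  # made-up input: "IEX" stored as UTF-16LE
print(decode_b64_text(sample, unicode_mode=True))         # IEX
print(repr(decode_b64_text(sample, unicode_mode=False)))  # 'I\x00E\x00X\x00'
```

Decoding UTF-16LE data as UTF-8 leaves a null character after every letter, which is usually the sign that the other button is the right one.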
To extract as hex you have to check the box and select the encoding to extract as. Most of the time it will be 1252 from the dropdown list. This list is filled by a function that gets the supported encodings for the system it is run on.
If there are any questions or problems just contact me on Twitter @Ledtech3.
VT Link for sample
CyberChef Link for X86 assembly.
ScDbg Link to site
FireEye post on precompiled hashes Link
My Blog post on Equation Editor Shellcode Link to
VT Link for this sample
Pastebin Link to sample
Github Link to the tools and files used here.
Again, there was a lot more that I would have liked to go through.
I hope you learned as much as I did.