In this post I take a deep dive into how the obfuscation works and what is required to de-obfuscate it.
This sample comes from @James_inthe_box, who posted it here https://twitter.com/James_inthe_box/status/928644055054946305 on November 9th, 2017.
Here is the link to the Pastebin copy of the script if you want to follow along: https://pastebin.com/P5AK7div
When we first look at it we see this.
That is the full script. The first thing you may notice is that it comes in 3 separate parts, so we will tackle them 1 part at a time.
If we look at the top of Section-1 we see it is being set as a variable and there is a long array of binary strings.
And the bottom.
This is fairly straightforward. It takes each binary string in the array, converts it to a 16-bit integer (Int16), and then casts that to a Char. Let's take a closer look at that little piece on the end.
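To make the decoding step concrete, here is a minimal Python sketch of the same idea: parse each string as base 2 and turn the resulting number into a character. The function name and the sample values are my own illustration and are not taken from the script.

```python
def decode_binary_array(binary_strings):
    """Turn a list of base-2 strings into the text they encode."""
    # int(s, 2) parses the string as a base-2 number;
    # chr() maps that number to its character.
    return "".join(chr(int(s, 2)) for s in binary_strings)


# Hypothetical sample values, not copied from the real script.
sample = ["1001000", "1101001"]
print(decode_binary_array(sample))  # prints: Hi
```

Because this only converts text to text, nothing from the sample ever gets executed.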
So what does this last part do? It calls “$ShellID” to get the string, but only uses the characters in positions 1 and 13. So let's do that.
So we call $ShellID and it returns a string, as seen here, and then we convert that to an indexed list to better see which letters are used.
So in the end, the values in this first level get binary decoded, concatenated together, and then piped to IEX. IEX is short for the cmdlet “Invoke-Expression”.
This would be too much to do by hand, so you could just extract the binary array and do an echo on it to see what it decodes to. But in my usual fashion, I like to make drag-and-drop programs to do it with, so I'm not having to “run” anything if possible.
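The indexed list can be reproduced without PowerShell. The sketch below assumes $ShellID holds its default value, "Microsoft.PowerShell", and that the script appends a literal "x" after the two indexed characters. That last part is my assumption about how the final letter is supplied, since only positions 1 and 13 come from the string.

```python
# Default value of $ShellID in a standard PowerShell host (assumed here).
SHELL_ID = "Microsoft.PowerShell"

# Build the indexed list so each character's position is visible.
indexed = list(enumerate(SHELL_ID))
for index, char in indexed:
    print(index, char)

# Positions 1 and 13 give "i" and "e"; the trailing "x" is assumed
# to be a literal added by the script.
command = SHELL_ID[1] + SHELL_ID[13] + "x"
print(command)  # prints: iex
```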
I thought I had messed up my decoder at first, but let's take a closer look.
Here they are using a technique where each string was broken apart into a “randomly” ordered array, so we have to rebuild the string from the array.
So, another new tool. (Still a work in progress, with a few bugs yet.)
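As a rough illustration of the rebuild, the sketch below assumes the reordering is done with numbered placeholders such as "{2}{0}{1}" followed by the list of pieces, the way the PowerShell format operator works. The function name and the sample pieces are hypothetical, not taken from the script.

```python
import re

# Matches numbered placeholders such as {0}, {1}, {12}.
PLACEHOLDER = re.compile(r"\{(\d+)\}")


def rebuild_format_string(template, parts):
    """Put shuffled pieces back in order by filling each {n} with parts[n]."""
    return PLACEHOLDER.sub(lambda m: parts[int(m.group(1))], template)


# Hypothetical example: three pieces stored out of order.
print(rebuild_format_string("{2}{0}{1}", ["oad", "String", "Downl"]))
# prints: DownloadString
```

A real decoder also has to find each template and its argument list inside the script text, which is where most of the bugs tend to hide.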
Now we have almost all of the arrays replaced with the strings. The next step is to do the Char conversions that the string replacements depend on. In the screenshot you may see something like “-F [ChaR]92)”, so we have to convert that char code to an actual character; in this case 92 = “\”.
We also have to remove the “ ‘+’ “ from the string, which only tells PowerShell to concatenate the pieces, before we can do our string replacement on strings like “ (‘+'((B’+’5d@*B5d+B5d ”.
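Both clean-up steps are simple text rewrites. Here is a minimal sketch of each, assuming plain single-quoted literals and decimal char codes. The function names and sample strings are mine, and a literal that really contains '+' would be mangled by this naive version.

```python
import re


def resolve_char_casts(text):
    """Replace casts like [ChaR]92 with the character they stand for."""
    # The cast name shows up in random casing, so match case-insensitively.
    return re.sub(
        r"\[char\]\s*(\d+)",
        lambda m: chr(int(m.group(1))),
        text,
        flags=re.IGNORECASE,
    )


def join_concatenations(text):
    """Merge adjacent single-quoted literals: 'Down'+'load' -> 'Download'."""
    return re.sub(r"'\s*\+\s*'", "", text)


print(resolve_char_casts("[ChaR]92"))        # prints a single backslash
print(join_concatenations("'Down'+'load'"))  # prints: 'Download'
```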
You will also note that some of the replace functions have a “-c” in them, and thanks to @danielhbohannon I found out that it stands for Case Sensitive. So we only replace the string that exactly matches.
-replace (same as -ireplace) is case insensitive, while -creplace is case sensitive.
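The difference matters when the marker string appears in more than one casing. The sketch below models both behaviours with a hypothetical helper. Note that it treats the search text as a literal, whereas the real PowerShell -replace operator takes a regular expression. The sample text and the replacement character are made up for illustration.

```python
import re


def ps_replace(text, old, new, case_sensitive):
    """Literal replace that is either case sensitive or case insensitive."""
    flags = 0 if case_sensitive else re.IGNORECASE
    # re.escape keeps characters like * or + in `old` from acting as regex
    # syntax; the lambda keeps backslashes in `new` from being interpreted.
    return re.sub(re.escape(old), lambda m: new, text, flags=flags)


sample = "B5dhelloB5d b5d"
print(ps_replace(sample, "B5d", '"', case_sensitive=True))   # prints: "hello" b5d
print(ps_replace(sample, "B5d", '"', case_sensitive=False))  # prints: "hello" "
```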
We can also see at the bottom of the script section that there are more string replacements.
After doing all of the replacements and some cleaning up, we get this as the final output.
Here we can see an array of partial URLs and some other variables that get set up.
In section 2 we see this.
At the top, it is again using the Shell ID to spell out “IEX”, and we also have an array of binary values.
In this case it is using random characters as the split chars rather than “,”. This is still fairly easy to overcome. Looking at the way this is implemented, I would say there are only a handful of characters that could not be used as split chars.
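One way around the changing separators is to ignore them entirely and pull out only the runs of 0s and 1s. This is a sketch under the assumption that the separators never contain the digits 0 or 1, which is also why those are among the few characters that could not serve as split chars. The function name and sample blob are hypothetical.

```python
import re


def decode_with_any_separator(blob):
    """Decode binary chunks no matter which characters separate them."""
    # Every run of 0s and 1s is one encoded character; everything else
    # is treated as a separator and skipped.
    chunks = re.findall(r"[01]+", blob)
    return "".join(chr(int(chunk, 2)) for chunk in chunks)


# Hypothetical blob using "x" and "~" as separators.
print(decode_with_any_separator("1001001x1000101~1011000"))  # prints: IEX
```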
After decoding the first level we see this.
Here again we have the mixed up arrays and at the bottom we have several string replacements.
Once we rebuild the array strings and do the replacements and clean up a few things we end up with this.
As you can see, there is still some random character casing used here, which makes it more difficult to read, but it is still readable.
In the final section we see this.
This decodes much like section 2, but it uses different separator characters.
Once they got to the third level, they did not add extra obfuscation like they did in the first 2 sections.
Below we see that they are checking for tool names. I would assume no average person would have those on their system, so it must be checking for a sandbox or a researcher.
After viewing all 3 sections, I do not believe that it is possible to fully automate decoding the way I have done it here. There are just too many options and variables to account for in a script or a program. Each layer would have to be viewed and peeled away as you go until the original code is revealed.
What I have done here in Windows programs could also be done with a PowerShell or Python script, passing in each section to decode. The only other way would be to echo the script out after it has decoded itself, which could get dangerous depending on what it ends up doing. In the case above, the script could detect the analysis environment and exit without revealing what it does.
Most likely I still have a few minor mistakes in my decoded versions that would keep them from running, but they should be clear enough to tell what the script is doing.
That's it for this one. I hope this shows that it is possible to decode even this type of encoding with some patience and persistence. (And a few new tools to help.)