The other day I was pinged about a very large .json file that appeared to consist almost entirely of one large base64 string. The string was hard to extract due to the size of the file. See the Twitter thread here and the location of the JSON file here on https://beta.virusbay.io/ .
A first look at this file confirms that it is in fact a JSON file.
The size of this file is 27,151,069 (0x19E4ADD) bytes/characters. In my experience anything over 500K characters takes too long to render in a “windowed” tool, which also explains why Notepad++ choked on this file and sent my CPU into overdrive trying to render the text. I do have some tools whose limit I had to increase to 700K to handle the sample code, but they take a bit longer to render and refresh. For whatever reason, a hex editor seems to handle very large files fairly quickly. That is the one reason I usually reach for a hex editor first to look at files.
After scrolling through this file with a hex editor to verify that it is just one long base64 string rather than many smaller ones, we need to make a working copy.
We work on a copy so we don’t mess up the original and can always start over if we make a mistake. Open the copy in write mode, remove everything that is not part of the base64 string, and save the file.
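If you would rather script this cleanup than do it by hand, here is a minimal Python sketch of the idea. It assumes the payload is the single longest run of base64 characters in the file, which matches what the hex editor showed. The function name and the file names in the comments are my own placeholders, not part of any tool used in this post.

```python
import re

# One or more base64 alphabet characters, optionally followed by padding.
B64_RUN = re.compile(rb"[A-Za-z0-9+/]+={0,2}")


def longest_base64_run(data: bytes) -> bytes:
    """Return the longest run of base64 characters found in data."""
    return max(B64_RUN.findall(data), key=len, default=b"")


# Example usage (file names are placeholders):
# with open("working_copy.json", "rb") as f:
#     payload = longest_base64_run(f.read())
# with open("payload.b64", "wb") as f:
#     f.write(payload)
```

One caveat: JSON allows a forward slash to be written as `\/`. If the file does that, the run would be split at every slash, so it is worth checking for before trusting the result.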
Next we need a tool that can read this file and write out a new file that is base64 decoded.
I wrote a tool just for dealing with large base64 strings like the one in this case. You could also use Python, PowerShell, or another language to do the base64 decoding. Just stay away from windowed tools for this file.
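As a rough illustration of the Python route (this is not the tool I used), the sketch below decodes the string in chunks so the whole file never has to sit in memory at once. It assumes the input contains only the base64 string with no line breaks in the middle, and the function name and file names are placeholders.

```python
import base64


def decode_base64_file(src, dst, chunk_chars: int = 4 * 1024 * 1024) -> int:
    """Decode base64 from binary stream src into binary stream dst.

    Returns the number of decoded bytes written.
    """
    # Every 4 base64 characters decode to 3 bytes, so a chunk size that is
    # a multiple of 4 lets each chunk be decoded on its own.
    if chunk_chars <= 0 or chunk_chars % 4 != 0:
        raise ValueError("chunk_chars must be a positive multiple of 4")
    total = 0
    while True:
        chunk = src.read(chunk_chars)
        if not chunk:
            break
        decoded = base64.b64decode(chunk)
        dst.write(decoded)
        total += len(decoded)
    return total


# Example usage (file names are placeholders):
# with open("payload.b64", "rb") as src, open("payload.bin", "wb") as dst:
#     decode_base64_file(src, dst)
```

For a file of this size, reading everything and calling `base64.b64decode` once would probably work too. The chunked version just keeps memory use flat.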
Once the file is base64 decoded, we can open the result in the hex editor again and we see this.
Going into this blind, with no other information to go on, all we have to figure this out is the way this file “looks”.
What is the first thing we notice? There are several repeating strings.
So this is encoded somehow, but by what method?
If this were AES, RC4, or some other type of encryption, there should be more diffusion of the encoded characters.
What else could this be? From experience my first guess would be XOR. It could also be adding or subtracting a value to or from each character code.
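The “repeating strings” observation can also be checked with a few lines of Python instead of by eye. This sketch counts how often each aligned block of bytes occurs. The 16-byte block size is an assumption that matches one row in a typical hex editor, and the function name is made up for this example.

```python
from collections import Counter


def most_common_blocks(data: bytes, block_size: int = 16, top: int = 5):
    """Count aligned block_size-byte blocks and return the most frequent.

    Returns a list of (block, count) pairs, most frequent first.
    A trailing partial block is ignored.
    """
    blocks = (
        data[i:i + block_size]
        for i in range(0, len(data) - block_size + 1, block_size)
    )
    return Counter(blocks).most_common(top)


# Example usage (file name is a placeholder):
# with open("payload.bin", "rb") as f:
#     for block, count in most_common_blocks(f.read()):
#         print(count, block.hex())
```

A plain executable usually contains long runs of zero bytes, and XORing zero with a key gives back the key itself, so the most frequent blocks are good candidates for key material.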
So what happens if we take the first 4 bytes of the third line, which repeat several times, and XOR the first part of this file with those values?
At the top we see “MZP”, and in the highlighted part we see the “PE” signature of an executable file header.
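Trying a candidate key against the start of the file is easy to script. The sketch below XORs data with a repeating key and checks for the “MZ” magic at the start, which is the same thing I was looking for in the hex editor. The function names and the file name are placeholders, and the candidate bytes are whatever repeating bytes you picked out of your own file.

```python
from itertools import cycle


def xor_repeat(data: bytes, key: bytes) -> bytes:
    """XOR data with key, repeating the key as often as needed."""
    if not key:
        raise ValueError("key must not be empty")
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


def looks_like_mz(data: bytes) -> bool:
    """True if data starts with the 'MZ' magic of a DOS/PE executable."""
    return data[:2] == b"MZ"


# Example usage (file name is a placeholder; candidate is the repeating
# byte sequence copied out of the hex editor):
# with open("payload.bin", "rb") as f:
#     head = f.read(256)
# preview = xor_repeat(head, candidate)
# print(looks_like_mz(preview), preview[:64])
```

Only the first few hundred bytes are needed for this test, so each guess at the key length costs next to nothing.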
So we are on the right track but don’t yet have the full key, so let’s take 12 bytes, up to the first “FF”, and see what we get.
That is worse, so let’s reduce the length until we get back to 4 bytes or we get something better. Well, that didn’t help, so let’s just go to 16 bytes and see what that does.
Well, 16 bytes (just an arbitrary length to try) gave us some more information about this file.
If we look at a normal file that starts with “MZP” we see this.
Since this appears to be what the file looks like when decoded, we can take all of the bytes from the start through the end of “Win32” and XOR them with the same number of bytes from the encoded file to possibly get the decoding key.
As we can see here, we have a repeating pattern, which is our decoding key.
Here we are doing a “known-plaintext attack”: any known plaintext XORed with its corresponding encoded text will reveal the pattern and thus the key.
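In code, the known-plaintext step looks roughly like this. The sketch XORs the encoded bytes with the bytes we expect to see, then looks for the shortest length at which the result starts repeating. Both function names are made up for this example, and the “known” bytes would come from a clean file with the same kind of header.

```python
def recover_keystream(encoded: bytes, known_plain: bytes) -> bytes:
    """XOR encoded bytes with the plaintext we expect at the same offsets."""
    return bytes(c ^ p for c, p in zip(encoded, known_plain))


def shortest_period(stream: bytes) -> int:
    """Return the smallest n such that stream repeats every n bytes."""
    for n in range(1, len(stream) + 1):
        if all(stream[i] == stream[i - n] for i in range(n, len(stream))):
            return n
    return len(stream)


# Example usage (encoded_head and known_head are the first bytes of the
# encoded file and of a clean reference file):
# stream = recover_keystream(encoded_head, known_head)
# key = stream[:shortest_period(stream)]
# print(key.hex().upper())
```

The repetition only shows up if the known plaintext is longer than the key. With less than that, the function simply returns the length of what it was given.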
So our decoding key for this is “9ECF6733190C86C3E1F0F8FCFEFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF”
Our final step is to use a tool that will XOR the full file with the key we just extracted.
Again, due to the size of the file, I would steer clear of windowed tools for this.
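A script handles this step comfortably. Here is a minimal sketch, not the tool I used, that XORs a file with a repeating key one chunk at a time while keeping the key aligned across chunk boundaries. The function name and file names are placeholders, and the key would be pasted in as a hex string.

```python
def xor_stream(src, dst, key: bytes, chunk_size: int = 1 << 20) -> int:
    """XOR binary stream src with a repeating key and write to dst.

    Returns the number of bytes processed.
    """
    if not key:
        raise ValueError("key must not be empty")
    offset = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        # Build a key pad for this chunk, starting at the right key position
        # so the key stays aligned from one chunk to the next.
        start = offset % len(key)
        pad = (key * (len(chunk) // len(key) + 2))[start:start + len(chunk)]
        # XOR both byte strings as big integers, which is much faster
        # than a byte-by-byte loop in pure Python.
        mixed = int.from_bytes(chunk, "big") ^ int.from_bytes(pad, "big")
        dst.write(mixed.to_bytes(len(chunk), "big"))
        offset += len(chunk)
    return offset


# Example usage (file names are placeholders):
# key = bytes.fromhex("...")  # paste the recovered key here
# with open("payload.bin", "rb") as src, open("decoded.bin", "wb") as dst:
#     xor_stream(src, dst, key)
```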
Here we have the final decoded binary.
If you looked at the Twitter thread you may have noticed this turned out to be “banload” malware.
That’s it for this one. I’m sure there is an easier way to do this, but this is just what I tried in real time on the Twitter thread.
Going in with only the file to look at, we cannot be sure of the contents.
So take what we know and apply it to what we don’t know and keep learning.