After my last post, “Some data on Angler Exploit Kit”, I received a request to write up a tutorial on decoding the Angler EK. The question is where to start.
Since the Angler authors seem to be on vacation, or are in the middle of developing a new version, I’ve decided to write this up. The basics presented here can be applied to any exploit kit.
In order to even start reversing this or any exploit kit you need a basic understanding of JavaScript and HTML and how they relate to each other. I always hated trying to write a web page, but I find myself going back to relearn how things work in order to find out how the malware works.
Most of these exploit kits and redirect pages obfuscate their decoding functions in some way, whether by adding a lot of white space, converting to escaped characters, or adding random comments like “/* this is a comment*/”. The goal is to trip up the beautify tools you would use to get the code back into a readable form.
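To make those three tricks concrete, here is a minimal sketch of undoing each one. The function names are placeholders, the sample line is made up, and the regular expressions are naive (they know nothing about string literals), so treat this as a first pass on a copy and not as a real beautifier.

```javascript
// Hypothetical helper names; not taken from any exploit kit or tool.

// Remove /* ... */ block comments. Deliberately naive: a "/*" inside a
// quoted string would be removed too, so always keep the original to compare.
function stripBlockComments(source) {
  return source.replace(/\/\*[\s\S]*?\*\//g, "");
}

// Turn \xNN and \uNNNN escapes back into the characters they stand for.
function decodeEscapes(source) {
  return source
    .replace(/\\x([0-9a-fA-F]{2})/g, (_, hex) => String.fromCharCode(parseInt(hex, 16)))
    .replace(/\\u([0-9a-fA-F]{4})/g, (_, hex) => String.fromCharCode(parseInt(hex, 16)));
}

// Collapse runs of spaces and tabs that were added as padding.
// This also touches padding inside string literals, so check the result.
function collapseWhitespace(source) {
  return source.replace(/[ \t]{2,}/g, " ");
}

// Made-up obfuscated line using all three tricks.
const sample = String.raw`var a  =   "\x65\x76" /* this is a comment*/ + "\u0061l";`;
const cleaned = collapseWhitespace(decodeEscapes(stripBlockComments(sample)));
console.log(cleaned); // var a = "ev" + "al";
```

The order matters a little: comments come out first so that nothing inside them gets decoded into something that looks like code.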
In this post we will be working with the pcap 2016-06-01-pseudoDarkleech-Angler-EK-after-hideandseek.leadconcept[.]net from http://www.malware-traffic-analysis.net/2016/06/01/index2.html .
Before we can start decoding the landing page we must first find and extract the landing page using Wireshark.
The fastest way I have found to identify the landing page is to use the display filter “http.response”. The latest Angler EK landing pages were always just before a “404 Not found” in the Info column. When viewing the traffic and the code you find that there are 2 identical requests, and the first almost always fails with the 404. On occasion you will find 2 Flash requests instead of just 1, and no 404.
If you see a large gap between the packet/frame number of the 404 and the previous one, then the landing page will most likely not be shown: it has somehow been corrupted and is possibly missing some bytes, like some of the ones I encountered in the last post.
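The “look just before the 404” rule of thumb can be sketched in a few lines. This assumes you have already copied the response list out of Wireshark into an array; the record shape, the function name, and every frame number below are made up for illustration.

```javascript
// Hypothetical record shape: one entry per HTTP response, in capture order.
// All frame numbers and statuses here are invented sample data.
const responses = [
  { frame: 90, status: 200 },
  { frame: 150, status: 200 },
  { frame: 156, status: 404 },
  { frame: 301, status: 200 },
];

// Return every response that sits directly before a 404, together with the
// frame gap between the two. A large gap hints at a damaged landing page.
function candidatesBefore404(list) {
  const hits = [];
  for (let i = 1; i < list.length; i++) {
    if (list[i].status === 404) {
      hits.push({ candidate: list[i - 1], gap: list[i].frame - list[i - 1].frame });
    }
  }
  return hits;
}

console.log(candidatesBefore404(responses));
```

This only narrows down where to look. You still have to open the candidate and verify that it really is the landing page.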
After locating the Angler EK landing page and verifying that you do have it, you need to extract it. Personally I always extract it as a text file so that when I’m tired I don’t accidentally run it. Go to File –> Export Objects –> HTTP.
Then we see this.
Next we click the Save As button and pick the location to save it to (I won’t show mine here : ) ). Another personal preference of mine is to save it with the packet number; here I would save it as “Packet-150.txt” in whatever folder I chose. This way, if I need to find it again I can just jump to that packet, and in the case of multiple landing pages per pcap I can be sure I have the correct packet.
Also take note of the name it suggests saving as. I have seen it suggest several things: “\”, or a name with a php, png, jpg, or even html extension. The kit tries to hide the landing page behind several different extensions.
Once we extract the landing page and open it up in our favorite text editor we see this. My personal choice at the moment is Notepad++.
If we keep on scrolling down we then see this.
By now you have lots of questions, but the first is always: what the heck is this stuff?
In this last screenshot the code is in script tags, so it must be used somewhere, right?
So now what? Let’s start by extracting the entire script from the center of the page and getting it into some sort of readable state. When you are first de-obfuscating the script you will want to work on a copy, so you can compare against the original if something goes wrong in the process.
Here we see that even after running a JavaScript format function it is still hard to read.
I wrote a tool to fix this but it still has a few bugs so it is just as easy to fix it by hand.
Also notice that at the bottom there are 2 types of comments. So let’s clean this up and see what we end up with.
Now that we have it somewhat de-obfuscated, let’s take a look around and get our bearings as to what the script might be doing and where the starting point is.
Hmm, what is this?
It looks like this concatenates to “eval”, and look, there is a regular expression below it.
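For anyone who has not seen this trick before, here is a minimal sketch of how a name gets built from fragments. The fragments and variable names are made up; real landing pages use random names on every load.

```javascript
// Made-up fragments; the kit's real ones are random and spread around.
const p1 = "e";
const p2 = "va";
const p3 = "l";

// Joining the fragments shows which name the script is building.
const builtName = p1 + p2 + p3;
console.log(builtName); // eval

// A page script would then reach the function through a property lookup,
// something like window[builtName](decodedText). For analysis we only want
// the name, so that call is never made here.
function resolveName(fragments) {
  return fragments.join("");
}

console.log(resolveName(["Str", "ing"]), resolveName(["from", "Char", "Code"]));
```

Resolving names like this by hand, without executing anything, is usually enough to see where the script is headed.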
Hmm, what is this item, “kTPjPVb = ‘QnJNUUxhQW5PWUdWT3RS’;”? It is used in a function above. Could this be a variable name or a decoding key? We will have to follow along with the code to find out. But what is the “biwi” in the function above? We will have to trace that too.
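One quick and safe check on an unknown literal like this is to see whether it happens to be valid Base64. The sketch below does only that. Whether the kit actually treats the value as Base64 is an assumption that still has to be confirmed by tracing the code; the helper name is a placeholder.

```javascript
// The literal from the landing page discussed above.
const literal = "QnJNUUxhQW5PWUdWT3RS";

// True when the text uses only the Base64 alphabet and has a valid length.
function looksLikeBase64(text) {
  return text.length % 4 === 0 && /^[A-Za-z0-9+/]+={0,2}$/.test(text);
}

// Buffer is part of Node.js. In a browser console, atob(literal) does the
// same job for plain ASCII content.
const decoded = Buffer.from(literal, "base64").toString("latin1");

console.log(looksLikeBase64(literal), decoded); // true BrMQLaAnOYGVOtR
```

A clean decode like this is a hint, not proof. Plenty of random-looking identifiers also pass a Base64 alphabet test.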
I’m not showing the part where you take several different strings / variable names and see where they are used in the code.
If we have a slight understanding of JavaScript we notice that these are nested functions that work together, so where is it actually called from?
If we do a string search for the variable at the top, “uTGlITcQsYrl”, we end up at the bottom of the script section.
So the last 2 lines of this script are where it actually starts.
When a page loads, the script is evaluated from the top down, so the variables have already been concatenated and are ready to use by the time those last lines run.
So now what is that value in the last line? We do a string search in our extracted script and don’t find it, so we go to the original (copy) of the extracted page and find it here.
Looking back at the script, it is extracting the innerHTML of the element with this ID as the first string to decode.
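If you would rather pull that string out of the saved text file without loading the page in a browser, a regular expression can do it. This is a minimal sketch under a few assumptions: the id value is quoted, the element does not contain a nested element with the same tag name, and the function name, sample page and id are all made up.

```javascript
// Return the text between the opening and closing tag of the element with
// the given id, or null when no such element is found. Regex-based and naive.
function innerHtmlById(html, id) {
  // Escape characters that have a special meaning inside a regular expression.
  const escapedId = id.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const pattern = new RegExp(
    "<(\\w+)[^>]*\\bid\\s*=\\s*[\"']" + escapedId + "[\"'][^>]*>([\\s\\S]*?)</\\1\\s*>",
    "i"
  );
  const match = pattern.exec(html);
  return match ? match[2] : null;
}

// Made-up page fragment and id, standing in for the saved landing page text.
const page = '<p>filler text</p><div id="exampleId" style="display:none">ENCODED-STRING-HERE</div>';
console.log(innerHtmlById(page, "exampleId")); // ENCODED-STRING-HERE
```

Nothing from the page is executed here; the landing page is treated purely as text.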
So now what? We now have enough information to start building our HTML/JavaScript decoder.
But what needs to go into it? First we need an HTML page with a document.write call to write the output to the page (borrowed from http://www.w3schools.com/js/js_output.asp and modified slightly).
We know we need the string from the variable shown above, the key value, the string replace function, and finally the decode functions, leaving the rest of the code out. The rest of the code does the eval on each decoded section; we don’t want to run that code, we just want to decode what is in the sections.
Which is also where those small sections of code we showed earlier come in. They identify each of the remaining “Sections” to decode.
After some trial and error, and a few choice words for the left-off semicolons, we end up with our first decoded section.
I say trial and error because unless you are very good at HTML/JavaScript it will probably take several tries to get together everything you need for it to run properly. There may be a replacement variable that was not close to the rest of the code, which you need to go back and find and then include.
Here we see the start of the code needed.
In this view we set the variables for the string replacements, the string to be decoded, the regular expression for the string replace, the key used for decoding, and finally the variable that calls the function to do the decode.
We next find the end of the decode function by its closing braces and semicolons, and copy and paste the function into our new decoder in between the variables we just inserted and the document.write.
The “ };;;” tells us it is the end of the function.
Above you can see I added “var result = J6em”. “J6em” was the variable name used to start the decode process, where the encoded string and the key were passed in. It also gives me one more place to set a breakpoint.
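The layout described above can be sketched as follows. Everything in it is a stand-in: the names, the filler characters, the key and especially the XOR routine are hypothetical and are not Angler’s actual decoder. The point is only the order of the pieces.

```javascript
// Everything here is a stand-in. Paste the kit's own pieces in their place.

// 1. String replacement: strip filler characters mixed into the text.
const fillerPattern = /[#!]/g; // stand-in for the extracted regular expression
// 2. The encoded string copied from the landing page (tiny made-up sample).
const encoded = "2#a,2!7,3#a";
// 3. The key copied from the landing page (made-up value).
const key = "KEY";

// 4. The decode function, copied in between the variables and the output.
//    Stand-in logic: comma-separated hex values XORed with the repeating key.
function decodeSection(text, keyText) {
  const cleanedText = text.replace(fillerPattern, "");
  return cleanedText
    .split(",")
    .map((hex, i) =>
      String.fromCharCode(parseInt(hex, 16) ^ keyText.charCodeAt(i % keyText.length))
    )
    .join("");
}

// 5. Keep the result in a variable: a convenient place for a breakpoint.
const result = decodeSection(encoded, key);

// 6. Output. In the HTML harness this is where document.write(result) goes.
console.log(result); // abc
```

Note what is left out on purpose: nothing in the harness ever evals the result.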
So we try to run it as is and it fails with an error that the variable zx is not defined.
We can see plainly that it is defined (below): obfuscated, but defined, and in several places in the rest of the script. So what gives?
Since we extracted just part of the code, the definition is no longer in the same enclosing code block as the function below, so the function cannot use it.
So first we try and reduce the function so it is readable. That doesn’t work.
We finally realize that the problem is that the function below cannot see this variable. So we move it into the function and it works: the code runs all of the way through and we get the result we wanted.
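Here is a small sketch of that scoping problem, reduced to its bare bones. It assumes the variable originally lived in an enclosing function; the value given to zx and all of the function names are made up.

```javascript
// Layout similar to the full script: the helper variable lives in an outer
// function, and the nested decoder reads it through the enclosing scope.
function originalLayout() {
  var zx = "fromCharCode"; // stand-in value, not the kit's real one
  function decode(code) {
    return String[zx](code);
  }
  return decode(65);
}

// Copying only the inner function out loses that outer scope.
function extractedOnly(code) {
  return String[zx](code);
}

// Fix: move the definition into the function that uses it.
function extractedFixed(code) {
  var zx = "fromCharCode";
  return String[zx](code);
}

console.log(originalLayout()); // A
try {
  extractedOnly(65);
} catch (err) {
  console.log(err.name); // ReferenceError
}
console.log(extractedFixed(65)); // A
```

Declaring the variable at the top of the new decoder page would work as well. Moving it into the function just keeps each extracted piece self-contained.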
Now we have a working decoder for this “Type” of encoding.
If you read my last article, you will recall that I found 7 different encodings (string replacements) with 3 different decoding functions in the samples I looked at.
Now that we have a working decoder we can do one of two things: copy it for each section, replacing the string to decode and our title at the top of the HTML, or use the same decoder and just replace the string to decode.
Note: There seems to be a string length limit in what you can view in the F-12 debugging tools and display on the page. Some of the strings to decode can be very long.
Now, as long as the landing page is using the same encoding type, all you have to do is replace the string to decode and the key for every new one you encounter. This method bypasses the problems of dynamic analysis, where the kit checks the User-Agent string and looks for other running programs.
Now the real problem.
Once you get each “Section” decoded, the decoded section may contain more variables that need to be decoded. To see what they are doing you will have to start the process all over again for each different decoding function. That includes the section you see at the top of the screenshot, which I call the “LowerSection”. It used to always be found at the bottom of the decoding script, but you will now sometimes find it towards the top, as seen here.
The decode function for the “LowerSection” variables is not found until you decode section 3, and even then not all of the strings get decoded before they are used in other sections.
In this example I did not reduce all of the variables to their full string values. If this is the first time you have worked with this code you may want to do that, then comment out the ones no longer needed, or just add a comment at the end of the final variable showing the name that results once they are combined, as I did for “String[‘fromCharCode’]”.
Once you understand how the code works you could always write the decoder in your favorite programming language, like I did. (It is much easier for high volume decoding.)
I would suggest that the very first time you run one of these you make sure you are running it in a VM, just in case there are some surprises laid in for the analyst.
If you are not sure what a function does or what a value is supposed to be, launch it in the F-12 tools (in a VM) and step through it to watch the values change.
String Length Limits
I thought I was done writing this post until I went back to verify that the code would work for Section 4, knowing from experience that it is a very long string.
As I had mentioned above there is a string length limit on what a page will display.
If we check the length of the string for section 4 before it gets decoded, we find that it is 66,220 characters long, which is no problem for the input.
When we decode it using a VB.NET version of this decoder, we see that the decoded string is 40,907 characters long.
But look at what happens if we use the HTML/JavaScript version we just made.
Notice the end of the string appears to be truncated (above).
As we see in the string length test, it is only 40,546 characters long, a difference of 361 characters. So that appears to be the string length limit for outputting to a page.
So how do we work around this limit?
We change our code and split the string. We change the top to this.
And the end to this.
So what will this do for us? Since we have a string length limit, I just use 40,000 as the number to do the split at, and then we end up with this.
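The split itself can be sketched like this. It is not the exact code from the screenshots: the function name is made up, and a filler string of the same length stands in for the decoded section.

```javascript
const CHUNK_SIZE = 40000; // the split point used in this post

// Cut a long string into consecutive pieces of at most `size` characters.
function splitIntoChunks(text, size) {
  const chunks = [];
  for (let start = 0; start < text.length; start += size) {
    chunks.push(text.slice(start, start + size));
  }
  return chunks;
}

// Stand-in for the decoded section: filler with the same length (40,907).
const decodedStandIn = "x".repeat(40907);
const parts = splitIntoChunks(decodedStandIn, CHUNK_SIZE);

console.log(parts.map((p) => p.length)); // [ 40000, 907 ]
```

Joining the pieces back together with parts.join("") gives the original string again, so nothing is lost by splitting.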
I know, you have seen that it is still truncated in this version as well, but here is where the workaround comes in.
If we launch this again in the F-12 tools setting a break point on “var result” then we can step a few more times until the values are filled out.
Extract the values and clean them up.
Clean off the var name and the double quotes on each end plus the word “String” on the end, or just copy and paste everything in between the double quotes.
Join them together and try to beautify them.
But we still have a problem.
Here we see that our character count is now 40,964 instead of the expected 40,907. That is more, so what is going on?
If we take a close look at the string output to the Html page and what we extracted from the variable in the debugger we see the problem.
Do you see it yet?
There is an escape character “\” before every single and double quote and even “\” is escaped.
The workaround is to copy the first 40,000 characters from the page output, so you don’t have to deal with the changes added by the debugger, then take the remaining characters from the debugger value and clean up the “\” by hand, or write a script or program to do it for you, like I did here for both.
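If you go the script route, the cleanup can be as small as one replace call. This is a sketch, not the program mentioned above: the function name and the sample string are made up, and it assumes the debugger only added a backslash in front of quotes and backslashes.

```javascript
// Undo the escaping a debugger adds when it displays a string value:
// \" becomes ", \' becomes ', and \\ becomes a single backslash.
// One pass with one pattern, so an escaped backslash is never handled twice.
function stripDebuggerEscapes(text) {
  return text.replace(/\\(["'\\])/g, "$1");
}

// Made-up sample of what a copied debugger value might look like.
const copied = String.raw`var s = \"a\\b\"; var t = \'c\';`;
const restored = stripDebuggerEscapes(copied);

console.log(restored);                        // var s = "a\b"; var t = 'c';
console.log(copied.length - restored.length); // 5 extra characters removed
```

Comparing the length before and after is a handy sanity check: the difference should match the number of extra characters you were trying to get rid of.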
Now once we clean and join these we end up with.
Once we get rid of the extras we end up with the expected 40,907 characters. It will now also beautify.
We can now see the code to create a decoder for this encoded section.
Once we decode this section (var b) we see this. (Top)
My point in showing this section is that there seems to be some code missing from the bottom here. Not that I want to help the writers debug it, but if we look closer at the end we see this.
The output length before beautifying is 29,976 characters, well within the string length limit.
So it appears that either they truncated the script before encoding it or picked up an extra character from another part of the code this came from. Without seeing the original it would be difficult to tell for sure. Checking a few other samples, I find this “char” too, and 1 different one, at the end.
This was not the only potential mistake I found while going through the code.
As has been said before, even the malware authors can run into problems.
Well, that’s it for this one. I hope I answered any questions and didn’t create a lot more.