Recently I have been going through the malware traffic exercises created by Brad Duncan of “malware-traffic-analysis.net”.
In my last post on an exercise I started wondering about the User-Agent strings used by malware as a way to possibly narrow in on the malware. The malware in the last post put a User-Agent string in the registry in its configuration key. I did not run down whether that User-Agent string was used in the traffic, due to the lack of an easy way to get a list of unique User-Agent strings.
As with a normal web page, if the malware calls out to a certain page with a specially crafted User-Agent string, the page will respond a certain way. If a researcher tries to look at that same page with a different User-Agent string, the page could respond completely differently.
Over the last couple of days I’ve been working on a program that gets a list of unique User-Agent strings from a pcap file. Of course, as I was writing the program I ended up with “scope creep”: I wanted it to do more than just return a list. It will now either return a full list of all locations where it finds the term “User-Agent” or, by default, return just what it determines is a unique list, along with the index location in hex where each string was found.
What I found surprised me. There were also errors in the frames found.
Here is what it looks like using the latest exercise pcap from malware-traffic-analysis.net.
Note: If this program is run against a pcap file that is open in a hex editor in read/write mode, it will fail to open the file for reading.
This view highlights several of the errors I encountered while making this tool. It was probably a good thing I chose “this” file as my test subject; others didn’t show some of the same problems, or only returned a few unique User-Agents. This one has known malware in its traffic.
The way my program works is that it looks for the string “User-Agent” as hex bytes, then looks for the 2 bytes that the actual string normally ends on, “0x0D 0x0A”, then does the math, takes the bytes in between, and outputs the U-A string. As you can see in this screenshot, some of it was returned in hex. That was because the program could not find the end bytes, and the result would not display correctly when returned as a string. I also had to add a limit of 300 bytes to make sure I got the longest possible string without returning over 1,000 bytes for some of them.
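For readers who want to try the idea themselves, here is a minimal Python sketch of the search just described. It is an illustration only, not the tool's actual source: the names find_user_agents, render and unique_user_agents are made up, and it assumes the whole pcap fits in memory and that a “: ” separator normally follows the header name.

```python
CRLF = b"\r\n"          # the end bytes 0x0D 0x0A
NEEDLE = b"User-Agent"  # searched as raw bytes, so the match is case sensitive
MAX_LEN = 300           # cap on how many bytes to take after the needle


def find_user_agents(data, needle=NEEDLE, max_len=MAX_LEN):
    """Yield (offset, value, terminated) for every match of needle in data."""
    pos = data.find(needle)
    while pos != -1:
        start = pos + len(needle)
        # Look for the end bytes, but only within max_len bytes of the needle.
        end = data.find(CRLF, start, start + max_len + len(CRLF))
        terminated = end != -1
        if not terminated:
            end = min(start + max_len, len(data))
        value = data[start:end]
        # Drop the ": " separator when present (assumed header layout).
        if value.startswith(b": "):
            value = value[2:]
        yield pos, value, terminated
        pos = data.find(needle, pos + 1)


def render(value, terminated):
    """Show clean matches as text and unterminated ones as hex."""
    if terminated:
        return value.decode("ascii", errors="replace")
    return value.hex()


def unique_user_agents(data):
    """Map each distinct value to the first offset where it was seen."""
    first_seen = {}
    for offset, value, _terminated in find_user_agents(data):
        first_seen.setdefault(value, offset)
    return first_seen
```

To use it, read the capture with open(path, "rb").read() and print hex(offset) next to render(value, terminated) for each hit. Matches with no end bytes inside the 300 byte window come back as hex, which mirrors the behavior described above.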
If we convert the hex for the top one showing it as utf-8, here is what we see.
Using the offset of “0x3E4B59” given in my search tool we can jump to that point in the hex editor and see what was going on there.
From this we see that there appears to be some garbage mixed in with the User-Agent string, which caused my program to calculate the wrong length, so I just output those frames in hex for further investigation.
The next item is the entries that return only 2 characters. When the program hit those, my length was calculated as -2 and it threw an error, so what was going on there?
Well, the end bytes we are looking for, “0x0D 0x0A”, sit right at the end of the string “User-Agent”. To get to the first byte of the actual string, I add two to the length of the search string to skip the two bytes that normally follow “User-Agent” (the colon and the space). So when I subtracted that full length from the position of the end bytes, I ended up with -2, which tells me the User-Agent string is missing here. To overcome this and still see what was there, when the length comes out as -2 I instead use 2 as the number of bytes to get, so that I still have the index where it is. That is why those entries show only 2 characters.
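The arithmetic behind that -2 can be shown in a few lines of Python. This is only a sketch of the calculation as described; value_length and bytes_to_take are hypothetical names, and SEPARATOR_LEN assumes the two skipped bytes are the “: ” after the header name.

```python
NEEDLE = b"User-Agent"
SEPARATOR_LEN = 2  # assumed: the ": " that normally follows the header name


def value_length(data, match_pos):
    """Return the computed value length, or None if no end bytes are found."""
    end = data.find(b"\r\n", match_pos + len(NEEDLE))
    if end == -1:
        return None
    # End position minus (match position + needle length + separator).
    return end - (match_pos + len(NEEDLE) + SEPARATOR_LEN)


def bytes_to_take(length):
    """Mirror the workaround: a length of -2 becomes 2 so the offset is still reported."""
    return 2 if length == -2 else length
```

With a normal header such as b"User-Agent: curl\r\n" the length comes out as 4, but with b"User-Agent\r\n" the end bytes start where the separator should be, so the result is 10 - 12 = -2.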
The next one shows a User-Agent string but with the web address tacked onto it, so the end bytes I was looking for were missing from where they should have been. I also had to truncate the last few characters, “: */*”, from those strings because they messed up the string builder I used in the program, which would not display anything after the first one.
So what else can we do with this information? To be honest I’m not totally sure yet.
But the next question is, once you find one of these, how could you find the packet it belongs to in Wireshark?
The answer, after poring over the file format spec, was to use the timestamp.
Each packet/frame has a timestamp in it.
What is the timestamp? It is an Epoch timestamp in GMT, but the normal date and time are displayed in the user’s local time. More on this in a bit.
Let’s start from the beginning. First we find an interesting UA string we want to investigate, so we use the UA tool to find the offset and jump to it in a hex editor.
For this example we will use the first unique U-A string found.
UA = Index Location: 0x1F48
Microsoft NCSI UA End
It is usually the first one seen when the computer tries to connect to the internet.
So we jump to that location and search “Up” for the 2 bytes “0x54 0x56”, which will help us find the timestamp, “BF8C545617920A00”. When the year changes we will have to see what the bytes are.
Update: as December rolled around the bytes to search for changed to 0x60 0x54.
So if we open the pcap of interest in Wireshark, pull an epoch timestamp and convert it to hex as done below, the last 2 bytes of the first half of the timestamp (bytes 3 and 4 from the left) give us the 2 bytes to search up for in the current pcap.
OK, so we found the timestamp; now what do we do with it? I built another tool to convert the timestamp from hex to decimal Epoch time. Just copy and paste from the hex editor and we get this.
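The same conversion can be sketched in Python. This is not the converter tool itself, just an illustration; pcap_hex_to_epoch and epoch_string are made-up names, and the sketch assumes a classic little-endian pcap where each record starts with a 4 byte seconds field followed by a 4 byte microseconds field.

```python
import struct
from datetime import datetime, timezone


def pcap_hex_to_epoch(hex_text):
    """Convert 8 timestamp bytes copied from a hex editor to (seconds, microseconds)."""
    raw = bytes.fromhex(hex_text)
    if len(raw) != 8:
        raise ValueError("expected exactly 8 bytes of timestamp")
    # "<II" = two little-endian unsigned 32-bit integers (assumed byte order).
    seconds, micros = struct.unpack("<II", raw)
    return seconds, micros


def epoch_string(seconds, micros):
    """Format the value the way it is typed into the Wireshark search box."""
    return f"{seconds}.{micros:06d}"


seconds, micros = pcap_hex_to_epoch("BF8C545617920A00")
print(epoch_string(seconds, micros))                         # 1448381631.692759
print(datetime.fromtimestamp(seconds, tz=timezone.utc))      # 2015-11-24 16:13:51+00:00
```

Because the bytes are little-endian, the two most significant bytes of the seconds field are the third and fourth bytes from the left (54 56 here), which is why those are the ones that stay constant for weeks at a time and make a handy search pattern.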
Next we go to Wireshark Edit -> Find Packet (Ctrl + F). In the popup box set the radio buttons to “String” and “Packet Details” and insert the decimal Epoch string into the search box. Depending on where you are in the capture file you may need to change the search direction up or down.
Here we see we did find the location specified by the timestamp.
While learning about the timestamps here and in the developer readme file that comes with the source code, we find that the “Arrival Time” is synced with the local time of the computer running Wireshark, but the “Epoch Time” is in GMT as it should be, and it is also the timestamp we converted. There is also something else called a “Time Shift” that is used to shift the time to another time zone or a modified time, rather than the one of the machine Wireshark is currently running on. According to what I read, almost all of the time it should be set to “0” (no time shift), but it is something you would want to pay attention to if you get the file from someone else.
Next question: can we go the other way with this? The answer is yes. We just take the Epoch timestamp from Wireshark, run it through our handy dandy time converter, and get the hex bytes out to search for in the hex editor.
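Going the other way looks like this in Python. Again this is a hedged sketch rather than the converter tool; epoch_to_pcap_hex is a hypothetical name, and it assumes the same little-endian seconds plus microseconds layout. The value is handled as text instead of a float so no digits are lost.

```python
import struct


def epoch_to_pcap_hex(epoch_text):
    """Turn an Epoch string such as '1448381631.692759' into 8 hex bytes."""
    secs_text, _, frac_text = epoch_text.strip().partition(".")
    # Pad or trim the fraction to exactly 6 digits (microseconds).
    micros_text = (frac_text + "000000")[:6]
    raw = struct.pack("<II", int(secs_text), int(micros_text))
    return raw.hex().upper()


print(epoch_to_pcap_hex("1448381631.692759"))  # BF8C545617920A00
```

If Wireshark shows extra trailing digits after the microseconds, they are simply trimmed here, so pasting the longer value should give the same bytes.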
So now I have a way to navigate back and forth from the hex editor to Wireshark.
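As an alternative to searching upward by hand, the record headers can be walked from the start of the file to find which frame a given offset falls in. This is a different technique from the one used above, offered only as a sketch: locate_frame is a made-up name, and it assumes a classic pcap file (not pcapng) with microsecond timestamps, a 24 byte global header, and a 16 byte header in front of each packet.

```python
import struct


def locate_frame(data, target_offset):
    """Return (frame_number, ts_sec, ts_usec) for the record containing target_offset."""
    magic = data[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"   # file written little-endian
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"   # file written big-endian
    else:
        raise ValueError("not a classic microsecond pcap file")
    pos = 24           # skip the global header
    frame = 0
    while pos + 16 <= len(data):
        ts_sec, ts_usec, incl_len, _orig_len = struct.unpack_from(endian + "IIII", data, pos)
        frame += 1     # Wireshark numbers frames starting at 1
        end = pos + 16 + incl_len
        if pos <= target_offset < end:
            return frame, ts_sec, ts_usec
        pos = end
    return None
```

Calling locate_frame(open(path, "rb").read(), 0x1F48) would give the frame number to jump to with Go -> Go to Packet in Wireshark, along with the timestamp to double check against.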
So where does the malware part come in? Well, the User-Agent used by known malicious traffic may help narrow down which of the many packets of traffic could be malware. We can also use this information to investigate why a packet didn’t display correctly or was messed up in some way. Other than that, I will have to do more research to see what else I can do with this information.
And for those who are curious, just how many times was that string found in the file?
Using another program I wrote that only counts strings, the total was 789, but my new program, which also looks for the end bytes, found a different total (after I fixed another bug involving duplicates).
Here we see it found 785.
Why the difference in the count? It is because my program searches for the hex bytes of “User-Agent”, which is case sensitive, rather than doing a case-insensitive string search that would find both the upper and lower case versions of the string.
Using a case sensitive search for the string “user-agent” in the hex editor lands us at offset 2E1DAF, where we can look up the timestamp, convert it, and find the packet in Wireshark at packet/frame number 4768. We see it is not for the actual U-A string but for:
“Access-Control-Allow-Headers: origin, content-type, accept, authorization, user-agent\r\n”
So the tools do not find those, but they are a quick find in the hex editor using a case sensitive search with the lower case string.
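The difference between the two counts is easy to reproduce in Python. This is a small sketch with a made-up helper name (count_occurrences) and made-up sample bytes, not the counting program mentioned above.

```python
def count_occurrences(data, needle, ignore_case=False):
    """Count non-overlapping matches of needle in data."""
    if ignore_case:
        # Lower-casing both sides makes the byte search case insensitive.
        data, needle = data.lower(), needle.lower()
    return data.count(needle)


sample = (
    b"User-Agent: a\r\n"
    b"Access-Control-Allow-Headers: origin, user-agent\r\n"
    b"User-Agent: b\r\n"
)
print(count_occurrences(sample, b"User-Agent"))                    # 2
print(count_occurrences(sample, b"User-Agent", ignore_case=True))  # 3
```

The extra hit in the case-insensitive count is the lower case “user-agent” inside the Access-Control-Allow-Headers line, the same kind of match that made up the gap between the two totals.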
I hope this information is helpful and useful to those that read it.
If anyone would like a zipped copy of my tools with the file extension mangled so it can hopefully pass through email filters, email me at pcsxcetra [at] consolidated [dot] net.
Let me know which article the tools are from so I send the right ones.
As with any “Free tools”, they are provided as is, without any warranty, and if you find a way to break something with them, it’s your fault.