SMTP (Simple Mail Transfer Protocol) is one of the most widely used protocols and is responsible for transferring electronic mail. Because it is so common, and almost every computer user has at least one email address, SMTP traffic is full of unsolicited messages (spam). Spam has many negative effects, such as excessive advertising, harassment and bandwidth consumption, to name just a few. Moreover, it is commonly used for phishing campaigns and for spreading malware. To counteract it, many spam detection methods have been developed. These were mainly based on message contents (both text and attached files), blacklisting and network-specific characteristics (for example, the frequency of TCP connections).
Nowadays, a huge amount of spam is transmitted by botnets. These can, for example, send spam directly to the destination mail server or through open relay SMTP servers. To restrain botnets' spam campaigns, Gianluca Stringhini et al. proposed a novel spam mitigation method based on the analysis of SMTP implementations (SMTP dialects). Although the idea is simple, it turned out to be very effective.
Let us stop for a minute, as this is a perfect moment for a short SMTP reminder. Put simply, an SMTP message transfer can be described as a conversation between a client and a server. They have to exchange a number of commands in order to deliver a message from the sender to the recipient. A typical email transfer is presented below.
Every command has to be terminated with <CR><LF> (a carriage return followed by a line feed). The basic commands are:
- EHLO (or HELO for non-extended SMTP) – command used by the client to introduce itself.
- MAIL FROM – command used to specify the sender.
- RCPT TO – command used to specify the recipient.
- DATA – command used to announce the start of the message content.
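The command sequence above can be sketched in a few lines of Python. This is only an illustration of what the client sends, not a working mail client: the function name `build_dialogue` and the host and addresses used with it are placeholders chosen for this example, and server replies are ignored.

```python
CRLF = "\r\n"  # every SMTP command line ends with carriage return + line feed


def build_dialogue(client_name, sender, recipient):
    """Return the client-side commands of a minimal SMTP session, in order.

    Hypothetical helper for illustration; it only formats the command
    lines and does not open any network connection.
    """
    return [
        f"EHLO {client_name}{CRLF}",        # client introduction
        f"MAIL FROM:<{sender}>{CRLF}",      # sender specification
        f"RCPT TO:<{recipient}>{CRLF}",     # recipient specification
        f"DATA{CRLF}",                      # message content follows
    ]


if __name__ == "__main__":
    for line in build_dialogue("client.example.com",
                               "alice@example.com",
                               "bob@example.org"):
        print(repr(line))
```

Printing the lines with `repr` makes the trailing `\r\n` visible, which is easy to overlook when reading a captured session.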
Now, we have a little quiz for you: what would an SMTP server do if we typed the “MAIL FROM” command in a different manner? Several examples of different notations are provided below.
The answer is: it will accept every one of them. From the server's point of view, all of the above-mentioned “MAIL FROM” commands are valid and should be parsed properly. Several conclusions about the “MAIL FROM” notation follow from that: it is case insensitive, the amount of whitespace is irrelevant at some positions, and some special characters are optional. It is easy to see that SMTP implementations may differ from one another through such simple changes in command syntax. Moreover, some additional commands like “RSET” may appear during an SMTP conversation, so the order of commands may differ as well. Another example is the use of the “QUIT” command, which is optional.
The idea presented by Gianluca Stringhini was very simple. Collect SMTP dialects from known MUAs/MTAs and collect dialects from botnet samples. Then analyze incoming SMTP conversations and check whether they match a benign or a malicious dialect. If the observed dialect is a known benign one, mark the message as trusted. If not, mark it as spam. However, the authors parsed dialects only up to the “DATA” command. The included example shows the differences between the dialect of Mozilla Thunderbird and that of one of the botnets. The differences are marked in red.
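To make the idea more concrete, here is a minimal sketch of lenient command parsing and dialect matching. It is not the implementation from the original paper: the regular expression, the `{addr}` placeholder and the function names are assumptions made for this example, and only “MAIL FROM” and “RCPT TO” without extra parameters are handled.

```python
import re

# Lenient pattern, assumed for this sketch: case-insensitive keyword,
# optional whitespace around ":", optional angle brackets.
ADDRESS_CMD = re.compile(
    r"^(?:MAIL\s+FROM|RCPT\s+TO)\s*:\s*<?(?P<addr>[^<>\s]*)>?\s*$",
    re.IGNORECASE,
)


def parse_address(line):
    """Return the address carried by a MAIL FROM / RCPT TO line, or None."""
    m = ADDRESS_CMD.match(line.rstrip("\r\n"))
    return m.group("addr") if m else None


def command_template(line):
    """Replace the address with a placeholder, keeping case, spacing and
    brackets untouched. Those leftovers are what a dialect is made of."""
    stripped = line.rstrip("\r\n")
    m = ADDRESS_CMD.match(stripped)
    if m is None:
        return stripped  # other commands are kept verbatim in this sketch
    start, end = m.span("addr")
    return stripped[:start] + "{addr}" + stripped[end:]


def classify(lines, benign_dialects):
    """Mark a conversation as trusted only if its templates match a known
    benign dialect; everything else is treated as spam."""
    observed = tuple(command_template(line) for line in lines)
    return "trusted" if observed in benign_dialects else "spam"


if __name__ == "__main__":
    variants = [
        "MAIL FROM:<alice@example.com>",
        "mail from: <alice@example.com>",
        "Mail From : <alice@example.com>",
        "MAIL FROM:alice@example.com",
    ]
    for v in variants:
        print(parse_address(v), "|", command_template(v))
```

All four variants yield the same address, yet each produces a different template. The parser is tolerant where a server would be, while the template preserves exactly the details a server ignores.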
At this point, it is worth noting that this solution is capable of more than spam detection. It may also be used for botnet fingerprinting. Knowing the dialect used by a specific botnet, we are able to map a malicious conversation to the IP address of an infected machine.
We decided to expand this method and proposed several extensions to the existing solution:
- Extended dialect parsing to the whole SMTP conversation.
- Included parsing of SMTP extensions.
- Connected SMTP analysis with specific Internet Message Format (IMF) fields.
One of the analyzed IMF fields is “User-Agent” (together with equivalent fields, for instance “X-Mailer”). It helps to detect email client spoofing during a message transfer, which can indicate malicious activity. For example, suppose we have an incoming mail whose SMTP dialect matches Mozilla Thunderbird. Using the standard method, this message would be categorized as benign. However, the IMF header shows that the message was sent using the Microsoft Outlook client, which signals a spoofing attempt (a malicious message). This analysis is presented in the following scheme.
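The cross-check described above can be sketched with Python's standard `email` package. This is a simplified illustration rather than our production logic: the function names are hypothetical, the client name inferred from the dialect is assumed to be given as a plain string, and the comparison is a naive substring test.

```python
from email import message_from_string


def declared_client(raw_message):
    """Return the client name the message claims in its headers, if any."""
    msg = message_from_string(raw_message)
    # "User-Agent" and "X-Mailer" are treated as equivalent fields here.
    return msg.get("User-Agent") or msg.get("X-Mailer")


def is_spoofed(dialect_client, raw_message):
    """True when the headers name a different client than the SMTP dialect.

    dialect_client: client name inferred from the SMTP dialect
    (assumed input, e.g. "Thunderbird").
    """
    declared = declared_client(raw_message)
    if declared is None:
        return False  # no header to compare against, so no verdict
    return dialect_client.lower() not in declared.lower()


if __name__ == "__main__":
    # Placeholder message: the header values are made up for this example.
    raw = (
        "From: alice@example.com\n"
        "X-Mailer: Microsoft Outlook 16.0\n"
        "\n"
        "Hello\n"
    )
    print(is_spoofed("Thunderbird", raw))
```

A message without either header is not flagged in this sketch. Whether a missing header should itself count as suspicious is a policy decision left open here.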
Now, let us take a look at a real example. Some time ago, the Sendsafe botnet was responsible for the malicious Hancitor/Pony eFax malspam campaign (details of this campaign can be found in a great post on malware-traffic-analysis.net). In the majority of the messages, Sendsafe claimed to be sending emails from Apple devices. Let us take a look at the details of the Sendsafe SMTP dialect and the Apple devices dialect.
(eFax screenshot taken from the malware-traffic-analysis.net)
There are three main differences between these two dialects:
- Sendsafe uses non-extended SMTP (HELO) together with a domain, whereas Apple devices use extended SMTP (EHLO) together with an IP address.
- Sendsafe places a space character between “:” and “<”, whereas Apple devices do not.
- Sendsafe terminates the TCP session right after the message delivery, whereas Apple devices send the QUIT command before the session ends.
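The three differences listed above can be expressed as simple features extracted from the client side of a conversation. The sketch below is illustrative only: the function names and feature names are made up for this example, the input is assumed to be a non-empty list of client command lines, and only IPv4-style bracketed literals are recognized.

```python
import ipaddress


def is_ip_literal(arg):
    """True if the greeting argument is an IP address, e.g. "[192.0.2.10]"."""
    try:
        ipaddress.ip_address(arg.strip("[]"))
        return True
    except ValueError:
        return False


def dialect_features(commands):
    """Extract the three distinguishing features from client command lines.

    commands: non-empty list of client-side lines in the order they were
    sent (assumed input format for this sketch).
    """
    greeting = commands[0].split(None, 1)
    verb = greeting[0].upper()
    arg = greeting[1].strip() if len(greeting) > 1 else ""
    mail_from = next(
        (c for c in commands if c.upper().startswith("MAIL FROM")), ""
    )
    return {
        "extended_smtp": verb == "EHLO",            # EHLO vs HELO
        "greets_with_ip": is_ip_literal(arg),       # IP address vs domain
        "space_after_colon": ": <" in mail_from,    # "FROM: <" vs "FROM:<"
        "sends_quit": any(c.strip().upper() == "QUIT" for c in commands),
    }
```

Fed with a session that greets with “HELO” and a domain, writes “MAIL FROM: <...>” and never sends “QUIT”, the function returns the opposite value for every feature compared with a session that uses “EHLO [192.0.2.10]”, “MAIL FROM:<...>” and a final “QUIT”.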
Clearly, this is an example of an email client spoofing attempt spotted in the wild: the Sendsafe dialect differs significantly from Apple's.
To sum up, the analysis of SMTP dialects can be used effectively for both spam mitigation and botnet fingerprinting. Our extensions to the existing method have been shown to further increase the effectiveness of this analysis. During the SISSDEN project, we fully implemented the above-mentioned method, together with many mechanisms that allow spam detection and botnet fingerprinting to be performed online. Moreover, we maintain our own SMTP dialects database, which is constantly updated. In future posts, we are going to share some facts about fingerprinting spamming botnets and about the spam campaigns we have observed.
Our paper on SMTP dialects analysis was published in IEEE Security & Privacy Digital Forensics. More details and results can be found there.