EmailProfiler: Spearphishing Filtering with Header and Stylometric Features of Emails

Sevtap Duman, Kubra Kalkan, Manuel Egele, William Robertson, Engin Kirda
In Proceedings of the IEEE Computer Software and Applications Conference (COMPSAC)

anomaly detection, machine learning, spearphishing

Spearphishing is a prominent targeted attack vector in today’s Internet. By impersonating trusted email senders through carefully crafted messages and spoofed metadata, adversaries can trick victims into launching attachments containing malicious code or into clicking on malicious links that grant attackers a foothold in otherwise well-protected networks. Spearphishing is effective because it is fundamentally difficult for users to distinguish legitimate emails from spearphishing emails without additional defensive mechanisms. However, existing mechanisms of this kind, such as cryptographic signatures, have found limited use in practice because ordinary users perceive them as difficult to use.

In this paper, we present a novel automated approach to defending users against spearphishing attacks. The approach first builds probabilistic models of both email metadata and stylometric features of email content. Then, subsequent emails are compared to these models to detect characteristic indicators of spearphishing attacks. Several instantiations of this approach are possible, including performing model learning and evaluation solely on the receiving side, or having senders publish models that the receiver can check remotely. Our evaluation on a real data set drawn from 20 email users demonstrates that the approach effectively discriminates spearphishing attacks from legitimate email while providing significant ease-of-use benefits over traditional defenses.
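
To make the build-then-compare idea concrete, the following is a minimal Python sketch of a receiver-side sender profile. It is not the paper's implementation: the four stylometric features, the two header fields (`X-Mailer`, `Content-Type`), the per-feature Gaussian model, the Laplace-smoothed header frequencies, and all names (`SenderProfile`, `stylometric_features`, `anomaly_score`) are assumptions chosen purely for illustration.

```python
import math
import re
from collections import Counter


def stylometric_features(body: str) -> dict:
    """Compute a few simple, illustrative writing-style features (assumed set)."""
    words = re.findall(r"[A-Za-z']+", body)
    sentences = [s for s in re.split(r"[.!?]+", body) if s.strip()]
    n_chars = max(len(body), 1)
    return {
        "avg_word_len": sum(len(w) for w in words) / max(len(words), 1),
        "words_per_sentence": len(words) / max(len(sentences), 1),
        "upper_ratio": sum(c.isupper() for c in body) / n_chars,
        "punct_ratio": sum(c in ",;:!?-" for c in body) / n_chars,
    }


class SenderProfile:
    """Hypothetical per-sender model: Gaussian per style feature, counts per header."""

    def __init__(self, header_fields=("X-Mailer", "Content-Type")):
        self.header_fields = header_fields
        self.samples = []  # one feature dict per training email
        self.header_counts = {f: Counter() for f in header_fields}

    def train(self, headers: dict, body: str) -> None:
        self.samples.append(stylometric_features(body))
        for f in self.header_fields:
            self.header_counts[f][headers.get(f, "<missing>")] += 1

    def anomaly_score(self, headers: dict, body: str) -> float:
        """Return a score that grows as the email departs from the sender's history."""
        n = len(self.samples)
        if n == 0:
            raise ValueError("profile has no training emails")
        feats = stylometric_features(body)
        z_total = 0.0
        for name, value in feats.items():
            hist = [s[name] for s in self.samples]
            mean = sum(hist) / n
            var = sum((x - mean) ** 2 for x in hist) / n
            std = math.sqrt(var) + 1e-3  # small floor avoids division by zero
            z_total += abs(value - mean) / std
        style_score = z_total / len(feats)
        # Header part: negative log of the Laplace-smoothed probability of each value,
        # reserving one extra category for values never seen from this sender.
        header_score = 0.0
        for f in self.header_fields:
            counts = self.header_counts[f]
            seen = counts[headers.get(f, "<missing>")]
            header_score += -math.log((seen + 1) / (n + len(counts) + 1))
        return style_score + header_score


# Usage with made-up emails: train on past mail, then score new messages.
profile = SenderProfile()
usual = {"X-Mailer": "Thunderbird", "Content-Type": "text/plain"}
for text in (
    "Hi Bob, the report is attached. Let me know if you have questions. Thanks, Alice",
    "Hi Bob, I moved the meeting to Friday. Please confirm when you can. Thanks, Alice",
    "Hi Bob, the draft looks good to me. I will send comments tomorrow. Thanks, Alice",
):
    profile.train(usual, text)

legit_score = profile.anomaly_score(
    usual, "Hi Bob, the slides are ready now. Let me know what you think. Thanks, Alice"
)
spoof_score = profile.anomaly_score(
    {"X-Mailer": "PHPMailer", "Content-Type": "text/html"},
    "URGENT!!! OPEN THE ATTACHED INVOICE IMMEDIATELY AND VERIFY YOUR ACCOUNT CREDENTIALS",
)
print(f"legit={legit_score:.2f} spoof={spoof_score:.2f}")
```

In this toy setup the spoofed message scores higher than the one resembling the sender's history, because both its writing style and its header values are unfamiliar. A practical system would need far more training mail per sender, a richer feature set, and a threshold calibrated on real data.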