Character and Word Embeddings for Phishing Email Detection

Authors

  • Nikola Stevanović, Faculty of Sciences and Mathematics, University of Niš, 18000 Niš, Serbia

DOI:

https://doi.org/10.31577/cai_2022_5_1337

Keywords:

Cybersecurity, deep learning, phishing attack, phishing email detection, word embedding

Abstract

Phishing attacks are among the most common malicious activities on the Internet. During a phishing attack, cybercriminals present themselves as a trusted organization or individual. Their goal is to lure people into entering private information, such as passwords and bank card numbers, while believing that nothing malicious is happening. The attack often starts with a phishing email, which closely resembles a legitimate email but usually contains links to malicious websites or uses other techniques to mislead victims. To prevent phishing attacks, it is crucial to detect phishing emails and remove them from inbox folders. In this paper, a neural network based phishing email detection model is proposed. In contrast to some earlier approaches, our model does not use manually engineered input features. It learns character and word embeddings directly from email texts and uses them to extract local and global features with convolutional and recurrent layers, respectively. Our model is tested on two commonly used datasets for phishing email detection, the SpamAssassin Public Corpus and the Nazario Phishing Corpus, and it achieves an accuracy of 99.81 % and an F1-score of 99.74 %, which is on par with or better than current state-of-the-art approaches.
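
To illustrate what learning character and word embeddings "directly from email texts" presupposes, the following minimal Python sketch turns an email body into two integer sequences, one at the word level and one at the character level, which embedding layers could then consume. It is not the paper's preprocessing: the tokenizer, the reserved ids PAD and UNK, the max_len values, and the toy emails are all assumptions made for illustration.

```python
import re

# Reserved ids (an assumed convention, not taken from the paper).
PAD, UNK = 0, 1

def build_vocab(items):
    """Map each distinct item to an integer id, starting after PAD and UNK."""
    vocab = {}
    for item in items:
        if item not in vocab:
            vocab[item] = len(vocab) + 2
    return vocab

def tokenize(text):
    """Lowercase and split into word and punctuation tokens (a simple assumed scheme)."""
    return re.findall(r"\w+|[^\w\s]", text.lower())

def encode(seq, vocab, max_len):
    """Convert items to ids, truncate to max_len, and right-pad with PAD."""
    ids = [vocab.get(item, UNK) for item in seq[:max_len]]
    return ids + [PAD] * (max_len - len(ids))

# Toy training texts (hypothetical examples).
emails = ["Verify your account now!", "Meeting moved to 3 pm."]
word_vocab = build_vocab(w for e in emails for w in tokenize(e))
char_vocab = build_vocab(c for e in emails for c in e.lower())

# Encode an unseen email at both granularities.
sample = "Verify your password now!"
word_ids = encode(tokenize(sample), word_vocab, max_len=8)
char_ids = encode(list(sample.lower()), char_vocab, max_len=32)

print(word_ids)  # [2, 3, 1, 5, 6, 0, 0, 0]  ("password" is unseen, so it maps to UNK)
print(char_ids)
```

In a full model of the kind the abstract describes, each id would index a row of a trainable embedding matrix, and the resulting vector sequences would be passed to the convolutional and recurrent layers. Note that the unseen word "password" collapses to a single UNK id at the word level, while at the character level only the unseen letter "s" does, which is one reason the two views can complement each other.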

Published

2022-12-31

How to Cite

Stevanović, N. (2022). Character and Word Embeddings for Phishing Email Detection. Computing and Informatics, 41(5), 1337–1357. https://doi.org/10.31577/cai_2022_5_1337