Abstract
Phishing is a form of social engineering in which attackers send deceptive communications that imitate a trustworthy source in order to trick users into revealing confidential information. Anti-phishing systems have made significant strides in recent years, helping internet users protect confidential and private information against such attacks. In this study, we propose a URL graph representation based on a random walk algorithm, specifically PageRank, for weighting URL tokens. To create the graph, an imaginary walker visits the URL tokens one at a time and assigns a value to each token based on the probability of encountering the target URL during the walk. We studied different random walk (RW) variations and their effects on the URL string. The BM25 algorithm was then applied to the resulting token scores to produce a sparse matrix for the classification task. Experiments conducted with logistic regression showed that the proposed model achieved an accuracy of 98.98%, a false alarm rate of 1.72%, and a missing alarm rate of 0.302%. The model also attained an accuracy of 97.17% on a benchmark dataset.
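As a rough illustration of the token-weighting idea, the following minimal sketch builds a small token graph from URLs and scores the tokens with PageRank. The abstract does not specify how the graph is constructed, so several choices here are assumptions: tokens are split on non-alphanumeric characters, edges link tokens that are adjacent within a URL, the damping factor is 0.85, and the example URLs are hypothetical. The BM25 and logistic regression stages are omitted.

```python
import re
from collections import defaultdict


def tokenize(url):
    """Split a URL into lowercase alphanumeric tokens (assumed tokenizer)."""
    return [t for t in re.split(r"[^A-Za-z0-9]+", url.lower()) if t]


def build_graph(urls):
    """Build an undirected graph linking tokens adjacent within a URL.

    Adjacency-based edges are an assumption made for this sketch.
    """
    graph = defaultdict(set)
    for url in urls:
        tokens = tokenize(url)
        for token in tokens:
            graph[token]  # register the token even if it has no neighbours
        for a, b in zip(tokens, tokens[1:]):
            if a != b:
                graph[a].add(b)
                graph[b].add(a)
    return graph


def pagerank(graph, damping=0.85, iterations=50):
    """Score nodes by power iteration of the PageRank recurrence."""
    n = len(graph)
    rank = {node: 1.0 / n for node in graph}
    for _ in range(iterations):
        # Rank held by isolated tokens is spread evenly over all nodes.
        dangling = sum(rank[v] for v in graph if not graph[v])
        new_rank = {}
        for node in graph:
            incoming = sum(rank[nb] / len(graph[nb]) for nb in graph[node])
            new_rank[node] = (1 - damping) / n + damping * (incoming + dangling / n)
        rank = new_rank
    return rank


# Hypothetical URLs, for illustration only.
urls = [
    "http://paypal.com/login",
    "http://paypal.secure-login.example.com/login",
]
scores = pagerank(build_graph(urls))
for token, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{token}: {score:.4f}")
```

The scores sum to one, so each value can be read as the long-run share of the walker's visits to that token. In a full pipeline these scores would feed a term-weighting step such as BM25 before classification.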
Disclosure statement
No potential conflict of interest was reported by the author(s).
Data availability statement
The datasets used in this study on phishing detection, designated D1 and D2, are available at the following links: https://github.com/ebubekirbbr/pdd/tree/master/input, https://sites.google.com/view/url-phishing-detection/main
Additional information
Notes on contributors
Abdelali Elkouay
Abdelali Elkouay is currently a PhD student at Chouaib Doukkali University, El Jadida, Morocco. His research interests include cybersecurity, data mining, machine learning, and natural language processing (NLP).
Najem Moussa
Najem Moussa is a full professor in the Department of Computer Science, Faculty of Science, Mohammed V University, Morocco. He obtained his PhD in statistical physics from Mohammed V University in 1998. His areas of interest include machine learning, intelligent transportation systems, wireless and sensor networks, and epidemic and worm propagation.
Abdellah Madani
Abdellah Madani is currently a professor and PhD tutor in the Department of Computer Science, Faculty of Sciences, Chouaib Doukkali University, El Jadida, Morocco. His main research interests include optimization algorithms, text mining, traffic flow, and modeling platforms. He is the author of many research papers published in conference proceedings and international journals.