Research Article

Graph-based phishing detection: URLGBM model driven by machine learning

Received 08 Jul 2023, Accepted 09 Apr 2024, Published online: 18 Apr 2024
 

Abstract

Phishing attacks are a form of social engineering in which deceptive communications imitating a trustworthy source are sent to trick users into revealing confidential information. Anti-phishing systems have made significant progress in recent years, helping internet users protect confidential and private information against such attacks. In this study, we propose a URL graph representation that weights URL tokens with a random walk algorithm, specifically PageRank. To build the graph, an imaginary walker visits the URL tokens one at a time and assigns each token a score reflecting the probability of encountering it during the walk. We studied several random walk (RW) variants and their effect on the URL string representation. The BM25 weighting scheme was then applied to the resulting token scores to produce a sparse feature matrix for classification. Experiments with logistic regression showed that the proposed model achieved an accuracy of 98.98%, a false alarm rate of 1.72%, and a missing alarm rate of 0.302%. The model also attained 97.17% accuracy on a benchmark dataset.
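The abstract does not specify the exact graph construction or how BM25 consumes the PageRank scores, so the following is only a minimal sketch of one plausible reading, not the authors' URLGBM implementation. It assumes that tokens are maximal alphanumeric runs of the lowercased URL, that adjacent tokens are joined by undirected edges, a PageRank damping factor of 0.85, and that each token's PageRank score (rescaled so the mean weight per URL is 1) replaces raw term frequency in the BM25 formula with the common defaults k1 = 1.2 and b = 0.75. All function names and the example URLs are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(url: str) -> list[str]:
    # Assumption: URL tokens are maximal alphanumeric runs, lowercased.
    return [t for t in re.split(r"[^0-9a-zA-Z]+", url.lower()) if t]


def token_graph(tokens: list[str], window: int = 2) -> dict[str, set[str]]:
    # Undirected graph linking tokens that co-occur within a sliding window
    # (hypothetical choice: window=2 links only adjacent tokens).
    graph: dict[str, set[str]] = {}
    for i, tok in enumerate(tokens):
        graph.setdefault(tok, set())
        for j in range(i + 1, min(i + window, len(tokens))):
            other = tokens[j]
            if other != tok:  # skip self-loops from repeated tokens
                graph[tok].add(other)
                graph.setdefault(other, set()).add(tok)
    return graph


def pagerank(graph: dict[str, set[str]], damping: float = 0.85,
             iters: int = 50) -> dict[str, float]:
    # Power iteration: score[v] approximates the stationary probability that
    # a random walker (teleporting with prob. 1 - damping) is at token v.
    nodes = list(graph)
    n = len(nodes)
    if n == 0:
        return {}
    score = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new = {v: (1.0 - damping) / n for v in nodes}
        # Mass held by tokens without edges is spread uniformly.
        dangling = sum(score[v] for v in nodes if not graph[v])
        for v in nodes:
            for u in graph[v]:
                new[u] += damping * score[v] / len(graph[v])
        for v in nodes:
            new[v] += damping * dangling / n
        score = new
    return score


def bm25_vectors(urls: list[str], k1: float = 1.2,
                 b: float = 0.75) -> list[dict[str, float]]:
    # One sparse row (token -> weight) per URL.
    # Assumption: the PageRank score, rescaled so the mean token weight in a
    # URL is 1, stands in for raw term frequency in the BM25 formula.
    docs = []
    for url in urls:
        pr = pagerank(token_graph(tokenize(url)))
        docs.append({t: s * len(pr) for t, s in pr.items()})
    df = Counter(t for d in docs for t in d)  # document frequency per token
    n_docs = len(docs)
    avgdl = (sum(len(d) for d in docs) / n_docs if n_docs else 0.0) or 1.0
    rows = []
    for d in docs:
        dl = len(d)  # assumption: URL length = number of distinct tokens
        row = {}
        for t, w in d.items():
            idf = math.log(1 + (n_docs - df[t] + 0.5) / (df[t] + 0.5))
            row[t] = idf * w * (k1 + 1) / (w + k1 * (1 - b + b * dl / avgdl))
        rows.append(row)
    return rows


if __name__ == "__main__":
    urls = [
        "http://paypal.com.secure-login.example.net/verify",
        "https://www.wikipedia.org/wiki/Phishing",
    ]
    for url, row in zip(urls, bm25_vectors(urls)):
        top = sorted(row.items(), key=lambda kv: -kv[1])[:3]
        print(url, top)
```

Each row is a sparse token-to-weight mapping; in practice the rows could be turned into a sparse matrix (for example with scikit-learn's `DictVectorizer`) and passed to a logistic regression classifier, as the abstract describes. Tokens that appear in many URLs (such as `https`) receive a lower IDF factor, while tokens that are both central in a URL's graph and rare across the corpus receive the highest weights.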

Disclosure statement

No potential conflict of interest was reported by the author(s).

Data availability statement

The phishing detection datasets used in this study, designated D1 and D2, are available at the following links: https://github.com/ebubekirbbr/pdd/tree/master/input, https://sites.google.com/view/url-phishing-detection/main

Additional information

Notes on contributors

Abdelali Elkouay

Abdelali Elkouay is currently a PhD student at Chouaib Doukkali University, El Jadida, Morocco. His research interests include cybersecurity, data mining, machine learning, and natural language processing (NLP).

Najem Moussa

Najem Moussa is a full professor in the Department of Computer Science, Faculty of Science, Mohammed V University, Morocco. He obtained his PhD in statistical physics from Mohammed V University in 1998. His areas of interest include machine learning, intelligent transportation systems, wireless and sensor networks, and epidemic and worm propagation.

Abdallah Madani

Abdellah Madani is currently a professor and PhD tutor in the Department of Computer Science, Faculty of Sciences, Chouaib Doukkali University, El Jadida, Morocco. His main research interests include optimization algorithms, text mining, traffic flow, and modeling platforms. He has authored numerous research papers published in conference proceedings and international journals.
