A comparison of machine learning techniques for detection of phishing websites
DOI:
https://doi.org/10.26439/interfases2020.n013.4886
Keywords:
Anti-Phishing, Machine Learning, Cybersecurity, Phishing Warning, Phishing, Cyberattack
Abstract
Phishing is the theft of personal data through fake websites. Victims are directed to a fake website, where they are asked to enter their data to validate their identity. The theft occurs at that moment: the entered data are stored and later used by the attacker, who either sells them or uses them to log in to websites and commit fraud. For this work, we reviewed different methods for detecting phishing websites with machine learning techniques. The purpose of this work is to compare the machine learning techniques that have proven most effective at detecting phishing websites. The results show that tree-based classifiers such as Decision Tree and Random Forest achieved the highest accuracy and efficacy rates, with values between 97% and 99%, in detecting these types of websites.
License
Authors who publish with this journal agree to the following terms:
Authors retain copyright and grant the journal the right of first publication, with the work simultaneously licensed under an Attribution 4.0 International (CC BY 4.0) License, which allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).
Last updated 03/05/21