Comparison between RESNET-50, VGG-16, Vision Transformer, and Swin Transformer for facial recognition with mask occlusion

Authors

DOI:

https://doi.org/10.26439/interfases2023.n017.6361

Keywords:

face recognition, RESNET-50, VGG-16, Vision Transformer, Swin Transformer

Abstract

Face recognition has become relevant in the search for contactless identity-verification solutions in enclosed spaces in the context of the SARS-CoV-2 pandemic. One of the challenges of face recognition is mask occlusion, which hides more than 50 % of the face. This research evaluated four pre-trained models via transfer learning: VGG-16, RESNET-50, Vision Transformer (ViT), and Swin Transformer, retraining their upper layers on a proprietary dataset. With unmasked subjects, the analysis obtained accuracies of 24 % (RESNET-50), 25 % (VGG-16), 96 % (ViT), and 91 % (Swin); with masked subjects, accuracies were 32 % (RESNET-50), 53 % (VGG-16), 87 % (ViT), and 61 % (Swin). These results indicate that modern architectures such as Transformers perform better on masked-face recognition than CNNs (VGG-16 and RESNET-50). The contribution of the research lies in the experimentation with two types of architectures, CNNs and Transformers, as well as in the creation of a public dataset shared with the scientific community. This work strengthens the state of the art in computer vision for face recognition under mask occlusion by illustrating experimentally how accuracy varies across scenarios and architectures.

Author Biographies

  • Brenda Xiomara Tafur Acenjo, Universidad de Lima, Lima, Peru

    Bachelor's degree in Systems Engineering from the Universidad de Lima, specialized in information systems and content strategy. She has conducted research abroad on machine learning and Peruvian sign language. Her research interests focus on computer vision and new ways of using artificial intelligence for a positive impact on society.

  • Martin Alexis Tello Pariona, Universidad de Lima, Lima, Peru

    Graduate of the Systems Engineering program at the Universidad de Lima, with a specialization in Information Systems. There, he stood out as founder of the CEADA study group, with which he has participated in international competitive programming events. He has professional experience as an RPA Developer at EY. His research interests focus on artificial intelligence and cybersecurity.

  • Edwin Jhonatan Escobedo Cárdenas, Universidad de Lima, Lima, Peru

    Master's and doctoral degrees in Computer Science from the Universidade Federal de Ouro Preto, Brazil. Bachelor's degree in Computer Science and Informatics Engineering from the Universidad Nacional de Trujillo. He is currently a lecturer in the Systems Engineering program at the Universidad de Lima. RENACYT researcher. His areas of interest are computer vision, machine learning, and data science.

References

Cheng, P., & Pan, S. (2022). Learning from face recognition under occlusion. In 2022 International Conference on Big Data, Information and Computer Network (BDICN) (pp. 721-727). IEEE. https://doi.org/10.1109/BDICN55575.2022.00140

Damer, N., Grebe, J. H., Chen, C., Boutros, F., Kirchbuchner, F., & Kuijper, A. (2020). The effect of wearing a mask on face recognition performance: An exploratory study. BIOSIG 2020 - Proceedings of the 19th International Conference of the Biometrics Special Interest Group, August. https://dl.gi.de/server/api/core/bitstreams/c3e8ae49-dde1-4b80-ad18-3d3536b1897b/content

Hariri, W. (2022). Efficient masked face recognition method during the COVID-19 pandemic. Signal, Image and Video Processing, 16(3), 605-612. https://doi.org/10.1007/s11760-021-02050-w

Laxminarayanamma, K., Deepthi, V., Ahmed, M. F., & Sowmya, G. (2021). A real time robust facial recognition model for masked face images using machine learning model. In 2021 5th International Conference on Electronics, Communication and Aerospace Technology (ICECA) (pp. 769-774). IEEE. https://doi.org/10.1109/ICECA52323.2021.9675936

Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., & Guo, B. (2021). Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision (pp. 10012-10022). IEEE. https://doi.org/10.1109/ICCV48922.2021.00986

Mandal, B., Okeukwu, A., & Theis, Y. (2021). Masked face recognition using RESNET-50. arXiv:2104.08997. https://doi.org/10.48550/arXiv.2104.08997

Meena, M. K., & Meena, H. K. (2022). A literature survey of face recognition under different occlusion conditions. In 2022 IEEE Region 10 Symposium (TENSYMP) (pp. 1-6). IEEE. https://doi.org/10.1109/TENSYMP54529.2022.9864502

Sáez Trigueros, D. S., Meng, L., & Hartnett, M. (2018). Enhancing convolutional neural networks for face recognition with occlusion maps and batch triplet loss. Image and Vision Computing, 79, 99-108. https://doi.org/10.1016/j.imavis.2018.09.011

Tran, C. P., Vu, A. K. N., & Nguyen, V. T. (2022). Baby learning with vision transformer for face recognition. In 2022 International Conference on Multimedia Analysis and Pattern Recognition (MAPR) (pp. 1-6). IEEE. https://doi.org/10.1109/MAPR56351.2022.9924795

Wang, Z., Huang, B., Wang, G., Yi, P., & Jiang, K. (2023). Masked face recognition dataset and application. IEEE Transactions on Biometrics, Behavior, and Identity Science, 5(2), 298-304. https://doi.org/10.1109/TBIOM.2023.3242085

Wu, Z., Shen, C., & Van Den Hengel, A. (2019). Wider or deeper: Revisiting the RESNET model for visual recognition. Pattern Recognition, 90, 119-133. https://doi.org/10.1016/j.patcog.2019.01.006

Yanai, K., & Kawano, Y. (2015). Food image recognition using deep convolutional network with pre-training and fine-tuning. In 2015 IEEE International Conference on Multimedia & Expo Workshops (ICMEW) (pp. 1-6). IEEE. https://doi.org/10.1109/ICMEW.2015.7169816

Zhong, Y., & Deng, W. (2021). Face transformer for recognition. arXiv:2103.14803. https://doi.org/10.48550/arXiv.2103.14803

Published

2023-07-31

Issue

Section

Research papers

How to Cite

Comparison between RESNET-50, VGG-16, Vision Transformer, and Swin Transformer for facial recognition with mask occlusion. (2023). Interfases, 17(017), 56-78. https://doi.org/10.26439/interfases2023.n017.6361
