A comparison of RESNET-50, VGG-16, Vision Transformer, and Swin Transformer for facial recognition with mask occlusion
DOI: https://doi.org/10.26439/interfases2023.n017.6361

Keywords: face recognition, RESNET-50, VGG-16, Vision Transformer, Swin Transformer

Abstract
Face recognition has become relevant in the search for contactless identity-verification solutions in enclosed spaces in the context of the SARS-CoV-2 pandemic. One of the challenges of face recognition is mask occlusion, which hides more than 50 % of the face. This research evaluated four pre-trained models adapted by transfer learning: VGG-16, RESNET-50, Vision Transformer (ViT), and Swin Transformer, whose upper layers were trained on a dataset of the authors' own creation. The evaluation obtained an accuracy of 24 % (RESNET-50), 25 % (VGG-16), 96 % (ViT), and 91 % (Swin) with unmasked subjects, and 32 % (RESNET-50), 53 % (VGG-16), 87 % (ViT), and 61 % (Swin) with masked subjects. These results indicate that modern architectures such as Transformers perform better than the CNNs (VGG-16 and RESNET-50) at recognizing masked faces. The contribution of this research lies in the experimentation with two families of architectures, CNNs and Transformers, as well as the creation of the public dataset shared with the scientific community. This work strengthens the state of the art of computer vision in face recognition under mask occlusion by illustrating experimentally how accuracy varies across scenarios and architectures.
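The transfer-learning setup the abstract describes — a pre-trained backbone kept frozen while only the upper classification layers are trained — can be illustrated with a minimal, self-contained NumPy sketch. This is not the authors' code: the fixed random projection stands in for the frozen backbone (VGG-16, RESNET-50, ViT, or Swin in the paper), the data are synthetic clusters standing in for face identities, and all names and dimensions are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Dimensions of the toy problem (all illustrative).
D_IN, D_FEAT, N_CLASSES, N = 64, 16, 4, 200

# Stand-in for the frozen pre-trained backbone: its weights are fixed
# and never updated during training.
W_backbone = rng.normal(size=(D_IN, D_FEAT))

def backbone(x):
    return np.maximum(x @ W_backbone, 0.0)  # frozen ReLU features

# Synthetic "identities": one well-separated cluster per class.
centers = rng.normal(size=(N_CLASSES, D_IN))
y = rng.integers(0, N_CLASSES, size=N)
X = centers[y] + 0.3 * rng.normal(size=(N, D_IN))

# Extract features once; only the head below is ever updated.
feats = backbone(X)
feats = (feats - feats.mean(axis=0)) / (feats.std(axis=0) + 1e-8)

# Trainable "upper layers": a single softmax classification head.
W_head = np.zeros((D_FEAT, N_CLASSES))
b_head = np.zeros(N_CLASSES)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

onehot = np.eye(N_CLASSES)[y]
lr = 0.1
for _ in range(300):  # full-batch gradient descent on the head only
    p = softmax(feats @ W_head + b_head)
    W_head -= lr * feats.T @ (p - onehot) / N  # cross-entropy gradient
    b_head -= lr * (p - onehot).mean(axis=0)

acc = float((np.argmax(feats @ W_head + b_head, axis=1) == y).mean())
print(f"head-only training accuracy: {acc:.2f}")
```

In practice one would load actual pre-trained weights (for example via torchvision or timm), freeze the backbone parameters, and replace the final classification layer with one sized to the number of identities, but the division of labor — fixed feature extractor, trainable head — is the same as in this sketch.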
License
Authors who publish with this journal agree to the following terms:
Authors retain copyright and grant the journal right of first publication, with the work simultaneously licensed under an Attribution 4.0 International (CC BY 4.0) License, which allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).
