Underwater Plastic Waste Detection with YOLO and Vision Transformer Models

Authors

Jonathan Bruce Cárdenas Rondoño, Ners Armando Vasquez Espinoza & Edwin Jonathan Escobedo Cárdenas (Universidad de Lima, Perú)

DOI:

https://doi.org/10.26439/interfases2025.n021.7868

Keywords:

object detection, deep learning, plastic waste, object detection model, underwater images

Abstract

This study addresses the global issue of marine pollution, with a particular focus on plastic bag contamination, by leveraging real-time object detection techniques powered by deep learning algorithms. A detailed comparison was carried out between the YOLOv8, YOLO-NAS, and RT-DETR models to assess their effectiveness in detecting plastic waste in underwater environments. The methodology encompassed several key stages, including data preprocessing, model implementation, and training through transfer learning. Evaluation was conducted using a simulated video environment, followed by an in-depth comparison of the results. Performance assessment was based on critical metrics such as mean average precision (mAP), recall, and inference time. The YOLOv8 model achieved an mAP50 of 0.921 on the validation dataset, along with a recall of 0.829 and an inference time of 14.1 milliseconds. The YOLO-NAS model, by contrast, reached an mAP50 of 0.813, a higher recall of 0.903, and an inference time of 17.8 milliseconds. The RT-DETR model obtained an mAP50 of 0.887, a recall of 0.819, and an inference time of 15.9 milliseconds. Notably, despite not having the highest mAP, the RT-DETR model demonstrated superior detection performance when deployed in real-world underwater conditions, highlighting its robustness and potential for practical environmental monitoring.
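
To make the reported pipeline concrete, the sketch below shows one way the transfer-learning and evaluation steps could be reproduced. It is illustrative only, not the authors' implementation: it assumes the Ultralytics Python package, a hypothetical dataset configuration file (underwater_plastic.yaml), and a hypothetical test clip (underwater_clip.mp4). YOLO-NAS, distributed through super-gradients, would follow an analogous workflow.

```python
# Minimal sketch (not the authors' code): fine-tune COCO-pretrained YOLOv8 and
# RT-DETR on an underwater plastic-waste dataset, then read out mAP50, recall,
# and per-frame inference time. Dataset and video file names are hypothetical.
from ultralytics import YOLO, RTDETR

DATA = "underwater_plastic.yaml"  # hypothetical dataset config (train/val paths, class names)

models = {
    "YOLOv8": YOLO("yolov8m.pt"),      # COCO-pretrained weights as the transfer-learning starting point
    "RT-DETR": RTDETR("rtdetr-l.pt"),
}

for name, model in models.items():
    # Transfer learning: fine-tune the pretrained detector on the plastic-waste dataset.
    model.train(data=DATA, epochs=100, imgsz=640)

    # Validation metrics corresponding to those reported in the abstract.
    metrics = model.val(data=DATA)
    print(f"{name}: mAP50={metrics.box.map50:.3f}, recall={metrics.box.mr:.3f}")

    # Run the fine-tuned detector on a simulated underwater video clip;
    # each result carries detections and the inference time in milliseconds.
    for r in model.predict("underwater_clip.mp4", stream=True):
        print(f"{name}: {len(r.boxes)} detections, inference={r.speed['inference']:.1f} ms")
```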

Author Biographies

  • Jonathan Bruce Cárdenas Rondoño, Universidad de Lima, Perú

    Systems engineer from the Universidad de Lima, with experience in full-stack development and information technology analysis. His academic work focuses on computer vision and the design of intelligent systems.

  • Ners Armando Vasquez Espinoza, Universidad de Lima, Perú

    Systems engineer from the Universidad de Lima. He currently works in the areas of big data and cloud computing, and is interested in further training in analytics, AI, and machine learning.

  • Edwin Jonathan Escobedo Cárdenas, Universidad de Lima, Perú

    Holds a PhD and a master's degree in computer science from the Universidade Federal de Ouro Preto (Brazil), and a bachelor's degree in computer science and informatics engineering from the Universidad Nacional de Trujillo. He is currently a lecturer in the Systems Engineering program at the Universidad de Lima and a researcher registered with Renacyt. His areas of interest include computer vision, machine learning, and data science.

Published

2025-07-31

Issue

021 (2025)

Section

Research papers

How to Cite

Cárdenas Rondoño, J. B., Vasquez Espinoza, N. A., & Escobedo Cárdenas, E. J. (2025). Underwater Plastic Waste Detection with YOLO and Vision Transformer Models. Interfases, 021, 81–100. https://doi.org/10.26439/interfases2025.n021.7868