Classification of Organisms into Kingdoms using DNA Codon Frequency

Authors

DOI:

https://doi.org/10.26439/interfases2022.n015.5896

Keywords:

machine learning, ensembles, DNA codon frequency, kingdom

Abstract

This study uses machine learning classifiers to predict the kingdom to which an organism belongs from its DNA codon usage frequencies. The dataset comprised 13,028 records of GenBank organisms distributed across eleven kingdoms, which were regrouped into six kingdoms (archaea, bacteria, invertebrates, plants, viruses, and vertebrates), leaving 9,027 records. The process involved removing irrelevant attributes, evaluating the classifiers with accuracy, precision, sensitivity (recall), and F1 score, and tuning the models' hyperparameters. The ensemble methods were voting, bagging, boosting, and stacking, built on k-nearest neighbors (KNN), decision tree (DT), multilayer perceptron (MLP), support vector classifier (SVC), and random forest (RF) models. Random forest was also used for attribute selection. Of the ensembles evaluated, stacking classified the organisms best.
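To make the input features concrete, the following is a minimal sketch of how a codon usage frequency vector could be computed from a coding sequence. It is not the authors' preprocessing, which the abstract does not describe. The function name `codon_frequencies` is hypothetical, and the handling of reading frame, trailing bases, and ambiguous characters are assumptions made for illustration.

```python
from collections import Counter
from itertools import product

# All 64 DNA codons in a fixed order, so every sequence maps to a
# frequency table with the same keys.
CODONS = ["".join(p) for p in product("ACGT", repeat=3)]
CODON_SET = set(CODONS)


def codon_frequencies(sequence):
    """Return the relative frequency of each of the 64 codons.

    Assumptions (not taken from the paper): the reading frame starts at
    position 0, an incomplete trailing codon is dropped, and triplets
    containing characters other than A, C, G, T are skipped.
    """
    seq = sequence.upper()
    # Non-overlapping triplets starting at position 0.
    triplets = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    counts = Counter(t for t in triplets if t in CODON_SET)
    total = sum(counts.values())
    if total == 0:
        # No valid codon found: return an all-zero table.
        return {codon: 0.0 for codon in CODONS}
    return {codon: counts[codon] / total for codon in CODONS}


freqs = codon_frequencies("ATGGCCATGTAA")  # codons: ATG, GCC, ATG, TAA
print(freqs["ATG"])  # 0.5
```

A table like this, one per organism, is the kind of 64-value feature vector that classifiers such as those named in the abstract could take as input.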

References

Khomtchouk, B. B. (2020). Codon usage bias levels predict taxonomic identity and genetic composition. bioRxiv. https://doi.org/10.1101/2020.10.26.356295

Im, E.-H., & Choi, S. S. (2017). Synonymous codon usage controls various molecular aspects. Genomics & Informatics, 15(4), 123-127. https://doi.org/10.5808/GI.2017.15.4.123

Nakamura, Y., Gojobori, T., & Ikemura, T. (2000). Codon usage tabulated from international DNA sequence databases: status for the year 2000. Nucleic Acids Research, 28(1), 292. https://doi.org/10.1093/nar/28.1.292

Parvathy, S. T., Udayasuriyan, V., & Bhadana, V. (2021). Codon usage bias. Molecular Biology Reports, 49, 539-565. https://doi.org/10.1007/s11033-021-06749-4

Sharp, P. M., Emery, L. R., & Zeng, K. (2010). Forces that influence the evolution of codon bias. Philosophical Transactions of the Royal Society B: Biological Sciences, 365(1544), 1203-1212. https://doi.org/10.1098/rstb.2009.0305

Wang, F.-P., & Li, H. (2009). Codon-pair usage and genome evolution. Gene, 433(1-2), 8-15. https://doi.org/10.1016/j.gene.2008.12.016

Published

2022-07-29

Issue

Section

Research papers

How to Cite

Classification of Organisms into Kingdoms using DNA Codon Frequency. (2022). Interfases, 15(015), 131-143. https://doi.org/10.26439/interfases2022.n015.5896