Random Forests as an extension of classification trees, with R and Python programs
DOI: https://doi.org/10.26439/interfases2017.n10.1775

Keywords: Random Forest, classification trees, non-parametric classification models, supervised learning, R language, Python language

Abstract
This article presents the application of the non-parametric Random Forest method, through supervised learning, as an extension of classification trees. The Random Forest algorithm arises as an ensemble of several classification trees: for each individual tree, a subset of variables is selected at random, the tree is built on those variables, and its prediction is recorded; the final Random Forest prediction is then obtained as the class most voted for across the generated trees. For the application, we worked with 3168 recorded voices, for which the results of an acoustic analysis are presented, registering variables such as frequency, spectrum, and modulation, among others, seeking to obtain a pattern for identifying and classifying gender through a voice identifier. The data set used is open access and can be downloaded from the Kaggle web platform at https://www.kaggle.com/primaryobjects/voicegender. The model was developed in the statistical program R; additionally, the classification algorithms were also implemented in Python.
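As a brief illustration of the majority-vote procedure described above (a minimal sketch, not the authors' actual scripts), the following R fragment fits a Random Forest with the randomForest package, assuming the Kaggle file has been saved locally as voice.csv and that the gender class is stored in a column named label:

library(randomForest)

voice <- read.csv("voice.csv", stringsAsFactors = TRUE)

set.seed(123)  # for a reproducible train/test split
train_idx <- sample(nrow(voice), floor(0.7 * nrow(voice)))
train <- voice[train_idx, ]
test  <- voice[-train_idx, ]

# ntree trees are grown; at each split, mtry variables are drawn at random,
# which is the random variable selection described in the abstract
rf <- randomForest(label ~ ., data = train,
                   ntree = 500,
                   mtry  = floor(sqrt(ncol(voice) - 1)))

# each tree votes for a class; predict() returns the most-voted class
pred <- predict(rf, newdata = test)
table(Predicted = pred, Actual = test$label)  # confusion matrix on held-out voices

Here mtry controls how many of the acoustic variables are drawn at random at each split; the floor of the square root of the number of predictors is the conventional default for classification forests.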
License
Authors who publish with this journal agree to the following terms:
Authors retain copyright and grant the journal right of first publication, with the work simultaneously licensed under an Attribution 4.0 International (CC BY 4.0) License, which allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).