Random Forests as an extension of the classification trees with the R and Python programs

Authors

  • Rosa Fátima Medina-Merino, Universidad de Lima (Perú)
  • Carmen Ismelda Ñique-Chacón, Instituto Nacional de Estadística e Informática (Perú)

DOI:

https://doi.org/10.26439/interfases2017.n10.1775

Keywords:

Random Forest, classification trees, non-parametric classification models, supervised learning, R language, Python language

Abstract

This article presents the application of the non-parametric Random Forest method, through supervised learning, as an extension of classification trees. The Random Forest algorithm arises as an ensemble of several classification trees: for each individual tree, a random subset of variables is selected, the tree is built on those variables, and a prediction is obtained. The predictions of all the generated trees are then aggregated by majority vote, and the most voted class becomes the final Random Forest prediction. For the application, we worked with 3168 recorded voices, for which the results of an acoustic analysis are presented, registering variables such as frequency, spectrum, and modulation, among others, seeking to obtain a pattern for identifying and classifying speakers by gender through a voice identifier. The data set is open access and can be downloaded from the Kaggle web platform via <https://www.kaggle.com/primaryobjects/voicegender>. The statistical program R was used to develop the model, and additional classification algorithms were implemented in Python.
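The procedure summarized in the abstract — growing each tree on a random subset of variables and aggregating the trees' predictions by majority vote — can be sketched in Python with scikit-learn. This is a minimal illustration, not the authors' code: it uses a synthetic data set as a stand-in for the 3168 voice recordings, and the parameter values shown (100 trees, square-root feature subsets) are illustrative assumptions, not those reported in the article.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the 3168 acoustic recordings:
# 20 numeric features (frequency, spectrum, etc.) and a binary class (gender).
X, y = make_classification(n_samples=3168, n_features=20, n_informative=8,
                           random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# max_features controls how many variables each tree may consider at a split,
# which is the random variable-selection step the abstract describes; the
# forest's prediction is the majority vote of the n_estimators trees.
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt",
                                random_state=42)
forest.fit(X_train, y_train)
accuracy = forest.score(X_test, y_test)
print(f"test accuracy: {accuracy:.3f}")
```

In R, the analogous model would be fit with the `randomForest` package, where the corresponding arguments are `ntree` and `mtry`.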



Published

2017-12-18

Issue

Section

Dissemination papers

How to Cite

Random Forests as an extension of the classification trees with the R and Python programs. (2017). Interfases, 10(010), 165-189. https://doi.org/10.26439/interfases2017.n10.1775