Recognition model of variables that influence the performance of RDBMS transactions applying PCA and PCR

Authors

  • José Luis Ponce Vergara, Universidad Nacional de Ingeniería, Lima, Perú

DOI:

https://doi.org/10.26439/ciis2021.5582

Keywords:

RDBMS, Statistical Machine Learning Algorithm, Principal Component Analysis, PCA, Principal Components Regression, PCR, SQL Performance, Performance Management

Abstract

Knowledge of the factors that influence the efficiency of a system is essential for its administration and maintenance. Many organizations support their operations with applications that interact with a Relational Database Management System (RDBMS), and they can improve efficiency by understanding the factors that influence the performance of the SQL statement executions that make up the workload, especially workloads that are generated by applications deployed in production environments and that recur over time. This research article proposes a model for recognizing the factors that affect the performance of SQL statement executions processed in an RDBMS. It also discusses the implementation of a performance metric prediction technique based on two statistical machine learning algorithms, Principal Component Analysis (PCA) and Principal Components Regression (PCR), which exploit the plans, statistics, and metrics generated during the life cycle of SQL statement executions.
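
As a rough illustration of the PCA and PCR combination named in the abstract, the sketch below standardizes a feature matrix, extracts principal components through an SVD, and regresses a performance metric on the retained component scores. It is a minimal sketch under stated assumptions, not the article's implementation: the feature names (logical reads, physical reads, CPU cost, rows processed), the synthetic data, the helper names `fit_pcr` and `predict_pcr`, and the choice of two components are all hypothetical.

```python
import numpy as np


def fit_pcr(X, y, n_components):
    """Principal components regression: PCA on standardized X, then OLS on the scores."""
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    std[std == 0] = 1.0  # guard against constant columns
    Z = (X - mean) / std
    # SVD of the standardized data; rows of Vt are the principal directions.
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    components = Vt[:n_components]
    explained = (S ** 2) / np.sum(S ** 2)  # share of variance per component
    scores = Z @ components.T
    # Ordinary least squares of y on the retained scores, with an intercept.
    A = np.column_stack([np.ones(len(y)), scores])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return {
        "mean": mean,
        "std": std,
        "components": components,
        "coef": coef,
        "explained": explained[:n_components],
    }


def predict_pcr(model, X):
    """Project new observations onto the retained components and apply the fit."""
    Z = (X - model["mean"]) / model["std"]
    scores = Z @ model["components"].T
    return model["coef"][0] + scores @ model["coef"][1:]


# Synthetic stand-in for per-statement execution statistics (hypothetical columns:
# logical reads, physical reads, CPU cost, rows processed).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
X[:, 1] = 0.9 * X[:, 0] + 0.1 * rng.normal(size=200)  # correlated predictors
# Hypothetical target, e.g. elapsed time, driven by two of the features.
y = 2.0 * X[:, 0] + 0.5 * X[:, 2] + 0.01 * rng.normal(size=200)

model = fit_pcr(X, y, n_components=2)
pred = predict_pcr(model, X)
r2 = 1.0 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)

print("explained variance share:", model["explained"])
print("loadings of first component:", model["components"][0])
print("in-sample R^2:", r2)
```

In a sketch like this, the loadings show which original variables dominate each component, which is one way variables with large influence could be recognized. The number of components to retain would normally be chosen by cross-validation rather than fixed in advance.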

Author Biography

  • José Luis Ponce Vergara, Universidad Nacional de Ingeniería, Lima, Perú

    Doctoral candidate in Systems Engineering at the Universidad Nacional de Ingeniería. Master's degree in Systems Engineering from the Universidad Nacional de Ingeniería. Industrial Engineer (CIP) from the Universidad Nacional de Ingeniería (first place in the "Waldo Rodríguez Franco" graduating class). Master's-level studies in Telecommunications. Professional experience in information technology planning and management, information systems modeling and development, and platform and software administration at major private companies such as Grupo CARSA, Banco Wiese, IBM, Axcess Financial, and Global Business Solutions. Holds several international software certifications, including those from IBM (RUP, SOA, Databases), Oracle (Big Data, Cloud Computing, Databases), and ISACA (IT Auditing). Undergraduate and graduate lecturer at the Universidad Nacional de Ingeniería and the Universidad Pedro Ruiz Gallo. Has published research articles at conferences of the Universidad de Lima and the Universidad Nacional Mayor de San Marcos.

Published

2021-12-22

How to Cite

Ponce Vergara, J. L. (2021). Recognition model of variables that influence the performance of RDBMS transactions applying PCA and PCR. Actas del Congreso Internacional de Ingeniería de Sistemas, 137-158. https://doi.org/10.26439/ciis2021.5582