Recognition model of variables that influence the performance of RDBMS transactions applying PCA and PCR
DOI:
https://doi.org/10.26439/ciis2021.5582
Keywords:
RDBMS, Statistical Machine Learning Algorithm, Principal Component Analysis, PCA, Principal Components Regression, PCR, SQL Performance, Performance Management
Abstract
Knowing which factors influence a system's efficiency is essential for its administration and maintenance. Many organizations support their operations with applications that interact with a Relational Database Management System (RDBMS), and they can improve efficiency by understanding the factors that influence the performance of the SQL statement executions that make up the workload, especially workloads generated by applications deployed in production environments that recur over time. This article proposes a model for recognizing the factors that affect the performance of SQL statement executions processed in an RDBMS, and discusses the implementation of a technique for predicting performance metrics. The technique uses two statistical machine learning algorithms, Principal Component Analysis (PCA) and Principal Components Regression (PCR), which exploit the plans, statistics, and metrics generated during the life cycle of SQL statement executions.
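To make the PCA and PCR combination named in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It assumes NumPy is available, uses synthetic data, and invents the feature names (`logical_reads`, `physical_reads`, `cpu_time`, `rows_processed`) and the target (`elapsed`) purely for illustration. It standardizes the variables, extracts principal components with an SVD, regresses the target on the leading component scores, and maps the fitted coefficients back to the original variables as a rough indication of influence.

```python
import numpy as np


def fit_pcr(X, y, n_components):
    """Fit a principal components regression and return its parts in a dict."""
    mean = X.mean(axis=0)
    scale = X.std(axis=0, ddof=1)
    scale[scale == 0] = 1.0            # guard against constant columns
    Z = (X - mean) / scale             # standardized predictors

    # PCA via SVD: the rows of Vt are the principal directions (loadings).
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    components = Vt[:n_components]
    explained_ratio = (s ** 2) / np.sum(s ** 2)

    # Regress the centered target on the retained component scores.
    scores = Z @ components.T
    y_mean = y.mean()
    gamma, *_ = np.linalg.lstsq(scores, y - y_mean, rcond=None)

    return {
        "mean": mean,
        "scale": scale,
        "components": components,
        "explained_ratio": explained_ratio,
        "gamma": gamma,
        "y_mean": y_mean,
    }


def predict_pcr(model, X):
    """Predict the target for new rows using a fitted PCR model."""
    Z = (X - model["mean"]) / model["scale"]
    return model["y_mean"] + (Z @ model["components"].T) @ model["gamma"]


def variable_influence(model):
    """Coefficients expressed on the standardized original variables."""
    return model["components"].T @ model["gamma"]


# Synthetic workload (hypothetical metrics, for illustration only).
rng = np.random.default_rng(0)
n = 200
logical_reads = rng.normal(size=n)
physical_reads = 0.9 * logical_reads + 0.1 * rng.normal(size=n)  # correlated
cpu_time = rng.normal(size=n)
rows_processed = rng.normal(size=n)
feature_names = ["logical_reads", "physical_reads", "cpu_time", "rows_processed"]
X = np.column_stack([logical_reads, physical_reads, cpu_time, rows_processed])
elapsed = 3.0 * logical_reads + 2.0 * cpu_time + 0.1 * rng.normal(size=n)

model = fit_pcr(X, elapsed, n_components=2)
print("explained variance ratio:", model["explained_ratio"])
for name, weight in zip(feature_names, variable_influence(model)):
    print(f"{name}: {weight:+.3f}")
```

How many components to keep is a modeling choice. In practice it would usually be selected by cross-validation or an explained-variance threshold. The value `n_components=2` here is arbitrary.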