COMPARATIVE ANALYSIS OF ACCURACY OF RANDOM FOREST AND GRADIENT BOOSTING CLASSIFIER ALGORITHM FOR DIABETES CLASSIFICATION

Authors

S. P. Nainggolan and A. Sinaga

DOI:

https://doi.org/10.46984/sebatik.v27i1.2157

Keywords:

Classification, Random Forest, Gradient Boosting, Accuracy, Precision, Recall, F1-score

Abstract

Diabetes is a disease characterized by high blood sugar (glucose) levels. If blood sugar is not properly controlled, it can cause various critical diseases, one of which is diabetes. The purpose of this study was to compare the Random Forest algorithm and the Gradient Boosting Classifier algorithm for diabetes classification in terms of accuracy, precision, recall, and F1 score. The study used a descriptive method, and the data source was the Pima Indians Diabetes Dataset from Kaggle. Based on data analysis using an 80:20 ratio, the Random Forest algorithm achieved an accuracy of 79%, computed from the confusion matrix, together with an AUC of 0.835, a recall of 78%, and a precision of 90%; from the recall and precision, an F1 score of 83% was obtained. The Gradient Boosting Classifier algorithm achieved an accuracy of 81%, also computed from the confusion matrix, together with an AUC of 0.877, a recall of 83%, and a precision of 67%; from the recall and precision, an F1 score of 74% was obtained. In this study, accuracy was evaluated through the confusion matrix and the AUC value. On both measures, the Gradient Boosting Classifier algorithm (accuracy 81%, AUC 0.877) outperformed the Random Forest algorithm (accuracy 79%, AUC 0.835).
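
The abstract derives accuracy, precision, and recall from a confusion matrix, and the F1 score from precision and recall. The following is a minimal Python sketch of those standard formulas, not the authors' code. The function names and the confusion-matrix counts are hypothetical and chosen only for illustration; the abstract does not report the underlying counts.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """F1 score: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def metrics_from_confusion(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard binary-classification metrics from confusion-matrix counts.

    tp/fp/fn/tn are true positives, false positives,
    false negatives, and true negatives.
    """
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1_from_precision_recall(precision, recall),
    }


if __name__ == "__main__":
    # Precision and recall as reported in the abstract (rounded values).
    print("Random Forest F1:", round(f1_from_precision_recall(0.90, 0.78), 4))
    print("Gradient Boosting F1:", round(f1_from_precision_recall(0.67, 0.83), 4))

    # Hypothetical counts, for illustration only (not from the study).
    print(metrics_from_confusion(tp=8, fp=2, fn=2, tn=8))
```

Because the abstract reports precision and recall as rounded percentages, an F1 score recomputed from them can differ from the reported figure by about one percentage point.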

Published

2023-06-06

How to Cite

Nainggolan, S. P. and Sinaga, A. (2023) “COMPARATIVE ANALYSIS OF ACCURACY OF RANDOM FOREST AND GRADIENT BOOSTING CLASSIFIER ALGORITHM FOR DIABETES CLASSIFICATION”, Sebatik, 27(1), pp. 97–102. doi: 10.46984/sebatik.v27i1.2157.

Section

Articles