Model Interpretation for Student Major Selection Using Principal Component Analysis and Random Forest

DOI:

https://doi.org/10.46984/zyc1hh02

Keywords:

Major Selection, PCA, Random Forest, Recommendation, SHAP

Abstract

Advances in information technology have given the education sector data-driven tools to support major selection. Students often find this choice confusing because it plays a crucial role in determining their academic and career futures. This study aims to develop an accurate and transparent recommendation system for major selection by integrating Principal Component Analysis (PCA), Random Forest (RF), and SHapley Additive exPlanations (SHAP). The research follows a systematic framework that includes data processing and model evaluation stages. PCA is applied to reduce the dimensionality of complex student data, improving computational efficiency and minimizing information redundancy. The Random Forest algorithm is then used as a classification model to predict one of three major recommendations: Science, Social Sciences, or Religious Studies. SHAP is integrated to provide both mathematical and visual interpretations of how each academic feature contributes to the model’s predictions. The research data come from the internal records of MAN 1 Payakumbuh and cover the last three academic years (2022/2023–2024/2025). The dataset consists of 571 eleventh-grade students, described by their tenth-grade academic scores and non-academic skill variables. The resulting model achieves an accuracy of 88.70% and provides more objective recommendations than conventional subjective assessments. Visualizing feature contributions with SHAP enhances transparency and helps stakeholders understand the basis for each model decision. This study contributes to improving the efficiency of the major selection process and supports more accurate academic decision-making for students and educators.
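To illustrate the kind of feature-contribution interpretation the abstract attributes to SHAP, the following minimal sketch computes exact Shapley values for a single student. It is not the authors' implementation and does not use the SHAP library. The scoring function `toy_science_score`, its coefficients, the three feature names, and the student and baseline values are all hypothetical stand-ins for a trained classifier and real records.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one instance x relative to a baseline.

    A feature that is absent from a coalition takes its baseline value.
    The cost grows exponentially with the number of features, so this
    brute-force form is only practical for a handful of features.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_i = [x[j] if (j in subset or j == i) else baseline[j]
                          for j in range(n)]
                without_i = [x[j] if j in subset else baseline[j]
                             for j in range(n)]
                # Marginal contribution of feature i to this coalition
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


def toy_science_score(features):
    """Hypothetical score for the Science major (assumption, not from the paper)."""
    math_score, biology_score, religion_score = features
    return (0.5 * math_score + 0.3 * biology_score - 0.1 * religion_score
            + 0.002 * math_score * biology_score)


feature_names = ["math", "biology", "religion"]   # hypothetical features
student = [90.0, 85.0, 70.0]                       # hypothetical student
baseline = [75.0, 75.0, 75.0]                      # hypothetical reference scores

phi = shapley_values(toy_science_score, student, baseline)
for name, value in zip(feature_names, phi):
    print(f"{name}: {value:+.2f}")
```

The contributions sum to the difference between the student's score and the baseline score, which is the additivity property that makes such attributions straightforward to present to students and educators. In practice, the SHAP library approximates or accelerates this computation for tree ensembles such as Random Forest rather than enumerating every coalition.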

Published

2026-06-30

Section

Articles

How to Cite

“Model Interpretation for Student Major Selection Using Principal Component Analysis and Random Forest” (2026) Sebatik, 30(1), pp. 153–160. doi:10.46984/zyc1hh02.