Sentiment Analysis of Public Comments on YouTube Content Using Principal Component Analysis and Naive Bayes

Authors

  • Dede Pratama, Magister Teknik Informatika, Universitas Putra Indonesia YPTK Padang
  • Sumijan, Magister Teknik Informatika, Universitas Putra Indonesia YPTK Padang
  • Rini Sovia, Magister Teknik Informatika, Universitas Putra Indonesia YPTK Padang

DOI:

https://doi.org/10.46984/ra6yvz19

Keywords:

Sentiment Analysis, YouTube, TVRI, TF-IDF, Principal Component Analysis, Gaussian Naive Bayes

Abstract

The rapid development of digital media compels public broadcasting institutions to adapt to shifting patterns of information consumption, which now center on online platforms. TVRI Sumatera Barat has responded by using YouTube as a channel for content distribution and audience engagement. However, this interaction produces a large volume of unstructured comment text, which makes manual sentiment analysis inefficient, time-consuming, and prone to subjectivity. This study addresses these challenges by classifying user sentiment automatically and objectively with a machine learning approach that combines Principal Component Analysis (PCA) and the Gaussian Naive Bayes algorithm. PCA reduces the dimensionality of the TF-IDF-weighted text features while retaining most of their variance, and Gaussian Naive Bayes was selected as the classifier because it efficiently handles the continuous numerical features produced by the PCA transformation. The research dataset comprises 10 comments from the TVRI Sumatera Barat YouTube channel in 2024; the comments were collected via the YouTube Data API, preprocessed, and labeled as positive or negative. The model was validated using a confusion matrix and the accuracy, precision, recall, and F1-score metrics. The test results show that combining PCA with Gaussian Naive Bayes improves computational efficiency and delivers precise classification performance. This research contributes a measurable method for public opinion analysis, which provides a basis for evaluating audience perception and improving the digital broadcasting strategies of public institutions.
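The processing chain summarized in the abstract (TF-IDF weighting, PCA reduction, Gaussian Naive Bayes classification, and confusion-matrix evaluation) can be illustrated with a small from-scratch sketch in plain Python. It is not the authors' implementation: the regex tokenizer, the smoothed IDF formula log((1 + n) / (1 + df)) + 1, the power-iteration PCA, the variance floor, every function name (`fit_tfidf`, `fit_pca`, `fit_gnb`, `predict_gnb`, `evaluate`, and so on), and the English toy comments are assumptions chosen for illustration. The YouTube Data API collection step is omitted because it requires network access and an API key.

```python
import math
import random
import re
from collections import Counter

# Hypothetical sketch, not the paper's code. Assumed preprocessing: lowercase alphabetic tokens.
TOKEN = re.compile(r"[a-z]+")

def fit_tfidf(docs):
    """Learn the vocabulary and smoothed IDF weights from the training comments."""
    token_sets = [set(TOKEN.findall(d.lower())) for d in docs]
    vocab = sorted(set().union(*token_sets))
    df = Counter(t for toks in token_sets for t in toks)
    idf = [math.log((1 + len(docs)) / (1 + df[t])) + 1 for t in vocab]
    return vocab, idf

def transform_tfidf(docs, vocab, idf):
    """Dense TF-IDF rows; term frequency = count / comment length, unseen words ignored."""
    rows = []
    for doc in docs:
        tokens = TOKEN.findall(doc.lower())
        counts, total = Counter(tokens), len(tokens) or 1
        rows.append([counts[t] / total * w for t, w in zip(vocab, idf)])
    return rows

def fit_pca(rows, k, iters=300, seed=0):
    """Top-k eigenvectors of the covariance matrix via power iteration with deflation."""
    n, d = len(rows), len(rows[0])
    mean = [sum(col) / n for col in zip(*rows)]
    centered = [[x - m for x, m in zip(r, mean)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)] for i in range(d)]
    rng, components = random.Random(seed), []
    for _ in range(k):
        v = [rng.uniform(-1, 1) for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(a * b for a, b in zip(row, v)) for row in cov]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # no variance left in the remaining directions
                break
            v = [x / norm for x in w]
        # Eigenvalue estimate v^T C v; subtract it so the next pass finds the next component.
        lam = sum(vi * sum(a * b for a, b in zip(row, v)) for vi, row in zip(v, cov))
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
        components.append(v)
    return mean, components

def transform_pca(rows, mean, components):
    return [[sum((x - m) * c for x, m, c in zip(r, mean, comp)) for comp in components]
            for r in rows]

def _mean_var(values):
    m = sum(values) / len(values)
    return m, sum((v - m) ** 2 for v in values) / len(values)

def fit_gnb(X, y, var_smoothing=1e-9):
    """Per-class log prior plus a Gaussian mean and variance for every feature."""
    # Variance floor (in the spirit of scikit-learn's var_smoothing) avoids division by zero.
    eps = max(var_smoothing * max(_mean_var(col)[1] for col in zip(*X)), 1e-12)
    model = {}
    for label in sorted(set(y)):
        rows = [x for x, lab in zip(X, y) if lab == label]
        stats = [_mean_var(col) for col in zip(*rows)]
        model[label] = (math.log(len(rows) / len(X)),
                        [m for m, _ in stats], [v + eps for _, v in stats])
    return model

def predict_gnb(model, X):
    def log_posterior(label, x):
        log_prior, mu, var = model[label]
        return log_prior + sum(-0.5 * math.log(2 * math.pi * s) - (v - m) ** 2 / (2 * s)
                               for v, m, s in zip(x, mu, var))
    return [max(model, key=lambda label: log_posterior(label, x)) for x in X]

def evaluate(y_true, y_pred, positive=1):
    """Confusion matrix [[TN, FP], [FN, TP]] with accuracy, precision, recall and F1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"confusion": [[tn, fp], [fn, tp]], "accuracy": (tp + tn) / len(pairs),
            "precision": precision, "recall": recall, "f1": f1}

if __name__ == "__main__":
    # Hypothetical toy comments (1 = positive, 0 = negative), not the study's data.
    train = ["good good great", "good great great", "great good",
             "bad bad awful", "bad awful awful", "awful bad"]
    vocab, idf = fit_tfidf(train)
    X = transform_tfidf(train, vocab, idf)
    mean, comps = fit_pca(X, k=2)
    model = fit_gnb(transform_pca(X, mean, comps), [1, 1, 1, 0, 0, 0])
    test = transform_pca(transform_tfidf(["great", "awful awful bad"], vocab, idf), mean, comps)
    print(evaluate([1, 0], predict_gnb(model, test)))
```

PCA components can take negative values, which multinomial Naive Bayes does not accept, whereas a per-component Gaussian likelihood handles real values directly; this is in line with the abstract's rationale for the Gaussian variant. In practice, scikit-learn's `TfidfVectorizer`, `PCA`, and `GaussianNB` cover the same steps, and the TF-IDF and PCA transforms should be fitted on the training split only.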

Published

2026-06-30

Issue

Vol. 30 No. 1 (2026)

Section

Articles

How to Cite

Pratama, D., Sumijan and Sovia, R. (2026) “Sentiment Analysis of Public Comments on YouTube Content Using Principal Component Analysis and Naive Bayes”, Sebatik, 30(1), pp. 214–220. doi:10.46984/ra6yvz19.