Sentiment Analysis of Public Comments on YouTube Content Using Principal Component Analysis and Naive Bayes
DOI:
https://doi.org/10.46984/ra6yvz19
Keywords:
Sentiment Analysis, YouTube, TVRI, TF-IDF, Principal Component Analysis, Gaussian Naive Bayes
Abstract
The rapid development of digital media compels public broadcasting institutions to adapt to information consumption patterns that are now centered on online platforms. TVRI Sumatera Barat has responded by using YouTube as a channel for content distribution and audience engagement. This interaction, however, generates a large volume of unstructured comment text, which makes manual sentiment analysis inefficient, time-consuming, and prone to subjectivity. This study addresses these challenges by classifying user sentiment automatically and objectively with a machine learning approach that integrates Principal Component Analysis (PCA) and the Gaussian Naive Bayes algorithm. PCA serves as a dimensionality reduction technique that simplifies the TF-IDF weighted text features without losing vital information, while Gaussian Naive Bayes was selected for classification because it efficiently processes the continuous numerical data produced by the PCA transformation. The research dataset comprises 10 comments from the TVRI Sumatera Barat YouTube channel in 2024, collected via the YouTube Data API, then preprocessed and labeled as positive or negative. The model was validated using a confusion matrix with accuracy, precision, recall, and F1-score metrics. The test results indicate that combining PCA with Gaussian Naive Bayes improves computational efficiency and delivers precise classification performance. The study contributes a measurable method for public opinion analysis that can serve as a basis for evaluating audience perception and improving the digital broadcasting strategies of public institutions.
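The pipeline described in the abstract (TF-IDF weighting, PCA reduction, Gaussian Naive Bayes classification, confusion-matrix evaluation) can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy comments and labels are invented for the example, and the number of principal components, the train/test split, and the absence of any language-specific preprocessing are all hypothetical choices that the abstract does not specify.

```python
from sklearn.decomposition import PCA
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    precision_recall_fscore_support,
)
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer

# Hypothetical toy data, NOT the study's dataset.
comments = [
    "great program very informative",
    "love this broadcast well done",
    "excellent coverage thank you",
    "helpful and clear news report",
    "boring show poor audio quality",
    "terrible editing waste of time",
    "bad sound and unclear picture",
    "disappointing content very dull",
]
labels = ["positive"] * 4 + ["negative"] * 4

# Stratified split so both classes appear in the training and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    comments, labels, test_size=0.25, stratify=labels, random_state=0
)

pipeline = Pipeline(
    [
        # Turn raw text into sparse TF-IDF weighted term features.
        ("tfidf", TfidfVectorizer()),
        # Convert to a dense array, which PCA and GaussianNB both accept.
        ("dense", FunctionTransformer(lambda X: X.toarray(), accept_sparse=True)),
        # Project onto a few principal components (2 is an assumed value).
        ("pca", PCA(n_components=2, random_state=0)),
        # Model each continuous component as a per-class Gaussian.
        ("nb", GaussianNB()),
    ]
)
pipeline.fit(X_train, y_train)
y_pred = pipeline.predict(X_test)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_pred, labels=["negative", "positive"])
accuracy = accuracy_score(y_test, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_test, y_pred, average="binary", pos_label="positive", zero_division=0
)

print(cm)
print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} f1={f1:.2f}")
```

Fitting the vectorizer and PCA inside the pipeline means both are learned from the training comments only, so the test comments do not leak into the feature space. On a dataset this small the printed metrics carry no meaning; the sketch only shows how the stages connect.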
License
Copyright (c) 2026 Dede Pratama, Sumijan, Rini Sovia

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors retain all rights to their published works, including but not limited to: copyright and other proprietary rights relating to the article, such as patent rights; the right to use the substance of the article in their own future works, including lectures and books; the right to reproduce the article for their own purposes; and the right to self-archive the article.





