Found 4 Documents

Enhancing BERTopic with Neural Network Clustering for Thematic Analysis of U.S. Presidential Speeches
Anggai, Sajarwo; Zain, Rafi Mahmud; Tukiyat, Tukiyat; Waskita, Arya Adhyaksa
Jurnal Teknik Informatika (Jutif) Vol. 6 No. 4 (2025): JUTIF Volume 6, Number 4, Agustus 2025
Publisher : Informatika, Universitas Jenderal Soedirman

DOI: 10.52436/1.jutif.2025.6.4.5090

Abstract

Understanding the underlying themes in presidential speeches is critical for analyzing political discourse and determining public policy direction. However, topic modeling in this context presents difficulties, particularly when clustering semantically rich topics from high-dimensional embeddings. This study seeks to improve topic modeling performance by incorporating a Neural Network Clustering (NNC) approach into the BERTopic pipeline. We analyze 2,747 speeches delivered by U.S. President Joe Biden (2021-2025) and compare three clustering techniques: HDBSCAN, KMeans, and the proposed autoencoder-based NNC. The evaluation metrics (UMass, NPMI, Topic Diversity) show that NNC produces the most coherent and diverse topic clusters (UMass = -0.4548, NPMI = 0.0234, Diversity = 0.3950). These findings show that NNC can overcome the limitations of density-based and centroid-based clustering in high-dimensional semantic spaces. The study contributes to the field of Natural Language Processing by demonstrating how neural-based clustering can improve topic modeling, particularly for complex, real-world political corpora.
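To make two of the reported metrics concrete, the following is a minimal sketch of Topic Diversity and NPMI. It assumes document-level co-occurrence counts and a tiny hypothetical corpus; the abstract does not say which estimator or window the authors used, so the function names, the toy data, and the conventions for edge cases are all illustrative assumptions.

```python
import math
from itertools import combinations


def topic_diversity(topics, top_k=10):
    """Fraction of unique words among the top_k words of every topic."""
    top_words = [word for topic in topics for word in topic[:top_k]]
    return len(set(top_words)) / len(top_words)


def npmi_coherence(topic, documents):
    """Mean NPMI over all word pairs of one topic (document co-occurrence)."""
    doc_sets = [set(doc) for doc in documents]
    n_docs = len(doc_sets)

    def prob(*words):
        # Share of documents that contain all of the given words.
        return sum(all(w in doc for w in words) for doc in doc_sets) / n_docs

    scores = []
    for a, b in combinations(topic, 2):
        p_ab = prob(a, b)
        if p_ab == 0:
            scores.append(-1.0)  # assumption: pairs that never co-occur get the minimum
        elif p_ab == 1:
            scores.append(1.0)   # assumption: limit value when both words are in every document
        else:
            pmi = math.log(p_ab / (prob(a) * prob(b)))
            scores.append(pmi / -math.log(p_ab))
    return sum(scores) / len(scores)


# Hypothetical toy corpus (already tokenized) and topics, for illustration only.
docs = [
    ["economy", "jobs", "tax"],
    ["economy", "jobs"],
    ["health", "vaccine"],
    ["health", "vaccine", "jobs"],
]
topics = [["economy", "jobs"], ["health", "vaccine"]]

diversity = topic_diversity(topics, top_k=2)
npmi_health = npmi_coherence(["health", "vaccine"], docs)
npmi_economy = npmi_coherence(["economy", "jobs"], docs)
```

NPMI ranges from -1 to 1, and Topic Diversity from 0 to 1, so a value such as Diversity = 0.3950 means roughly 40% of the top words across topics are unique.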
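
Narasi Presiden Indonesia: Analisis Wacana Politik Menggunakan BERTopic dalam Mengungkap Pola Tematik Pidato Presiden [Narratives of the Indonesian President: Political Discourse Analysis Using BERTopic to Uncover Thematic Patterns in Presidential Speeches]
Uliyatunisa, Uliyatunisa; Tukiyat, Tukiyat; Waskita, Arya Adhyaksa; Handayani, Murni; Zain, Rafi Mahmud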
Building of Informatics, Technology and Science (BITS) Vol 7 No 2 (2025): September 2025
Publisher : Forum Kerjasama Pendidikan Tinggi

DOI: 10.47065/bits.v7i2.8298

Abstract

The speeches of the President of Indonesia play an important role as a means of political communication, policy delivery, and leadership image building in front of the public. However, the increasing volume of speeches presents new challenges in the manual analysis process, as it is time-consuming and prone to researcher subjectivity. This study offers a solution by using BERTopic, a transformer-based topic modelling method that utilises semantic representations from modern embedding models. The research data consists of transcripts of President Joko Widodo's official speeches obtained from the Cabinet Secretariat portal. To improve the quality of semantic representations, this study compares several Indonesian language embedding models, namely DistilBERT, NusaBERT, IndoE5, and SBERT. The analysis process was carried out through the stages of data preprocessing, embedding formation, dimension reduction, clustering, and model evaluation using topic coherence metrics. The objectives of this study were to reveal the themes contained in the President's speeches and to evaluate the effectiveness of embedding models in producing more coherent topics. The results show twenty main themes that consistently appear, including infrastructure development, economic policy, health and the pandemic, digital transformation, international diplomacy, sports, nationalism issues, and regional development. In terms of performance, SBERT provides the best results with a coherence value of UMass = -2.036 and NPMI = 0.082, indicating a positive semantic relationship. A UMass value close to zero indicates greater coherence of words within a topic, while an NPMI value above zero indicates that the connections between words are more easily understood by humans. This research contributes to the development of NLP-based political discourse studies in Indonesia, providing an empirical overview of the selection of appropriate embedding models in topic modelling and opening up opportunities for the integration of similar methods in public policy analysis.
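The reading of UMass given in the abstract (values closer to zero mean more coherent topics) can be illustrated with a minimal sketch. It assumes document-level co-occurrence counts with add-one smoothing and averages over word pairs; the abstract does not specify the exact variant used, and the toy corpus and topics are hypothetical.

```python
import math


def umass_coherence(topic, documents):
    """Mean UMass score over ordered word pairs of one topic.

    Assumes every topic word occurs in at least one document.
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        # Number of documents that contain all of the given words.
        return sum(all(w in doc for w in words) for doc in doc_sets)

    total = 0.0
    pairs = 0
    for i in range(1, len(topic)):
        for j in range(i):
            joint = doc_freq(topic[i], topic[j]) + 1  # add-one smoothing
            total += math.log(joint / doc_freq(topic[j]))
            pairs += 1
    return total / pairs


# Hypothetical toy corpus (already tokenized), for illustration only.
docs = [
    ["infrastructure", "toll", "road"],
    ["infrastructure", "road"],
    ["pandemic", "vaccine"],
    ["pandemic", "vaccine", "road"],
]

coherent = umass_coherence(["road", "infrastructure"], docs)
incoherent = umass_coherence(["infrastructure", "vaccine"], docs)
```

In this toy corpus the topic whose words share documents scores higher (closer to zero) than the topic whose words never co-occur, which matches the interpretation stated above.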

Ekstraksi Topik dalam Dataset Menggunakan Teknik Pemodelan Topik [Topic Extraction in Datasets Using Topic Modeling Techniques]
Anggai, Sajarwo; Tukiyat; Rivai, Abu Khalid; Zain, Rafi Mahmud
Jurnal Ilmu Komputer Vol 2 No 1 (2024): Jurnal Ilmu Komputer (Edisi Juli 2024)
Publisher : Universitas Pamulang

Abstract

The issue in this research is the lack of understanding regarding the main topics and their changes in speeches and media publications related to President Joko Widodo. This study aims to identify, analyze, and predict changes in key topics within speeches, statements, and media publications related to President Joko Widodo using the Latent Dirichlet Allocation (LDA) topic modeling technique. The research employs a quantitative approach to analyze the texts of President Joko Widodo's speeches. The process began with scraping documents from the official website of the Republic of Indonesia's Secretariat, resulting in 5,988 speech transcripts from October 20, 2014, to March 2, 2024. Text preprocessing involved tokenization, stopword removal, and stemming/lemmatization, followed by dictionary-term formation. The findings indicate that coherence is highest for the model with k=16 (0.554), while perplexity is best at k=21 (-13.130). The main topics identified include Nationalism and National Values, Regional Government, and Education and Children. Topic visualization with PyLDAvis aids in the exploration and identification of topics, providing insights for decision-making and policy development. To enhance understanding of topic changes, it is recommended to conduct trend analysis on key topics over time. This will help identify how President Joko Widodo's priorities shift and respond to new issues. By monitoring these trends, the research can provide deeper insights into the evolution of policies and the President's focus.
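The preprocessing stages named in the abstract (tokenization, stopword removal, dictionary-term formation) can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the stopword list and sample sentences are hypothetical English stand-ins, and stemming/lemmatization is omitted because it would need a language-specific tool for Indonesian.

```python
import re
from collections import Counter

# Hypothetical stopword list; the study's actual list is not given.
STOPWORDS = {"the", "and", "of", "to", "in", "is", "for"}


def preprocess(text):
    """Lowercase, tokenize on alphabetic runs, and drop stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def build_dictionary(tokenized_docs):
    """Map each distinct term to an integer id (alphabetical order)."""
    vocab = sorted({t for doc in tokenized_docs for t in doc})
    return {term: idx for idx, term in enumerate(vocab)}


def to_bow(tokens, dictionary):
    """Convert tokens to a sorted list of (term_id, count) pairs."""
    counts = Counter(dictionary[t] for t in tokens if t in dictionary)
    return sorted(counts.items())


# Hypothetical sample texts, for illustration only.
raw_docs = [
    "The economy and the regions",
    "Education is the key to the economy",
]
tokenized = [preprocess(text) for text in raw_docs]
dictionary = build_dictionary(tokenized)
corpus = [to_bow(doc, dictionary) for doc in tokenized]
```

The resulting bag-of-words corpus is the kind of input an LDA implementation typically expects, with one list of (term id, count) pairs per document.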
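
Penerapan Data Mining Dalam Menentukan Pelajaran yang Diminati Dengan Metode Support Vector Mechine (SVM) [Application of Data Mining to Determine Subjects of Interest Using the Support Vector Machine (SVM) Method]
Sulistilawati, Iis; Musyafa, Ahmad; Zain, Rafi Mahmud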
Jurnal Ilmu Komputer Vol 2 No 1 (2024): Jurnal Ilmu Komputer (Edisi Juli 2024)
Publisher : Universitas Pamulang

Abstract

This research focuses on predicting students' grades and learning interests using the Support Vector Machine (SVM) method in education. It is crucial for students to determine the subjects they are most interested in to achieve optimal results. Data mining, particularly through the SVM method, can help identify students' learning interests based on historical data. SVM analyzes various variables related to students' academic performance, such as test scores, assignments, attendance, and class participation. By analyzing this data, SVM builds a model that predicts students' interest in specific subjects. This model provides insights into students' potential academic performance. The predictions generated by this SVM model can be used by educators to design more personalized learning strategies. By understanding students' interests and potential, educators can offer lesson recommendations that align with individual needs, enhancing students' motivation and academic performance. Moreover, these predictions assist students in making more informed decisions regarding subject selection, optimizing their potential and achieving better educational outcomes. This research aims to provide a tool that helps direct students' learning interests effectively, thereby improving the overall quality of education and academic results.
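As a rough illustration of how an SVM can separate students by interest in a subject, here is a minimal linear SVM trained with hinge-loss subgradient descent. Everything in it is an assumption made for the sketch: the two standardized features (test score, attendance), the six hypothetical students, the labels, and the hyperparameters. The abstract does not state which SVM variant, kernel, or features the authors used.

```python
def train_linear_svm(samples, labels, lr=0.1, lam=0.01, epochs=200):
    """Fit weights and bias by subgradient descent on the regularized hinge loss."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:
                # Sample is inside the margin: push the boundary away from it.
                w = [wi + lr * (y * xi - lam * wi) for wi, xi in zip(w, x)]
                b += lr * y
            else:
                # Sample is safely classified: apply only the regularization shrink.
                w = [wi - lr * lam * wi for wi in w]
    return w, b


def predict(w, b, x):
    """Return +1 (interested) or -1 (not interested)."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Hypothetical standardized features: (test score, attendance).
samples = [
    (0.8, 0.6), (0.6, 0.9), (0.7, 0.7),        # labeled interested
    (-0.7, -0.6), (-0.8, -0.5), (-0.6, -0.9),  # labeled not interested
]
labels = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svm(samples, labels)
train_predictions = [predict(w, b, x) for x in samples]
new_student = predict(w, b, (0.9, 0.8))
```

In practice a library implementation with cross-validation would be the usual choice; the hand-written loop only shows the margin-based update that gives the method its name.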