Mohammad Andri Budiman
Jurusan Ilmu Komputer Fakultas Ilmu Komputer Dan Teknologi Informasi, USU

Published: 11 Documents

Journal : Journal of Information Systems and Informatics

Reducing Semantic Distortion of Multiword Expressions for Topic Modeling with Latent Dirichlet Allocation
Sitopu, Widya Astuti; Nababan, Erna Budhiarti; Budiman, Mohammad Andri
Journal of Information System and Informatics Vol 7 No 3 (2025): September
Publisher : Asosiasi Doktor Sistem Informasi Indonesia

DOI: 10.51519/journalisi.v7i3.1266

Abstract

The Makan Bergizi Gratis (MBG, "Free Nutritious Meals") program is one of the Indonesian government's priority initiatives and has received significant coverage in online media. To identify the main themes in this coverage, this study applies topic modeling with Latent Dirichlet Allocation (LDA). The results of topic modeling, however, depend heavily on the preprocessing stage, particularly on how multiword expressions (MWEs) such as named entities, collocations, and compound words are handled. This study compares two preprocessing approaches, basic and extended, where the extended approach masks MWEs. Experimental results show that the extended preprocessing model achieved the highest coherence score, 0.5149 at K = 22, with four other scores also exceeding 0.496, whereas the basic preprocessing model reached a maximum of only 0.3932 at K = 10. Furthermore, cosine similarity scores between topics were lower in the extended model (maximum 0.7406) than in the basic model (maximum 0.8244), indicating that its topics were more diverse and overlapped less. These findings highlight the importance of preprocessing strategies that preserve phrase-level meaning in order to reduce semantic distortion and improve topic coherence and representation, particularly when analyzing media discourse on public policy programs such as MBG.
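
The two ideas the abstract relies on, masking MWEs before tokenization and comparing topics by cosine similarity, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the MWE list, the function names `mask_mwes` and `cosine_similarity`, and the underscore-joining convention are assumptions made for illustration, and the abstract does not say how the study detected or masked MWEs.

```python
import math
import re

# Hypothetical MWE list; the lexicon used in the study is not given in the abstract.
MWES = ["makan bergizi gratis", "badan gizi nasional"]


def mask_mwes(text, mwes=MWES):
    """Lowercase the text and join each listed MWE into one underscore-linked token."""
    text = text.lower()
    # Replace longer expressions first so a shorter one cannot split a longer one.
    for mwe in sorted(mwes, key=len, reverse=True):
        pattern = r"\b" + r"\s+".join(map(re.escape, mwe.split())) + r"\b"
        text = re.sub(pattern, mwe.replace(" ", "_"), text)
    return text.split()


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length topic-word weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    # Treat an all-zero vector as having no similarity to anything.
    return dot / norm if norm else 0.0


# The phrase survives as a single token instead of three unrelated words.
tokens = mask_mwes("Program Makan Bergizi Gratis dimulai")
```

With masking, a topic model sees `makan_bergizi_gratis` as one vocabulary item rather than three separate words, which is the phrase-level meaning the abstract argues should be preserved. Applying `cosine_similarity` to pairs of topic-word vectors gives the kind of overlap measure the abstract reports, where lower maxima indicate more distinct topics.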