Garuda - Garba Rujukan Digital

Jurnal Nasional Teknik Elektro dan Teknologi Informasi

Vol 14 No 2: May 2025

Muhammad Rafli Aditya H.
Muhammad Ilham
Dewi Fatmarani Surianto
Abdul Muis Mappalotteng

Publish Date
28 May 2025

Kamus Besar Bahasa Indonesia (KBBI) is a primary data source for research on word-meaning similarity in Indonesian. This study investigates the effectiveness of word embedding methods and term frequency–inverse document frequency (TF-IDF) weighting in assessing the semantic similarity of synonym pairs. The objective is to measure the similarity of synonym pairs listed in KBBI using cosine similarity, combined with TF-IDF weighting, several word embedding models, and latent semantic analysis (LSA). The methodology comprised data collection followed by text preprocessing: case folding, stopword removal, stemming, and tokenization. The processed data were transformed into vector representations using TF-IDF and the word embedding models Word2Vec, fastText, GloVe, and sentence-bidirectional encoder representations from transformers (S-BERT). LSA was applied to reduce the dimensionality of the vectors before similarity was computed with cosine similarity, and the results were then evaluated. The findings show that fastText significantly improved the similarity scores between synonym pairs, achieving an average similarity score of 0.901 across 30 synonym pairs. Evaluation yielded an accuracy of 0.88, a recall of 1.00, a precision of 0.81, and an F1 score of 0.90. These results suggest that fastText is more effective at improving the accuracy of synonym meaning-similarity measurements. Future research is encouraged to expand the corpus and further explore word embeddings for semantic similarity tasks. This study contributes to the advancement of natural language processing and provides a potential foundation for more accurate language-based applications that assess word-meaning similarity in KBBI.
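The two calculations the abstract relies on, cosine similarity between vectors and the F1 score derived from precision and recall, can be illustrated with a minimal Python sketch. This is not the authors' code: the function names and the toy vectors are assumptions made for illustration, and only the precision (0.81) and recall (1.00) values are taken from the abstract.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Convention chosen for this sketch: a zero vector is similar to nothing.
        return 0.0
    return dot / (norm_u * norm_v)


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy vectors, not data from the study.
vec_a = [1.0, 2.0, 0.0]
vec_b = [2.0, 4.0, 0.0]  # same direction as vec_a
vec_c = [0.0, 0.0, 3.0]  # orthogonal to vec_a

print(cosine_similarity(vec_a, vec_b))  # close to 1 (parallel vectors)
print(cosine_similarity(vec_a, vec_c))  # 0.0 (orthogonal vectors)

# Precision and recall as reported in the abstract.
print(round(f1_score(0.81, 1.00), 2))  # rounds to 0.9
```

With the reported precision and recall, the harmonic mean is 1.62 / 1.81 ≈ 0.895, which rounds to the F1 score of 0.90 stated in the abstract.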

Journal Info

Jurnal Nasional Teknik Elektro dan Teknologi Informasi

Abbrev

JNTETI

Publisher

Universitas Gadjah Mada

Subject

Computer Science & IT; Control & Systems Engineering; Electrical & Electronics Engineering; Energy Engineering

Description

Topics cover the fields of (but not limited to): 1. Information Technology: Software Engineering, Knowledge and Data Mining, Multimedia Technologies, Mobile Computing, Parallel/Distributed Computing, Artificial Intelligence, Computer Graphics, Virtual Reality 2. Power Systems: Power Generation, ...

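Evaluation of Semantic Measurement of KBBI Synonyms Using a Word Embedding Approach (original title: Evaluasi Pengukuran Semantik Sinonim KBBI Menggunakan Pendekatan Word Embedding)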