Jurnal Teknik Informatika (JUTIF)
Vol. 7 No. 1 (2026): JUTIF Volume 7, Number 1, February 2026

Optimizing Bag of Words and Word2Vec with Vocabulary Pruning and TF-IDF Weighted Embeddings for Accurate Chatbot Responses in Indonesian Treasury Services

Aprianto, Eko
Mahdiana, Deni
Wibowo, Arief



Article Info

Publish Date
15 Feb 2026

Abstract

The high volume of support tickets submitted to the HAI DJPb Service Desk has caused delays and inconsistent response quality for payroll-related inquiries across Indonesian treasury work units (Satker). To improve the accuracy and efficiency of public service responses, this research proposes an optimized text-vectorization framework for chatbot development that combines Bag of Words (BoW), Word2Vec, vocabulary pruning, and TF-IDF weighted embeddings. The dataset consists of 2024 ticket logs, curated FAQs, and questionnaire data related to the Satker Web Payroll Application. The method includes preprocessing (snippet removal, normalization, tokenization, stopword removal, stemming), vocabulary pruning based on empirical frequency thresholds (<5 and >80) while preserving domain-specific technical terms, and semantic weighting through TF-IDF. Four vectorization models (BoW, BoW with pruning, Word2Vec, and Word2Vec + TF-IDF) were evaluated using cosine similarity, response time, and accuracy. Results show that BoW achieved the highest accuracy, 88.32%, while Word2Vec produced the most stable response time, averaging 47.32 ms, with a cosine similarity of 0.99. The findings demonstrate that frequency-based representations remain highly effective for structured administrative datasets, while weighted embeddings improve semantic relevance. This study contributes to the field of Informatics by providing an efficient hybrid vectorization framework tailored to Indonesian administrative language, enabling more accurate and scalable chatbot solutions for e-government services.
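The pruning, TF-IDF weighting, and cosine-similarity steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the toy tokens, and the 2-D vectors (stand-ins for trained Word2Vec embeddings) are all hypothetical. The abstract does not say whether the thresholds (<5 and >80) are raw counts or percentages, so the sketch treats them as document-frequency bounds passed in by the caller. The smoothed IDF formula is likewise an assumption.

```python
import math
from collections import Counter


def prune_vocabulary(docs, min_df, max_df, keep=frozenset()):
    """Keep tokens with min_df <= document frequency <= max_df, plus protected terms."""
    df = Counter(tok for doc in docs for tok in set(doc))
    return {tok for tok, n in df.items() if tok in keep or min_df <= n <= max_df}


def idf_weights(docs, vocab):
    """Smoothed inverse document frequency: ln((1 + N) / (1 + df)) + 1 (assumed variant)."""
    n_docs = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc) if tok in vocab)
    return {tok: math.log((1 + n_docs) / (1 + df[tok])) + 1.0 for tok in vocab}


def weighted_embedding(tokens, vectors, idf):
    """Average the word vectors of known tokens, weighting each by tf * idf."""
    tf = Counter(t for t in tokens if t in vectors and t in idf)
    dim = len(next(iter(vectors.values()), []))
    total = [0.0] * dim
    weight_sum = 0.0
    for tok, count in tf.items():
        w = count * idf[tok]
        weight_sum += w
        for i, x in enumerate(vectors[tok]):
            total[i] += w * x
    if weight_sum == 0.0:
        return total  # no known tokens: zero vector
    return [x / weight_sum for x in total]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def best_match(query, docs, vectors, idf):
    """Index of the document whose weighted embedding is closest to the query's."""
    q = weighted_embedding(query, vectors, idf)
    scores = [cosine(q, weighted_embedding(d, vectors, idf)) for d in docs]
    return max(range(len(docs)), key=scores.__getitem__)


# Toy data, invented for illustration only (not taken from the paper's dataset).
FAQ_TOKENS = [
    ["gaji", "gagal", "upload"],
    ["reset", "password", "aplikasi"],
    ["gaji", "aplikasi", "error"],
]
PROTECTED = {"password"}  # hypothetical domain term exempt from pruning
VOCAB = prune_vocabulary(FAQ_TOKENS, min_df=2, max_df=3, keep=PROTECTED)
IDF = idf_weights(FAQ_TOKENS, VOCAB)
# Hand-made 2-D vectors standing in for trained Word2Vec embeddings.
TOY_VECTORS = {"gaji": [1.0, 0.0], "aplikasi": [0.0, 1.0], "password": [0.1, 1.0]}

print(best_match(["password"], FAQ_TOKENS, TOY_VECTORS, IDF))
```

In this sketch, a token that appears in only one document survives pruning only if it is listed in the protected set, which mirrors the abstract's point about preserving domain-specific technical terms. In practice the toy vectors would be replaced by embeddings trained on the ticket corpus.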

Copyrights © 2026






Journal Info

Abbrev

jurnal

Subject

Computer Science & IT

Description

Jurnal Teknik Informatika (JUTIF) is an Indonesian national journal that publishes high-quality research papers in the broad field of Informatics, Information Systems and Computer Science, which encompasses software engineering, information system development, computer systems, computer networks, ...