The high volume of support tickets submitted to the HAI DJPb Service Desk has caused delays and inconsistent response quality for payroll-related inquiries from Indonesian treasury work units (Satker). To improve the accuracy and efficiency of public service responses, this research proposes an optimized text-vectorization framework for chatbot development that combines Bag of Words (BoW), Word2Vec, vocabulary pruning, and TF-IDF weighted embeddings. The dataset consists of 2024 ticket logs, curated FAQs, and questionnaire data related to the Satker Web Payroll Application. The method comprises three stages: preprocessing (snippet removal, normalization, tokenization, stopword removal, and stemming); vocabulary pruning based on empirical frequency thresholds (<5 and >80) that preserves domain-specific technical terms; and TF-IDF weighting of the word embeddings. Four vectorization models (BoW, BoW with pruning, Word2Vec, and Word2Vec + TF-IDF) were evaluated on cosine similarity, response time, and accuracy. BoW achieved the highest accuracy, 88.32%, while Word2Vec produced the most stable response time, averaging 47.32 ms, with a cosine similarity of 0.99. The findings indicate that frequency-based representations remain highly effective for structured administrative datasets, while weighted embeddings improve semantic relevance. This study contributes to the field of Informatics by providing an efficient hybrid vectorization framework tailored to Indonesian administrative language, enabling more accurate and scalable chatbot solutions for e-government services.
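The pipeline summarized above can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation. The function names, the sample tokens, the reading of the thresholds as document-frequency counts, and the smoothed IDF formula are all assumptions made for illustration. The two-dimensional word vectors in the usage note stand in for trained Word2Vec embeddings.

```python
import math
from collections import Counter


def prune_vocabulary(docs, min_df=5, max_df=80, keep=frozenset()):
    """Keep tokens whose document frequency lies in [min_df, max_df].

    Assumption: the thresholds are document-frequency counts. Tokens in
    `keep` (protected domain terms) survive regardless of frequency.
    """
    df = Counter(tok for doc in docs for tok in set(doc))
    return sorted(t for t, n in df.items() if min_df <= n <= max_df or t in keep)


def bow_vector(tokens, vocab):
    """Bag-of-Words count vector over a fixed vocabulary order."""
    counts = Counter(tokens)
    return [counts[t] for t in vocab]


def idf_weights(docs, vocab):
    """Smoothed inverse document frequency (one common variant, assumed here)."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    return {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}


def weighted_embedding(tokens, embeddings, idf):
    """TF-IDF weighted average of word vectors; None if no token is known."""
    tf = Counter(t for t in tokens if t in embeddings and t in idf)
    if not tf:
        return None
    dim = len(next(iter(embeddings.values())))
    total = sum(tf[t] * idf[t] for t in tf)
    vec = [0.0] * dim
    for t, count in tf.items():
        weight = count * idf[t] / total
        for i, x in enumerate(embeddings[t]):
            vec[i] += weight * x
    return vec


def cosine(a, b):
    """Cosine similarity; returns 0.0 when either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def best_match(query_tokens, faq_docs, vocab):
    """Index of the FAQ entry whose BoW vector is closest to the query."""
    q = bow_vector(query_tokens, vocab)
    scores = [cosine(q, bow_vector(d, vocab)) for d in faq_docs]
    return max(range(len(faq_docs)), key=scores.__getitem__)
```

As a usage illustration with hypothetical, already preprocessed tokens: given four short FAQ entries, `prune_vocabulary(docs, min_df=2, max_df=3, keep={"satker"})` drops both the rare tokens and a token that occurs in every entry, yet retains the protected term `satker`. `best_match` then retrieves the closest FAQ entry by cosine similarity, and `weighted_embedding` produces a single query vector in which tokens with higher TF-IDF weight contribute more.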
Copyrights © 2026