Jurnal Teknik Informatika (JUTIF)
Vol. 7 No. 3 (2026): JUTIF Volume 7, Number 3, June 2026

Development of a Hybrid Machine Learning-Based E-Commerce Chatbot Using Jaccard Similarity and K-Nearest Neighbor for Accurate Intent Classification

Andrian Sah (Information Systems, Faculty of Computer Science, Universitas Yapis Papua, Indonesia)
Andi Ilham (Information Systems, Faculty of Computer Science, Universitas Yapis Papua, Indonesia)
Rasna Rasna (Information Systems, Faculty of Computer Science, Universitas Yapis Papua, Indonesia)
Siti Nurhayati (Information Systems, Faculty of Computer Science, Universitas Yapis Papua, Indonesia)



Article Info

Publish Date
15 Jun 2026

Abstract

The advancement of technology in the e-commerce industry requires fast and accurate information services, particularly through Natural Language Processing (NLP)-based chatbots. However, many existing chatbots rely on a single method, which often limits their ability to understand the context of user questions. This study proposes a hybrid approach that integrates Jaccard Similarity and K-Nearest Neighbor (K-NN) to improve answer retrieval accuracy and intent classification in e-commerce chatbot systems. Jaccard Similarity measures the similarity between user queries and Frequently Asked Questions (FAQ) data, while K-NN determines the intent from the nearest neighbors with the highest similarity values. The dataset, consisting of FAQ questions and answers, is preprocessed through case folding, tokenization, stopword removal, and stemming. System performance is evaluated using accuracy, precision, recall, and F1-score. The experimental results show that Jaccard Similarity effectively selects relevant answer candidates, with similarity values of up to 66%, while K-NN produces stable intent classification results. The proposed hybrid model achieves an accuracy of 87%, a precision of 86%, a recall of 85%, and an F1-score of 85%, outperforming single-method implementations. Furthermore, confidence score analysis indicates that most chatbot responses fall into the high-confidence category (>0.70). Rule-based NLP evaluation also provides insight into unclassified inputs, which can serve as a basis for future dataset development. The implementation results demonstrate that the chatbot system operates effectively on both the customer and admin sides and can be monitored through its analytical features. Overall, the proposed hybrid approach enhances the reliability, relevance, and stability of chatbot responses, making it a practical solution for real-time intent classification and FAQ retrieval in e-commerce customer service environments.
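
The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The FAQ entries, stopword list, intent labels, the value k=3, the similarity-weighted vote, and the use of the top similarity as a confidence score are all assumptions made for illustration. Stemming is omitted to keep the example dependency-free.

```python
from collections import defaultdict

# Hypothetical toy FAQ data (question, intent, answer); not taken from the paper.
FAQ = [
    ("How do I track my order?", "order_tracking", "Open 'My Orders' to see the shipping status."),
    ("Where is my order now?", "order_tracking", "Check the tracking number in 'My Orders'."),
    ("How can I return a product?", "returns", "Request a return from the order detail page."),
    ("What is the return policy?", "returns", "Returns are handled according to the store policy."),
    ("Which payment methods are accepted?", "payment", "See the payment options at checkout."),
    ("Can I pay with a credit card?", "payment", "Choose a card option at checkout if available."),
]

# Assumed English stopword list, for illustration only.
STOPWORDS = {"how", "do", "i", "my", "is", "the", "a", "can", "what",
             "where", "which", "are", "with", "your"}

def preprocess(text):
    """Case folding, whitespace tokenization, and stopword removal (stemming omitted)."""
    tokens = (tok.strip(".,!?") for tok in text.lower().split())
    return {tok for tok in tokens if tok and tok not in STOPWORDS}

def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| between two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def classify(query, k=3):
    """Return (intent, answer, confidence) from a similarity-weighted K-NN vote."""
    q = preprocess(query)
    scored = [(jaccard(q, preprocess(question)), intent, answer)
              for question, intent, answer in FAQ]
    # Keep only the k most similar FAQ entries that share at least one token.
    neighbors = sorted((s for s in scored if s[0] > 0),
                       key=lambda s: s[0], reverse=True)[:k]
    if not neighbors:
        return None, None, 0.0  # unclassified input
    votes = defaultdict(float)
    for sim, intent, _ in neighbors:
        votes[intent] += sim
    best_intent = max(votes, key=votes.get)
    # Neighbors are sorted, so the first match is the most similar entry of that intent.
    sim, _, answer = next(n for n in neighbors if n[1] == best_intent)
    return best_intent, answer, sim

if __name__ == "__main__":
    print(classify("Can I track my order?"))
    print(classify("hello there"))
```

In this sketch a query that shares no tokens with any FAQ entry is returned as unclassified, which mirrors the abstract's point that unclassified inputs can be collected to extend the dataset.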

Copyrights © 2026






Journal Info

Abbrev

jurnal

Publisher

Subject

Computer Science & IT

Description

Jurnal Teknik Informatika (JUTIF) is an Indonesian national journal that publishes high-quality research papers in the broad field of Informatics, Information Systems and Computer Science, which encompasses software engineering, information system development, computer systems, computer network, ...