The Indonesian Journal of Computer Science
Vol. 13 No. 4 (2024): The Indonesian Journal of Computer Science (IJCS)

Question Similarity Detection in Indonesian Language Consumer Health Forums with Feature-based Binary Classification Approach

Irianti, Eka Putri



Article Info

Publish Date
08 Aug 2024

Abstract

Two questions are considered similar if the same response can be given to both. As the number of users of consumer health forums grows, an increasing number of similar questions go without adequate answers. Identifying duplicate questions in online medical Question Answering (QA) forums benefits both users and medical professionals, so it is crucial for these forums to detect similar questions in order to provide relevant and useful answers. This study examines a feature-based binary classification method for detecting similar questions in the Indonesian consumer health domain. The results indicate that the feature-based classification approach performs best when using the CatBoost model. The research also explores techniques for addressing class imbalance in the dataset and finds that imbalanced-learning techniques such as ADASYN and SMOTE improve classification performance. Finally, the study analyzes which features are discriminative for identifying semantic similarity between question pairs, concluding that a combination of distance, medical, and encoding features produces the best results.
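As a rough illustration of what "distance features" for a question pair might look like, the following minimal sketch computes three simple pairwise measures using only the Python standard library. The abstract does not specify which distance features the study uses, so the function names (`tokenize`, `jaccard`, `char_ratio`, `pair_features`), the choice of measures, and the example questions are all assumptions made for illustration. In a feature-based setup, vectors like these would be passed to a binary classifier such as CatBoost.

```python
from difflib import SequenceMatcher


def tokenize(text: str) -> list[str]:
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def jaccard(a: str, b: str) -> float:
    """Token-set overlap: |A ∩ B| / |A ∪ B|."""
    ta, tb = set(tokenize(a)), set(tokenize(b))
    if not ta and not tb:
        return 1.0  # two empty questions are treated as identical
    return len(ta & tb) / len(ta | tb)


def char_ratio(a: str, b: str) -> float:
    """Character-level similarity ratio in [0, 1] from difflib."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def pair_features(q1: str, q2: str) -> dict[str, float]:
    """Build a small, hypothetical feature vector for one question pair."""
    return {
        "jaccard": jaccard(q1, q2),
        "char_ratio": char_ratio(q1, q2),
        # Absolute difference in token counts.
        "len_diff": float(abs(len(tokenize(q1)) - len(tokenize(q2)))),
    }


if __name__ == "__main__":
    # Hypothetical Indonesian question pair, not taken from the study's dataset.
    print(pair_features("apa obat untuk sakit kepala", "obat sakit kepala apa"))
```

Surface measures like these capture lexical overlap only; the medical and encoding features mentioned in the abstract would presumably supply domain and semantic signal that these do not.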

Copyright © 2024






Journal Info

Abbrev

ijcs

Publisher

Subject

Computer Science & IT; Electrical & Electronics Engineering; Engineering

Description

The Indonesian Journal of Computer Science (IJCS) is a bimonthly peer-reviewed journal published by AI Society and STMIK Indonesia. IJCS editions will be published at the end of February, April, June, August, October and December. The scope of IJCS includes general computer science, information ...