Siregar, Akhmad Arif Faisal
Unknown Affiliation

Published : 1 Documents Claim Missing Document
Claim Missing Document
Check
Articles

Found 1 Documents
Search

A Hybrid Case-Based Reasoning Framework Using KNN, Word2Vec, and Cosine Similarity for Employee Attrition Analysis Siregar, Akhmad Arif Faisal; Utami, Ema; Sari, Tika Novita
JUITA: Jurnal Informatika JUITA Vol. 14 Issue 1, March 2026
Publisher : Department of Informatics Engineering, Universitas Muhammadiyah Purwokerto

Show Abstract | Download Original | Original Source | Check in Google Scholar

Abstract

Employee attrition prediction remains a longstanding challenge in human resource analytics, as organizations increasingly depend on computational decision-support systems that are transparent, consistent, and operationally accountable. Conventional methods that rely solely on numerical attributes are restricted in their ability to accurately capture the structural and contextual relationships inherent in categorical and text-based employee descriptors. To overcome this limitation, the current study investigates a hybrid Case-Based Reasoning (CBR) retrieval framework that combines K-Nearest Neighbors (KNN) with Word2Vec embeddings derived from the dataset's limited textual attributes, specifically Department, Gender, EducationField, MaritalStatus, and OverTime. Eight experimental configurations were assessed to examine the impact of alternative similarity metrics and diverse feature representations. The optimal configuration of KNN, enhanced with Word2Vec embeddings and cosine similarity, attained an accuracy of 0.8526 and a weighted F1-score of 0.8000, thereby exceeding the performance of baseline models based solely on numerical features and those utilizing Manhattan distance. Nonetheless, the improvements in performance remained limited owing to dataset-specific limitations, such as class imbalance and the inherently superficial characteristics of the textual descriptors, which restrict the semantic richness of Word2Vec embeddings. Furthermore, the IBM attrition dataset does not encompass downsizing or termination situations, highlighting conceptual and ethical constraints when utilizing similarity-based predictions for high-stakes HR decisions. Overall, the findings indicate that hybrid similarity representations, particularly the combination of Word2Vec embeddings with cosine distance, can improve the structural expressiveness of CBR, although their predictive effectiveness is still limited by data sparsity and considerations of fairness.