JUITA : Jurnal Informatika
JUITA Vol. 14 Issue 1, March 2026

A Hybrid Case-Based Reasoning Framework Using KNN, Word2Vec, and Cosine Similarity for Employee Attrition Analysis

Siregar, Akhmad Arif Faisal
Utami, Ema
Sari, Tika Novita



Article Info

Publish Date
31 Mar 2026

Abstract

Employee attrition prediction remains a longstanding challenge in human resource analytics, as organizations increasingly depend on computational decision-support systems that are transparent, consistent, and operationally accountable. Conventional methods that rely solely on numerical attributes cannot adequately capture the structural and contextual relationships in categorical and text-based employee descriptors. To address this limitation, this study investigates a hybrid Case-Based Reasoning (CBR) retrieval framework that combines K-Nearest Neighbors (KNN) with Word2Vec embeddings derived from the dataset's few textual attributes: Department, Gender, EducationField, MaritalStatus, and OverTime. Eight experimental configurations were evaluated to examine the effect of alternative similarity metrics and feature representations. The best configuration, KNN with Word2Vec embeddings and cosine similarity, attained an accuracy of 0.8526 and a weighted F1-score of 0.8000, outperforming baseline models that used only numerical features and those that used Manhattan distance. The gains were nonetheless modest because of dataset-specific constraints, namely class imbalance and the shallow nature of the textual descriptors, which restricts the semantic richness of the Word2Vec embeddings. Furthermore, the IBM attrition dataset does not cover downsizing or termination scenarios, which highlights conceptual and ethical constraints on using similarity-based predictions for high-stakes HR decisions. Overall, the findings indicate that hybrid similarity representations, particularly Word2Vec embeddings combined with cosine similarity, can improve the structural expressiveness of CBR, although their predictive effectiveness remains limited by data sparsity and fairness considerations.
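
The retrieval step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how numerical and embedded features are combined, so the sketch assumes simple concatenation of scaled numerical features with the mean of the token embeddings. The 2-d vectors in TOY_EMBEDDINGS are hypothetical stand-ins for trained Word2Vec vectors, and the function names, feature values, and case base are invented for illustration only.

```python
import math
from collections import Counter

# Hypothetical 2-d vectors standing in for trained Word2Vec embeddings
# of categorical tokens such as Department and OverTime (assumption).
TOY_EMBEDDINGS = {
    "Sales": [0.9, 0.1],
    "Research & Development": [0.1, 0.9],
    "Yes": [0.8, 0.2],
    "No": [0.2, 0.8],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


def encode_case(numeric, tokens):
    """Concatenate scaled numeric features with the mean token embedding.

    Concatenation is an assumption of this sketch, not a detail taken
    from the paper.
    """
    dim = len(next(iter(TOY_EMBEDDINGS.values())))
    mean_vec = [
        sum(TOY_EMBEDDINGS[t][i] for t in tokens) / len(tokens)
        for i in range(dim)
    ]
    return list(numeric) + mean_vec


def retrieve(query_vec, case_base, k):
    """Return the k stored cases most similar to the query (CBR retrieve)."""
    ranked = sorted(
        case_base,
        key=lambda case: cosine_similarity(query_vec, case[0]),
        reverse=True,
    )
    return ranked[:k]


def predict(query_vec, case_base, k=3):
    """Majority vote over the labels of the k retrieved cases (KNN reuse)."""
    votes = Counter(label for _, label in retrieve(query_vec, case_base, k))
    return votes.most_common(1)[0][0]


# Invented case base: (encoded features, attrition label).
CASE_BASE = [
    (encode_case([0.2, 0.1], ["Sales", "Yes"]), "Yes"),
    (encode_case([0.3, 0.2], ["Sales", "Yes"]), "Yes"),
    (encode_case([0.8, 0.9], ["Research & Development", "No"]), "No"),
    (encode_case([0.7, 0.8], ["Research & Development", "No"]), "No"),
    (encode_case([0.6, 0.7], ["Sales", "No"]), "No"),
]

query = encode_case([0.25, 0.15], ["Sales", "Yes"])
print(predict(query, CASE_BASE, k=3))
```

Because cosine similarity compares direction rather than magnitude, two cases with proportional feature vectors count as identical, which is one way it differs from a Manhattan-distance baseline. In practice the embeddings would come from a trained Word2Vec model rather than a hand-written table.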

Copyrights © 2026






Journal Info

Abbrev

JUITA

Publisher

Subject

Computer Science & IT

Description

JUITA: Jurnal Informatika is a scientific journal on informatics and its applications that presents articles on ideas and research into the latest developments. JUITA is a peer-reviewed, open-access journal. JUITA is published by the Informatics Engineering Study Program, Universitas Muhammadiyah ...