JUITA : Jurnal Informatika

JUITA Vol. 14 Issue 1, March 2026

Akhmad Arif Faisal Siregar (Universitas Amikom Yogyakarta)
Ema Utami (Universitas Amikom Yogyakarta)
Tika Novita Sari (Universitas Amikom Yogyakarta, Yogyakarta)

Publish Date
31 Mar 2026

Employee attrition prediction remains a longstanding challenge in human resource analytics, as organizations increasingly depend on computational decision-support systems that are transparent, consistent, and operationally accountable. Conventional methods that rely solely on numerical attributes are limited in their ability to capture the structural and contextual relationships inherent in categorical and text-based employee descriptors. To address this limitation, the current study investigates a hybrid Case-Based Reasoning (CBR) retrieval framework that combines K-Nearest Neighbors (KNN) with Word2Vec embeddings derived from the dataset's limited textual attributes, specifically Department, Gender, EducationField, MaritalStatus, and OverTime. Eight experimental configurations were assessed to examine the impact of alternative similarity metrics and feature representations. The best configuration, KNN with Word2Vec embeddings and cosine similarity, attained an accuracy of 0.8526 and a weighted F1-score of 0.8000, exceeding the performance of baseline models based solely on numerical features and of those using Manhattan distance. Nonetheless, the performance gains remained modest owing to dataset-specific constraints, such as class imbalance and the inherently superficial nature of the textual descriptors, which restricts the semantic richness of the Word2Vec embeddings. Furthermore, the IBM attrition dataset does not cover downsizing or termination situations, which highlights conceptual and ethical constraints on using similarity-based predictions for high-stakes HR decisions. Overall, the findings indicate that hybrid similarity representations, particularly the combination of Word2Vec embeddings with cosine similarity, can improve the structural expressiveness of CBR, although their predictive effectiveness is still limited by data sparsity and considerations of fairness.
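
The retrieval step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the tiny embedding table stands in for trained Word2Vec vectors, the numeric features are assumed to be pre-scaled to [0, 1], and averaging the category embeddings before concatenating them with the numeric features is an assumption, since the abstract does not say how the two representations are combined. All names (TOY_EMBEDDINGS, build_case_vector, retrieve, predict) are hypothetical.

```python
import math
from collections import Counter

# Hypothetical 2-d vectors standing in for trained Word2Vec embeddings
# of categorical values such as Department and OverTime.
TOY_EMBEDDINGS = {
    "Sales": [0.9, 0.1],
    "Research": [0.1, 0.9],
    "Yes": [0.8, 0.3],
    "No": [0.2, 0.7],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_case_vector(numeric, categorical, embeddings):
    """Concatenate numeric features with the mean embedding of the categorical values.

    Mean pooling is an assumption made for this sketch.
    """
    vectors = [embeddings[value] for value in categorical]
    dim = len(vectors[0])
    mean = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    return list(numeric) + mean


def retrieve(query, case_base, k=3):
    """Return the k stored (vector, label) cases most similar to the query."""
    ranked = sorted(
        case_base,
        key=lambda case: cosine_similarity(query, case[0]),
        reverse=True,
    )
    return ranked[:k]


def predict(query, case_base, k=3):
    """Reuse step: majority vote over the labels of the retrieved cases."""
    votes = Counter(label for _, label in retrieve(query, case_base, k))
    return votes.most_common(1)[0][0]


# Toy case base: (numeric features, categorical values, attrition label).
raw_cases = [
    ([0.2, 0.1], ["Sales", "Yes"], "Yes"),
    ([0.3, 0.2], ["Sales", "Yes"], "Yes"),
    ([0.8, 0.9], ["Research", "No"], "No"),
    ([0.7, 0.8], ["Research", "No"], "No"),
    ([0.6, 0.7], ["Research", "Yes"], "No"),
]
case_base = [
    (build_case_vector(num, cat, TOY_EMBEDDINGS), label)
    for num, cat, label in raw_cases
]

query = build_case_vector([0.25, 0.15], ["Sales", "Yes"], TOY_EMBEDDINGS)
prediction = predict(query, case_base, k=3)
```

On this toy data the query is closest to the two Sales cases with overtime, so the majority vote among its three nearest cases is "Yes". Swapping cosine_similarity for a distance such as Manhattan would require sorting in ascending order instead, which is the kind of metric comparison the abstract reports.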

Journal Info

JUITA : Jurnal Informatika

Abbrev

JUITA

Publisher

Universitas Muhammadiyah Purwokerto

Subject

Computer Science & IT

Description

JUITA: Jurnal Informatika is a scientific journal on informatics and its applications that presents articles on ideas and research covering the latest developments. JUITA is a peer-reviewed, open-access journal. JUITA is published by the Informatics Engineering Study Program, Universitas Muhammadiyah ...

A Hybrid Case-Based Reasoning Framework Using KNN, Word2Vec, and Cosine Similarity for Employee Attrition Analysis
