IT JOURNAL RESEARCH AND DEVELOPMENT
Vol. 10 No. 2 (2025)

Job Classification Based on Skills and Qualifications Using Natural Language Processing and Ensemble Learning Methods

Oktasia Nasution, Hafiza
Ramadhani, Dian
Aprilina Tarigan, Mida
Andreas, Prima
Suryati Ningsih, Dewita
Pramadewi, Arwinence



Article Info

Publish Date
12 Mar 2026

Abstract

This study proposes a job classification framework that uses Natural Language Processing (NLP) and ensemble learning to classify job roles based on required skills and qualifications. A large-scale open-source dataset containing 1,048,576 job postings was used, with attributes such as job title, qualifications, skills, company profile, and role. Only the relevant attributes were kept: skills and qualifications as input features, and role as the target label. The data were filtered to three major job roles (Management, IT, and Digital), resulting in 489,651 relevant entries. Skills were extracted and standardized using GROK AI, then transformed with MultiLabelBinarizer into binary indicator (one-hot style) features. XGBoost, a gradient-boosted tree ensemble, was applied for classification under multiple data split configurations (70:15:15, 80:10:10, 70:30, 80:20, 90:10) with random_state=42 and multi-class log loss as the evaluation metric. The 90:10 configuration achieved the highest accuracy (74.18%), followed by 80:20 with 68.44%. The study concludes that ensemble learning handles high-dimensional categorical job data effectively and provides a foundation for automated job classification systems and labor market analysis.
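The encoding, splitting, and evaluation steps named in the abstract can be illustrated with a minimal standard-library sketch. This is not the authors' code: it only mimics the kind of binary indicator matrix that MultiLabelBinarizer produces, a seeded shuffle-and-cut split, and the multi-class log loss formula. The function names, the toy postings, and the use of Python's `random` module for the seed are assumptions made for illustration; the classifier itself (XGBoost) is deliberately left out.

```python
import math
import random


def fit_vocabulary(skill_lists):
    """Collect the sorted set of distinct skills (one column per skill)."""
    return sorted({skill for skills in skill_lists for skill in skills})


def multi_hot(skill_lists, vocabulary):
    """Encode each posting as a 0/1 vector over the skill vocabulary."""
    index = {skill: i for i, skill in enumerate(vocabulary)}
    rows = []
    for skills in skill_lists:
        row = [0] * len(vocabulary)
        for skill in skills:
            if skill in index:  # skills unseen at fit time are ignored
                row[index[skill]] = 1
        rows.append(row)
    return rows


def split_indices(n, ratios, seed=42):
    """Shuffle range(n) with a fixed seed and cut it by percentage ratios."""
    if sum(ratios) != 100:
        raise ValueError("ratios must sum to 100")
    order = list(range(n))
    random.Random(seed).shuffle(order)
    parts, start = [], 0
    for k, ratio in enumerate(ratios):
        # The last part takes the remainder so that no row is dropped.
        end = n if k == len(ratios) - 1 else start + (n * ratio) // 100
        parts.append(order[start:end])
        start = end
    return parts


def multiclass_log_loss(y_true, probabilities, eps=1e-15):
    """Mean negative log-probability assigned to the true class index."""
    total = 0.0
    for label, probs in zip(y_true, probabilities):
        p = min(max(probs[label], eps), 1.0)  # clip to avoid log(0)
        total -= math.log(p)
    return total / len(y_true)


# Toy postings; the skills and roles here are invented for illustration.
postings = [
    (["python", "sql"], "IT"),
    (["leadership", "budgeting"], "Management"),
    (["seo", "content marketing"], "Digital"),
    (["python", "leadership"], "IT"),
]
skill_lists = [skills for skills, _ in postings]
vocabulary = fit_vocabulary(skill_lists)
X = multi_hot(skill_lists, vocabulary)
train_idx, test_idx = split_indices(len(X), (80, 20))
```

Passing `(70, 15, 15)` or `(80, 10, 10)` to `split_indices` yields train, validation, and test parts in the same way, which corresponds to the three-way configurations listed in the abstract. The vocabulary grows with every distinct skill string, which is why standardizing skill names before encoding keeps the feature matrix from becoming needlessly wide.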

Copyrights © 2025






Journal Info

Abbrev

ITJRD

Publisher

Subject

Computer Science & IT; Control & Systems Engineering; Engineering

Description

Information Technology Journal Research and Development (ITJRD) is a scientific journal established by the Informatics Engineering Study Program (Prodi Teknik Informatika), Universitas Islam Riau, to give academics and researchers a venue for publishing papers and scholarly work in the field of information technology. ...