This study proposes a job classification framework that uses Natural Language Processing (NLP) and ensemble learning to classify job roles from their required skills and qualifications. A large-scale open-source dataset of 1,048,576 job postings was used, with attributes such as job title, qualifications, skills, company profile, and role. Only the relevant attributes were retained: skills and qualifications as input features, and role as the target label. The data were filtered to three major job roles (Management, IT, and Digital), leaving 489,651 relevant entries. Skills were extracted and standardized using GROK AI and then transformed with MultiLabelBinarizer, which encodes each posting's skill set as a binary indicator (multi-hot) vector. The XGBoost algorithm was applied for classification under several data split configurations (70:15:15, 80:10:10, 70:30, 80:20, and 90:10) with random_state=42 and multi-class log loss as the evaluation metric. The 90:10 configuration achieved the highest accuracy (74.18%), followed by 80:20 with 68.44%. These results indicate that ensemble learning can handle high-dimensional categorical job data and provide a foundation for automated job classification systems and labor market analysis.
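The pipeline described above (skill encoding, data splitting, and boosted-tree classification) can be illustrated with a minimal Python sketch. It is not the study's actual code: the toy skill pools, the sample size, and the n_estimators value are hypothetical, and only MultiLabelBinarizer, the 70:15:15 split, random_state=42, and the multi-class log loss come from the abstract. If the xgboost package is not installed, the sketch falls back to scikit-learn's GradientBoostingClassifier, which is a different boosting implementation.

```python
import random

import numpy as np
from sklearn.metrics import accuracy_score, log_loss
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, MultiLabelBinarizer

# Hypothetical toy data standing in for the standardized skill lists.
rng = random.Random(42)
pools = {
    "Management": ["leadership", "budgeting", "strategy", "communication"],
    "IT": ["python", "sql", "networking", "communication"],
    "Digital": ["seo", "content marketing", "analytics", "sql"],
}
skills, roles = [], []
for role, pool in pools.items():
    for _ in range(40):
        skills.append(rng.sample(pool, k=rng.randint(2, 3)))
        roles.append(role)

# Encode each skill list as a binary indicator vector (one column per skill).
mlb = MultiLabelBinarizer()
X = mlb.fit_transform(skills)

# Map role names to integer class labels 0..2.
y = LabelEncoder().fit_transform(roles)

# 70:15:15 split: hold out 30%, then halve it into validation and test sets.
X_train, X_tmp, y_train, y_tmp = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=42
)
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.50, stratify=y_tmp, random_state=42
)

try:
    from xgboost import XGBClassifier

    model = XGBClassifier(
        objective="multi:softprob",
        eval_metric="mlogloss",  # multi-class log loss
        n_estimators=50,         # assumed value, not from the study
        random_state=42,
    )
except ImportError:
    # Fallback only: a different gradient boosting implementation.
    from sklearn.ensemble import GradientBoostingClassifier

    model = GradientBoostingClassifier(n_estimators=50, random_state=42)

model.fit(X_train, y_train)

# Multi-class log loss on the validation set, accuracy on the test set.
val_loss = log_loss(y_val, model.predict_proba(X_val), labels=[0, 1, 2])
proba = model.predict_proba(X_test)
acc = accuracy_score(y_test, np.argmax(proba, axis=1))
print(f"validation log loss: {val_loss:.3f}, test accuracy: {acc:.3f}")
```

The two-way configurations (70:30, 80:20, 90:10) would use a single train_test_split call with test_size set to 0.30, 0.20, or 0.10 instead of the two-step split shown here.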
Copyright © 2025