Articles

CatBoost Algorithm Implementation for Classifying Women's Fashion Products
Madani, Fadillah; Lubis, Andre Hasudungan
JOURNAL OF INFORMATICS AND TELECOMMUNICATION ENGINEERING Vol. 9 No. 1 (2025): Issues July 2025
Publisher : Universitas Medan Area

DOI: 10.31289/jite.v9i1.15604

Abstract

The rapid growth of the women's fashion industry in the digital era has intensified the need for data-driven approaches to understanding customer preferences. This study aims to classify women's clothing products based on customer reviews by applying CatBoost, a gradient boosting algorithm known for its strong performance with categorical features. The dataset, consisting of 23,486 entries and 11 attributes, was obtained from Kaggle and processed through data cleaning, normalization, exploratory analysis, and model training. Hyperparameters were optimized using Grid Search. Model performance was evaluated using accuracy, precision, recall, and F1-score and benchmarked against four traditional classifiers: Decision Tree (C4.5), Naïve Bayes, Support Vector Machine (SVM), and K-Nearest Neighbor (KNN). The results show that CatBoost achieved an accuracy of 93.70%, an F1-score of 0.9606, and an AUC of 0.9691, indicating excellent and balanced classification performance. This study demonstrates the effectiveness of CatBoost in handling customer review data and contributes to the development of intelligent product classification systems in the fashion industry.
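
The metrics reported above (accuracy, precision, recall, F1-score, AUC) can be computed from predictions as in the minimal plain-Python sketch below. It assumes a binary target with the positive class labeled 1 and a model that outputs a score in [0, 1]; the abstract does not state the target, so this is an assumption. The labels and scores are hypothetical toy values, not the study's data, and neither the CatBoost model nor the Grid Search is reproduced.

```python
# Minimal sketch: binary classification metrics of the kind reported for CatBoost.
# Labels and scores in the example are hypothetical toy values.

def confusion_counts(y_true, y_pred, positive=1):
    # Count true positives, false positives, false negatives, true negatives.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred, positive=1):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true, scores, positive=1):
    # AUC as the probability that a random positive is scored above a random
    # negative (Mann-Whitney formulation); ties count as one half.
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    y_true = [1, 1, 1, 0, 0, 1, 0, 1]
    scores = [0.9, 0.8, 0.35, 0.4, 0.2, 0.7, 0.6, 0.95]
    y_pred = [1 if s >= 0.5 else 0 for s in scores]  # 0.5 threshold assumed
    print(classification_metrics(y_true, y_pred))
    print(roc_auc(y_true, scores))
```

Unlike the other four metrics, AUC uses the raw scores rather than thresholded labels, which is why it can be reported alongside accuracy and F1 as a threshold-independent view of the same model.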

Comparison of KNN and SVM Performance in 2024 Election Results Sentiment Analysis
Bukit, M Iqbal Fahilla; Lubis, Andre Hasudungan
Journal of Artificial Intelligence and Software Engineering Vol 5, No 3 (2025): September
Publisher : Politeknik Negeri Lhokseumawe

DOI: 10.30811/jaise.v5i3.7659

Abstract

This study compares the performance of the K-Nearest Neighbor (KNN) and Support Vector Machine (SVM) algorithms in sentiment analysis of the 2024 election results using social media data. The dataset consists of 506 public opinion entries labeled with three sentiment classes: positive, negative, and neutral. The data were preprocessed through case folding, tokenization, stopword removal, and stemming, and then represented using Term Frequency–Inverse Document Frequency (TF-IDF). Both algorithms achieved accuracies above 70%. KNN produced an accuracy of 75.49%, precision of 71.36%, recall of 75.49%, and F1-score of 72.88%, while SVM achieved a slightly higher accuracy of 77.45%, with precision of 70.59%, recall of 77.45%, and F1-score of 72.15%. The confusion matrix analysis shows that both models classify positive sentiment well but struggle to recognize negative and neutral sentiment because of the imbalanced class distribution. Overall, this study indicates that SVM is more suitable for election sentiment analysis on high-dimensional text data.
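
The TF-IDF representation and the KNN side of the comparison can be sketched in plain Python as below. This is a minimal illustration under stated assumptions: tokenization is case folding plus whitespace splitting (stopword removal and stemming are omitted), the IDF is the smoothed form log((1 + n) / (1 + df)) + 1 that scikit-learn's TfidfVectorizer uses by default (not necessarily the authors' formula), neighbors are ranked by cosine similarity, and the six training sentences and k = 3 are hypothetical. The SVM model is not shown.

```python
import math
from collections import Counter


def tokenize(text):
    # Case folding plus whitespace tokenization only; stopword removal and
    # stemming from the study's pipeline are omitted in this sketch.
    return text.lower().split()


def fit_idf(tokenized_docs):
    # Document frequency: the number of documents each term appears in.
    n = len(tokenized_docs)
    df = Counter()
    for doc in tokenized_docs:
        df.update(set(doc))
    # Smoothed IDF (assumed variant).
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def tfidf(doc, idf):
    # Length-normalized term frequency weighted by IDF; terms unseen when
    # fitting the IDF are dropped.
    if not doc:
        return {}
    tf = Counter(doc)
    return {t: (c / len(doc)) * idf[t] for t, c in tf.items() if t in idf}


def cosine(a, b):
    dot = sum(v * b.get(t, 0.0) for t, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_predict(train_vecs, train_labels, query_vec, k=3):
    # Rank training documents by cosine similarity (stable sort keeps the
    # original order among ties) and take a majority vote over the top k.
    ranked = sorted(range(len(train_vecs)),
                    key=lambda i: cosine(train_vecs[i], query_vec),
                    reverse=True)
    votes = Counter(train_labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    texts = ["happy proud great winner", "great happy celebration",
             "angry disappointed fraud", "disappointed sad fraud",
             "count announced today", "official count today"]
    labels = ["positive", "positive", "negative", "negative", "neutral", "neutral"]
    docs = [tokenize(t) for t in texts]
    idf = fit_idf(docs)
    vecs = [tfidf(d, idf) for d in docs]
    print(knn_predict(vecs, labels, tfidf(tokenize("Happy great day"), idf)))  # positive
```

With only two positive documents sharing terms with the query, the third vote comes from a zero-similarity neighbor, which illustrates how KNN on sparse TF-IDF vectors can be swayed by whichever classes dominate the data, consistent with the class-imbalance effect the abstract describes.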

Clustering Culinary Locations Using the DBSCAN Algorithm
Halawa, Anestin; Lubis, Andre Hasudungan
Journal of Artificial Intelligence and Software Engineering Vol 5, No 3 (2025): September
Publisher : Politeknik Negeri Lhokseumawe

DOI: 10.30811/jaise.v5i3.7512

Abstract

The culinary industry plays a vital role in driving creative economic growth and local tourism while also being an integral part of urban lifestyle. Given the high number and diversity of culinary locations, clustering techniques are needed to group them based on marketing characteristics, enabling more efficient decision-making for both consumers and businesses. This study aims to cluster culinary locations based on marketing-related attributes using the DBSCAN (Density-Based Spatial Clustering of Applications with Noise) algorithm. Secondary data was obtained from Kaggle, consisting of restaurant information in Semarang City, with attributes such as rating, number of reviews, and operating hours. After preprocessing and exploratory analysis, DBSCAN was applied with adjusted parameters to generate optimal clusters. The results produced 41 clusters with diverse characteristics, including several outliers detected as noise. Performance evaluation using Silhouette Score and Davies-Bouldin Index showed that DBSCAN achieved more compact and well-separated clusters compared to K-Means. These findings demonstrate that DBSCAN is more adaptive for non-uniform culinary data with varying densities and is suitable for segmentation and strategic decision-making in the culinary industry.
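
The DBSCAN procedure can be sketched in plain Python as follows: a point with at least `min_pts` neighbors (itself included) within distance `eps` is a core point, clusters grow outward from core points, border points join a cluster without expanding it, and points reachable from no core point are labeled noise (-1). The coordinates, `eps`, and `min_pts` below are hypothetical; the abstract does not report the parameters used, and real attributes such as rating and number of reviews would typically be scaled before distances are computed.

```python
import math

NOISE = -1


def region_query(points, i, eps):
    # Indices of all points within eps of point i (including i itself).
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    labels = [None] * len(points)  # None means "not yet visited"
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        labels[i] = cluster  # i is a core point: start a new cluster
        seeds = list(neighbors)
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)  # j is also core: keep expanding
        cluster += 1
    return labels


if __name__ == "__main__":
    pts = [(1.0, 1.0), (1.2, 1.1), (0.9, 1.0), (1.1, 0.9),
           (5.0, 5.0), (5.1, 5.2), (4.9, 5.1),
           (9.0, 0.0)]
    print(dbscan(pts, eps=0.5, min_pts=3))  # [0, 0, 0, 0, 1, 1, 1, -1]
```

Because the number of clusters is not fixed in advance, DBSCAN can return many small clusters (such as the 41 reported) plus noise, whereas K-Means must assign every point to one of k clusters. The brute-force neighbor search here is O(n²); library implementations such as scikit-learn's `DBSCAN` can use tree-based neighbor indexes instead.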

Predicting Burnout in Start-Up Environments: A Multivariate Risk Scoring Approach for Early Managerial Intervention
Sutrisno, Nos; Elveny, Maricha; Lubis, Andre Hasudungan; Syah, Rahmad; Hartono, Hartono; Krisdayanti, Sabina
International Journal of Engineering, Science and Information Technology Vol 5, No 4 (2025)
Publisher : Malikussaleh University, Aceh, Indonesia

DOI: 10.52088/ijesty.v5i4.1663

Abstract

Start-up organisations operate under fast timelines, lean staffing, and constantly shifting priorities, exposing employees to chronic workload pressure and emotional strain. Unmanaged burnout in these settings threatens individual well-being, talent retention, and long-term execution capacity. This study proposes a multivariate burnout risk scoring approach that aims to identify and prioritise employees at elevated risk before full deterioration occurs, enabling early managerial intervention rather than reactive recovery. The proposed pipeline integrates principal component analysis (PCA), Random Forest, and Support Vector Machine (SVM). PCA is first applied to reduce redundancy across workplace indicators, yielding five principal components (PC1–PC5) that together explain 88% of the total variance in self-reported stress level, job satisfaction, emotional exhaustion, work-life balance, performance, and social interaction. These components are then used as predictors in two supervised classification models, Random Forest and SVM, to estimate the likelihood that each employee belongs to a high-burnout-risk class. The Random Forest model achieved an accuracy of 88%, and the SVM model achieved an accuracy of 86%, demonstrating strong predictive capability in distinguishing higher-risk employees from lower-risk employees. The resulting predicted probability is interpreted as an individualised burnout risk score, which can be mapped to action categories such as workload redistribution, role clarification, targeted supervisory check-ins, or temporary protection from critical-path tasks. In this way, the framework operationalises burnout prediction not only as a detection task but also as an actionable decision-support signal for leaders. The study therefore offers both a quantitative method for forecasting burnout in start-up environments and a practical structure for translating prediction into preventive intervention.
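
The step that turns a predicted probability into an individualised risk score and an action category could look like the sketch below. The cut-off values, the ordering of the named actions by severity, and the fallback label "routine monitoring" are hypothetical choices made for illustration; the abstract names the action categories but not how scores are assigned to them. The probabilities themselves would come from the Random Forest or SVM classifier trained on PC1–PC5, which is not reproduced here.

```python
from dataclasses import dataclass

# Hypothetical cut-offs, checked from most to least severe. The abstract
# does not specify how probabilities map to actions.
ACTION_BANDS = [
    (0.80, "temporary protection from critical-path tasks"),
    (0.60, "workload redistribution"),
    (0.40, "role clarification"),
    (0.20, "targeted supervisory check-ins"),
]
DEFAULT_ACTION = "routine monitoring"  # hypothetical fallback label


@dataclass
class RiskAssessment:
    employee_id: str
    risk_score: float  # predicted probability of the high-burnout-risk class
    action: str


def assess(employee_id, p_high_risk):
    if not 0.0 <= p_high_risk <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    for cutoff, action in ACTION_BANDS:
        if p_high_risk >= cutoff:
            return RiskAssessment(employee_id, p_high_risk, action)
    return RiskAssessment(employee_id, p_high_risk, DEFAULT_ACTION)


def prioritise(probabilities):
    # Order employees from highest to lowest risk score so that the most
    # urgent cases are reviewed first.
    ranked = sorted(probabilities.items(), key=lambda kv: kv[1], reverse=True)
    return [assess(emp, p) for emp, p in ranked]


if __name__ == "__main__":
    for a in prioritise({"E01": 0.15, "E02": 0.91, "E03": 0.47}):
        print(a.employee_id, a.risk_score, a.action)
```

Keeping the bands in a single table makes the policy easy to audit and adjust without retraining the model, which fits the abstract's framing of the score as a decision-support signal rather than a final decision.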

Classification of iPhone Products Using the XGBoost Algorithm
Sihombing, Stevi Freshia; Pakpahan, Josua Prayuda; Lubis, Andre Hasudungan
Journal of Informatics Management and Information Technology Vol. 5 No. 3 (2025): July 2025
Publisher : Forum Kerjasama Pendidikan Tinggi (FKPT)

DOI: 10.47065/jimat.v5i3.649

Abstract

The iPhone has become a symbol of advanced technology and a modern lifestyle that is highly sought after worldwide, including in Indonesia. Known for its stable and exclusive iOS operating system, the product offers seamless cross-device integration, consistent system updates, and high performance backed by the latest generation of processors. The iPhone also has visual appeal through its minimalist and elegant design, along with features such as professional-grade camera quality, strong data security, and power efficiency. Its popularity makes it one of the most competitive products in the smartphone market. However, the diversity of models, features, and prices across iPhone series makes user preferences diverse and complex. To understand these preferences, an accurate classification method is needed to group products according to consumer appeal. This study adopts the XGBoost algorithm, which is known to be effective in handling large and complex data. Using iPhone sales transaction data from the Indonesian market, the model is designed to identify purchasing patterns and user segments. The classification results are expected to give manufacturers and marketers deeper insights for formulating more targeted, data-driven marketing strategies.
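
For readers unfamiliar with how XGBoost grows its trees, the sketch below computes the regularized split gain and optimal leaf weight described in the XGBoost documentation ("Introduction to Boosted Trees"), using the first and second derivatives of binary log-loss. It is a from-scratch illustration of the criterion, not the `xgboost` library API; the four labels, the starting margin of 0, and the parameters `lam` (L2 penalty on leaf weights) and `gamma` (penalty per added leaf) are hypothetical, since the abstract does not report the study's features, target, or hyperparameters.

```python
import math


def sigmoid(margin):
    # Numerically stable logistic function.
    if margin >= 0:
        return 1.0 / (1.0 + math.exp(-margin))
    e = math.exp(margin)
    return e / (1.0 + e)


def logistic_grad_hess(y, margin):
    # Gradient and Hessian of binary log-loss with respect to the raw margin.
    p = sigmoid(margin)
    return p - y, p * (1.0 - p)


def leaf_weight(G, H, lam=1.0):
    # Optimal leaf value for summed gradients G and Hessians H: -G / (H + lambda).
    return -G / (H + lam)


def split_gain(g_left, h_left, g_right, h_right, lam=1.0, gamma=0.0):
    # Gain = 1/2 [G_L^2/(H_L+lam) + G_R^2/(H_R+lam) - (G_L+G_R)^2/(H_L+H_R+lam)] - gamma
    def score(G, H):
        return G * G / (H + lam)

    GL, HL = sum(g_left), sum(h_left)
    GR, HR = sum(g_right), sum(h_right)
    return 0.5 * (score(GL, HL) + score(GR, HR) - score(GL + GR, HL + HR)) - gamma


if __name__ == "__main__":
    y = [1, 1, 0, 0]
    gh = [logistic_grad_hess(t, 0.0) for t in y]
    g = [x[0] for x in gh]
    h = [x[1] for x in gh]
    # Splitting the two positives from the two negatives gives positive gain;
    # mixing them gives zero gain.
    print(split_gain(g[:2], h[:2], g[2:], h[2:]))
    print(split_gain([g[0], g[2]], [h[0], h[2]], [g[1], g[3]], [h[1], h[3]]))
```

A split is kept only when its gain is positive, so a larger `gamma` prunes more aggressively, while `lam` shrinks leaf values toward zero; both are among the hyperparameters typically tuned when XGBoost is applied to tabular data like sales transactions.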

Rice Crop Production Prediction in Indonesia Using the Random Forest Regressor Algorithm
Manurung, Dinikxon; Zealtiel, Billiam; Lubis, Andre Hasudungan
Journal of Computing and Informatics Research Vol 4 No 3 (2025): July 2025
Publisher : Forum Kerjasama Pendidikan Tinggi (FKPT)

DOI: 10.47065/comforch.v4i3.2125

Abstract

Rice production is a key component of national food security in Indonesia, given that rice is the staple food of most of the population. However, the stability of rice production is often disrupted by various factors, particularly agronomic conditions and climate variability that is difficult to predict. A data-driven approach is therefore needed that can accurately model the complexity of these factors. This study aims to build a rice production prediction model using the Random Forest Regressor algorithm, a machine learning method known to be reliable in handling non-linear and complex data. The dataset includes agricultural parameters such as harvested area and productivity, as well as climate data covering temperature, air humidity, and rainfall, collected from open sources such as Kaggle and the Indonesian Agency for Meteorology, Climatology and Geophysics (Badan Meteorologi Klimatologi dan Geofisika, BMKG) for the years 2018 to 2024. The methodology consists of several stages: data preprocessing (handling missing values and normalization), exploratory data analysis to understand patterns and correlations between variables, training of the prediction model, and evaluation of model performance using Mean Squared Error (MSE) and R-squared (R²). The results show that the best configuration was obtained with a 90:10 training-to-testing split and 200 decision trees in the model. This configuration produced an MSE of 0.0004 and an R² of 0.9918, indicating very high prediction accuracy and showing that the model represents the relationships between variables well. The study shows that the Random Forest Regressor is effective in predicting rice production and has the potential to serve as a strategic decision-support tool for stakeholders in the agricultural sector.
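
The evaluation described above (a 90:10 train/test split scored with MSE and R²) can be sketched in plain Python as follows. The shuffle seed, the toy target values, and the predictions are hypothetical, and the Random Forest Regressor itself is not reproduced; the functions only show how the two reported metrics are defined.

```python
import random


def train_test_split(rows, test_ratio=0.1, seed=42):
    # Shuffle a copy of the rows and hold out test_ratio of them (at least one).
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = max(1, round(len(rows) * test_ratio))
    return rows[n_test:], rows[:n_test]


def mse(y_true, y_pred):
    # Mean of squared residuals; expressed in squared units of the target.
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    # 1 - SS_res / SS_tot: the share of target variance explained by the model.
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R² is undefined when the target is constant")
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    train, test = train_test_split(range(20))
    print(len(train), len(test))  # 18 2
    y_true = [0.2, 0.4, 0.6, 0.8]
    y_pred = [0.25, 0.35, 0.6, 0.8]
    print(mse(y_true, y_pred), r2(y_true, y_pred))
```

MSE depends on the scale of the target (multiplying the target and predictions by 10 multiplies MSE by 100) while R² does not, so a small MSE such as 0.0004 is easiest to interpret alongside R² and with knowledge of whether the target was normalized; the abstract mentions normalization but does not say whether the reported MSE is on the normalized scale.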