JURTEKSI
Vol. 11 No. 4 (2025): September

COMPARATIVE ANALYSIS OF MACHINE LEARNING ALGORITHMS FOR COSMETIC SALES PREDICTION ON TOKOPEDIA

Sahira, Mutia
Tania, Ken Ditha
Afrina, Mira



Article Info

Publish Date
30 Sep 2025

Abstract

Abstract: The rapid growth of the cosmetics industry on e-commerce platforms has intensified competition, creating a critical need for effective, data-driven marketing strategies. This study presents a comparative analysis of machine learning algorithms for predicting the sales category (High, Medium, or Low) of cosmetic products on the Tokopedia marketplace. Four classification models (Random Forest, XGBoost, Logistic Regression, and Naive Bayes) were trained and evaluated on data collected via web scraping. The methodology applies the Synthetic Minority Over-sampling Technique (SMOTE) to address significant class imbalance and GridSearchCV for hyperparameter optimization, so that the models are compared fairly and robustly. The experimental results show that the Random Forest model performed best, achieving the highest macro-averaged F1-score (0.75) and an accuracy of 85.3%. This model was subsequently implemented in a simple recommendation system that simulates optimal discount strategies, demonstrating its practical utility in providing actionable insights for business decisions.

Keywords: classification; comparative analysis; machine learning; sales prediction; SMOTE
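The abstract does not describe how SMOTE was configured. As a rough illustration only, the sketch below shows the core SMOTE idea in plain Python: each synthetic sample is placed at a random point on the line segment between a minority-class sample and one of its k nearest same-class neighbours, and every minority class is oversampled up to the size of the largest class. The function name `smote`, its parameters `k` and `seed`, and the toy data are hypothetical and are not the study's data. In practice a library implementation such as `imblearn.over_sampling.SMOTE` from imbalanced-learn would typically be used, and oversampling is normally applied only to the training split so that synthetic points do not leak into the evaluation data.

```python
import math
import random
from collections import Counter


def smote(X, y, k=5, seed=0):
    """Hypothetical minimal SMOTE sketch: oversample each minority class up to the majority count.

    X is a list of numeric feature vectors (lists of floats), y the matching labels.
    Returns (X_res, y_res): the original rows followed by the synthetic rows.
    """
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_res, y_res = [list(row) for row in X], list(y)

    for label, n in counts.items():
        members = [row for row, lab in zip(X, y) if lab == label]
        if n >= target or len(members) < 2:
            continue  # already the largest class, or too few points to interpolate between
        k_eff = min(k, len(members) - 1)
        for i in range(target - n):
            base = members[i % len(members)]  # cycle through minority samples as interpolation bases
            # k nearest same-class neighbours of `base`, excluding `base` itself
            neighbours = sorted(
                (m for m in members if m is not base),
                key=lambda m: math.dist(base, m),
            )[:k_eff]
            nn = rng.choice(neighbours)
            gap = rng.random()  # uniform in [0, 1): position along the segment base -> nn
            X_res.append([b + gap * (v - b) for b, v in zip(base, nn)])
            y_res.append(label)
    return X_res, y_res


# Toy example (hypothetical data): four "Low" rows and two "High" rows.
X = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0], [5.0, 5.0], [5.5, 5.0]]
y = ["Low", "Low", "Low", "Low", "High", "High"]
X_res, y_res = smote(X, y, k=1)
print(Counter(y_res))
```

With `k=1` and only two "High" rows, each synthetic row lies on the segment between (5.0, 5.0) and (5.5, 5.0), so its second coordinate stays 5.0 and the classes end up balanced at four rows each.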
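The abstract reports both accuracy (85.3%) and the macro-averaged F1-score (0.75). The macro average computes F1 separately for each class and takes their unweighted mean, so each sales category counts equally regardless of how many products it contains, whereas accuracy is dominated by the largest class. The plain-Python sketch below uses hypothetical function names and toy labels (not the study's data) to illustrate how a degenerate classifier that always predicts the majority class can reach 80% accuracy while its macro F1 stays below 0.3. For this example, scikit-learn's `f1_score(y_true, y_pred, average="macro")` should give the same value.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores; a class with no true positives scores 0."""
    pairs = list(zip(y_true, y_pred))
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)  # correctly predicted as `label`
        fp = sum(t != label and p == label for t, p in pairs)  # wrongly predicted as `label`
        fn = sum(t == label and p != label for t, p in pairs)  # `label` rows that were missed
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)


# Hypothetical imbalanced labels: eight "Low", one "Medium", one "High".
y_true = ["Low"] * 8 + ["Medium", "High"]
y_pred = ["Low"] * 10  # always predicts the majority class
print(accuracy(y_true, y_pred))            # 0.8
print(round(macro_f1(y_true, y_pred), 4))  # 0.2963
```

Here only "Low" gets a nonzero F1 (8/9), while "Medium" and "High" score 0, so the macro average is (8/9) / 3 = 8/27.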

Copyright © 2025






Journal Info

Abbrev

jurteksi

Publisher

STMIK Royal Kisaran

Subject

Computer Science & IT

Description

JURTEKSI (Jurnal Teknologi dan Sistem Informasi) is a scientific journal published by STMIK Royal Kisaran. The journal is published twice a year, in June and December, and contains a collection of research in information technology and computer ...