Sembiring, Joanne Polama Putri
Unknown Affiliation

Published: 1 document
Articles

In-Situ Database Machine Learning: Evaluating SQL-Based K-Means for E-Commerce Sales Analysis
Sembiring, Joanne Polama Putri; Yunmar, Rajif Agung
Journal of Information System Research (JOSH) Vol 7 No 1 (2025): October 2025
Publisher : Forum Kerjasama Pendidikan Tinggi (FKPT)

DOI: 10.47065/josh.v7i1.8468

Abstract

Conventional machine learning techniques, such as K-Means clustering, often require transferring data out of the database for analysis, which introduces inefficiencies, potential data inconsistencies, and security and privacy concerns. This research proposes an in-situ database machine learning approach that implements the K-Means clustering algorithm directly within the database management system as a stored procedure. The methodology comprises five main stages: collection of public datasets from Kaggle, data preparation and cleaning, data transformation using cyclical feature encoding to preserve temporal context, in-database K-Means implementation, and performance evaluation. The evaluation used the Silhouette Score and execution time to compare the proposed in-situ approach with a conventional off-database implementation. The in-situ database clustering achieved an optimal Silhouette Score of S ≈ 0.914 in 0.0121 seconds. The conventional off-database clustering achieved an identical score but required 1.2956 seconds. For the same cluster quality, the in-situ method is therefore approximately 107.07 times faster than the off-database method (1.2956 / 0.0121 ≈ 107.07). The identical score supports the mathematical correctness of the SQL-based implementation, and a value this close to 1 indicates well-separated clusters. These findings show that in-situ database clustering matches the conventional approach in cluster quality while running far faster. This efficiency, together with the successful categorization of e-commerce sales data into distinct demand patterns, lays a foundation for more effective and efficient predictive analytics and data-driven decision-making, particularly for inventory planning.