Musababa, Muhammad Adin
Unknown Affiliation

Data Streaming Pipeline Model Using DBSTREAM-Based Online Machine Learning for E-Commerce User Segmentation
Musababa, Muhammad Adin; Fachrie, Muhammad
Journal of Applied Informatics and Computing Vol. 9 No. 6 (2025): December 2025
Publisher: Politeknik Negeri Batam

DOI: 10.30871/jaic.v9i6.11522

Abstract

The rapid development of information technology has transformed the digital business sector, particularly e-commerce. E-commerce consumers differ in their characteristics, behaviors, and needs, and analyzing each consumer's behavior manually is impractical, so an automated system is needed that can identify behavior patterns adaptively. However, most customer segmentation approaches still rely on batch learning over static data, which prevents them from adapting quickly to changes in user behavior. This study designs a streaming data pipeline based on Online Machine Learning (OML), integrated with the Density-Based Clustering for Data Streams (DBSTREAM) algorithm, to produce adaptive e-commerce user segmentation. The system was developed in Python, with RabbitMQ as a real-time data stream simulator, MongoDB for storing results, and Streamlit as the visualization interface. Clustering was performed incrementally with DBSTREAM, and the result was then stabilized with Hierarchical Agglomerative Clustering (HAC) to avoid over-segmentation. Evaluation with the Silhouette Coefficient and the Davies-Bouldin Index (DBI) shows that the model performs best with a clustering threshold in the range 0.6 to 0.8 and a fading factor of 0.0005 or smaller, such as 0.0003. The evaluation yielded a Silhouette value of -0.1125 and a DBI of 0.2796. These results indicate that integrating DBSTREAM into an OML pipeline can segment consumer behavior efficiently and adapt to continuous, real-time changes in streaming data.
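
To make the two evaluation metrics named in the abstract concrete, here is a minimal sketch of how the Silhouette Coefficient and the Davies-Bouldin Index can be computed from labeled points. It is not the authors' evaluation code: the function names, the toy points, and the use of Euclidean distance are assumptions made for illustration. The Silhouette Coefficient ranges from -1 to 1, with higher values meaning better-separated clusters; the DBI is non-negative, with lower values being better.

```python
from math import dist
from statistics import mean


def group_by_label(points, labels):
    """Collect points into a dict keyed by cluster label."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    return clusters


def silhouette(points, labels):
    """Mean silhouette coefficient (Euclidean); range -1 to 1, higher is better."""
    clusters = group_by_label(points, labels)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        # (the point's zero distance to itself is in the sum, not the count)
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            mean(dist(p, q) for q in members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return mean(scores)


def davies_bouldin(points, labels):
    """Davies-Bouldin Index (Euclidean); non-negative, lower is better.

    Raises ZeroDivisionError if two cluster centroids coincide.
    """
    clusters = group_by_label(points, labels)
    if len(clusters) < 2:
        raise ValueError("DBI needs at least two clusters")
    centroids = {
        lab: tuple(mean(axis) for axis in zip(*members))
        for lab, members in clusters.items()
    }
    # scatter: mean distance from a cluster's points to its centroid
    scatter = {
        lab: mean(dist(p, centroids[lab]) for p in members)
        for lab, members in clusters.items()
    }
    worst_ratios = [
        max(
            (scatter[i] + scatter[j]) / dist(centroids[i], centroids[j])
            for j in clusters
            if j != i
        )
        for i in clusters
    ]
    return mean(worst_ratios)


# Toy data (hypothetical): two tight groups far apart on the x-axis.
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = [0, 0, 1, 1]   # labels that follow the real groups
mixed = [0, 1, 1, 0]  # labels that pair each point with a far-away point

print(round(silhouette(points, good), 4))
print(round(davies_bouldin(points, good), 4))
print(round(silhouette(points, mixed), 4))
```

On this toy data the well-separated labeling gives a Silhouette value of about 0.90 and a DBI of 0.1, while the mixed labeling gives a negative Silhouette value of about -0.45, because every point is on average closer to the other cluster than to its own.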