Mauludiah, Siska Farizah
Unknown Affiliation

Published : 4 Documents
Articles

Struggling Models: An Analysis of Logistic Regression and Random Forest in Predicting Repeat Buyers with Imbalanced Performance Metrics
Mauludiah, Siska Farizah; Arif, Yunifa Miftachul; Faisal, Muhammad; Putra, Dony Darmawan
Applied Information System and Management (AISM), Vol 7, No 2 (2024)
Publisher : Depart. of Information Systems, FST, UIN Syarif Hidayatullah Jakarta

DOI: 10.15408/aism.v7i2.39326

Abstract

Predicting repeat buyers is essential for businesses seeking to improve customer retention and maximize profitability. This study examines the effectiveness of logistic regression and random forest algorithms in forecasting repeat buyers, using an e-commerce dataset from Kaggle. Despite the theoretical strengths of these models, our results indicate significant performance challenges. Both models were evaluated on key metrics: accuracy, precision, recall, F1 score, and ROC-AUC. Both logistic regression and random forest performed poorly, with accuracy hovering around 50%, imbalanced precision and recall, and ROC-AUC scores barely exceeding the random-guessing level. These metrics highlight the limited discriminative power of the models in identifying repeat buyers. The analysis suggests that data quality, feature relevance, and class imbalance contribute to these shortcomings. Specifically, the models struggled to learn effectively from the data, leading to suboptimal predictions. These results underscore the need for enhanced feature engineering, better handling of class imbalance, and possibly more advanced algorithms. This study provides a critical assessment of the limitations of logistic regression and random forest for predicting repeat buyers, and therefore implements feature engineering, SMOTE, and hyperparameter tuning with RandomizedSearchCV to obtain better results.
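
The five metrics named in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' code: the function names and the toy labels in the usage note are hypothetical, and ROC-AUC is computed here by direct pairwise comparison of scores, which is only practical for small samples. It shows why an ROC-AUC near 0.5 corresponds to ranking repeat buyers no better than chance.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = repeat buyer)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred):
    """Return accuracy, precision, recall, and F1 for hard 0/1 predictions."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true, scores):
    """ROC-AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win. A value of 0.5 means the scores rank positives
    and negatives no better than chance.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For example, `classification_metrics([1, 1, 0, 0], [1, 0, 1, 0])` gives 0.5 for all four metrics, the pattern the abstract describes as close to random guessing.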
Enhancing Repeat Buyer Classification with Multi Feature Engineering in Logistic Regression
Mauludiah, Siska Farizah; Crysdian, Cahyo; Arif, Yunifa Miftachul
Applied Information System and Management (AISM), Vol 8, No 1 (2025)
Publisher : Depart. of Information Systems, FST, UIN Syarif Hidayatullah Jakarta

DOI: 10.15408/aism.v8i1.45025

Abstract

This study presents a novel approach to improving repeat buyer classification on e-commerce platforms by integrating Kullback-Leibler (KL) divergence with logistic regression and focused feature engineering techniques. Repeat buyers are a critical segment for driving long-term revenue and customer retention, yet identifying them accurately is challenging because of class imbalance and the complexity of consumer behavior. Unlike traditional methods, this research uses KL divergence both to select important features and to evaluate the model, making the model easier to interpret and more effective at classifying repeat buyers. Using a real-world Indonesian e-commerce dataset of 1,000 records, split into 80% for training and 20% for testing, the study applies logistic regression together with SMOTE oversampling, class weighting, and regularization to address data imbalance and overfitting. Model performance is assessed using accuracy, precision, recall, F1-score, and KL divergence. Experimental results indicate that the KL-enhanced logistic regression model significantly outperforms the baseline, especially in balancing precision and recall for the minority class of repeat buyers. The unique contribution of this work lies in its combined use of KL divergence in both the feature engineering and evaluation phases, offering a robust, interpretable, and data-efficient solution. For e-commerce businesses, the findings translate into improved targeting of high-value customers, better personalization of marketing efforts, and more strategic allocation of resources. This research offers practical guidance for enhancing predictive customer analytics and supports data-driven decision-making in digital commerce environments.
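
The abstract does not spell out how KL divergence is used to choose features, so the following is only one plausible reading, sketched under stated assumptions: each numeric feature is binned into a histogram per class, and the feature is scored by the KL divergence between the repeat-buyer and non-repeat-buyer distributions. The function names, the equal-width binning, and the smoothing constant `eps` are all hypothetical choices made for this illustration.

```python
import math


def histogram(values, edges):
    """Count values per bin; bins are [edges[i], edges[i+1]), with the last bin closed."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            if edges[i] <= v < edges[i + 1] or (is_last and v == edges[-1]):
                counts[i] += 1
                break
    return counts


def kl_divergence(p_counts, q_counts, eps=1e-9):
    """KL(P || Q) in nats between two histograms defined over the same bins.

    eps is an assumed smoothing constant that keeps empty bins from
    producing log(0) or a division by zero.
    """
    p_total = sum(p_counts) + eps * len(p_counts)
    q_total = sum(q_counts) + eps * len(q_counts)
    kl = 0.0
    for pc, qc in zip(p_counts, q_counts):
        p = (pc + eps) / p_total
        q = (qc + eps) / q_total
        kl += p * math.log(p / q)
    return kl


def feature_kl_score(values, labels, bins=5):
    """Score one numeric feature by how differently it is distributed across the two classes."""
    lo, hi = min(values), max(values)
    if lo == hi:
        return 0.0  # a constant feature cannot separate the classes
    width = (hi - lo) / bins
    edges = [lo + i * width for i in range(bins)] + [hi]
    pos = [v for v, label in zip(values, labels) if label == 1]
    neg = [v for v, label in zip(values, labels) if label == 0]
    return kl_divergence(histogram(pos, edges), histogram(neg, edges))
```

Under this reading, features whose score is close to zero are distributed almost identically for both classes and are candidates for removal before fitting the logistic regression.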