Comparative Data Resample to Predict Subscription Services Attrition Using Tree-based Ensembles
Okpor, Margaret Dumebi; Aghware, Fidelis Obukohwo; Akazue, Maureen Ifeanyi; Ojugo, Arnold Adimabua; Emordi, Frances Uche; Odiakaose, Christopher Chukwufunaya; Ako, Rita Erhovwo; Geteloma, Victor Ochuko; Binitie, Amaka Patience; Ejeh, Patrick Ogholuwarami
Journal of Fuzzy Systems and Control Vol. 2 No. 2 (2024): Vol. 2, No. 2, 2024
Publisher : Peneliti Teknologi Teknik Indonesia

DOI: 10.59247/jfsc.v2i2.213

Abstract

Today's digital market is saturated with goods and services that promote monetization and asset exchange, and clients constantly seek improved alternatives at lower cost to meet their value demands. From item upgrades to replacements, businesses adopt retention strategies to curb customer attrition. Such strategies include upgrading goods and services at lower cost and targeting improved value chains to meet client needs; these are found to improve client retention and monetization. The study predicts customer churn via tree-based models with data resampling: random undersampling, the synthetic minority oversampling technique (SMOTE), and SMOTE with edited nearest neighbors (SMOTEEN). We chose three (3) tree-based models, namely (a) decision tree, (b) random forest, and (c) extreme gradient boosting (XGBoost), so that both a single classifier and ensemble classifiers are available to assess how well bagging and boosting perform on consumer churn prediction. With chi-square feature selection, the Decision Tree model yields an accuracy of 0.9973, an F1 of 0.9898, a precision of 0.9457, and a recall of 0.9698; Random Forest yields the same values (accuracy 0.9973, F1 0.9898, precision 0.9457, recall 0.9698). XGBoost outperformed both the Decision Tree and Random Forest classifiers with an accuracy of 0.9984, an F1 of 0.9945, a precision of 0.9616, and a recall of 0.9890, which is attributed to hyper-parameter tuning of its trees. We also note that SMOTEEN data balancing outperforms the other resampling schemes, and we retain a 30-day moratorium period in our adoption of recency-frequency-monetization to improve monetization and keep business managers ahead of the consumer attrition curve.
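
To make the simplest of the three resampling schemes concrete, the following is a minimal sketch of random undersampling, not the authors' implementation. The function name `random_undersample`, the toy data, and the seed are assumptions introduced here for illustration; only the Python standard library is used.

```python
import random
from collections import Counter


def random_undersample(rows, labels, seed=0):
    """Randomly drop rows so every class ends up as small as the rarest class."""
    rng = random.Random(seed)

    # Group the rows by their class label.
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)

    # Every class is cut down to the size of the smallest class.
    target = min(len(group) for group in by_class.values())

    out_rows, out_labels = [], []
    for label, group in by_class.items():
        # Sample without replacement so no row is duplicated.
        for row in rng.sample(group, target):
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Hypothetical toy data: 8 retained customers (label 0), 2 churned (label 1).
rows = [[i] for i in range(10)]
labels = [0] * 8 + [1] * 2

balanced_rows, balanced_labels = random_undersample(rows, labels)
print(Counter(balanced_labels))
```

Undersampling discards majority-class rows, whereas SMOTE and its edited-nearest-neighbor variant synthesize or clean samples instead; libraries such as imbalanced-learn provide all three as ready-made samplers.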

Pilot Study on Enhanced Detection of Cues over Malicious Sites Using Data Balancing on the Random Forest Ensemble
Okpor, Margaret Dumebi; Aghware, Fidelis Obukohwo; Akazue, Maureen Ifeanyi; Eboka, Andrew Okonji; Ako, Rita Erhovwo; Ojugo, Arnold Adimabua; Odiakaose, Christopher Chukwufunaya; Binitie, Amaka Patience; Geteloma, Victor Ochuko; Ejeh, Patrick Ogholuwarami
Journal of Future Artificial Intelligence and Technologies Vol. 1 No. 2 (2024): September 2024
Publisher : Future Techno Science

DOI: 10.62411/faith.2024-14

Abstract

The frontiers of the digital revolution have rippled across society, with various web content shared online as users seek to promote monetization and asset exchange, and with clients constantly seeking improved alternatives at lower cost to meet their value demands. From item upgrades to replacements, businesses adopt retention strategies to curb customer attrition. The advent of smartphones has brought mobility, ease of access, and portability, which in turn have continued to drive their adoption while exposing user devices to vulnerabilities, as they are quite susceptible to phishing. With some users classified as more susceptible than others due to online presence and personality traits, studies have sought to reveal the lures/cues exploited by adversaries to enhance phishing success, and to classify web content as genuine or malicious. Our study explores the tree-based Random Forest to identify phishing cues via sentiment analysis on phishing website datasets scraped from user accounts on social network sites. The dataset is scraped via Python Google Scrapper and divided into train/test subsets to classify content as genuine or malicious, with data balancing and feature selection techniques. With Random Forest as the machine learning model of choice, the results show that the ensemble yields a prediction accuracy of 97% with an F1-score of 98.19%, correctly classifying 2089 instances and misclassifying 85 instances of the test dataset.
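
Both abstracts report accuracy, precision, recall, and F1. As a reminder of how these follow from confusion-matrix counts, here is a small sketch using the standard definitions. The helper name `classification_metrics` and the counts are hypothetical; the abstracts do not give a per-class breakdown, so these are not the papers' figures.

```python
def classification_metrics(tp, fp, fn, tn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard against division by zero when a class is never predicted or absent.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts, chosen only to illustrate the arithmetic.
metrics = classification_metrics(tp=90, fp=10, fn=5, tn=95)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Because F1 is the harmonic mean of precision and recall, it always lies between the two, which is a quick sanity check when reading reported results.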