Journal of Applied Data Sciences
Vol 6, No 2: MAY 2025

Unveiling Hybrid Model with Naive Bayes, Deep Learning, Logistic Regression for Predicting Customer Churn and Boost Retention

Subramanian, Devibala
Ajitha, Ajitha
Maidin, Siti Sarah



Article Info

Publish Date
30 Apr 2025

Abstract

The telecommunications sector is evolving rapidly but faces growing customer churn, in which subscribers switch to competing service providers. This study introduces a hybrid model for churn prediction and customer retention that combines machine learning methods (Naive Bayes, Deep Learning, and Logistic Regression) with sentiment analysis of user-generated content (UGC). Data were gathered from two primary sources: survey responses and 352 social media comments from users aged 20–35. The survey data were enriched with features such as gender, age, subscription period, complaints, and retention efforts. Preprocessing included handling missing values, scaling features, and encoding categorical variables to ensure model robustness. In the experiments, Logistic Regression achieved the highest accuracy (88.45%) and sensitivity (91.33%) in detecting potential churners. The PCA-based approach followed closely, with an accuracy of 86.77% and a balanced sensitivity-specificity profile (89.95% and 83.58%, respectively), effectively capturing key churn indicators. Random Forest and Decision Tree classifiers yielded lower sensitivity but remained strong in specificity, indicating their suitability for identifying loyal customers. Attribute weight analysis across models showed that subscription plan, age, and retention effort were consistently influential in churn prediction. Furthermore, sentiment analysis added emotional context to churn behavior, with negative comments triggering alerts for proactive engagement. The study highlights the predictive strength of combining structured survey data and unstructured UGC through machine learning and sentiment analytics, and it underscores the importance of personalized retention strategies based on model interpretability and correlation weight findings. This hybrid approach equips telecom companies with actionable insights to minimize churn and sustain customer loyalty in a competitive market.
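
The accuracy, sensitivity, and specificity figures quoted above follow the standard confusion-matrix definitions, with churners treated as the positive class. The sketch below illustrates those definitions only; the names `ConfusionCounts` and `churn_metrics` are hypothetical, and the counts are made-up illustrative values, not data from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Confusion-matrix counts with 'churner' as the positive class."""
    tp: int  # churners correctly flagged as churners
    fn: int  # churners the model missed
    tn: int  # loyal customers correctly identified as loyal
    fp: int  # loyal customers wrongly flagged as churners


def churn_metrics(c: ConfusionCounts) -> dict:
    """Return accuracy, sensitivity, and specificity as fractions in [0, 1]."""
    positives = c.tp + c.fn
    negatives = c.tn + c.fp
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one churner and one loyal customer")
    return {
        # share of all customers classified correctly
        "accuracy": (c.tp + c.tn) / (positives + negatives),
        # share of actual churners that were detected (recall)
        "sensitivity": c.tp / positives,
        # share of actual loyal customers that were recognised as loyal
        "specificity": c.tn / negatives,
    }


# Illustrative counts only (assumed, not from the paper).
counts = ConfusionCounts(tp=90, fn=10, tn=80, fp=20)
metrics = churn_metrics(counts)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

Read this way, a model with high sensitivity but lower specificity catches most churners at the cost of more false alarms, while the reverse profile is better suited to confirming which customers are loyal.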

Copyrights © 2025






Journal Info

Abbrev

JADS

Subject

Computer Science & IT; Control & Systems Engineering; Decision Sciences, Operations Research & Management
