An Adaptive Random Forest for Data Stream Sentiment Classification under Concept Drift
Arkana, Brian Farrel; Sudianto, Sudianto; Isnaeni, Nenen
Journal of Applied Data Sciences Vol 7, No 2: May 2026
Publisher : Bright Publisher

DOI: 10.47738/jads.v7i2.1153

Abstract

Data labeling plays a crucial role in determining the performance of machine learning models, especially in data stream environments where concept drift frequently occurs. The primary objective of this study is to analyze the effectiveness of adaptive learning models in managing dynamic data distribution changes and to evaluate the influence of different labeling strategies on sentiment classification performance using user reviews from the OVO mobile application. The research contributes to understanding how labeling approaches interact with adaptive modeling under real-time data stream conditions. Two labeling methods were employed: score-based labeling derived from user ratings and content-based labeling generated automatically using the IndoRoBERTa language model. These labeled data streams were evaluated using two classifiers: a conventional Random Forest model and an Adaptive Random Forest model designed to handle evolving data distributions. The evaluation was conducted through streaming experiments that continuously fed new review data to simulate real-world drift scenarios. The results reveal that in the score-based labeling scenario, the conventional Random Forest model’s accuracy gradually declined, reaching a final accuracy of 31%, while the Adaptive Random Forest achieved 80%, a gap of 49 percentage points. In the content-based labeling scenario, both models improved over time, with final accuracies of 57% for Random Forest and 76% for the adaptive model, a difference of 19 percentage points. These findings indicate that Adaptive Random Forest is more robust in adapting to distributional and temporal changes in data streams regardless of the labeling strategy used. This study implies that combining adaptive learning with semantically rich labeling approaches can substantially enhance model reliability in real-time sentiment analysis tasks. Future research may further explore hybrid adaptive mechanisms to improve the resilience of data stream classification models across various domains.
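
The streaming evaluation described in the abstract can be illustrated with a minimal test-then-train (prequential) loop. This is only a sketch under stated assumptions, not the authors' experimental setup: the synthetic stream, the drift point, and the two toy learners (`StaticModel`, `WindowedModel`) are hypothetical stand-ins for the OVO review data, Random Forest, and Adaptive Random Forest, chosen so the example runs with the Python standard library alone.

```python
from collections import Counter, deque


def make_drifting_stream(n=1000, drift_at=500):
    """Yield (x, y) pairs; the rule mapping x to y inverts at `drift_at`.

    Hypothetical toy data, not the review stream used in the study.
    """
    for i in range(n):
        x = i % 2                            # toy binary feature
        y = x if i < drift_at else 1 - x     # concept drift: mapping flips
        yield x, y


class StaticModel:
    """Stand-in for a conventional model: learns during warm-up, then freezes."""

    def __init__(self, warmup=100):
        self.warmup = warmup
        self.seen = 0
        self.counts = {}                     # feature value -> Counter of labels

    def predict(self, x):
        labels = self.counts.get(x)
        return labels.most_common(1)[0][0] if labels else 0

    def learn(self, x, y):
        if self.seen < self.warmup:
            self.counts.setdefault(x, Counter())[y] += 1
        self.seen += 1


class WindowedModel:
    """Stand-in for an adaptive model: remembers only the latest examples."""

    def __init__(self, window=50):
        self.recent = deque(maxlen=window)   # old examples fall out automatically

    def predict(self, x):
        labels = Counter(y for xi, y in self.recent if xi == x)
        return labels.most_common(1)[0][0] if labels else 0

    def learn(self, x, y):
        self.recent.append((x, y))


def prequential_accuracy(model, stream):
    """Test-then-train: score each example before the model learns from it."""
    correct = total = 0
    for x, y in stream:
        correct += int(model.predict(x) == y)   # test first
        model.learn(x, y)                       # then train
        total += 1
    return correct / total


static_acc = prequential_accuracy(StaticModel(), make_drifting_stream())
adaptive_acc = prequential_accuracy(WindowedModel(), make_drifting_stream())
print(f"static:   {static_acc:.3f}")
print(f"adaptive: {adaptive_acc:.3f}")
```

In this toy setting the frozen learner keeps applying the pre-drift rule and loses accuracy after the flip, while the windowed learner recovers once its memory fills with post-drift examples. The numbers it prints say nothing about the accuracies reported in the abstract; the sketch only shows the shape of the evaluation loop.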