Articles

Found 2 Documents
Journal : Indonesian Journal of Electrical Engineering and Computer Science

Experimental of information gain and AdaBoost feature for machine learning classifier in media social data
Jasmir, Jasmir; Abidin, Dodo Zaenal; Fachruddin, Fachruddin; Riyadi, Willy
Indonesian Journal of Electrical Engineering and Computer Science Vol 36, No 2: November 2024
Publisher : Institute of Advanced Engineering and Science

DOI: 10.11591/ijeecs.v36.i2.pp1172-1181

Abstract

In this research, we use several machine learning methods together with feature selection to process social media data, namely restaurant reviews. The feature selection used is a combination of information gain (IG) and adaptive boosting (AdaBoost), and the aim of this research is to examine its effect on the classification performance of machine learning methods such as naïve Bayes (NB), k-nearest neighbor (KNN), and random forest (RF). NB is simple, efficient, and highly sensitive to feature selection. KNN, meanwhile, is known for weaknesses such as sensitivity to the choice of k, high computational cost, memory requirements, and irrelevant attributes. RF also has weaknesses, including that its evaluation scores can change significantly with only small changes in the data. In text classification, feature selection can improve scalability, efficiency, and accuracy. Based on tests carried out on the machine learning methods combined with the two feature-selection techniques, the best classifier was found to be the RF algorithm. RF shows a significant improvement after applying the IG and AdaBoost features: accuracy increases by 10%, precision by 12.43%, recall by 8.14%, and F1-score by 10.37%. RF also produces balanced accuracy, precision, recall, and F1-score values after using IG and AdaBoost, with an accuracy of 84.5%, precision of 85.58%, recall of 86.36%, and F1-score of 85.97%.
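The abstract describes a two-stage feature-selection pipeline (IG followed by AdaBoost) feeding three classifiers. Below is a minimal sketch of that idea using scikit-learn, assuming a TF-IDF representation of the review text and binary 0/1 labels; the load_reviews() loader, the max_features/k values, and the use of AdaBoost feature importances as the second filter are illustrative assumptions, not the authors' exact setup.

```python
# Sketch: information gain (mutual information) selection, then AdaBoost-based
# pruning, then evaluation of NB, KNN, and RF on the reduced feature set.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

texts, labels = load_reviews()  # hypothetical loader: review strings and 0/1 sentiment labels
X = TfidfVectorizer(max_features=5000).fit_transform(texts)
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.2, random_state=42)

# Stage 1: information gain (mutual information) keeps the top-k informative terms.
ig = SelectKBest(mutual_info_classif, k=1000).fit(X_train, y_train)
X_train_ig, X_test_ig = ig.transform(X_train), ig.transform(X_test)

# Stage 2: AdaBoost feature importances prune the IG-selected terms further.
ada = AdaBoostClassifier(n_estimators=100, random_state=42).fit(X_train_ig, y_train)
keep = np.flatnonzero(ada.feature_importances_ > 0)
X_train_sel, X_test_sel = X_train_ig[:, keep], X_test_ig[:, keep]

# Stage 3: evaluate the three classifiers on the reduced feature set.
for name, clf in [("NB", MultinomialNB()),
                  ("KNN", KNeighborsClassifier(n_neighbors=5)),
                  ("RF", RandomForestClassifier(n_estimators=200, random_state=42))]:
    y_pred = clf.fit(X_train_sel, y_train).predict(X_test_sel)
    print(name,
          accuracy_score(y_test, y_pred),
          precision_score(y_test, y_pred),
          recall_score(y_test, y_pred),
          f1_score(y_test, y_pred))
```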
Comparison of robust machine learning algorithms on outliers and imbalanced spam data
Abidin, Dodo Zaenal; Jasmir, Jasmir; Rasywir, Errisya; Siswanto, Agus
Indonesian Journal of Electrical Engineering and Computer Science Vol 39, No 2: August 2025
Publisher : Institute of Advanced Engineering and Science

DOI: 10.11591/ijeecs.v39.i2.pp1130-1144

Abstract

Effective spam detection is essential for data security, user experience, and organizational trust. However, outliers and class imbalance can degrade machine learning models for spam classification. Previous studies focused on feature selection and ensemble learning but have not explicitly examined their combined effects. This study evaluates the performance of random forest (RF), gradient boosting (GB), and extreme gradient boosting (XGBoost) under four experimental scenarios: (i) without synthetic minority over-sampling technique (SMOTE) and without outliers, (ii) without SMOTE but with outliers, (iii) with SMOTE and without outliers, and (iv) with SMOTE and with outliers. Results show that XGBoost achieves the highest accuracy (96%), an area under the receiver operating characteristic curve (AUC-ROC) of 0.9928, and the fastest computation time (0.6184 seconds) under the SMOTE and outlier-free scenario. RF attained an AUC-ROC of 0.9920, while GB achieved 0.9876 but required more processing time. These findings emphasize the need to address class imbalance and outliers in spam detection models. This study contributes to developing more robust spam-filtering techniques and provides a benchmark for future improvements. By systematically evaluating these factors, it lays a foundation for designing more effective spam detection frameworks adaptable to real-world imbalanced and noisy data.
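The abstract defines a 2x2 experimental grid (SMOTE on/off, outliers kept/removed) over three ensemble classifiers. The sketch below shows one way to run such a grid with scikit-learn, imbalanced-learn, and xgboost; the IQR-based outlier filter, the load_spam() loader, and all hyperparameters are illustrative assumptions rather than the study's exact protocol.

```python
# Sketch: evaluate RF, GB, and XGBoost across the four SMOTE/outlier scenarios,
# reporting accuracy, AUC-ROC, and training time for each combination.
import time
import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, roc_auc_score
from xgboost import XGBClassifier

def remove_outliers(X, y, k=1.5):
    """Drop rows with any feature outside the IQR fences (illustrative outlier rule)."""
    q1, q3 = np.percentile(X, 25, axis=0), np.percentile(X, 75, axis=0)
    iqr = q3 - q1
    mask = np.all((X >= q1 - k * iqr) & (X <= q3 + k * iqr), axis=1)
    return X[mask], y[mask]

X, y = load_spam()  # hypothetical loader: numeric feature matrix and 0/1 spam labels
models = {"RF": RandomForestClassifier(n_estimators=200, random_state=42),
          "GB": GradientBoostingClassifier(random_state=42),
          "XGBoost": XGBClassifier(eval_metric="logloss", random_state=42)}

for use_smote in (False, True):
    for drop_outliers in (False, True):
        Xs, ys = remove_outliers(X, y) if drop_outliers else (X, y)
        X_tr, X_te, y_tr, y_te = train_test_split(
            Xs, ys, test_size=0.2, stratify=ys, random_state=42)
        if use_smote:  # oversample only the training split to avoid leakage
            X_tr, y_tr = SMOTE(random_state=42).fit_resample(X_tr, y_tr)
        for name, clf in models.items():
            start = time.time()
            clf.fit(X_tr, y_tr)
            y_pred = clf.predict(X_te)
            proba = clf.predict_proba(X_te)[:, 1]
            print(f"SMOTE={use_smote} outliers_removed={drop_outliers} {name}: "
                  f"acc={accuracy_score(y_te, y_pred):.3f} "
                  f"auc={roc_auc_score(y_te, proba):.4f} "
                  f"time={time.time() - start:.3f}s")
```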