Articles

Extreme Gradient Boosting Algorithm to Improve Machine Learning Model Performance on Multiclass Imbalanced Dataset
Pristyanto, Yoga; Mukarabiman, Zulfikar; Nugraha, Anggit Ferdita
JOIV : International Journal on Informatics Visualization Vol 7, No 3 (2023)
Publisher : Society of Visual Informatics

DOI: 10.30630/joiv.7.3.1102

Abstract

Imbalanced datasets are a common real-world problem in machine learning. Class imbalance is a condition where the number of samples in the minority classes is much smaller than in the majority class, or is insufficient. Machine learning models tend to recognize patterns in the majority class more than in the minority class. This problem is one of the most critical challenges in machine learning research, so several methods have been developed to overcome it. However, most of these methods focus only on binary datasets, and few address multiclass datasets. Handling multiclass imbalance is more complex than handling binary imbalance because more classes are involved. These problems call for an algorithm whose features can be adjusted to the difficulties that arise in multiclass imbalanced datasets. One such algorithm is the ensemble algorithm Extreme Gradient Boosting. In this research, the proposed method with Extreme Gradient Boosting showed better results than the other classification and ensemble algorithms on eight datasets across five evaluation metrics: balanced accuracy, geometric mean, multiclass area under the curve, true positive rate, and true negative rate. For future research, we suggest combining data-level methods with Extreme Gradient Boosting. With this performance increase, Extreme Gradient Boosting can serve as a solution and reference for handling multiclass imbalance problems. We also recommend testing with datasets containing both categorical and continuous data.
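As an illustration of two of the metrics named in the abstract, the sketch below computes balanced accuracy and the geometric mean from per-class recall. It assumes the common definitions (arithmetic and geometric mean of per-class recall); the paper's exact formulas are not given here, and the function names and toy labels are hypothetical.

```python
import math


def per_class_recall(y_true, y_pred):
    """Return {class: recall} for every class that appears in y_true."""
    totals, hits = {}, {}
    for truth, pred in zip(y_true, y_pred):
        totals[truth] = totals.get(truth, 0) + 1
        if truth == pred:
            hits[truth] = hits.get(truth, 0) + 1
    return {cls: hits.get(cls, 0) / count for cls, count in totals.items()}


def balanced_accuracy(y_true, y_pred):
    """Arithmetic mean of per-class recall (assumed definition)."""
    recalls = per_class_recall(y_true, y_pred)
    return sum(recalls.values()) / len(recalls)


def geometric_mean(y_true, y_pred):
    """Geometric mean of per-class recall (assumed definition)."""
    recalls = per_class_recall(y_true, y_pred)
    return math.prod(recalls.values()) ** (1 / len(recalls))


# Toy three-class example: class "a" dominates, "b" and "c" are minorities.
y_true = ["a"] * 8 + ["b"] * 2 + ["c"] * 2
y_pred = ["a"] * 8 + ["b", "a"] + ["c", "c"]

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(plain_accuracy)                     # dominated by the majority class
print(balanced_accuracy(y_true, y_pred))  # every class weighted equally
print(geometric_mean(y_true, y_pred))     # penalizes a weak class more strongly
```

In this toy case one of the two "b" samples is misclassified, so both class-aware metrics fall below plain accuracy, which is the behavior that makes them useful on imbalanced data.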
Performance Evaluation of Machine Learning Models for Soil Fertility Classification Based on the Indian Soil Fertility Dataset
Yoga Pristyanto; Ibrahim Aji Fajar Romadhon; Nugraha, Anggit Ferdita; Nurmasani, Atik; Wulandari, Irma Rofni
Edu Komputika Journal Vol. 12 No. 1 (2025): Edu Komputika Journal
Publisher : Universitas Negeri Semarang

DOI: 10.15294/edukom.v12i1.10317

Abstract

Rice farming productivity worldwide has been declining due to improper soil management practices, including excessive chemical fertilizer use and irregular irrigation. The main challenge lies in accurately classifying soil fertility levels to support optimal land use and reduce resource waste, especially when dealing with imbalanced datasets. This study aims to compare the performance of single classifiers and ensemble classifiers in classifying soil fertility. The single classifiers used include K-Nearest Neighbor (KNN), Naive Bayes, Decision Tree, Support Vector Machine (SVM), and Artificial Neural Network (ANN), while the ensemble classifiers include Random Forest and XGBoost. The Indian Soil Fertility Dataset, obtained from Kaggle, contains 880 samples with 12 features and 1 output class. The research methodology involved data acquisition, preprocessing, data splitting, standardization, and classification, with performance evaluation conducted using a confusion matrix. The results show that ensemble classifiers, particularly Random Forest and XGBoost, outperform single classifiers in imbalanced datasets, achieving accuracy, precision, recall, and F1-score values exceeding 92%-95% across all split scenarios. The findings conclude that Random Forest and XGBoost can serve as reliable models for assisting farmers and agricultural experts in evaluating soil conditions, minimizing unnecessary fertilizer usage, and improving rice farming productivity globally.
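Sentiment Analysis of Twitter Users Toward Internet Provider Services Using the Support Vector Machine Algorithm (original title: Analisis Sentimen Pengguna Twitter Terhadap Layanan Internet Provider Menggunakan Algoritma Support Vector Machine)
Fadhilah Dwi Ananda; Yoga Pristyanto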
MATRIK : Jurnal Manajemen, Teknik Informatika dan Rekayasa Komputer Vol. 20 No. 2 (2021)
Publisher : Universitas Bumigora

DOI: 10.30812/matrik.v20i2.1130

Abstract

Social media is now a communication medium that Indonesians frequently use to express opinions. One of the platforms the public often uses is Twitter. Twitter is a social medium that provides a great deal of information through tweets, and the information written there contains data that can be processed. This study uses text mining techniques, applying the Support Vector Machine (SVM) algorithm to classify the sentiment of Twitter users toward the Biznet internet service. The kernels used are the linear kernel and the RBF kernel. Testing was carried out in 3 scenarios: scenario 1 used 800 data items, scenario 2 used 900, and scenario 3 used 1000, each split into 90% training data and 10% testing data. Based on the tests with the linear and RBF kernels, the following conclusions can be drawn. The SVM algorithm yields relatively similar evaluation performance in terms of accuracy, precision, and recall with either the linear or the RBF kernel. It can therefore be said that SVM with either kernel can be used effectively to determine the sentiment of Biznet internet users. In addition, across the 3 test scenarios with different amounts of data, SVM performed consistently with both the RBF and the linear kernel.
The Effect of Class Imbalance Handling on Datasets Toward Classification Algorithm Performance
Cherfly Kaope; Yoga Pristyanto
MATRIK : Jurnal Manajemen, Teknik Informatika dan Rekayasa Komputer Vol. 22 No. 2 (2023)
Publisher : Universitas Bumigora

DOI: 10.30812/matrik.v22i2.2515

Abstract

Class imbalance is a condition where the amount of data in the minority class is smaller than that in the majority class. Its impact is misclassification of the minority class, which can degrade classification performance. Various approaches have been taken to deal with class imbalance, such as the data-level approach, the algorithm-level approach, and cost-sensitive learning. At the data level, one common method is sampling. In this study, the ADASYN, SMOTE, and SMOTE-ENN sampling methods were used to handle class imbalance, combined with the AdaBoost, K-Nearest Neighbor, and Random Forest classification algorithms. The purpose of this study was to determine the effect of class imbalance handling on classification performance. The tests were carried out on five datasets, and the integration of ADASYN with Random Forest gave better results than the other model schemes. The evaluation criteria include accuracy, precision, true positive rate, true negative rate, and g-mean score. The integration of ADASYN and Random Forest performed 5% to 10% better than the other models.
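To make the data-level sampling idea concrete, the sketch below shows the core interpolation step behind SMOTE-style oversampling: a synthetic minority sample is placed on the line segment between a minority point and one of its nearest minority neighbors. This is a simplified illustration, not the study's implementation and not the full SMOTE, ADASYN, or SMOTE-ENN procedures; the function name, parameters, and toy points are hypothetical.

```python
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (simplified)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic


# Toy minority class with four 2-D samples (hypothetical values).
minority_points = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_sketch(minority_points, n_new=6)
print(len(new_points), new_points[0])
```

Because each synthetic point is a convex combination of two real minority samples, it always stays inside the region already occupied by the minority class.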
Investigating The Effectiveness of Various Convolutional Neural Network Model Architectures for Skin Cancer Melanoma Classification
Rizky Hafizh Jatmiko; Yoga Pristyanto
MATRIK : Jurnal Manajemen, Teknik Informatika dan Rekayasa Komputer Vol. 23 No. 1 (2023)
Publisher : Universitas Bumigora

DOI: 10.30812/matrik.v23i1.3185

Abstract

Melanoma is one of the most dangerous types of skin cancer. Since 2018, the number of skin cancer cases in the US has increased and exceeded 100,000. Melanoma is the third most common cancer in Indonesia, following womb cancer and breast cancer. Standard detection of melanoma by biopsy is costly and time-consuming. The purpose of this research is to build and compare melanoma skin cancer detection models using various Convolutional Neural Network (CNN) architectures. This research used four CNN architectures: VGG-16, LeNet, Xception, and MobileNet. The dataset consists of 9605 images divided into benign and malignant classes. The data are augmented to increase their quantity, then used to train the four CNN architectures, which are evaluated using a confusion matrix. The Xception model achieved the best accuracy and the lowest loss, with 93% accuracy and 19% loss, along with 93% precision, 93.5% recall, and a 93% F1-score. Of the other models, VGG-16 reached 90% accuracy with 27% loss, LeNet 89.7% accuracy with 28% loss, and MobileNet 90.8% accuracy with 22.5% loss.
Aceh Province Tourism Destination Recommendation System using Content based Filtering Method
Maulana, Ariefhan; Rohman, Arif Nur; Pristyanto, Yoga
Sistemasi: Jurnal Sistem Informasi Vol 15, No 2 (2026): Sistemasi: Jurnal Sistem Informasi
Publisher : Program Studi Sistem Informasi Fakultas Teknik dan Ilmu Komputer

DOI: 10.32520/stmsi.v15i2.5500

Abstract

Tourists often experience difficulties in finding tourist destinations in Aceh Province that match their content preferences and are geographically close to their location. This study aims to develop a tourism destination recommendation system in Aceh Province using a Content-Based Filtering approach with the Cosine Similarity algorithm and the Haversine Formula. The dataset consists of 119 tourist destinations, including attributes such as destination name, destination description, and geographical coordinates (latitude and longitude). The research process began with text data preprocessing, which included case folding, punctuation removal, tokenization, duplicate word removal, stopword removal, and stemming. Next, the similarity between destinations was calculated using the Cosine Similarity algorithm based on tourism content descriptions, while the Haversine Formula was applied to measure the geographical distance between the user’s location and the tourist destinations. The results indicate that the developed system is able to provide relevant tourism destination recommendations by simultaneously considering content relevance and geographical proximity. Therefore, the system can assist tourists in selecting destinations that best match their preferences.
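The two measures named in the abstract can be sketched as follows: cosine similarity between two preprocessed descriptions and the Haversine great-circle distance between two coordinates. The sketch assumes raw term-count vectors (the abstract does not state the term weighting) and a mean Earth radius of 6371 km; the function names, sample tokens, and approximate coordinates are hypothetical. How the system combines the two scores is not described, so no combination is shown.

```python
import math
from collections import Counter


def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity between two token lists using raw term counts."""
    a, b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    h = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * radius_km * math.asin(math.sqrt(min(1.0, h)))


# Hypothetical preprocessed descriptions (already tokenized and stemmed).
beach = ["beach", "white", "sand", "snorkeling"]
island = ["island", "beach", "snorkeling", "coral"]
museum = ["museum", "history", "tsunami"]

print(cosine_similarity(beach, island))  # overlapping content
print(cosine_similarity(beach, museum))  # no shared terms

# Approximate, illustrative coordinates for a user and a destination.
print(haversine_km(5.55, 95.32, 5.89, 95.32))
```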
Hybrid LexRank-LDA-MMR for Indonesian Text Summarization
Muis, Nasrul Amin; Pristyanto, Yoga; Fajri, Ika Nur
Jurnal Nasional Teknologi dan Sistem Informasi Vol 12 No 1 (2026): April 2026
Publisher : Departemen Sistem Informasi, Fakultas Teknologi Informasi, Universitas Andalas

DOI: 10.25077/TEKNOSI.v12i1.2026.97-104

Abstract

The rapid growth of digital text makes clear the need for automated tools that summarize text for rapid retrieval. This study aimed to improve the quality of Indonesian text summaries beyond standard LexRank by combining three extractive methods: LexRank, Latent Dirichlet Allocation (LDA), and Maximal Marginal Relevance (MMR). LexRank selects meaningful sentences based on their centrality in the sentence graph, while LDA ensures that the sentences are topically relevant. MMR balances relevance and diversity, which reduces redundancy in the summaries. Summaries from two publicly available Bahasa Indonesia datasets, IndoSum and Liputan6, were generated at 30% and 50% compression levels and scored with ROUGE (ROUGE-1, ROUGE-2, and ROUGE-L F1 scores). Analysis of 5000 articles per dataset showed that combining LexRank and LDA with MMR yielded a higher average ROUGE score than standard LexRank at both compression levels and on both datasets, demonstrating the effectiveness of the approach. The improvements are most significant in ROUGE-1 and ROUGE-2, which indicates that the combined approach can produce more informative and relevant summaries while preserving sentence-level diversity, which aids understanding of the information presented in the summary.
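The redundancy-reducing role of MMR can be sketched with the standard greedy selection rule: at each step, pick the sentence that maximizes a weighted difference between its relevance and its highest similarity to any sentence already selected. The sketch takes relevance scores as given inputs; how the paper derives or combines them from LexRank and LDA is not stated here, and the function name, the weight `lam`, and the toy scores are hypothetical.

```python
def mmr_select(relevance, similarity, k, lam=0.7):
    """Greedy Maximal Marginal Relevance selection.

    relevance:  list of relevance scores, one per sentence
    similarity: square matrix of pairwise sentence similarities
    k:          number of sentences to select
    lam:        trade-off weight (1.0 = relevance only)
    """
    selected = []
    candidates = list(range(len(relevance)))
    while candidates and len(selected) < k:

        def score(i):
            # Redundancy is the highest similarity to anything already chosen.
            redundancy = max((similarity[i][j] for j in selected), default=0.0)
            return lam * relevance[i] - (1 - lam) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy example: sentences 0 and 1 are near-duplicates, sentence 2 is distinct.
relevance = [0.9, 0.85, 0.4]
similarity = [
    [1.0, 0.95, 0.1],
    [0.95, 1.0, 0.1],
    [0.1, 0.1, 1.0],
]

print(mmr_select(relevance, similarity, k=2, lam=0.5))  # [0, 2]
print(mmr_select(relevance, similarity, k=2, lam=1.0))  # [0, 1]
```

With `lam=0.5` the second pick skips the near-duplicate sentence in favor of the less relevant but more diverse one; with `lam=1.0` the rule reduces to ranking by relevance alone.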
A Hybrid Intersection Filtering and Recursive Feature Elimination Technique for Efficient Feature Reduction in High Dimensional Datasets
Dahlan, Akhmad; Pristyanto, Yoga; Nugraha, Anggit Ferdita; Aziza, Rifda Faticha Alfa; Purwanto, Ibnu Hadi
Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) Vol 10 No 2 (2026): April 2026
Publisher : Ikatan Ahli Informatika Indonesia (IAII)

DOI: 10.29207/resti.v10i2.7396

Abstract

High-dimensional datasets are commonly encountered in real-world machine learning applications and often degrade classification performance due to redundant and irrelevant features. In addition, the presence of excessive features increases computational complexity and processing time. Feature selection is therefore a crucial preprocessing step to improve model accuracy and efficiency. This study proposes a hybrid feature selection approach called Intersection Filtering based on Recursive Feature Elimination with Cross-Validation (IF-RFECV), which integrates wrapper-based and filter-based strategies to obtain a stable and optimal subset of features. The proposed method first applies Recursive Feature Elimination with Cross-Validation (RFECV) using multiple classification models to rank and select relevant features. Subsequently, an intersection filtering mechanism is employed to identify features that are consistently selected across different RFECV-based models, thereby reducing model-dependent bias and improving feature robustness. The effectiveness of IF-RFECV is evaluated using four benchmark datasets with varying dimensionality obtained from the KEEL and UCI repositories. Several classification algorithms, including Gradient Boosting, K-Nearest Neighbor, Naïve Bayes, Decision Tree, Random Forest, and Support Vector Machine, are used to assess model performance. Experimental results demonstrate that IF-RFECV produces a more compact feature subset compared to conventional RFECV while achieving superior performance in terms of accuracy, precision, recall, and F1-score on most datasets, particularly those with higher dimensionality. Although IF-RFECV requires slightly higher computational time due to its two-stage process, the performance gains and improved generalization justify this trade-off. These findings indicate that IF-RFECV is an effective and robust feature selection technique for high-dimensional classification problems.
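The intersection filtering stage described above can be sketched as a set intersection over the feature subsets that each model's RFECV run selected. This is a minimal illustration of that one step, assuming the per-model selections are already available; the function name, model labels, and feature names are hypothetical, and the RFECV runs themselves are not shown.

```python
def intersection_filter(selections):
    """Keep only the features selected by every model's RFECV run.

    selections: dict mapping a model label to the list of features
                that its (separately run) RFECV step retained.
    """
    feature_sets = [set(features) for features in selections.values()]
    if not feature_sets:
        return []
    return sorted(set.intersection(*feature_sets))


# Hypothetical RFECV outputs for three classifiers.
rfecv_selections = {
    "random_forest": ["f1", "f2", "f3", "f5"],
    "decision_tree": ["f1", "f3", "f4"],
    "svm": ["f1", "f3", "f5", "f6"],
}

print(intersection_filter(rfecv_selections))  # ['f1', 'f3']
```

Because a feature survives only if every model retained it, the result can never be larger than the smallest individual subset, which is consistent with the more compact subsets reported in the abstract.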
Stock Price Prediction Using SVR: A Feature Engineering and Hyperparameter Tuning Approach
Alfian Ramadhan; Yoga Pristyanto; Anggit Dwi Hartanto; Donni Prabowo
Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) Vol 10 No 3 (2026): Juni 2026 (in progress)
Publisher : Ikatan Ahli Informatika Indonesia (IAII)

DOI: 10.29207/resti.v10i3.7249

Abstract

Stock price prediction in Indonesia's volatile mining sector poses significant forecasting challenges driven by commodity price dynamics and structural market shifts. This study proposes a systematic prediction framework for PT Indo Tambangraya Megah Tbk (ITMG.JK) integrating technical and market-derived non-technical feature engineering, LightGBM-based feature selection, multilevel TimeSeriesSplit cross-validation, and hyperparameter optimization. Support Vector Regression (SVR) is benchmarked against LightGBM, XGBoost, and Random Forest under 5-fold, 10-fold, and 15-fold schemes. SVR achieves the best performance at 10-fold, with RMSE of 0.0121, MAE of 0.0090, MAPE of 1.1457%, and R² of 0.9249. Generalization experiments across four additional stocks in banking, automotive, and mining sectors confirm SVR's robustness, maintaining R² above 0.89 and MAPE below 2.65% in all cases while tree-based models produce negative R² on certain datasets. Statistical validation via Wilcoxon signed-rank test (p < 0.05) and Cohen's d (|d| > 0.8) confirms the significance of SVR's advantage. These findings indicate that SVR consistently outperforms the evaluated models under the proposed experimental framework.
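The four error measures reported above can be sketched with their standard definitions, which also shows how R² becomes negative when predictions are worse than simply predicting the mean of the targets. The function name and toy values are hypothetical and unrelated to the study's data; MAPE as written assumes no true value is zero.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return RMSE, MAE, MAPE (in percent), and R^2 for two equal-length lists."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    return {
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
        "mape": 100 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n,
        "r2": 1 - ss_res / ss_tot,
    }


# Toy targets with one close prediction set and one badly wrong set.
y_true = [1.0, 2.0, 3.0, 4.0]
close_pred = [1.1, 1.9, 3.2, 3.8]
wrong_pred = [4.0, 3.0, 2.0, 1.0]

print(regression_metrics(y_true, close_pred))
print(regression_metrics(y_true, wrong_pred)["r2"])  # negative: worse than the mean
```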
Co-Authors Acihmah Sidauruk Aditya Yoga Pratama Afrig Aminuddin Aisha Shakila Iedwan Akhmad Dahlan Alfian Ramadhan Alvin Rahman Al Musyaffa Andi Sunyoto Anggi Thoat Ariyanto Anggit Dwi Hartanto Anggit Dwi Hartanto Anggit Dwi Hartanto Anggit Dwi Hartanto, Anggit Dwi Anggita, Sharazita Dyah Anna Baita Arif Nur Rohman arif nur rohman Asti Astuti, Ika Atik Nurmasani ATIK NURMASANI Atik Nurmasani Barus, Herianta Bety Wulan Sari Bety Wulan Sari, Bety Wulan Bligania Bligania Cherfly Kaope Donni Prabowo Donni Prabowo, Donni Dwi Hartanto, Anggit Dyah Anggita, Sharazita Eli Pujastuti, Eli Eza Nanda Fadhilah Dwi Ananda Fajri, Ika Nur Fauzy, Marwan Noor Gagah Gumelar Gita Cahyani Heri Sismoro Hidayat, Kardilah Rohmat Ibnu Hadi Purwanto Ibrahim Aji Fajar Romadhon Iedwan, Aisha Shakila Ike Verawati Ikmah Ikmah Irfan Pratama Istikomah Khoiruddin, Lukman Kono, Maria Fatima Kristianti, Fanny Novatriana Lucky Adhikrisna Wirasakti Mambaul Hisam Marcheilla Trecya Anindita Maulana, Ariefhan Mauliza, Nia Muis, Nasrul Amin Mukarabiman, Zulfikar Mulia Sulistiyono Nia Mauliza Nia Mauliza Nugraha, Anggit Ferdita Nuri Cahyono Nurindah A Amari Nurwijayanti Purwati, Sintia Eka Putra, Frahma Aditya Rahman Saputra, Rahman Rifda Faticha Alfa Aziza Rizky Hafizh Jatmiko Rohmad Fajarudin Rohman, Arif Nur Romadhon, Ibrahim Aji Fajar Rospita, Andri Sabella, Cindy Dinda Sifa’ul Husna, Siti Okta Sumarni Adi Windarni, Vikky Aprelia Wirantanu, Dipa Wirasakti, Lucky Adhikrisna Wiwi Widayani Wulandari, Irma Rofni Yanuar Nur Kholik Yudiyanto, Muhammad Resa Arif Yuli Astuti Zein, Aditya Ahmad