Articles

Extreme Gradient Boosting Algorithm to Improve Machine Learning Model Performance on Multiclass Imbalanced Dataset
Pristyanto, Yoga; Mukarabiman, Zulfikar; Nugraha, Anggit Ferdita
JOIV : International Journal on Informatics Visualization Vol 7, No 3 (2023)
Publisher : Society of Visual Informatics

DOI: 10.30630/joiv.7.3.1102

Abstract

Imbalanced datasets are a common real-world problem in machine learning. Class imbalance is a condition in which the number of samples in the minority classes is much smaller than in the majority class, or is simply insufficient. Machine learning models tend to recognize patterns in the majority class better than those in the minority class. This problem is one of the most critical challenges in machine learning research, so several methods have been developed to overcome it. However, most of these methods focus only on binary datasets, and few address multiclass datasets. Handling multiclass imbalance is more complex than handling binary imbalance because more classes are involved. Addressing these problems requires an algorithm whose features can be adjusted to the difficulties that arise in imbalanced multiclass datasets. One algorithm with such features is the ensemble algorithm Extreme Gradient Boosting (XGBoost). In this research, our proposed method with Extreme Gradient Boosting showed better results than the other classification and ensemble algorithms on eight datasets across five evaluation metrics: balanced accuracy, geometric mean, multiclass area under the curve, true positive rate, and true negative rate. For future research, we suggest combining data-level methods with Extreme Gradient Boosting. With this performance increase, Extreme Gradient Boosting can serve as a solution and a reference for handling multiclass imbalance problems. We also recommend testing with datasets that contain both categorical and continuous data.
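As an illustration of the metrics named in this abstract, the sketch below computes per-class true positive rate, true negative rate, balanced accuracy, and geometric mean on a toy three-class problem. It assumes a one-vs-rest view of each class and defines the geometric mean as the geometric mean of per-class recalls; the abstract does not say which multiclass definitions the authors used, and the labels are invented for the example.

```python
from math import prod


def per_class_rates(y_true, y_pred):
    """Return {label: (tpr, tnr)} treating each class one-vs-rest."""
    pairs = list(zip(y_true, y_pred))
    rates = {}
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        tn = sum(t != c and p != c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        tpr = tp / (tp + fn) if tp + fn else 0.0
        tnr = tn / (tn + fp) if tn + fp else 0.0
        rates[c] = (tpr, tnr)
    return rates


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    rates = per_class_rates(y_true, y_pred)
    classes = sorted(set(y_true))
    return sum(rates[c][0] for c in classes) / len(classes)


def g_mean(y_true, y_pred):
    """Geometric mean of per-class recall (assumed multiclass definition)."""
    rates = per_class_rates(y_true, y_pred)
    recalls = [rates[c][0] for c in sorted(set(y_true))]
    return prod(recalls) ** (1 / len(recalls))


# Toy imbalanced labels (invented): 8 samples of class 0, 2 each of classes 1 and 2.
y_true = [0] * 8 + [1, 1] + [2, 2]
y_pred = [0] * 8 + [1, 0] + [2, 2]

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print("accuracy:", round(plain_accuracy, 4))
print("balanced accuracy:", round(balanced_accuracy(y_true, y_pred), 4))
print("g-mean:", round(g_mean(y_true, y_pred), 4))
```

In this toy case a single error in a two-sample minority class barely moves plain accuracy but lowers both balanced accuracy and the geometric mean, which is why such metrics are preferred on imbalanced data.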
Performance Evaluation of Machine Learning Models for Soil Fertility Classification Based on the Indian Soil Fertility Dataset
Yoga Pristyanto; Ibrahim Aji Fajar Romadhon; Nugraha, Anggit Ferdita; Nurmasani, Atik; Wulandari, Irma Rofni
Edu Komputika Journal Vol. 12 No. 1 (2025): Edu Komputika Journal
Publisher : Universitas Negeri Semarang

DOI: 10.15294/edukom.v12i1.10317

Abstract

Rice farming productivity worldwide has been declining due to improper soil management practices, including excessive chemical fertilizer use and irregular irrigation. The main challenge lies in accurately classifying soil fertility levels to support optimal land use and reduce resource waste, especially when dealing with imbalanced datasets. This study aims to compare the performance of single classifiers and ensemble classifiers in classifying soil fertility. The single classifiers used include K-Nearest Neighbor (KNN), Naive Bayes, Decision Tree, Support Vector Machine (SVM), and Artificial Neural Network (ANN), while the ensemble classifiers include Random Forest and XGBoost. The Indian Soil Fertility Dataset, obtained from Kaggle, contains 880 samples with 12 features and 1 output class. The research methodology involved data acquisition, preprocessing, data splitting, standardization, and classification, with performance evaluation conducted using a confusion matrix. The results show that the ensemble classifiers, Random Forest and XGBoost, outperform single classifiers on imbalanced datasets, achieving accuracy, precision, recall, and F1-score values exceeding 92%-95% across all split scenarios. The study concludes that Random Forest and XGBoost can serve as reliable models for assisting farmers and agricultural experts in evaluating soil conditions, minimizing unnecessary fertilizer usage, and improving rice farming productivity globally.
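The splitting and standardization steps mentioned in this abstract can be sketched as follows. This is a minimal illustration, assuming z-score standardization with statistics fitted on the training split only; the abstract does not specify the exact procedure, and the function names and feature values are hypothetical.

```python
from statistics import mean, pstdev


def fit_standardizer(rows):
    """Compute per-column mean and population standard deviation."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    # Replace a zero deviation with 1.0 so constant columns do not divide by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return means, stds


def standardize(rows, means, stds):
    """Apply z-score scaling using previously fitted statistics."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


# Hypothetical feature rows; the second column is constant on purpose.
train = [[1.0, 10.0], [2.0, 10.0], [3.0, 10.0]]
test = [[4.0, 10.0]]

means, stds = fit_standardizer(train)      # fit on the training split only
z_train = standardize(train, means, stds)
z_test = standardize(test, means, stds)    # reuse training statistics on test data
print(z_train)
print(z_test)
```

Fitting the statistics on the training split and reusing them for the test split keeps information about the test data out of preprocessing.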
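Sentiment Analysis of Twitter Users Toward Internet Provider Services Using the Support Vector Machine Algorithm (original title: Analisis Sentimen Pengguna Twitter Terhadap Layanan Internet Provider Menggunakan Algoritma Support Vector Machine)
Fadhilah Dwi Ananda; Yoga Pristyanto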
MATRIK : Jurnal Manajemen, Teknik Informatika dan Rekayasa Komputer Vol. 20 No. 2 (2021)
Publisher : Universitas Bumigora

DOI: 10.30812/matrik.v20i2.1130

Abstract

Social media is currently a communication medium that Indonesians frequently use to express opinions. One of the platforms people use most often is Twitter. Twitter is a social medium that conveys a great deal of information through tweets, and the information written there contains data that can be processed. This study uses text mining techniques, applying the Support Vector Machine (SVM) algorithm to classify the sentiment of Twitter users toward the Biznet internet service. The kernels used are the linear kernel and the RBF kernel. Testing was carried out in three scenarios: scenario 1 used 800 data items, scenario 2 used 900, and scenario 3 used 1000, with each scenario split into 90% training data and 10% testing data. Based on the tests with the linear and RBF kernels, the following conclusions can be drawn. SVM with either the linear kernel or the RBF kernel yields relatively similar evaluation results in terms of accuracy, precision, and recall. It can therefore be said that SVM with either kernel can be used effectively to determine the sentiment of Biznet internet users. In addition, across the three test scenarios with different amounts of data, SVM with either kernel performs consistently.
The Effect of Class Imbalance Handling on Datasets Toward Classification Algorithm Performance
Cherfly Kaope; Yoga Pristyanto
MATRIK : Jurnal Manajemen, Teknik Informatika dan Rekayasa Komputer Vol. 22 No. 2 (2023)
Publisher : Universitas Bumigora

DOI: 10.30812/matrik.v22i2.2515

Abstract

Class imbalance is a condition in which the amount of data in the minority class is smaller than in the majority class. Its main impact is the misclassification of minority-class samples, which degrades classification performance. Various approaches have been taken to deal with class imbalance, such as the data-level approach, the algorithm-level approach, and cost-sensitive learning. At the data level, one of the methods used is sampling. In this study, the ADASYN, SMOTE, and SMOTE-ENN sampling methods were used to handle class imbalance, combined with the AdaBoost, K-Nearest Neighbor, and Random Forest classification algorithms. The purpose of this study was to determine the effect of class imbalance handling on classification performance. The tests were carried out on five datasets. The evaluation criteria were accuracy, precision, true positive rate, true negative rate, and g-mean score. Based on the classification results, the combination of ADASYN and Random Forest gave better results than the other model schemes, performing 5% to 10% better.
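The data-level sampling idea can be sketched as a simplified SMOTE-style oversampler: each synthetic point is placed on the line segment between a minority sample and one of its nearest minority neighbours. This is an illustration only, not the implementation used in the study; ADASYN additionally weights samples by how hard they are to learn, SMOTE-ENN adds a cleaning step, and the function name, parameters, and data here are hypothetical.

```python
import random


def smote_like(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic minority points by neighbour interpolation."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        others = minority[:i] + minority[i + 1:]
        # k nearest minority neighbours of x by squared Euclidean distance
        neighbours = sorted(
            others,
            key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)),
        )[:k]
        nb = rng.choice(neighbours)
        u = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(a + u * (b - a) for a, b in zip(x, nb)))
    return synthetic


# Hypothetical two-feature minority class with three samples.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1)]
new_points = smote_like(minority, n_new=4)
print(new_points)
```

Because every synthetic point is a convex combination of two real minority samples, the new points stay inside the region the minority class already occupies.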
Investigating The Effectiveness of Various Convolutional Neural Network Model Architectures for Skin Cancer Melanoma Classification
Rizky Hafizh Jatmiko; Yoga Pristyanto
MATRIK : Jurnal Manajemen, Teknik Informatika dan Rekayasa Komputer Vol. 23 No. 1 (2023)
Publisher : Universitas Bumigora

DOI: 10.30812/matrik.v23i1.3185

Abstract

Melanoma is one of the most dangerous types of skin cancer. Since 2018, the number of skin cancer cases in the US has increased and exceeded 100,000. Melanoma is the third most common cancer in Indonesia, following womb cancer and breast cancer. Standard detection of melanoma by biopsy is costly and time-consuming. The purpose of this research is to build and compare melanoma skin cancer detection models using various Convolutional Neural Network (CNN) architectures. This research used four CNN architectures: VGG-16, LeNet, Xception, and MobileNet. The dataset consists of 9605 images divided into benign and malignant classes. The data are augmented to increase their quantity, then used to train the four CNN architectures, which are evaluated using the confusion matrix. The Xception model achieved the best accuracy and the lowest loss, with 93% accuracy, 19% loss, 93% precision, 93.5% recall, and 93% F1-score. Among the other models, VGG-16 gave 90% accuracy and 27% loss, LeNet 89.7% accuracy and 28% loss, and MobileNet 90.8% accuracy and 22.5% loss.
Aceh Province Tourism Destination Recommendation System using Content based Filtering Method
Maulana, Ariefhan; Rohman, Arif Nur; Pristyanto, Yoga
SISTEMASI Vol 15, No 2 (2026): Sistemasi: Jurnal Sistem Informasi
Publisher : Program Studi Sistem Informasi Fakultas Teknik dan Ilmu Komputer

DOI: 10.32520/stmsi.v15i2.5500

Abstract

Tourists often experience difficulties in finding tourist destinations in Aceh Province that match their content preferences and are geographically close to their location. This study aims to develop a tourism destination recommendation system in Aceh Province using a Content-Based Filtering approach with the Cosine Similarity algorithm and the Haversine Formula. The dataset consists of 119 tourist destinations, including attributes such as destination name, destination description, and geographical coordinates (latitude and longitude). The research process began with text data preprocessing, which included case folding, punctuation removal, tokenization, duplicate word removal, stopword removal, and stemming. Next, the similarity between destinations was calculated using the Cosine Similarity algorithm based on tourism content descriptions, while the Haversine Formula was applied to measure the geographical distance between the user’s location and the tourist destinations. The results indicate that the developed system is able to provide relevant tourism destination recommendations by simultaneously considering content relevance and geographical proximity. Therefore, the system can assist tourists in selecting destinations that best match their preferences.
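The two measures named in this abstract can be sketched as follows: cosine similarity over simple word-count vectors for content relevance, and the Haversine formula for great-circle distance. This is a minimal illustration under stated assumptions. The abstract does not say how descriptions are vectorized or how the two scores are combined, so the sketch only reports them side by side; the destinations, coordinates, and query are hypothetical.

```python
from collections import Counter
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a common approximation


def cosine_similarity(text_a, text_b):
    """Cosine similarity between word-count vectors of two texts."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    h = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


# Hypothetical destinations: (name, description, latitude, longitude).
destinations = [
    ("Beach A", "white sand beach snorkeling sunset", 5.60, 95.30),
    ("Lake B", "highland lake fishing cool air", 4.60, 96.90),
]
query = "beach snorkeling"          # hypothetical user preference
user_lat, user_lon = 5.55, 95.32    # hypothetical user location

for name, description, lat, lon in destinations:
    similarity = cosine_similarity(query, description)
    distance = haversine_km(user_lat, user_lon, lat, lon)
    print(name, round(similarity, 3), round(distance, 1), "km")
```

A recommender built on these two numbers could, for example, filter by distance first and then rank by similarity; that choice is a design decision the abstract leaves open.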
Co-Authors Acihmah Sidauruk Aditya Yoga Pratama Afrig Aminuddin Aisha Shakila Iedwan Akhmad Dahlan Alvin Rahman Al Musyaffa Andi Sunyoto Anggi Thoat Ariyanto Anggit Dwi Hartanto Anggit Dwi Hartanto Anggit Dwi Hartanto, Anggit Dwi Anggita, Sharazita Dyah Anna Baita arif nur rohman Arif Nur Rohman Asti Astuti, Ika Atik Nurmasani Atik Nurmasani ATIK NURMASANI Barus, Herianta Bety Wulan Sari Bety Wulan Sari, Bety Wulan Bligania Bligania Cherfly Kaope Donni Prabowo, Donni Dwi Hartanto, Anggit Dyah Anggita, Sharazita Eli Pujastuti, Eli Eza Nanda Fadhilah Dwi Ananda Fajri, Ika Nur Fauzy, Marwan Noor Gagah Gumelar Gita Cahyani Hendra Kurniawan Heri Sismoro Hidayat, Kardilah Rohmat Ibnu Hadi Purwanto Ibrahim Aji Fajar Romadhon Iedwan, Aisha Shakila Ike Verawati Ikmah Ikmah Irfan Pratama Istikomah Khoiruddin, Lukman Kono, Maria Fatima Kristianti, Fanny Novatriana Lucky Adhikrisna Wirasakti Mambaul Hisam Marcheilla Trecya Anindita Maulana, Ariefhan Mauliza, Nia Mukarabiman, Zulfikar Mulia Sulistiyono Nia Mauliza Nia Mauliza Nugraha, Anggit Ferdita Nuri Cahyono Nurindah A Amari Purwati, Sintia Eka Putra, Frahma Aditya Rahman Saputra, Rahman Rifda Faticha Alfa Aziza Rizky Hafizh Jatmiko Rohmad Fajarudin Rohman, Arif Nur Romadhon, Ibrahim Aji Fajar Rospita, Andri Sabella, Cindy Dinda Sifa’ul Husna, Siti Okta Sumarni Adi Windarni, Vikky Aprelia Wirantanu, Dipa Wirasakti, Lucky Adhikrisna Wiwi Widayani Wulandari, Irma Rofni Yanuar Nur Kholik Yudiyanto, Muhammad Resa Arif Yuli Astuti Zein, Aditya Ahmad