Using genetic algorithm feature selection to optimize XGBoost performance in Australian credit
Pertiwi, Dwika Ananda Agustina; Ahmad, Kamilah; Salahudin, Shahrul Nizam; Annegrat, Ahmed Mohamed; Muslim, Much Aziz
Journal of Soft Computing Exploration Vol. 5 No. 1 (2024): March 2024
Publisher : SHM Publisher

DOI: 10.52465/joscex.v5i1.302

Abstract

To reduce credit risk, credit institutions need to implement credit risk management practices so that they can survive in the long term. Data mining is one of the techniques used for credit risk management: it can find patterns in large datasets using classification techniques, whose quality is measured by the resulting accuracy. This research aims to increase the accuracy of classification algorithms in predicting credit risk by applying a genetic algorithm as the feature selection method, so that only the most important features are used to predict credit risk. The research applies the XGBoost classifier to the Australian credit dataset and evaluates it by measuring accuracy and AUC. The results show an increase in accuracy of 2.24%, to 89.93%, after optimization with the genetic algorithm. Genetic algorithm feature selection can therefore improve the accuracy of the XGBoost algorithm on the Australian credit dataset.

A new CNN model integrated in onion and garlic sorting robot to improve classification accuracy
Lestari, Apri Dwi; Khan, Atta Ullah; Pertiwi, Dwika Ananda Agustina; Muslim, Much Aziz
Journal of Soft Computing Exploration Vol. 5 No. 1 (2024): March 2024
Publisher : SHM Publisher

DOI: 10.52465/joscex.v5i1.304

Abstract

The vegetable market accounts for a large share of profit in the agricultural industry, which creates a need to classify types of vegetables quickly and accurately. Some vegetables have similar shapes, such as onions and garlic, which can lead to misidentification. Through computer vision and machine learning, vegetables, especially onions, can be classified based on shape, size, and color. To classify shallot and garlic images, a CNN model was developed with 4 convolutional layers, each using a 2x2 kernel, and a total of 914,242 trainable parameters. The convolutional layers use the ReLU activation function and the output layer uses softmax. Model accuracy on the training data is 0.9833 with a loss value of 0.762.

Comparison of gridsearchcv and bayesian hyperparameter optimization in random forest algorithm for diabetes prediction
Muzayanah, Rini; Pertiwi, Dwika Ananda Agustina; Ali, Muazam; Muslim, Much Aziz
Journal of Soft Computing Exploration Vol. 5 No. 1 (2024): March 2024
Publisher : SHM Publisher

DOI: 10.52465/joscex.v5i1.308

Abstract

Diabetes Mellitus (DM) is a chronic disease whose complications have a significant impact on patients and the wider community. In its early stages, diabetes mellitus usually does not cause significant symptoms, but if it is detected too late and not handled properly, it can cause serious health problems. Early diabetes detection is one way to address this problem. In this research, diabetes detection was carried out using Random Forest with GridSearchCV and Bayesian hyperparameter optimization. The research proceeded through a literature study, model development in a Kaggle Notebook, model testing, and analysis of results. The study aims to compare GridSearchCV and Bayesian hyperparameter optimization and to analyze the advantages and disadvantages of each when applied to diabetes prediction with the Random Forest algorithm. The results show that each method has its own strengths and weaknesses. GridSearchCV achieves the higher accuracy, 0.74, but takes longer, at 338,416 seconds. Bayesian hyperparameter optimization reaches an accuracy of 0.73, which is 0.01 lower than GridSearchCV, but takes less time, at 177,085 seconds.

Extreme Gradient Boosting Model with SMOTE for Heart Disease Classification
Dullah, Ahmad Ubai; Darmawan, Aditya Yoga; Pertiwi, Dwika Ananda Agustina; Unjung, Jumanto
JISKA (Jurnal Informatika Sunan Kalijaga) Vol. 10 No. 1 (2025): January 2025
Publisher : UIN Sunan Kalijaga Yogyakarta

DOI: 10.14421/jiska.2025.10.1.48-62

Abstract

Heart disease is one of the leading causes of death worldwide. According to data from the World Health Organisation (WHO), 17.5 million people die from heart disease every year. However, current methods of diagnosing heart disease are still not optimal for determining the proper treatment. As technology has developed, various machine learning models and data processing techniques have been proposed in search of models that classify heart disease with the best precision. This research aims to create a machine learning model for categorizing heart disease, thereby enhancing the effectiveness of diagnosis and facilitating the determination of appropriate treatment for patients. It also aims to overcome the limited accuracy of existing diagnosis methods by identifying models that give the best results in processing and analyzing health data, particularly for heart disease classification. In this study, the XGBoost model was identified as the best-performing model, with an accuracy of 99%. These results indicate that the XGBoost model achieves a higher accuracy than previous methods, making it a promising approach for improving future heart disease diagnosis and classification.

Operational Supply Chain Risk Management on Apparel Industry Based on Supply Chain Operation Reference (SCOR)
Pertiwi, Dwika Ananda Agustina; Yusuf, Muhammad; Efrilianda, Devi Ajeng
Journal of Information System Exploration and Research Vol. 1 No. 1 (2023): January 2023
Publisher : shmpublisher

DOI: 10.52465/joiser.v1i1.103

Abstract

Uncertainty requires proper handling to avoid its adverse effects, which are called risk. Risk that arises in the supply chain process is called supply chain risk. The purpose of this research is to identify the level of risk that may occur and disrupt supply chain activities, and to determine priority risk sources based on the Supply Chain Operation Reference (SCOR). The object of this research is the apparel industry, specifically a company engaged in fashion and apparel production. The study uses a qualitative and quantitative approach, and the instrument is assessed using the Aggregate Risk Potential (ARP) calculation in phase 1 of the House of Risk method. The results showed that there were 39 correlations between risk events and risk agents, with 22 correlations on a high scale, 15 correlations on a medium scale, and 1 correlation on a low scale.

Sentiment analysis spotify applications on google play store with naïve bayes and neural network methods
Syahra, Syahra Audiyani Fitra; Pertiwi, Dwika Ananda Agustina
Journal of Student Research Exploration Vol. 3 No. 2 (2025): July 2025
Publisher : SHM Publisher

DOI: 10.52465/josre.v3i2.416

Abstract

Digital advancements have significantly changed the way music is accessed and enjoyed, with streaming platforms such as Spotify emerging as one of the most widely used applications worldwide. Along with this growth, user reviews on platforms like the Google Play Store have become an important source of information, offering insights into user satisfaction and areas for improvement. In this study, sentiment analysis was conducted on Spotify reviews using two classification methods, Naïve Bayes and Neural Networks. The reviews were collected, processed, and then analyzed with both approaches to evaluate their performance. The results show that Neural Networks outperformed in terms of accuracy, F1-score, and recall, while Naïve Bayes performed better in AUC, precision, and MCC. Analysis of the dataset also revealed that negative reviews dominated at 52.8%, followed by positive at 28.3%, and neutral at 19%. These findings highlight the value of sentiment analysis in understanding user perspectives and can support developers in improving application quality and user experience.
Increasing package delivery efficiency through the application of the prim algorithm to find the shortest route on the expedition route
Lestari, Apri Dwi; Pertiwi, Dwika Ananda Agustina; Muslim, Much Aziz
Journal of Student Research Exploration Vol. 1 No. 1: January 2023
Publisher : SHM Publisher

DOI: 10.52465/josre.v1i1.105

Abstract

One of the changes is in how people shop. Previously, people shopped in physical stores, but since the emergence of online shopping platforms, people have started to use marketplaces for buying and selling. These platforms rely on expedition services to send ordered goods from sellers to buyers. This creates a new problem: how efficiently courier services can deliver packages so that goods reach buyers as quickly as possible. This paper adapts graph modeling to the problem of finding the shortest and fastest path. The algorithm used is Prim's Algorithm, which determines the minimum spanning tree of a connected weighted graph. The test results show that the algorithm is suitable for increasing package delivery efficiency by determining the shortest path based on the minimum spanning tree concept. Using a sample of travel routes on the island of Java, the best route obtained has a total distance of 1,771 kilometers and connects cities from Jakarta to Banyuwangi.

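As an illustration of the technique named in this abstract, the following is a minimal sketch of Prim's algorithm using a binary heap. The graph, node names, and weights are hypothetical and are not the Java route data from the paper. Note that a minimum spanning tree minimizes the total weight of the edges connecting all nodes, which is not in general the same as the shortest path between two particular nodes.

```python
import heapq


def prim_mst(graph, start):
    """Return (total_weight, edges) of a minimum spanning tree.

    graph maps each node to a list of (neighbor, weight) pairs and is
    assumed to be undirected and connected.
    """
    visited = {start}
    edges = []
    total = 0
    # Candidate edges leaving the tree, ordered by weight.
    heap = [(w, start, v) for v, w in graph[start]]
    heapq.heapify(heap)
    while heap and len(visited) < len(graph):
        w, u, v = heapq.heappop(heap)
        if v in visited:
            continue  # both endpoints already in the tree
        visited.add(v)
        edges.append((u, v, w))
        total += w
        for nxt, nw in graph[v]:
            if nxt not in visited:
                heapq.heappush(heap, (nw, v, nxt))
    return total, edges


# Hypothetical example graph (weights are made-up distances).
example_graph = {
    "A": [("B", 4), ("C", 1)],
    "B": [("A", 4), ("C", 2), ("D", 5)],
    "C": [("A", 1), ("B", 2), ("D", 8)],
    "D": [("B", 5), ("C", 8)],
}

total, edges = prim_mst(example_graph, "A")
print(total, edges)
```

Starting from any node of a connected graph yields a tree with the same total weight, so the choice of `start` only affects the order in which edges are reported.
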
A Performance Comparison of Data Balancing Model to Improve Credit Risk Prediction in P2P Lending
Pertiwi, Dwika Ananda Agustina; Ahmad, Kamilah; Unjung, Jumanto; Muslim, Much Aziz
Scientific Journal of Informatics Vol. 11 No. 4: November 2024
Publisher : Universitas Negeri Semarang

DOI: 10.15294/sji.v11i4.14018

Abstract

Purpose: Imbalanced datasets often degrade the performance of classification models, including models for credit risk prediction in P2P lending. To overcome this problem, several data balancing models have been applied in the existing literature. However, existing research evaluates only the performance of the classification model. In addition to measuring classifier performance, this study therefore assesses the contribution of the data balancing models themselves, including Random Oversampling (ROS), Random Undersampling (RUS), and Synthetic Minority Oversampling Technique (SMOTE). Methods: This research uses the Lending Club dataset, which has an imbalance ratio (IR) of 4.098, two classifiers (LightGBM and XGBoost), and 10-fold cross-validation to assess the performance of ROS, RUS, and SMOTE. The models are evaluated using accuracy, recall, precision, and F1-score. Result: SMOTE performs best as a data balancing model in P2P lending, with an accuracy of 92.56% for the LightGBM+SMOTE model and 92.32% for the XGBoost+SMOTE model, both better than the other models. Novelty: This research concludes that SMOTE performs best as a data balancing model for improving credit risk prediction in P2P lending. In addition, in this case, the larger the data size used as the training sample, the better the performance of the classification model in predicting credit risk in P2P lending.

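To make the balancing terms in this abstract concrete, here is a minimal sketch of the imbalance ratio together with random oversampling and random undersampling, in plain Python. It assumes the common definition of IR as the majority-class count divided by the minority-class count; the function names and toy data are hypothetical, and this is not the authors' pipeline. SMOTE is not shown, since it synthesizes new minority samples by interpolating between neighbors rather than duplicating existing rows.

```python
import random
from collections import Counter


def imbalance_ratio(labels):
    """Majority-class count divided by minority-class count (assumed definition)."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes up to the majority size."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def random_undersample(rows, labels, seed=0):
    """Keep a random subset of each class, sized to the minority class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = min(counts.values())
    out_rows, out_labels = [], []
    for cls in counts:
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for r in rng.sample(pool, target):
            out_rows.append(r)
            out_labels.append(cls)
    return out_rows, out_labels


# Toy data: 8 majority rows (label 0) and 2 minority rows (label 1).
toy_rows = [[i] for i in range(10)]
toy_labels = [0] * 8 + [1] * 2
print(imbalance_ratio(toy_labels))
```

In practice, resampling is normally applied only to the training folds inside cross-validation, so that duplicated rows do not leak into the evaluation data.
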
Analyzing the Impact of Effort Expectancy and Cognitive Attitudes on The Willingness to Accept ChatGPT
Saputra, Andri; Noraini, Oktafiyani Aisah; Pertiwi, Dwika Ananda Agustina
Journal of Information System Exploration and Research Vol. 3 No. 2 (2025): July 2025
Publisher : shmpublisher

DOI: 10.52465/joiser.v3i2.599

Abstract

This study aims to analyze the impact of Effort Expectancy (EE), adapted from the Unified Theory of Acceptance and Use of Technology (UTAUT), and Cognitive Attitude (CA), from the Theory of Reasoned Action (TRA) model, on Willingness to Accept (WA), adapted from TAM, in the context of ChatGPT. Understanding the relationship between these factors can help identify effective strategies to increase user acceptance of ChatGPT technology. The research method is quantitative, using multiple linear regression in SPSS. The study collected responses from 50 respondents on 10 variables, of which 3 are the main variables. The results show that Effort Expectancy has no significant effect on Willingness to Accept, while Cognitive Attitude has a significant effect on Willingness to Accept. This suggests that users' perceptions of how easy or difficult it is to use ChatGPT do not influence their decision to accept and use the technology; users may feel that ease of use is not a major factor in their acceptance of ChatGPT. By contrast, users' cognitive attitudes, including their beliefs, perceptions, and understanding of the technology, play an important role in their decision to accept and use ChatGPT.

Comparative Study of Imbalanced Data Oversampling Techniques for Peer-to-Peer Landing Loan Prediction
Muzayanah, Rini; Lestari, Apri Dwi; Jumanto, Jumanto; Prasetiyo, Budi; Pertiwi, Dwika Ananda Agustina; Muslim, Much Aziz
Scientific Journal of Informatics Vol 11, No 1 (2024): February 2024
Publisher : Universitas Negeri Semarang

DOI: 10.15294/sji.v11i1.50274

Abstract

Purpose: Data imbalances that often occur in the classification of loan data on Peer-to-Peer Lending platforms can cause algorithm performance to be less than optimal, reducing the resulting accuracy. To overcome this problem, appropriate resampling techniques are needed so that the classification algorithm can work optimally and provide results with optimal accuracy. This research aims to find the right resampling technique to overcome the problem of data imbalance in lending data on peer-to-peer lending platforms. Methods: This study uses the XGBoost classification algorithm to evaluate and compare the resampling techniques used. The resampling techniques compared in this research include SMOTE, ADASYN, Border Line, and Random Oversampling. Results: The highest training accuracy was achieved by XGBoost combined with the Border Line resampling technique, with a training accuracy of 0.99988, and by XGBoost combined with the SMOTE resampling technique. In testing, the highest accuracy score was achieved by XGBoost combined with the SMOTE resampling technique. Novelty: It is hoped that this research identifies the most suitable resampling technique to combine with the XGBoost classification algorithm to overcome the problem of imbalanced lending data on peer-to-peer lending platforms, so that the classification algorithm can work optimally and produce optimal accuracy.