Articles

Comparasion of Error Rate Prediction Methods of C4.5 Algorithm for Balanced Data Ichlas Djuazva; Dodi Vionanda; Nonong Amalita; Zilrahmi
UNP Journal of Statistics and Data Science Vol. 1 No. 4 (2023): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol1-iss4/74

Abstract

C4.5 is a highly effective decision tree algorithm for classification. Compared to CHAID, CART, and ID3, C4.5 generates decision trees faster, and its trees are easier to understand. However, the C4.5 algorithm is not exempt from classification errors, which can affect the accuracy of the resulting model. Model accuracy can be measured by predicting the error rate. One commonly used method for error rate prediction is cross-validation, which divides the data into two parts: a training set to build the model and a testing set to test it. Several cross-validation techniques are commonly used to predict the error rate, such as Leave One Out Cross Validation (LOOCV), Hold Out (HO), and k-fold cross-validation. LOOCV gives a nearly unbiased estimate but takes a long time, and its cost depends on the data size; HO can avoid overfitting and works faster; and k-fold cross-validation gives a smaller predicted error rate. This study uses artificially generated normally distributed data, including univariate, bivariate, and multivariate datasets with various combinations of mean differences and correlations. Different correlation structures are applied to see how they affect the error rate prediction methods. Considering these factors, this research compares the three cross-validation methods for predicting the error rate of the decision tree model generated by the C4.5 algorithm. This research found that k-fold cross-validation is the most suitable cross-validation method for testing the model generated by the C4.5 algorithm with balanced data.
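The three error-rate schemes compared above can be sketched as follows. This is a minimal illustration, not the authors' code: the classifier is a hypothetical one-split decision stump whose threshold is chosen by information gain (a stand-in for a full C4.5 tree), the data are simulated from two univariate normal classes, and the function names, the 10 folds, and the 30% hold-out fraction are assumptions.

```python
import math
import random


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def majority(labels, default):
    """Most frequent label, or `default` when the list is empty."""
    return max(set(labels), key=labels.count) if labels else default


def fit_stump(xs, ys):
    """Hypothetical stand-in for C4.5: one split on one numeric feature,
    with the threshold chosen by information gain (not a full C4.5 tree)."""
    pairs = sorted(zip(xs, ys))
    n, base = len(pairs), entropy(ys)
    best_gain, best_t = -1.0, pairs[0][0]
    for k in range(1, n):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # no threshold fits between tied values
        t = (pairs[k - 1][0] + pairs[k][0]) / 2
        left = [y for _, y in pairs[:k]]
        right = [y for _, y in pairs[k:]]
        gain = base - len(left) / n * entropy(left) - len(right) / n * entropy(right)
        if gain > best_gain:
            best_gain, best_t = gain, t
    left = [y for x, y in pairs if x <= best_t]
    right = [y for x, y in pairs if x > best_t]
    return best_t, majority(left, pairs[0][1]), majority(right, pairs[-1][1])


def predict(model, x):
    t, left_class, right_class = model
    return left_class if x <= t else right_class


def error_rate(train, test, xs, ys):
    """Fit on the `train` indices and return the misclassification rate on `test`."""
    model = fit_stump([xs[i] for i in train], [ys[i] for i in train])
    return sum(predict(model, xs[i]) != ys[i] for i in test) / len(test)


def holdout_error(xs, ys, test_frac=0.3, seed=0):
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    n_test = int(len(idx) * test_frac)
    return error_rate(idx[n_test:], idx[:n_test], xs, ys)


def kfold_error(xs, ys, k=10, seed=0):
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    # Each fold is the test set once; the error rate is averaged over the k folds.
    errors = [error_rate([i for g in range(k) if g != f for i in folds[g]], folds[f], xs, ys)
              for f in range(k)]
    return sum(errors) / k


def loocv_error(xs, ys):
    # LOOCV is k-fold with k = n: every observation is the test set exactly once.
    n = len(xs)
    return sum(error_rate([j for j in range(n) if j != i], [i], xs, ys) for i in range(n)) / n


# Simulated balanced data: two univariate normal classes with mean difference 2.
rng = random.Random(42)
xs = [rng.gauss(0, 1) for _ in range(50)] + [rng.gauss(2, 1) for _ in range(50)]
ys = [0] * 50 + [1] * 50
for name, err in [("LOOCV", loocv_error(xs, ys)),
                  ("Hold-out", holdout_error(xs, ys)),
                  ("10-fold", kfold_error(xs, ys))]:
    print(f"{name}: {err:.3f}")
```

LOOCV refits the model n times, which is why its cost grows with the data size, whereas hold-out fits it only once.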
Analysis of the Poverty Level Model for West Sumatra Province Using Geographically Weighted Binary Logistic Regression april leniati; Dony Permana; Nonong Amalita; Zamahsary Martha
UNP Journal of Statistics and Data Science Vol. 1 No. 4 (2023): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol1-iss4/80

Abstract

West Sumatra Province (West Sumatra) had the third-lowest poverty rate on the island of Sumatra in 2022, at 5.92%. Although this figure is lower than the national average, the Province of West Sumatra aims to reduce the poverty rate to 5.62% by 2024, in line with the vision of the 2021–2026 Regional Development Plan. The purpose of this study is to analyze the factors that contribute to the poverty rate in West Sumatra Province in 2022, taking geographic location into account. The method used to model poverty is Geographically Weighted Binary Logistic Regression (GWBLR), which incorporates geographical influences into the analysis. This study uses data on the percentage of poor people (Y) and the influencing factors, namely life expectancy (X1), literacy rate (X2), labor force participation (X3), and economic growth (X4). Based on the lowest corrected Akaike Information Criterion (AICc) value, the GWBLR model with a fixed Gaussian kernel weight is the best at modeling poverty in West Sumatra in 2022. According to this model, life expectancy has a significant effect on the poverty level in 13 districts and cities in West Sumatra Province in 2022.
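The core of GWBLR is a separate weighted logistic regression at every location, with weights from a kernel on the distances between locations. The sketch below assumes the commonly used fixed Gaussian kernel form w_ij = exp(-0.5 (d_ij / b)^2) and fits one local model by Newton-Raphson. The toy coordinates, covariate, bandwidth, and helper names are hypothetical, the outcome must be coded 0/1, and bandwidth selection by AICc is not shown.

```python
import math


def gaussian_weights(coords, i, bandwidth):
    """Fixed Gaussian kernel weights of all locations relative to location i:
    w_ij = exp(-0.5 * (d_ij / b) ** 2), with d_ij the Euclidean distance."""
    xi, yi = coords[i]
    return [math.exp(-0.5 * (math.hypot(x - xi, y - yi) / bandwidth) ** 2) for x, y in coords]


def solve(a, rhs):
    """Solve the small dense system a @ beta = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [list(row) + [rhs[r]] for r, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    beta = [0.0] * n
    for r in range(n - 1, -1, -1):
        beta[r] = (m[r][n] - sum(m[r][c] * beta[c] for c in range(r + 1, n))) / m[r][r]
    return beta


def local_logistic_fit(X, y, w, iters=25):
    """Locally weighted binary logistic regression at one location, fitted by Newton-Raphson.

    Maximizes sum_j w_j * [y_j * log(p_j) + (1 - y_j) * log(1 - p_j)].
    Each row of X must start with 1.0 for the intercept; y holds 0/1 outcomes.
    """
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        info = [[0.0] * k for _ in range(k)]  # weighted Fisher information X' W V X
        for xj, yj, wj in zip(X, y, w):
            p = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, xj))))
            for a in range(k):
                grad[a] += wj * (yj - p) * xj[a]
                for c in range(k):
                    info[a][c] += wj * p * (1.0 - p) * xj[a] * xj[c]
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
    return beta


# Hypothetical toy data: coordinates, one covariate, and a 0/1 outcome per region.
coords = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 2), (3, 1)]
x1 = [0.5, 1.2, 0.3, 2.0, 1.5, 0.8]
y = [0, 1, 0, 1, 0, 1]
X = [[1.0, v] for v in x1]
for i in range(len(coords)):
    w = gaussian_weights(coords, i, bandwidth=1.5)
    print(i, [round(b, 3) for b in local_logistic_fit(X, y, w)])
```

Each location gets its own coefficient vector, which is what allows a variable such as life expectancy to be significant in some districts and not in others.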
Comparison of Error Rate Prediction Methods in Classification Modeling with the CHAID Method for Imbalanced Data Seif Adil El-Muslih; Dodi Vionanda; Nonong Amalita; Admi Salma
UNP Journal of Statistics and Data Science Vol. 1 No. 4 (2023): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol1-iss4/81

Abstract

CHAID (Chi-Square Automatic Interaction Detection) is a classification algorithm in the decision tree family. Its classification results are displayed as a tree diagram. After the model is formed, its accuracy must be calculated to assess its performance, which can be done by predicting the model's error rate. Three methods are considered: leave-one-out cross-validation (LOOCV), hold-out, and k-fold cross-validation. These methods divide the data into training and testing sets in different ways, so each method has advantages and disadvantages. Imbalanced data are data in which the classes have different numbers of observations. In the CHAID method, imbalanced data affect the prediction results: as the data become more imbalanced, the prediction results approach the proportion of the minority class. Therefore, the three error rate prediction methods were compared to determine the appropriate method for CHAID on imbalanced data. This is experimental research using simulated data generated in RStudio. The comparison considered several factors: the marginal probability matrix, different correlations, and several observation ratios. The results are compared using boxplots, by looking at the median error rate and the lowest variance. This research finds that k-fold cross-validation is the most suitable error rate prediction method for CHAID on imbalanced data.
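CHAID chooses the predictor to split on by testing its association with the class using a chi-square test. A minimal sketch of that step, assuming Pearson's chi-square statistic on hypothetical contingency tables; the category merging and Bonferroni adjustment that full CHAID performs are omitted, and the p-value helper is an illustrative series implementation rather than a library call.

```python
import math


def pearson_chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c contingency table."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat, (len(table) - 1) * (len(table[0]) - 1)


def chi2_sf(x, df):
    """Upper-tail chi-square p-value, 1 - P(df/2, x/2), using the power series
    for the regularized lower incomplete gamma function P."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 1
    while term > 1e-15 * total:
        term *= z / (a + n)
        total += term
        n += 1
    lower = math.exp(a * math.log(z) - z - math.lgamma(a)) * total
    return min(1.0, max(0.0, 1.0 - lower))


# Hypothetical counts: rows are the categories of a predictor,
# columns are (minority class, majority class).
predictors = {
    "X1": [[10, 20], [30, 40]],
    "X2": [[5, 45], [25, 25]],
}
p_values = {name: chi2_sf(*pearson_chi_square(t)) for name, t in predictors.items()}
best = min(p_values, key=p_values.get)
print(p_values, "-> split on", best)
```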
Comparison of Error Rate Prediction Methods in Binary Logistic Regression Model for Balanced Data Shavira Asysyifa S; Dodi Vionanda; Nonong Amalita; Dina Fitria
UNP Journal of Statistics and Data Science Vol. 1 No. 4 (2023): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol1-iss4/90

Abstract

Binary logistic regression is a statistical method that can be used to examine the relationship between a dependent variable and several independent variables, where the dependent variable has two categories: one denoting a successful event and one denoting a failed event. The performance of binary logistic regression can be seen from the accuracy of the model, which can be measured by predicting the error rate. One method for predicting the error rate is cross-validation, which divides the data into two parts: training data and testing data. Commonly used cross-validation methods include Leave One Out (LOO), hold-out, and k-fold cross-validation. LOO gives a nearly unbiased estimate of accuracy but takes a long time, hold-out can avoid overfitting and works faster because it requires no iterations, and k-fold cross-validation gives a smaller predicted error rate. Data with different correlations are used to find out how correlation affects the performance of the error rate prediction methods. This study uses artificially generated normally distributed data, including univariate, bivariate, and multivariate datasets with various combinations of mean differences and correlations. Considering these factors, this study compares the three cross-validation methods for predicting the error rate in binary logistic regression. This study finds that k-fold cross-validation is the most suitable method for predicting errors in binary logistic regression modeling for balanced data.
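A minimal sketch of k-fold error-rate prediction for binary logistic regression. It assumes a single predictor, a 0.5 probability cut-off, plain gradient ascent instead of the Newton-type fitting used by statistical packages, and simulated normal data; the function names and settings are illustrative.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(xs, ys, lr=0.5, epochs=300):
    """Binary logistic regression with one predictor, fitted by batch gradient
    ascent on the mean log-likelihood."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            r = y - sigmoid(b0 + b1 * x)  # y is 1 for "success", 0 for "failure"
            g0 += r
            g1 += r * x
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1


def kfold_error(xs, ys, k=5, seed=0):
    """k-fold cross-validated misclassification rate (total errors / n)."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    wrong = 0
    for f in range(k):
        test = idx[f::k]
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        b0, b1 = fit_logistic([xs[i] for i in train], [ys[i] for i in train])
        # Predict "success" when the fitted probability is at least 0.5.
        wrong += sum((sigmoid(b0 + b1 * xs[i]) >= 0.5) != (ys[i] == 1) for i in test)
    return wrong / len(xs)


# Simulated balanced data: one normal predictor, mean difference 3 between the classes.
rng = random.Random(7)
xs = [rng.gauss(0, 1) for _ in range(100)] + [rng.gauss(3, 1) for _ in range(100)]
ys = [0] * 100 + [1] * 100
print(f"5-fold error rate: {kfold_error(xs, ys):.3f}")
```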
Classification of Nutrition Problems for Indonesian Toddler With Decision Tree Algorithm C4.5 Nadha Ovella Syaqhasdy; Zamahsary Martha; Nonong Amalita; Dina Fitria
UNP Journal of Statistics and Data Science Vol. 1 No. 5 (2023): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol1-iss5/98

Abstract

Having excellent human resources is essential for Indonesia's development. Development is key to improving the quality of life of Indonesia's citizens, and a focus on development can have a positive impact on the health and economy of the community. A healthy and educated generation is fundamental to the nation's expected progress, and nutritional status is a significant factor affecting the quality of human resources. Nutritional problems can lead to serious consequences, such as abnormal physical growth, lower IQ, and even death. The objective of this research is to analyze the factors that influence the nutritional status of toddlers by building a decision tree classifier on these factors. A decision tree is a flowchart with a branching, tree-like structure. This study uses the C4.5 algorithm, which can process both numeric and categorical data, handle missing attribute values, and generate easily interpretable rules. The resulting decision tree indicated that the attribute "Stunting < 20%" is a determining factor for acute-chronic malnutrition problems in toddlers. There are 392 districts and cities in Indonesia where the prevalence of stunting among toddlers is less than 20%. The model created with the C4.5 algorithm was evaluated using a confusion matrix, yielding an accuracy of 99.8% and a kappa value close to 1. This indicates that the model can accurately classify toddler nutrition problems in Indonesia.
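Two quantities mentioned above can be computed directly: the gain ratio that C4.5 uses to choose a split attribute, and Cohen's kappa from a confusion matrix. The sketch below uses the well-known textbook weather ("play tennis") example and a hypothetical confusion matrix, not the study's data; C4.5's handling of numeric attributes and missing values is omitted.

```python
import math
from collections import Counter


def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(attribute, labels):
    """C4.5 split criterion for a categorical attribute: information gain / split information."""
    n = len(labels)
    groups = {}
    for value, label in zip(attribute, labels):
        groups.setdefault(value, []).append(label)
    gain = entropy(labels) - sum(len(g) / n * entropy(g) for g in groups.values())
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    return gain / split_info if split_info > 0 else 0.0


def cohen_kappa(matrix):
    """Cohen's kappa from a square confusion matrix (rows = actual, columns = predicted)."""
    n = sum(map(sum, matrix))
    p_observed = sum(matrix[i][i] for i in range(len(matrix))) / n
    rows = [sum(r) for r in matrix]
    cols = [sum(c) for c in zip(*matrix)]
    p_expected = sum(r * c for r, c in zip(rows, cols)) / n ** 2
    return (p_observed - p_expected) / (1 - p_expected)


# Textbook "play tennis" example (Outlook attribute), not the toddler nutrition data.
outlook = ["sunny"] * 5 + ["overcast"] * 4 + ["rain"] * 5
play = ["no", "no", "no", "yes", "yes"] + ["yes"] * 4 + ["yes", "yes", "yes", "no", "no"]
print(round(gain_ratio(outlook, play), 3))  # 0.156

# Hypothetical 2x2 confusion matrix.
print(round(cohen_kappa([[20, 5], [10, 15]]), 3))  # 0.4
```

A kappa close to 1, as reported above, means the observed agreement is far above what the class proportions alone would produce by chance.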
Sentiment Analysis of TikTok Application on Twitter using The Naïve Bayes Classifier Algorithm Denia Putri Fajrina; Syafriandi Syafriandi; Nonong Amalita; Admi Salma
UNP Journal of Statistics and Data Science Vol. 1 No. 5 (2023): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol1-iss5/103

Abstract

TikTok is a popular social media platform that has recently gained a lot of attention. People of all ages use the application to share short videos with their friends and followers. The content on TikTok is diverse and can be tailored to individual preferences, but there are concerns that minors can view vulgar content because there are no age restrictions. This has led to public scrutiny of the application on social media platforms such as Twitter. To address this issue, sentiment analysis was conducted on reviews of the TikTok application to help users make informed decisions about its use. The aim of the analysis was to determine whether people's opinions about TikTok were positive or negative. To achieve this goal, the researchers used the hashtag "TikTok Application". The results were classified into two categories, positive and negative, using the Naïve Bayes Classifier method. The analysis used 80% of the data for training and 20% for testing, and the results showed an accuracy of 80.32%, with a recall of 97% and a precision of 78%. In general, positive feedback from Indonesians on the TikTok application relates to invitations to download the application, while negative feedback indicates that the application contains content that is inappropriate for its users. This information can help users make informed decisions about using the TikTok application.
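A minimal multinomial Naïve Bayes sentiment classifier, with the precision and recall used to evaluate it. This assumes lowercase whitespace tokenization, add-one (Laplace) smoothing, and a few hypothetical example tweets; the study's preprocessing and feature choices are not described in the abstract.

```python
import math
from collections import Counter


def train_nb(docs, labels):
    """Multinomial Naive Bayes with Laplace (add-one) smoothing over lowercase whitespace tokens."""
    classes = sorted(set(labels))
    log_prior = {c: math.log(labels.count(c) / len(labels)) for c in classes}
    counts = {c: Counter() for c in classes}
    for doc, c in zip(docs, labels):
        counts[c].update(doc.lower().split())
    vocab = set().union(*counts.values())
    totals = {c: sum(counts[c].values()) for c in classes}
    return log_prior, counts, totals, vocab


def predict_nb(model, doc):
    log_prior, counts, totals, vocab = model
    scores = {}
    for c, score in log_prior.items():
        for token in doc.lower().split():
            if token in vocab:  # words never seen in training are ignored
                score += math.log((counts[c][token] + 1) / (totals[c] + len(vocab)))
        scores[c] = score
    return max(scores, key=scores.get)


def precision_recall(actual, predicted, positive="positive"):
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical toy tweets, not the study's data.
docs = ["great app very fun", "love this app", "fun videos great",
        "bad content", "hate this bad app", "content not suitable for kids"]
labels = ["positive"] * 3 + ["negative"] * 3
model = train_nb(docs, labels)
print(predict_nb(model, "great fun app"))  # positive
print(predict_nb(model, "bad content"))    # negative
```

A high recall with a lower precision, as in the reported results, means the classifier finds almost all positive tweets but also labels some negative tweets as positive.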
Backpropagation Neural Network Application in Predicting The Stock Price of PT Bank Rakyat Indonesia Tbk Dewi Febiyanti; Nonong Amalita; Dony Permana; Tessy Octavia Mukhti
UNP Journal of Statistics and Data Science Vol. 1 No. 5 (2023): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol1-iss5/113

Abstract

Investors often make mistakes in stock transactions even after choosing stocks of good companies. One thing to consider when trading is the movement of stock prices. The stock price of PT Bank Rakyat Indonesia Tbk fluctuates, rising and falling over time. A rise in the stock price benefits investors who sell their stocks. However, because stock prices fluctuate, choosing the wrong time to trade exposes investors to high risk. Therefore, to anticipate this risk, stock prices are predicted using a Backpropagation Neural Network (BPNN). BPNN adapts quickly, can model nonlinear data such as stock prices, and can produce a high level of accuracy. The best BPNN model obtained in this study is BP(5,3,1), with a Mean Absolute Percentage Error (MAPE) of 0.8193%. This small prediction error shows that the model has good network performance and can predict stock prices well.
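The BP(5,3,1) notation can be read as 5 input neurons (five lagged prices), 3 hidden neurons, and 1 output neuron. The sketch below trains such a network by backpropagation and scores it with MAPE. The sigmoid hidden activation, linear output, learning rate, min-max scaling, and the synthetic price series are assumptions, not details from the study.

```python
import math
import random


def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent."""
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def windows(series, lags=5):
    """(previous `lags` values, next value) pairs: 5 lagged prices predict the next price."""
    return [(series[t - lags:t], series[t]) for t in range(lags, len(series))]


class BPNN:
    """5-3-1 network: sigmoid hidden layer, linear output, trained by
    backpropagation with per-sample gradient descent."""

    def __init__(self, n_in=5, n_hidden=3, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        h = [1.0 / (1.0 + math.exp(-(sum(w * v for w, v in zip(row, x)) + b)))
             for row, b in zip(self.w1, self.b1)]
        return h, sum(w * v for w, v in zip(self.w2, h)) + self.b2

    def train(self, pairs, lr=0.1, epochs=300):
        for _ in range(epochs):
            for x, y in pairs:
                h, out = self.forward(x)
                err = out - y  # gradient of 0.5 * (out - y)^2 w.r.t. the output
                for j in range(len(h)):
                    # Hidden delta uses the output weight before it is updated.
                    delta = err * self.w2[j] * h[j] * (1.0 - h[j])
                    self.w2[j] -= lr * err * h[j]
                    self.b1[j] -= lr * delta
                    for i in range(len(x)):
                        self.w1[j][i] -= lr * delta * x[i]
                self.b2 -= lr * err


# Synthetic, hypothetical price series (not actual BBRI data),
# min-max scaled using the training part only.
prices = [4000 + 200 * math.sin(t / 5) + 10 * t for t in range(100)]
split = int(0.8 * len(prices))
lo, hi = min(prices[:split]), max(prices[:split])
pairs = windows([(p - lo) / (hi - lo) for p in prices])
train_pairs, test_pairs = pairs[:split - 5], pairs[split - 5:]

net = BPNN()
net.train(train_pairs)
predicted = [net.forward(x)[1] * (hi - lo) + lo for x, _ in test_pairs]
print(f"test MAPE: {mape(prices[split:], predicted):.4f}%")
```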
Forecasting the Exchange Rate of Yen to Rupiah Using the Long Short-Term Memory Method Anggi Adrian Danis; Yenni Kurniawati; Nonong Amalita; Fadhilah Fitri
UNP Journal of Statistics and Data Science Vol. 1 No. 5 (2023): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol1-iss5/114

Abstract

Long Short-Term Memory (LSTM) is a modification of the Recurrent Neural Network (RNN) that addresses the exploding and vanishing gradient problems and makes it possible to retain long-term information. To tackle these problems, the RNN is extended with memory cells that can store information for long periods. This study aimed to forecast the Yen to Rupiah exchange rate using the LSTM method. The data are daily buying rates from January 2020 to May 2023, consisting of 848 observations. The data were divided into two sets: 80% for training and 20% for testing. For the forecasting process, experiments were conducted to identify the best model by adjusting several hyperparameters, and the performance of each model was evaluated using the Mean Absolute Percentage Error (MAPE). According to the experimental results, the best model was the LSTM model with a batch size of 20, 150 epochs, and 50 neurons per layer, which yielded a MAPE value of 1.5399.
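The memory cell described above can be written out for a single unit with a scalar input. This minimal sketch uses the standard forget/input/output gate formulation with hypothetical parameter values, plus a chronological 80/20 split (no shuffling, so the test set is the most recent period). Training and the study's hyperparameters are not reproduced.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, params):
    """One time step of a single-unit LSTM cell with a scalar input.

    params maps each gate name to (w_x, w_h, b): "f" forget, "i" input,
    "o" output, and "g" candidate cell value.
    """
    def pre(gate):
        w_x, w_h, b = params[gate]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre("f"))    # fraction of the old cell state to keep
    i = sigmoid(pre("i"))    # fraction of the candidate to write
    o = sigmoid(pre("o"))    # fraction of the cell state to expose
    g = math.tanh(pre("g"))  # candidate values
    c = f * c_prev + i * g   # additive memory update carries long-term information
    h = o * math.tanh(c)
    return h, c


def chronological_split(series, train_frac=0.8):
    """Split a time series without shuffling: earliest part for training, latest for testing."""
    cut = int(len(series) * train_frac)
    return series[:cut], series[cut:]


# Hypothetical gate parameters, run over a short scaled sequence.
params = {"f": (0.5, 0.1, 1.0), "i": (0.8, 0.2, 0.0), "o": (0.6, 0.1, 0.0), "g": (1.0, 0.3, 0.0)}
h, c = 0.0, 0.0
for x in [0.2, 0.4, 0.3, 0.5]:
    h, c = lstm_step(x, h, c, params)
print(round(h, 4), round(c, 4))

train, test = chronological_split(list(range(848)))
print(len(train), len(test))  # 678 170
```

With 848 observations, an 80/20 split under this rounding gives 678 training and 170 testing observations.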
Bitcoin Price Prediction Using Support Vector Regression Wulan Septya Zulmawati; Nonong Amalita; Syafriandi Syafriandi; Admi Salma
UNP Journal of Statistics and Data Science Vol. 1 No. 5 (2023): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol1-iss5/121

Abstract

Cryptocurrency can provide higher returns than other investment instruments, which attracts many novice traders to crypto as a way to make significant short-term profits. One of the most widely used cryptocurrencies is Bitcoin. Trading is closely related to technical analysis, and the variety of technical analysis techniques makes it difficult for beginner traders to choose the right one. Machine learning prediction methods can be an alternative that helps beginner traders overcome these barriers in the crypto market. One machine learning method for prediction is Support Vector Regression (SVR). With hyperparameters tuned by the grid search algorithm, this method achieves a good predictive accuracy of 99.25% and a MAPE of 0.1206%.
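A minimal sketch of SVR with grid search, under simplifying assumptions: a linear kernel, a single predictor, the primal epsilon-insensitive objective minimized by subgradient descent (instead of the kernelized dual solvers used by libraries), a small hypothetical grid over C and epsilon, and noise-free synthetic data. The kernel and grid used in the study are not stated in the abstract.

```python
import itertools


def fit_linear_svr(xs, ys, C, eps, epochs=3000, lr=0.05):
    """Linear epsilon-insensitive SVR trained by subgradient descent on the primal objective
    0.5 * w**2 + C * mean(max(0, |y - (w*x + b)| - eps))."""
    w = b = 0.0
    n = len(xs)
    for t in range(1, epochs + 1):
        gw, gb = w, 0.0  # gradient of the 0.5 * w**2 regularizer
        for x, y in zip(xs, ys):
            r = y - (w * x + b)
            if abs(r) > eps:  # only points outside the epsilon-tube contribute
                s = 1.0 if r > 0 else -1.0
                gw -= C * s * x / n
                gb -= C * s / n
        step = lr / t ** 0.5  # diminishing step size
        w -= step * gw
        b -= step * gb
    return w, b


def mape(actual, predicted):
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def grid_search(x_train, y_train, x_val, y_val, Cs=(0.1, 1.0, 10.0), epsilons=(0.01, 0.1)):
    """Try every (C, eps) pair and keep the one with the lowest validation MAPE."""
    best = None
    for C, eps in itertools.product(Cs, epsilons):
        w, b = fit_linear_svr(x_train, y_train, C, eps)
        score = mape(y_val, [w * x + b for x in x_val])
        if best is None or score < best[0]:
            best = (score, C, eps, w, b)
    return best


# Hypothetical, noise-free linear data; earlier points train, later points validate.
xs = [i / 19 for i in range(20)]
ys = [2 * x + 1 for x in xs]
score, C, eps, w, b = grid_search(xs[:15], ys[:15], xs[15:], ys[15:])
print(f"best C={C}, eps={eps}, validation MAPE={score:.3f}%")
```

The validation points come after the training points, mirroring how price data are split in time rather than at random.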
Implementation of Backpropagation Artificial Neural Network on Forecasting Export of Palm Oil in Indonesia Adinda Dwi Putri; Dina Fitria; Nonong Amalita; Zilrahmi
UNP Journal of Statistics and Data Science Vol. 1 No. 5 (2023): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol1-iss5/123

Abstract

Exports are one of Indonesia's largest sources of revenue, and palm oil is the largest contributor to exports. Increasing the volume of palm oil exports can spur economic growth in Indonesia. In this research, palm oil exports from Indonesia are forecast for the main destination countries using the Artificial Neural Network (ANN) method with the backpropagation algorithm. The data are palm oil export data for 2012-2022 obtained from the Central Statistics Agency (BPS) website. The optimal architecture is 10-1-3-3-1, with a MAPE of 9.68%: the network has 10 input neurons, 3 hidden layers with 1, 3, and 3 neurons, and 1 output neuron. This means the ANN forecasts of palm oil exports deviate from the actual values by 9.68% on average, corresponding to an accuracy of roughly 90%.
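Under the reading above (10 inputs, hidden layers of 1, 3, and 3 neurons, 1 output), the architecture can be sketched as a forward pass with a parameter count. Sigmoid hidden activations, a linear output, and random initial weights are assumptions; training by backpropagation is not shown.

```python
import math
import random


def init_mlp(sizes, seed=0):
    """Random weights and zero biases for a fully connected network with the given layer sizes."""
    rng = random.Random(seed)
    return [([[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)], [0.0] * n_out)
            for n_in, n_out in zip(sizes, sizes[1:])]


def n_parameters(sizes):
    """Each layer contributes n_in * n_out weights plus n_out biases."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))


def forward(layers, x):
    """Sigmoid activations in the hidden layers, linear output layer."""
    for k, (weights, biases) in enumerate(layers):
        z = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
        x = z if k == len(layers) - 1 else [1.0 / (1.0 + math.exp(-v)) for v in z]
    return x


architecture = [10, 1, 3, 3, 1]  # 10 inputs, hidden layers of 1, 3, 3 neurons, 1 output
layers = init_mlp(architecture)
print(n_parameters(architecture))  # 33
print(forward(layers, [0.5] * 10))  # a one-element list: the (scaled) forecast
```

With this reading, the single-neuron first hidden layer compresses all 10 inputs into one value before the later layers, so the network has only 33 trainable parameters.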
Co-Authors Addini, Vidhiya Ade Eriyen Saputri Adinda Dwi Putri Admi Salma Aldwi Riandhoko Ali Asmar Amanda, Abilya Amelia Fadila Rahman Andini Yulianti Anggi Adrian Danis Anjelisni, Nining april leniati Arnellis Arnellis Atika Ahmad Atus Amadi Putra Azwar Ananda Chairina Wirdiastuti Cindy Febrianita Denia Putri Fajrina Dewi Febiyanti Dewi Murni Dina Fitria Dina Fitria Dina Fitria, Dina Dodi Vionanda Dony Permana Dwi Sulistiowati Edwin Musdi Elita Zusti Jamaan Elsa Oktaviani Fadhilah Fitri Fajrin Putra Hanifi fajriyanti nur, Putri Fatma Yulia Sari Faulina FAZHIRA ANISHA Fikra, Hidayatul Fitri, Fadhilah Gezi Fajri Ghaly, Fayyadh Hamida, Zilfa Hana Rahma Trifanni haniyathul husna Hasna, Hanifa Helma Helma Helma Helma Herlena Purnama Sari Huriati Khaira Ichlas Djuazva Inna Auliya Jihe Chen Juwita Juwita Khairani, Putri Rahmatun Lilis Sulistiawati Media Rosha Media Rosha Meira Parma Dewi Melly Kurniawati Miftahurrahmi, Syifa Minora Longgom Mohammad Reza febrino Mudjiran Mudjiran Muhammad Tibri Syofyan Mukhti, Tessy Octavia nabillah putri Nadha Ovella Syaqhasdy Natasya Dwi Ovalingga, natasyalinggaa Nini Erdiani Nur Fadillah, Nur Nurhizrah Gistituati Okia Dinda Kelana Oktaviani, Bernadita Permana, Dony Prida Nova Sari Puti Utari Maharani Rahma, Dzakyyah Resti Febrina Retsya Lapiza Rizki Amalia, Annisa Rizqia Salsabila Rusdinal Rusdinal Saddam Al Aziz Safitri, Melda Salma, Admi Seif Adil El-Muslih Shavira Asysyifa S Sondriva, Wilia Sujantri Wahyuni Suparman Suparman Swithania Rizka Putri Syafriandi Syafriandi Syafriandi Syafriandi Syafriandi Syahfitrri, Nindi Tamur, Maximus Tessy Octavia Mukhti Tri Wahyuni Nurmulyati Venny Oktarinda Viola Yuniza Wella Saputri Wulan Septya Zulmawati Yarman Yarman, Yarman Yenni Kurniawati Yulia Pertiwi Zamahsary Martha Zilla Zalila Zilrahmi, Zilrahmi