Journal : UNP Journal of Statistics and Data Science

Comparison of Error Rate Prediction Methods in Binary Logistic Regression Model for Balanced Data Shavira Asysyifa S; Dodi Vionanda; Nonong Amalita; Dina Fitria
UNP Journal of Statistics and Data Science Vol. 1 No. 4 (2023): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol1-iss4/90

Abstract

Binary logistic regression is a statistical method for examining the relationship between a dependent variable and several independent variables, where the dependent variable is split into two categories: one declaring a successful event and one declaring a failed event. The performance of binary logistic regression can be seen from the accuracy of the model. Accuracy can be measured by predicting the error rate. One method for predicting the error rate is cross validation, which works by dividing the data into two parts, namely testing data and training data. Several cross validation methods are commonly used: Leave One Out (LOO), hold out, and k-fold cross validation. LOO gives an unbiased estimate of accuracy but takes a long time; hold out can avoid overfitting and runs faster because it involves no iterations; and k-fold cross validation yields a smaller error rate prediction. Meanwhile, data cases with different correlations are useful for finding out how correlation affects the performance of an error rate prediction method. This study uses artificially generated data with a normal distribution, including univariate, bivariate, and multivariate datasets with various combinations of mean differences and correlations. Considering these factors, this study compares the three cross validation methods for predicting the error rate in binary logistic regression. It finds that k-fold cross validation is the most suitable method for predicting errors in binary logistic regression modeling for balanced data.
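The three estimators compared in this abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' procedure: the function names, the gradient-descent fit, the 30% hold-out fraction, k = 5, and the simulated univariate data (two normal classes with unit variance and a mean difference of 2) are all hypothetical choices made for the example.

```python
import math
import random

def fit_logistic(xs, ys, lr=0.5, steps=300):
    """Fit P(y=1|x) = sigmoid(b0 + b1*x) by batch gradient descent."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            g0 += p - y
            g1 += (p - y) * x
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1

def count_errors(model, xs, ys):
    """Count misclassified points using a 0.5 probability cutoff."""
    b0, b1 = model
    return sum(1 for x, y in zip(xs, ys) if int(b0 + b1 * x > 0) != y)

def k_fold_error(xs, ys, k, seed=0):
    """k-fold estimate of the error rate; k = len(xs) gives leave one out."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    wrong = 0
    for f in range(k):
        test = idx[f::k]  # every k-th shuffled index forms one fold
        held = set(test)
        train = [i for i in idx if i not in held]
        model = fit_logistic([xs[i] for i in train], [ys[i] for i in train])
        wrong += count_errors(model, [xs[i] for i in test], [ys[i] for i in test])
    return wrong / len(xs)

def hold_out_error(xs, ys, test_fraction=0.3, seed=0):
    """Single train/test split: fit once, evaluate once."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    n_test = int(len(xs) * test_fraction)
    test, train = idx[:n_test], idx[n_test:]
    model = fit_logistic([xs[i] for i in train], [ys[i] for i in train])
    return count_errors(model, [xs[i] for i in test], [ys[i] for i in test]) / n_test

def make_balanced_data(n_per_class=30, mean_diff=2.0, seed=1):
    """Hypothetical balanced data: two unit-variance normal classes."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n_per_class)]
    xs += [rng.gauss(mean_diff, 1.0) for _ in range(n_per_class)]
    ys = [0] * n_per_class + [1] * n_per_class
    return xs, ys

xs, ys = make_balanced_data()
estimates = {
    "hold_out": hold_out_error(xs, ys),
    "k_fold_5": k_fold_error(xs, ys, k=5),
    "loo": k_fold_error(xs, ys, k=len(xs)),
}
print(estimates)
```

The trade-off described in the abstract shows up in the number of model fits: hold out fits once, 5-fold fits five times, and leave one out fits once per observation. A comparison like the one in the study would repeat this over many simulated datasets and compare the spread of the estimates.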
Classification of Coronary Heart Disease at Semen Padang Hospital using Algorithm Classification And Regression Trees (CART) defal aditya defran; Atus Amadi Putra; Dodi Vionanda; Tessy Octavia Mukhti
UNP Journal of Statistics and Data Science Vol. 1 No. 5 (2023): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol1-iss5/104

Abstract

Cardiovascular disease is a degenerative disease caused by decreased function of the heart and blood vessels. One of the most common heart diseases today is coronary heart disease (CHD). The main factors that cause CHD include age, gender, hypertension, blood sugar, and cholesterol. One method that can be used to group CHD cases is classification. Classification And Regression Trees (CART) is a decision tree method that describes the relationship between a response variable and one or more predictor variables. The goal of CART is to obtain accurate data groups that characterize a classification. Based on the optimal tree, the attribute that is the main characteristic in classifying CHD patients at Semen Padang Hospital is age. Evaluating the classification results with the confusion matrix gave an accuracy of 66.67%, a sensitivity of 56.52% for classifying CHD patients, and a specificity of 84.61% for classifying non-CHD patients.
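For readers unfamiliar with the three reported measures, the sketch below shows how accuracy, sensitivity, and specificity follow from a 2x2 confusion matrix. The counts are hypothetical: they were chosen only because they reproduce percentages close to those in the abstract, and they are not taken from the paper. The function name `confusion_metrics` is likewise an assumption.

```python
def confusion_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity, and specificity from confusion-matrix counts.

    tp/fn: positive (here, CHD) cases classified correctly/incorrectly.
    tn/fp: negative (non-CHD) cases classified correctly/incorrectly.
    """
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

# Hypothetical counts for illustration only (not the study's data).
metrics = confusion_metrics(tp=13, fn=10, tn=11, fp=2)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

Sensitivity uses only the actual positive cases and specificity only the actual negative cases, which is why a model can have moderate accuracy while one of the two is much lower than the other.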
Prediction Of Bogor City Rainfall Parameters Using Long Short Term Memory (LSTM) Sherly Amora Jofipasi; Admi Salma; Dodi Vionanda; Dina Fitria
UNP Journal of Statistics and Data Science Vol. 1 No. 5 (2023): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol1-iss5/110

Abstract

Bogor is a city with high-intensity, erratic rainfall, so its rainfall needs to be predicted. Rainfall prediction can be done using the LSTM algorithm. Two parameters of the LSTM algorithm, the number of hidden-layer neurons and the number of epochs, greatly affect the prediction results. It is therefore necessary to determine the best hidden-layer neuron and epoch values to produce good rainfall predictions for Bogor. The LSTM predictions worked well with an optimal hidden-layer neuron value of 256 and an optimal epoch value of 150, giving a MAPE of 1.64%, and the predicted data follow the same pattern as the actual data.
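The abstract reports prediction quality as MAPE (mean absolute percentage error). A minimal sketch of that measure follows, assuming the usual definition; the function name and the sample values are hypothetical and unrelated to the Bogor data.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Every actual value must be nonzero, because each error is divided
    by the corresponding actual value.
    """
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)

# Hypothetical values for illustration only.
actual = [200.0, 250.0, 400.0]
predicted = [190.0, 260.0, 400.0]
print(f"MAPE: {mape(actual, predicted):.2f}%")
```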
Comparison of Error Prediction Methods in Claassification Modeling with CHAID Methods for Balanced Data Findri Wara Putri; Dodi Vionanda; Atus Amadi Putra; Fadhilah Fitri
UNP Journal of Statistics and Data Science Vol. 1 No. 5 (2023): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol1-iss5/116

Abstract

Chi-Squared Automatic Interaction Detection (CHAID) is an exploratory method for classifying data by building classification trees. The classification results are displayed as a tree diagram. After the model is formed, its accuracy needs to be calculated in order to assess its performance. The accuracy of the model can be determined by calculating its prediction error rate. Error rate prediction methods work by dividing the data into training data and testing data. There are three such methods: Leave One Out Cross Validation (LOOCV), hold out, and k-fold cross validation. These methods differ in how they divide the data into training and testing data, so each has advantages and disadvantages. Therefore, the three error rate prediction methods were compared in order to determine the appropriate method for CHAID. This research is experimental and uses simulated data generated in RStudio. The comparison considers several factors, namely the marginal probability matrix and different correlations. The results are examined using boxplots, looking at the median error rate and the lowest variance. This research found that k-fold cross validation is the most suitable error rate prediction method for CHAID with balanced data.
Comparison of Error Rate Prediction in CART for Imbalanced Data Lifia Zullani; Dodi Vionanda; Syafriandi Syafriandi; Dina Fitria
UNP Journal of Statistics and Data Science Vol. 1 No. 5 (2023): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol1-iss5/117

Abstract

CART is one of the tree-based classification algorithms. A CART model is a tree consisting of a root node, internal nodes, and terminal nodes. The accuracy of a CART model can be calculated by measuring its prediction errors. One common method used to predict error rates is cross-validation. There are three cross-validation algorithms, namely leave one out, hold out, and k-fold cross-validation. These methods differ in how they divide the data into training data and testing data, so each has advantages and disadvantages. Every algorithm has its shortcomings: hold out cannot guarantee that the training set represents the entire dataset; leave one out is very time-consuming and requires significant computation because it has to train the model as many times as there are data points; and k-fold requires longer computation time because the training algorithm must be run k times. In practice, the data encountered is often imbalanced. Imbalanced data refers to data with a different number of observations in each class. In CART, imbalanced data affects the prediction results. This research focuses on comparing error rate prediction methods in the CART model with imbalanced data. The study uses three types of data: univariate, bivariate, and multivariate, obtained from differences in population means and correlations between independent variables. The results indicate that the k-fold algorithm is the most suitable error rate prediction algorithm for CART with imbalanced data.
Diagnosis of the type of delivery of pregnant women at Semen Padang Hospital Using the C4.5 Method rama novialdi; Dony Permana; Dodi Vionanda; Fadhilah Fitri
UNP Journal of Statistics and Data Science Vol. 2 No. 1 (2024): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol2-iss1/130

Abstract

The health of the mother and fetus is very important, but there are many challenges and risks associated with pregnancy and childbirth. According to WHO, in 2020 there were 287,000 cases of women dying during pregnancy and childbirth. Factors that affect the type of delivery include the age of the pregnant woman, MGG, systole, diastole, and pulse. One method that can be used to group the types of delivery is classification. C4.5 is one of the methods used to form decision trees. The purpose of C4.5 is to obtain the attributes that will be the main criteria in the classification. Based on the optimal tree, the attribute that is the main criterion in classifying deliveries at Semen Padang Hospital as caesarean section or normal delivery is MGG. Evaluating the classification results with the confusion matrix gave an accuracy of 74%, a sensitivity of 80% for classifying women who gave birth by caesarean section, and a specificity of 66.67% for classifying women who gave birth normally.
Forecasting Gold Prices in Indonesia using Support Vector Regression with the Grid Search Algorithm Syahfitrri, Nindi; Nonong Amalita; Dodi Vionanda; Zamahsary Martha
UNP Journal of Statistics and Data Science Vol. 2 No. 1 (2024): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol2-iss1/145

Abstract

Investment is an effort to increase economic growth in Indonesia. A popular investment in the community is gold. The value of gold investments tends to increase but is not immune to price fluctuations, so it is important to forecast the price of gold in Indonesia. One method that can be used for this forecast is Support Vector Regression (SVR). SVR looks for a function whose deviation from the target value is no more than ε for all training data. The best SVR model with a linear kernel was obtained from the parameter combination C=0.0625 and ε=0.001, with an RMSE value of 0.19734 and a value of 0.974112. The SVR method is therefore appropriate for forecasting gold prices in Indonesia.
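The "deviation of no more than ε" idea is usually formalized as the ε-insensitive loss, in which errors smaller than ε cost nothing. The sketch below illustrates that loss together with RMSE. It is not the authors' code: the function names and the sample values are hypothetical, and only ε=0.001 is taken from the abstract.

```python
import math

def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Per-point loss: zero inside the epsilon tube, linear outside it."""
    return [max(0.0, abs(t - p) - epsilon) for t, p in zip(y_true, y_pred)]

def rmse(y_true, y_pred):
    """Root mean squared error of the predictions."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical targets and predictions for illustration only.
y_true = [1.0, 2.0, 3.0]
y_pred = [1.0005, 2.5, 2.0]

losses = epsilon_insensitive_loss(y_true, y_pred, epsilon=0.001)
print("epsilon-insensitive losses:", losses)
print("RMSE:", rmse(y_true, y_pred))
```

The first prediction misses by 0.0005, which is inside the tube, so it contributes no loss. In SVR the parameter C then controls how heavily the points outside the tube are penalized, which is why C and ε are tuned together in a grid search.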
Comparison of Modeling Infant Mortality Rate in West Sumatra and West Java Province in 2021 Using Negative Binomial Regression Afdhal, Afdhal Rezeki; Fadhilah Fitri; Dodi Vionanda; Dony Permana
UNP Journal of Statistics and Data Science Vol. 2 No. 2 (2024): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol2-iss2/156

Abstract

Poisson regression analysis assumes equidispersion: the variance of the response variable equals its mean. In practice this condition rarely holds, because overdispersion usually occurs (the variance of the response variable is greater than its mean). One way to overcome this problem is to use Negative Binomial regression. The aim of this article is to obtain the best Negative Binomial regression model to overcome overdispersion in infant mortality cases in West Sumatra Province and West Java Province. The Negative Binomial regression model for West Sumatra Province has an AIC value of 192.65, which is smaller than the AIC value of 283.47 for West Java Province. Based on the Negative Binomial regression equation obtained for West Sumatra Province, the number of health centers (X3) has a significant influence on the infant mortality rate, while in West Java Province the number of medical personnel (X1) has a significant influence on the infant mortality rate.
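A quick way to see the equidispersion assumption at work is to compare the sample variance of a count variable with its mean. The sketch below is a rough first check under stated assumptions, not the authors' diagnostic: the function name and the counts are hypothetical, and a model-based check would use residuals from a fitted regression rather than the raw response.

```python
from statistics import mean, variance

def dispersion_ratio(counts):
    """Sample variance divided by sample mean.

    A ratio near 1 is consistent with equidispersion (Poisson);
    a ratio well above 1 suggests overdispersion.
    """
    return variance(counts) / mean(counts)

# Hypothetical count data for illustration only.
counts = [2, 0, 5, 1, 12, 3, 0, 9]
ratio = dispersion_ratio(counts)
print(f"mean={mean(counts):.2f}, variance={variance(counts):.2f}, ratio={ratio:.2f}")
```

In a common Negative Binomial parameterization the variance is μ + αμ², so the extra parameter α absorbs the excess variance that a Poisson model cannot represent.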
Classification of Poor Households in West Sumatra Province using Decision Tree Algorithm C4.5 Dinda Fitriza; Atus Amadi Putra; Dodi Vionanda; Zilrahmi
UNP Journal of Statistics and Data Science Vol. 2 No. 2 (2024): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol2-iss2/157

Abstract

The significant and increasingly complex issue of poverty poses a considerable challenge to development in Indonesia, including West Sumatra Province, where the poverty rate was 5.92% in 2022. The government has initiated programs to address poverty by focusing on the criteria of impoverished households. Data on impoverished households can be obtained through the National Socio-Economic Survey (Susenas). One method that can classify impoverished households is the decision tree, a flowchart that resembles a tree. The C4.5 algorithm used in this research is able to handle discrete and continuous data, manage variables with missing values, and prune decision tree branches. The analysis shows that the variables affecting the classification of poor households are the number of household members, followed by the age of the household head, type of house floor, type of house wall, source of drinking water, and cooking fuel. On the test data, the confusion matrix gives an accuracy of 69.89%, a sensitivity of 71.15% for classifying regular households, and a specificity of 68.72% for classifying impoverished households.
Impelementation of Subtractive Fuzzy C-Means Method in Clustering Provinces in Indonesia Based on Factors Causing Stunting in Toddlers Hariati Ainun Nisa; Admi Salma; Dodi Vionanda; Tessy Octavia Mukhti
UNP Journal of Statistics and Data Science Vol. 2 No. 2 (2024): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol2-iss2/164

Abstract

Indonesia's stunting rate in 2022 was still relatively high at 21.6%. Various efforts are therefore needed to reduce it. One such effort is to understand the characteristics of each province in Indonesia through cluster analysis. This study aims to cluster provinces in Indonesia based on factors that cause stunting in children under five. The method used is Subtractive Fuzzy C-Means, which has advantages in terms of speed and iterations and thus produces more stable and accurate results. Based on the validity test with the Silhouette Coefficient Index, the optimum number of clusters is 8, with a radius (r) of 0.70. There are 8 provinces that have provided maximum handling and efforts in reducing stunting rates, namely Bangka Belitung Islands, Riau Islands, DKI Jakarta, DI Yogyakarta, Bali, East Kalimantan, South Kalimantan, and South Sulawesi. Meanwhile, 7 provinces, namely East Nusa Tenggara, South Kalimantan, Central Sulawesi, West Sulawesi, Maluku, North Maluku, and West Papua, still need special attention from the government in reducing stunting rates based on the factors discussed in this study.