Articles

Application of Singular Spectrum Analysis Method to Forecast Rice Production in West Sumatra Nazifatul Azizah; Fadhilah Fitri; Dodi Vionanda; Zamahsary Martha
UNP Journal of Statistics and Data Science Vol. 1 No. 3 (2023): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol1-iss3/58

Abstract

The imbalance between population and rice production can cause various negative impacts such as food crises and increasing poverty, so forecasting is needed to maintain food availability in the future. This study aims to forecast rice production in West Sumatra Province for 12 periods in 2023 using the Singular Spectrum Analysis (SSA) method. Based on the results of the analysis, rice production over the 12 periods of 2023 tends to decrease compared to the previous year. Forecasting rice production using the SSA method with window length L = 21 can be considered accurate, with a MAPE of 17.69%.
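
As a rough illustration of the decomposition behind this forecast, the sketch below (in Python, with placeholder inputs) embeds a series with window length L = 21, decomposes the trajectory matrix with an SVD, and reconstructs the signal from the leading components by diagonal averaging. The number of retained components r and the function name are illustrative assumptions; the paper's out-of-sample forecasts would additionally use the SSA recurrent forecasting formula, which is not reproduced here.

    import numpy as np

    def ssa_reconstruct(series, L=21, r=3):
        """Basic SSA: embed with window L, decompose the trajectory matrix
        with an SVD, and reconstruct the series from the r leading
        components by diagonal (Hankel) averaging."""
        x = np.asarray(series, dtype=float)
        N = len(x)
        K = N - L + 1
        X = np.column_stack([x[i:i + L] for i in range(K)])   # L x K trajectory matrix
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        X_hat = (U[:, :r] * s[:r]) @ Vt[:r, :]                # grouping: r leading terms
        rec, counts = np.zeros(N), np.zeros(N)
        for i in range(L):                                     # diagonal averaging
            for j in range(K):
                rec[i + j] += X_hat[i, j]
                counts[i + j] += 1
        return rec / counts

    # e.g. rec = ssa_reconstruct(monthly_production, L=21), with monthly_production
    # a placeholder for the rice production series used in the study
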
Grouping Level of Poverty Based on District/City in Indonesia Using K-Harmonic Means Nabillah Putri; Nonong Amalita; Dodi Vionanda; Dony Permana
UNP Journal of Statistics and Data Science Vol. 1 No. 3 (2023): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol1-iss3/60

Abstract

Indonesia still has a relatively high poverty rate, although nationally it has declined in recent years. Some areas are still experiencing increasing poverty rates, so poverty alleviation plans can no longer be uniform but need to account for the dimensions that cause poverty in each area; it is therefore necessary to group districts/cities in Indonesia by poverty. Grouping was performed using K-Harmonic Means analysis. K-Harmonic Means is a non-hierarchical clustering method built on the harmonic mean of the distances between each data point and the cluster centers. The data used in this research are secondary data sourced from BPS publications on poverty and inequality in 2022. The analysis was carried out by standardizing the data, conducting the cluster analysis, and validating the clusters. Based on the results of the K-Harmonic Means analysis, the optimal number of clusters is two: the first cluster contains 54 districts/cities, the second cluster contains 460 districts/cities, and the Dunn Index value for cluster validation is 0.03492. A better grouping of poverty levels by district/city in Indonesia is thus obtained using the K-Harmonic Means method with p = 2.25.
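
A minimal sketch of the K-Harmonic Means update described above is given below, assuming a standardized feature matrix X (in the study, the BPS poverty indicators). The value p = 2.25 echoes the abstract; the initialization, iteration count, and function name are illustrative assumptions, not the paper's implementation.

    import numpy as np

    def k_harmonic_means(X, k=2, p=2.25, n_iter=100, seed=0):
        """Minimal K-Harmonic Means: soft memberships and point weights are
        built from powers of the point-to-center distances, then used to
        recompute the centers at each iteration."""
        rng = np.random.default_rng(seed)
        centers = X[rng.choice(len(X), k, replace=False)]
        for _ in range(n_iter):
            d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
            d = np.maximum(d, 1e-8)                       # avoid division by zero
            inv_p2 = d ** (-p - 2)
            inv_p = d ** (-p)
            member = inv_p2 / inv_p2.sum(axis=1, keepdims=True)    # m(c_j | x_i)
            weight = inv_p2.sum(axis=1) / inv_p.sum(axis=1) ** 2   # w(x_i)
            coef = member * weight[:, None]
            centers = (coef.T @ X) / coef.sum(axis=0)[:, None]
        return centers, member.argmax(axis=1)
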
Geographically Weighted Panel Regression for Modeling The Percentage of Poor Population in West Sumatra Jimmi Darma Putra; Dina Fitria; Dodi Vionanda; Admi Salma
UNP Journal of Statistics and Data Science Vol. 1 No. 3 (2023): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol1-iss3/64

Abstract

The Geographically Weighted Panel Regression (GWPR) model applies panel regression to spatial data, with parameter estimation carried out using a spatial weight at each observation point. The purpose of this study is to determine the GWPR model and the factors that influence the percentage of poor people in each district/city in West Sumatra Province from 2015 to 2021. The adaptive bisquare kernel function was used for spatial weighting, and the Cross-Validation (CV) criterion was used to identify the optimal bandwidth. The research data were secondary data sourced from the official website and the Sumatera Barat Dalam Angka publications from 2015 to 2021. The GWPR model is constructed by combining the GWR model with the fixed effect model (FEM) panel data regression. The results show that the models and the factors affecting the percentage of poor people differ across the 19 districts/cities of West Sumatra.
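
The spatial weighting step described above can be sketched as follows, assuming district centroid coordinates and panel data that have already been transformed for the fixed effects model; the adaptive bisquare bandwidth here is the distance to the q-th nearest neighbour, with q normally chosen by the CV criterion mentioned in the abstract. The function names and inputs are illustrative assumptions, not the paper's code.

    import numpy as np

    def adaptive_bisquare_weights(coords, q):
        """Spatial weights for each regression point with an adaptive bisquare
        kernel: the bandwidth at point i is the distance to its q-th nearest
        neighbour (row 0 of the sorted distances is the point itself)."""
        d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
        b = np.sort(d, axis=1)[:, q]
        return np.where(d < b[:, None], (1 - (d / b[:, None]) ** 2) ** 2, 0.0)

    def gwr_coefficients(X, y, W):
        """Weighted least squares fit at each location i using row i of W."""
        betas = []
        for w in W:
            A = X.T * w                       # X' W_i
            betas.append(np.linalg.solve(A @ X, A @ y))
        return np.array(betas)
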
Sentiment Analysis of Electric Cars Using Naive Bayes Classifier Method Nurul Afifah; Dony Permana; Dodi Vionanda; Dina Fitria
UNP Journal of Statistics and Data Science Vol. 1 No. 4 (2023): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol1-iss4/68

Abstract

In recent years, electric cars have become increasingly popular as an environmentally friendly alternative in the automotive industry. These vehicles use electric power as an energy source, which can reduce reliance on fossil fuels and contribute to efforts to minimize greenhouse gas emissions and air pollution. However, the presence of electric cars draws both supportive and opposing opinions from the public, and the conversation about electric cars has become one of the hot topics on social media. Twitter is a microblogging social media platform that allows its users to create short messages and share them easily and quickly. These opinions call for sentiment analysis. The purpose of conducting sentiment analysis is to find out whether people's perceptions and opinions of electric cars lean in a favorable or unfavorable direction; sentiment analysis can thus help companies improve marketing strategies and make better business decisions. The opinions are classified into positive and negative categories. This study employs the naive Bayes classifier method to classify positive and negative sentiment towards electric cars on Twitter. The accuracy of the naive Bayes classifier in this research, obtained from a confusion matrix with a 70%:30% train-test split, is 77.8%.
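
A minimal sketch of the classification pipeline described above, using scikit-learn's MultinomialNB with a 70%:30% split and a confusion matrix; the example tweets and the TF-IDF weighting are placeholder assumptions, not the paper's preprocessed Twitter data.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.model_selection import train_test_split
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.metrics import confusion_matrix, accuracy_score

    # placeholder tweets and sentiment labels
    tweets = ["mobil listrik ramah lingkungan", "mobil listrik hemat energi",
              "suka sekali mobil listrik", "harga mobil listrik terlalu mahal",
              "baterai mobil listrik cepat rusak", "mobil listrik tidak praktis"]
    labels = ["positive", "positive", "positive", "negative", "negative", "negative"]

    X_train, X_test, y_train, y_test = train_test_split(
        tweets, labels, test_size=0.30, random_state=42, stratify=labels)

    vec = TfidfVectorizer()
    model = MultinomialNB().fit(vec.fit_transform(X_train), y_train)
    pred = model.predict(vec.transform(X_test))

    print(confusion_matrix(y_test, pred))
    print("accuracy:", accuracy_score(y_test, pred))
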
Comparison of Error Rate Prediction Methods in Classification Modeling with Classification and Regression Tree (CART) Methods for Balanced Data Fitria Panca Ramadhani; Dodi Vionanda; Syafriandi Syafriandi; Admi Salma
UNP Journal of Statistics and Data Science Vol. 1 No. 4 (2023): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol1-iss4/73

Abstract

CART (Classification and Regression Tree) is one of the classification algorithms in the decision tree method. The model formed in CART is a tree consisting of root nodes, internal nodes, and terminal nodes. After the model is formed, it is necessary to calculate its accuracy in order to assess the performance of the model. The accuracy can be determined by predicting the error rate of the model. Error rate prediction methods work by dividing the data into training data and testing data. There are three such methods: Leave One Out Cross Validation (LOOCV), Hold Out (HO), and K-Fold Cross Validation. These methods divide the data into training and testing sets in different ways, so each method has advantages and disadvantages. Therefore, the three error rate prediction methods were compared to determine the appropriate method for the CART algorithm. The comparison considers several factors, such as variations in the means, the number of variables, and the correlations in normally distributed random data. The comparison results are examined using boxplots, looking at the median error rate and the lowest variance. The results of this study indicate that the K-Fold Cross Validation method has the lowest median error rate and the lowest variance, so the most suitable error rate prediction method for the CART method is K-Fold Cross Validation.
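
A rough sketch of the comparison protocol described above, using scikit-learn's DecisionTreeClassifier (a CART-style tree) on simulated normal data; the mean shift, number of variables, and sample size below are illustrative assumptions rather than the paper's experimental design.

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.model_selection import (cross_val_score, train_test_split,
                                         LeaveOneOut, KFold)

    rng = np.random.default_rng(0)
    # simulated balanced data: two normal classes with a shifted mean
    X = np.vstack([rng.normal(0, 1, (100, 3)), rng.normal(1, 1, (100, 3))])
    y = np.repeat([0, 1], 100)

    tree = DecisionTreeClassifier(random_state=0)   # CART-style tree

    # Hold Out: a single 70/30 split
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
    ho_error = 1 - tree.fit(X_tr, y_tr).score(X_te, y_te)

    # Leave One Out and 10-fold cross-validation error rates
    loo_error = 1 - cross_val_score(tree, X, y, cv=LeaveOneOut()).mean()
    kf_error = 1 - cross_val_score(tree, X, y,
                                   cv=KFold(10, shuffle=True, random_state=0)).mean()

    print(ho_error, loo_error, kf_error)
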
Comparison of Error Rate Prediction Methods of C4.5 Algorithm for Balanced Data Ichlas Djuazva; Dodi Vionanda; Nonong Amalita; Zilrahmi
UNP Journal of Statistics and Data Science Vol. 1 No. 4 (2023): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol1-iss4/74

Abstract

C4.5 is a highly effective decision tree algorithm for classification purposes. Compared to CHAID, CART, and ID3, C4.5 generates the decision tree faster and is easier to understand. However, the C4.5 algorithm is not exempt from classification errors, which can affect the accuracy of the resulting model. Model accuracy can be measured by predicting the error rate. One commonly used approach to error rate prediction is cross-validation, which divides the data into two parts: a training set to build the model and a testing set to test it. Several cross-validation techniques are commonly used to predict the error rate, such as Leave One Out Cross Validation (LOOCV), Hold Out (HO), and k-fold cross-validation. LOOCV gives an unbiased estimate but takes a long time and depends on the data size; HO can avoid overfitting and works faster; and k-fold cross-validation yields a smaller error rate prediction. This study uses artificially generated data with a normal distribution, including univariate, bivariate, and multivariate datasets with various combinations of mean differences and different correlations. Different correlation structures are applied to see their impact on the error rate prediction methods. Considering these factors, this research compares the three cross-validation methods for predicting the error rate of the decision tree model generated by the C4.5 algorithm. The study finds that k-fold cross-validation is the most suitable cross-validation method for testing the model generated by the C4.5 algorithm with balanced data.
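
The data-generation step described above can be sketched as follows; the equicorrelation structure, mean shift, and sample size are illustrative assumptions, not the paper's exact design.

    import numpy as np

    def simulate_classes(n, mean_shift, rho, n_vars=3, seed=0):
        """Two balanced classes drawn from multivariate normals with a common
        equicorrelation structure; the class means differ by `mean_shift`."""
        rng = np.random.default_rng(seed)
        cov = np.full((n_vars, n_vars), rho)
        np.fill_diagonal(cov, 1.0)
        X0 = rng.multivariate_normal(np.zeros(n_vars), cov, size=n)
        X1 = rng.multivariate_normal(np.full(n_vars, mean_shift), cov, size=n)
        return np.vstack([X0, X1]), np.repeat([0, 1], n)

    # e.g. 100 observations per class, mean difference 1, correlation 0.5
    X, y = simulate_classes(n=100, mean_shift=1.0, rho=0.5)
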
Comparison of Fuzzy Time Series Markov Chain and Fuzzy Time Series Cheng to Predict Inflation in Indonesia Ihsanul Fikri; Admi Salma; Dodi Vionanda; Zilrahmi
UNP Journal of Statistics and Data Science Vol. 1 No. 4 (2023): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol1-iss4/76

Abstract

Inflation is one of the main macroeconomic problems and a very important economic indicator. Unstable inflation has a negative impact on people's welfare, so controlling inflation is important for a country. Forecasting is needed to monitor future movements in the inflation rate. In this study, the Fuzzy Time Series Markov Chain and Fuzzy Time Series Cheng methods are compared for forecasting inflation. The advantage of the fuzzy time series method is that it has no special assumptions that must be met. The purpose of this study is to determine the forecasting results based on the comparison of the two methods. Based on the MAPE values, the Fuzzy Time Series Markov Chain has the smallest value, 6.97%. The inflation forecasts for the next 5 periods using the Fuzzy Time Series Markov Chain method are 5.42; 5.71; 5.95; 5.82; and 6.10.
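
Since the two methods are compared on MAPE, a small helper like the one below makes the comparison explicit; the series values are placeholders, not the study's inflation data or fitted values.

    import numpy as np

    def mape(actual, forecast):
        """Mean Absolute Percentage Error, in percent."""
        actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
        return np.mean(np.abs((actual - forecast) / actual)) * 100

    # placeholder series: actual inflation and the fitted values of each method
    actual     = np.array([5.5, 5.4, 5.3, 5.0, 4.9])
    fit_markov = np.array([5.4, 5.5, 5.2, 5.1, 4.8])
    fit_cheng  = np.array([5.7, 5.1, 5.6, 4.7, 5.2])

    print("FTS Markov Chain MAPE:", mape(actual, fit_markov))
    print("FTS Cheng MAPE:", mape(actual, fit_cheng))
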
Comparison of Error Rate Prediction Methods in Classification Modeling with the CHAID Method for Imbalanced Data Seif Adil El-Muslih; Dodi Vionanda; Nonong Amalita; Admi Salma
UNP Journal of Statistics and Data Science Vol. 1 No. 4 (2023): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol1-iss4/81

Abstract

CHAID (Chi-Square Automatic Interaction Detection) is one of the classification algorithms in the decision tree method. The classification results are displayed in the form of a tree diagram model. After the model is formed, it is necessary to calculate the accuracy of the model in order to assess its performance. The accuracy can be assessed by predicting the error rate of the model. There are three methods for this: Leave One Out Cross Validation (LOOCV), Hold Out, and K-Fold Cross Validation. These methods divide the data into training and testing data in different ways, so each method has advantages and disadvantages. Imbalanced data is data in which the classes have different numbers of observations. In the CHAID method, imbalanced data affects the prediction results: as the data becomes increasingly imbalanced, the prediction results approach the proportion of the minority class. Therefore, the three error rate prediction methods were compared to determine the appropriate method for the CHAID method on imbalanced data. This is experimental research using simulated data generated in RStudio. The comparison considers several factors, such as the marginal probability matrix, different correlations, and several observation ratios. The comparison results are examined using boxplots, looking at the median error rate and the lowest variance. This research finds that K-Fold Cross Validation is the most suitable error rate prediction method for the CHAID method on imbalanced data.
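
A rough sketch of the repeated simulation and boxplot summary described above, written in Python rather than R; since CHAID is not available in scikit-learn, a decision tree stands in for it, and the class ratio, class separation, and repetition count are illustrative assumptions.

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.model_selection import cross_val_score, KFold

    def kfold_error_rates(ratio_minor, n=200, reps=30):
        """Repeat the simulation `reps` times at a given class ratio and return
        the 10-fold error estimates, to be summarized by a boxplot
        (median and variance)."""
        errors = []
        for r in range(reps):
            rng = np.random.default_rng(r)
            n_minor = int(n * ratio_minor)
            n_major = n - n_minor
            X = np.vstack([rng.normal(0, 1, (n_major, 2)),
                           rng.normal(1.5, 1, (n_minor, 2))])
            y = np.array([0] * n_major + [1] * n_minor)
            clf = DecisionTreeClassifier(random_state=0)  # stand-in; CHAID is not in scikit-learn
            cv = KFold(n_splits=10, shuffle=True, random_state=r)
            errors.append(1 - cross_val_score(clf, X, y, cv=cv).mean())
        return np.array(errors)

    err = kfold_error_rates(ratio_minor=0.10)   # illustrative 90:10 observation ratio
    print(np.median(err), err.var())
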
Comparing Classification and Regression Tree and Logistic Regression Algorithms Using 5×2cv Combined F-Test on Diabetes Mellitus Dataset Fashihullisan; Dodi Vionanda; Yenni Kurniawati; Fadhilah Fitri
UNP Journal of Statistics and Data Science Vol. 1 No. 4 (2023): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol1-iss4/84

Abstract

Classification is the process of finding a model that describes and distinguishes data classes so that it can be used to predict the class of objects whose class labels are unknown. There are several classification algorithms, such as Classification and Regression Trees (CART) and logistic regression. The k-fold cross validation method has a weakness for algorithm comparison problems: different folds can produce different error predictions, so the results of comparing algorithm performance can also differ. Therefore, for the algorithm comparison problem, this study applies the 5×2cv t test and the 5×2cv combined F test. Out of 100 iterations, the 10-fold cross validation method was consistent only three times, which shows that the k-fold cross validation method has poor consistency in comparing the CART algorithm and logistic regression on the diabetes mellitus data. In addition, the 5×2cv combined F test and 5×2cv t test that were carried out show that the 5×2cv combined F test is better for drawing conclusions from a comparison of the two algorithms because it produces only one decision, in contrast to the 5×2cv t test, which may yield different decisions from its 10 test statistics, making it difficult for researchers to draw conclusions when comparing the CART algorithm and logistic regression.
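
The combined 5×2cv F test referenced above can be sketched as follows: five replications of 2-fold cross-validation, per-fold differences in error rates, and an F statistic referred to an F(10, 5) distribution. The classifiers and data passed in (for example a CART-style tree and logistic regression on the diabetes dataset) are assumed to be supplied by the caller; third-party packages such as mlxtend also ship an implementation of this test.

    import numpy as np
    from scipy import stats
    from sklearn.model_selection import StratifiedKFold

    def combined_5x2cv_ftest(clf_a, clf_b, X, y, seed=0):
        """Combined 5x2cv F test: 5 replications of 2-fold CV, per-fold
        differences in error rates, F referred to an F(10, 5) distribution."""
        diffs, variances = [], []
        for i in range(5):
            cv = StratifiedKFold(n_splits=2, shuffle=True, random_state=seed + i)
            p = []
            for train, test in cv.split(X, y):
                err_a = 1 - clf_a.fit(X[train], y[train]).score(X[test], y[test])
                err_b = 1 - clf_b.fit(X[train], y[train]).score(X[test], y[test])
                p.append(err_a - err_b)
            p_bar = (p[0] + p[1]) / 2
            variances.append((p[0] - p_bar) ** 2 + (p[1] - p_bar) ** 2)
            diffs.extend(p)
        f = np.sum(np.square(diffs)) / (2 * np.sum(variances))
        return f, 1 - stats.f.cdf(f, 10, 5)

    # e.g. f, p_value = combined_5x2cv_ftest(DecisionTreeClassifier(),
    #                                        LogisticRegression(max_iter=1000), X, y)
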
Empirical Study for Algorithms Comparison of Classification and Regression Tree and Logistic Regression Using Combined 5×2cv F Test Fayza Annisa Febrianti; Dodi Vionanda; Yenni Kurniawati; Fadhilah Fitri
UNP Journal of Statistics and Data Science Vol. 1 No. 4 (2023): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol1-iss4/85

Abstract

Classification is a method to estimate the class of an object based on its characteristics. Several learning algorithms can be applied in classification, such as Classification and Regression Tree (CART) and logistic regression. The main goal of classification is to find the best learning algorithm so as to obtain the best classifier. In comparing two learning algorithms, directly choosing the one with the smaller prediction error rate may be acceptable when the difference is very clear; otherwise, direct comparison is misleading and results in inadequate conclusions. Therefore, a statistical test is needed to determine whether the difference is real or random. The results of the 5×2cv paired t-test sometimes reject and sometimes fail to reject the hypothesis, which is distracting because the changing error rate differences should not affect the test result. Meanwhile, the overall results of the combined 5×2cv F test show that the tests fail to reject the hypothesis. This indicates that CART and logistic regression perform identically in this case.
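
For contrast with the combined F test sketched after the previous abstract, the 5×2cv paired t-test can be sketched as below; it makes visible why the test can yield different decisions, since only one of the ten fold differences enters the numerator of the statistic. The function name and inputs are illustrative assumptions.

    import numpy as np
    from scipy import stats
    from sklearn.model_selection import StratifiedKFold

    def paired_5x2cv_ttest(clf_a, clf_b, X, y, seed=0):
        """5x2cv paired t test: the numerator is the first-fold difference of
        the first replication; under H0 the statistic follows a t distribution
        with 5 degrees of freedom."""
        variances, first_diff = [], None
        for i in range(5):
            cv = StratifiedKFold(n_splits=2, shuffle=True, random_state=seed + i)
            p = []
            for train, test in cv.split(X, y):
                err_a = 1 - clf_a.fit(X[train], y[train]).score(X[test], y[test])
                err_b = 1 - clf_b.fit(X[train], y[train]).score(X[test], y[test])
                p.append(err_a - err_b)
            if first_diff is None:
                first_diff = p[0]
            p_bar = (p[0] + p[1]) / 2
            variances.append((p[0] - p_bar) ** 2 + (p[1] - p_bar) ** 2)
        t = first_diff / np.sqrt(np.mean(variances))
        return t, 2 * (1 - stats.t.cdf(abs(t), df=5))
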