Comparison of Queen Contiguity and Customized Weighting Matrices on Spatial Regression to Identify Factors Impacting Poverty in East Java
Gezi Fajri; Syafriandi Syafriandi; Nonong Amalita; Zamahsary Martha
UNP Journal of Statistics and Data Science Vol. 1 No. 3 (2023): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol1-iss3/67

Abstract

Poverty is a crucial problem that negatively affects all sectors in East Java Province, including economic, social, and cultural development. It can also increase unemployment and crime, trigger social disasters, and hinder the province's progress. One way to address poverty in East Java is to identify the factors that influence it, which can be done through statistical modeling. A statistical model that can identify these factors and also explain the relationship between a region and its surrounding areas is spatial regression analysis. Spatial regression requires a spatial weighting matrix to describe the spatial influence between regions, where one region influences its neighbors. A frequently used spatial weighting matrix is queen contiguity. According to Anselin (1988:20), spatial weighting should also take into account prior information, the purpose of the case studied, and the theory underlying the research. A weighting built from the social and economic variables of the case under study is called a customized weighting matrix. The results of this study show that the best combination of spatial regression model and spatial weighting is the General Spatial Model (GSM) with customized weighting, which produces better estimates than the SAR, SEM, and GSM models with queen contiguity weighting when modeling district and city poverty in East Java, with an Akaike Information Criterion (AIC) value of 188.77 and a coefficient of determination (R2) of 84.95%. School's Expected Time, Life Expectancy Score, and Employment Participation Rate are the factors with a substantial impact on the percentage of the population living in poverty in East Java's districts and cities in 2021.
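To illustrate what a queen contiguity weighting matrix looks like, here is a minimal sketch that builds one for a toy rectangular grid, where cells sharing an edge or a corner count as neighbors, and then row-standardizes it. The grid layout and the function name `queen_weights` are assumptions made for illustration; real districts are irregular polygons, and the study's customized matrix is not reproduced here.

```python
def queen_weights(n_rows, n_cols):
    """Row-standardized queen contiguity matrix for a toy n_rows x n_cols grid."""
    n = n_rows * n_cols
    w = [[0.0] * n for _ in range(n)]
    for r in range(n_rows):
        for c in range(n_cols):
            i = r * n_cols + c
            # Queen rule: the 8 surrounding cells (edges and corners) are neighbors.
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if (dr, dc) != (0, 0) and 0 <= rr < n_rows and 0 <= cc < n_cols:
                        w[i][rr * n_cols + cc] = 1.0
    # Row-standardize so that each region's neighbor weights sum to 1.
    for row in w:
        total = sum(row)
        if total > 0:
            for j in range(n):
                row[j] /= total
    return w


W = queen_weights(3, 3)  # hypothetical 3 x 3 grid of regions
```

In this toy grid the center cell has eight neighbors and a corner cell has three, so their non-zero weights differ after row-standardization.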
Comparison of Error Rate Prediction Methods in Classification Modeling with Classification and Regression Tree (CART) Methods for Balanced Data
Fitria Panca Ramadhani; Dodi Vionanda; Syafriandi Syafriandi; Admi Salma
UNP Journal of Statistics and Data Science Vol. 1 No. 4 (2023): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol1-iss4/73

Abstract

CART (Classification and Regression Tree) is one of the classification algorithms in the decision tree family. The model formed in CART is a tree consisting of root nodes, internal nodes, and terminal nodes. Once the model is formed, its accuracy must be assessed to see how well it performs. This can be done by estimating the model's prediction error rate. Error rate prediction methods work by dividing the data into training data and testing data. Three such methods are considered: Leave One Out Cross Validation (LOOCV), Hold Out (HO), and K-Fold Cross Validation. These methods divide the data into training and testing sets in different ways, so each has advantages and disadvantages. The three error rate prediction methods were therefore compared to determine which is appropriate for the CART algorithm. The comparison considers several factors, for instance variations in the mean, the number of variables, and correlations in normally distributed random data. The results are compared using boxplots, looking for the lowest median error rate and the lowest variance. The results of this study indicate that K-Fold Cross Validation has the lowest median error rate and the lowest variance, so it is the most suitable error rate prediction method for CART.
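The three ways of dividing data into training and testing sets can be sketched as index splitters. This is an illustrative sketch only: the function names, the 20% hold-out fraction, and the majority-class stand-in classifier are assumptions, and a CART implementation would take the place of `majority_fit` and `majority_predict`.

```python
import random


def hold_out(n, test_fraction=0.2, seed=0):
    """One random split into (train, test) index lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = max(1, round(n * test_fraction))
    return [(idx[n_test:], idx[:n_test])]


def k_fold(n, k=5, seed=0):
    """k splits; each observation lands in a test set exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    splits = []
    for i, test in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        splits.append((train, test))
    return splits


def leave_one_out(n):
    """LOOCV is k-fold with k equal to the number of observations."""
    return k_fold(n, k=n)


def error_rate(splits, fit, predict, X, y):
    """Pooled misclassification rate over all test observations."""
    errors = total = 0
    for train, test in splits:
        model = fit([X[i] for i in train], [y[i] for i in train])
        for i in test:
            errors += predict(model, X[i]) != y[i]
            total += 1
    return errors / total


# Stand-in classifier (NOT CART): always predicts the majority training class.
def majority_fit(X_train, y_train):
    return max(set(y_train), key=y_train.count)


def majority_predict(model, x):
    return model


X = [[v] for v in range(10)]  # hypothetical toy data
y = [0] * 10
rate = error_rate(k_fold(len(X), k=5), majority_fit, majority_predict, X, y)
```

The splitters make the cost difference visible: hold-out fits one model, k-fold fits k, and leave-one-out fits as many models as there are observations.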
Step Function Intervention Analysis Model to Estimate Number of Aircraft Passengers in Minangkabau International Airport
Velya Rahma Putri; Zilrahmi; Syafriandi Syafriandi; Dina Fitria
UNP Journal of Statistics and Data Science Vol. 1 No. 4 (2023): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol1-iss4/77

Abstract

The Covid-19 pandemic had a substantial impact on air transportation. Minangkabau International Airport (BIM) also felt this impact in the form of a drastic decrease in the number of airplane passengers, which constitutes an intervention event. A stable number of airplane passengers is needed to indicate a stable economy in the transportation sector. If an area has no passengers or flight activity, there is no inflow or outflow of the economic, industrial, tourism, and trade activities that support economic development. Forecasting is therefore needed so that the problems arising from the drastic decline can be addressed through new policies. In this study, forecasting was carried out to obtain an intervention model for forecasting the next 12 months and for predicting how long the effect of the intervention will last, in order to avoid further losses from the continued decline in passenger numbers. An intervention model is considered better than a SARIMA model for data that contain an intervention variable. The forecasting results showed that the SARIMA (0,1,1)(1,1,1)12 model with intervention orders b = 0, s = 8, r = 1 is the best model for forecasting data containing interventions. This is evidenced by its MAPE of 36.34%, on the basis of which the model is judged feasible to use, with reasonably high accuracy and forecasts close to the actual values.
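The abstract reports the intervention orders b, s, and r without stating the model equation. For orientation, the textbook form of a step-function intervention model is sketched below. The symbols $\omega_i$, $\delta_1$, $T$, and $N_t$ are notation assumed here rather than taken from the paper, and the authors' parameterization may differ.

```latex
\begin{align*}
Z_t &= \frac{\omega_s(B)\, B^{b}}{\delta_r(B)}\, S_t^{(T)} + N_t,
&
S_t^{(T)} &=
\begin{cases}
0, & t < T \\
1, & t \ge T
\end{cases}
\\[4pt]
% With the reported orders b = 0, s = 8, r = 1:
\omega_8(B) &= \omega_0 - \omega_1 B - \dots - \omega_8 B^{8},
&
\delta_1(B) &= 1 - \delta_1 B
\end{align*}
```

Here $B$ is the backshift operator, $T$ is the time at which the intervention begins, $b$ is the delay before its effect appears, and $N_t$ is the noise series, which in this case would follow the fitted SARIMA process.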
Sentiment Analysis of TikTok Application on Twitter using The Naïve Bayes Classifier Algorithm
Denia Putri Fajrina; Syafriandi Syafriandi; Nonong Amalita; Admi Salma
UNP Journal of Statistics and Data Science Vol. 1 No. 5 (2023): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol1-iss5/103

Abstract

TikTok is a popular social media platform that has gained a lot of attention lately. People of all ages use the application to share short videos with their friends and followers. The content on TikTok is diverse and can be tailored to individual preferences, but there have been concerns about vulgar content that can be viewed by minors, as there are no age restrictions. This has led to public scrutiny of the application on social media platforms such as Twitter. To address this issue, sentiment analysis was conducted on reviews of the TikTok application to help users make informed decisions about its use. The aim of the analysis was to determine whether people's opinions about TikTok were positive or negative. To achieve this, the researchers used the hashtag "TikTok Application". The results were classified into two categories, positive and negative, using the Naïve Bayes Classifier method. The analysis used 80% training data and 20% testing data and achieved an accuracy of 80.32%, with a recall of 97% and a precision of 78%. In general, positive feedback from Indonesians on the TikTok application relates to invitations to download the application, while negative feedback relates to content that is inappropriate for TikTok users to download. This information can help users make informed decisions about using the TikTok application.
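As a rough illustration of how a Naïve Bayes classifier assigns a sentiment label to text, the sketch below implements a multinomial variant with Laplace (add-one) smoothing on a few invented token lists. The function names, the smoothing choice, and the toy data are all assumptions made for illustration; this is not the study's preprocessing or model.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels):
    """docs: list of token lists; labels: the class label of each document."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
    vocab = {w for counts in word_counts.values() for w in counts}
    return class_counts, word_counts, vocab


def predict_nb(model, tokens):
    """Return the class with the highest log posterior score."""
    class_counts, word_counts, vocab = model
    n_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_label in class_counts.items():
        total = sum(word_counts[label].values())
        score = math.log(n_label / n_docs)  # log prior
        for w in tokens:
            if w in vocab:  # words never seen in training are ignored
                # Add-one smoothing avoids zero probabilities.
                score += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented toy data, for illustration only.
docs = [["good", "app"], ["love", "it"], ["bad", "content"], ["bad", "app"]]
labels = ["positive", "positive", "negative", "negative"]
model = train_nb(docs, labels)
prediction = predict_nb(model, ["good", "app"])
```

Working in log space keeps the product of many small word probabilities from underflowing.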
Fuzzy Geographically Weighted Clustering Analysis for Sectoral Potential Gross Regional Domestic Product in West Sumatera
Syifa Nabilah Wandira; Zilrahmi; Syafriandi Syafriandi; Fadhilah Fitri
UNP Journal of Statistics and Data Science Vol. 1 No. 5 (2023): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol1-iss5/109

Abstract

Gross Regional Domestic Product (GRDP) is the sum of the added value of all goods and services produced in an area as a result of various economic activities in a certain period. Each region has its own advantages and potential, for example in particular sectors or business fields. GRDP inequality arises from differences in geographical conditions and natural resources between regions. One method for examining this inequality is cluster analysis, which groups data objects with similar characteristics so that the inequality can be seen from the clusters formed. Fuzzy Geographically Weighted Clustering is a clustering method based on fuzzy logic that applies a geographic effect to each cluster so that the clusters better describe the actual situation. The research obtained 3 optimum clusters with different characteristics. Cluster 1 has high potential, cluster 2 has low potential, and cluster 3 has medium potential in forming GRDP.
Comparison of Error Rate Prediction in CART for Imbalanced Data
Lifia Zullani; Dodi Vionanda; Syafriandi Syafriandi; Dina Fitria
UNP Journal of Statistics and Data Science Vol. 1 No. 5 (2023): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol1-iss5/117

Abstract

CART is one of the tree-based classification algorithms. A CART model is a tree consisting of root nodes, internal nodes, and terminal nodes. The accuracy of a CART model can be assessed by measuring its prediction errors. One common method for predicting error rates is cross-validation. There are three cross-validation algorithms, namely leave one out, hold out, and k-fold cross-validation. These methods divide the data into training and testing sets in different ways, so each has advantages and disadvantages. Hold out cannot guarantee that the training set represents the entire dataset. Leave one out is very time-consuming and computationally demanding because it trains the model as many times as there are data points. K-fold requires longer computation time because the training algorithm must be run k times. In practice, the data encountered are often imbalanced, meaning that the classes contain different numbers of observations. In CART, imbalanced data affect the prediction results. This research compares error rate prediction methods for the CART model with imbalanced data. The study uses three types of data, univariate, bivariate, and multivariate, generated with different population means and different correlations between independent variables. The results indicate that k-fold is the most suitable error rate prediction algorithm for CART with imbalanced data.
Bitcoin Price Prediction Using Support Vector Regression
Wulan Septya Zulmawati; Nonong Amalita; Syafriandi Syafriandi; Admi Salma
UNP Journal of Statistics and Data Science Vol. 1 No. 5 (2023): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol1-iss5/121

Abstract

Cryptocurrency offers the highest returns compared to other investment instruments, which attracts many novice traders to crypto as a way to make significant profits in the short term. One of the most widely used cryptocurrencies is Bitcoin. Trading is closely tied to technical analysis, and the variety of technical analysis techniques makes it difficult for beginner traders to choose the right one. Machine learning methods for prediction can be an alternative that helps beginner traders overcome this barrier in the crypto market. One machine learning method for prediction is Support Vector Regression (SVR). With parameters chosen by the grid search algorithm, this method achieves a good predictive accuracy of 99.25% and a MAPE of 0.1206%.
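The mean absolute percentage error (MAPE) quoted above averages the absolute forecast errors relative to the actual values. A minimal sketch of the standard definition follows; the function name and the example prices are hypothetical and are not data from the study.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Assumes both sequences have the same length and no actual value is zero.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and equal in length")
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical prices, for illustration only.
example = mape([100.0, 200.0], [110.0, 180.0])
```

Because each error is divided by the actual value, MAPE is scale-free, but it is undefined when an actual value is zero.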
Comparison of the C5.0 Algorithm and the CART Algorithm in Stroke Classification
Indah Lestari; Dina Fitria; Syafriandi Syafriandi; Admi Salma
UNP Journal of Statistics and Data Science Vol. 2 No. 1 (2024): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol2-iss1/144

Abstract

The C5.0 and CART algorithms are similar in speed and in their handling of categorical and numeric data. They differ in that CART is binary and handles categorical, numerical, and continuous response variables, producing both classification and regression trees, whereas C5.0 is non-binary and handles categorical response variables, producing classification trees. This research classifies Kaggle's Stroke Prediction Dataset to find the variables that most influence stroke risk and to compare the classification accuracy of the two algorithms. The results show that CART has higher accuracy and precision than C5.0 but lower recall. For CART and C5.0 respectively, accuracy is 77.9% and 77.5%, precision is 89.5% and 83.2%, and recall is 67% and 71.4%. Overall, it can be concluded that there is no difference in classification between the two algorithms. In addition, CART identified three variables as most influential on stroke risk: age, BMI, and average blood glucose level. C5.0 identified only two: age and average blood glucose level.
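Accuracy, precision, and recall are all derived from the four cells of a binary confusion matrix, which is why a model can lead on two of them and trail on the third. The sketch below uses the standard definitions; the function name and the counts are hypothetical and are not the study's confusion matrix.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, and recall from binary confusion-matrix counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,   # share of all predictions that are correct
        "precision": tp / (tp + fp),     # share of predicted positives that are correct
        "recall": tp / (tp + fn),        # share of actual positives that are found
    }


# Hypothetical counts, for illustration only.
metrics = classification_metrics(tp=8, fp=2, fn=1, tn=9)
```

A model that predicts the positive class more cautiously tends to gain precision and lose recall, which matches the pattern of trade-off described above.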
Sentiment Analysis about Anti-LGBT Campaign using the Naïve Bayes Classifier
rios; Syafriandi Syafriandi; Dony Permana; Dina Fitria
UNP Journal of Statistics and Data Science Vol. 2 No. 1 (2024): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol2-iss1/146

Abstract

Social media keeps growing, so news under discussion spreads quickly to everyone. One topic currently discussed on social media is the anti-LGBT campaign. The conversation about the campaign is expressed as opinions carrying positive and negative sentiment, conveyed through Twitter, a microblogging social media site that lets users create short messages and share them easily and quickly. Sentiment analysis of these opinions helps show whether they support or reject the anti-LGBT campaign. The algorithm used for the sentiment analysis is the Naïve Bayes Classifier. The purpose of this study is to determine the sentiment of anti-LGBT campaign tweets on Twitter. The study uses Python as its tool. The dataset consists of 3103 tweets, split into 80% training data and 20% test data. The results show that Twitter users in Indonesia hold more positive opinions. The Naïve Bayes Classifier achieves an accuracy of 68.75%, precision of 99.6%, and recall of 92.8%.
Sentiment Analysis of DANA Application Reviews on Google Play Store Using Naïve Bayes Classifier Algorithm Based on Information Gain
Cindy Caterine Yolanda; Syafriandi Syafriandi; Yenni Kurniawati; Dina Fitria
UNP Journal of Statistics and Data Science Vol. 2 No. 1 (2024): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol2-iss1/147

Abstract

DANA is a digital payment platform that provides various features to make it easier for users to make payments, transfers, and balance top-ups online. DANA application users provide a variety of reviews, including both constructive and critical opinions, which can be valuable input for the application's developers. The purpose of this research is to evaluate sentiment classification of DANA user reviews on the Google Play Store using the Naïve Bayes Classifier method with Information Gain (IG) feature selection. The study also assesses the effect of IG feature selection on the performance of the resulting model. Reviews are divided into two categories, positive and negative, based on lexicon-based labeling. Data weighting, feature selection, and an 80% train and 20% test split are then carried out before model building. Two models are compared: a model without feature selection (NBC model) and a model with feature selection (NBC-IG model). The evaluation showed that the NBC model with 1106 features performed well, with 82.91% accuracy, 83.96% precision, and 90.23% recall. The NBC-IG model with 536 features performed better, with 85.09% accuracy, 85.79% precision, and 92.09% recall. Applying IG feature selection with an IG threshold of > 0.01 reduced the number of features by 570 and improved model performance, raising accuracy by 2.18 percentage points, precision by 1.83 percentage points, and recall by 1.86 percentage points.
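Information Gain scores a feature by how much knowing it reduces the entropy of the class labels, and features scoring above a threshold are kept. The sketch below computes it for binary term-presence features on invented data. The function names, the base-2 logarithm, and the toy features are assumptions; only the 0.01 threshold comes from the abstract.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_present, labels):
    """Entropy reduction from splitting the labels on a binary feature."""
    n = len(labels)
    gain = entropy(labels)
    for value in (True, False):
        subset = [y for f, y in zip(feature_present, labels) if f == value]
        if subset:
            gain -= len(subset) / n * entropy(subset)
    return gain


# Invented toy data: whether each term occurs in each of four reviews.
labels = ["pos", "pos", "neg", "neg"]
features = {
    "helpful": [True, True, False, False],   # separates the classes perfectly
    "app": [True, False, True, False],       # carries no class information
}
selected = [t for t, present in features.items()
            if information_gain(present, labels) > 0.01]
```

With this toy data only the term that separates the two classes survives the threshold, which is the mechanism by which feature selection shrinks the vocabulary.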