Journal : UNP Journal of Statistics and Data Science

Comparison K-Means and Fuzzy C-Means Methods to Grouping Human Development Index Indicators in Indonesia
Belia Mailien; Admi Salma; Syafriandi; Dina Fitria
UNP Journal of Statistics and Data Science Vol. 1 No. 1 (2023): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol1-iss1/4

Abstract

The Human Development Index (HDI) is an important indicator of the success of efforts to improve people's quality of life. The increase in Indonesia's HDI has not been accompanied by an even distribution of the HDI across its districts/cities. To help the government make policies and plans that address the uneven HDI, districts/cities in Indonesia need to be grouped based on HDI indicators. This study applies the K-Means and Fuzzy C-Means algorithms with 4 clusters. The grouping results show that most districts/cities on Papua Island have low HDI indicators. Under both K-Means and Fuzzy C-Means, the medium category is the same and is spread across all major islands in Indonesia, and the Nusa Tenggara Islands generally fall in the medium category. Under both methods, the high category is found mostly on the islands of Sumatra, Java, Kalimantan, and Sulawesi, and the very high category is found in several provincial capitals and in big cities in Indonesia. The S_DBW index and C_index values of the Fuzzy C-Means method, 2.312 and 0.105 respectively, are smaller than those of the K-Means method.
Adding Exogenous Variable in Forming ARIMAX Model to Predict Export Load Goods in Tanjung Priok Port
Elvina Catria; Atus Amadi Putra; Dony Permana; Dina Fitria
UNP Journal of Statistics and Data Science Vol. 1 No. 1 (2023): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol1-iss1/10

Abstract

The Indonesian Government has launched its main world maritime idea through the development of inter-island connectivity, namely a logistics distribution system that uses cargo ships on scheduled routes. However, the number of ships used for loading and unloading activities at Tanjung Priok in 2020 was 11,876 units, a decrease of 12.6% from the previous year, and this figure was not sufficient to transport Indonesia's loaded and unloaded goods (exports). This matters because if sea transportation, especially sea toll transportation, cannot reach all regions, freight transportation in some areas will be limited and regional economic growth cannot be distributed evenly. The purpose of this study is to predict the number of goods loaded (exported) at the Port of Tanjung Priok by establishing an export forecasting model. The exogenous variable is the Indonesian Wholesale Price Index. The data analysis yielded an ARIMA (5,1,1) order, which was used to estimate the ARIMAX model. The ARIMAX (5,1,1) model has an accuracy rate of 13.25%, which makes it quite feasible for predicting total export cargo for the period January 2021 to December 2021. The forecasting results show better fluctuations than in 2020.
Grouping The Districts in Sumatera Region Based on Economic Development Indicators Using K-Medoids and CLARA Methods
Retsya Lapiza; Syafriandi; Nonong Amalita; Dina Fitria
UNP Journal of Statistics and Data Science Vol. 1 No. 1 (2023): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol1-iss1/13

Abstract

Inequality in economic development is a problem that developing countries often face. In Indonesia, one area that has not yet experienced an equal distribution of economic development is the regencies/cities of the Sumatera Region. This study aims to determine regional groups and to compare the grouping results of the K-Medoids and CLARA methods. K-Medoids and CLARA are non-hierarchical methods that are robust to outliers. The best method is selected by comparing silhouette coefficients. The results show that, for both K-Medoids and CLARA, forming 2 groups is better than forming 3 groups. The K-Medoids method placed 59 districts/cities in cluster 1 and 95 districts/cities in cluster 2. The CLARA method with 2 groups placed 74 districts/cities in cluster 1 and 80 districts/cities in cluster 2. The silhouette coefficients of the K-Medoids and CLARA methods are 0.13 and 0.15 respectively. Therefore, the CLARA method with 2 groups gave the better cluster results.
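
The silhouette comparison described in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names, the toy one-dimensional points, and the choice of Manhattan distance are all assumptions made for illustration. For each point, a is the mean distance to the other members of its own cluster, b is the smallest mean distance to any other cluster, and the score is (b - a) / max(a, b).

```python
def manhattan(p, q):
    """Manhattan distance between two equal-length tuples (illustrative choice)."""
    return sum(abs(x - y) for x, y in zip(p, q))


def mean_silhouette(points, labels, dist=manhattan):
    """Mean silhouette coefficient over all points for a given labelling."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for i, point in enumerate(points):
        own = clusters[labels[i]]
        if len(own) == 1:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(dist(point, points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist(point, points[j]) for j in members) / len(members)
            for lab, members in clusters.items()
            if lab != labels[i]
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: two well-separated groups on a line.
toy_points = [(0.0,), (1.0,), (10.0,), (11.0,)]
toy_labels = [0, 0, 1, 1]
toy_score = mean_silhouette(toy_points, toy_labels)
```

A value near 1 indicates compact, well-separated clusters, while values near 0 (such as the 0.13 and 0.15 reported above) indicate overlapping clusters.
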
Self Organizing Maps Method for Grouping Provinces in Indonesia Based on the Landslide Impact
Suwanda Risky; Syafriandi; Dony Permana; Dina Fitria
UNP Journal of Statistics and Data Science Vol. 1 No. 3 (2023): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol1-iss3/15

Abstract

Indonesia is a disaster-prone country due to its climatic, soil, hydrological, geological, and geomorphological conditions. A disaster is an event or chain of events that threatens and disrupts people's lives and livelihoods. A natural disaster is a disaster caused by a natural event or series of events, such as a landslide. The number of landslide events in Indonesia varies from province to province because each province has different characteristics, so the impact of landslides also differs. Therefore, grouping and profiling are needed to identify which provinces are most affected by landslide disasters. This study uses the Self Organizing Maps method for the grouping. Three clusters are formed, based on the optimal values of internal cluster validation (Dunn, Connectivity, and Silhouette Index). Cluster 1 consists of 31 provinces, where the average impact of landslides is small. Cluster 2 consists of 2 provinces, where 4 impacts are dominantly larger. Cluster 3 consists of 1 province, where 1 impact is dominantly larger. It can be concluded that most provinces in Indonesia experience a relatively small impact from landslide disasters. However, some provinces experience a very large impact, namely West Java, Central Java, and East Java.
Vector Error Correction Model for Cointegration Analysis of Factors Affecting Indonesia's Economic Growth during the Pandemic Period
Rizqa Fajriaty Fitri MY; Dina Fitria; Syafriandi Syafriandi; Zilrahmi
UNP Journal of Statistics and Data Science Vol. 1 No. 3 (2023): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol1-iss3/40

Abstract

Stable economic growth is the ultimate goal of monetary policy, as seen from the stability of the rupiah. The economic situation has declined due to the spread of Covid-19. In an effort to stabilize the economy, the relationship between factors supporting Indonesia's economic growth is analyzed using the VECM approach. This approach can determine the long-term and short-term relationships in time series data. After passing several tests, the model yields three significant equations. The model shows a short-term effect of the inflation and BI Rate variables on inflation, as well as an inverse effect of the BI-rate one period earlier on the exchange rate. The negative cointegration coefficient indicates that a short-term to long-term adjustment mechanism occurs in the inflation variable. The two long-term cointegration equations show that inflation can be positively influenced by the visa variable, and that the BI-rate is influenced by the exchange rate and visa variables. The VECM model can explain more than 50% of the variables.
Comparison of Distance Function in K-Nearest Neighbor Algorithm to Predict Prospective Customers in Term Deposit Subscriptions
Muhammad Tibri Syofyan; Nonong Amalita; Dodi Vionanda; Dina Fitria
UNP Journal of Statistics and Data Science Vol. 1 No. 3 (2023): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol1-iss3/47

Abstract

Data mining is often used to analyze big data and obtain new, useful information for future use. One of the best algorithms in data mining is K-Nearest Neighbor (K-NN). The K-NN classifier is a distance-based classification algorithm. The distance function is a core component that measures the distance or similarity between the tested data and the training data. Because many distance functions exist, determining the best one for the performance of the K-NN classifier is a recurring problem in the literature. This study aims to compare which distance function produces the best K-NN performance. The distance functions compared are the Manhattan distance and the Minkowski distance. The K-NN classifier is applied to a bank dataset to predict prospective customers for term deposit subscriptions. The study shows that the Minkowski distance achieved a better result than the Manhattan distance. Minkowski distance with power p = 1.5 produces an accuracy rate of 88.40% when the K value is 7. Thus, the K-NN algorithm using Minkowski distance (p = 1.5, K = 7) performs best in predicting prospective customers for term deposit subscriptions.
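
The two distance functions compared in this abstract are closely related: the Manhattan distance is the Minkowski distance with p = 1. The sketch below is only an illustration of that relationship, not the study's implementation. The function name and the sample vectors are hypothetical.

```python
def minkowski_distance(a, b, p):
    """Minkowski distance of order p between two equal-length numeric sequences."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    if p < 1:
        raise ValueError("p must be >= 1 for the result to be a metric")
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


# Hypothetical feature vectors, chosen only to make the arithmetic easy to follow.
u = [0.0, 0.0]
v = [3.0, 4.0]

manhattan = minkowski_distance(u, v, 1)    # p = 1 gives the Manhattan distance
p_one_half = minkowski_distance(u, v, 1.5) # the order reported in the abstract
euclidean = minkowski_distance(u, v, 2)    # p = 2 gives the Euclidean distance
```

For the same pair of points the distance shrinks as p grows, so p = 1.5 sits between the Manhattan and Euclidean values.
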
Geographically Weighted Panel Regression for Modeling The Percentage of Poor Population in West Sumatra
Jimmi Darma putra; Dina Fitria; Dodi Vionanda; Admi Salma
UNP Journal of Statistics and Data Science Vol. 1 No. 3 (2023): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol1-iss3/64

Abstract

The Geographically Weighted Panel Regression (GWPR) model applies panel regression to spatial data, and its parameters are estimated using a spatial weight at each observation point. The purpose of this study is to determine the GWPR model and the factors that influence the percentage of poor people in each district/city in West Sumatra Province from 2015 to 2021. The adaptive bisquare kernel function was used to provide the spatial weighting, and the Cross-Validation (CV) criterion was used to identify the optimal bandwidth. The research data are secondary data sourced from the official website and from the published Sumatera Barat Dalam Angka books for 2015 to 2021. The GWPR model combines the GWR model with the FEM panel data regression model. The results show that the models and the factors affecting the percentage of poor people differ across the 19 districts/cities of West Sumatra.
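
As a rough illustration of the spatial weighting mentioned above, the bisquare kernel assigns weight (1 - (d/h)^2)^2 to a location at distance d when d is less than the bandwidth h, and weight 0 otherwise. In the adaptive variant the bandwidth differs per regression point, commonly taken as the distance to its k-th nearest neighbour. The sketch below assumes that convention. The function names and distances are hypothetical and are not taken from the study.

```python
def bisquare_weight(distance, bandwidth):
    """Bisquare kernel weight for one location at the given distance."""
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    if distance >= bandwidth:
        return 0.0
    return (1.0 - (distance / bandwidth) ** 2) ** 2


def adaptive_bandwidth(distances, k):
    """Distance to the k-th nearest location (assumed adaptive-bandwidth rule)."""
    if not 1 <= k <= len(distances):
        raise ValueError("k must be between 1 and the number of locations")
    return sorted(distances)[k - 1]


# Hypothetical distances (in km) from one regression point to five other locations.
distances_km = [12.0, 30.0, 45.0, 80.0, 150.0]
h = adaptive_bandwidth(distances_km, k=4)            # bandwidth for this point
weights = [bisquare_weight(d, h) for d in distances_km]
```

Nearby locations receive weights close to 1, and any location at or beyond the bandwidth receives weight 0, so each regression point is fitted mostly from its neighbours.
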
Sentiment Analysis of Electric Cars Using Naive Bayes Classifier Method
NURUL AFIFAH; Dony Permana; Dodi Vionanda; Dina Fitria
UNP Journal of Statistics and Data Science Vol. 1 No. 4 (2023): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol1-iss4/68

Abstract

In recent years, electric cars have become increasingly popular as an environmentally friendly alternative in the automotive industry. These vehicles use electric power as an energy source, which can reduce reliance on fossil fuels and contribute to efforts to minimize greenhouse gas emissions and air pollution. However, the presence of electric cars raises pro and con opinions from the public. The conversation about electric cars has become one of the hot topics on social media. Twitter is a microblogging social media platform that lets its users create short messages and share them easily and quickly. These opinions call for sentiment analysis. The purpose of sentiment analysis is to find out whether people's perceptions and opinions of electric cars lean in a favorable or unfavorable direction. Thus, sentiment analysis can help companies with marketing strategies and better business decisions. The opinions are classified into positive and negative categories. This study uses the Naive Bayes classifier method to classify positive and negative sentiment towards electric cars on Twitter. The accuracy of Naive Bayes, obtained from a confusion matrix, is 77.8% with a 70%:30% dataset split.
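
To show how an accuracy figure is read off a confusion matrix, here is a minimal sketch. The labels and predictions are made up for illustration and do not reproduce the study's data or its 77.8% result. The function names are hypothetical.

```python
from collections import Counter


def confusion_counts(actual, predicted, positive="positive"):
    """Count true/false positives and negatives for a two-class problem."""
    counts = Counter(tp=0, fp=0, fn=0, tn=0)
    for a, p in zip(actual, predicted):
        if a == positive:
            counts["tp" if p == positive else "fn"] += 1
        else:
            counts["fp" if p == positive else "tn"] += 1
    return counts


def accuracy(counts):
    """Share of correctly classified items: (TP + TN) / total."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no observations")
    return (counts["tp"] + counts["tn"]) / total


# Hypothetical test-set labels and classifier outputs.
actual = ["positive", "positive", "negative", "negative", "positive", "negative"]
predicted = ["positive", "negative", "negative", "positive", "positive", "negative"]

cm = confusion_counts(actual, predicted)
acc = accuracy(cm)
```

With a 70%:30% split, counts like these would be computed on the 30% of items held back for testing.
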
Step Function Intervention Analysis Model to Estimate Number of Aircraft Passengers in Minangkabau International Airport
Velya Rahma Putri; Zilrahmi; Syafriandi Syafriandi; Dina Fitria
UNP Journal of Statistics and Data Science Vol. 1 No. 4 (2023): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol1-iss4/77

Abstract

The Covid-19 pandemic had a considerable impact on air transportation. Minangkabau International Airport (BIM) has also felt this impact in the form of a drastic decrease in the number of airplane passengers, that is, an intervention event. A stable number of airplane passengers is needed to indicate a stable economy in the transportation sector. If an area has no passengers or flight activity, there is no inflow or outflow of the economic, industrial, tourism, and trade activities that support economic development. Forecasting is therefore needed so that the problems arising from the drastic decline can be addressed with new policies. This study builds an intervention model to forecast the next 12 months and to predict how long the effect of the intervention will last, in order to avoid further losses from a continued decline in passenger numbers. For data that contain an intervention variable, the intervention model is considered better than SARIMA models. The forecasting results show that the SARIMA (0,1,1)(1,1,1)12 model with b = 0, s = 8, r = 1 is the best model for forecasting data containing interventions. This is evidenced by the small MAPE of 36.34%, so the model is feasible to use because its accuracy is quite high and close to the actual values.
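
The MAPE quoted in this abstract is the mean absolute percentage error: the average of |actual - forecast| / |actual|, expressed as a percentage. The following sketch shows the calculation on made-up passenger counts. The function name and the numbers are hypothetical and unrelated to the study's data.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("series must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


# Hypothetical monthly passenger counts and their forecasts.
observed = [100.0, 200.0, 400.0]
predicted_values = [110.0, 180.0, 400.0]
error_pct = mape(observed, predicted_values)
```

A lower MAPE means the forecasts sit closer to the observed values.
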
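Perbandingan Metode Prediksi Laju Galat dalam Pemodelan Klasifikasi Algoritma C4.5 untuk Data Tidak Seimbang (Comparison of Error Rate Prediction Methods in C4.5 Algorithm Classification Modeling for Imbalanced Data)
Yunistika Ilanda; Dodi Vionanda; Yenni Kurniawati; Dina Fitria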
UNP Journal of Statistics and Data Science Vol. 1 No. 4 (2023): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol1-iss4/89

Abstract

A classification model can be built using the C4.5 algorithm. The prediction accuracy of such a model must be assessed using an error rate prediction method. Imbalanced data increase the classification error of the C4.5 algorithm, because the predictions do not represent the entire data, and worsen the performance of the error rate prediction method. Data with different correlations are also examined to find out whether correlation affects the performance of the error rate prediction method. The purpose of the research is to find the most suitable error rate prediction method for the C4.5 algorithm on imbalanced data and to assess the influence of different correlations. The results show that K-Fold cross-validation (K-Fold CV) is more suitable for the C4.5 algorithm on imbalanced data than the hold-out (HO) and leave-one-out cross-validation (LOOCV) methods. In addition, high correlation can worsen the performance of error rate prediction methods.
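
The three error rate prediction methods named above differ only in how they split the data into training and test sets. The sketch below generates the index splits for each one. It is an illustration under simplifying assumptions (contiguous folds, no shuffling or stratification, hypothetical function names), not the procedure used in the study. In each case the error rate is the fraction of misclassified test observations, averaged over the splits.

```python
def k_fold_indices(n, k):
    """Yield (train, test) index lists for k contiguous folds of near-equal size."""
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and n")
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


def loocv_indices(n):
    """Leave-one-out CV is K-Fold CV with one observation per fold (k = n)."""
    return k_fold_indices(n, n)


def hold_out_indices(n, test_fraction=0.3):
    """Single split: the last share of the observations is held out for testing."""
    n_test = max(1, round(n * test_fraction))
    split = n - n_test
    return list(range(split)), list(range(split, n))


folds = list(k_fold_indices(10, 3))      # three train/test splits
loo = list(loocv_indices(5))             # five splits, one test item each
ho_train, ho_test = hold_out_indices(10) # one 70/30 split
```

For imbalanced data, the indices would normally be shuffled and the folds stratified by class before splitting. That step is omitted here to keep the sketch short.
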