Articles

Found 39 Documents
Comparison K-Means and Fuzzy C-Means Methods to Grouping Human Development Index Indicators in Indonesia
Belia Mailien; Admi Salma; Syafriandi; Dina Fitria
UNP Journal of Statistics and Data Science Vol. 1 No. 1 (2023): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol1-iss1/4

Abstract

The Human Development Index (HDI) is an important indicator for measuring the success of efforts to improve people's quality of life. The increase in Indonesia's HDI has not been accompanied by an even distribution of HDI across its districts/cities. To help the government make policies and plans that address this unevenness, districts/cities in Indonesia need to be grouped based on HDI indicators. This study applies the K-Means and Fuzzy C-Means algorithms with 4 clusters. The grouping results show that most districts/cities on Papua Island have low HDI indicators. Both methods give the same result for the medium category: it is spread across all major islands in Indonesia, and the Nusa Tenggara Islands generally fall into this category. In both methods, the high category is found mostly on the islands of Sumatra, Java, Kalimantan, and Sulawesi, and the very high category is found in the provincial capitals of several provinces and in big cities. The S_Dbw index and C-index of the Fuzzy C-Means method, 2.312 and 0.105, are smaller than those of the K-Means method.

Comparison of the Performance of the K-Means and K-Medoids Algorithms in Grouping Regencies/Cities in Sumatera Based on Poverty Indicators
Mardhiatul Azmi; Atus Amadi Putra; Dodi Vionanda; Admi Salma
UNP Journal of Statistics and Data Science Vol. 1 No. 2 (2023): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol1-iss2/25

Abstract

K-Means is a non-hierarchical clustering method that separates data into a number of groups according to how far each object is from the closest centroid. K-Medoids is a non-hierarchical clustering method that separates data into a number of groups according to how far each object is from the closest medoid. The two methods were tested on 2021 poverty data for Sumatra, a year in which the Covid-19 outbreak had raised the poverty rate above that of the year before. This is applied research that begins with a study of the relevant theories. The data are secondary data on poverty indicators from the BPS website. The study aims to determine regional groups and to compare the groupings produced by the K-Means and K-Medoids methods. The better-performing method is the one with the lower Davies-Bouldin Index (DBI). K-Means placed 34 districts/cities in cluster 1, 52 in cluster 2, 23 in cluster 3, and 45 in cluster 4, while K-Medoids placed 53, 40, 37, and 24 districts/cities in clusters 1, 2, 3, and 4, respectively. The resulting DBI is 1.584 for K-Means and 2.359 for K-Medoids. Because its DBI is smaller, K-Means performs better than K-Medoids on these data.
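
The comparison criterion in the abstract above, the Davies-Bouldin Index, can be illustrated with a minimal sketch. It assumes Euclidean distance and takes each cluster's scatter as the mean distance of its points to the cluster centroid, which is one common definition; the abstract does not say which variant or software the authors used. The function names and the small data sets are hypothetical and only show that a lower DBI corresponds to more compact, better-separated clusters.

```python
import math


def centroid(points):
    """Coordinate-wise mean of a list of points (returned as a tuple)."""
    n = len(points)
    dims = len(points[0])
    return tuple(sum(p[d] for p in points) / n for d in range(dims))


def davies_bouldin(clusters):
    """Davies-Bouldin Index for a list of clusters (each a list of point tuples).

    Assumption: scatter is the mean Euclidean distance to the centroid.
    Lower values indicate compact clusters that lie far apart.
    """
    centers = [centroid(pts) for pts in clusters]
    scatter = [
        sum(math.dist(p, c) for p in pts) / len(pts)
        for pts, c in zip(clusters, centers)
    ]
    k = len(clusters)
    worst_ratios = []
    for i in range(k):
        # For cluster i, keep the most similar (worst-separated) other cluster.
        worst_ratios.append(max(
            (scatter[i] + scatter[j]) / math.dist(centers[i], centers[j])
            for j in range(k) if j != i
        ))
    return sum(worst_ratios) / k


# Hypothetical data: the same four points grouped in two different ways.
well_separated = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 0.0), (10.0, 2.0)]]
poorly_separated = [[(0.0, 0.0), (10.0, 0.0)], [(0.0, 2.0), (10.0, 2.0)]]

print(davies_bouldin(well_separated))
print(davies_bouldin(poorly_separated))
```

Under this criterion, the grouping with the smaller printed value would be preferred, which mirrors the reasoning the abstract uses to prefer K-Means.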

Comparison of Naïve Bayes and K-Nearest Neighbor for DKI Jakarta Air Pollution Standard Index Classification
Nurdalia; Zilrahmi; Dony Permana; Admi Salma
UNP Journal of Statistics and Data Science Vol. 1 No. 2 (2023): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol1-iss2/29

Abstract

Data mining is the process of extracting useful knowledge and information from data using particular algorithms or methods. The data mining classification methods used in this study are Naïve Bayes and K-Nearest Neighbor. Both methods are used to classify the 2021 DKI Jakarta air pollution standard index based on six air pollutants: dust particles (PM10), dust particles (PM2.5), sulfur dioxide (SO2), carbon monoxide (CO), ozone (O3), and nitrogen dioxide (NO2). Accuracy in predicting the index was evaluated using values derived from the confusion matrix. Naïve Bayes performed better of the two methods: its sensitivity was high for all categories, including minority categories, and it showed good accuracy, sensitivity, and specificity even though the data are imbalanced.
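
The evaluation measures named in the abstract above (accuracy, sensitivity, specificity) can all be read off a confusion matrix. The sketch below is illustrative only: it assumes rows are true classes and columns are predicted classes, computes per-class values in a one-versus-rest fashion, and assumes every class appears at least once so no denominator is zero. The function names and the three-class matrix are hypothetical, not the study's data.

```python
def accuracy(confusion):
    """Share of all observations that lie on the diagonal."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[c][c] for c in range(len(confusion)))
    return correct / total


def per_class_metrics(confusion):
    """One-versus-rest sensitivity and specificity for each class.

    confusion[i][j] = number of observations of true class i predicted as j.
    """
    k = len(confusion)
    total = sum(sum(row) for row in confusion)
    metrics = []
    for c in range(k):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                         # true c, predicted other
        fp = sum(confusion[r][c] for r in range(k)) - tp    # other, predicted c
        tn = total - tp - fn - fp
        metrics.append({
            "sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp),
        })
    return metrics


# Hypothetical imbalanced example: class 2 has only 5 observations.
example = [
    [90, 5, 5],
    [4, 15, 1],
    [1, 1, 3],
]

print(accuracy(example))
for label, m in enumerate(per_class_metrics(example)):
    print(label, m)
```

With imbalanced data, overall accuracy can stay high while a minority class has low sensitivity, which is why the abstract reports sensitivity for every category rather than accuracy alone.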

Survey Training for Collecting Data of Nagari Tanjung Balik
Dina Fitria; Nonong Amalita; Syafriandi Syafriandi; Zilrahmi Zilrahmi; Admi Salma; Dodi Vionanda; Yenni Kurniawati
Pelita Eksakta Vol 6 No 1 (2023): Pelita Eksakta Vol. 6 No. 1
Publisher : Fakultas MIPA Universitas Negeri Padang

DOI: 10.24036/pelitaeksakta/vol6-iss1/202

Abstract

Collecting data is the initial stage of data processing, so it is necessary to make sure that the data collected are representative. Surveyors are one of the principal components of a survey. However, a nagari, as a small residential unit, lacks professional surveyors to carry out survey work. The Statistics Department, as a producer of statisticians, trained local residents of Nagari Tanjung Balik to collect their own data using the right method.

Rainfall Forecasting in Medan City Using Singular Spectrum Analysis (SSA)
Silvia Agustina; Fadhilah Fitri; Dodi Vionanda; Admi Salma
UNP Journal of Statistics and Data Science Vol. 1 No. 3 (2023): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol1-iss3/52

Abstract

Singular spectrum analysis (SSA) is a time series analysis method that can be used for data with seasonal effects, and rainfall is one example of such data. High rainfall contributes to natural disasters such as floods. Medan, the capital of North Sumatra province, has fairly high rainfall and lies in a lowland area, so it is prone to flooding. Rainfall forecasting can support disaster mitigation. The forecasting method used is SSA. The forecast has a MAPE of 15.5% and a tracking signal within tolerance limits, so the forecast can be considered good.

Modeling Human Development Index in Papua and West Sumatera with Multivariate Adaptive Regression Spline
Yulia Pertiwi; Dony Permana; Nonong Amalita; Admi Salma
UNP Journal of Statistics and Data Science Vol. 1 No. 3 (2023): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol1-iss3/54

Abstract

The Human Development Index (HDI) is an indicator of successful development in the quality of human life. A higher HDI indicates better development of a region. The purpose of this study is to model HDI and determine the factors that affect it in Papua Province and West Sumatera Province using Multivariate Adaptive Regression Spline (MARS). MARS is a modeling method that can handle high-dimensional data. The best MARS model for Papua Province is the combination BF=24, MI=2, and MO=0, with a minimum GCV value of 0.55953, while the best MARS model for West Sumatera Province is the combination BF=24, MI=2, and MO=0, with a minimum GCV value of 0.02697. Based on the models, the factors that significantly affect HDI in both provinces are average years of schooling (X2), adjusted per-capita income (X6), life expectancy (X1), percentage of poor people (X4), and gross regional domestic product (X3). The importance levels of these variables are 100%, 45.26%, 29.24%, 6.55%, and 6.27% for Papua Province and 100%, 96.73%, 57.54%, 34.13%, and 29.6% for West Sumatera Province, respectively. Average years of schooling (X2) is therefore the variable that most influences HDI in both regions, with an importance level of 100%.

Geographically Weighted Panel Regression for Modeling The Percentage of Poor Population in West Sumatra
Jimmi Darma putra; Dina Fitria; Dodi Vionanda; Admi Salma
UNP Journal of Statistics and Data Science Vol. 1 No. 3 (2023): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol1-iss3/64

Abstract

The Geographically Weighted Panel Regression (GWPR) model applies panel regression to spatial data, with parameters estimated using spatial weights at each observation point. The purpose of this study is to determine the GWPR model and the factors that influence the percentage of poor people in each district/city in West Sumatra Province from 2015 to 2021. An adaptive bisquare kernel function provides the spatial weighting, and the Cross-Validation (CV) criterion identifies the optimal bandwidth. The research data are secondary data sourced from the official website and from the Sumatera Barat Dalam Angka books published for 2015 to 2021. The GWPR model is created by combining the GWR model and the FEM panel data regression model. The results show that the models, and the factors affecting the percentage of poor people, differ among the 19 districts/cities of West Sumatra.

Comparison of the Chen and Singh's Fuzzy Time Series Methods in Forecasting Farmer Exchange Rates in Indonesia
Okia Dinda Kelana; Atus Amadi Putra; Nonong Amalita; Admi Salma
UNP Journal of Statistics and Data Science Vol. 1 No. 4 (2023): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol1-iss4/36

Abstract

The Chen and Singh fuzzy time series models are forecasting methods built on the basics of fuzzy logic. The models differ in their fuzzy logic relations: Chen's model uses Fuzzy Logical Relationship Groups, whereas Singh's model uses only Fuzzy Logical Relationships in the forecasting process. To find the better of the two models, both are used to forecast the farmers' exchange rate. The farmers' exchange rate is an option that observers of agricultural development use to assess the welfare of farmers in Indonesia. Because it changes every month, forecasting is needed to obtain an overview of the following month. This is applied research: the first step is to study and analyze the relevant theories, and the next is to collect the necessary data. The data are secondary data obtained online from the official website of Badan Pusat Statistik (BPS). The forecasts of the two models were compared using MAPE. The MAPE is 0.679% for Chen's model and 0.354% for Singh's model, so the better forecasting model for farmers' exchange rates in Indonesia is the Singh model.
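
The MAPE criterion used in the abstract above to rank the two models can be sketched as follows. This is a minimal illustration of the standard mean absolute percentage error, assuming no actual value is zero; the series and the two sets of forecasts are hypothetical placeholders, not the farmers' exchange rate data or the output of either fuzzy time series model.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent.

    Assumes both sequences have the same length and no actual value is zero.
    """
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    total = sum(abs((a - f) / a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)


# Hypothetical monthly values and forecasts from two hypothetical models.
actual = [100.0, 102.0, 101.0, 104.0]
forecast_model_a = [101.0, 101.0, 102.5, 103.0]
forecast_model_b = [100.5, 102.5, 101.0, 103.5]

scores = {
    "model_a": mape(actual, forecast_model_a),
    "model_b": mape(actual, forecast_model_b),
}
best = min(scores, key=scores.get)  # the smaller MAPE is the better forecast
print(scores, best)
```

The same function applies to any pair of forecasting methods evaluated on a common set of actual values.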

Comparison of Error Rate Prediction Methods in Classification Modeling with Classification and Regression Tree (CART) Methods for Balanced Data
Fitria Panca Ramadhani; Dodi Vionanda; Syafriandi Syafriandi; Admi Salma
UNP Journal of Statistics and Data Science Vol. 1 No. 4 (2023): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol1-iss4/73

Abstract

CART (Classification and Regression Tree) is one of the classification algorithms in the decision tree method. The model formed in CART is a tree consisting of root nodes, internal nodes, and terminal nodes. After the model is formed, its accuracy must be calculated in order to assess its performance. The accuracy can be determined by calculating the predicted error rate of the model. Error rate prediction works by dividing the data into training data and testing data. There are three error rate prediction methods: Leave One Out Cross Validation (LOOCV), Hold Out (HO), and K-Fold Cross Validation. These methods divide the data into training and testing data in different ways, so each has advantages and disadvantages. Therefore, the three methods were compared to determine which is appropriate for the CART algorithm. The comparison considers several factors, such as variations in the mean, the number of variables, and the correlations in normally distributed random data. The results are examined with boxplots, looking at the median error rate and the variance. The results indicate that K-Fold Cross Validation has the lowest median error rate and the lowest variance, so it is the most suitable error rate prediction method for CART.
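
The three ways of dividing data into training and testing sets that the abstract above compares can be sketched with plain index lists. This is an illustrative sketch, not the authors' procedure: the function names are hypothetical, the indices are split in order without shuffling so the result is deterministic (in practice the indices would usually be shuffled first), and the 30% test share in the hold-out example is an assumed value.

```python
def k_fold_splits(n, k):
    """Return k (train, test) index pairs; every index is tested exactly once."""
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    indices = list(range(n))
    base, extra = divmod(n, k)  # the first `extra` folds get one more item
    splits, start = [], 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        splits.append((train, test))
        start += size
    return splits


def loocv_splits(n):
    """Leave One Out Cross Validation is K-Fold with k equal to n."""
    return k_fold_splits(n, n)


def hold_out_split(n, test_fraction=0.3):
    """Single split: the last share of the indices is held out for testing."""
    n_test = round(n * test_fraction)
    if not 1 <= n_test < n:
        raise ValueError("test_fraction leaves an empty training or test set")
    indices = list(range(n))
    return indices[:n - n_test], indices[n - n_test:]


for train, test in k_fold_splits(10, 5):
    print("train", train, "test", test)
print(len(loocv_splits(10)), "LOOCV splits")
print(hold_out_split(10))
```

Hold-out fits the model once, K-Fold fits it k times, and LOOCV fits it n times, so the error rate estimate is an average over one, k, or n test sets respectively.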

Comparison of Fuzzy Time Series Markov Chain and Fuzzy Time Series Cheng to Predict Inflation in Indonesia
Ihsanul Fikri; Admi Salma; Dodi Vionanda; Zilrahmi
UNP Journal of Statistics and Data Science Vol. 1 No. 4 (2023): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol1-iss4/76

Abstract

Inflation is one of the main macroeconomic problems and a very important economic indicator. Unstable inflation has a negative impact on people's welfare, so controlling inflation is important for a country. Forecasting is needed to monitor future movements in the inflation rate. In this study, the Fuzzy Time Series Markov Chain and Fuzzy Time Series Cheng methods are compared in forecasting inflation. The advantage of the fuzzy time series method is that it has no special assumptions that must be met. The purpose of this study is to produce a forecast based on the comparison of the two methods. Based on MAPE, Fuzzy Time Series Markov Chain has the smaller value, 6.97%. The inflation forecasts for the next 5 periods using the Fuzzy Time Series Markov Chain method are 5.42, 5.71, 5.95, 5.82, and 6.10.