
Found 46 Documents
Journal : UNP Journal of Statistics and Data Science

Grouping Level of Poverty Based on District/City in Indonesia Using K-Harmonic Means
nabillah putri; Nonong Amalita; Dodi Vionanda; Dony Permana
UNP Journal of Statistics and Data Science Vol. 1 No. 3 (2023): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol1-iss3/60

Abstract

Indonesia still has a relatively high poverty rate, although nationally it has declined in recent years, and some areas are still experiencing rising poverty rates. Poverty alleviation plans can therefore no longer be uniform; they need to account for the conditions of each dimension that causes poverty in an area, so districts/cities in Indonesia need to be grouped by poverty. Grouping was performed using K-Harmonic Means analysis. K-Harmonic Means is a non-hierarchical clustering method that uses the harmonic average of the distances between each data point and the cluster centers. The data used in this research are secondary data sourced from BPS publications on poverty and inequality in 2022. The analysis consists of standardizing the data, conducting cluster analysis, and validating the clusters. Based on the results of the K-Harmonic Means analysis, the optimal number of clusters is two: the first cluster has 54 districts/cities and the second has 460, and the Dunn Index value for cluster validation is 0.03492. A better grouping of poverty levels by district/city in Indonesia is thus obtained using the K-Harmonic Means method with p = 2.25.
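The clustering step described in this abstract can be sketched as follows. This is a minimal illustration of the commonly cited K-Harmonic Means center update, not the authors' implementation: the function name `k_harmonic_means`, the fixed iteration count, the distance floor `eps`, and the toy points are all assumptions made for the example. Only the exponent p = 2.25 comes from the abstract.

```python
import math

def k_harmonic_means(points, centers, p=2.25, iters=50, eps=1e-8):
    """Refine cluster centers with the K-Harmonic Means update.

    points and centers are sequences of equal-length numeric tuples.
    eps floors distances so a point sitting on a center does not divide by zero.
    """
    centers = [tuple(c) for c in centers]
    dim = len(points[0])
    for _ in range(iters):
        num = [[0.0] * dim for _ in centers]   # weighted coordinate sums per center
        den = [0.0 for _ in centers]           # weight totals per center
        for x in points:
            d = [max(math.dist(x, c), eps) for c in centers]
            s = sum(dj ** -p for dj in d)
            for j, dj in enumerate(d):
                # Combined membership and point weight: d^-(p+2) / (sum_l d_l^-p)^2
                w = dj ** (-p - 2) / (s * s)
                den[j] += w
                for t in range(dim):
                    num[j][t] += w * x[t]
        centers = [tuple(v / dj for v in nj) for nj, dj in zip(num, den)]
    return centers

def assign(points, centers):
    """Label each point with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda j: math.dist(x, centers[j]))
            for x in points]

if __name__ == "__main__":
    # Hypothetical standardized one-dimensional data, for illustration only.
    pts = [(0.0,), (0.2,), (0.4,), (10.0,), (10.2,), (10.4,)]
    ctrs = k_harmonic_means(pts, [(1.0,), (9.0,)])
    print(ctrs, assign(pts, ctrs))
```

Unlike K-Means, every point contributes to every center with a distance-dependent weight, which is why the result tends to be less sensitive to the initial centers.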
Geographically Weighted Panel Regression Modeling on Human Development Index in West Sumatra
Amelia Fadila Rahman; Syafriandi Syafriandi; Nonong Amalita; Zilrahmi
UNP Journal of Statistics and Data Science Vol. 1 No. 3 (2023): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol1-iss3/63

Abstract

The Human Development Index (HDI) is an important issue bearing on human development and people's welfare in West Sumatra Province. The study addresses it by identifying the contributing components. Geographically Weighted Panel Regression (GWPR) is a technique that can be used to find influencing factors and explain the influence of the characteristics of the observed areas. GWPR combines panel data regression with GWR and is used when the data show spatial heterogeneity. The purpose of this study is to form a GWPR model for the HDI in the regencies/cities of West Sumatra from 2019 to 2022. The modeling uses the GWPR Fixed Effect Model. With a minimum CV of 0.000208, the weighting function used is a fixed exponential kernel. The findings show that the model obtained had an R² of 99.9%, meaning the predictor variables could account for this percentage of the variation in the model. The variables that have a significant effect on HDI are Life Expectancy, Expected Years of Schooling, Mean Years of Schooling, and Purchasing Power Parity.
Comparison of Queen Contiguity and Customized Weighting Matrices on Spatial Regression to Identify Factors Impacting Poverty in East Java
Gezi Fajri; Syafriandi Syafriandi; Nonong Amalita; Zamahsary Martha
UNP Journal of Statistics and Data Science Vol. 1 No. 3 (2023): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol1-iss3/67

Abstract

Poverty is a crucial problem that has a negative impact on all sectors, including economic, social, and cultural development, in East Java Province. Poverty can also increase unemployment and crime, trigger social disasters, and hinder the progress of East Java Province. One way to address poverty in East Java Province is to detect the factors that influence it, which can be done through statistical modeling. A statistical model that can identify the factors influencing poverty and explain the relationship between a region and its surrounding area is spatial regression analysis. Spatial regression analysis requires a spatial weighting matrix to represent the spatial influence between regions, where one region influences its neighboring regions. A frequently used spatial weighting matrix is queen contiguity. According to Anselin (1988:20), spatial weighting can also consider prior information, the purpose of the case studied, and the theory underlying the research; a weighting built in this way from social and economic variables of the case under study is called a customized weighting matrix. The results of this study show that the best combination of spatial regression model and spatial weighting is the General Spatial Model (GSM) with customized weighting, because customized weighting produces better estimation results than the SAR, SEM, and GSM models with queen contiguity weighting in modeling district and city poverty in East Java Province, with an Akaike Information Criterion (AIC) value of 188.77 and a coefficient of determination (R²) of 84.95%. School's Expected Time, Life Expectancy Score, and Employment Participation Rate are the factors that have a substantial impact on the percentage of the population living in poverty in East Java's districts and cities in 2021.
Comparison of the Chen and Singh's Fuzzy Time Series Methods in Forecasting Farmer Exchange Rates in Indonesia
Okia Dinda Kelana; Atus Amadi Putra; Nonong Amalita; Admi Salma
UNP Journal of Statistics and Data Science Vol. 1 No. 4 (2023): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol1-iss4/36

Abstract

Chen's and Singh's Fuzzy Time Series models are forecasting methods that use the basics of fuzzy logic. The models differ in their fuzzy logical relations: Chen's model uses Fuzzy Logical Relationship Groups, whereas Singh's model uses only Fuzzy Logical Relationships in the forecasting process. To find out which of the two models is better, both were used to forecast the Farmer's Exchange Rate. Farmers' exchange rates are one option for observers of agricultural development to assess the level of welfare of farmers in Indonesia. Because farmer exchange rates change every month, forecasting is needed to obtain an overview for the following month. This is applied research in which the initial step is to study and analyze the theories related to the research and then collect the necessary data. The data used are secondary data obtained online from the official website of Badan Pusat Statistik (BPS). The forecasting results of the two models were compared using MAPE. For farmers' exchange rates, Chen's fuzzy time series model obtained a MAPE of 0.679% and Singh's fuzzy time series model a MAPE of 0.354%. This means that the best forecasting model for farmer exchange rates in Indonesia is the Singh model.
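The MAPE comparison used in this abstract can be illustrated with a short sketch. The function name `mape` and the numbers in the usage example are hypothetical and are not the study's data; the formula is the standard mean absolute percentage error.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, returned in percent.

    Assumes no actual value is zero, since each error is divided by it.
    """
    if len(actual) != len(forecast) or not actual:
        raise ValueError("actual and forecast must be non-empty and of equal length")
    total = sum(abs((a - f) / a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)

if __name__ == "__main__":
    # Hypothetical monthly index values and two competing forecasts.
    actual = [100.0, 102.0, 101.0, 104.0]
    model_a = [101.0, 101.0, 102.0, 103.0]
    model_b = [100.5, 102.5, 101.0, 104.5]
    # The model with the lower MAPE is preferred.
    print(mape(actual, model_a), mape(actual, model_b))
```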
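Pemodelan Waktu Survival Pasien Tuberkulosis menggunakan Regresi Cox Proportional Hazard dengan Data Tersensor (Modeling the Survival Time of Tuberculosis Patients Using Cox Proportional Hazard Regression with Censored Data)
Elsa Oktaviani; Nonong Amalita; Atus Amadi Putra; Dony Permana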
UNP Journal of Statistics and Data Science Vol. 1 No. 4 (2023): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol1-iss4/65

Abstract

Tuberculosis is an infectious disease that needs to be watched out for in West Sumatra Province. In 2021, West Sumatra Province had the 12th highest number of TB cases in Indonesia, with a total of 8,216 cases, and a TB treatment cure rate that is still far from the target of the Indonesian Ministry of Health. The purpose of this study is to determine the Cox proportional hazard regression model and the factors that affect the survival time of tuberculosis patients at Dr. M. Djamil Padang Hospital. The survival period used is the time during which the patient undergoes TB treatment at RSUP Dr. M. Djamil Padang in 2021 until the patient is declared dead. The method used in the Cox Proportional Hazard Regression analysis is Maximum Partial Likelihood Estimation. Using the Cox proportional hazard regression model, the factors that influence the survival time of tuberculosis patients at RSUP Dr. M. Djamil are BMI, leukocytes, fever, shortness of breath, and decreased appetite.
Comparison of Error Rate Prediction Methods of C4.5 Algorithm for Balanced Data
Ichlas Djuazva; Dodi Vionanda; Nonong Amalita; Zilrahmi
UNP Journal of Statistics and Data Science Vol. 1 No. 4 (2023): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol1-iss4/74

Abstract

C4.5 is a highly effective decision tree algorithm for classification. Compared to CHAID, CART, and ID3, C4.5 generates the decision tree faster and is easier to understand. However, the C4.5 algorithm is not exempt from classification errors, which can affect the accuracy of the resulting model. Model accuracy can be measured by predicting the error rate. One commonly used method for error rate prediction is cross-validation, which divides the data into two parts: a training set to build the model and a testing set to test it. Several cross-validation techniques are commonly used to predict the error rate, such as Leave One Out Cross Validation (LOOCV), Hold Out (HO), and k-fold cross-validation. LOOCV gives an unbiased estimate but takes a long time, depending on the data size; HO can avoid overfitting and works faster; and k-fold cross-validation has a smaller error rate prediction. This study uses artificially generated data with a normal distribution, including univariate, bivariate, and multivariate datasets with various combinations of mean differences and correlations. Different correlation structures are applied to see their impact on the error rate prediction methods. Considering these factors, this research compares the three cross-validation methods for predicting error rates of the decision tree model generated by the C4.5 algorithm. The research found that k-fold cross-validation is the most suitable cross-validation method for testing the model generated by the C4.5 algorithm with balanced data.
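The three error rate prediction schemes named in this abstract differ only in how the data are split. The sketch below illustrates that under loud assumptions: it uses a simple midpoint-threshold rule as a stand-in classifier (it is not C4.5), one-dimensional synthetic normal data, and hypothetical function names. LOOCV is obtained as the special case k = n.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_midpoint_rule(train):
    """Stand-in classifier (not C4.5): threshold at the midpoint of the class means.

    Assumes the training rows (x, y) contain both classes 0 and 1.
    """
    xs0 = [x for x, y in train if y == 0]
    xs1 = [x for x, y in train if y == 1]
    m0, m1 = sum(xs0) / len(xs0), sum(xs1) / len(xs1)
    cut = (m0 + m1) / 2
    upper = 1 if m1 >= m0 else 0   # class predicted above the cut
    return lambda x: upper if x > cut else 1 - upper

def cv_error_rate(data, k, seed=0):
    """k-fold error rate; k == len(data) gives leave-one-out."""
    wrong = 0
    for fold in k_fold_indices(len(data), k, seed):
        held_out = set(fold)
        train = [row for i, row in enumerate(data) if i not in held_out]
        predict = fit_midpoint_rule(train)
        wrong += sum(predict(data[i][0]) != data[i][1] for i in fold)
    return wrong / len(data)

def hold_out_error_rate(data, test_fraction=0.3, seed=0):
    """Single random train/test split."""
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)
    n_test = max(1, int(len(data) * test_fraction))
    test, train = idx[:n_test], idx[n_test:]
    predict = fit_midpoint_rule([data[i] for i in train])
    return sum(predict(data[i][0]) != data[i][1] for i in test) / n_test

if __name__ == "__main__":
    rng = random.Random(42)
    # Balanced synthetic data: 50 draws per class from normals with different means.
    data = ([(rng.gauss(0, 1), 0) for _ in range(50)]
            + [(rng.gauss(2, 1), 1) for _ in range(50)])
    print("hold-out:", hold_out_error_rate(data))
    print("10-fold :", cv_error_rate(data, 10))
    print("LOOCV   :", cv_error_rate(data, len(data)))
```

Repeating such runs over many generated datasets and comparing the spread of the three estimates is one way a simulation study of this kind could be organized.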
Analysis of the Poverty Level Model for West Sumatra Province Using Geographically Weighted Binary Logistic Regression
april leniati; Dony Permana; Nonong Amalita; Zamahsary Martha
UNP Journal of Statistics and Data Science Vol. 1 No. 4 (2023): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol1-iss4/80

Abstract

West Sumatra Province ranked third lowest in poverty rate on the island of Sumatra in 2022, with a figure of 5.92%. Although this figure is lower than the national average, the Province of West Sumatra is targeting a reduction in the poverty rate to 5.62% in 2024 in the vision of the 2021–2026 Regional Development Plan. The purpose of this study is to analyze the factors that contribute to the poverty rate in West Sumatra Province based on geography in 2022. The method used to address poverty problems is Geographically Weighted Binary Logistic Regression (GWBLR), which takes geographical influences into account in the analysis. This study uses data on the percentage of poor people (Y) and the influencing factors, namely life expectancy (X1), literacy rate (X2), labor force participation (X3), and economic growth (X4). The results showed that, based on the lowest Akaike Information Criterion Corrected (AICc) value, the GWBLR model with a Fixed Gaussian Kernel weight is the best at modeling the problem of poverty in West Sumatra in 2022. According to the model, the life expectancy variable has a significant impact on the level of poverty in 13 districts and cities in West Sumatra Province in 2022.
Comparison of Error Rate Prediction Methods in Classification Modeling with the CHAID Method for Imbalanced Data
Seif Adil El-Muslih; Dodi Vionanda; Nonong Amalita; Admi Salma
UNP Journal of Statistics and Data Science Vol. 1 No. 4 (2023): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol1-iss4/81

Abstract

CHAID (Chi-Square Automatic Interaction Detection) is one of the classification algorithms in the decision tree method. The classification results are displayed in the form of a tree diagram model. After the model is formed, its accuracy needs to be calculated in order to assess the performance of the model. The accuracy can be assessed by calculating the predicted error rate of the model. Three methods are available: Leave-one-out cross-validation (LOOCV), Hold-out, and K-fold cross-validation. These methods divide the data into training and testing data in different ways, so each method has advantages and disadvantages. Imbalanced data are data in which the classes have different numbers of observations. In the CHAID method, imbalanced data affect the prediction results: as the data become more imbalanced, the prediction result approaches the number of minority classes. Therefore, the three error rate prediction methods were compared to determine the appropriate method for CHAID with imbalanced data. This is experimental research using simulated data generated in RStudio. The comparison considers several factors: the marginal probability matrix, different correlations, and several observation ratios. The results of the comparison are observed using boxplots, looking at the median error rate and the lowest variance. This research finds that K-fold cross-validation is the most suitable error rate prediction method for the CHAID method with imbalanced data.
Comparison of Error Rate Prediction Methods in Binary Logistic Regression Model for Balanced Data
Shavira Asysyifa S; Dodi Vionanda; Nonong Amalita; Dina Fitria
UNP Journal of Statistics and Data Science Vol. 1 No. 4 (2023): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol1-iss4/90

Abstract

Binary Logistic Regression is a statistical method that can be used to examine the relationship between a dependent variable and several independent variables, where the dependent variable is split into two categories: one declaring a successful event and one declaring a failed event. The performance of binary logistic regression can be seen from the accuracy of the model. Accuracy can be measured by predicting the error rate. One method that can be used to predict the error rate is cross-validation, which works by dividing the data into two parts, namely testing data and training data. Several cross-validation methods are commonly used, namely Leave One Out (LOO), Hold out, and K-fold cross-validation. LOO gives an unbiased estimate of accuracy but takes a long time, hold out can avoid overfitting and works faster because it involves no iterations, and k-fold cross-validation has a smaller error rate prediction. Meanwhile, data cases with different correlations are useful for finding out how correlation affects the performance of the error rate prediction methods. This study uses artificially generated data with a normal distribution, including univariate, bivariate, and multivariate datasets with various combinations of mean differences and correlations. Considering these factors, this study focuses on comparing the three cross-validation methods for predicting the error rate in binary logistic regression. The study finds that k-fold cross-validation is the most suitable method for predicting errors in binary logistic regression modeling for balanced data.
Classification of Nutrition Problems for Indonesian Toddler With Decision Tree Algorithm C4.5
Nadha Ovella Syaqhasdy; Zamahsary Martha; Nonong Amalita; Dina Fitria
UNP Journal of Statistics and Data Science Vol. 1 No. 5 (2023): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol1-iss5/98

Abstract

Having excellent human resources is essential for Indonesia's development. The development of Indonesia is the key to improving the quality of life for its citizens, and a focus on this development can have a positive impact on the health and economy of the community. A healthy and educated generation is fundamental for the expected progress of this nation, as nutritional status is a significant factor affecting the quality of human resources. Nutritional problems can lead to serious consequences, such as abnormal physical growth, a decline in IQ, and even death. The objective of this research is to analyze the factors that influence the nutritional status of toddlers by classifying each variable using a decision tree. A decision tree is a flowchart resembling a branching tree structure. The C4.5 algorithm was used in this study. This algorithm can process both numeric and categorical data, handle missing attribute values, and generate easily interpretable rules. The resulting decision tree indicated that the attribute "Stunting < 20%" is a determining factor for acute-chronic malnutrition issues in toddlers. There are 392 districts and cities in Indonesia where the prevalence of stunted toddler nutritional status is less than 20%. The model created using the C4.5 algorithm was evaluated using a confusion matrix, resulting in an accuracy of 99.8% and a kappa value close to 1. This indicates that the model is capable of accurately classifying toddler nutrition problems in Indonesia.
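The evaluation step mentioned in this abstract, accuracy and kappa from a confusion matrix, can be sketched as below. The function name `accuracy_and_kappa` and the example matrix are hypothetical and are not the study's results; kappa is computed as Cohen's kappa, (p_o - p_e) / (1 - p_e), which is an assumption about the variant the authors used.

```python
def accuracy_and_kappa(cm):
    """Return (accuracy, Cohen's kappa) for a square confusion matrix.

    cm[i][j] is the count of observations with true class i predicted as class j.
    """
    n = sum(sum(row) for row in cm)
    size = len(cm)
    p_observed = sum(cm[i][i] for i in range(size)) / n
    row_totals = [sum(row) for row in cm]
    col_totals = [sum(row[j] for row in cm) for j in range(size)]
    # Agreement expected by chance from the marginal totals.
    p_expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    kappa = (p_observed - p_expected) / (1.0 - p_expected)
    return p_observed, kappa

if __name__ == "__main__":
    # Hypothetical two-class confusion matrix, for illustration only.
    print(accuracy_and_kappa([[40, 10], [10, 40]]))
```

A kappa close to 1 means the agreement between predicted and true classes is far above what the class proportions alone would produce.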
Co-Authors Addini, Vidhiya Ade Eriyen Saputri Adinda Dwi Putri Admi Salma Aldwi Riandhoko Ali Asmar Amanda, Abilya Amelia Fadila Rahman Andini Yulianti Anggi Adrian Danis Anjelisni, Nining april leniati Arnellis Arnellis Atika Ahmad Atus Amadi Putra Azwar Ananda Chairina Wirdiastuti Cindy Febrianita Denia Putri Fajrina Dewi Febiyanti Dewi Murni Dina Fitria Dina Fitria Dina Fitria, Dina Dodi Vionanda Dony Permana Dwi Sulistiowati Edwin Musdi Elita Zusti Jamaan Elsa Oktaviani Fadhilah Fitri Fajrin Putra Hanifi fajriyanti nur, Putri Fatma Yulia Sari Faulina FAZHIRA ANISHA Fikra, Hidayatul Fitri, Fadhilah Gezi Fajri Ghaly, Fayyadh Hamida, Zilfa Hana Rahma Trifanni haniyathul husna Hasna, Hanifa Helma Helma Helma Helma Herlena Purnama Sari Huriati Khaira Ichlas Djuazva Inna Auliya Jihe Chen Juwita Juwita Khairani, Putri Rahmatun Lilis Sulistiawati Media Rosha Media Rosha Meira Parma Dewi Melly Kurniawati Miftahurrahmi, Syifa Minora Longgom Mohammad Reza febrino Mudjiran Mudjiran Muhammad Tibri Syofyan Mukhti, Tessy Octavia nabillah putri Nadha Ovella Syaqhasdy Natasya Dwi Ovalingga, natasyalinggaa Nini Erdiani Nur Fadillah, Nur Nurhizrah Gistituati Okia Dinda Kelana Oktaviani, Bernadita Permana, Dony Prida Nova Sari Puti Utari Maharani Rahma, Dzakyyah Resti Febrina Retsya Lapiza Rizki Amalia, Annisa Rizqia Salsabila Rusdinal Rusdinal Saddam Al Aziz Safitri, Melda Salma, Admi Seif Adil El-Muslih Shavira Asysyifa S Sondriva, Wilia Sujantri Wahyuni Suparman Suparman Swithania Rizka Putri Syafriandi Syafriandi Syafriandi Syafriandi Syafriandi Syahfitrri, Nindi Tamur, Maximus Tessy Octavia Mukhti Tri Wahyuni Nurmulyati Venny Oktarinda Viola Yuniza Wella Saputri Wulan Septya Zulmawati Yarman Yarman, Yarman Yenni Kurniawati Yulia Pertiwi Zamahsary Martha Zilla Zalila Zilrahmi, Zilrahmi