Contact Name
Tessy Octavia Mukhti
Contact Email
tessyoctaviam@fmipa.unp.ac.id
Phone
+6282283838641
Journal Mail Official
tessyoctaviam@fmipa.unp.ac.id
Editorial Address
LPPM Universitas Negeri Padang, Jalan Prof. Dr. Hamka, Air Tawar Barat, Kota Padang, Sumatera Barat 25131
Location
Kota Padang,
Sumatera Barat
INDONESIA
UNP Journal of Statistics and Data Science
ISSN: -, EISSN: 2985-475X, DOI: 10.24036/ujsds
UNP Journal of Statistics and Data Science (UJSDS) is an open access journal (e-journal) launched in 2022 by the Department of Statistics, Faculty of Science and Mathematics, Universitas Negeri Padang. UJSDS publishes scientific articles on various aspects of Statistics, Data Science, and their applications. Articles can take the form of research results, case studies, or literature reviews. All papers are reviewed by peer reviewers consisting of experts and academicians from across universities.
Articles 202 Documents
Comparison of K-Means and Fuzzy C-Means Algorithms for Clustering Based on Happiness Index Components Across Provinces in Indonesia Inna Auliya; Fitri, Fadhilah; Nonong Amalita; Tessy Octavia Mukhti
UNP Journal of Statistics and Data Science Vol. 2 No. 1 (2024): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol2-iss1/150

Abstract

Cluster analysis is a statistical technique used to group objects based on their shared characteristics. This research aims to assess how 34 provinces in Indonesia are clustered using happiness index indicators for the year 2021. The study compares two non-hierarchical cluster analysis methods, K-Means and Fuzzy C-Means. K-Means categorizes objects into clusters based on their proximity to the nearest cluster center, while Fuzzy C-Means employs a fuzzy grouping model assigning membership degrees from 0 to 1. The results indicate that both methods form three clusters. Evaluating standard deviation values and ratios, Fuzzy C-Means proves superior, displaying a larger standard deviation between groups and a smaller ratio (0.6680004) compared to K-Means. Consequently, the study concludes that the Fuzzy C-Means method is more optimal than K-Means.
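The difference between the two assignment rules described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the cluster centers, the observation, and the fuzzifier `m = 2` are all made-up assumptions, and the membership formula is the standard Fuzzy C-Means one for fixed centers.

```python
import math

def hard_assign(point, centers):
    """K-Means style: index of the nearest center (Euclidean distance)."""
    dists = [math.dist(point, c) for c in centers]
    return dists.index(min(dists))

def fuzzy_memberships(point, centers, m=2.0):
    """Fuzzy C-Means style: one membership degree in [0, 1] per center.

    m > 1 is the fuzzifier; m = 2 is a common default and an assumption here.
    """
    dists = [math.dist(point, c) for c in centers]
    # A point lying exactly on a center belongs fully to that center.
    if 0.0 in dists:
        return [1.0 if d == 0.0 else 0.0 for d in dists]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** power for d_k in dists) for d_i in dists]

centers = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]  # made-up cluster centers
point = (1.0, 0.5)                               # made-up observation

label = hard_assign(point, centers)              # a single cluster index
memberships = fuzzy_memberships(point, centers)  # degrees that sum to 1
```

The hard label coincides with the largest membership degree, but the fuzzy version also records how strongly the observation is pulled toward the other clusters.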
Characteristics of Drinking Water Conditions by Urban and Rural Area in Indonesia Using the CHAID Method Aulia Wanda; Kurniawati, Yenni
UNP Journal of Statistics and Data Science Vol. 2 No. 1 (2024): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol2-iss1/152

Abstract

Drinking water is a basic need of society, alongside food, clothing, and shelter. The availability and quality of drinking water need to be considered, both in terms of quantity and of suitability, which must meet the requirements. Having clean water as drinking water can reduce diseases such as diarrhea, cholera, dysentery, typhus, worms, skin diseases, and poisoning. Decent and clean drinking water is protected drinking water, including tap water, public taps, public hydrants, water terminals, rainwater reservoirs, or protected springs and wells, and drilled wells/pumps located at least 10 meters from waste disposal, waste storage, and rubbish disposal sites. Access to drinking water in urban areas differs from that in rural areas. To determine the characteristics of drinking water in urban and rural areas, Chi-Square Automatic Interaction Detection (CHAID) analysis is used. This analysis is applied to categorical variables. Before the analysis stage, a data mining process is carried out to obtain knowledge from the data set and to handle missing data in it. Missing data in categorical variables are handled by mode imputation. According to the CHAID analysis, the drinking water characteristics with the highest percentage in rural areas were water filtered using cloth, not boiled, and drawn from a source located elsewhere. Meanwhile, in urban areas, the highest percentage of households have drinking water that is treated with bleach/chlorine, not filtered using cloth, and not boiled, with a water source in their own yard.
Artificial Neural Network Model for Estimating the Poor Population in Indonesia as an Effort to Alleviate Poverty Febiola Putri, Febi; Atus Amadi Putra; Yenni Kurniawati; Zamahsary Martha
UNP Journal of Statistics and Data Science Vol. 2 No. 2 (2024): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol2-iss2/154

Abstract

Forecasting the poverty rate in Indonesia can help various parties; for example, it can help the government plan more effective and efficient poverty alleviation programs. In this study, the poverty rate in Indonesia was forecast using the backpropagation artificial neural network method. The purpose of this research is to model and predict the poverty rate using the backpropagation artificial neural network model, and to determine the accuracy of the forecasting results produced by this method. This research is applied research. The data used is annual data on poverty in Indonesia from 2917-2021. The data is then divided into two parts, namely training data and test data. The results show that the best artificial neural network model is BP (7,7,2), with 7 neurons in the input layer, 7 neurons in the hidden layer, and 2 neurons in the output layer. The accuracy of this model is good, with a MAPE value of 0.07633%. The forecasts for the next period show that the province with the highest number of poor people is East Java, with 3604.1698 thousand people in the first semester (March) of 2022, increasing to 3698.822 thousand people in the second semester (September) of 2022.
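The MAPE accuracy measure quoted in this abstract can be computed as in the sketch below. The observed and forecast values are made up for illustration and are not taken from the study; the function name `mape` is likewise only an assumption.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, expressed in percent."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    # Each term is the absolute error relative to the observed value.
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)

actual = [100.0, 200.0, 400.0]     # made-up observed values
predicted = [110.0, 190.0, 400.0]  # made-up forecasts
error_pct = mape(actual, predicted)  # roughly 5 percent for these numbers
```

Because every error is divided by the observed value, MAPE is undefined when an observed value is zero, which this sketch does not guard against.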
Classification of Dental Caries at Baiturrahmah Dental and Oral Hospital Using the Random Forest Method Martia Rosada; Zilrahmi; Syafriandi Syafriandi; Tessy Octavia Mukhti
UNP Journal of Statistics and Data Science Vol. 2 No. 2 (2024): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol2-iss2/155

Abstract

The oral cavity is the main gate through which germs and bacteria enter. Therefore, it is important to maintain oral hygiene. When dental and oral hygiene is not maintained, it causes dental and oral problems or diseases such as periodontitis, dental caries, tooth abscess, gingivitis, and other dental and oral health problems. The dental and oral problem that many people experience is caries, or cavities. West Sumatra itself has a fairly high prevalence of dental caries. Dental caries needs to be prevented by making the public aware of dental and oral hygiene in order to reduce the problem of dental caries in West Sumatra. Therefore, a method that can classify dental caries based on its symptoms is needed. Classification is very useful for identifying the main factors that cause dental caries. One classification method that can be used is random forest. Random forest is an ensemble method that combines multiple decision trees built on bootstrap samples. The model in this research is evaluated using the smallest out-of-bag (OOB) error rate and the Variable Importance Measure (VIM). Random forest classification using dental and oral pain medical record data at Baiturrahmah Padang Hospital produces an OOB error rate of 32.08%, or an accuracy rate of 67.92%. The optimal model is obtained using mtry=2 and ntree=200. From this research it can be concluded that dental plaque, age, and tooth brushing habits are the important variables, or main factors, that influence dental caries.
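The bootstrap sampling behind the OOB error rate can be sketched as below. This is only an illustration of how in-bag and out-of-bag rows arise for one tree; the row count, the seed, and the helper name `bootstrap_split` are assumptions, and no forest is actually fitted. The last two lines show that the accuracy reported in the abstract is the complement of the OOB error rate.

```python
import random

def bootstrap_split(n_rows, seed=0):
    """Draw n_rows indices with replacement; rows never drawn are out-of-bag."""
    rng = random.Random(seed)
    in_bag = [rng.randrange(n_rows) for _ in range(n_rows)]
    out_of_bag = sorted(set(range(n_rows)) - set(in_bag))
    return in_bag, out_of_bag

# Hypothetical data set of 20 rows; each tree would get its own split.
in_bag, out_of_bag = bootstrap_split(n_rows=20)

# Figures from the abstract: accuracy = 100% - OOB error rate.
oob_error_pct = 32.08
accuracy_pct = round(100.0 - oob_error_pct, 2)
```

Each tree is tested on the rows it never saw during training, so the OOB error gives an estimate of test error without a separate hold-out set.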
Comparison of Modeling Infant Mortality Rate in West Sumatra and West Java Province in 2021 Using Negative Binomial Regression Afdhal, Afdhal Rezeki; Fadhilah Fitri; Dodi Vionanda; Dony Permana
UNP Journal of Statistics and Data Science Vol. 2 No. 2 (2024): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol2-iss2/156

Abstract

In Poisson regression analysis, there is an assumption that must be met, namely equidispersion (the variance of the response variable equals its mean). In reality, this condition very rarely holds, because overdispersion usually occurs (the variance of the response variable is greater than its mean). One way to overcome this problem is to use the Negative Binomial regression method. The aim of this article is to obtain the best model in Negative Binomial regression analysis to overcome overdispersion in cases of infant mortality in West Sumatra Province and West Java Province. The model obtained using Negative Binomial regression produces an AIC value of 192.65 in West Sumatra Province, which is smaller than the AIC value of 283.47 in West Java Province. Based on the Negative Binomial regression model obtained for West Sumatra Province, the number of health centers (X3) has a significant influence on the infant mortality rate, while in West Java Province the number of medical personnel (X1) has a significant influence on the infant mortality rate.
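The equidispersion idea in this abstract can be illustrated with a rough variance-to-mean check. The counts below are made up and are not the infant mortality data from the study. This is a marginal check only; in a regression setting, dispersion is usually judged from the fitted model (for example, Pearson chi-square divided by residual degrees of freedom), which this sketch does not do.

```python
from statistics import mean, variance

def dispersion_ratio(counts):
    """Sample variance divided by sample mean; about 1 under equidispersion."""
    return variance(counts) / mean(counts)

counts = [2, 0, 7, 1, 12, 3, 0, 15]  # made-up count data, not from the study
ratio = dispersion_ratio(counts)

# A ratio well above 1 suggests overdispersion, the situation in which
# Negative Binomial regression is preferred over Poisson regression.
overdispersed = ratio > 1.0
```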
Classification of Poor Households in West Sumatra Province using Decision Tree Algorithm C4.5 Dinda Fitriza; Atus Amadi Putra; Dodi Vionanda; Zilrahmi
UNP Journal of Statistics and Data Science Vol. 2 No. 2 (2024): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol2-iss2/157

Abstract

The significant and increasingly complex issue of poverty poses a considerable challenge to Indonesia's development, including in West Sumatra Province, where the poverty rate was 5.92% in 2022. The government has initiated programs to address poverty by focusing on the criteria of impoverished households. Data on impoverished households can be obtained through the National Socio-Economic Survey (Susenas). One method that can classify impoverished households is the decision tree. A decision tree is a flowchart that resembles a tree. The C4.5 algorithm used in this research can handle discrete and continuous data, manage variables with missing values, and prune decision tree branches. The analysis shows that the variables affecting the classification of poor households are the number of household members, followed by the age of the household head, type of house floor, type of house wall, source of drinking water, and cooking fuel. On the test data, the confusion matrix gives an accuracy of 69.89%, a sensitivity of 71.15% for classifying regular households, and a specificity of 68.72% for classifying impoverished households.
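The three measures reported from the confusion matrix can be derived as in the sketch below. The cell counts are hypothetical and do not reproduce the study's matrix. Following the abstract, the "positive" class is taken to be regular households, so sensitivity refers to regular households and specificity to impoverished ones.

```python
def classification_metrics(tp, fn, fp, tn):
    """Accuracy, sensitivity, and specificity from confusion matrix counts."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,     # share of all correct predictions
        "sensitivity": tp / (tp + fn),     # positives correctly identified
        "specificity": tn / (tn + fp),     # negatives correctly identified
    }

# Hypothetical counts: positive = regular household, negative = impoverished.
metrics = classification_metrics(tp=40, fn=10, fp=15, tn=35)
```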
Sentiment Analysis Using Support Vector Machine (SVM) of ChatGPT Application Users in Play Store Muthia Sakhdiah; Admi Salma; Dony Permana; Dina Fitria
UNP Journal of Statistics and Data Science Vol. 2 No. 2 (2024): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol2-iss2/158

Abstract

The ChatGPT application is an Artificial Intelligence (AI) technology that responds to conversations in the form of text and voice messages, and is accessible via smartphones or computers. ChatGPT provides answers and solutions to the problems asked, and the speed and complexity of its answers add to the value of this application. However, there are negative impacts, one of which is the vulnerability of scientific papers to plagiarism. Because of this, many people in the community have written reviews assessing the application. These reviews can be seen on the Play Store and can serve as a reference before downloading the ChatGPT application. How the community responds can be seen through sentiment analysis, which classifies assessments as positive or negative, making it easier for companies to evaluate products. Classification is then carried out using Support Vector Machine (SVM), and the resulting model is used to classify user reviews of the ChatGPT application. The results showed an accuracy of 93.9% with a linear kernel, and the sentiment of people who use the ChatGPT application is predominantly positive.
Handling Multiclass Imbalance in the Area Sampling Frame Survey Dataset Using the SCUT Method Sondriva, Wilia; Kurniawati, Yenni; Amalita, Nonong; Salma, Admi
UNP Journal of Statistics and Data Science Vol. 2 No. 2 (2024): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol2-iss2/163

Abstract

Area Sampling Frame (ASF) is a survey used by the Indonesian government to measure rice productivity in Indonesia. The ASF survey is an important data source because accurate, high-quality rice productivity data are much needed. The ASF survey data are extremely imbalanced, so this imbalance must be handled. SMOTE and Cluster-based Undersampling Technique (SCUT) is a method that can be used to address the dataset imbalance. SCUT combines oversampling using SMOTE and undersampling using CUT. The results from SCUT show that the number of data points in each class becomes balanced. Subsequently, a two-sample mean test is conducted to examine the mean differences between the original dataset and the dataset after handling. The results show that in the early vegetative, late vegetative, and harvest phases, the means of the original dataset and the dataset after handling do not differ significantly, but in the generative phase, the means differ significantly. Therefore, data generated synthetically using the SCUT method generally exhibit similar mean characteristics.
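The oversampling half of SCUT relies on SMOTE, whose core step is linear interpolation between a minority-class observation and one of its minority-class neighbors. The sketch below shows only that step, under stated assumptions: the two points are made up, the neighbor is given rather than found by a nearest-neighbor search, and the cluster-based undersampling part is not shown.

```python
import random

def smote_point(sample, neighbor, rng):
    """Synthetic point on the segment between a sample and its neighbor."""
    gap = rng.random()  # uniform in [0, 1)
    return [s + gap * (n - s) for s, n in zip(sample, neighbor)]

rng = random.Random(42)   # fixed seed so the sketch is repeatable
sample = [1.0, 2.0]       # made-up minority-class observation
neighbor = [3.0, 6.0]     # made-up nearest minority-class neighbor
synthetic = smote_point(sample, neighbor, rng)
```

Because the synthetic point always lies between two real observations of the same class, feature means of the augmented class tend to stay close to the original ones, which is consistent with the mean comparison described in the abstract.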
Implementation of Subtractive Fuzzy C-Means Method in Clustering Provinces in Indonesia Based on Factors Causing Stunting in Toddlers Hariati Ainun Nisa; Admi Salma; Dodi Vionanda; Tessy Octavia Mukhti
UNP Journal of Statistics and Data Science Vol. 2 No. 2 (2024): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol2-iss2/164

Abstract

In 2022, Indonesia had a stunting rate that was still relatively high, at 21.6%. For this reason, various efforts are needed to reduce the stunting rate. One such effort is to understand the characteristics of each province in Indonesia through cluster analysis. This study aims to cluster provinces in Indonesia based on factors that cause stunting in children under five. The method used is Subtractive Fuzzy C-Means, which has advantages in speed and number of iterations, producing more stable and accurate results. According to the validity test with the Silhouette Coefficient Index, the optimum number of clusters is 8, with a radius (r) of 0.70. There are 8 provinces that have made maximum efforts to reduce stunting rates, namely Bangka Belitung Islands, Riau Islands, DKI Jakarta, DI Yogyakarta, Bali, East Kalimantan, South Kalimantan, and South Sulawesi. Meanwhile, 7 provinces, namely East Nusa Tenggara, South Kalimantan, Central Sulawesi, West Sulawesi, Maluku, North Maluku, and West Papua, still need special attention from the government in reducing stunting rates, based on the factors that cause stunting discussed in this study.
K-Modes Analysis with Validation of the DBI in Grouping Provinces in Indonesia based on Indicators of Poor Households Syifa Azahra; Zilrahmi; Dodi Vionanda; Fadhilah Fitri
UNP Journal of Statistics and Data Science Vol. 2 No. 2 (2024): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol2-iss2/165

Abstract

Poverty is the most pressing social problem in Indonesia. One effort to alleviate poverty is to group provinces in Indonesia based on indicators of poor households using the K-Modes algorithm. The data used are from the Household List of the 2017 Indonesian Demographic and Health Survey (IDHS). The analysis includes data noise detection, data clustering using the K-Modes algorithm, and cluster validation with the Davies-Bouldin Index (DBI). The clustering yields two clusters, where cluster 1 consists of 26 provinces and cluster 2 consists of 8 provinces. Cluster 1 fulfills 9 indicators of poor households, and cluster 2 fulfills only a few indicators of poor households. The government can therefore prioritize these 8 provinces to overcome poverty in Indonesia. The DBI value obtained is 1.89, which indicates that two clusters are suitable for the algorithm.
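K-Modes replaces the means and Euclidean distance of K-Means with per-attribute modes and a simple matching dissimilarity, which suits categorical indicators. The sketch below shows those two building blocks for a single cluster only; the records and category names are made up, and the full iterative algorithm and the DBI validation are not shown.

```python
from collections import Counter

def matching_dissimilarity(a, b):
    """Number of attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))

def cluster_mode(records):
    """Most frequent category of each attribute across one cluster's records."""
    return tuple(Counter(column).most_common(1)[0][0] for column in zip(*records))

# Made-up household records: (floor type, cooking fuel, water source).
records = [
    ("tile", "gas", "piped"),
    ("earth", "wood", "well"),
    ("tile", "gas", "well"),
]
mode = cluster_mode(records)
distances = [matching_dissimilarity(r, mode) for r in records]
```

In the full algorithm, each record would be assigned to the cluster whose mode is least dissimilar, and the modes would be recomputed until the assignments stop changing.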
