Contact Name
Tessy Octavia Mukhti
Contact Email
tessyoctaviam@fmipa.unp.ac.id
Phone
+6282283838641
Journal Mail Official
tessyoctaviam@fmipa.unp.ac.id
Editorial Address
LPPM Universitas Negeri Padang, Jalan Prof. Dr. Hamka, Air Tawar Barat, Kota Padang, Sumatera Barat 25131
Location
Kota Padang,
Sumatera Barat
INDONESIA
UNP Journal of Statistics and Data Science
ISSN: - | EISSN: 2985-475X | DOI: 10.24036/ujsds
UNP Journal of Statistics and Data Science (UJSDS) is an open access e-journal launched in 2022 by the Department of Statistics, Faculty of Science and Mathematics, Universitas Negeri Padang. UJSDS publishes scientific articles on various aspects of Statistics, Data Science, and their applications. Articles can take the form of research results, case studies, or literature reviews. All papers are reviewed by peer reviewers consisting of experts and academics from across universities.
Articles 202 Documents
Categorical Data Clustering with K-Modes Method on Fire Cases in DKI Jakarta Province
Widia Handa Riska; Dony Permana; Atus Amadi Putra; Zilrahmi
UNP Journal of Statistics and Data Science Vol. 2 No. 1 (2024): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol2-iss1/115

Abstract

In DKI Jakarta Province, the number of fires rises and falls from year to year, so efforts are needed to prevent fires and reduce fire risk. BPBD DKI Jakarta is responsible for this matter. For these efforts to be effective, however, information is needed about the fire patterns that occur most frequently. Such patterns can be identified using K-Modes clustering for categorical data. The data used are fire records for DKI Jakarta in 2018. The optimal number of clusters is 6, chosen as the solution with the smallest Davies-Bouldin Index (DBI) value, 6.22. Of the six clusters, cluster 3 contains the highest number of fire cases. The centroid of cluster 3 describes fires that occurred on a Friday in November in Cakung District, were caused by an electrical short circuit, burned residential houses, and rarely caused minor injuries, serious injuries, or deaths.
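
As a rough illustration of the K-Modes idea mentioned in this abstract, the sketch below shows its building blocks: the simple matching dissimilarity (the number of attributes on which two records differ), the assignment of records to the nearest mode, and the mode-based centroid update. The toy records and all function names are hypothetical; they are not taken from the paper's data or code, and the sketch runs only one assignment and update step rather than iterating to convergence.

```python
from collections import Counter

def matching_dissimilarity(a, b):
    """Number of attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))

def cluster_mode(records):
    """Column-wise most frequent category; this plays the role of the centroid."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*records))

def assign(records, modes):
    """Index of the nearest mode for every record."""
    return [
        min(range(len(modes)), key=lambda k: matching_dissimilarity(r, modes[k]))
        for r in records
    ]

# Hypothetical toy records: (day, cause, object burned)
records = [
    ("Fri", "electrical", "house"),
    ("Fri", "electrical", "shop"),
    ("Mon", "stove", "house"),
    ("Mon", "stove", "house"),
    ("Fri", "electrical", "house"),
]

modes = [records[0], records[2]]   # naive initial modes (an assumption)
labels = assign(records, modes)    # assignment step
modes = [
    cluster_mode([r for r, lab in zip(records, labels) if lab == k])
    for k in range(len(modes))
]                                  # update step
```

A cluster's mode can be read directly as a profile, in the same way the abstract describes the centroid of cluster 3.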
Comparison of Error Prediction Methods in Classification Modeling with CHAID Methods for Balanced Data
Findri Wara Putri; Dodi Vionanda; Atus Amadi Putra; Fadhilah Fitri
UNP Journal of Statistics and Data Science Vol. 1 No. 5 (2023): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol1-iss5/116

Abstract

Chi-Squared Automatic Interaction Detection (CHAID) is an exploratory method for classifying data by building classification trees. The classification results are displayed as a tree diagram. After the model is formed, its accuracy must be calculated in order to assess its performance. The accuracy can be determined by estimating the model's prediction error rate. Error rate prediction methods work by dividing the data into training data and testing data. Three such methods are considered: leave-one-out cross-validation (LOOCV), hold-out, and k-fold cross-validation. These methods divide the data into training and testing data in different ways, so each has advantages and disadvantages. Therefore, the three error rate prediction methods were compared to determine the most appropriate one for CHAID. This is an experimental study that uses simulated data generated in RStudio. The comparison considers several factors, namely the marginal probability matrix and different correlations. The comparison results are assessed with boxplots, based on the median error rate and the lowest variance. The study found that k-fold cross-validation is the most suitable error rate prediction method for CHAID with balanced data.
Comparison of Error Rate Prediction in CART for Imbalanced Data
Lifia Zullani; Dodi Vionanda; Syafriandi Syafriandi; Dina Fitria
UNP Journal of Statistics and Data Science Vol. 1 No. 5 (2023): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol1-iss5/117

Abstract

CART is a tree-based classification algorithm. A CART model is a tree consisting of a root node, internal nodes, and terminal nodes. Its accuracy can be assessed by measuring the model's prediction error. One common way to estimate the error rate is cross-validation, of which three algorithms are considered: leave-one-out, hold-out, and k-fold cross-validation. These methods divide the data into training and testing data in different ways, so each has advantages and disadvantages. Hold-out cannot guarantee that the training set represents the entire dataset; leave-one-out is very time-consuming and computationally demanding because it trains the model as many times as there are data points; and k-fold increases computation time because the training algorithm must be run k times. In practice, data are often imbalanced, meaning that the classes contain different numbers of observations. In CART, imbalanced data affects the prediction results. This research compares error rate prediction methods for CART models with imbalanced data. The study uses three types of data, univariate, bivariate, and multivariate, obtained by varying the population means and the correlations between independent variables. The results indicate that k-fold is the most suitable error rate prediction algorithm for CART with imbalanced data.
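
The computational cost described in this abstract can be made concrete with a minimal sketch that only generates the train/test index splits for each scheme and counts how many model fits each one implies. The function names, the 30% test fraction, k = 5, and n = 20 are illustrative assumptions and do not come from the paper, which ran its simulations in R.

```python
import random

def hold_out(n, test_fraction=0.3, seed=0):
    """One random train/test split, returned as a one-element list of (train, test)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = int(round(n * test_fraction))
    return [(idx[cut:], idx[:cut])]

def k_fold(n, k=5, seed=0):
    """k splits; every observation appears in a test set exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    return [
        ([j for fold in folds[:i] + folds[i + 1:] for j in fold], folds[i])
        for i in range(k)
    ]

def leave_one_out(n):
    """n splits; equivalent to k-fold with k equal to n."""
    return [([j for j in range(n) if j != i], [i]) for i in range(n)]

n = 20  # hypothetical sample size
fits = {
    "hold out": len(hold_out(n)),            # 1 model fit
    "k-fold": len(k_fold(n, k=5)),           # k model fits
    "leave one out": len(leave_one_out(n)),  # n model fits
}
```

Each (train, test) pair would be passed to a classifier such as CART, and the test-set error rates averaged over the splits.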
Implementation Self Organizing Maps Method In Cluster Analysis Based on Achievement Sustainable Development Goal/SDG’s West Sumatera Province
AL Rezki Ivansyah; Fadhilah Fitri; Yenni Kurniawati; Tessy Octavia Mukhti
UNP Journal of Statistics and Data Science Vol. 1 No. 5 (2023): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol1-iss5/118

Abstract

The Indonesian government is committed to implementing the Sustainable Development Goals (SDGs) agenda, including in West Sumatra. The government of West Sumatra supports the SDG objectives and targets by optimizing the implementation of SDG indicators in the Rencana Aksi Daerah (RAD, Regional Action Plan) for the SDGs of West Sumatra Province for 2022-2026. Its execution, however, requires annual monitoring and evaluation. Clustering is used here as an input for evaluating the implementation of the RAD. The clustering method is the Self Organizing Map (SOM), an effective tool for visualizing high-dimensional data that maps such data onto one, two, or three dimensions represented by connected units, or neurons. The data consist of 14 SDG indicator variables for 19 regencies/cities in West Sumatra in 2022, sourced from the official website and publications of the Badan Pusat Statistik (BPS) of West Sumatra Province. The analysis yields 3 clusters with different characteristics, which can serve as a reference for policy decisions and effective strategies to improve the implementation of SDG programs in West Sumatra Province.
Bitcoin Price Prediction Using Support Vector Regression
Wulan Septya Zulmawati; Nonong Amalita; Syafriandi Syafriandi; Admi Salma
UNP Journal of Statistics and Data Science Vol. 1 No. 5 (2023): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol1-iss5/121

Abstract

Cryptocurrency offers the highest returns compared with other investment instruments, which attracts many novice traders to crypto as a way to make significant profits in the short term. One of the most widely used cryptocurrencies is Bitcoin. Trading is closely tied to technical analysis, and the variety of technical analysis techniques makes it difficult for beginner traders to choose the right one. Machine learning prediction methods can help beginner traders overcome this barrier in the crypto market. One machine learning method for prediction is Support Vector Regression (SVR). Using the grid search algorithm, the method achieves good predictive performance, with an accuracy of 99.25% and a MAPE of 0.1206%.
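
For readers unfamiliar with the MAPE figure quoted in this abstract, the following minimal sketch computes the mean absolute percentage error in its usual form, the average of |actual - predicted| / |actual| expressed in percent. The function name and the toy prices are made up for illustration; how the paper derives its separate accuracy figure is not stated in the abstract and is not reproduced here.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent. Actual values must be non-zero."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

# Made-up prices, not Bitcoin data from the paper
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]
error = mape(actual, predicted)  # relative errors 10%, 5%, 0% average to about 5%
```

A lower MAPE means forecasts lie closer to the observed values in relative terms.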
Implementation of Backpropagation Artificial Neural Network on Forecasting Export of Palm Oil in Indonesia
Adinda Dwi Putri; Dina Fitria; Nonong Amalita; Zilrahmi
UNP Journal of Statistics and Data Science Vol. 1 No. 5 (2023): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol1-iss5/123

Abstract

Exports are one of Indonesia's largest sources of revenue, and palm oil is the largest contributor to exports. An increase in the volume of palm oil exports can spur economic growth in Indonesia. In this research, palm oil exports from Indonesia are forecast by main destination country using an Artificial Neural Network (ANN) with the Backpropagation algorithm. The data are palm oil export data for 2012-2022 obtained from the Central Statistics Agency (BPS) website. The optimal architecture is 10-1-3-3-1 with a MAPE of 9.68%; that is, the network uses 10 inputs, 3 hidden layers with 1, 3, and 3 neurons respectively, and 1 output. The study estimates that 90% of the palm oil export forecasts produced by the ANN method are close to the actual values.
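
The architecture string 10-1-3-3-1 can be unpacked with a short sketch that splits it into layer sizes and counts trainable parameters. The count assumes fully connected layers with one bias per neuron; that assumption, like the function names, is illustrative and is not stated in the abstract.

```python
def layer_sizes(architecture):
    """Parse a string such as '10-1-3-3-1' into a list of layer sizes."""
    return [int(part) for part in architecture.split("-")]

def count_parameters(sizes, bias=True):
    """Weights (plus optional biases) of a fully connected feed-forward network."""
    extra = 1 if bias else 0
    return sum((n_in + extra) * n_out for n_in, n_out in zip(sizes, sizes[1:]))

sizes = layer_sizes("10-1-3-3-1")
n_inputs, hidden, n_outputs = sizes[0], sizes[1:-1], sizes[-1]
n_params = count_parameters(sizes)  # (10+1)*1 + (1+1)*3 + (3+1)*3 + (3+1)*1
```

Under these assumptions the network is very small, which is common for short annual or monthly series.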
Biplot and Procrustes Analysis of Poverty Indicators By Province in Indonesia in 2015 and 2019
Ade Eriyen Saputri; Admi Salma; Nonong Amalita; Dony Permana
UNP Journal of Statistics and Data Science Vol. 2 No. 1 (2024): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol2-iss1/124

Abstract

Poverty is one of the problems that a country's government must overcome, and it is influenced by several indicators. The success of a government can be seen from changes in poverty. This study compares Indonesia's poverty indicators, expressed as percentages, at the beginning (2015) and the end (2019) of one government term. Biplot analysis is used to identify the indicators that most affect the poverty rate in 2015 and 2019, while Procrustes analysis is used to measure the similarity and the magnitude of the change in poverty from 2015 to 2019. The biplot results show that the indicator with the highest diversity in 2015 is households with access to decent and sustainable sanitation services, while in 2019 it is the percentage of youth (aged 15-24 years) not in education, employment, or training, together with households with access to decent and sustainable drinking water services. Kepulauan Riau, DKI Jakarta, DI Yogyakarta, and Bali have the highest values on almost all poverty indicators, except the percentage of youth (aged 15-24 years) not in education, employment, or training. The Procrustes analysis shows an increase of 9.7% in Indonesia's poverty indicators in 2019 compared to 2015, so the two configurations can be considered very similar.
Fuzzy K-Nearest Neighbor to Predict Rainfall in Padang Pariaman District
Rizki Amalia, Annisa; Nonong Amalita; Yenni Kurniawati; Zamahsary Martha
UNP Journal of Statistics and Data Science Vol. 2 No. 1 (2024): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol2-iss1/126

Abstract

Information about rainfall at a given time and place is important because rainfall influences human activities. Rainfall is the amount of water that falls to the earth in a certain period of time, measured in millimeters. One form of rainfall information is the daily rainfall prediction. This study classifies daily rainfall at the Padang Pariaman climatology station into 5 categories: very light, light, moderate, heavy, and very heavy rain. Four weather parameters are used: air temperature, humidity, wind speed, and duration of sunshine. One approach to predicting rainfall is data mining, in which a computer learns to analyze data automatically and produces a new model. One of the best prediction algorithms in data mining is Fuzzy K-Nearest Neighbor (FK-NN). FK-NN predicts the class of a test observation as the class in which it has the largest membership degree. The rainfall classes for Padang Pariaman Regency are imbalanced. To address this, the Synthetic Minority Over-sampling Technique (SMOTE) is used to generate minority-class data until it matches the majority class in number. With 343 test observations, K = 12, and Euclidean distance, the FK-NN classification performs fairly well, with an accuracy of 76.38%.
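
The membership-based prediction rule described in this abstract can be sketched as follows. This is a minimal version of the commonly used fuzzy k-NN formulation in which each of the K nearest neighbors votes for its class with an inverse-distance weight 1 / d^(2/(m-1)), and the class with the largest normalized membership wins. The crisp training labels, the fuzzifier m = 2, the toy data, and the function name are assumptions made for illustration; the paper's exact settings beyond K = 12 and Euclidean distance are not given in the abstract.

```python
import math
from collections import defaultdict

def fuzzy_knn(train_x, train_y, query, k=3, m=2.0):
    """Return (predicted_class, memberships) for a single query point."""
    # K nearest neighbors by Euclidean distance
    nearest = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))[:k]
    exponent = 2.0 / (m - 1.0)
    # Inverse-distance weights; the floor avoids division by zero
    weights = [(1.0 / max(d, 1e-12) ** exponent, y) for d, y in nearest]
    total = sum(w for w, _ in weights)
    memberships = defaultdict(float)
    for w, y in weights:
        memberships[y] += w / total  # memberships sum to 1 across classes
    best = max(memberships, key=memberships.get)
    return best, dict(memberships)

# Made-up two-feature points, not the weather data from the paper
train_x = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
train_y = ["light", "light", "light", "heavy", "heavy"]
predicted, memberships = fuzzy_knn(train_x, train_y, (4.0, 4.0), k=3)
```

Unlike plain k-NN, the returned memberships show how strongly the query belongs to each class, not just which class wins.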
Classification the Characteristics of Traffic Accident Victims in Pariaman Using the Chi-square Automatic Interaction Detection Algorithm
Manja Danova Putri; Dina Fitria; Yenni Kurniawati; Zilrahmi
UNP Journal of Statistics and Data Science Vol. 2 No. 1 (2024): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol2-iss1/127

Abstract

Traffic accidents are incidents that occur when motor vehicles collide on the road, resulting in damage to vehicles and road infrastructure, as well as potential material losses, injuries, and even death for those involved. Data from the Indonesian National Police show that the number of traffic accident victims between 2010 and 2020 ranged from 147,798 to 197,560 people, with fatalities predominantly occurring among individuals aged 15-34. The high number of victims has negative impacts on various aspects of life, ranging from material losses to physical harm to the victims. Classification is a technique for grouping objects or data into pre-defined classes or categories based on their attributes or features. One classification method is Chi-Square Automatic Interaction Detection (CHAID). The classification results indicate that the age of the victim and the type of accident are the most significant variables influencing the condition of traffic accident victims. Evaluating the model with a confusion matrix yielded an accuracy of 92%, which indicates that the model performs well in overall data classification.
Application of the Naive Bayes Algorithm for Classification of Dengue Hemorrhagic Fever at RSUD dr. Achmad Darwis
Viola Yuniza; Atus Amadi Putra; Nonong Amalita; Fadhilah Fitri
UNP Journal of Statistics and Data Science Vol. 2 No. 1 (2024): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol2-iss1/128

Abstract

Dengue fever is a disease transmitted by the bite of the Aedes aegypti mosquito. The Central Statistics Agency of Lima Puluh Kota District reported a morbidity rate of 14.40% per 100,000 population, higher than the previous year's rate of 3.30% per 100,000 population. The main symptoms are fever lasting 2-7 days, muscle and joint pain with or without rash, dizziness, and even vomiting blood. Dengue infection can cause clinical presentations ranging from dengue fever and dengue hemorrhagic fever to dengue shock syndrome. A classification method is therefore needed to support early diagnosis. The method used is the Naive Bayes algorithm, which classifies patients as positive or negative for dengue fever. The purpose of this research is to classify dengue fever patients and to determine the accuracy of the Naive Bayes algorithm. On 34 testing observations, the Naive Bayes model classified 12 patients as dengue fever positive and 22 as dengue fever negative. The model's accuracy is 91.18%, which shows that it classifies dengue fever patients very well.
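
The reported accuracy can be sanity-checked with simple arithmetic. The number of correctly classified patients is not stated in the abstract, so the sketch below only searches for the counts out of 34 that would round to the reported percentage; the function name is hypothetical and the result is an inference, not a figure from the paper.

```python
def accuracy_percent(correct, total):
    """Accuracy in percent, rounded to two decimals."""
    return round(100.0 * correct / total, 2)

total = 34        # number of testing observations given in the abstract
reported = 91.18  # accuracy given in the abstract, in percent

# Every count of correct predictions that is consistent with the reported figure
consistent = [c for c in range(total + 1) if accuracy_percent(c, total) == reported]
```

Only one count is consistent with the reported figure, which corresponds to three misclassified patients.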
