Journal : UNP Journal of Statistics and Data Science

Modeling Open Unemployment Rate in West Sumatera Province Using Truncated Spline Regression Aprilla Suhada; Syafriandi; Dodi Vionanda; Fadhilah Fitri
UNP Journal of Statistics and Data Science Vol. 1 No. 1 (2023): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol1-iss1/3

Abstract

The Open Unemployment Rate (TPT) is an indicator of unemployment in the labor force, expressed as the percentage of job seekers in the total workforce. In 2020, West Sumatra Province ranked eighth among the largest contributors to unemployment in Indonesia, which is a problem for the provincial government. To address it, the factors thought to affect the open unemployment rate in West Sumatra Province are analyzed using truncated spline regression, chosen because the relationship between the response variable and each predictor variable does not follow any particular pattern. The factors considered are population, labor force participation rate, average length of schooling, and dependency ratio. Based on the analysis, the best model for the open unemployment rate in West Sumatra Province is the truncated spline regression with three knot points, which has a GCV value of 0.061762. The variables with a significant effect are population, labor force participation rate, average length of schooling, and dependency ratio, and the coefficient of determination is 99.97%.
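As a reading aid, the generic form of a truncated spline regression and of the GCV criterion mentioned in the abstract above can be written as follows. The notation is the usual textbook one and is an assumption here, not taken from the paper: a single predictor x, polynomial degree p, knots K_1, ..., K_r, and hat matrix A(K) of the fitted model. With several predictors the model is typically the sum of one such term per predictor.

```latex
f(x) = \sum_{j=0}^{p} \beta_j x^{j} + \sum_{k=1}^{r} \beta_{p+k}\,(x - K_k)_+^{p},
\qquad
(x - K_k)_+^{p} =
\begin{cases}
(x - K_k)^{p}, & x \ge K_k, \\
0, & x < K_k,
\end{cases}

\mathrm{GCV}(K) =
\frac{n^{-1} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^{2}}
     {\left[ n^{-1}\, \mathrm{tr}\!\left( I - A(K) \right) \right]^{2}}
```

The knot configuration with the smallest GCV value is the one selected.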
Comparison of the Performance of the K-Means and K-Medoids Algorithms in Grouping Regencies/Cities in Sumatera Based on Poverty Indicators Mardhiatul Azmi; Atus Amadi Putra; Dodi Vionanda; Admi Salma
UNP Journal of Statistics and Data Science Vol. 1 No. 2 (2023): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol1-iss2/25

Abstract

K-Means is a non-hierarchical clustering method that partitions data into groups according to the distance of each object from the closest centroid. K-Medoids is a non-hierarchical clustering method that partitions data into groups according to the distance of each object from the closest medoid. The two methods were tested on 2021 poverty data for Sumatra, when the Covid-19 outbreak had caused the poverty rate to rise from the year before. This is applied research that begins with a study of the relevant theory. The data are secondary data on poverty indicators from the BPS website. The study aims to determine regional groups and to compare the groupings produced by the k-means and k-medoids methods. The better-performing method is the one with the lower Davies-Bouldin Index (DBI). The k-means algorithm places 34 districts/cities in cluster 1, 52 in cluster 2, 23 in cluster 3, and 45 in cluster 4, while k-medoids places 53, 40, 37, and 24 districts/cities in clusters 1, 2, 3, and 4, respectively. The resulting DBI is 1.584 for k-means and 2.359 for k-medoids. The k-means algorithm therefore performs better than k-medoids, because its DBI is smaller.
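The Davies-Bouldin Index used for the comparison above can be sketched in a few lines. This is a minimal illustration under stated assumptions (Euclidean distance, clusters given as lists of numeric tuples, at least two clusters with distinct centroids); the function names are hypothetical and this is not the authors' code.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def davies_bouldin(clusters):
    """Davies-Bouldin Index for a list of clusters (lower is better)."""
    centers = [centroid(c) for c in clusters]
    # Scatter: mean distance of a cluster's points to its own centroid.
    scatter = [
        sum(euclidean(p, ctr) for p in c) / len(c)
        for c, ctr in zip(clusters, centers)
    ]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        # Worst-case similarity of cluster i to any other cluster.
        total += max(
            (scatter[i] + scatter[j]) / euclidean(centers[i], centers[j])
            for j in range(k)
            if j != i
        )
    return total / k
```

Computing this value for each candidate partition and keeping the partition with the smaller value mirrors the selection rule described in the abstract.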
Application of Random Forest for The Classification Diabetes Mellitus Disease in RSUP Dr. M. Jamil Padang FAZHIRA ANISHA; Dodi Vionanda; nonong amalita; zilrahmi
UNP Journal of Statistics and Data Science Vol. 1 No. 2 (2023): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol1-iss2/30

Abstract

Diabetes Mellitus is a disease in which blood sugar levels exceed the normal range (GDS > 200 mg/dl). It may be defined as a disorder of insulin function in the pancreas. Diabetes Mellitus is a global health problem, as its incidence is increasing in every part of the world, including Indonesia. The disease needs to be prevented and controlled so that it does not cause complications in other organs or even death. It is therefore necessary to study a method that predicts the occurrence of the disease and identifies the variables that most affect whether a person suffers from it. This can be done with a classification method, one of which is Random Forest. This case study uses the randomForest package in RStudio. The results reported are the smallest OOB error rate (%) and the Variable Importance Measure (VIM) based on Mean Decrease Accuracy (MDA) and Mean Decrease Gini (MDG) values. Random Forest classification of the incidence of Diabetes Mellitus at RSUP Dr. M. Jamil Padang gives an OOB error rate of 1.2%, or an accuracy of 98.8%. The optimal model uses mtry = 4 and ntree = 1000. By MDA, the most influential variables are Age, Polyphagia, Polyuria, HB, and BMI, while by MDG they are Age, Polyphagia, BMI, HB, and Delayed Healing.
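The out-of-bag (OOB) error rate reported above comes from the bootstrap sampling behind each tree: rows not drawn for a tree are "out of bag" for it and can be used to score it. The sketch below shows only that bookkeeping, in Python rather than the R randomForest package the study used. The function names and the `oob_votes` structure are hypothetical, and no trees are actually fitted.

```python
import random


def bootstrap_split(n_rows, rng):
    """Draw n_rows indices with replacement; return (in_bag, out_of_bag)."""
    in_bag = [rng.randrange(n_rows) for _ in range(n_rows)]
    drawn = set(in_bag)
    out_of_bag = [i for i in range(n_rows) if i not in drawn]
    return in_bag, out_of_bag


def oob_error_rate(y_true, oob_votes):
    """Share of rows whose majority OOB vote differs from the true label.

    oob_votes[i] holds the labels predicted for row i by the trees for
    which row i was out of bag. Rows with no OOB votes are skipped.
    """
    wrong = 0
    scored = 0
    for truth, votes in zip(y_true, oob_votes):
        if not votes:
            continue
        # sorted() makes tie-breaking deterministic (first label in sort order).
        majority = max(sorted(set(votes)), key=votes.count)
        scored += 1
        if majority != truth:
            wrong += 1
    if scored == 0:
        raise ValueError("no row has any out-of-bag vote")
    return wrong / scored
```

Accuracy in the sense used by the abstract is then one minus the OOB error rate.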
Application of Random Forest to Identify for Poor Households in West Sumatera Province Febri Ramayanti; Dodi Vionanda; Dony Permana; Zilrahmi
UNP Journal of Statistics and Data Science Vol. 1 No. 2 (2023): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol1-iss2/31

Abstract

Poverty is a socioeconomic problem in Indonesia. The number of people living in poverty in West Sumatera increased by 26.44 thousand from 2020 to 2021. The government has created programs to address poverty that take into account criteria for poor households. These criteria were developed from data obtained through the National Socioeconomic Survey (Susenas). However, instead of showing the actual location of poor households, the existing data only give their number, which makes the programs less effective. This can be overcome with random forest (RF) classification. RF is a collection of many decision trees. Before fitting RF, one has to set the values of three tuning parameters: mtry, ntree, and node size. The results reported are the smallest OOB error rate (%) and the Variable Importance Measure (VIM). RF classification in this research gives an OOB error rate of 5.65%, or an accuracy of 94.35%, with tuning parameters mtry = 5 and ntree = 500. Based on the VIM, the criteria for poor households include sources of drinking water such as protected or unprotected spring water and surface water, lighting such as non-PLN electricity or no electricity, cooking fuel such as charcoal and firewood, and a head of household who is self-employed, a family worker, or unpaid, with at least a junior high degree.
The SMOTE Application of CART Methods for Coping Imbalanced Data in Classifying Status Work on Labor Force in the City of Padang Andini Yulianti; Fadhilah Fitri; Nonong Amalita; Dodi Vionanda
UNP Journal of Statistics and Data Science Vol. 1 No. 3 (2023): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol1-iss3/12

Abstract

Employment is one of the main concerns in every country, especially in developing countries such as Indonesia. The employment problems Indonesia faces are a lack of job opportunities, excess labor, and an uneven distribution of labor. Because the labor force grows faster than job opportunities, many workers cannot find jobs, which causes unemployment. The city of Padang had the highest unemployment rate in West Sumatra from 2013 to 2021. Developing a smart city and identifying the factors that influence unemployment are among the efforts to reduce it. This study uses the CART method to determine the factors that affect the number of the workforce in the city of Padang. The advantage of CART is that its results are easy to interpret, but the accuracy of the classification tree is low because of data imbalance. This study therefore uses the SMOTE method to overcome the imbalance. The optimal classification tree has 8 terminal nodes and involves 4 explanatory variables: marital status (X3), education level (X4), gender (X2), and age (X1). Five terminal nodes classify the labor force into the working category and three into the unemployed category.
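The core step of SMOTE, the oversampling method named above, is to create a synthetic minority-class point on the line segment between a minority point and one of its nearest minority neighbours. The sketch below illustrates that step only. It assumes purely numeric features (categorical predictors such as marital status would need a variant of the method), and the function names and parameters are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length numeric tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote_samples(minority, n_new, k, rng):
    """Generate n_new synthetic points from a list of minority-class points."""
    if len(minority) < 2 or k < 1:
        raise ValueError("need at least two minority points and k >= 1")
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the chosen base point.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sq_dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(
            tuple(b + gap * (q - b) for b, q in zip(base, neighbour))
        )
    return synthetic
```

The synthetic points are added to the minority class before the classification tree is grown, so both classes are represented more evenly in the training data.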
Stock Price Prediction of PT Bank Syariah Indonesia Tbk Using Support Vector Regression Isra Miraltamirus; Fadhilah Fitri; Dodi Vionanda; Dony Permana
UNP Journal of Statistics and Data Science Vol. 1 No. 3 (2023): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol1-iss3/43

Abstract

A company needs external funding so that all aspects of its development can be met. Companies that need capital can hold a public offering and sell securities on a stock exchange. Stock prices tend to fluctuate, which affects the income received by companies and investors. PT BSI Tbk is currently facing this problem, so its stock price needs to be modeled to predict the price in the coming days. Support vector regression (SVR) is a machine learning method that can handle fluctuating data and produce good predictive models. SVR aims to find the optimal hyperplane. It uses a kernel function to handle non-linear data by mapping the data from the input space to a higher-dimensional feature space, where an optimal hyperplane is easier to form. The kernel function used in this study is the radial basis function. The best parameters obtained are C = 100, ϵ = 0.01, and γ = 0.001, which give a model error of 0.87%.
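Two of the parameters reported above have a direct reading in code: γ sets the width of the radial basis function kernel, and ϵ sets the width of the tube inside which SVR ignores errors. The sketch below shows both in their standard textbook form. The function names are hypothetical, and fitting the model itself (where C enters as the penalty on errors outside the tube) is not shown.

```python
import math


def rbf_kernel(x, z, gamma):
    """Radial basis function kernel: exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Zero inside the epsilon tube, linear in the excess outside it."""
    return max(0.0, abs(y_true - y_pred) - epsilon)
```

With a small γ such as 0.001 the kernel decays slowly with distance, so distant observations still influence each prediction. With ϵ = 0.01, deviations of up to 0.01 in the units of the (possibly scaled) target incur no loss.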
Comparison of Distance Function in K-Nearest Neighbor Algorithm to Predict Prospective Customers in Term Deposit Subscriptions Muhammad Tibri Syofyan; Nonong Amalita; Dodi Vionanda; Dina Fitria
UNP Journal of Statistics and Data Science Vol. 1 No. 3 (2023): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol1-iss3/47

Abstract

Data mining is often used to analyze big data to obtain new, useful information for future use. One of the best algorithms in data mining is K-Nearest Neighbor (K-NN). The K-NN classifier is a distance-based classification algorithm. The distance function is its core component: it measures the distance or similarity between the test data and the training data. Because many distance functions exist, determining the best one for K-NN performance is a recurring topic in the literature. This study compares the Manhattan distance and the Minkowski distance to see which gives the best K-NN performance. The K-NN classifier is applied to a bank dataset to predict prospective customers for term deposit subscriptions. The study shows that the Minkowski distance achieves the best result compared with the Manhattan distance. Minkowski distance with power p = 1.5 gives an accuracy of 88.40% when K = 7. Thus, the K-NN algorithm with Minkowski distance (p = 1.5, K = 7) is the best at predicting prospective customers for term deposit subscriptions.
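The two distance functions compared above belong to one family: the Minkowski distance with power p reduces to the Manhattan distance at p = 1 and to the Euclidean distance at p = 2. A minimal K-NN classifier built on it might look like the sketch below. The function names and the toy data in the usage note are illustrative assumptions, not the study's code or dataset.

```python
from collections import Counter


def minkowski(a, b, p):
    """Minkowski distance of order p between two numeric sequences."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def knn_predict(train_X, train_y, query, k, p):
    """Majority label among the k training points closest to query."""
    ranked = sorted(
        zip(train_X, train_y),
        key=lambda pair: minkowski(pair[0], query, p),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

For example, `knn_predict([(0, 0), (0, 1), (5, 5), (5, 6)], ["no", "no", "yes", "yes"], (0.2, 0.4), k=3, p=1.5)` ranks the four points by distance to the query and takes the majority label among the nearest three.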
Rainfall Forecasting in Medan City Using Singular Spectrum Analysis (SSA) Silvia Agustina; Fadhilah Fitri; Dodi Vionanda; Admi Salma
UNP Journal of Statistics and Data Science Vol. 1 No. 3 (2023): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol1-iss3/52

Abstract

Singular spectrum analysis (SSA) is a time series method that can be used for data with seasonal effects. Rainfall is one example of such data. High rainfall contributes to natural disasters such as floods. Medan, the capital of North Sumatra province, has fairly high rainfall and lies in a lowland area, so it is prone to flooding. Rainfall forecasting can support disaster mitigation. The forecasting method used is SSA. The forecast has a MAPE of 15.5% and a tracking signal within tolerance limits, so it can be concluded that the forecast performs well.
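The two accuracy measures quoted above can be computed as sketched below. The definitions are the common textbook ones (MAPE as the mean absolute percentage error, tracking signal as the cumulative forecast error divided by the mean absolute deviation); whether the paper uses exactly these forms is an assumption, and the function names are hypothetical.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent. Actual values must be non-zero."""
    return 100.0 * sum(
        abs((a - f) / a) for a, f in zip(actual, forecast)
    ) / len(actual)


def tracking_signal(actual, forecast):
    """Cumulative forecast error divided by the mean absolute deviation."""
    errors = [a - f for a, f in zip(actual, forecast)]
    mad = sum(abs(e) for e in errors) / len(errors)
    return sum(errors) / mad
```

A tracking signal that stays close to zero suggests the forecast is not systematically too high or too low. Because MAPE divides by the actual values, months with zero rainfall would need special handling.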
Application of Singular Spectrum Analysis Method to Forecast Rice Production in West Sumatra Nazifatul Azizah; Fadhilah Fitri; Dodi Vionanda; Zamahsary Martha
UNP Journal of Statistics and Data Science Vol. 1 No. 3 (2023): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol1-iss3/58

Abstract

An imbalance between population and rice production causes negative impacts such as food crises and increasing poverty, so forecasting is needed to maintain future food availability. This study forecasts rice production in West Sumatra Province for 12 periods in 2023 using the SSA method. Based on the analysis, rice production over the 12 periods of 2023 tends to decrease compared with the previous year. The SSA forecast with L = 21 can be considered accurate, with a MAPE of 17.69%.
Grouping Level of Poverty Based on District/City in Indonesia Using K-Harmonic Means nabillah putri; Nonong Amalita; Dodi Vionanda; Dony Permana
UNP Journal of Statistics and Data Science Vol. 1 No. 3 (2023): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol1-iss3/60

Abstract

Indonesia still has a relatively high poverty rate, although nationally it has declined in recent years. Some areas are still experiencing rising poverty rates. Poverty alleviation plans therefore can no longer be uniform but need to take into account the conditions of each dimension that causes poverty in an area, so districts/cities in Indonesia need to be grouped by poverty. Grouping was performed using K-Harmonic Means analysis. K-Harmonic Means is a non-hierarchical clustering method that uses the harmonic average of the distances between each data point and the cluster centers. The data are secondary data from BPS publications on poverty and inequality in 2022. The analysis consists of standardizing the data, conducting cluster analysis, and validating the clusters. Based on the K-Harmonic Means analysis, the optimal number of clusters is two: the first cluster has 54 districts/cities and the second has 460, and the Dunn Index for cluster validation is 0.03492. A better grouping of poverty levels by district/city in Indonesia is thus obtained using the K-Harmonic Means method with p = 2.25.
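The Dunn Index used above for cluster validation is the smallest distance between points in different clusters divided by the largest distance between points in the same cluster, so larger values indicate better-separated, more compact clusters. The sketch below uses that common definition with Euclidean distance. Both choices, and the function names, are assumptions rather than details taken from the paper.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dunn_index(clusters):
    """Dunn Index for a list of clusters, each a list of numeric tuples.

    Needs at least two clusters and at least one cluster with two
    distinct points; otherwise the maximum diameter is zero.
    """
    # Smallest distance between any two points in different clusters.
    min_separation = min(
        euclidean(p, q)
        for i, ci in enumerate(clusters)
        for cj in clusters[i + 1:]
        for p in ci
        for q in cj
    )
    # Largest distance between any two points in the same cluster.
    max_diameter = max(
        euclidean(p, q) for c in clusters for p in c for q in c
    )
    return min_separation / max_diameter
```

The pairwise loops make this quadratic in the number of points, which is acceptable for a few hundred districts/cities.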