Contact Name
Tessy Octavia Mukhti
Contact Email
tessyoctaviam@fmipa.unp.ac.id
Phone
+6282283838641
Journal Mail Official
tessyoctaviam@fmipa.unp.ac.id
Editorial Address
LPPM Universitas Negeri Padang, Jalan Prof. Dr. Hamka, Air Tawar Barat, Kota Padang, Sumatera Barat 25131
Location
Kota Padang,
Sumatera Barat
INDONESIA
UNP Journal of Statistics and Data Science
ISSN: -     EISSN: 2985-475X     DOI: 10.24036/ujsds
UNP Journal of Statistics and Data Science (UJSDS) is an open access journal (e-journal) launched in 2022 by the Department of Statistics, Faculty of Science and Mathematics, Universitas Negeri Padang. UJSDS publishes scientific articles on various aspects of Statistics, Data Science, and their applications. Articles can be research results, case studies, or literature reviews. All papers are reviewed by peer reviewers consisting of experts and academicians from across universities.
Articles 202 Documents
Comparison of Naïve Bayes and K-Nearest Neighbor for DKI Jakarta Air Pollution Standard Index Classification Nurdalia; Zilrahmi; Dony Permana; Admi Salma
UNP Journal of Statistics and Data Science Vol. 1 No. 2 (2023): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol1-iss2/29

Abstract

Data mining is the process of extracting useful knowledge and information from data using particular algorithms or methods. The data mining classification methods used in this study are Naïve Bayes and K-Nearest Neighbor. Both methods are used to classify the DKI Jakarta air pollution standard index in 2021 based on six air pollutants: dust particles (PM10 and PM2.5), sulfur dioxide (SO2), carbon monoxide (CO), ozone (O3), and nitrogen dioxide (NO2). The accuracy of each method in predicting the 2021 index is evaluated using values derived from the confusion matrix. The better performance of the two methods is found in the Naïve Bayes algorithm, which gives high sensitivity values for all categories even though some categories are in the minority. Although the data are imbalanced across categories, the Naïve Bayes algorithm performs well in accuracy, sensitivity, and specificity.
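The evaluation measures named in the abstract above (accuracy, sensitivity, specificity) can all be read off a multi-class confusion matrix. The following is a minimal sketch of that calculation, not the authors' code; the function name `per_class_metrics` and the category labels are hypothetical and the labels are not data from the study.

```python
from collections import Counter


def per_class_metrics(y_true, y_pred):
    """Return overall accuracy and per-class sensitivity/specificity."""
    labels = sorted(set(y_true) | set(y_pred))
    # Confusion counts keyed by (actual, predicted).
    counts = Counter(zip(y_true, y_pred))
    n = len(y_true)
    accuracy = sum(counts[(c, c)] for c in labels) / n
    metrics = {}
    for c in labels:
        tp = counts[(c, c)]                                  # actual c, predicted c
        fn = sum(counts[(c, p)] for p in labels if p != c)   # actual c, predicted other
        fp = sum(counts[(a, c)] for a in labels if a != c)   # actual other, predicted c
        tn = n - tp - fn - fp
        metrics[c] = {
            "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
            "specificity": tn / (tn + fp) if tn + fp else float("nan"),
        }
    return accuracy, metrics


# Hypothetical labels for illustration only (not data from the study).
actual = ["good", "good", "moderate", "moderate", "moderate", "unhealthy"]
predicted = ["good", "moderate", "moderate", "moderate", "moderate", "unhealthy"]
accuracy, metrics = per_class_metrics(actual, predicted)
```

Reporting sensitivity per category, as the abstract does, matters with imbalanced data because overall accuracy can stay high even when a minority category is rarely detected.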
Application of Random Forest for The Classification Diabetes Mellitus Disease in RSUP Dr. M. Jamil Padang Fazhira Anisha; Dodi Vionanda; Nonong Amalita; Zilrahmi
UNP Journal of Statistics and Data Science Vol. 1 No. 2 (2023): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol1-iss2/30

Abstract

Diabetes Mellitus is a disease in which blood sugar levels exceed normal (GDS > 200 mg/dl). Diabetes Mellitus may be defined as an insulin function disorder in the pancreas. It is a world health problem because its incidence is increasing in every part of the world, including Indonesia. Prevention and control of the disease are needed to avoid complications in other organs and even death. It is therefore necessary to study a method that predicts the occurrence of this disease and identifies the variables that most affect whether a person suffers from it. This can be accomplished with a classification method, one of which is Random Forest. This case study uses the randomForest package in RStudio. The results of this study are the smallest OOB error rate (%) and the Variable Importance Measure (VIM) based on Mean Decrease Accuracy (MDA) and Mean Decrease Gini (MDG) values. Classification by Random Forest of the incidence of Diabetes Mellitus at RSUP Dr. M. Jamil Padang yields an OOB error rate of 1.2%, or an accuracy of 98.8%. The optimal model is produced with mtry = 4 and ntree = 1000. Based on MDA, the most influential variables are Age, Polyphagia, Polyuria, HB, and BMI. Based on MDG, they are Age, Polyphagia, BMI, HB, and Delayed Healing.
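The OOB (out-of-bag) error rate reported above comes from the bootstrap sampling behind each tree: rows left out of a tree's bootstrap sample are predicted by that tree, and the majority vote over such trees is compared with the true label. The sketch below illustrates only that bookkeeping in Python; it is not the authors' R code, the function names are hypothetical, and the votes are made-up values rather than output of a fitted forest.

```python
import random


def bootstrap_split(n_rows, seed):
    """Draw one bootstrap sample of row indices; return (in_bag, out_of_bag)."""
    rng = random.Random(seed)
    # Sampling with replacement, same size as the data set.
    in_bag = [rng.randrange(n_rows) for _ in range(n_rows)]
    out_of_bag = sorted(set(range(n_rows)) - set(in_bag))
    return in_bag, out_of_bag


def oob_error_rate(y_true, oob_votes):
    """Majority-vote OOB error.

    oob_votes[i] lists the predictions for row i from the trees in which
    row i was out of bag. Rows with no votes cannot be scored and are skipped.
    Assumes at least one row has votes.
    """
    wrong = 0
    scored = 0
    for truth, votes in zip(y_true, oob_votes):
        if not votes:
            continue
        majority = max(sorted(set(votes)), key=votes.count)
        scored += 1
        if majority != truth:
            wrong += 1
    return wrong / scored


# Hypothetical votes for four rows (illustration only).
y_true = ["yes", "no", "yes", "no"]
oob_votes = [["yes", "yes", "no"], ["no"], ["no", "no"], []]
error = oob_error_rate(y_true, oob_votes)   # one of three scored rows is wrong
accuracy = 1 - error
```

The accuracy quoted in the abstract is the complement of the OOB error in the same way: 100% - 1.2% = 98.8%.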
Application of Random Forest to Identify for Poor Households in West Sumatera Province Febri Ramayanti; Dodi Vionanda; Dony Permana; Zilrahmi
UNP Journal of Statistics and Data Science Vol. 1 No. 2 (2023): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol1-iss2/31

Abstract

Poverty is a socioeconomic problem in Indonesia. The number of people living in poverty in West Sumatera increased by 26.44 thousand from 2020 to 2021. The government has created programs to cope with poverty by taking into account the criteria for poor households. These criteria have been developed using data obtained through the National Socioeconomic Survey (Susenas). However, instead of showing the actual location of poor households, the existing data only give the number of poor households, which makes the programs less effective. This could be overcome by classification analysis with random forest (RF). RF is a collection of many decision trees. Before fitting RF, one has to determine the values of three tuning parameters: mtry, ntree, and node size. The results are the smallest OOB error rate (%) and the Variable Importance Measure (VIM). The classification by RF in this research yields an OOB error rate of 5.65%, or an accuracy of 94.35%, with tuning parameters mtry = 5 and ntree = 500. Based on the VIM, the poor household criteria include sources of drinking water such as protected or unprotected spring water and surface water, lighting tools such as non-PLN electricity or no usage of electricity, fuel for cooking such as charcoal and firewood, and the head of the household being self-employed, a family worker, or unpaid with at least a junior high degree.
Nonparametric Regression Modeling with Fourier Series Approach on Poverty Cases in West Sumatra Province Melin Wanike Ketrin; Fadhilah Fitri; Atus Amadi Putra; Zilrahmi
UNP Journal of Statistics and Data Science Vol. 1 No. 2 (2023): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol1-iss2/32

Abstract

Poverty is a complex problem that has an impact on various social issues such as education, unemployment, health, and economic growth. Therefore, poverty is important to overcome in order to improve the welfare of the population. One analysis that can be used to model the percentage of poverty is regression analysis. Regression analysis is divided into two approaches, parametric and nonparametric. Parametric regression rests on several assumptions, while nonparametric regression assumes only that the shape of the regression curve does not follow a particular pattern. There are several approaches to nonparametric regression, one of which is the Fourier series. The purpose of this study is to model the percentage of poverty in West Sumatra Province. The unclear shape of the curve in the data is the reason for using nonparametric regression. In addition, the data used in this study are regional data, which tend to fluctuate, so the Fourier series approach is suitable. In this research, nonparametric regression models with one, two, and three oscillation parameters were tried. The best model has two oscillation parameters, with a Generalized Cross Validation (GCV) value of 2.110 and R² of 92.44%.
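For orientation, a commonly used form of the Fourier series estimator and its GCV criterion is sketched below for a single predictor. The abstract does not state the exact specification, so the form of the regression function and the symbols $b$, $a_0$, $a_k$, $K$, and $A(K)$ are assumptions here, with $K$ playing the role of the oscillation parameter.

```latex
% Fourier series regression with K oscillation terms (assumed form)
y_i = b\,x_i + \frac{a_0}{2} + \sum_{k=1}^{K} a_k \cos(k x_i) + \varepsilon_i,
\qquad i = 1, \dots, n

% Fitted values are linear in y through the hat matrix A(K)
\hat{\mathbf{y}} = A(K)\,\mathbf{y}

% Generalized Cross Validation: choose the K that minimizes GCV(K)
\mathrm{GCV}(K) =
  \frac{n^{-1} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}
       {\left[ n^{-1} \operatorname{tr}\!\left( I - A(K) \right) \right]^2}

% Coefficient of determination
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
```

Under this reading, the models with $K = 1, 2, 3$ are compared and the one with the smallest GCV is kept, which in the study is $K = 2$.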
The SMOTE Application of CART Methods for Coping Imbalanced Data in Classifying Status Work on Labor Force in the City of Padang Andini Yulianti; Fadhilah Fitri; Nonong Amalita; Dodi Vionanda
UNP Journal of Statistics and Data Science Vol. 1 No. 3 (2023): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol1-iss3/12

Abstract

Employment is one of the main concerns in every country, especially in developing countries such as Indonesia. The employment problems faced by Indonesia are the lack of job opportunities, excess labor, and the uneven distribution of labor. Because the labor force grows faster than existing job opportunities, many workers do not get jobs, which causes unemployment. The city of Padang had the highest unemployment rate in West Sumatra from 2013 to 2021. Developing a smart city and identifying the factors that influence unemployment are among the efforts to reduce unemployment. This study uses the CART method to determine the factors that affect the work status of the labor force in the city of Padang. The advantage of the CART method is that its results are easy to interpret, but the accuracy of the classification tree is low when the data are imbalanced. Therefore, this study uses the SMOTE method to overcome this problem. The optimal classification tree has 8 terminal nodes and involves 4 explanatory variables: marital status (X3), education level (X4), gender (X2), and age (X1). Of the terminal nodes, 5 classify the labor force into the working category and 3 into the unemployed category.
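The core step of SMOTE is to create synthetic minority-class rows by interpolating between a minority point and one of its nearest minority neighbours. The sketch below shows that step for numeric features only, under stated assumptions: the function name `smote_samples`, the value of `k`, and the sample points are hypothetical, and categorical predictors such as marital status or gender would need a variant that handles categories, which is not shown.

```python
import math
import random


def smote_samples(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points from a list of numeric minority points.

    Assumes at least two minority points. Each synthetic point lies on the
    line segment between a randomly chosen minority point and one of its
    k nearest minority neighbours.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point, excluding itself.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append(
            tuple(b + gap * (q - b) for b, q in zip(base, neighbour))
        )
    return synthetic


# Hypothetical minority-class points (illustration only).
minority_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
new_points = smote_samples(minority_points, n_new=10, k=2, seed=42)
```

The synthetic rows are added to the training data so that the classification tree is grown on a more balanced sample.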
Self Organizing Maps Method for Grouping Provinces in Indonesia Based on the Landslide Impact Suwanda Risky; Syafriandi; Dony Permana; Dina Fitria
UNP Journal of Statistics and Data Science Vol. 1 No. 3 (2023): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol1-iss3/15

Abstract

Indonesia is a disaster-prone country due to its climatic, soil, hydrological, geological, and geomorphological conditions. A disaster is an event or chain of events that threatens and disrupts people's lives and livelihoods. A natural disaster is a disaster caused by a natural event or series of events, such as a landslide. The number of landslide events in Indonesia varies from province to province because each province has different characteristics, so the impact of landslides also differs. It is therefore necessary to group and profile the provinces to identify which are most affected by landslides. This study uses the Self Organizing Maps method for the grouping. The number of clusters formed is 3, based on the optimal values of internal cluster validation (Dunn, Connectivity, and Silhouette indices). Cluster 1 consists of 31 provinces, where the average impact of landslides is small. Cluster 2 consists of 2 provinces, where 4 impacts are dominantly larger. Cluster 3 consists of 1 province, where 1 impact is dominantly larger. It can be concluded that most provinces in Indonesia experience a relatively small impact from landslides. However, some provinces experience a very large impact, namely West Java, Central Java, and East Java.
Comparison of Haversine and Euclidean Distance Formula for Calculating Distance Between Regencies in West Sumatra Vinka Haura Nabilla; Indonesia; Dony Permana; Fadhilah Fitri
UNP Journal of Statistics and Data Science Vol. 1 No. 3 (2023): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol1-iss3/39

Abstract

A distance is a number that indicates how far apart two places are. Distance is widely used in research, one application being the construction of spatial weighting matrices. The spatial weight matrix is obtained from proximity information between regions. There are two types of spatial weights: contiguity-based and distance-based. For West Sumatra, distance-based spatial weighting is preferable because islands and mountains separate its regions. Two distance formulas that can be used are Haversine and Euclidean distance. They differ in that the Haversine formula takes the earth's curvature into account when calculating the distance between two points, whereas the Euclidean distance connects the two points with a straight line. The purpose of this research is to ascertain whether the Haversine and Euclidean distance formulas produce significantly different distances. The distances are calculated from latitude and longitude coordinates obtained from Google Maps. The distances from both formulas were expressed in kilometers (km), and the data were then analyzed using the z test. The findings showed that the distances calculated with the Haversine formula and the Euclidean distance formula did not differ significantly.
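The two formulas compared above can be sketched as follows. This is an illustration under stated assumptions, not the authors' calculation: the Earth radius of 6371 km is an assumed constant, and the abstract does not say how the Euclidean distance in degrees was converted to kilometers, so the conversion used here (one degree of arc on a sphere of that radius) is an assumption. The coordinates are hypothetical points near the equator, not regency coordinates from the study.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (latitude, longitude) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def euclidean_km(lat1, lon1, lat2, lon2):
    """Straight-line distance in degree space, scaled to km.

    Assumption: one degree is converted with a single constant factor,
    which ignores the shrinking of longitude degrees away from the equator.
    """
    km_per_degree = EARTH_RADIUS_KM * math.pi / 180
    return km_per_degree * math.hypot(lat2 - lat1, lon2 - lon1)


# Hypothetical points near the equator (illustration only).
point_a = (-1.0, 100.0)
point_b = (0.0, 101.0)
d_haversine = haversine_km(*point_a, *point_b)
d_euclidean = euclidean_km(*point_a, *point_b)
relative_gap = abs(d_haversine - d_euclidean) / d_haversine
```

Close to the equator the cosine of the latitude is near 1, so the two results stay close for points like these, which is one plausible reason why a study area straddling the equator might show no significant difference.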
Vector Error Correction Model for Cointegration Analysis of Factors Affecting Indonesia's Economic Growth during the Pandemic Period Rizqa Fajriaty Fitri MY; Dina Fitria; Syafriandi Syafriandi; Zilrahmi
UNP Journal of Statistics and Data Science Vol. 1 No. 3 (2023): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol1-iss3/40

Abstract

Stable economic growth is the ultimate goal of monetary policy, as seen from the stability of the rupiah. The economic situation has declined due to the spread of Covid-19. In an effort to stabilize the economy, the relationships between factors supporting Indonesia's economic growth are analyzed using the VECM approach. This approach can determine the long-term and short-term relationships in time series data. After the required tests are satisfied, the model yields three significant equations. The model shows a short-term effect of the inflation and BI Rate variables on inflation, as well as an inverse effect of the BI Rate one period earlier on the exchange rate. The cointegration coefficient is negative, which indicates a short-term to long-term adjustment mechanism in the inflation variable. The two long-term cointegration equations show that inflation can be positively influenced by the visa variable, and that the BI Rate is influenced by the exchange rate and visa variables. The VECM model can explain more than 50% of the variables.
Sentiment Analysis of Goride Services on Twitter Social Media Using Naive Bayes Algorithm Puti Utari Maharani; Nonong Amalita; Atus Amadi Putra; Fadhilah Fitri
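As background for the short-term and long-term relationships mentioned above, the general form of a VECM is sketched below. The abstract does not give the fitted equations, so the lag order $p$, the constant $\mathbf{c}$, and the symbols are assumptions; $\mathbf{y}_t$ stands for the vector of variables in the system.

```latex
% General VECM with cointegration rank r (assumed notation)
\Delta \mathbf{y}_t
  = \mathbf{c}
  + \boldsymbol{\alpha}\,\boldsymbol{\beta}^{\top} \mathbf{y}_{t-1}
  + \sum_{i=1}^{p-1} \Gamma_i \, \Delta \mathbf{y}_{t-i}
  + \boldsymbol{\varepsilon}_t

% \beta^{\top} y_{t-1} : the r long-run (cointegration) relations
% \alpha               : adjustment coefficients toward the long-run relations
% \Gamma_i             : short-run effects of lagged differences
```

In this notation, a negative and significant adjustment coefficient in the inflation equation means that deviations from the long-run relation are corrected over time, which is the adjustment mechanism the abstract refers to.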
UNP Journal of Statistics and Data Science Vol. 1 No. 3 (2023): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol1-iss3/41

Abstract

An online motorcycle taxi is an application-based transportation technology innovation. Online motorcycle taxis offer relatively low prices and discount features. However, their existence creates congestion problems and conflicts with conventional transport. One such online motorcycle taxi service is GoRide, a feature of the Gojek application. The emergence of GoRide has prompted the public to express opinions and judge the service openly through social media, one of which is Twitter. The assessments given by the public are textual opinions that can be analyzed. Sentiment analysis is used to detect opinions in the form of a person's judgment, evaluation, attitude, and emotion. The text classification algorithm used in this study is Naive Bayes. This research aims to determine public sentiment towards GoRide's service as an online motorcycle taxi, in positive and negative categories, and to determine the accuracy of the Naive Bayes algorithm on GoRide's service data. Research data were obtained using the API provided by Twitter for developers. The analysis consists of text preprocessing, data labelling, word weighting, classification, and then performance evaluation of the classification. The classification yields 698 data in the positive sentiment category and 517 in the negative category. The performance evaluation of the Naive Bayes algorithm gives an accuracy of 77.78%. Overall, GoRide can be categorized as a good service.
Stock Price Prediction of PT Bank Syariah Indonesia Tbk Using Support Vector Regression Isra Miraltamirus; Fadhilah Fitri; Dodi Vionanda; Dony Permana
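The classification step in the pipeline above can be illustrated with a small multinomial Naive Bayes classifier. This is a sketch under stated assumptions rather than the authors' implementation: it uses raw word counts with add-one (Laplace) smoothing instead of the word weighting the study applied, the function names are hypothetical, and the tokenized examples are invented, not tweets from the study.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs):
    """docs: list of (tokens, label). Returns (class_counts, word_counts, vocab)."""
    class_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    for tokens, label in docs:
        word_counts[label].update(tokens)
    vocab = {word for tokens, _ in docs for word in tokens}
    return class_counts, word_counts, vocab


def predict_nb(model, tokens):
    """Return the label with the highest log posterior for the given tokens."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        # Log prior of the class.
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokens:
            if word not in vocab:
                continue  # words never seen in training are ignored
            # Add-one smoothing avoids zero probabilities.
            likelihood = (word_counts[label][word] + 1) / (total_words + len(vocab))
            score += math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented, already preprocessed examples (illustration only).
training = [
    (["driver", "friendly", "fast"], "positive"),
    (["cheap", "fast", "pickup"], "positive"),
    (["driver", "late", "cancel"], "negative"),
    (["expensive", "late", "rude"], "negative"),
]
model = train_nb(training)
label_a = predict_nb(model, ["fast", "friendly"])
label_b = predict_nb(model, ["late", "rude"])
```

Accuracy would then be the share of held-out labelled tweets whose predicted label matches the assigned label.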
UNP Journal of Statistics and Data Science Vol. 1 No. 3 (2023): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol1-iss3/43

Abstract

A company needs funding from outside the company so that all aspects of its development can be fulfilled. Companies that need capital can carry out public offerings and sell securities on a stock exchange. Stock prices tend to fluctuate, which affects the income received by companies and investors. This problem is currently happening to PT BSI Tbk, so stock price modeling is needed to predict the value of PT BSI Tbk's stock price in the coming days. Support vector regression (SVR) is a machine learning method that can deal with fluctuating data and produce good predictive models. SVR aims to find the optimal hyperplane. It uses a kernel function to handle non-linear data by mapping the data from the input space to a higher-dimensional feature space, where it is easier to form an optimal hyperplane. The kernel function used in this study is the radial basis function. The best parameters obtained are C = 100, ϵ = 0.01, and γ = 0.001, which produce a model error of 0.87%.
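Two of the three parameters reported above have a direct reading in formulas: γ controls the width of the radial basis function kernel, and ϵ is the half-width of the tube inside which prediction errors cost nothing. The sketch below shows both pieces with the reported values; it is not the fitted model, the function names are hypothetical, and the inputs are made-up numbers. C = 100 is the penalty weight on errors outside the tube and only enters when the model is trained, which is not shown.

```python
import math

GAMMA = 0.001    # kernel width parameter reported in the abstract
EPSILON = 0.01   # tube half-width reported in the abstract


def rbf_kernel(x, z, gamma=GAMMA):
    """Radial basis function kernel: exp(-gamma * ||x - z||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


def epsilon_insensitive_loss(y_true, y_pred, epsilon=EPSILON):
    """Loss is zero inside the epsilon tube and grows linearly outside it."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


# Made-up inputs for illustration only.
similarity_same = rbf_kernel((1.0, 2.0), (1.0, 2.0))   # identical points
similarity_far = rbf_kernel((0.0,), (10.0,))           # exp(-0.001 * 100)
loss_inside = epsilon_insensitive_loss(1.0, 1.005)     # error 0.005 < epsilon
loss_outside = epsilon_insensitive_loss(1.0, 1.5)      # error 0.5 > epsilon
```

A small γ such as 0.001 makes the kernel decay slowly with distance, so distant observations still influence a prediction.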
