Articles

Sentiment Analysis of GoRide Services on Twitter Social Media Using Naive Bayes Algorithm
Puti Utari Maharani; Nonong Amalita; Atus Amadi Putra; Fadhilah Fitri
UNP Journal of Statistics and Data Science Vol. 1 No. 3 (2023): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol1-iss3/41

Abstract

Online motorcycle taxis are an application-based transportation technology innovation. They offer relatively low prices and discount features. However, they also create congestion problems and conflicts with conventional transport. One such online motorcycle taxi service is GoRide, a feature of the Gojek application. The emergence of GoRide has led the public to voice opinions and openly judge the service on social media, including Twitter. These public assessments take the form of textual opinions that can be analyzed. Sentiment analysis is used to detect opinions in the form of a person's judgment, evaluation, attitude, and emotion. The text classification algorithm used in this study was Naive Bayes. This research aims to determine public sentiment (positive or negative) toward GoRide's service as an online motorcycle taxi and to measure the accuracy of the Naive Bayes algorithm in classifying that sentiment. The research data were obtained using the API that Twitter provides to developers. The analysis consists of text preprocessing, data labeling, word weighting, classification, and evaluation of classification performance. The classification assigned 698 records to the positive sentiment category and 517 to the negative category. The performance evaluation of the Naive Bayes algorithm gave an accuracy of 77.78%. Overall, therefore, GoRide can be categorized as a good service.
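The classification step can be illustrated with a minimal multinomial Naive Bayes sketch. It assumes tweets have already been preprocessed into token lists and, for simplicity, uses raw word counts with Laplace smoothing (`alpha`) rather than whatever word-weighting scheme the study applied. The tokens, labels, and function names are hypothetical, not the study's data or code.

```python
import math
from collections import Counter


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Estimate log priors and Laplace-smoothed log likelihoods per class."""
    classes = sorted(set(labels))
    vocab = {word for doc in docs for word in doc}
    log_priors, log_likelihoods = {}, {}
    for c in classes:
        class_docs = [doc for doc, y in zip(docs, labels) if y == c]
        log_priors[c] = math.log(len(class_docs) / len(docs))
        counts = Counter(word for doc in class_docs for word in doc)
        total = sum(counts.values())
        log_likelihoods[c] = {
            word: math.log((counts[word] + alpha) / (total + alpha * len(vocab)))
            for word in vocab
        }
    return log_priors, log_likelihoods, vocab


def predict(model, doc):
    """Return the class with the highest posterior score; unseen words are ignored."""
    log_priors, log_likelihoods, vocab = model
    scores = {
        c: log_priors[c] + sum(log_likelihoods[c][w] for w in doc if w in vocab)
        for c in log_priors
    }
    return max(scores, key=scores.get)


def accuracy(y_true, y_pred):
    """Share of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical, already-preprocessed tweets (not the study's data)
train_docs = [["fast", "cheap", "friendly"], ["cheap", "promo"],
              ["late", "rude"], ["cancel", "late", "expensive"]]
train_labels = ["positive", "positive", "negative", "negative"]
model = train_multinomial_nb(train_docs, train_labels)
test_docs = [["cheap", "fast"], ["late", "cancel"]]
predictions = [predict(model, d) for d in test_docs]
print(predictions, accuracy(["positive", "negative"], predictions))
```

Working in log space avoids numerical underflow when many word probabilities are multiplied together.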
Comparison of Distance Function in K-Nearest Neighbor Algorithm to Predict Prospective Customers in Term Deposit Subscriptions
Muhammad Tibri Syofyan; Nonong Amalita; Dodi Vionanda; Dina Fitria
UNP Journal of Statistics and Data Science Vol. 1 No. 3 (2023): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol1-iss3/47

Abstract

Data mining is often used to analyze big data and obtain new, useful information for future use. One widely used data-mining algorithm is K-Nearest Neighbor (K-NN). The K-NN classifier is a distance-based classification algorithm, and the distance function is its core component for measuring the distance, or similarity, between test data and training data. Because many distance functions exist, determining which one gives the best K-NN performance is a recurring problem in the literature. This study compares which of two distance functions, Manhattan distance and Minkowski distance, produces the best K-NN performance. The K-NN classifier is applied to a bank dataset to predict prospective customers for term deposit subscriptions. The results show that Minkowski distance gives better K-NN results than Manhattan distance: Minkowski distance with power p = 1.5 achieves an accuracy of 88.40% at K = 7. Thus, K-NN with Minkowski distance (p = 1.5, K = 7) is the best-performing configuration for predicting prospective customers in term deposit subscriptions.
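A minimal K-NN sketch shows how the two distance functions relate: Manhattan distance is simply the Minkowski distance with p = 1. The defaults `k=7` and `p=1.5` mirror the best configuration reported above; the toy points and labels are hypothetical, and the sketch assumes numeric features already on comparable scales, since the study's preprocessing is not described here.

```python
from collections import Counter


def minkowski(a, b, p):
    """Minkowski distance of order p; p = 1 is Manhattan, p = 2 is Euclidean."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def knn_predict(train_X, train_y, query, k=7, p=1.5):
    """Majority vote among the k training points closest to `query`."""
    neighbours = sorted(
        zip(train_X, train_y), key=lambda pair: minkowski(pair[0], query, p)
    )[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical, already-scaled features with "yes"/"no" subscription labels
train_X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = ["no", "no", "no", "yes", "yes", "yes"]
print(knn_predict(train_X, train_y, (0.5, 0.5), k=3, p=1.0))  # Manhattan
print(knn_predict(train_X, train_y, (5.5, 5.5), k=3, p=1.5))  # Minkowski, p = 1.5
```

Varying `p` between 1 and 2 interpolates between Manhattan and Euclidean geometry, which is why a value such as p = 1.5 can be tuned like `k`.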
Modeling Human Development Index in Papua and West Sumatera with Multivariate Adaptive Regression Spline
Yulia Pertiwi; Dony Permana; Nonong Amalita; Admi Salma
UNP Journal of Statistics and Data Science Vol. 1 No. 3 (2023): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol1-iss3/54

Abstract

The Human Development Index (HDI) is an indicator of success in developing the quality of human life; a higher HDI value indicates better development in a region. The purpose of this study is to model HDI in Papua Province and West Sumatera Province and to determine the factors that affect it, using Multivariate Adaptive Regression Spline (MARS). MARS is a modeling method that can handle high-dimensional data. The best MARS model for Papua Province uses the combination BF = 24, MI = 2, and MO = 0 (maximum number of basis functions, maximum interaction, and minimum number of observations between knots), with a minimum GCV value of 0.55953, while the best MARS model for West Sumatera Province uses the same combination (BF = 24, MI = 2, MO = 0) with a minimum GCV value of 0.02697. According to the models, the factors that significantly affect HDI in both provinces are average years of schooling (X2), adjusted per-capita income (X6), life expectancy (X1), percentage of poor people (X4), and gross regional domestic product (X3). The importance levels of these variables are 100%, 45.26%, 29.24%, 6.55%, and 6.27% for Papua Province and 100%, 96.73%, 57.54%, 34.13%, and 29.6% for West Sumatera Province, respectively. Average years of schooling (X2) is therefore the variable with the greatest influence on HDI in both regions, with an importance level of 100%.
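The structure of a fitted MARS model can be sketched as an intercept plus coefficients times hinge basis functions, where a term with two hinges is a two-way interaction (the highest order allowed when MI = 2). This sketch only evaluates and scores a model; it does not implement the forward/backward search that selects knots. The knots and coefficients are hypothetical, and the effective number of parameters C(M) in the GCV formula is passed in by the caller because its penalty term varies between implementations.

```python
def hinge(x, knot, direction=1):
    """MARS hinge: max(0, x - knot) for direction=+1, max(0, knot - x) for direction=-1."""
    return max(0.0, direction * (x - knot))


def mars_predict(row, intercept, terms):
    """Evaluate a fitted MARS model.

    terms: list of (coefficient, factors), where factors is a list of
    (feature_index, knot, direction). A term with two factors is a
    two-way interaction.
    """
    total = intercept
    for coef, factors in terms:
        basis = 1.0
        for idx, knot, direction in factors:
            basis *= hinge(row[idx], knot, direction)
        total += coef * basis
    return total


def gcv(sse, n, effective_params):
    """Generalized cross-validation: (SSE / n) / (1 - C(M) / n) ** 2."""
    return (sse / n) / (1.0 - effective_params / n) ** 2


# Hypothetical model: 1 + 2*max(0, x0 - 3) + 0.5*max(0, x0 - 3)*max(0, x1 - 8)
terms = [(2.0, [(0, 3.0, 1)]), (0.5, [(0, 3.0, 1), (1, 8.0, 1)])]
print(mars_predict([4.0, 10.0], 1.0, terms))
print(gcv(sse=10.0, n=10, effective_params=5.0))
```

Among candidate models, the one with the smallest GCV is preferred, which is how a combination such as BF = 24, MI = 2, MO = 0 is chosen.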
Analysis of Factors Influencing the Population Growth Rate in West Sumatra Using Geographically Weighted Logistic Regression
Rizqia Salsabila; Atus Amadi Putra; Nonong Amalita; Fadhilah Fitri
UNP Journal of Statistics and Data Science Vol. 1 No. 3 (2023): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol1-iss3/59

Abstract

Geographically Weighted Logistic Regression (GWLR) extends the logistic regression model to spatial data: its parameters are estimated separately at each observation location using spatial weighting. The purpose of this research was to build a GWLR model for the dichotomous Population Growth Rate (PGR) indicator of each district/city in West Sumatra in 2020 and to identify the factors that influence the probability that the population growth rate increases in the 19 districts/cities of West Sumatra in 2020. The GWLR parameters are estimated by Maximum Likelihood Estimation (MLE). The spatial weights are computed with the fixed Gaussian kernel function, and the optimal bandwidth is chosen by Akaike's Information Criterion (AIC). The categorical response variable is the population growth rate of each district/city in West Sumatra in 2020, and the predictor variables are the number of couples of childbearing age, the number of live births, the amount of in-migration, and the amount of out-migration. The results show that the GWLR model performs better than the logistic regression model and that four groups of districts/cities are formed based on the factors that affect an increase in the population growth rate.
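The local estimation step of GWLR can be sketched as follows: fixed Gaussian kernel weights are computed around one location, and a weighted logistic log-likelihood is maximized there. Newton-Raphson is used here as one common way to obtain the MLE; the AIC-based bandwidth search is omitted, and the coordinates, bandwidth, predictor, and response are hypothetical.

```python
import numpy as np


def gaussian_kernel_weights(coords, i, bandwidth):
    """Fixed Gaussian kernel: w_ij = exp(-0.5 * (d_ij / b) ** 2)."""
    d = np.linalg.norm(coords - coords[i], axis=1)
    return np.exp(-0.5 * (d / bandwidth) ** 2)


def local_logit(X, y, weights, iters=25):
    """Geographically weighted MLE of logistic coefficients via Newton-Raphson."""
    X1 = np.column_stack([np.ones(len(X)), X])  # add intercept column
    beta = np.zeros(X1.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X1 @ beta))
        grad = X1.T @ (weights * (y - p))  # weighted score vector
        hess = X1.T @ (X1 * (weights * p * (1.0 - p))[:, None])  # weighted information
        beta = beta + np.linalg.solve(hess, grad)
    return beta


# Hypothetical locations, one predictor, and a binary response (1 = growth rate increased)
coords = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0], [4.0, 0.0], [5.0, 0.0]])
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([0.0, 0.0, 1.0, 0.0, 1.0, 1.0])
w = gaussian_kernel_weights(coords, i=0, bandwidth=3.0)
print(local_logit(X, y, w))
```

Repeating `local_logit` for every location, each with its own weight vector, yields the location-specific coefficients that GWLR reports.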
Grouping Level of Poverty Based on District/City in Indonesia Using K-Harmonic Means
nabillah putri; Nonong Amalita; Dodi Vionanda; Dony Permana
UNP Journal of Statistics and Data Science Vol. 1 No. 3 (2023): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol1-iss3/60

Abstract

Indonesia still has a relatively high poverty rate, although it has declined nationally in recent years, and some areas are still experiencing rising poverty rates. Poverty alleviation plans should therefore no longer be uniform but should take into account the conditions of each dimension that causes poverty in an area, which makes it necessary to group Indonesia's districts/cities by poverty. The grouping was performed using K-Harmonic Means, a non-hierarchical clustering method that uses the harmonic average of the distances from each data point to the cluster centers. The data are secondary data from BPS publications on poverty and inequality in 2022. The analysis consists of standardizing the data, performing cluster analysis, and validating the clusters. Based on the K-Harmonic Means analysis, the optimal number of clusters is two: the first cluster contains 54 districts/cities and the second contains 460, with a Dunn Index of 0.03492 for cluster validation. A better grouping of the poverty level of districts/cities in Indonesia is thus obtained using the K-Harmonic Means method with p = 2.25.
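The clustering step can be sketched with the K-Harmonic Means center update of Zhang's formulation: each center becomes a weighted mean of all points, with weights d_ik^(-p-2) / (Σ_j d_ij^(-p))^2, and the objective is the sum over points of k divided by Σ_j d_ij^(-p). The default `p=2.25` follows the value reported above. Standardization and the Dunn Index are not shown, the initial centers must be supplied, and the toy data are hypothetical.

```python
import numpy as np


def k_harmonic_means(X, init_centers, p=2.25, iters=100, eps=1e-12):
    """K-Harmonic Means; returns centers, hard labels, and the KHM objective."""
    centers = np.asarray(init_centers, dtype=float).copy()
    k = len(centers)
    for _ in range(iters):
        # distances from every point to every center, shape (n, k)
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + eps
        # membership times point weight: d^(-p-2) / (sum_j d^(-p))^2
        q = d ** (-p - 2) / np.sum(d ** (-p), axis=1, keepdims=True) ** 2
        centers = (q.T @ X) / q.sum(axis=0)[:, None]
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + eps
    objective = np.sum(k / np.sum(d ** (-p), axis=1))  # sum of harmonic averages
    labels = np.argmin(d, axis=1)  # hard assignment to the nearest center
    return centers, labels, objective


# Hypothetical standardized indicators for eight regions
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
              [10, 10], [10, 11], [11, 10], [11, 11]], dtype=float)
centers, labels, obj = k_harmonic_means(X, init_centers=X[[0, 7]])
print(np.round(centers, 2), labels, obj)
```

Because points that are already close to some center receive small weights, the harmonic average makes the result less sensitive to the starting centers than ordinary K-Means.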
Geographically Weighted Panel Regression Modeling on Human Development Index in West Sumatra
Amelia Fadila Rahman; Syafriandi Syafriandi; Nonong Amalita; Zilrahmi
UNP Journal of Statistics and Data Science Vol. 1 No. 3 (2023): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol1-iss3/63

Abstract

The Human Development Index (HDI) is an important issue with a negative impact on human development and people's welfare in West Sumatra Province. This study attempts to address it by identifying the contributing factors. Geographically Weighted Panel Regression (GWPR) is a technique that can be used to find influencing factors and to explain the influence of the characteristics of each observation area. GWPR combines panel data regression with GWR and is used when the data exhibit spatial heterogeneity. The purpose of this study is to build a GWPR model of HDI in the regencies/cities of West Sumatera from 2019 to 2022, using the GWPR Fixed Effect Model. The weighting function used is a fixed exponential kernel, which gave a minimum CV value of 0.000208. The resulting model has an R² of 99.9%, meaning the predictor variables explain 99.9% of the variation in HDI. The variables with a significant effect on HDI are Life Expectancy, Expected Years of Schooling, Mean Years of Schooling, and Purchasing Power Parity.
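The local estimation step of a fixed-effect GWPR can be sketched as follows, assuming the fixed effects are removed by the "within" transformation (subtracting each regency's time mean), which is one common way to implement a panel fixed-effect model. Fixed exponential kernel weights then drive a weighted least-squares fit at one location. CV bandwidth selection is omitted, and the panel below is synthetic.

```python
import numpy as np


def exponential_kernel_weights(coords, i, bandwidth):
    """Fixed exponential kernel: w_ij = exp(-d_ij / b)."""
    d = np.linalg.norm(coords - coords[i], axis=1)
    return np.exp(-d / bandwidth)


def within_transform(values, unit_ids):
    """Subtract each unit's time mean (the fixed-effect 'within' transformation)."""
    out = np.asarray(values, dtype=float).copy()
    for u in np.unique(unit_ids):
        mask = unit_ids == u
        out[mask] -= out[mask].mean(axis=0)
    return out


def gwpr_fe_local(X, y, unit_ids, coords, i, bandwidth):
    """Local fixed-effect coefficients at unit i by weighted least squares."""
    Xd = within_transform(X, unit_ids)
    yd = within_transform(y, unit_ids)
    w = exponential_kernel_weights(coords, i, bandwidth)[unit_ids]  # weight per row
    XtW = Xd.T * w
    return np.linalg.solve(XtW @ Xd, XtW @ yd)


# Hypothetical panel: 3 regencies observed for 4 years, 2 predictors
rng = np.random.default_rng(1)
unit_ids = np.repeat(np.arange(3), 4)
coords = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
X = rng.normal(size=(12, 2))
y = np.array([5.0, -3.0, 1.0])[unit_ids] + X @ np.array([2.0, -1.0])
print(gwpr_fe_local(X, y, unit_ids, coords, i=0, bandwidth=1.0))
```

Since the synthetic data have no noise and the same slopes everywhere, the within transformation removes the regency-specific intercepts exactly and the local fit recovers the slopes 2 and -1.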
Comparison of Queen Contiguity and Customized Weighting Matrices on Spatial Regression to Identify Factors Impacting Poverty in East Java
Gezi Fajri; Syafriandi Syafriandi; Nonong Amalita; Zamahsary Martha
UNP Journal of Statistics and Data Science Vol. 1 No. 3 (2023): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol1-iss3/67

Abstract

Poverty is a crucial problem with a negative impact on all sectors, including economic, social, and cultural development, in East Java Province. Poverty can also increase unemployment and crime, trigger social disasters, and hinder the progress of East Java Province. One effort to overcome poverty in East Java Province is to detect the factors that influence it, which can be done through statistical modeling. A statistical model that can identify the factors influencing poverty and explain the relationship between a region and its surrounding areas is spatial regression analysis. Spatial regression analysis requires a spatial weighting matrix to describe how each region influences its neighboring regions. The most commonly used spatial weighting matrix is queen contiguity; however, according to Anselin (1988:20), spatial weighting can also take into account prior information, the purpose of the case studied, and the theory underlying the research. A weighting built from the social and economic variables of the case under study is called a customized weighting matrix. The results of this study show that the best combination of spatial regression model and spatial weighting is the General Spatial Model (GSM) with customized weighting, which produces better estimates than the SAR, SEM, and GSM models with queen contiguity weighting in modeling district and city poverty in East Java Province, with an Akaike Information Criterion (AIC) value of 188.77 and a coefficient of determination (R²) of 84.95%. Expected Years of Schooling, Life Expectancy, and the Employment Participation Rate are the factors with a substantial impact on the percentage of the population living in poverty in East Java's districts and cities in 2021.
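Queen contiguity can be illustrated on a regular grid, where two cells are neighbors if they share an edge or a corner, with each row standardized to sum to 1. Real districts and cities would instead need polygon adjacency derived from a boundary map, so the grid and the values below are hypothetical. The customized weighting matrix is not sketched because its construction is not specified here.

```python
import numpy as np


def queen_weights(n_rows, n_cols):
    """Row-standardized queen contiguity matrix for a regular grid of cells.

    Cells are neighbours if they share an edge or a corner.
    """
    n = n_rows * n_cols
    W = np.zeros((n, n))
    for r in range(n_rows):
        for c in range(n_cols):
            i = r * n_cols + c
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if (dr, dc) != (0, 0) and 0 <= rr < n_rows and 0 <= cc < n_cols:
                        W[i, rr * n_cols + cc] = 1.0
    return W / W.sum(axis=1, keepdims=True)


W = queen_weights(3, 3)
y = np.arange(9, dtype=float)  # hypothetical poverty percentages
spatial_lag = W @ y  # average of each cell's neighbours, the term used in SAR/GSM models
print(W[4], spatial_lag[4])
```

The spatial lag `W @ y` is the quantity whose coefficient the SAR and GSM models estimate, so swapping in a different weighting matrix changes the model directly.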
Comparison of Chen's and Singh's Fuzzy Time Series Methods in Forecasting Farmer Exchange Rates in Indonesia
Okia Dinda Kelana; Atus Amadi Putra; Nonong Amalita; Admi Salma
UNP Journal of Statistics and Data Science Vol. 1 No. 4 (2023): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol1-iss4/36

Abstract

Chen's and Singh's fuzzy time series models are forecasting methods based on fuzzy logic. The two models differ in how they use fuzzy logical relationships: Chen's model uses Fuzzy Logical Relationship Groups, while Singh's model uses only Fuzzy Logical Relationships in the forecasting process. To find out which of the two models is better, both were used to forecast the Farmer Exchange Rate. The Farmer Exchange Rate is one of the indicators that observers of agricultural development use to assess the welfare of farmers in Indonesia. Because the Farmer Exchange Rate changes every month, forecasting is needed to obtain an overview of the following month. This is applied research: the first step was to study and analyze the theories related to the research, then to collect the necessary data. The data are secondary data obtained online from the official website of Badan Pusat Statistik (BPS). The forecasting results of the two models were compared using MAPE. Chen's fuzzy time series model obtained a MAPE of 0.679% and Singh's model a MAPE of 0.354%, so the best model for forecasting the Farmer Exchange Rate in Indonesia is Singh's model.
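Chen's method can be sketched in a few steps: partition the universe of discourse into equal-width intervals, fuzzify each observation to the interval that contains it, collect the fuzzy logical relationships A_i → A_j into groups, and forecast the average of the midpoints of the group's right-hand sides. The number of intervals and the padding of the universe are assumptions left to the caller, the series is hypothetical, and Singh's method is not implemented here. MAPE is included for comparing forecasts.

```python
def chen_fts(series, n_intervals, pad=0.0):
    """Fit Chen's first-order fuzzy time series; return a one-step-ahead forecaster."""
    lo, hi = min(series) - pad, max(series) + pad
    width = (hi - lo) / n_intervals
    mids = [lo + width * (j + 0.5) for j in range(n_intervals)]

    def fuzzify(x):
        # index of the equal-width interval containing x (clamped to the universe)
        return min(max(int((x - lo) // width), 0), n_intervals - 1)

    states = [fuzzify(x) for x in series]
    groups = {}  # fuzzy logical relationship groups: A_i -> {A_j, ...}
    for current, nxt in zip(states, states[1:]):
        groups.setdefault(current, set()).add(nxt)

    def forecast_next(x):
        a = fuzzify(x)
        rhs = groups.get(a)
        if not rhs:  # no relationship starts at A_i: forecast its own midpoint
            return mids[a]
        return sum(mids[j] for j in rhs) / len(rhs)

    return forecast_next


def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


series = [10, 12, 14, 16, 18, 20]  # hypothetical monthly index values
forecast = chen_fts(series, n_intervals=5)
preds = [forecast(x) for x in series[:-1]]
print(preds, mape(series[1:], preds))
```

Because each group's right-hand sides are stored as a set, repeated transitions count once, which is the distinguishing feature of Chen's grouping.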
Modeling the Survival Time of Tuberculosis Patients Using Cox Proportional Hazard Regression with Censored Data
Elsa Oktaviani; Nonong Amalita; Atus Amadi Putra; Dony Permana
UNP Journal of Statistics and Data Science Vol. 1 No. 4 (2023): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol1-iss4/65

Abstract

Tuberculosis (TB) is an infectious disease that requires vigilance in West Sumatra Province, which had the 12th highest number of TB cases in Indonesia in 2021, with a total of 8,216 cases, and a TB treatment cure rate still far below the target of the Indonesian Ministry of Health. The purpose of this study is to build a Cox proportional hazard regression model and to determine the factors that affect the survival time of tuberculosis patients at Dr. M. Djamil Padang Hospital. Survival time is measured from the start of TB treatment at RSUP Dr. M. Djamil Padang in 2021 until the patient is declared dead. The Cox proportional hazard regression parameters are estimated by the Maximum Partial Likelihood Estimation method. According to the Cox proportional hazard regression model, the factors that influence the survival time of tuberculosis patients at RSUP Dr. M. Djamil Padang are BMI, leukocytes, fever, shortness of breath, and decreased appetite.
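Maximum partial likelihood estimation can be sketched for right-censored data: each observed death contributes its linear predictor minus the log of the summed risk scores of everyone still at risk at that time, and censored patients enter only through the risk sets. Ties are handled here with the Breslow approach (an assumption, since the tie method is not stated above), Newton-Raphson is used for the maximization, and the follow-up times and covariate are hypothetical.

```python
import numpy as np


def cox_log_partial_likelihood(beta, times, events, X):
    """Log partial likelihood for right-censored data (Breslow handling of ties)."""
    eta = X @ beta
    loglik = 0.0
    for i in np.flatnonzero(events):
        at_risk = times >= times[i]  # patients still under observation at t_i
        loglik += eta[i] - np.log(np.sum(np.exp(eta[at_risk])))
    return loglik


def cox_fit(times, events, X, iters=25):
    """Maximize the partial likelihood by Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        r = np.exp(X @ beta)
        score = np.zeros_like(beta)
        info = np.zeros((len(beta), len(beta)))
        for i in np.flatnonzero(events):
            risk = times >= times[i]
            w = r[risk] / r[risk].sum()
            xbar = w @ X[risk]  # risk-weighted covariate mean
            score += X[i] - xbar
            info += (X[risk] * w[:, None]).T @ X[risk] - np.outer(xbar, xbar)
        beta = beta + np.linalg.solve(info, score)
    return beta


# Hypothetical data: follow-up in months, event = 1 if death observed, 0 if censored
times = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
events = np.array([1, 1, 0, 1, 1, 1])
X = np.array([[1.0], [0.0], [1.0], [0.0], [1.0], [0.0]])  # e.g. fever yes/no
beta_hat = cox_fit(times, events, X)
print(beta_hat, np.exp(beta_hat))  # coefficient and hazard ratio
```

The baseline hazard cancels out of every term, which is why Cox regression can estimate covariate effects without specifying it.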
Comparison of Error Rate Prediction Methods of C4.5 Algorithm for Balanced Data
Ichlas Djuazva; Dodi Vionanda; Nonong Amalita; Zilrahmi
UNP Journal of Statistics and Data Science Vol. 1 No. 4 (2023): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol1-iss4/74

Abstract

C4.5 is a highly effective decision tree algorithm for classification. Compared with CHAID, CART, and ID3, C4.5 generates decision trees faster and produces trees that are easier to understand. However, the C4.5 algorithm is not exempt from classification errors, which affect the accuracy of the resulting model. Model accuracy can be measured by predicting the error rate, and one commonly used method for error rate prediction is cross-validation. Cross-validation divides the data into two parts: a training set to build the model and a testing set to test it. Commonly used cross-validation techniques for error rate prediction include Leave-One-Out Cross-Validation (LOOCV), Hold-Out (HO), and k-fold cross-validation. LOOCV gives an unbiased estimate but is time-consuming and depends on the data size; HO can avoid overfitting and runs faster; and k-fold cross-validation gives a smaller predicted error rate. This study uses artificially generated, normally distributed data, including univariate, bivariate, and multivariate datasets with various combinations of mean differences and correlations. Different correlation structures are applied to see how correlation affects the error rate prediction methods. With these factors in mind, this research compares the three cross-validation methods for predicting the error rate of decision tree models generated by the C4.5 algorithm. The study found that k-fold cross-validation is the most suitable method for testing models generated by the C4.5 algorithm on balanced data.
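The three error-rate prediction schemes can be sketched around any pair of `fit`/`predict` functions. C4.5 itself is not implemented: a hypothetical nearest-class-mean classifier stands in so the example stays self-contained, and a C4.5 implementation could be plugged in through the same two arguments. The balanced toy data are simulated from two normal distributions, loosely echoing the study's design but not reproducing it.

```python
import numpy as np


def kfold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    idx = np.random.default_rng(seed).permutation(n)
    return np.array_split(idx, k)


def cv_error_rate(X, y, fit, predict, k, seed=0):
    """k-fold cross-validation error rate; k = len(y) gives LOOCV."""
    errors = 0
    for test_idx in kfold_indices(len(y), k, seed):
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        model = fit(X[train_idx], y[train_idx])
        errors += np.sum(predict(model, X[test_idx]) != y[test_idx])
    return errors / len(y)


def holdout_error_rate(X, y, fit, predict, test_frac=0.3, seed=0):
    """Single train/test split error rate (hold-out)."""
    idx = np.random.default_rng(seed).permutation(len(y))
    n_test = int(round(test_frac * len(y)))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    model = fit(X[train_idx], y[train_idx])
    return float(np.mean(predict(model, X[test_idx]) != y[test_idx]))


# Hypothetical stand-in classifier (NOT C4.5): assign each point to the nearest class mean
def fit_nearest_mean(X, y):
    classes = np.unique(y)
    return classes, np.array([X[y == c].mean(axis=0) for c in classes])


def predict_nearest_mean(model, X):
    classes, means = model
    d = np.linalg.norm(X[:, None, :] - means[None, :, :], axis=2)
    return classes[np.argmin(d, axis=1)]


# Balanced toy data drawn from two bivariate normal distributions with different means
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 1.0, (50, 2)), rng.normal(2.0, 1.0, (50, 2))])
y = np.repeat([0, 1], 50)
results = {
    "LOOCV": cv_error_rate(X, y, fit_nearest_mean, predict_nearest_mean, k=len(y)),
    "10-fold": cv_error_rate(X, y, fit_nearest_mean, predict_nearest_mean, k=10),
    "Hold-out": holdout_error_rate(X, y, fit_nearest_mean, predict_nearest_mean),
}
print(results)
```

LOOCV refits the model once per observation, which is why its cost grows with the data size, while hold-out fits only once and k-fold sits in between.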
Co-Authors Addini, Vidhiya Ade Eriyen Saputri Adinda Dwi Putri Admi Salma Aldwi Riandhoko Ali Asmar Amanda, Abilya Amelia Fadila Rahman Andini Yulianti Anggi Adrian Danis Anjelisni, Nining april leniati Arnellis Arnellis Atika Ahmad Atus Amadi Putra Azwar Ananda Chairina Wirdiastuti Cindy Febrianita Denia Putri Fajrina Dewi Febiyanti Dewi Murni Dina Fitria Dina Fitria Dina Fitria, Dina Dodi Vionanda Dony Permana Dwi Sulistiowati Dwi Sulistiowati, Dwi Edwin Musdi Elita Zusti Jamaan Elsa Oktaviani Fadhilah Fitri Fadilah, Salwa Hifa Fajrin Putra Hanifi fajriyanti nur, Putri Fatma Yulia Sari Faulina FAZHIRA ANISHA Fikra, Hidayatul Fitri, Fadhilah Gezi Fajri Ghaly, Fayyadh Hamida, Zilfa Hana Rahma Trifanni haniyathul husna Hasna, Hanifa Helma Helma Helma Helma Herlena Purnama Sari Huriati Khaira Ichlas Djuazva Inna Auliya Jihe Chen Juwita Juwita Khairani, Putri Rahmatun Leli, Nur Lilis Sulistiawati Media Rosha Media Rosha Meira Parma Dewi Melly Kurniawati Miftahurrahmi, Syifa Minora Longgom Mohammad Reza febrino Mudjiran Mudjiran Muhammad Tibri Syofyan Mukhti, Tessy Octavia Mutiya, Fenni Kurnia nabillah putri Nadha Ovella Syaqhasdy Nafandra, Bunga Natasya Dwi Ovalingga, natasyalinggaa Nini Erdiani Nur Fadillah, Nur Nurhizrah Gistituati Okia Dinda Kelana Oktaviani, Bernadita Permana, Dony Prida Nova Sari Puti Utari Maharani Rahma, Dzakyyah Resti Febrina Retsya Lapiza Rizki Amalia, Annisa Rizqia Salsabila Rusdinal Rusdinal Saddam Al Aziz Safitri, Melda Salma, Admi Sarmilah, Sarmilah Seif Adil El-Muslih Shavira Asysyifa S Sondriva, Wilia Sujantri Wahyuni Suparman Suparman Swithania Rizka Putri Syafriandi Syafriandi Syafriandi Syafriandi Syafriandi Syahfitrri, Nindi Tamur, Maximus Tessy Octavia Mukhti Tri Wahyuni Nurmulyati Venny Oktarinda Viola Yuniza Wella Saputri Wulan Septya Zulmawati Yarman Yarman, Yarman Yenni Kurniawati Yulia Pertiwi Zamahsary Martha Zilla Zalila Zilrahmi, Zilrahmi