Sentiment Analysis Using Support Vector Machine (SVM) of ChatGPT Application Users in Play Store
Muthia Sakhdiah; Admi Salma; Dony Permana; Dina Fitria
UNP Journal of Statistics and Data Science Vol. 2 No. 2 (2024): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol2-iss2/158

Abstract

The ChatGPT application is an Artificial Intelligence (AI) technology that responds to conversations in the form of text and voice messages and is accessible via smartphones or computers. ChatGPT provides answers and solutions to the problems it is asked about, and the speed and complexity of its answers add to the application's value. However, there are negative impacts, one of which is the vulnerability of scientific papers to plagiarism. As a result, many community reviews assess this application. These reviews can be seen on the Play Store and can serve as a reference before downloading the ChatGPT application. How the community responds can be seen through sentiment analysis, which classifies assessments as positive or negative and makes it easier for companies to evaluate products. Classification is carried out using a Support Vector Machine (SVM), and the resulting model is used to classify user reviews of the ChatGPT application. The results show an accuracy of 93.9% with a linear kernel, and the sentiment of people who use the ChatGPT application is predominantly positive.
Impelementation of Subtractive Fuzzy C-Means Method in Clustering Provinces in Indonesia Based on Factors Causing Stunting in Toddlers
Hariati Ainun Nisa; Admi Salma; Dodi Vionanda; Tessy Octavia Mukhti
UNP Journal of Statistics and Data Science Vol. 2 No. 2 (2024): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol2-iss2/164

Abstract

In 2022, Indonesia's stunting rate was still relatively high at 21.6%, so various efforts are needed to reduce it. One such effort is to understand the characteristics of each province in Indonesia through cluster analysis. This study aims to cluster provinces in Indonesia based on factors that cause stunting in children under five. The method used is Subtractive Fuzzy C-Means, which has advantages in terms of speed and number of iterations and thus produces more stable and accurate results. Based on the validity test with the Silhouette Coefficient Index, the optimum number of clusters is 8, with a radius (r) of 0.70. There are 8 provinces that have provided maximum handling and efforts in reducing stunting rates, namely the provinces of Bangka Belitung Islands, Riau Islands, DKI Jakarta, DI Yogyakarta, Bali, East Kalimantan, South Kalimantan, and South Sulawesi. Meanwhile, 7 provinces, namely East Nusa Tenggara, South Kalimantan, Central Sulawesi, West Sulawesi, Maluku, North Maluku, and West Papua, still need special attention from the government in reducing stunting rates based on the factors that cause stunting discussed in this study.
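As an illustration of the validity index mentioned in this abstract, the following is a minimal sketch of the silhouette coefficient for hard cluster labels. It is not the authors' code: the function name, the toy points, and the use of Euclidean distance are assumptions, and a fuzzy clustering result would first have to be converted to hard labels (for example, by assigning each province to its highest-membership cluster).

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient for hard cluster labels (needs >= 2 clusters)."""
    clusters = {}
    for index, label in enumerate(labels):
        clusters.setdefault(label, []).append(index)

    scores = []
    for i, label in enumerate(labels):
        same = [j for j in clusters[label] if j != i]
        if not same:
            # Convention: a point that is alone in its cluster scores 0.
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(math.dist(points[i], points[j]) for j in same) / len(same)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Toy data (hypothetical): two well-separated pairs of points.
toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_score(toy_points, [0, 0, 1, 1])  # sensible grouping
bad = silhouette_score(toy_points, [0, 1, 0, 1])   # mixed-up grouping
print(good > bad)
```

A higher mean silhouette indicates tighter, better-separated clusters, which is why the index can be used to compare candidate numbers of clusters.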
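Analisis Sentimen Pengguna Aplikasi X terhadap Konflik antara Israel dan Palestina Menggunakan Algoritma Support Vector Machine [Sentiment Analysis of X Application Users on the Conflict between Israel and Palestine Using the Support Vector Machine Algorithm]
Carina, Fadhillah Meisya; Admi Salma; Dony Permana; Zamahsary Martha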
UNP Journal of Statistics and Data Science Vol. 2 No. 2 (2024): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol2-iss2/170

Abstract

The conflict between Israel and Palestine is the Middle East's longest-running conflict, dating back to 1917 and still ongoing today. It is one of the international conflicts that involves many Arab and Western countries in the dispute. The conflict has divided countries around the world into two camps: one supporting Palestinian independence and one opposing it. It has also created polarization among Indonesians and given rise to diverse public opinions on the social media application X. The purpose of this research is to classify the sentiment of X application users toward the conflict between Israel and Palestine. Sentiment analysis is used to convert text-based public opinion data into information. The algorithm chosen to separate the data classes is the Support Vector Machine algorithm, which classifies data by determining the best hyperplane to separate opinions that are pro Israel from those that are pro Palestine. After the preprocessing stage, 1,000 tweets were obtained and split into 800 training samples and 200 testing samples. The accuracy is 93%, precision is 92.93%, recall is 100%, and the F-measure is 96.33%. Of the 200 test samples, 198 were classified as pro Palestine and 2 as pro Israel, which suggests that more individuals support Palestinian independence in the conflict between Israel and Palestine.
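The reported metrics can be checked against each other with a short calculation. The abstract does not give a confusion matrix, so the counts below are an assumption: they are the set of counts that appears consistent with 200 test samples, 198 pro-Palestine predictions, 93% accuracy, 92.93% precision, and 100% recall, with pro Palestine treated as the positive class.

```python
def binary_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall and F-measure from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f_measure


# Assumed counts (not stated in the abstract), positive class = pro Palestine:
# 198 positive predictions = 184 correct + 14 incorrect; 2 negative predictions, both correct.
acc, prec, rec, f1 = binary_metrics(tp=184, fp=14, fn=0, tn=2)
print(f"accuracy={acc:.4f} precision={prec:.4f} recall={rec:.4f} f_measure={f1:.4f}")
```

Under these assumed counts, accuracy is 186/200, precision is 184/198, and the F-measure is approximately 0.963, in line with the figures in the abstract. Note that only 2 of the 200 test predictions fall in the minority class, so accuracy alone says little about how well pro-Israel opinions are recognized.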
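Perbandingan Algoritma C4.5 dan C5.0 Dalam Klasifikasi Status Gizi Balita Stunting [Comparison of the C4.5 and C5.0 Algorithms in Classifying the Nutritional Status of Stunted Toddlers]
Harelvi, Dhea Afrila; Admi Salma; Yenni Kurniawati; Fadhilah Fitri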
UNP Journal of Statistics and Data Science Vol. 2 No. 2 (2024): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol2-iss2/172

Abstract

Stunting is one of the health conditions that reflect aspects of nutrition and child growth, allowing us to observe the nutritional status of toddlers. The aim of this study is to determine the classification results of the C4.5 and C5.0 algorithms for the nutritional status of stunted toddlers and to compare the two algorithms using k-fold cross-validation. The data in this study are secondary data collected from Puskesmas IV Pesisir Selatan Regency. The research variables are divided into two groups: the response variable Y, which is Toddler Nutritional Status, and the predictor variables X, which include Age, Toddler Gender, Toddler Weight, and Toddler Height. The results show that the accuracy of the C5.0 algorithm is higher than that of the C4.5 algorithm: the C5.0 algorithm achieves an average accuracy of 83%, while the C4.5 algorithm achieves 79%. Thus, it can be concluded that the C5.0 algorithm is better at classifying the nutritional status of stunted toddlers.
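The k-fold cross-validation used to compare the two classifiers can be sketched as follows. This is an illustration only: the helper names are hypothetical, the data are toy values, and a majority-class baseline stands in for C4.5 and C5.0, which are not implemented here.

```python
from collections import Counter


def k_fold_indices(n_samples, k):
    """Split indices 0..n_samples-1 into k consecutive (train, test) folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    indices = list(range(n_samples))
    folds, start = [], 0
    for size in sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        folds.append((train, test))
        start += size
    return folds


def cross_validated_accuracy(fit, predict, X, y, k):
    """Average test accuracy of a classifier over k folds."""
    accuracies = []
    for train, test in k_fold_indices(len(X), k):
        model = fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(model, X[i]) == y[i] for i in test)
        accuracies.append(correct / len(test))
    return sum(accuracies) / len(accuracies)


# Stand-in classifier (assumption): always predicts the most common training label.
def fit_majority(X_train, y_train):
    return Counter(y_train).most_common(1)[0][0]


def predict_majority(model, x):
    return model


# Toy data (hypothetical): features are unused by the baseline.
X = [[age] for age in range(10)]
y = [1, 1, 1, 0, 1, 1, 1, 0, 1, 1]
mean_acc = cross_validated_accuracy(fit_majority, predict_majority, X, y, k=5)
print(mean_acc)
```

To compare two algorithms in the way the abstract describes, the same folds would be passed to both classifiers and their average accuracies compared.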
Random Forest Implementation for Air Pollution Standard Index Classification in DKI Jakarta 2022
Hasna, Hanifa; Nonong Amalita; Dony Permana; Admi Salma
UNP Journal of Statistics and Data Science Vol. 2 No. 2 (2024): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol2-iss2/173

Abstract

Air pollution is a serious challenge in various cities, including DKI Jakarta. Based on measurements of the Air Pollution Standard Index carried out by the DKI Jakarta Environmental Service, the air quality in DKI Jakarta is considered moderate to unhealthy. Deteriorating air quality in the Jakarta metropolitan area is very dangerous for humans and other living things. Therefore, to help address the problem, air quality is classified based on pollutant content using Random Forest (RF). RF builds several trees that together can provide better predictions and produce low errors. The optimal forest in this study was obtained with mtry (the number of input variables randomly selected at each split node) = 2 and ntree (the number of trees in the forest) = 5000. The resulting accuracy was 99.17%, with an OOB error rate of 0.83%. This research identifies particulate pollutants as the main factor causing air pollution in DKI Jakarta. These results show that RF is able to provide accurate predictions of the level of air pollution in DKI Jakarta and can identify important factors that affect air pollution.
PENGEMBANGAN DATA NAGARI TANJUNG GADANG MENUJU DESA DIGITAL [Developing Nagari Tanjung Gadang Data Toward a Digital Village]
Yenni Kurniawati; Dina Fitria; Admi Salma
Pelita Eksakta Vol 6 No 2 (2023): Pelita Eksakta, Vol. 6, No. 2
Publisher : Fakultas MIPA Universitas Negeri Padang

DOI: 10.24036/pelitaeksakta/vol6-iss2/210

Abstract

Developing village data into digital data is one of the village government's programs for improving the village. To achieve this goal, the village government needs to collaborate with professional surveyors and digital data builders, which it is unable to provide itself. The Statistics Department provided a team to address these problems by training local residents in surveying and accompanying them in building the Nagari Tanjung Gadang digital data.
Classification of Dropout Rates in West Sumatra Using the Random Forest Algorithm with Synthetic Minority Oversampling Technique
Anita Fadila; Syafriandi Syafriandi; Yenni Kurniawati; Admi Salma
UNP Journal of Statistics and Data Science Vol. 2 No. 3 (2024): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol2-iss3/183

Abstract

This study aims to classify school dropout rates in West Sumatra Province using the Random Forest algorithm with the Synthetic Minority Oversampling Technique (SMOTE). Based on 2021 data from the Ministry of Education, Culture, Research, and Technology (Kemdikbudristek), the dropout rate in West Sumatra is above the national average. Despite efforts to reduce dropout rates, results remain suboptimal. Therefore, this study seeks to identify the causes of student dropouts and compare the performance of the Random Forest algorithm with and without SMOTE. The study uses the 2021 dropout data from West Sumatra, which has a significant class imbalance. SMOTE is applied to balance the data. The dataset is split into training and testing sets in an 80%:20% ratio, and parameter tuning is performed to optimize mtry and the number of trees (ntree). The model is evaluated using a confusion matrix to compare performance. The results show that Random Forest with SMOTE outperforms the version without SMOTE, with improvements in precision, recall, and F1-score. The presence of the biological mother is identified as the most significant factor influencing student dropouts, based on the Mean Decrease Gini value. The study concludes that using SMOTE in the Random Forest algorithm helps reduce classification bias and enhances the model's ability to detect students at risk of dropping out.
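The oversampling step can be illustrated with a minimal sketch of the SMOTE idea: each synthetic minority sample is placed at a random position on the line segment between a minority point and one of its nearest minority neighbours. This is a simplified illustration on hypothetical toy data, not the authors' implementation; the function name and parameters are assumptions, and it handles numeric features only.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Generate synthetic minority samples by interpolating between neighbours.

    `minority` is a list of numeric tuples with at least two points.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # The k nearest minority neighbours of the chosen base point.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy minority class (hypothetical): three points forming a triangle.
minority_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
new_points = smote(minority_points, n_synthetic=4, k=2)
print(len(new_points))
```

Because every synthetic point lies between two existing minority points, the new samples stay inside the region already occupied by the minority class. Oversampling is normally applied to the training split only, so that the test set still reflects the original class balance.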
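Pemetaan Indikator Pertumbuhan Ekonomi Di Provinsi Sumatera Barat Menggunakan Analisis Korespondensi Berganda [Mapping Economic Growth Indicators in West Sumatra Province Using Multiple Correspondence Analysis]
Addini, Vidhiya; Dony Permana; Nonong Amalita; Admi Salma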
UNP Journal of Statistics and Data Science Vol. 2 No. 3 (2024): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol2-iss3/190

Abstract

Economic growth is a key factor in sustainable regional development. This study employs Multiple Correspondence Analysis (MCA) to explore the relationships among economic growth indicators in the districts/cities of West Sumatra Province. Data from 2022 provided by the Central Statistics Agency are used to analyze economic growth indicators, including Gross Regional Domestic Product (GRDP) at Constant Prices (X1), Human Development Index (X2), Labor Force Participation (X3), Domestic Investment (X4), Government Expenditure (X5), and Balance Fund Allocation (X6). The results of MCA reveal complex relationships among these variables, with the first and second dimensions explaining approximately 44.43% of the data variance. The MCA plots visualize clusters of districts/cities based on their economic characteristics. From these plots, it is concluded that there are disparities in economic growth indicators in West Sumatra Province, with 11 districts/cities requiring special attention to achieve equitable and sustainable economic growth. This study contributes to a deeper understanding of regional economic disparities in West Sumatra Province and their relevance to more targeted and sustainable development policies.
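Pemodelan Tingkat Partisipasi Angkatan Kerja Terhadap Persentase Penduduk Miskin di Jawa Timur Tahun 2023 Menggunakan Metode B-Spline [Modeling the Labor Force Participation Rate against the Percentage of Poor Population in East Java in 2023 Using the B-Spline Method]
Ibnul Farizi, Gilang; Zilrahmi; Dony Permana; Admi Salma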
UNP Journal of Statistics and Data Science Vol. 2 No. 4 (2024): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol2-iss4/215

Abstract

Poverty is a common issue in Indonesia. Data on the Percentage of Poor Population and the Labor Force Participation Rate (LFPR) for the 38 districts/cities of East Java Province in 2023 indicate that the highest poverty percentage in the province was 21.76%. Employment is considered the most effective solution to alleviate poverty. The data in this study show a scatter that does not follow a specific pattern, making them difficult to analyze using parametric methods. Therefore, the appropriate approach is nonparametric regression. In this study, the nonparametric regression used is the B-Spline regression model. The suitability of the model is assessed by its Mean Squared Error (MSE). The analysis results indicate that the B-Spline regression model achieves an MSE of 20.11447, with the optimal MSE obtained from B-Spline estimation of order 2. This suggests that the B-Spline method provides a good explanation in addressing the issue.
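For readers unfamiliar with the building blocks named in this abstract, the sketch below evaluates B-spline basis functions with the Cox-de Boor recursion and computes an MSE. It is an illustration under assumptions, not the authors' model: the knot vector and values are hypothetical, and the example uses degree 1, which corresponds to order 2 under the convention order = degree + 1 (the abstract does not say which convention it follows).

```python
def bspline_basis(i, degree, knots, x):
    """Value at x of the i-th B-spline basis function (Cox-de Boor recursion)."""
    if degree == 0:
        # Half-open intervals: the right end of the knot range is excluded.
        return 1.0 if knots[i] <= x < knots[i + 1] else 0.0
    left_den = knots[i + degree] - knots[i]
    right_den = knots[i + degree + 1] - knots[i + 1]
    left = 0.0
    if left_den > 0:
        left = (x - knots[i]) / left_den * bspline_basis(i, degree - 1, knots, x)
    right = 0.0
    if right_den > 0:
        right = ((knots[i + degree + 1] - x) / right_den
                 * bspline_basis(i + 1, degree - 1, knots, x))
    return left + right


def mean_squared_error(observed, fitted):
    """Average squared difference between observed and fitted values."""
    return sum((o - f) ** 2 for o, f in zip(observed, fitted)) / len(observed)


# Hypothetical clamped knot vector on [0, 1] with one interior knot at 0.5.
degree = 1
knots = [0.0, 0.0, 0.5, 1.0, 1.0]
n_basis = len(knots) - degree - 1
basis_values = [bspline_basis(i, degree, knots, 0.25) for i in range(n_basis)]
basis_sum = sum(basis_values)
mse_value = mean_squared_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
print(basis_values, basis_sum, mse_value)
```

A B-spline regression fits the response as a weighted sum of these basis functions, and candidate orders and knot placements can then be compared by the MSE of the fitted values.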
Estimation of Poverty in North Sumatera in 2022 using Truncated and Penalized Spline Regression
Kurnia Andrea Diva; Fadhilah Fitri; Dony Permana; Admi Salma
UNP Journal of Statistics and Data Science Vol. 2 No. 4 (2024): UNP Journal of Statistics and Data Science
Publisher : Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol2-iss4/217

Abstract

The main goal of the Sustainable Development Goals (SDGs) is to reduce poverty. Low human capital is a cause of poverty, and the Human Development Index (HDI) is one indicator that can be used to assess human capital. Despite having the largest population on the island of Sumatra, North Sumatra continues to have the fifth highest poverty rate. Previous research has produced inconsistent results, so the pattern of the relationship between poverty and HDI remains unclear. Nonparametric regression modeling was therefore used in this study, because it is flexible in following the pattern of the data and can avoid model misspecification errors. This study aims to compare the Truncated Spline and Penalized Spline (P-Spline) regression methods. Comparing the two models by their MSE values showed that the better estimator for modeling the Human Development Index in North Sumatra in 2022 is nonparametric regression using the truncated spline estimator, where the best truncated spline model is of order 2 with one knot point located at X = 66.93 and a GCV value of 6.0543.
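The two ingredients named in this abstract, a truncated spline basis and the GCV criterion, can be sketched as follows. This is an illustration under assumptions rather than the authors' code: the function names are hypothetical, the example treats order 2 as degree 2 (the abstract does not define the convention), and the GCV inputs are made-up numbers. Only the knot location 66.93 is taken from the abstract.

```python
def truncated_power_row(x, degree, knots):
    """Design-matrix row [1, x, ..., x^degree, (x - k)_+^degree for each knot].

    Assumes degree >= 1.
    """
    row = [x ** p for p in range(degree + 1)]
    # Truncated terms are zero to the left of each knot.
    row += [max(x - k, 0.0) ** degree for k in knots]
    return row


def gcv(mse, trace_hat, n):
    """Generalized cross-validation score: MSE / (1 - trace(H) / n)^2."""
    return mse / (1.0 - trace_hat / n) ** 2


knots = [66.93]  # knot location reported in the abstract
row_below = truncated_power_row(60.0, 2, knots)  # predictor value left of the knot
row_above = truncated_power_row(70.0, 2, knots)  # predictor value right of the knot
# Hypothetical inputs: MSE of 2.0, hat-matrix trace of 4, and 10 observations.
gcv_value = gcv(mse=2.0, trace_hat=4.0, n=10)
print(row_below, row_above, gcv_value)
```

Fitting the model amounts to ordinary least squares on these rows. Candidate knot locations are then compared by GCV, which penalizes the MSE according to the effective number of parameters, trace(H).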