Tatik Widiharih
Departemen Statistika, Fakultas Sains Dan Matematika, Universitas Diponegoro

Published: 42 documents

Articles

Kernel K-Means Clustering untuk Pengelompokan Sungai di Kota Semarang Berdasarkan Faktor Pencemaran Air [Kernel K-Means Clustering for Grouping Rivers in Semarang City Based on Water Pollution Factors]
Anestasya Nur Azizah; Tatik Widiharih; Arief Rachman Hakim
Jurnal Gaussian Vol 11, No 2 (2022)
Publisher : Department of Statistics, Faculty of Science and Mathematics, Universitas Diponegoro

DOI: 10.14710/j.gauss.v11i2.35470

Abstract

K-Means Clustering is a frequently used non-hierarchical cluster analysis method, but it performs poorly on data that are not linearly separable (the clusters have no clear boundaries) and on overlapping clusters, that is, when one cluster visually lies among the others. The Gaussian kernel function in Kernel K-Means Clustering can handle data that are not linearly separable and clusters that overlap. Kernel K-Means Clustering differs from K-Means in that the input data are first mapped into a new feature space by the kernel function. The data used are 47 rivers and 18 indicators of river water pollution from the Dinas Lingkungan Hidup (DLH) of Semarang City for the first semester of 2019. The clustering results are evaluated with the Calinski-Harabasz, Silhouette, and Xie-Beni indexes. The goals of this study are to describe the steps of Kernel K-Means Clustering and to report its results for grouping rivers in Semarang City based on water pollution factors. The evaluation shows that the best number of clusters is K = 4.
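As a rough illustration of the kernel mapping described in this abstract, the sketch below computes a Gaussian kernel and the squared feature-space distance from a point to a cluster mean, which is the quantity Kernel K-Means compares when assigning points to clusters. The function names, the bandwidth `sigma`, and the toy points are assumptions for illustration and are not taken from the study.

```python
import math

def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel: exp(-||x - y||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def feature_space_sq_distance(x, cluster, sigma=1.0):
    """Squared distance between phi(x) and the mean of phi over `cluster`.

    Uses only kernel evaluations:
    K(x, x) - (2/n) * sum_j K(x, x_j) + (1/n^2) * sum_j sum_l K(x_j, x_l)
    """
    n = len(cluster)
    self_term = gaussian_kernel(x, x, sigma)
    cross_term = sum(gaussian_kernel(x, p, sigma) for p in cluster) / n
    within_term = sum(
        gaussian_kernel(p, q, sigma) for p in cluster for q in cluster
    ) / n ** 2
    return self_term - 2.0 * cross_term + within_term

# Hypothetical toy clusters (not river data).
near = [(0.0, 0.0), (0.1, 0.0)]
far = [(5.0, 5.0), (5.1, 5.0)]
point = (0.0, 0.0)
assigned = "near" if (
    feature_space_sq_distance(point, near) < feature_space_sq_distance(point, far)
) else "far"
```

Because the distance is written entirely in terms of kernel values, the mapped coordinates never have to be computed explicitly.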
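ANALISIS KLASIFIKASI MENGGUNAKAN METODE REGRESI LOGISTIK BINER DAN BOOTSTRAP AGGREGATING CLASSIFICATION AND REGRESSION TREES (BAGGING CART) (Studi Kasus: Nasabah Koperasi Simpan Pinjam Dan Pembiayaan Syariah (KSPPS)) [Classification Analysis Using Binary Logistic Regression and Bootstrap Aggregating Classification and Regression Trees (Bagging CART) (Case Study: Customers of a Sharia Savings, Loan, and Financing Cooperative (KSPPS))]
Salma Innassuraiya; Tatik Widiharih; Iut Tri Utami
Jurnal Gaussian Vol 11, No 2 (2022)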
Publisher : Department of Statistics, Faculty of Science and Mathematics, Universitas Diponegoro

DOI: 10.14710/j.gauss.v11i2.35458

Abstract

The Savings, Loan, and Sharia Financing Cooperative (KSPPS) is a financial institution that offers deposits, loans, and financing to its members while adhering to Islamic sharia rules. Customers' payment behaviour is influenced by differences in their backgrounds, such as age, gender, and occupation. Classification is used to determine the characteristics of members whose payments are current and of those who are in arrears. Binary Logistic Regression and Bootstrap Aggregating Classification and Regression Trees (BAGGING CART) were used as the classification methods. Binary Logistic Regression is logistic regression with a binary response variable. The BAGGING process, with 50 bootstrap resamples, is used to improve the classification performance of CART. The study used 2021 customer data from one KSPPS in Central Java. Gender, age, marital status, employment, education level, time period, and income were the independent variables, and payment status (not stuck or stuck) was the dependent variable. Binary Logistic Regression reached an accuracy of 78.67 percent, an APER of 21.33 percent, a Press's Q of 24.65, and a specificity of 98.30 percent. CART alone reached an accuracy of 77.33 percent, an APER of 22.67 percent, a Press's Q of 22.413, and a specificity of 94.91 percent. With the BAGGING process, the accuracy on the testing data rose to 78.67 percent, with an APER of 21.33 percent, a Press's Q of 24.65, and a specificity of 96.61 percent. These findings indicate that BAGGING improves the performance of CART to the point where it is nearly as good as Binary Logistic Regression, which remains slightly ahead (a specificity of 98.30 percent versus 96.61 percent at the same accuracy).
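The APER and Press's Q figures in this abstract can be checked with a few lines of arithmetic. The sketch below uses the usual definitions, APER = 1 - (correct / total) and Press's Q = (N - nK)^2 / (N(K - 1)) for N cases, n correct classifications, and K classes. The test-set size of 75 and the correct counts of 59 and 58 are assumptions: the abstract does not state them, and they are used here only because they reproduce the reported values.

```python
def aper(n_total, n_correct):
    """Apparent error rate: share of misclassified cases."""
    return 1.0 - n_correct / n_total

def press_q(n_total, n_correct, n_groups):
    """Press's Q statistic for classification accuracy."""
    return (n_total - n_correct * n_groups) ** 2 / (n_total * (n_groups - 1))

N_TEST = 75      # assumed test-set size (not stated in the abstract)
K_CLASSES = 2    # stuck / not stuck

# Assumed correct counts: 59 (logistic regression, bagging CART), 58 (CART).
aper_59 = round(100 * aper(N_TEST, 59), 2)         # 21.33
q_59 = round(press_q(N_TEST, 59, K_CLASSES), 2)    # 24.65
aper_58 = round(100 * aper(N_TEST, 58), 2)         # 22.67
q_58 = round(press_q(N_TEST, 58, K_CLASSES), 3)    # 22.413
```

Press's Q is compared with a chi-square critical value with one degree of freedom, so values this large indicate classification clearly better than chance.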
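PENGELOMPOKAN PROVINSI DI INDONESIA BERDASARKAN INDIKATOR KESEHATAN LINGKUNGAN MENGGUNAKAN METODE PARTITIONING AROUND MEDOIDS DENGAN VALIDASI INDEKS INTERNAL [Grouping Provinces in Indonesia Based on Environmental Health Indicators Using the Partitioning Around Medoids Method with Internal Index Validation]
Diah Aliyatus Saidah; Rukun Santoso; Tatik Widiharih
Jurnal Gaussian Vol 11, No 2 (2022)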
Publisher : Department of Statistics, Faculty of Science and Mathematics, Universitas Diponegoro

DOI: 10.14710/j.gauss.v11i2.35478

Abstract

Environmental health is an important aspect of efforts to achieve public health. Environmental health conditions in Indonesia vary from province to province, so the priorities for improving environmental health also differ. This study aims to group the provinces of Indonesia by environmental health indicators, in order to identify which provinces have high or low environmental quality and to assist the government in optimizing its environmental health efforts. The provinces are grouped with the partitioning around medoids method, which is robust to data containing outliers. Similarity between objects is measured with the Euclidean and Manhattan distances, and the best number of clusters is selected by internal index validation, namely the Calinski-Harabasz index, Baker-Hubert index, silhouette index, C-index, and Davies-Bouldin index. The best result is two clusters using the Manhattan distance, with the largest Calinski-Harabasz index = 24.10072, the largest Baker-Hubert index = 0.8466251, the largest silhouette index = 0.4246581, the smallest C-index = 0.07290109, and the smallest Davies-Bouldin index = 1.094805.
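To make the robustness claim concrete, the following sketch picks the medoid of a small group under either distance. A medoid is an actual observation that minimizes the total distance to the other members, so a single outlier cannot drag it away the way it drags a mean. The helper names and the three toy points are assumptions for illustration; this is not the full PAM swap procedure used in the study.

```python
import math

def euclidean(p, q):
    """Straight-line distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))

def medoid(points, distance):
    """Return the member with the smallest total distance to all members."""
    return min(points, key=lambda c: sum(distance(c, p) for p in points))

# Hypothetical data: two close points and one outlier.
points = [(0.0, 0.0), (1.0, 0.0), (10.0, 10.0)]
center_manhattan = medoid(points, manhattan)   # (1.0, 0.0)
center_euclidean = medoid(points, euclidean)   # (1.0, 0.0)
mean = tuple(sum(c) / len(points) for c in zip(*points))  # pulled toward the outlier
```

Here both distances select (1.0, 0.0), while the arithmetic mean lands far from either of the two close points.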
GUI R UNTUK ANALISIS KERANJANG BELANJA DENGAN ALGORITMA APRIORI PADA SUATU PERUSAHAAN E-COMMERCE [An R GUI for Market Basket Analysis with the Apriori Algorithm at an E-Commerce Company]
Ryan Anugrah; Tatik Widiharih; Sugito Sugito
Jurnal Gaussian Vol 11, No 2 (2022)
Publisher : Department of Statistics, Faculty of Science and Mathematics, Universitas Diponegoro

DOI: 10.14710/j.gauss.v11i2.35475

Abstract

Technological developments make people's lives easier. One such development is the ability to trade digitally, known as e-commerce. To increase revenue, e-commerce companies collect consumer purchase histories, which can be analyzed to obtain information about consumer habits. One suitable analysis is market basket analysis, which aims to find patterns in transaction data. Data processing and analysis are carried out in R, and an R GUI is built with a simulated recommendation system. The market basket analysis produces 22 rules using a minimum support of 0.06 and a minimum confidence of 0.5. The greater the support value, the more often the product or rule occurs across all transactions, and vice versa. The greater the confidence value, the more often the products in the rule are purchased together. This information can help the company plan promotions to increase sales.
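The two thresholds in this abstract can be illustrated with a minimal sketch of how support and confidence are computed for one candidate rule. It is written in Python rather than the R used in the study, and the transactions and item names are hypothetical; only the thresholds 0.06 and 0.5 come from the abstract.

```python
def support(itemset, transactions):
    """Share of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Support of the whole rule divided by support of the antecedent.

    Assumes the antecedent occurs at least once; otherwise this divides by zero.
    """
    both = support(set(antecedent) | set(consequent), transactions)
    return both / support(antecedent, transactions)

# Hypothetical transactions (not the study's data).
transactions = [
    {"bread", "milk"},
    {"bread", "eggs"},
    {"bread", "milk", "eggs"},
    {"milk"},
]

rule_support = support({"bread", "milk"}, transactions)          # 0.5
rule_confidence = confidence({"bread"}, {"milk"}, transactions)  # 2/3
keeps_rule = rule_support >= 0.06 and rule_confidence >= 0.5
```

Apriori applies the support threshold first to prune itemsets, then keeps only the rules whose confidence also clears its threshold.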
PENERAPAN TUNING HYPERPARAMETER RANDOMSEARCHCV PADA ADAPTIVE BOOSTING UNTUK PREDIKSI KELANGSUNGAN HIDUP PASIEN GAGAL JANTUNG [Application of RandomSearchCV Hyperparameter Tuning to Adaptive Boosting for Predicting the Survival of Heart Failure Patients]
Tita Aulia Edi Putri; Tatik Widiharih; Rukun Santoso
Jurnal Gaussian Vol 11, No 3 (2022)
Publisher : Department of Statistics, Faculty of Science and Mathematics, Universitas Diponegoro

DOI: 10.14710/j.gauss.11.3.397-406

Abstract

Heart failure is the number one cause of death every year. It is a pathological condition characterized by abnormal heart function, in which the heart fails to pump enough blood to meet the metabolic needs of the tissues. Applying data mining and computational techniques to medical records can be an effective way to predict the survival of patients with heart failure symptoms. Data mining is the process of extracting important information from large data sets, using statistical methods, mathematics, and artificial intelligence. AdaBoost is a supervised data mining algorithm that is widely used to build classification models. Hyperparameter optimization is the selection of the optimal set of hyperparameters for a learning algorithm. AdaBoost has hyperparameters that must be set before training, namely the learning rate and n_estimators. RandomSearchCV trains the model on randomly selected combinations of hyperparameters. This research uses heart failure patient data collected at the Faisalabad Institute of Cardiology and at the Allied Hospital in Faisalabad (Punjab, Pakistan) from April to December 2015. The search uses a learning rate range of [-2, 2] (log scale), n_estimators from 10 to 776, and K-fold = 5, and yields the best hyperparameters at learning rate = 0.01 and n_estimators = 443, with an accuracy of 0.85 and an AUC of 0.897.
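The idea behind a randomized hyperparameter search can be sketched without any machine learning library: draw random combinations from the search space, score each one, and keep the best. The sketch below assumes that the learning-rate range in the abstract is an exponent range from -2 to 2 on a base-10 log scale, which is an interpretation rather than something the abstract states. The `score` callback is hypothetical; in practice it would return the mean 5-fold cross-validated accuracy of an AdaBoost model, and scikit-learn's `RandomizedSearchCV` performs this loop internally.

```python
import random

def sample_candidates(n_iter, seed=0):
    """Draw random (learning_rate, n_estimators) combinations."""
    rng = random.Random(seed)
    candidates = []
    for _ in range(n_iter):
        exponent = rng.uniform(-2.0, 2.0)           # assumed log10 range
        candidates.append({
            "learning_rate": 10.0 ** exponent,      # between 0.01 and 100
            "n_estimators": rng.randint(10, 776),   # inclusive bounds
        })
    return candidates

def random_search(score, n_iter, seed=0):
    """Return the sampled candidate with the highest score."""
    return max(sample_candidates(n_iter, seed), key=score)

def toy_score(params):
    """Hypothetical stand-in for cross-validated accuracy."""
    return -abs(params["n_estimators"] - 443)

best = random_search(toy_score, n_iter=50)
```

Unlike a grid search, the number of evaluations is fixed by `n_iter` regardless of how wide the ranges are, which is the main reason to sample at random.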
PENERAPAN DIAGRAM KONTROL MEWMA DALAM PENGENDALIAN KUALITAS PRODUKSI KERIPIK SINGKONG PADA UMKM DI KOTA SEMARANG [Application of the MEWMA Control Chart to Quality Control of Cassava Chip Production at an MSME in Semarang City]
Nesari Nesari; Mustafid Mustafid; Tatik Widiharih
Jurnal Gaussian Vol 11, No 3 (2022)
Publisher : Department of Statistics, Faculty of Science and Mathematics, Universitas Diponegoro

DOI: 10.14710/j.gauss.11.3.355-365

Abstract

Quality is the main consideration for every company. Ceriping Bintang Putra Bu Slamet is a micro, small, and medium enterprise (UMKM, Usaha Mikro, Kecil dan Menengah) that produces cassava chips. Three quality characteristics are monitored during production: large-crumb defects, small-crumb defects, and chips sticking together. Controlling these defects is important for producing quality products that meet customer needs. This research was conducted from July to August 2021. Its purpose was to control the production quality of cassava chips using the Multivariate Exponentially Weighted Moving Average (MEWMA) control chart and multivariate process capability analysis. The MEWMA control chart uses a weight (λ) that makes it more sensitive in detecting shifts in the process mean, while the process capability analysis is used to determine process performance. The MEWMA control chart is implemented in two stages: phase I obtains the optimal weight and the control limits, which are then used in phase II to monitor the process mean in the next period. The analysis gives an optimal weight of λ = 0.4, with an upper control limit (BKA) of 201.7434, a center line (GT) of 113.538, and a lower control limit (BKB) of 0 in phase I. The phase II results show a shift in the process mean in a better direction. In addition, the process capability analysis shows that the performance of the production process improved from July 2021 to August 2021, with MCpm values of 0.535 and 1.147, respectively.
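A minimal sketch of the MEWMA recursion may help show what the weight λ does. Each smoothed vector is Z_i = λ X_i + (1 - λ) Z_{i-1}, and the plotted statistic is T_i^2 = Z_i' Σ_Zi^{-1} Z_i with Σ_Zi = λ / (2 - λ) · [1 - (1 - λ)^{2i}] · Σ. To stay dependency-free, the sketch assumes a diagonal covariance matrix (uncorrelated characteristics) and observations already expressed as deviations from target; both are simplifying assumptions, as are the function name and the toy data. Only λ = 0.4 comes from the abstract.

```python
def mewma_statistics(observations, lam, variances):
    """Return the MEWMA T^2 statistic for each observation vector.

    observations: deviations from the in-control target, one tuple per sample
    lam: smoothing weight (lambda), 0 < lam <= 1
    variances: per-characteristic variances (diagonal covariance assumed)
    """
    z = [0.0] * len(variances)   # Z_0 starts at the target (zero deviation)
    stats = []
    for i, x in enumerate(observations, start=1):
        # Exponentially weighted update of each characteristic.
        z = [lam * xj + (1.0 - lam) * zj for xj, zj in zip(x, z)]
        # Exact variance factor of Z_i at step i.
        factor = lam / (2.0 - lam) * (1.0 - (1.0 - lam) ** (2 * i))
        stats.append(sum(zj ** 2 / (factor * vj) for zj, vj in zip(z, variances)))
    return stats

# Hypothetical two-characteristic process with a sustained shift in the first one.
t2 = mewma_statistics([(1.0, 0.0), (1.0, 0.0)], lam=0.4, variances=(1.0, 1.0))
```

A point signals out of control when its T^2 exceeds the upper control limit; because past samples keep contributing through Z, a small sustained shift accumulates instead of being judged one sample at a time.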
KLASIFIKASI PENYAKIT HIPERTENSI MENGGUNAKAN METODE SVM GRID SEARCH DAN SVM GENETIC ALGORITHM (GA) [Classification of Hypertension Using the SVM Grid Search and SVM Genetic Algorithm (GA) Methods]
Fithroh Oktavi Awalullaili; Dwi Ispriyanti; Tatik Widiharih
Jurnal Gaussian Vol 11, No 4 (2022)
Publisher : Department of Statistics, Faculty of Science and Mathematics, Universitas Diponegoro

DOI: 10.14710/j.gauss.11.4.488-498

Abstract

Hypertension is abnormally high pressure inside the arteries. According to health research in 2018, hypertension had increased by 8.3% since 2013. Factors that contribute to hypertension include gender, age, salt consumption, cigarette consumption, cholesterol levels, and a family history of hypertension. The data in this study cover normal and hypertensive patients at the Padangsari Health Center from July to December 2021. This study classifies blood pressure status in order to measure the classification accuracy of the methods used. The method used is the support vector machine (SVM), a well-known algorithm that produces optimal solutions to classification problems. SVM uses kernel functions to handle data that are not linearly separable. The kernels used in this study are linear and RBF. A weakness of SVM is the difficulty of determining the best parameters, so methods for searching for the best parameters have been developed. This study searches for the parameters with grid search and a genetic algorithm (GA). Grid search has the advantage of producing parameters that are close to the optimal value, while GA has the advantage of finding global optimum values easily. This study compares the classification results of SVM with grid search and SVM with GA. The method with the best accuracy is SVM with grid search using a radial basis function (RBF) kernel, with an accuracy of 89.22%.
PERBANDINGAN SMOTE DAN ADASYN PADA DATA IMBALANCE UNTUK KLASIFIKASI RUMAH TANGGA MISKIN DI KABUPATEN TEMANGGUNG DENGAN ALGORITMA K-NEAREST NEIGHBOR [Comparison of SMOTE and ADASYN on Imbalanced Data for Classifying Poor Households in Temanggung Regency with the K-Nearest Neighbor Algorithm]
Dinda Virrliana Ramadhanti; Rukun Santoso; Tatik Widiharih
Jurnal Gaussian Vol 11, No 4 (2022)
Publisher : Department of Statistics, Faculty of Science and Mathematics, Universitas Diponegoro

DOI: 10.14710/j.gauss.11.4.499-505

Abstract

Poverty is a global problem that occurs in many countries with a variety of impacts. Poverty is characterized by the inability of a person or household to meet the basic needs of life. Socio-economic problems such as poverty can be addressed with machine learning, including classification. Classifying households by poverty criteria is expected to help the government prepare well-targeted programs. K-Nearest Neighbor is an easy-to-use classification algorithm that classifies an observation according to its nearest neighbors. A problem arises in classification when the data are imbalanced, because the imbalance causes the classifier to focus on the majority class. SMOTE and ADASYN are used to address imbalanced data. This study finds that applying SMOTE or ADASYN to imbalanced data improves classification performance, especially the G-mean, a performance measure widely used for imbalanced data. SMOTE raises the G-mean to 58.5%, while ADASYN raises it to 57.3%. It is therefore concluded that SMOTE-KNN is the best model for classifying household poverty.
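Two pieces of this abstract lend themselves to a short sketch: the G-mean, which is the geometric mean of sensitivity and specificity, and the interpolation step that SMOTE uses to create a synthetic minority point between an observation and one of its minority-class neighbors. The function names, confusion-matrix counts, and points below are hypothetical and are not taken from the study; the neighbor search and ADASYN's density weighting are omitted.

```python
import math

def g_mean(tp, fn, tn, fp):
    """Geometric mean of sensitivity and specificity."""
    sensitivity = tp / (tp + fn)   # recall on the positive (minority) class
    specificity = tn / (tn + fp)   # recall on the negative (majority) class
    return math.sqrt(sensitivity * specificity)

def smote_point(x, neighbor, gap):
    """Synthetic point on the segment from x to neighbor; gap is in [0, 1]."""
    return tuple(a + gap * (b - a) for a, b in zip(x, neighbor))

# Hypothetical confusion-matrix counts.
score = g_mean(tp=30, fn=20, tn=80, fp=20)           # sqrt(0.6 * 0.8)

# Hypothetical minority observation and neighbor; SMOTE draws gap at random.
synthetic = smote_point((0.0, 0.0), (2.0, 2.0), gap=0.5)   # (1.0, 1.0)
```

Because the G-mean multiplies the two class-wise recalls, a classifier that ignores the minority class scores near zero even when its overall accuracy is high.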
ANALISIS PENGARUH KUALITAS PELAYANAN TERHADAP KEPUASAN PENUMPANG BRT TRANS SEMARANG MENGGUNAKAN PARTIAL LEAST SQUARE (PLS) [Analysis of the Effect of Service Quality on BRT Trans Semarang Passenger Satisfaction Using Partial Least Square (PLS)]
Irma Dwi Tyana; Tatik Widiharih; Iut Tri Utami
Jurnal Gaussian Vol 11, No 4 (2022)
Publisher : Department of Statistics, Faculty of Science and Mathematics, Universitas Diponegoro

DOI: 10.14710/j.gauss.11.4.591-604

Abstract

BRT Trans Semarang is an integrated bus transportation system that operates in Semarang City and parts of Semarang Regency. It provides service facilities ranging from bus stops and air conditioning to route information. These facilities are expected to give passengers a satisfying service. This study was conducted to determine the effect of service quality on the satisfaction of Trans Semarang BRT passengers using Partial Least Square (PLS), with a case study of Diponegoro University students. PLS is a variance-based alternative to covariance-based SEM. Its advantage is that it can handle problems that affect covariance-based SEM, such as small samples, non-normal data, and multicollinearity. Service quality is measured through the variables Direct Evidence, Reliability, Responsiveness, Empathy, and Guarantee. Passenger satisfaction is measured through a sense of pleasure, a positive impression, and the absence of complaints. The results show that Direct Evidence, Reliability, and Responsiveness have a significant effect on the satisfaction of Trans Semarang BRT passengers, while Empathy and Guarantee do not. The adjusted R-square of 0.414 falls in the moderate category, meaning that Direct Evidence, Reliability, and Responsiveness explain 41.4% of the variation in passenger satisfaction.
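ANALISIS SENTIMEN PADA ULASAN APLIKASI INVESTASI ONLINE AJAIB PADA GOOGLE PLAY MENGGUNAKAN METODE SUPPORT VECTOR MACHINE DAN MAXIMUM ENTROPY [Sentiment Analysis of Reviews of the Online Investment Application Ajaib on Google Play Using the Support Vector Machine and Maximum Entropy Methods]
Fath Ezzati Kavabilla; Tatik Widiharih; Budi Warsito
Jurnal Gaussian Vol 11, No 4 (2022)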
Publisher : Department of Statistics, Faculty of Science and Mathematics, Universitas Diponegoro

DOI: 10.14710/j.gauss.11.4.542-553

Abstract

Investment is the commitment of money or assets to earn profits in the future. Online investment applications are now available, one of which is Ajaib. Analyzing the reviews of the Ajaib application shows whether the reviews given are positive or negative. Sentiment analysis is used to assess users' responses to Ajaib's performance, divided into positive and negative classes. The Ajaib reviews can be classified with the Support Vector Machine and Maximum Entropy methods. For non-linear problems, the Support Vector Machine uses a kernel to map the data into a high-dimensional space and find a hyperplane that maximizes the distance between classes. The kernel used in SVM is the Radial Basis Function (RBF) kernel with a gamma of 0.002 and cost (C) values of 0.1, 1, and 10. Maximum Entropy is a classification technique that uses the entropy value to classify data. The models are evaluated with 5-fold cross-validation. The algorithm with the highest accuracy and kappa statistic is considered the best for classifying the sentiments of Ajaib users. The Support Vector Machine gives an overall accuracy of 85.75% and a kappa of 58.07%. Maximum Entropy gives an overall accuracy of 83% and a kappa of 50.5%. This shows that the Support Vector Machine performs better than Maximum Entropy for this sentiment classification.
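The kappa statistic reported alongside accuracy corrects for agreement that would occur by chance: kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed accuracy and p_e is the accuracy expected from the row and column totals alone. The sketch below computes it from a confusion matrix; the function name and the counts are hypothetical and do not reproduce the study's results.

```python
def cohen_kappa(matrix):
    """Cohen's kappa from a square confusion matrix (rows: actual, cols: predicted)."""
    size = len(matrix)
    total = sum(sum(row) for row in matrix)
    # Observed agreement: share of cases on the diagonal.
    observed = sum(matrix[i][i] for i in range(size)) / total
    # Chance agreement: product of matching row and column totals.
    expected = sum(
        sum(matrix[i]) * sum(row[i] for row in matrix) for i in range(size)
    ) / total ** 2
    return (observed - expected) / (1.0 - expected)

# Hypothetical counts: [[true negative, false positive], [false negative, true positive]].
confusion = [[40, 10], [5, 45]]
kappa = cohen_kappa(confusion)   # accuracy 0.85, chance agreement 0.5
```

When one class dominates, accuracy can stay high while kappa drops, which is why the two measures are reported together for review data that skew positive or negative.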