Articles

Latent Dirichlet Allocation dalam Identifikasi Respon Masyarakat Indonesia Terhadap Covid-19 Tahun 2020-2021 Karel Fauzan Hakim; Pika Silvianti; Agus Mohamad Soleh
Xplore: Journal of Statistics Vol. 10 No. 3 (2021)
Publisher : Department of Statistics, IPB

DOI: 10.29244/xplore.v10i3.836

Abstract

Covid-19 is a very troubling disease in Indonesia. Therefore, understanding public opinion is required to find solutions and to evaluate the government's performance in handling the pandemic. Twitter can help identify public opinion on significant events. Tweets are high-dimensional, text-based big data, which require text sampling and text mining to be processed efficiently and effectively. Stratified random sampling with 20 repetitions was applied, with days as strata, followed by topic modeling with latent Dirichlet allocation (LDA). This research aims to find out public opinion regarding Covid-19 and its growth over time. It also aims to determine the effect of stratified random sampling on tweet data. To that end, the extracted topics were transformed into time-series data while accounting for the variety of the resulting patterns, and the transformation results were then explored and interpreted. This research suggests that discussions related to Covid-19 are divided into four topics by the first model, namely: “Vaccine”, “Positive or affected people”, “Health protocol”, and “Indonesia”, and nine topics by the second model, namely: “Vaccine”, “Prayer”, “Health protocol”, “Social aid and corruption”, “Affected people”, “Indonesian economy”, “Work”, “Persuading to wear mask”, and “Willing to watch”. Furthermore, some topics peak whenever a significant event occurs in Indonesia. Finally, this research suggests that 20 repetitions of stratified random sampling can provide good results.
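The day-stratified sampling step described above can be sketched as follows; the toy tweets, five-day range, and per-stratum sample size are all hypothetical, and the LDA modeling itself is omitted:

```python
import random

# Hypothetical toy corpus of (day, tweet) pairs; in the study, days are the strata.
tweets = [(day, f"tweet-{day}-{i}") for day in range(1, 6) for i in range(100)]

def stratified_sample(data, per_stratum, seed):
    """Draw a simple random sample of `per_stratum` tweets from each day-stratum."""
    rng = random.Random(seed)
    strata = {}
    for day, text in data:
        strata.setdefault(day, []).append(text)
    return {day: rng.sample(texts, per_stratum) for day, texts in strata.items()}

# The paper repeats the draw 20 times; here each repetition uses a fresh seed.
repetitions = [stratified_sample(tweets, per_stratum=10, seed=r) for r in range(20)]
```

Each repetition would then feed a separate LDA fit, letting one check how stable the extracted topics are across samples.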
Analisis Tingkat Kepuasan Pelanggan dan Loyalitas Pelanggan terhadap Cafe Infinity Coffee Muhammad Nuruddin Prathama; Muhammad Nur Aidi; Agus Mohamad Soleh
Xplore: Journal of Statistics Vol. 11 No. 2 (2022)
Publisher : Department of Statistics, IPB

DOI: 10.29244/xplore.v11i2.898

Abstract

Cafe and restaurant businesses are among the most competitive businesses and have a sizeable market in Jakarta. The restaurant owner must therefore know the wishes and preferences of the buyer. This research was conducted at "Infinity Coffee", a cafe in Jakarta, to identify consumer characteristics, customer satisfaction, and consumer loyalty. Applying customer satisfaction analysis to the Infinity Coffee business can improve understanding of what its consumers want and improve the quality of its services based on the research results. The analytical methods used in this study are descriptive analysis, Importance-Performance Analysis (IPA), the Consumer Satisfaction Index (CSI), and correspondence analysis. The results indicate that the satisfaction index for all aspects of Infinity Coffee's service is above 80%, which falls into the satisfied category. However, the IPA scatter diagram shows that there are attributes with a high level of importance whose service quality needs to be improved. One of the most important attributes to prioritize for improvement is the completeness of supporting facilities and adequate cutlery. The methods used proved successful in examining the level of consumer satisfaction and in learning more about consumer characteristics.
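As a rough illustration of how a CSI of the kind reported above is computed, the sketch below uses hypothetical attribute names and 1-5 ratings; the weighting scheme (importance-weighted mean performance, scaled to a percentage) is one common CSI formulation and not necessarily the paper's exact variant:

```python
# Hypothetical mean importance and performance ratings (1-5 scale) for four
# service attributes; the attribute names are illustrative, not from the paper.
importance  = {"taste": 4.6, "price": 4.2, "facilities": 4.8, "service": 4.4}
performance = {"taste": 4.1, "price": 3.9, "facilities": 3.5, "service": 4.0}

def csi(imp, perf, max_scale=5):
    """Consumer Satisfaction Index: importance-weighted mean performance,
    expressed as a percentage of the rating scale's maximum."""
    total_imp = sum(imp.values())
    weighted = sum((imp[a] / total_imp) * perf[a] for a in imp)
    return 100 * weighted / max_scale

print(round(csi(importance, performance), 1))  # about 77.4 for these toy ratings
```

An IPA scatter plot would then place each attribute by its (importance, performance) pair to find high-importance, low-performance attributes to prioritize.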
SUPPORT VECTOR REGRESSION (SVR) METHOD FOR PADDY GROWTH PHASE MODELING USING SENTINEL-1 IMAGE DATA Hengki Muradi; Asep Saefuddin; I Made Sumertajaya; Agus Mohamad Soleh; Dede Dirgahayu Domiri
MEDIA STATISTIKA Vol 16, No 1 (2023): Media Statistika
Publisher : Department of Statistics, Faculty of Science and Mathematics, Universitas Diponegoro

DOI: 10.14710/medstat.16.1.25-36

Abstract

Support Vector Machines (SVMs) have received extensive attention over the last decade because they are claimed to produce accurate models with good predictive performance in various situations. This study aims to test the support vector regression (SVR) method for modeling the growth phase of paddy using Sentinel-1 image data. Its accuracy was compared with that of the linear model (LR) method using the RMSE and R2 statistics, and model stability was assessed over 10 repetitions. The most accurate model with two predictors is the one using the NDPI and API polarization indices. The paddy age model from the SVR method is better than that from the LR method: the SVR method produces a model with an average RMSE of 11.13 and an average coefficient of determination of 88.10%. The accuracy of the SVR model with NDPI and API predictors can be improved by adding VH polarization to the model, which decreases the average RMSE to 11.0 and raises the average coefficient of determination to 88.42%. In this scenario, the best model gives a minimum RMSE of 10.35 and a coefficient of determination of 90.05%.
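The RMSE and R² statistics used to compare the SVR and LR models can be computed as below; the observed paddy ages and the two models' predictions are made-up numbers for illustration only:

```python
import math

def rmse(y, yhat):
    """Root mean squared error between observed and predicted values."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y))

def r_squared(y, yhat):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, yhat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1 - ss_res / ss_tot

# Hypothetical paddy ages (days) and predictions from two competing models.
observed = [10, 30, 50, 70, 90]
pred_svr = [12, 28, 52, 69, 91]
pred_lr  = [15, 25, 55, 65, 95]
# As in the paper's comparison, the model with lower RMSE and higher R^2 wins.
```

Averaging these statistics over repeated train/test splits, as the study does over 10 repetitions, also gives a sense of model stability.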
Deep Learning Image Classification Rontgen Dada pada Kasus Covid-19 Menggunakan Algoritma Convolutional Neural Network Leni Anggraini Susanti; Agus Mohamad Soleh; Bagus Sartono
Jurnal Teknologi Informasi dan Ilmu Komputer Vol 10 No 5: Oktober 2023
Publisher : Fakultas Ilmu Komputer, Universitas Brawijaya

DOI: 10.25126/jtiik.20231057142

Abstract

This research proposes using a Convolutional Neural Network (CNN) with the VGGNet-19 and ResNet-50 architectures for COVID-19 diagnosis through chest X-ray image analysis. Modifications were made by comparing dropout regularization values of 50% and 80% for both architectures and changing the classification layer to 4 classes. Furthermore, the models' performance was compared across dataset sizes. The dataset comprised 21,165 images, with 10% held out for testing and the remaining 90% divided into training data (80%) and validation data (20%). Model performance was evaluated using repeated 5-fold cross-validation. The training process employed a learning rate of 0.0001, stochastic gradient descent (SGD) optimization, and ten iterations. The results indicate that adding dropout layers with a 50% rate to both architectures effectively addressed overfitting and improved model performance, and that better performance was achieved with larger dataset sizes. The classification results show that the ResNet-50 architecture achieved an average accuracy of 94.4%, average recall of 94.1%, average precision of 95.5%, average specificity of 97%, and average F1-score of 94.8%, while the VGGNet-19 architecture achieved an average accuracy of 91%, average recall of 89%, average precision of 95.0%, average specificity of 96.8%, and average F1-score of 92.7%. These models can assist in identifying the causes of patient mortality and offer valuable information for medical and epidemiological decision-making.
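The reported accuracy, recall, precision, specificity, and F1-score can each be derived from a confusion matrix; the sketch below computes them in a one-vs-rest view for a single class, with hypothetical labels (the paper macro-averages such per-class values over its four X-ray classes):

```python
def one_vs_rest_metrics(y_true, y_pred, positive):
    """Per-class metrics in a one-vs-rest view of a multi-class problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "recall": recall,
        "precision": precision,
        "specificity": tn / (tn + fp),
        "f1": 2 * precision * recall / (precision + recall),
    }

# Hypothetical labels for a 4-class chest X-ray problem.
y_true = ["covid", "covid", "normal", "pneumonia", "covid", "opacity"]
y_pred = ["covid", "normal", "normal", "pneumonia", "covid", "opacity"]
m = one_vs_rest_metrics(y_true, y_pred, positive="covid")
```

Macro-averaging repeats this for every class and takes the mean of each metric, which is how multi-class recall and specificity figures like those above are usually produced.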
BETA-BINOMIAL MODEL IN SMALL AREA ESTIMATION USING HIERARCHICAL LIKELIHOOD APPROACH Etis Sunandi; Khairil Anwar Notodiputro; Indahwati Indahwati; Agus Mohamad Soleh
MEDIA STATISTIKA Vol 16, No 1 (2023): Media Statistika
Publisher : Department of Statistics, Faculty of Science and Mathematics, Universitas Diponegoro

DOI: 10.14710/medstat.16.1.88-99

Abstract

Small area estimation is a statistical method used to estimate parameters in sub-populations with small, or even zero, sample sizes. This research aims to evaluate the performance of the beta-binomial model for estimating small areas at the area level. The estimation method used is hierarchical likelihood (HL). Both simulation data and empirical data were used, with the simulation studies investigating the proposed model. The mean squared error of prediction (MSEP) and absolute bias (AB) of the estimator were used as the criteria for the best estimation. The empirical study used data on the illiteracy rate at the sub-district level in Bengkulu Province. The results of the simulation study show that, in general, the parameter estimators are nearly unbiased, and the proportion predictions show the same tendency as the parameters. Finally, the HL estimator has a small MSEP. The empirical study shows that the average illiteracy rate across Bengkulu Province is quite diverse; Kepahiang District had the highest average illiteracy rate in the province in 2021.
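A minimal sketch of the beta-binomial shrinkage idea: with a Beta(α, β) area effect, a small area's proportion is predicted by the posterior mean, which pulls sparsely sampled areas toward the prior mean. The α, β values and area counts below are illustrative, not estimates from the paper (which obtains them by hierarchical likelihood):

```python
# Fixed, illustrative hyperparameters: Beta(2, 18) implies a prior mean rate of 10%.
alpha, beta = 2.0, 18.0

# Hypothetical (illiterate count y, sample size n) per area.
areas = {"A": (3, 40), "B": (0, 5), "C": (12, 60)}

def predict(y, n, a=alpha, b=beta):
    """Beta-binomial point prediction: the posterior mean of the area proportion."""
    return (y + a) / (n + a + b)

predictions = {area: predict(y, n) for area, (y, n) in areas.items()}
# Area B (n=5, y=0) gets 0.08 rather than the direct estimate 0 - it is pulled
# toward the prior mean 0.10, which is the "borrowing strength" of SAE.
```

With very small n the prediction sits close to the prior mean; with large n it approaches the direct estimate y/n.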
Performance of Ensemble Learning in Diabetic Retinopathy Disease Classification Anisa Nurizki; Anwar Fitrianto; Agus Mohamad Soleh
Scientific Journal of Informatics Vol. 11 No. 2: May 2024
Publisher : Universitas Negeri Semarang

DOI: 10.15294/sji.v11i2.4725

Abstract

Purpose: This study explores diabetic retinopathy (DR), a complication of diabetes leading to blindness, emphasizing early diagnostic interventions. Leveraging Macular OCT scan data, it aims to optimize prevention strategies through tree-based ensemble learning. Methods: Data from RSKM Eye Center Padang (October-December 2022) were categorized into four scenarios based on physician certificates: Negative & non-diagnostic DR versus Positive DR, Negative versus Positive DR, Non-Diagnosis versus Positive DR, and Negative DR versus non-Diagnosis versus Positive DR. The suitability of each scenario for ensemble learning was assessed. Class imbalance was addressed with SMOTE, while potential underfitting in random forest models was investigated. Models (RF, ET, XGBoost, DRF) were compared based on accuracy, precision, recall, and speed. Results: Tree-based ensemble learning effectively classifies DR, with RF performing exceptionally well (80% recall, 78.15% precision). ET demonstrates superior speed. Scenario III, encompassing positive and undiagnosed DR, emerges as optimal, with the highest recall and precision values. These findings underscore the practical utility of tree-based ensemble learning in DR classification, notably in Scenario III. Novelty: This research distinguishes itself with its unique approach to validating tree-based ensemble learning for DR classification. This validation was accomplished using Macular OCT data and physician certificates, with ETDRS scores demonstrating promising classification capabilities.
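The tree-ensemble idea common to RF, ET, XGBoost, and DRF, many learners voting with the majority class winning, can be sketched with hand-written decision stumps standing in for trained trees; the feature names and thresholds below are invented, not from the paper:

```python
from collections import Counter

# Hypothetical decision stumps; a trained forest would learn these from data.
def stump_a(x): return "DR" if x["macular_thickness"] > 300 else "negative"
def stump_b(x): return "DR" if x["lesions"] >= 2 else "negative"
def stump_c(x): return "DR" if x["macular_thickness"] > 250 and x["lesions"] >= 1 else "negative"

def majority_vote(trees, x):
    """Ensemble prediction: each tree votes, the most common class wins."""
    votes = Counter(tree(x) for tree in trees)
    return votes.most_common(1)[0][0]

patient = {"macular_thickness": 320, "lesions": 1}
label = majority_vote([stump_a, stump_b, stump_c], patient)  # two of three vote "DR"
```

Real RF/ET models differ mainly in how the trees are grown (bootstrap samples, random feature subsets, randomized split thresholds), not in this voting step.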
Land Use Change Modelling Using Logistic Regression, Random Forest and Additive Logistic Regression in Kubu Raya Regency, West Kalimantan Alfa Nugraha Pradana; Anik Djuraidah; Agus Mohamad Soleh
Forum Geografi Vol 37, No 2 (2023): December 2023
Publisher : Universitas Muhammadiyah Surakarta

DOI: 10.23917/forgeo.v37i2.23270

Abstract

Kubu Raya Regency, in the province of West Kalimantan, has wetland ecosystems that include high-density swamp and peatland as well as an extensive area of mangroves. Wetland ecosystems are essential for fauna, as a source of livelihood for the surrounding community, and as a storage reservoir for carbon stocks. Most of the land in Kubu Raya Regency is peatland; as a consequence, peat has long been used for agriculture and as a source of livelihood for the community. Along with its vast peat area, the regency also faces a potentially high risk of peat fires. This study aims to predict land use changes in Kubu Raya Regency using three statistical machine learning models: Logistic Regression (LR), Random Forest (RF), and Additive Logistic Regression (ALR). Land cover map data were acquired from the Ministry of Environment and Forestry and subsequently reclassified into six types of land cover at a resolution of 100 m. The land cover data were used to classify the land use or land cover class for Kubu Raya Regency for the years 2009, 2015, and 2020. Based on model performance, RF provides greater accuracy and F1 score than LR and ALR. The outcome of this study is expected to provide knowledge and recommendations that may aid future sustainable development planning and management for Kubu Raya Regency.
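A first step in land use change modelling of this kind is cross-tabulating per-pixel classes from two dates into a transition matrix; the sketch below uses tiny hypothetical rasters flattened into lists:

```python
from collections import Counter

# Hypothetical per-pixel land cover labels for two dates (100 m pixels).
cover_2015 = ["forest", "forest", "peat", "peat", "agri", "forest"]
cover_2020 = ["forest", "agri",   "peat", "agri", "agri", "forest"]

# Transition matrix: count of pixels moving from class a (2015) to class b (2020).
transitions = Counter(zip(cover_2015, cover_2020))

# Pixels whose class changed - these are the positive cases a change model
# (e.g. logistic regression on elevation, distance to roads, ...) would predict.
changed = sum(n for (a, b), n in transitions.items() if a != b)
```

Models such as LR, RF, or ALR then estimate, per pixel, the probability of each transition from spatial covariates, which is evaluated with accuracy and F1 as in the study.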
Pengaruh Penggunaan Random Undersampling, Oversampling, dan SMOTE terhadap Kinerja Model Prediksi Penyakit Cardiovascular (CVD) Uswatun Hasanah; Agus Mohamad Soleh; Kusman Sadik
Jurnal Matematika, Statistika dan Komputasi Vol. 21 No. 1 (2024): SEPTEMBER 2024
Publisher : Department of Mathematics, Hasanuddin University

DOI: 10.20956/j.v21i1.35552

Abstract

Cardiovascular Disease (CVD), commonly known as heart disease, is a leading cause of mortality globally, prompting extensive research into predictive models that assess individual risk and inform preventive measures. Machine learning approaches such as Random Forest, Support Vector Machine (SVM), and LASSO Logistic Regression have shown promise. Recent studies have indicated that traditional resampling methods like Random Oversampling, Random Undersampling, and SMOTE may not significantly improve model discrimination. This study aims to evaluate the impact of these techniques on the performance of CVD prediction models, using data from the UCI Machine Learning heart disease database. By employing LASSO Logistic Regression, Random Forest, and SVM together with the resampling techniques Random Oversampling, Random Undersampling, and SMOTE, this research seeks to improve understanding of model performance under class imbalance and to contribute to refining CVD prediction strategies. The study demonstrates that the SMOTE technique significantly enhances the performance of CVD prediction models; specifically, combined with the Random Forest algorithm, SMOTE achieves the best performance in terms of accuracy, sensitivity, and specificity. This highlights the importance of selecting appropriate resampling techniques to handle class imbalance and provides new insights into improving prediction accuracy for imbalanced medical data.
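Random oversampling and undersampling can be sketched as below with hypothetical labelled records; SMOTE differs in that it interpolates new synthetic minority points between neighbours rather than duplicating existing ones:

```python
import random

rng = random.Random(42)

# Hypothetical imbalanced dataset: 90 majority vs 10 minority records.
majority = [("no_cvd", i) for i in range(90)]
minority = [("cvd", i) for i in range(10)]

def oversample(minor, target_size):
    """Random oversampling: duplicate minority records until target_size."""
    return minor + [rng.choice(minor) for _ in range(target_size - len(minor))]

def undersample(major, target_size):
    """Random undersampling: keep a random subset of majority records."""
    return rng.sample(major, target_size)

balanced_over = majority + oversample(minority, len(majority))
balanced_under = undersample(majority, len(minority)) + minority
```

Either balanced set would then be passed to the classifier's training step; only the training data is resampled, never the held-out test set.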
Metode Machine Learning-Based Univariate Time Series Imputation Method untuk Estimasi Nilai Hilang pada Data Non-Stasioner Dini Ramadhani; Agus Mohamad Soleh; Erfiani Erfiani
Jurnal Matematika, Statistika dan Komputasi Vol. 21 No. 1 (2024): SEPTEMBER 2024
Publisher : Department of Mathematics, Hasanuddin University

DOI: 10.20956/j.v21i1.36468

Abstract

Handling missing values in time series data is crucial because they can disrupt data analysis and interpretation. Sequentially missing values in time series often pose a more complex challenge compared to randomly missing values. One of the promising recent methods is Machine Learning-Based Univariate Time Series Imputation (MLBUI), although it is still not widely used and its accessibility is limited. MLBUI employs Random Forest Regression (RFR) and Support Vector Regression (SVR) algorithms. This study evaluates the performance of MLBUI in addressing missing data scenarios in non-stationary univariate time series data. The data used in this research is the average temperature data from Bogor Regency. The missing data scenarios considered include rates of 6%, 10%, and 14%. Besides MLBUI, five other comparison methods are used: Kalman StructTS, Kalman Auto-ARIMA, Spline Interpolation, Stine Interpolation, and Moving Average. The results show that MLBUI performs poorly for non-stationary data, although the obtained Mean Absolute Percentage Error (MAPE) is below 10%.
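One of the comparison baselines above, moving-average imputation, can be sketched as follows; the temperature series is hypothetical, and sequential gaps are filled left to right, so later gaps may draw on earlier imputed values:

```python
def moving_average_impute(series, window=2):
    """Fill None values with the mean of up to `window` known neighbours on
    each side; a simple stand-in for a Moving Average imputation baseline."""
    filled = list(series)
    for i, v in enumerate(filled):
        if v is None:
            neighbours = [x for x in filled[max(0, i - window):i + window + 1]
                          if x is not None]
            filled[i] = sum(neighbours) / len(neighbours)
    return filled

# Hypothetical daily mean temperatures (deg C) with a sequential two-day gap.
temps = [26.1, 26.4, None, None, 27.0, 26.8]
out = moving_average_impute(temps)
```

Sequential gaps are exactly where such local methods struggle, which is the scenario the study stresses when comparing them against MLBUI.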
Evaluasi Perbandingan Kinerja Algoritma Cheng and Church Biclustering Terhadap Algoritma Clustering Klasik K-Means untuk Mengidentifikasi Pola Distribusi Barang Ekspor Indonesia Baehera, Seta; Utami Dyah Syafitri; Agus Mohamad Soleh
Jurnal Statistika dan Aplikasinya Vol 7 No 2 (2023): Jurnal Statistika dan Aplikasinya
Publisher : Program Studi Statistika FMIPA UNJ

DOI: 10.21009/JSA.07204

Abstract

Clustering is a process of grouping data into several groups (clusters) so that data within a cluster are highly similar (homogeneous) while data in different clusters are dissimilar (heterogeneous). A common example of a clustering algorithm is K-Means Clustering. In contrast to classical clustering algorithms, a biclustering algorithm groups data in two dimensions simultaneously: it searches for submatrices, i.e., subgroups of rows and columns that are highly correlated. One example of a biclustering algorithm is Cheng and Church Biclustering (CC Biclustering). The aim of this research is to evaluate the performance of the biclustering algorithm against a classical clustering algorithm. The analysis was applied to CC Biclustering and K-Means Clustering to identify distribution patterns of Indonesian export goods over the period 2013 to 2022. Based on the research results, the optimal scenario for the K-Means algorithm is scenario 2, namely the 7-cluster K-Means algorithm with data-scaling pre-processing. Meanwhile, the optimal scenario for the CC Biclustering algorithm is scenario 1, namely CC Biclustering with a tolerance value of 0.10 and data-scaling pre-processing. Between these two scenarios, based on the MSR/Volume value, the best is scenario 1, the CC Biclustering application, which has an MSR/Volume value of 0.077.
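The mean squared residue (MSR) criterion that Cheng and Church biclustering minimises scores how far each cell of a submatrix deviates from an additive row-plus-column pattern; a low MSR means a coherent bicluster. The export-volume submatrix below is hypothetical:

```python
def msr(submatrix):
    """Mean squared residue of a bicluster submatrix: each cell is scored
    against its row mean, column mean, and the overall mean."""
    n_rows, n_cols = len(submatrix), len(submatrix[0])
    row_means = [sum(row) / n_cols for row in submatrix]
    col_means = [sum(submatrix[i][j] for i in range(n_rows)) / n_rows
                 for j in range(n_cols)]
    overall = sum(row_means) / n_rows
    residues = [(submatrix[i][j] - row_means[i] - col_means[j] + overall) ** 2
                for i in range(n_rows) for j in range(n_cols)]
    return sum(residues) / (n_rows * n_cols)

# A perfectly additive (hypothetical) submatrix has MSR exactly 0.
coherent = [[1, 2, 3],
            [2, 3, 4],
            [3, 4, 5]]
```

The CC algorithm greedily deletes and adds rows/columns until the MSR falls below a tolerance threshold (0.10 in the study's best scenario).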