P-Index (2020-2025): 6.566
This author has published in the following journals:
FORUM STATISTIKA DAN KOMPUTASI; Media Statistika; Statistika; JURNAL MATEMATIKA STATISTIKA DAN KOMPUTASI; IPTEK The Journal for Technology and Science; CAUCHY: Jurnal Matematika Murni dan Aplikasi; Sosioinforma; International Journal of Advances in Intelligent Informatics; Scientific Journal of Informatics; JOIN (Jurnal Online Informatika); Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi); Jurnal Penelitian Pertanian Tanaman Pangan; BAREKENG: Jurnal Ilmu Matematika dan Terapan; SINTECH (Science and Information Technology) Journal; MIND (Multimedia Artificial Intelligent Networking Database) Journal; Jurnal Aplikasi Statistika & Komputasi Statistik; FIBONACCI: Jurnal Pendidikan Matematika dan Matematika; Inferensi; International Journal of Advances in Data and Information Systems; InPrime: Indonesian Journal Of Pure And Applied Mathematics; Majalah Ilmiah Matematika dan Statistika (MIMS); Jurnal Lebesgue : Jurnal Ilmiah Pendidikan Matematika, Matematika dan Statistika; Enthusiastic : International Journal of Applied Statistics and Data Science; Prosiding Seminar Nasional Official Statistics; Jurnal Natural; Eduvest - Journal of Universal Studies; Xplore: Journal of Statistics; PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON DATA SCIENCE AND OFFICIAL STATISTICS; Parameter: Jurnal Matematika, Statistika dan Terapannya; Journal of Mathematics, Computation and Statistics (JMATHCOS); Advance Sustainable Science, Engineering and Technology (ASSET); Indonesian Journal of Statistics and Its Applications; Journal on Mathematics Education
Articles

Regularisasi model pembelajaran mesin dengan regresi terpenalti pada data yang mengandung multikolinearitas (Studi kasus prediksi Indeks Pembangunan Manusia di 34 provinsi di Indonesia) Khamidah, Nur; Sadik, Kusman; M Soleh, Agus; Dito, Gerry Alfa
Majalah Ilmiah Matematika dan Statistika Vol. 24 No. 1 (2024): Majalah Ilmiah Matematika dan Statistika
Publisher : Jurusan Matematika FMIPA Universitas Jember

DOI: 10.19184/mims.v24i1.40360

Abstract

This research models high-dimensional data containing multicollinearity using four machine-learning algorithms: Random Forest, K-Nearest Neighbor (kNN), XGBoost, and Regression Tree. Before modeling, regularization is carried out with three penalized regressions: ridge, least absolute shrinkage and selection operator (LASSO), and Elastic Net. The empirical data, taken from the 2022 Human Development Index data for 34 provinces in Indonesia published by BPS, comprise 100 predictor variables and 1 response variable, and were standardized. A simulation is also run on highly correlated data generated from two distributions, uniform and normal, with parameter values taken from the empirical data. The results show that ridge regularization is the best method for producing accurate and stable predictions. Furthermore, there was no difference in root mean square error (RMSE) between the data with and without standardization. Across all the data analyzed, the kNN model outperformed the other models on simulation data, while the Random Forest and XGBoost models outperformed the other models on empirical data. In addition, the Regression Tree model is not recommended based on the results of this study.
Keywords: regularization, multicollinearity, ridge, LASSO, elastic net
MSC2020: 62J07
Densely Connected dan Residual Convolutional Neural Network Untuk Estimasi Jumlah Keluarga Tingkat Desa Dengan Citra Satelit Siregar, Jodi jhouranda; Kurnia, Anang; Sadik, Kusman
SINTECH (Science and Information Technology) Journal Vol. 5 No. 2 (2022): SINTECH Journal Edition Oktober 2022
Publisher : Prahasta Publisher

DOI: 10.31598/sintechjournal.v5i2.1191

Abstract

Indonesia conducts a population census every ten years to collect population data. Variables such as family count are collected to provide general population information for policy making and sampling frames. As an archipelagic country with an area of 8.3 million km², Indonesia requires a lot of resources to collect such data. In the age of big data, satellite imagery has become more available and less expensive. In this study, we used West Java as a case study for applying deep learning to estimate family counts at the village level. Sentinel-2 and SPOT-6/7 data are used to model family counts. Using XGBoost, we regress the family count on the softmax probabilities produced by a deep learning family density classification (DenseNet121 and ResNet50). With an R² of 0.93 and a MAPE of 19%, the regression model provides a good prediction of the number of families in the census. Regarding the input data, Sentinel-2 is sufficient for this task, as its modeling results do not differ significantly from those obtained with higher-resolution images (SPOT-6/7). Using segments of the estimation area as the input level, together with structured auxiliary variables, also delivers better predictions.
N-Level Structural Equation Models (nSEM): The Effect of Sample Size on the Parameter Estimation in Latent Random-Intercept Model Eminita, Viarti; Saefuddin, Asep; Sadik, Kusman; Syafitri, Utami Dyah
InPrime: Indonesian Journal of Pure and Applied Mathematics Vol 6, No 1 (2024)
Publisher : Department of Mathematics, Faculty of Sciences and Technology, UIN Syarif Hidayatullah

DOI: 10.15408/inprime.v6i1.38914

Abstract

Multilevel Structural Equation Modeling (MSEM) is claimed to address hierarchical data structures and latent response variables, but it becomes unstable as the number of levels increases. N-Level SEM (nSEM) is an SEM framework designed to handle a growing number of levels in the model. The nSEM framework uses Maximum Likelihood Estimation (MLE) for parameter estimation, which requires a large sample size and correct model specification. It is therefore essential to consider the minimal sample size needed to ensure accurate and efficient parameter estimation in the nSEM model. This study examined how sample size affects the performance of parameter estimators in nSEM models. We propose a method to evaluate the effect of the number of environments on the estimates of factor loadings and environmental variance produced by the model. We also assess the impact of environment size on the estimates of factor loadings and individual variance. The results were then applied to actual data on student mathematics learning motivation in Depok. The findings show that neither the number of environments nor the size of the environment affects the performance of fixed parameter estimation in the nSEM model. nSEM performs very well in estimating environmental variance at level 2 when the number of environments increases. Conversely, increasing the size of the environment worsens the performance of estimating individual variance parameters. Overall, the nSEM framework for the latent random-intercept (LatenRI) model performs well with increasing sample sizes. Applying the LatenRI model to the actual data yields nearly similar estimation results.
Keywords: Hierarchical data; Latent random intercept model; Multilevel structural equation modeling; n-Level structural equation modeling.
MSC2020: 62D99
Loan Approval Classification Using Ensemble Learning on Imbalanced Data Anadra, Rahmi; Sadik, Kusman; Soleh, Agus M; Astari, Reka Agustia
Enthusiastic : International Journal of Applied Statistics and Data Science Volume 4 Issue 2, October 2024
Publisher : Universitas Islam Indonesia

DOI: 10.20885/enthusiastic.vol4.iss2.art1

Abstract

Loan processing is an important aspect of the financial industry, where the right decisions must be made to approve or reject a loan. However, default by loan applicants has become a significant concern for financial institutions. Hence, this study applies ensemble learning with the random forest and Extreme Gradient Boosting (XGBoost) algorithms. Imbalanced data are handled using the Synthetic Minority Over-sampling Technique (SMOTE). This research aimed to improve accuracy and precision in credit risk assessment and to reduce human workload. Both algorithms used a dataset of 4,296 observations with 13 variables relevant to loan approval decisions. The research process involved data exploration, data preprocessing, data splitting, model training, model evaluation with accuracy, sensitivity, specificity, and F1-score, model selection with 10-fold cross-validation, and identification of important variables. The results showed that XGBoost with imbalanced data handling had the highest accuracy, 98.52%, and a good balance between sensitivity (98.83%), specificity (98.01%), and F1-score (98.81%). The most important variables in determining loan approval are credit score, loan term, loan amount, and annual income.
Classification Performance of Stacking Ensemble with Meta-Model of Categorical Principal Component Logistic Regression on Food Insecurity Data Pangestika, Dhita Elsha; Fitrianto, Anwar; Sadik, Kusman
Scientific Journal of Informatics Vol. 11 No. 4: November 2024
Publisher : Universitas Negeri Semarang

DOI: 10.15294/sji.v11i4.15315

Abstract

Purpose: Stacking is a type of ensemble whose base-models use different algorithms. The classification results from its base-models are categorical and tend to be associated with each other. They then become the input for the stacking meta-model. However, there are currently no definite rules for determining which classifier should serve as the meta-model in stacking. On the other hand, recent research has found that CATPCA-LR can work well on categorical predictor variables that are associated with each other. Therefore, this study focuses on the classification performance of the stacking algorithm with the CATPCA-LR meta-model. Methods: The study compared the classification performance of stacking with the CATPCA-LR meta-model to stacking with other meta-models (random forest, gradient boost, and logistic regression) and to its base-models (random forest, gradient boost, extreme gradient boost, extra trees, light gradient boost). This research used food insecurity data from March 2022. Result: The stacking algorithm with the CATPCA-LR meta-model performs better on food insecurity data in terms of sensitivity, balanced accuracy, F1-Score, and G-Means. This model offers a sensitivity of 46.28%, a balanced accuracy of 59.82%, an F1-Score of 37.82%, and a G-Means of 58.26%. Meanwhile, in terms of specificity, the light gradient boost (LGB) algorithm gives the highest value among the algorithms, at 88.40%. In general, stacking with the CATPCA-LR meta-model provides the best performance among the algorithms compared on food insecurity data. Novelty: This research has explored the classification performance of stacking with CATPCA-LR as the meta-model.
PENERAPAN ANALISIS REGRESI LOGISTIK ORDINAL MULTILEVEL DENGAN BAYESIAN DALAM MEMODELKAN TINGKAT KESEJAHTERAAN DATA P3KE Hermawati, Neni; Susetyo, Budi; Sadik, Kusman
Jurnal Lebesgue : Jurnal Ilmiah Pendidikan Matematika, Matematika dan Statistika Vol. 6 No. 1 (2025): Jurnal Lebesgue : Jurnal Ilmiah Pendidikan Matematika, Matematika dan Statistik
Publisher : LPPM Universitas Bina Bangsa

DOI: 10.46306/lb.v6i1.918

Abstract

Statistics must keep developing with the times because the characteristics of data available in the field are increasingly diverse. As the types of data grow, more statistical analysis methods are developed, including methods for hierarchically structured data. P3KE data is new data complementing DTKS, which is the basis for the government in distributing social assistance. P3KE data serves as a reference in determining KPM BLT DD. Wanasari is a village that customarily determines KPM BLT DD based on the results of deliberations at the hamlet level (Musdus). In Wanasari Village, the KPM candidates from the Musdus are often inconsistent with the P3KE data from BKKBN provided through the Cianjur District government. Therefore, it is necessary to analyze the components that have a significant effect on the Welfare Decile in the Wanasari Village P3KE data. The data is considered to be hierarchically structured with an ordinal response variable, so multilevel ordinal logistic regression analysis with Bayesian parameter estimation is used to obtain the best model. Normal(0, 10) and Cauchy(0, 2.5) priors were compared to find the best model. The results show that the P3KE data of Wanasari Village is hierarchical, because the two-level logistic regression analysis gives better results than the one-level analysis. The study also concluded that Bayesian parameter estimation is better with the Cauchy(0, 2.5) prior, both for estimating the β coefficients and for estimating the between-neighborhood variance. The best model obtained explains a between-neighborhood variance of 1.07 and has an accuracy of 63.23%. Predictor variables that have a significant effect include civil registration equivalents, having money/jewelry/livestock/etc. saved, wall type, cooking fuel, drinking water source, stunting risk, and number of households.
Comparison of Ensemble Learning Methods in Classifying Unbalanced Data on the Bank Marketing Dataset Hasnataeni, Yunia; Sadik, Kusman; Soleh, Agus M; Astari, Reka Agustia
Inferensi Vol 8, No 1 (2025)
Publisher : Department of Statistics ITS

DOI: 10.12962/j27213862.v8i1.20569

Abstract

The banking industry is experiencing rapid growth, particularly in telemarketing strategies to increase product and service sales. Despite their widespread use, these strategies have low success rates, which produces imbalanced data: fewer customers accept offers than reject them. This study evaluates machine learning algorithms, including Random Forest, Gradient Boosting, Extra Trees, and AdaBoost, with and without handling imbalanced data using the Random Over-Sampling Examples (ROSE) method. The evaluation covers accuracy, precision, recall, F1-score, and the area under the ROC curve (AUC). Results indicate that Random Forest and AdaBoost consistently perform well, with Random Forest maintaining a high accuracy of 91.00% after handling imbalanced data. Gradient Boosting and Extra Trees improve in precision after oversampling. All models exhibit high AUC values, close to 0.94, demonstrating excellent discrimination between positive and negative classes. The study concludes that addressing data imbalance enhances model performance, making these models suitable for effective telemarketing strategies in the banking sector.
PEMODELAN DATA TERSENSOR KANAN MENGGUNAKAN ZERO INFLATED NEGATIVE BINOMIAL DAN HURDLE NEGATIVE BINOMIAL Kusni Rohani Rumahorbo; Budi Susetyo; Kusman Sadik
Indonesian Journal of Statistics and Applications Vol 3 No 2 (2019)
Publisher : Departemen Statistika, IPB University dengan Forum Perguruan Tinggi Statistika (FORSTAT)

DOI: 10.29244/ijsa.v3i2.247

Abstract

Health is very important for humanity. One way to assess a person's health condition is through the number of unhealthy days, which can also show the productivity of the community in a region. The number of unhealthy days is an example of count data and can be modeled using Poisson regression. Problems often faced with count data are overdispersion and excess zeros, and Poisson regression cannot be applied to data that exhibit both. Zero Inflated Negative Binomial and Hurdle Negative Binomial modeling was performed on data under two conditions, uncensored and censored. The explanatory variables used are gender, age, marital status, education level, home ownership status, and rural-urban status. According to the AIC and RMSE results, Zero Inflated Negative Binomial on censored data showed the best performance for estimating the number of unhealthy days.
KAJIAN REGRESI KEKAR MENGGUNAKAN METODE PENDUGA-MM DAN KUADRAT MEDIAN TERKECIL Khusnul Khotimah; Kusman Sadik; Akbar Rizki
Indonesian Journal of Statistics and Applications Vol 4 No 1 (2020)
Publisher : Departemen Statistika, IPB University dengan Forum Perguruan Tinggi Statistika (FORSTAT)

DOI: 10.29244/ijsa.v4i1.502

Abstract

Regression is a statistical method used to obtain the pattern of relationship between two or more variables, expressed as a regression line equation. This equation is usually estimated using ordinary least squares (OLS). However, OLS is highly sensitive to outliers. One solution to the outlier problem in regression analysis is robust regression. This study used the least median of squares (LMS) and MM-estimation (MM) robust regression methods to analyze data containing outliers. The analysis was carried out on simulated data and actual data. The simulation results across various scenarios show that the LMS and MM methods perform better than OLS on data containing outliers. The MM method has the lowest average parameter estimation bias, followed by LMS, then OLS. LMS has the smallest average root mean square error (RMSE) and the highest average R², followed by MM, then OLS. Comparing the three methods on 2017 Indonesian rice production data, which contain 10% outliers, shows that LMS is the best method. LMS produces the smallest RMSE, 4.44, and the highest R², 98%. The MM method is second best, with an RMSE of 6.78 and an R² of 96%. OLS produces the largest RMSE and the lowest R², 23.15 and 58% respectively.
Simulation Study of Robust Geographically Weighted Empirical Best Linear Unbiased Predictor on Small Area Estimation: Simulasi Metode Prediksi Tak Bias Linier Terbaik Empiris Terboboti Geografis Kekar pada Pendugaan Area Kecil Naima Rakhsyanda; Kusman Sadik; Indahwati Indahwati
Indonesian Journal of Statistics and Applications Vol 5 No 1 (2021)
Publisher : Departemen Statistika, IPB University dengan Forum Perguruan Tinggi Statistika (FORSTAT)

DOI: 10.29244/ijsa.v5i1p50-60

Abstract

Small area estimation can be used to predict population parameters from small sample sizes. In some cases, population units that are spatially close may be more related than units that are further apart. This research studies the use of spatial information such as geographic coordinates. Outlier contamination can also affect small area estimates. The study was conducted using simulation on generated data with six scenarios, which combine spatial effects (spatially stationary and spatially non-stationary) with outlier contamination (no outliers, symmetric outliers, and non-symmetric outliers). The purpose of this study was to compare the geographically weighted empirical best linear unbiased predictor (GWEBLUP) and robust GWEBLUP (RGWEBLUP) with the direct estimator, EBLUP, and REBLUP using simulated data. The performance of the predictors is evaluated using relative root mean squared error (RRMSE). The simulation results showed that the geographically weighted predictors have the smallest RRMSE values in the spatially non-stationary scenarios and therefore offer better prediction. In the scenarios with outliers, the robust predictors have smaller RRMSE values and are therefore more efficient than the non-robust predictors.
Co-Authors . Erfiani . Indahwati A.Tuti Rumiati Aam Alamudi Abdullah, Adib Roisilmi Achmad Fauzan Agus Mohamad Soleh Ahmad Rifai Nasution Aji Hamim Wigena Akbar Rizki Akbar Rizki Akbar Rizki Akmala Firdausi Amalia, Rahmatin Nur Anadra, Rahmi Ananda Shafira Anang Kurnia Andespa, Reyuli Andi Okta Fengki ASEP SAEFUDDIN Astari, Reka Agustia Astari, Reka Agustia Aulya Permatasari Azka Ubaidillah Bagus Sartono Budi Susetyo Budi Susetyo Cici Suhaeni Cici Suhaeni Dito, Gerry Alfa Dwi Agustin Nuriani Sirodj Efriwati Efriwati Embay Rohaeti Eminita, Viarti EVITA PURNANINGRUM FARDILLA RAHMAWATI Farit Mochamad Afendi Fitrianto, Anwar Haikal, Husnul Aris Hari Wijayanto Hasnataeni, Yunia Hazan Azhari Zainuddin Hermawati, Neni I Gusti Ngurah, Sentana Putra I Made Sumertajaya I Wayan Mangku Indahwati Indahwati Indahwati Intan Arassah, Fradha Iqbal, Teuku Achmad Isnanda, Eriski Khairi A N Khairil Anwar Notodiputro Khikmah, Khusnia Nurul Khusnul Khotimah Kusni Rohani Rumahorbo Latifah, Leli Lili Puspita Rahayu Logananta Puja Kusuma M Soleh, Agus Mochamad Ridwan Mochamad Ridwan, Mochamad Mohammad Masjkur Muh Nur Fiqri Adham Muhammad Yusran Mulianto Raharjo Naima Rakhsyanda Nisrina Az-Zahra, Putri Nur Khamidah NURADILLA, SITI Nusar Hajarisman Pangestika, Dhita Elsha Parwati Sofan, Parwati Purnama Sari Rifqi Aulya Rahman Rizki, Akbar Rizqi, Tasya Anisah ROCHYATI ROCHYATI Sahamony, Nur Fitriyani Saleh, Agus Muhammad Satriyo Wibowo Siregar, Jodi jhouranda Siti Raudlah Sitti Nurhaliza Soleh, Agus M Suhaeni, Cici Supriatin, Febriyani Eka Tendi Ferdian Diputra Titin Suhartini Titin Suhartini, Titin Tri Wahyuni Uswatun Hasanah Utami Dyah Syafitri Viarti Eminita Widhiyanti Nugraheni Yenni Angraini Yenni Kurniawati Yuli Eka Putri