p-Index (2021 to 2026): 7.537
This author has published in the following journals: FORUM STATISTIKA DAN KOMPUTASI; Media Statistika; Statistika; JURNAL MATEMATIKA STATISTIKA DAN KOMPUTASI; IPTEK The Journal for Technology and Science; CAUCHY: Jurnal Matematika Murni dan Aplikasi; Sosioinforma; JUITA : Jurnal Informatika; Jurnal Pengelolaan Sumberdaya Alam dan Lingkungan (Journal of Natural Resources and Environmental Management); International Journal of Advances in Intelligent Informatics; Scientific Journal of Informatics; JOIN (Jurnal Online Informatika); Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi); Indonesian Journal of Applied Statistics; Jurnal Penelitian Pertanian Tanaman Pangan; BAREKENG: Jurnal Ilmu Matematika dan Terapan; JOURNAL OF APPLIED INFORMATICS AND COMPUTING; SINTECH (Science and Information Technology) Journal; MIND (Multimedia Artificial Intelligent Networking Database) Journal; JIPI (Jurnal Ilmiah Penelitian dan Pembelajaran Informatika); Jurnal Aplikasi Statistika & Komputasi Statistik; FIBONACCI: Jurnal Pendidikan Matematika dan Matematika; Inferensi; International Journal of Advances in Data and Information Systems; InPrime: Indonesian Journal Of Pure And Applied Mathematics; ESTIMASI: Journal of Statistics and Its Application; Majalah Ilmiah Matematika dan Statistika (MIMS); Jurnal Lebesgue : Jurnal Ilmiah Pendidikan Matematika, Matematika dan Statistika; Journal of Applied Data Sciences; Enthusiastic : International Journal of Applied Statistics and Data Science; Prosiding Seminar Nasional Official Statistics; Jurnal Natural; Eduvest - Journal of Universal Studies; Xplore: Journal of Statistics; PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON DATA SCIENCE AND OFFICIAL STATISTICS; Parameter: Jurnal Matematika, Statistika dan Terapannya; Journal of Mathematics, Computation and Statistics (JMATHCOS); Advance Sustainable Science, Engineering and Technology (ASSET); Indonesian Journal of Statistics and Its Applications; Journal on Mathematics Education.
Articles

Regularization of machine learning models with penalized regression on data containing multicollinearity (Case study: predicting the Human Development Index in 34 provinces in Indonesia)
Khamidah, Nur; Sadik, Kusman; M Soleh, Agus; Dito, Gerry Alfa
Majalah Ilmiah Matematika dan Statistika Vol. 24 No. 1 (2024): Majalah Ilmiah Matematika dan Statistika
Publisher : Jurusan Matematika FMIPA Universitas Jember

DOI: 10.19184/mims.v24i1.40360

Abstract

This research models high-dimensional data containing multicollinearity with four machine-learning algorithms: Random Forest, K-Nearest Neighbor (kNN), XGBoost, and Regression Tree. Before modeling, regularization was carried out with penalized regression: ridge regression, least absolute shrinkage and selection operator (LASSO) regression, and Elastic Net regression. The data consist of 100 predictor variables and 1 response variable from the 2022 Human Development Index data for 34 provinces in Indonesia, published by BPS, and were standardized. Simulations were also run on highly correlated data drawn from two distributions, uniform and normal, with parameter values taken from the empirical data. The results show that ridge regularization produces the most accurate and stable predictions. Furthermore, the root mean square error (RMSE) did not differ between standardized and unstandardized data. Across all analyzed data, the kNN model outperformed the other models on the simulation data, while the Random Forest and XGBoost models outperformed the other models on the empirical data. Based on these results, the Regression Tree model is not recommended.
Keywords: regularization, multicollinearity, ridge, LASSO, elastic net
MSC2020: 62J07
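The three penalties differ only in how they shrink the coefficients. The solver below is a minimal sketch of a coordinate-descent method for the glmnet-style elastic net objective, where `alpha = 0` gives ridge and `alpha = 1` gives LASSO. It assumes standardized predictors and a centered response with no intercept. The name `elastic_net_cd`, the penalty parametrization, and the toy data are illustrative assumptions, not the paper's code.

```python
import numpy as np

def soft_threshold(z, gamma):
    """Soft-thresholding operator S(z, gamma) produced by the L1 penalty."""
    return np.sign(z) * max(abs(z) - gamma, 0.0)

def elastic_net_cd(X, y, lam, alpha, n_iter=1000, tol=1e-10):
    """Coordinate descent for
        (1/(2n)) * ||y - X b||^2 + lam * (alpha * ||b||_1 + (1 - alpha) / 2 * ||b||^2).
    alpha = 0 is ridge, alpha = 1 is LASSO (assumes standardized X, centered y)."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n      # (1/n) * ||x_j||^2, equal to 1 after standardizing
    r = y - X @ b                          # current full residual
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            old = b[j]
            # (1/n) x_j^T (partial residual that excludes coordinate j)
            rho = X[:, j] @ r / n + col_sq[j] * old
            b[j] = soft_threshold(rho, lam * alpha) / (col_sq[j] + lam * (1 - alpha))
            if b[j] != old:
                r -= X[:, j] * (b[j] - old)    # keep the residual in sync with b
                max_change = max(max_change, abs(b[j] - old))
        if max_change < tol:
            break
    return b

# Toy data with two nearly collinear predictors (hypothetical, for illustration only).
rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)   # almost a copy of x1
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
X = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize each column
y = 2 * x1 + 0.5 * x3 + rng.normal(scale=0.1, size=n)
y = y - y.mean()                           # center the response

for name, a in [("ridge", 0.0), ("lasso", 1.0), ("elastic net", 0.5)]:
    print(name, np.round(elastic_net_cd(X, y, lam=0.1, alpha=a), 3))
```

With the nearly duplicated columns, ridge tends to split the weight between `x1` and `x2`, while LASSO tends to keep only one of them. The exact numbers depend on the seed. In practice, scikit-learn's `Ridge`, `Lasso`, and `ElasticNet` estimators fit the same penalties. In `ElasticNet`, the parameter `alpha` sets the overall strength (`lam` here) and `l1_ratio` sets the mix (`alpha` here).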

Densely Connected and Residual Convolutional Neural Networks for Estimating the Number of Families at the Village Level Using Satellite Imagery
Siregar, Jodi jhouranda; Kurnia, Anang; Sadik, Kusman
SINTECH (Science and Information Technology) Journal Vol. 5 No. 2 (2022): SINTECH Journal Edition Oktober 2022
Publisher : Prahasta Publisher

DOI: 10.31598/sintechjournal.v5i2.1191

Abstract

Indonesia conducts a population census every ten years to collect population data. Variables such as the number of families are collected to provide general population information for policymaking and for sampling frames. As an archipelagic country with an area of 8.3 million km², Indonesia needs substantial resources to collect such data. In the age of big data, satellite imagery has become more available and less expensive. In this study, we used West Java as a case study for applying deep learning to estimate family counts at the village level. Sentinel-2 and SPOT-6/7 data are used to model family counts. Using XGBoost, we regress the family count on the softmax probabilities produced by deep-learning family-density classifiers (DenseNet121 and ResNet50). With an R² of 0.93 and a MAPE of 19%, the regression model predicts the number of families in the census well. Regarding the input data, Sentinel-2 imagery is sufficient for this task, because its modeling results do not differ significantly from those obtained with higher-resolution SPOT-6/7 images. Using segments of the estimation area as the input level, together with structured auxiliary variables, also gives better predictions.
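The second stage passes class probabilities, not hard labels, to the regressor. The NumPy sketch below covers the pieces around that regressor: a row-wise softmax that turns classifier logits into probabilities, and the two reported metrics, R² and MAPE. The logits, the number of density classes, and the family counts are made-up values for illustration. The XGBoost regressor itself is not shown.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax: converts classifier logits into class probabilities."""
    z = logits - logits.max(axis=1, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def mape(y_true, y_pred):
    """Mean absolute percentage error in percent (y_true must be nonzero)."""
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))

# Hypothetical CNN logits for 3 villages and 4 family-density classes.
logits = np.array([[2.0, 0.5, -1.0, -2.0],
                   [0.1, 1.5, 0.3, -0.5],
                   [-1.0, 0.0, 1.0, 2.5]])
features = softmax(logits)   # each row sums to 1; these rows would be the regressor's inputs

y_true = np.array([100.0, 200.0, 400.0])   # hypothetical census family counts
y_pred = np.array([110.0, 180.0, 400.0])   # hypothetical regression predictions
print(round(mape(y_true, y_pred), 2))      # → 6.67
print(round(r2_score(y_true, y_pred), 3))
```

Using the full probability vector keeps information about how confident the classifier is. A single predicted density class would discard it.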

N-Level Structural Equation Models (nSEM): The Effect of Sample Size on the Parameter Estimation in Latent Random-Intercept Model
Eminita, Viarti; Saefuddin, Asep; Sadik, Kusman; Syafitri, Utami Dyah
InPrime: Indonesian Journal of Pure and Applied Mathematics Vol 6, No 1 (2024)
Publisher : Department of Mathematics, Faculty of Sciences and Technology, UIN Syarif Hidayatullah

DOI: 10.15408/inprime.v6i1.38914

Abstract

Multilevel Structural Equation Modeling (MSEM) is claimed to handle hierarchical data structures and latent response variables, but it becomes unstable as the number of levels increases. N-Level SEM (nSEM) is an SEM framework designed to handle a growing number of levels in the model. The nSEM framework uses Maximum Likelihood Estimation (MLE) for parameter estimation, which requires a large sample size and correct model specification. It is therefore essential to determine the minimal sample size needed for accurate and efficient parameter estimation in nSEM models. This study examined how sample size affects the performance of parameter estimators in nSEM models. We propose a method to evaluate how the number of environments affects the model's estimates of factor loadings and environmental variance. We also assess the impact of environment size on the estimates of factor loadings and individual variance. The results were then applied to actual data on students' mathematics learning motivation in Depok. The findings show that neither the number of environments nor the size of the environments affects the performance of fixed-parameter estimation in the nSEM model. nSEM performs excellently in estimating the level-2 environmental variance as the number of environments increases. Conversely, increasing environment size worsens the estimation of individual variance parameters. Overall, the nSEM framework for the latent random-intercept (LatenRI) model performs well with increasing sample sizes. Applying the LatenRI model to the empirical data gives almost the same estimation results.
Keywords: Hierarchical data; Latent random intercept model; Multilevel structural equation modeling; n-Level structural equation modeling.
MSC2020: 62D99
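The simulation logic behind the findings on the number of environments can be sketched on a simpler model. The example below is a hypothetical stand-in, not nSEM. It simulates a balanced two-level random-intercept model, y_ij = μ + u_j + e_ij. It then estimates the level-2 (environment) variance with the classical one-way ANOVA estimator instead of the MLE used by nSEM. The variance values, the environment size `m = 20`, and the function names are assumptions.

```python
import numpy as np

def anova_variance_components(y):
    """y has shape (J, m): J environments (level 2), each with m individuals (level 1).
    Returns (sigma_u2_hat, sigma_e2_hat) from the balanced one-way ANOVA estimator."""
    J, m = y.shape
    group_means = y.mean(axis=1)
    grand_mean = y.mean()
    msb = m * np.sum((group_means - grand_mean) ** 2) / (J - 1)      # between-environment mean square
    msw = np.sum((y - group_means[:, None]) ** 2) / (J * (m - 1))    # within-environment mean square
    return (msb - msw) / m, msw

def rmse_level2_variance(J, m, sigma_u2=0.5, sigma_e2=1.0, reps=500, seed=0):
    """Monte Carlo RMSE of the level-2 variance estimate for J environments of size m."""
    rng = np.random.default_rng(seed)
    errors = []
    for _ in range(reps):
        u = rng.normal(0.0, np.sqrt(sigma_u2), size=(J, 1))   # environment random intercepts
        e = rng.normal(0.0, np.sqrt(sigma_e2), size=(J, m))   # individual-level errors
        s_u2, _ = anova_variance_components(u + e)
        errors.append(s_u2 - sigma_u2)
    return float(np.sqrt(np.mean(np.square(errors))))

for J in (10, 50, 200):
    print(J, round(rmse_level2_variance(J, m=20), 4))
```

In this toy setting, the RMSE of the level-2 variance shrinks as the number of environments `J` grows. This matches the direction the abstract reports for nSEM, but it is not a reproduction of that study.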

Loan Approval Classification Using Ensemble Learning on Imbalanced Data
Anadra, Rahmi; Sadik, Kusman; Soleh, Agus M; Astari, Reka Agustia
Enthusiastic : International Journal of Applied Statistics and Data Science Volume 4 Issue 2, October 2024
Publisher : Universitas Islam Indonesia

DOI: 10.20885/enthusiastic.vol4.iss2.art1

Abstract

Loan processing is an important part of the financial industry, where the right decision must be made to approve or reject each loan. However, default by loan applicants has become a significant concern for financial institutions. This study therefore applies ensemble learning with the random forest and Extreme Gradient Boosting (XGBoost) algorithms. Imbalanced data are handled with the Synthetic Minority Over-sampling Technique (SMOTE). The research aimed to improve accuracy and precision in credit risk assessment and to reduce human workload. Both algorithms used a dataset of 4,296 records with 13 variables relevant to loan approval decisions. The research process involved data exploration, data preprocessing, data splitting, model training, model evaluation with accuracy, sensitivity, specificity, and F1-score, model selection with 10-fold cross-validation, and identification of important variables. XGBoost with imbalanced-data handling had the highest accuracy, 98.52%, with a good balance of sensitivity (98.83%), specificity (98.01%), and F1-score (98.81%). The most important variables in determining loan approval are credit score, loan term, loan amount, and annual income.
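SMOTE creates new minority-class rows by interpolating between existing ones rather than duplicating them. The NumPy sketch below is a minimal illustration of that idea. The function `smote` and the toy data are illustrative assumptions. In practice, the usual choice is the `SMOTE` class from the imbalanced-learn package (`imblearn.over_sampling`).

```python
import numpy as np

def smote(X_min, n_synthetic, k=5, seed=0):
    """Minimal SMOTE sketch: each synthetic row lies on the segment between a
    minority row and one of its k nearest minority-class neighbours."""
    rng = np.random.default_rng(seed)
    n = len(X_min)
    k = min(k, n - 1)
    # Pairwise Euclidean distances among minority rows.
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)                      # a row is not its own neighbour
    neighbours = np.argsort(d, axis=1)[:, :k]        # k nearest neighbours of each row
    base = rng.integers(0, n, size=n_synthetic)      # rows to start from
    nb = neighbours[base, rng.integers(0, k, size=n_synthetic)]
    gap = rng.random((n_synthetic, 1))               # interpolation weight in [0, 1)
    return X_min[base] + gap * (X_min[nb] - X_min[base])

# Hypothetical minority class with 20 rows and 2 features.
rng = np.random.default_rng(42)
X_min = rng.normal(loc=3.0, size=(20, 2))
synthetic = smote(X_min, n_synthetic=30)
print(synthetic.shape)   # → (30, 2)
```

Oversampling is normally applied only to the training split, so that synthetic rows do not leak into the evaluation data.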

Classification Performance of Stacking Ensemble with Meta-Model of Categorical Principal Component Logistic Regression on Food Insecurity Data
Pangestika, Dhita Elsha; Fitrianto, Anwar; Sadik, Kusman
Scientific Journal of Informatics Vol. 11 No. 4: November 2024
Publisher : Universitas Negeri Semarang

DOI: 10.15294/sji.v11i4.15315

Abstract

Purpose: Stacking is a type of ensemble whose base models use different algorithms. The classification results of its base models are categorical and tend to be associated with each other. These results then become the input for the stacking meta-model. However, there are currently no definite rules for choosing the classifier that serves as the meta-model in stacking. On the other hand, recent research has found that categorical principal component logistic regression (CATPCA-LR) works well on categorical predictor variables that are associated with each other. This study therefore focuses on the classification performance of the stacking algorithm with the CATPCA-LR meta-model. Methods: The study compared the classification performance of stacking with the CATPCA-LR meta-model against stacking with other meta-models (random forest, gradient boost, and logistic regression) and against its base models (random forest, gradient boost, extreme gradient boost, extra trees, and light gradient boost). The study used food insecurity data from March 2022. Result: The stacking algorithm with the CATPCA-LR meta-model performs better on the food insecurity data in terms of sensitivity, balanced accuracy, F1-Score, and G-Means. This model achieves a sensitivity of 46.28%, a balanced accuracy of 59.82%, an F1-Score of 37.82%, and a G-Means of 58.26%. In terms of specificity, however, the light gradient boost (LGB) algorithm gives the highest value of all the algorithms, at 88.40%. Overall, stacking with the CATPCA-LR meta-model gives the best performance of the compared algorithms on the food insecurity data. Novelty: This research explores the classification performance of stacking with CATPCA-LR as the meta-model.
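The food-insecurity classes are imbalanced, so the abstract reports sensitivity, specificity, balanced accuracy, F1-Score, and G-Means instead of plain accuracy. The sketch below computes these metrics from a binary confusion matrix. It assumes that label 1 marks the positive (food-insecure) class. The function name and the toy labels are hypothetical.

```python
import numpy as np

def imbalance_metrics(y_true, y_pred, positive=1):
    """Binary classification metrics suited to imbalanced data, from the confusion matrix."""
    y_true = np.asarray(y_true) == positive
    y_pred = np.asarray(y_pred) == positive
    tp = np.sum(y_true & y_pred)
    fn = np.sum(y_true & ~y_pred)
    tn = np.sum(~y_true & ~y_pred)
    fp = np.sum(~y_true & y_pred)
    sensitivity = tp / (tp + fn)          # recall on the positive class
    specificity = tn / (tn + fp)          # recall on the negative class
    precision = tp / (tp + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
        "g_mean": np.sqrt(sensitivity * specificity),   # geometric mean of the two recalls
    }

# Toy example: 50 positives (40 detected) and 100 negatives (20 false alarms).
y_true = np.array([1] * 50 + [0] * 100)
y_pred = np.array([1] * 40 + [0] * 10 + [1] * 20 + [0] * 80)
m = imbalance_metrics(y_true, y_pred)
print({k: round(float(v), 4) for k, v in m.items()})
```

A classifier that always predicts the majority class would still score 66.7% accuracy on this toy data. Its sensitivity, balanced accuracy, and G-Means would expose the failure.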

APPLICATION OF MULTILEVEL ORDINAL LOGISTIC REGRESSION ANALYSIS WITH BAYESIAN ESTIMATION IN MODELING THE WELFARE LEVEL OF P3KE DATA
Hermawati, Neni; Susetyo, Budi; Sadik, Kusman
Jurnal Lebesgue : Jurnal Ilmiah Pendidikan Matematika, Matematika dan Statistika Vol. 6 No. 1 (2025): Jurnal Lebesgue : Jurnal Ilmiah Pendidikan Matematika, Matematika dan Statistika
Publisher : LPPM Universitas Bina Bangsa

DOI: 10.46306/lb.v6i1.918

Abstract

Statistics must keep developing with the times because the characteristics of data in the field are increasingly diverse. As more types of data emerge, more statistical analysis methods are developed, including methods for hierarchically structured data. P3KE data are new data that complement the DTKS (Integrated Social Welfare Data), which the government uses as the basis for distributing social assistance. P3KE data serve as a reference for determining the beneficiary families (KPM) of the Village Fund direct cash assistance (BLT DD). Wanasari village customarily selects its BLT DD beneficiaries through hamlet-level deliberations (Musdus). In Wanasari Village, the beneficiary candidates from the Musdus are often inconsistent with the P3KE data from BKKBN provided through the Cianjur District government. It is therefore necessary to analyze which components significantly affect the welfare decile in the Wanasari Village P3KE data. The data are considered hierarchically structured with an ordinal response variable, so multilevel ordinal logistic regression with Bayesian parameter estimation is used to obtain the best model. Normal(0, 10) and Cauchy(0, 2.5) priors were compared to find the best model. The results show that the Wanasari Village P3KE data are hierarchical, because the two-level logistic regression model performs better than the one-level model. The study also concludes that Bayesian parameter estimation is better with the Cauchy(0, 2.5) prior, both for the β coefficients and for the between-neighborhood variance. The best model yields a between-neighborhood variance of 1.07 and an accuracy of 63.23%. Predictor variables with a significant effect include civil registration equivalents, having money/jewelry/livestock/etc. saved, wall type, cooking fuel, drinking water source, stunting risk, and number of households.
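A two-level cumulative-logit (proportional-odds) model is a common form of multilevel ordinal logistic regression. In this model, P(Y_ij ≤ k) = logistic(θ_k − x_ij·β − u_j). Here u_j is the random intercept for group j (here, a neighborhood), and its variance is the between-neighborhood variance. The sketch below turns that formula into category probabilities. It illustrates only the likelihood's building block and is not the authors' Bayesian fitting code. The thresholds, coefficients, and random intercepts are made-up values. In a Bayesian fit, priors such as the Normal(0, 10) and Cauchy(0, 2.5) compared above would be placed on β. That step is not shown.

```python
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

def ordinal_probs(x, beta, thresholds, u_j=0.0):
    """Category probabilities for a cumulative-logit model with a random intercept u_j.
    thresholds must be increasing; K thresholds give K + 1 ordered categories."""
    eta = x @ beta + u_j                                 # linear predictor incl. group effect
    cum = logistic(np.asarray(thresholds) - eta)         # P(Y <= k) for k = 1..K
    cum = np.concatenate(([0.0], cum, [1.0]))
    return np.diff(cum)                                  # P(Y = k) for k = 1..K+1

x = np.array([1.0, 0.0, 2.0])            # hypothetical household covariates
beta = np.array([0.5, -0.3, 0.2])        # hypothetical coefficients
thresholds = np.linspace(-4.0, 4.0, 9)   # 9 cut points give 10 ordered deciles
for u in (-1.0, 0.0, 1.0):               # three hypothetical neighborhood intercepts
    p = ordinal_probs(x, beta, thresholds, u)
    print(u, np.round(p, 3), "most likely decile:", int(np.argmax(p)) + 1)
```

A larger random intercept shifts probability toward higher deciles for every household in that neighborhood. This is the group-level variation that a one-level model cannot represent.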

Comparison of Ensemble Learning Methods in Classifying Unbalanced Data on the Bank Marketing Dataset
Hasnataeni, Yunia; Sadik, Kusman; Soleh, Agus M; Astari, Reka Agustia
Inferensi Vol 8, No 1 (2025)
Publisher : Department of Statistics ITS

DOI: 10.12962/j27213862.v8i1.20569

Abstract

The banking industry is growing rapidly, particularly in its use of telemarketing strategies to increase product and service sales. Despite their widespread use, these strategies need higher success rates, and the resulting data are imbalanced: fewer customers accept offers than reject them. This study evaluates machine learning algorithms, including Random Forest, Gradient Boosting, Extra Trees, and AdaBoost, both without and with imbalanced-data handling using the Random Over-Sampling Examples (ROSE) method. The evaluation covers accuracy, precision, recall, F1-score, and the area under the ROC curve (AUC). Random Forest and AdaBoost consistently perform well, and Random Forest keeps a high accuracy of 91.00% after imbalanced-data handling. Gradient Boosting and Extra Trees improve in precision after oversampling. All models have high AUC values, close to 0.94, which shows excellent discrimination between the positive and negative classes. The study concludes that addressing data imbalance improves model performance and makes these models suitable for effective telemarketing strategies in the banking sector.
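ROSE balances the classes with a smoothed bootstrap. It resamples rows and adds kernel noise, so each new row lies near an existing one without duplicating it. The NumPy sketch below is a simplified version. It assumes a diagonal Gaussian kernel with a normal-reference bandwidth. The reference implementation is the `ROSE()` function of the R package ROSE. The function `rose_balance` and the toy data are hypothetical.

```python
import numpy as np

def rose_balance(X, y, n_total=None, seed=0):
    """Simplified ROSE-style smoothed bootstrap (assumption: diagonal Gaussian kernel,
    normal-reference bandwidth). Each new row picks a class with equal probability,
    draws a random row of that class, and adds Gaussian noise around it."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    n_total = n if n_total is None else n_total
    classes = np.unique(y)
    y_new = rng.choice(classes, size=n_total)           # classes drawn with equal probability
    X_new = np.empty((n_total, d))
    for c in classes:
        Xc = X[y == c]
        idx = np.flatnonzero(y_new == c)
        nc = len(Xc)
        h = (4.0 / ((d + 2) * nc)) ** (1.0 / (d + 4))    # normal-reference bandwidth factor
        scale = h * Xc.std(axis=0, ddof=1)               # per-feature kernel width
        picks = Xc[rng.integers(0, nc, size=len(idx))]
        X_new[idx] = picks + rng.normal(size=(len(idx), d)) * scale
    return X_new, y_new

# Hypothetical imbalanced data: 950 rejections (0) and 50 acceptances (1).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, size=(950, 2)), rng.normal(2.0, 1.0, size=(50, 2))])
y = np.array([0] * 950 + [1] * 50)
X_bal, y_bal = rose_balance(X, y, n_total=2000)
print(np.bincount(y_bal))   # roughly 1000 rows per class
```

Unlike SMOTE, which interpolates between neighbours, this approach draws new points from a kernel density estimate of each class. As with any oversampling, it belongs on the training split only.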

Analyzing multilevel model of educational data: Teachers’ ability effect on students’ mathematical learning motivation
Eminita, Viarti; Saefuddin, Asep; Sadik, Kusman; Syafitri, Utami Dyah
Journal on Mathematics Education Vol. 15 No. 2 (2024): Journal on Mathematics Education
Publisher : Universitas Sriwijaya in collaboration with Indonesian Mathematical Society (IndoMS)

DOI: 10.22342/jme.v15i2.pp431-450

Abstract

Students' motivation to learn mathematics has decreased because teachers are unable to implement innovative learning models and techniques. This study therefore aimed to investigate the effect of teachers' ability on students' motivation to learn mathematics, using quantitative methods and a survey approach. The respondents, 32 mathematics teachers and 542 students from 24 schools in the Depok region, were selected through stratified random sampling. Two questionnaires, one on teachers' competence and one on students' learning motivation, served as the research instruments and were distributed to the respondents. The data analysis tested the random effect of teachers' ability on students' motivation to learn mathematics, with teachers' random intercepts as Model 1 and teachers' competence as Model 2. Both models were analyzed using the n-level Structural Equation Model (nSEM), and Model 2 was the better model for investigating the random effect of teachers' ability on students' learning motivation. The variance in teachers' ability (0.0027) was smaller than the variance in learning motivation among students (0.0597). These findings indicate that the motivation levels of students taught by the same teacher varied significantly, whereas the teacher effects were relatively homogeneous. In other words, teachers were roughly equally able to increase students' learning motivation. Based on these findings, this research suggests that teachers keep improving their teaching techniques, so that students are well motivated to learn and the learning objectives are achieved.
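One way to read the two reported variances is through the intraclass correlation. This reading assumes that 0.0027 is the level-2 (teacher) variance and 0.0597 is the level-1 (student) variance. Under that assumption, the share of variation in motivation attributable to teachers would be:

```latex
\rho = \frac{\sigma^2_{\text{teacher}}}{\sigma^2_{\text{teacher}} + \sigma^2_{\text{student}}}
     = \frac{0.0027}{0.0027 + 0.0597}
     = \frac{0.0027}{0.0624}
     \approx 0.043
```

So roughly 4% of the variation would lie between teachers. This is consistent with the conclusion that teacher effects are relatively homogeneous.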

Simulation and Empirical Studies of Long Short-Term Memory Performance to Deal with Limited Data
Khikmah, Khusnia Nurul; Sadik, Kusman; Notodiputro, Khairil Anwar
JOIN (Jurnal Online Informatika) Vol 10 No 1 (2025)
Publisher : Department of Informatics, UIN Sunan Gunung Djati Bandung

DOI: 10.15575/join.v10i1.1356

Abstract

This research evaluates the performance of time-series machine learning in the presence of noise, with the aim of forecasting time-series data. The chosen method is long short-term memory (LSTM), a development of the recurrent neural network (RNN). Another problem is data availability: difficulties arise not only with high-dimensional data but also with limited data. This study therefore tests LSTM performance on simulated data generated from the functional autoregressive (FAR) model and from the first-order functional autoregressive model, FAR(1), with added noise. Simulation results show that, on time-series data with noise, the LSTM method outperforms by 1-5% the method without noise and data with limited observations. The best-performing method is identified by an analysis of variance on the mean absolute percentage error (MAPE). The empirical data are the percentages of poverty, unemployment, and economic growth in Java. For each poverty dataset, the best-performing method is then used to forecast the data. On the empirical data, the M-LSTM method outperforms LSTM in analyzing the poverty percentage data. The best method is selected by its average MAPE, which ranges from 1% to 10%.
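The simulation design can be sketched in three steps. First, generate curves from a FAR(1) process and optionally add measurement noise. Then reduce them to a scalar series. Finally, cut that series into fixed-length windows, the (samples, time steps, features) layout that LSTM layers such as Keras's `LSTM` expect. All of the following are assumptions for illustration: the Gaussian kernel scaled to a Hilbert-Schmidt norm of 0.5, the Brownian-motion innovations, the grid size, the noise level, and reducing each curve to its mean. The function names are hypothetical, and the LSTM itself is not shown.

```python
import numpy as np

def simulate_far1(n_curves, grid_size=50, kernel_norm=0.5, noise_sd=0.0, seed=0):
    """Simulate X_t(s) = integral of psi(s, u) X_{t-1}(u) du + eps_t(s) on [0, 1].
    Assumptions: psi(s, u) = c * exp(-(s^2 + u^2) / 2) scaled to Hilbert-Schmidt norm
    kernel_norm, Brownian-motion innovations, optional i.i.d. measurement noise."""
    rng = np.random.default_rng(seed)
    s = np.linspace(0.0, 1.0, grid_size)
    K = np.exp(-(s[:, None] ** 2 + s[None, :] ** 2) / 2)
    hs = np.sqrt(np.sum(K ** 2) / grid_size ** 2)     # Riemann approximation of ||psi||_HS
    K *= kernel_norm / hs
    X = np.zeros((n_curves, grid_size))
    for t in range(1, n_curves):
        eps = np.cumsum(rng.normal(scale=np.sqrt(1.0 / grid_size), size=grid_size))
        X[t] = K @ X[t - 1] / grid_size + eps         # Riemann approximation of the integral
    return X + rng.normal(scale=noise_sd, size=X.shape)

def make_windows(series, lookback):
    """Turn a 1-D series into (samples, lookback, 1) inputs and next-step targets."""
    Xw = np.stack([series[i:i + lookback] for i in range(len(series) - lookback)])
    return Xw[..., None], series[lookback:]

curves = simulate_far1(60, noise_sd=0.1)   # 60 noisy curves on a 50-point grid
series = curves.mean(axis=1)               # illustrative scalar summary of each curve
Xw, yw = make_windows(series, lookback=5)
print(Xw.shape, yw.shape)                  # → (55, 5, 1) (55,)
```

Comparing `noise_sd=0.0` with a positive value, or shortening `n_curves`, gives the noisy and limited-data scenarios that the study contrasts.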

Machine Learning-Based SEO Page Classification Using Mutual Information and Random Forest Feature Importance
NURADILLA, SITI; SADIK, KUSMAN; SUHAENI, CICI; SOLEH, AGUS M
MIND (Multimedia Artificial Intelligent Networking Database) Journal Vol 10, No 1 (2025): MIND Journal
Publisher : Institut Teknologi Nasional Bandung

DOI: 10.26760/mindjournal.v10i1.114-129

Abstract

The SEO optimization process involves many interrelated factors, which makes it hard for SEO teams to identify which pages need further improvement. This study proposes a machine learning-based model that classifies web pages accurately and selects the most informative features efficiently. Feature selection uses Mutual Information (MI) and Random Forest Feature Importance (RFFI) to identify the key factors for SEO optimization, followed by modeling with Random Forest and a Weighted Voting Ensemble (WVE). The models are evaluated using Accuracy, Precision, Recall, and ROC AUC. The Random Forest model with 20 features selected via RFFI performs best, with a ROC AUC of 75.87%, Accuracy of 77.74%, Precision of 60.51%, and Recall of 71.29%. The model effectively distinguishes pages that require SEO optimization from those that do not.
Keywords: Feature Importance, Random Forest, SEO, Variable Selection, WVE
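Mutual-information screening scores each feature by how much knowing it reduces uncertainty about the page label. The NumPy sketch below handles discrete features. The names `mutual_information` and `top_k_by_mi` and the toy data are hypothetical. For continuous features, scikit-learn's `mutual_info_classif` uses a nearest-neighbor estimator instead. An RFFI-style ranking would come from a fitted `RandomForestClassifier`'s `feature_importances_` attribute.

```python
import numpy as np

def mutual_information(x, y):
    """Mutual information (in nats) between two discrete variables."""
    x_vals, x_idx = np.unique(x, return_inverse=True)
    y_vals, y_idx = np.unique(y, return_inverse=True)
    joint = np.zeros((len(x_vals), len(y_vals)))
    np.add.at(joint, (x_idx, y_idx), 1)          # contingency table of counts
    joint /= joint.sum()                         # joint probabilities p(x, y)
    px = joint.sum(axis=1, keepdims=True)        # marginal p(x), column vector
    py = joint.sum(axis=0, keepdims=True)        # marginal p(y), row vector
    nz = joint > 0                               # treat 0 * log 0 as 0
    return float(np.sum(joint[nz] * np.log(joint[nz] / (px @ py)[nz])))

def top_k_by_mi(X, y, k):
    """Rank the columns of a discrete feature matrix by MI with y and keep the top k."""
    scores = np.array([mutual_information(X[:, j], y) for j in range(X.shape[1])])
    order = np.argsort(scores)[::-1]
    return order[:k], scores

# Hypothetical page features: pure noise, a noisy copy of the label, and a constant.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=500)                 # 1 = page needs SEO optimization
flip = rng.random(500) < 0.1
informative = np.where(flip, 1 - y, y)           # agrees with the label about 90% of the time
noise = rng.integers(0, 3, size=500)
constant = np.zeros(500, dtype=int)
X = np.column_stack([noise, informative, constant])
selected, scores = top_k_by_mi(X, y, k=1)
print(selected, np.round(scores, 3))             # the informative column (index 1) ranks first
```

MI captures any form of dependence between a feature and the label, not just linear correlation. This makes it a reasonable first filter before a model-based ranking such as RFFI.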

Co-Authors: Erfiani; Indahwati; A.Tuti Rumiati; Aam Alamudi; Abdullah, Adib Roisilmi; Achmad Fauzan; Agus Mohamad Soleh; Ahmad Rifai Nasution; Aji Hamim Wigena; Akbar Rizki; Akmala Firdausi; Alfiryal, Naufalia; Amalia, Rahmatin Nur; Anadra, Rahmi; Ananda Shafira; Anang Kurnia; Andespa, Reyuli; Andi Okta Fengki; ASEP SAEFUDDIN; Astari, Reka Agustia; Aulya Permatasari; Azka Ubaidillah; Bagus Sartono; Budi Susetyo; Cici Suhaeni; Dian Handayani; Dito, Gerry Alfa; Dwi Agustin Nuriani Sirodj; Efriwati Efriwati; Embay Rohaeti; Eminita, Viarti; EVITA PURNANINGRUM; Fahira, Fani; FARDILLA RAHMAWATI; Farit Mochamad Afendi; Fitrianto, Anwar; Freya, Wa Ode Rona; Gerry Alfa Dito; Haikal, Husnul Aris; Hari Wijayanto; Hasnataeni, Yunia; Hazan Azhari Zainuddin; Hermawati, Neni; I Gusti Ngurah, Sentana Putra; I Made Sumertajaya; I Wayan Mangku; Indahwati Indahwati; Intan Arassah, Fradha; Iqbal, Teuku Achmad; Isnanda, Eriski; Kamila, Sabrina Adnin; Khairi A N; Khairil Anwar Notodiputro; Khikmah, Khusnia Nurul; Khusnul Khotimah; Kusni Rohani Rumahorbo; Latifah, Leli; Lili Puspita Rahayu; Logananta Puja Kusuma; M Soleh, Agus; Mochamad Ridwan; Mochamad Ridwan, Mochamad; Mohammad Masjkur; Muh Nur Fiqri Adham; Muhammad Yusran; Mulianto Raharjo; Naima Rakhsyanda; Nisrina Az-Zahra, Putri; Nur Khamidah; NURADILLA, SITI; Nusar Hajarisman; Pangestika, Dhita Elsha; Parwati Sofan, Parwati; Purnama Sari; Rakhsyanda, Naima; Rifqi Aulya Rahman; Rita Rahmawati; Rizaldi Boer; Rizki, Akbar; Rizqi, Tasya Anisah; ROCHYATI ROCHYATI; Rumahorbo, Kusni Rohani; Sahamony, Nur Fitriyani; Saleh, Agus Muhammad; Satriyo Wibowo; Sentana Putra, I Gusti Ngurah; Siregar, Jodi jhouranda; Siti Aisyah; Siti Raudlah; Sitti Nurhaliza; Soleh, Agus M; Suhaeni, Cici; Sundari, Marta; Supriatin, Febriyani Eka; Tendi Ferdian Diputra; Titin Suhartini; Titin Suhartini, Titin; Tri Wahyuni; Uswatun Hasanah; Utami Dyah Syafitri; Viarti Eminita; Widhiyanti Nugraheni; Yenni Angraini; Yenni Kurniawati; Yuli Eka Putri; Zafira Fakhriyah