Arianti, Reza
Unknown Affiliation

A Comparative Study of Generalized Linear Mixed Model and Mixed Effects Random Forest for Analyzing Data with Outliers
Arianti, Reza; Notodiputro, Khairil Anwar; Angraini, Yenni
Jurnal Teknik Informatika (Jutif) Vol. 7 No. 2 (2026): JUTIF Volume 7, Number 2, April 2026
Publisher : Informatika, Universitas Jenderal Soedirman

DOI: 10.52436/1.jutif.2026.7.2.5407

Abstract

This study compares the Mixed Effects Random Forest (MERF) and the Generalized Linear Mixed Model with a Negative Binomial distribution (GLMM-NB) for analyzing hierarchical data, focusing on the role of residual outliers and the application of winsorization. A two-stage analytical pipeline was implemented: (1) winsorization to reduce extreme residual values, and (2) model training using MERF and GLMM-NB. The dataset comes from the 2021 National Socio-Economic Survey (Susenas) in West Java Province and measures tobacco consumption intensity. Models were trained under two conditions: without winsorization (WIN0) and with two-sided 5% winsorization (WIN5). Winsorization thresholds were estimated on the training data and then applied to the test data. Model performance was assessed using Root Mean Squared Error (RMSE) and the train-test ratio. Under WIN0, GLMM-NB recorded an RMSE of 49.65 for training and 42.27 for testing, while MERF achieved 35.96 and 39.94, respectively. Under WIN5, GLMM-NB showed the larger error reduction, with RMSE falling to 34.90 (train) and 30.20 (test), while MERF dropped to 26.63 (train) and 28.64 (test). These results indicate that MERF provides higher predictive accuracy in both conditions, whereas GLMM-NB benefits more from winsorization. Household expenditure, employment status, age, and gender consistently emerged as key variables linked to tobacco consumption intensity. This study is the first to compare MERF and GLMM-NB with winsorization using Indonesia’s hierarchical data. The analytical framework helps inform public health policies aligned with SDG 3: Good Health and Well-being, particularly in reducing tobacco-related health risks.
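
The winsorization step described in the abstract, with thresholds taken from the training set and reused on the test set, might look roughly like the Python sketch below. It is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, "two-sided 5%" is assumed to mean clipping at the 5th and 95th percentiles, the data are synthetic, and the abstract does not say whether the clipping was applied to the response or to model residuals.

```python
import numpy as np


def winsor_limits(train_values, lower_pct=5.0, upper_pct=95.0):
    """Return (low, high) clipping thresholds estimated on training data only.

    Assumption: "two-sided 5%" means the 5th and 95th percentiles.
    """
    low, high = np.percentile(train_values, [lower_pct, upper_pct])
    return float(low), float(high)


def apply_winsor(values, limits):
    """Clip values to the given thresholds (values beyond them are pulled in)."""
    low, high = limits
    return np.clip(values, low, high)


def rmse(y_true, y_pred):
    """Root Mean Squared Error between observed and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Synthetic, overdispersed counts standing in for the survey response
# (hypothetical data, not the Susenas sample).
rng = np.random.default_rng(0)
y_train = rng.negative_binomial(n=2, p=0.05, size=1000).astype(float)
y_test = rng.negative_binomial(n=2, p=0.05, size=300).astype(float)

# Thresholds come from the training set and are reused unchanged on the
# test set, so no information from the test data leaks into preprocessing.
limits = winsor_limits(y_train)
y_train_w = apply_winsor(y_train, limits)
y_test_w = apply_winsor(y_test, limits)
```

Estimating the limits on the training split alone keeps the test split a fair stand-in for unseen data; `rmse` can then be evaluated on each split to compare the WIN0 and WIN5 conditions.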