ENSEMBLE-BASED LOGISTIC REGRESSION ON HIGH-DIMENSIONAL DATA: A SIMULATION STUDY
Widhianingsih, Tintrim Dwi Ary; Kuswanto, Heri; Prastyo, Dedy Dwi
MEDIA STATISTIKA Vol 17, No 1 (2024): Media Statistika
Publisher : Department of Statistics, Faculty of Science and Mathematics, Universitas Diponegoro

DOI: 10.14710/medstat.17.1.13-24

Abstract

Rapid growth in computing power has ushered in the big data era, in which data sizes are escalating across many fields. Besides huge sample sizes, high-dimensional data also arise, in which the number of features exceeds the number of samples. High computing power encourages the use of modern approaches for this type of dataset, yet in practice the common logistic regression method is still applied because of its simplicity and explainability. Applying logistic regression to high-dimensional data gives rise to multicollinearity, overfitting, and computational complexity issues. Logistic Regression Ensemble (Lorens) and Ensemble Logistic Regression (ELR) are logistic-regression-based alternatives proposed to solve these problems. Lorens adopts the ensemble concept, using mutually exclusive feature partitions to form several subsets of the data, while ELR incorporates feature selection into the algorithm by drawing a subset of features based on a probability ranking value. This paper examines the effectiveness of Lorens and ELR for high-dimensional data classification through a simulation study under three scenarios: varying feature size, imbalanced high-dimensional data, and multicollinearity. The simulation study reveals that ELR outperforms Lorens and performs more stably across different feature sizes and imbalanced data settings. On the other hand, Lorens performs more reliably than ELR in the simulation with multicollinearity.
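
The partitioning idea attributed to Lorens in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the gradient-descent fitter, the small ridge penalty, the use of a single random partition, averaging of the per-block probabilities, and the 0.5 threshold are all assumptions made for illustration, and the data are synthetic.

```python
import numpy as np

def fit_logistic(X, y, lr=0.1, n_iter=500, l2=1e-3):
    """Fit logistic regression by batch gradient descent; returns (weights, bias).

    The small L2 penalty is an assumption added for numerical stability.
    """
    n, p = X.shape
    w = np.zeros(p)
    b = 0.0
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        err = prob - y
        w -= lr * (X.T @ err / n + l2 * w)
        b -= lr * err.mean()
    return w, b

def fit_partition_ensemble(X, y, n_parts, rng):
    """Split the columns into mutually exclusive blocks and fit one model per block."""
    cols = rng.permutation(X.shape[1])
    blocks = np.array_split(cols, n_parts)
    return [(block, *fit_logistic(X[:, block], y)) for block in blocks]

def predict_proba(models, X):
    """Average the per-block predicted probabilities (assumed combination rule)."""
    probs = [1.0 / (1.0 + np.exp(-(X[:, block] @ w + b))) for block, w, b in models]
    return np.mean(probs, axis=0)

# Synthetic high-dimensional data: more features (p) than samples (n).
rng = np.random.default_rng(0)
n, p = 60, 200
X = rng.normal(size=(n, p))
true_w = np.zeros(p)
true_w[:10] = 1.0
y = (X @ true_w + rng.normal(size=n) > 0).astype(float)

models = fit_partition_ensemble(X, y, n_parts=10, rng=rng)
proba = predict_proba(models, X)
labels = (proba >= 0.5).astype(int)  # 0.5 cut-off is an assumption
```

Because each block holds only 20 of the 200 features, every individual logistic regression is fitted with fewer features than samples, which is the point of partitioning in the high-dimensional setting.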

Per capita expenditure prediction using model stacking based on satellite imagery
Kuswanto, Heri; Rouhan, Asva Abadila; Qori’atunnadyah, Marita; Hia, Supriadi; Fithriasari, Kartika; Widhianingsih, Tintrim Dwi Ary
IAES International Journal of Artificial Intelligence (IJ-AI) Vol 14, No 2: April 2025
Publisher : Institute of Advanced Engineering and Science

DOI: 10.11591/ijai.v14.i2.pp1220-1231

Abstract

One of the indicators for measuring poverty is per capita expenditure. However, collecting timely and reliable per capita expenditure data is challenging and expensive, as it requires collecting detailed household data directly. One way to deal with this issue is to use satellite image data processed by machine learning methods. This research proposes a method to predict the per capita expenditure of regencies or cities in Indonesia based on satellite imagery using machine learning techniques: k-nearest neighbors (KNN), random forest (RF), extreme gradient boosting (XGBoost), and support vector machine (SVM). The predictions of these models are stacked to predict per capita expenditure, using least absolute shrinkage and selection operator (LASSO) regression as the meta-learner. The model is trained on Google-Earth-based satellite imagery of Java Island, Indonesia, which reflects more up-to-date field conditions than data collected from Statistics Indonesia (BPS). The research found that the stacked model outperforms the individual methods. However, the R² of the stacked model is comparable to that of RF, which is slightly higher than that of the other methods.
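
The stacking arrangement described in the abstract could look roughly like the sketch below, written with scikit-learn. It is an illustration under several assumptions, not the paper's pipeline: the feature matrix is random placeholder data rather than image-derived features, scikit-learn's `GradientBoostingRegressor` stands in for XGBoost to avoid an extra dependency, and the hyperparameters (including the LASSO `alpha`) are arbitrary.

```python
import numpy as np
from sklearn.ensemble import (
    GradientBoostingRegressor,
    RandomForestRegressor,
    StackingRegressor,
)
from sklearn.linear_model import Lasso
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Placeholder data: in the paper the inputs come from satellite imagery;
# here they are synthetic features with a synthetic target.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 8))
y = 2.0 * X[:, 0] + np.sin(X[:, 1]) + rng.normal(scale=0.1, size=120)

# Base learners named in the abstract; "gb" is a stand-in for XGBoost.
base_learners = [
    ("knn", make_pipeline(StandardScaler(), KNeighborsRegressor(n_neighbors=5))),
    ("rf", RandomForestRegressor(n_estimators=50, random_state=0)),
    ("gb", GradientBoostingRegressor(random_state=0)),
    ("svm", make_pipeline(StandardScaler(), SVR())),
]

# LASSO meta-learner fitted on out-of-fold predictions of the base learners.
stack = StackingRegressor(
    estimators=base_learners,
    final_estimator=Lasso(alpha=0.01),  # alpha chosen arbitrarily
    cv=5,
)
stack.fit(X[:90], y[:90])
pred = stack.predict(X[90:])

# One LASSO coefficient per base learner shows how the meta-learner weights them.
meta_weights = dict(zip([name for name, _ in base_learners], stack.final_estimator_.coef_))
```

Inspecting `meta_weights` is one way to see which base learner dominates the stack, which relates to the abstract's remark that the stacked model's R² is close to that of RF.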