ENSEMBLE-BASED LOGISTIC REGRESSION ON HIGH-DIMENSIONAL DATA: A SIMULATION STUDY
Widhianingsih, Tintrim Dwi Ary; Kuswanto, Heri; Prastyo, Dedy Dwi
MEDIA STATISTIKA Vol 17, No 1 (2024): Media Statistika
Publisher : Department of Statistics, Faculty of Science and Mathematics, Universitas Diponegoro

DOI: 10.14710/medstat.17.1.13-24

Abstract

Dramatic growth in computing power has ushered in the big data era, in which data sizes are escalating across many fields. Besides huge sample sizes, there are high-dimensional datasets with more features than samples. High computing power encourages the use of modern approaches for such datasets, yet in practice ordinary logistic regression is still applied because of its simplicity and explainability. Applying logistic regression to high-dimensional data raises multicollinearity, overfitting, and computational complexity issues. Logistic Regression Ensemble (Lorens) and Ensemble Logistic Regression (ELR) are logistic-regression-based alternatives proposed to solve these problems. Lorens adopts the ensemble concept, using mutually exclusive feature partitions to form several subsets of the data, while ELR builds feature selection into the algorithm by drawing a subset of features based on a probability ranking value. This paper examines the effectiveness of Lorens and ELR for high-dimensional data classification through a simulation study under three scenarios: varying feature size, imbalanced high-dimensional data, and multicollinearity. The simulation study shows that ELR outperforms Lorens and performs more stably across feature sizes and imbalanced data settings. Lorens, on the other hand, performs more reliably than ELR in the simulation with multicollinearity.
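
The partition idea described for Lorens can be illustrated with a minimal sketch. It is not the authors' implementation: the gradient-descent fit, the averaging of predicted probabilities across sub-models, the function names, and the synthetic data are all assumptions made for illustration.

```python
import math
import random

def partition_features(n_features, n_parts, seed=0):
    """Randomly split feature indices into mutually exclusive subsets."""
    idx = list(range(n_features))
    random.Random(seed).shuffle(idx)
    return [idx[i::n_parts] for i in range(n_parts)]

def sigmoid(z):
    # Clamp the score so math.exp cannot overflow.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(w, row):
    # w[0] is the intercept, w[1:] are the feature weights.
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], row)))

def fit_logistic(X, y, lr=0.1, epochs=200):
    """Logistic regression by plain batch gradient descent (assumed fitter)."""
    w = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for row, target in zip(X, y):
            err = predict_proba(w, row) - target
            grad[0] += err
            for j, xj in enumerate(row):
                grad[j + 1] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w

def fit_partition_ensemble(X, y, n_parts, seed=0):
    """Fit one small logistic model per disjoint feature subset."""
    models = []
    for cols in partition_features(len(X[0]), n_parts, seed):
        X_sub = [[row[c] for c in cols] for row in X]
        models.append((cols, fit_logistic(X_sub, y)))
    return models

def ensemble_proba(models, row):
    # Assumption: combine sub-models by averaging their probabilities.
    probs = [predict_proba(w, [row[c] for c in cols]) for cols, w in models]
    return sum(probs) / len(probs)

# Synthetic toy data: 20 samples, 40 features (more features than samples).
rng = random.Random(1)
y = [i % 2 for i in range(20)]
X = [[rng.gauss(1.0 if label else -1.0, 1.0) for _ in range(40)] for label in y]

models = fit_partition_ensemble(X, y, n_parts=8)
train_acc = sum((ensemble_proba(models, r) > 0.5) == bool(t)
                for r, t in zip(X, y)) / len(y)
```

Each sub-model sees only 5 of the 40 features, so no single fit has more parameters than samples, which is the motivation for partitioning in the high-dimensional setting.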

Per capita expenditure prediction using model stacking based on satellite imagery
Kuswanto, Heri; Rouhan, Asva Abadila; Qori’atunnadyah, Marita; Hia, Supriadi; Fithriasari, Kartika; Widhianingsih, Tintrim Dwi Ary
IAES International Journal of Artificial Intelligence (IJ-AI) Vol 14, No 2: April 2025
Publisher : Institute of Advanced Engineering and Science

DOI: 10.11591/ijai.v14.i2.pp1220-1231

Abstract

One of the indicators for measuring poverty is per capita expenditure. However, collecting timely and reliable per capita expenditure data is quite challenging and expensive, as it requires collecting detailed household data directly. One way to deal with this issue is to use satellite image data processed by machine learning methods. This research proposes a method to predict the per capita expenditure of regencies or cities in Indonesia based on satellite imagery using machine learning techniques, such as k-nearest neighbors (KNN), random forest (RF), extreme gradient boosting (XGBoost), and support vector machine (SVM). The predictions are stacked to predict per capita expenditure using least absolute shrinkage and selection operator (LASSO) regression as the meta-learner. The model is trained on Google-Earth-based satellite imagery of Java Island, Indonesia, which provides more up-to-date field conditions than data collected from Statistics Indonesia (BPS). The research found that the stacked model outperforms the individual methods. However, the R² criterion of the stacked method is comparable to that of RF, which is slightly higher than the others.
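
The stacking step, in which a LASSO meta-learner combines base-model predictions, can be sketched as follows. This is a rough illustration, not the paper's pipeline: the four columns of `Z` are synthetic stand-ins for held-out predictions from the four base learners, the coordinate-descent solver and the penalty value `lam` are assumptions, and in practice the meta-learner would be trained on out-of-fold predictions.

```python
import random

def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam (the LASSO proximal step)."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0

def lasso_cd(Z, y, lam, n_iter=200):
    """Minimise (1/2n)*||y - b0 - Zw||^2 + lam*||w||_1 by coordinate descent."""
    n, p = len(Z), len(Z[0])
    w = [0.0] * p
    b0 = sum(y) / n
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0
            zz = 0.0
            for i in range(n):
                # Prediction with feature j left out (partial residual).
                pred_wo_j = b0 + sum(w[k] * Z[i][k] for k in range(p) if k != j)
                rho += Z[i][j] * (y[i] - pred_wo_j)
                zz += Z[i][j] ** 2
            w[j] = soft_threshold(rho / n, lam) / (zz / n) if zz > 0 else 0.0
        # The intercept is not penalised; refit it exactly.
        b0 = sum(y[i] - sum(w[k] * Z[i][k] for k in range(p)) for i in range(n)) / n
    return b0, w

# Synthetic target and base-learner predictions (hypothetical error levels).
rng = random.Random(0)
y = [rng.uniform(8.0, 16.0) for _ in range(30)]
noise_sd = [1.5, 0.5, 1.0, 2.0]  # one assumed noise level per base learner
Z = [[t + rng.gauss(0.0, sd) for sd in noise_sd] for t in y]

b0, w = lasso_cd(Z, y, lam=0.05)
stacked = [b0 + sum(wk * zk for wk, zk in zip(w, row)) for row in Z]
```

The L1 penalty can set the weight of a redundant base learner to exactly zero, which is one reason a LASSO meta-learner may end up performing close to its best single base model.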

Predictive Analytics for Property Valuation Using Random Forest in Malang City
Noorihsan, Sandrian Yulian Firmansyah; Widhianingsih, Tintrim Dwi Ary; Kuswanto, Heri
MALCOM: Indonesian Journal of Machine Learning and Computer Science Vol. 6 No. 1 (2026): MALCOM January 2026
Publisher : Institut Riset dan Publikasi Indonesia

DOI: 10.57152/malcom.v6i1.2411

Abstract

The property market in Malang City continues to expand alongside rising housing demand, yet limited price transparency still constrains informed decision-making for buyers, sellers, and developers. This study develops a data-driven property price prediction model using the Random Forest algorithm, selected for its robustness and ability to capture complex nonlinear relationships. An initial dataset of 4,358 property listings was collected through web scraping from Rumah123.com; after thorough preprocessing (data cleaning, handling missing values, and feature refinement), 1,573 valid observations remained for analysis. The model incorporates key property characteristics, covering temporal variables (month, year), physical attributes (land area, building area, number of bedrooms and bathrooms, electricity capacity, number of floors), property characteristics (certificate type, property type, property condition, furniture condition, hook position), and price information. Using optimally tuned hyperparameters, the final Random Forest model achieved an R² of 76.66% and a MAPE of 25.27%, indicating strong predictive performance relative to standard regression benchmarks. These findings offer managerial implications by providing objective, data-driven price estimates that can support developers, agents, and prospective buyers in pricing decisions, marketing strategies, and fair value assessments during negotiations.
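
For readers unfamiliar with the two reported metrics, the sketch below shows how R² and MAPE are conventionally computed. The function names and the four price pairs are made up for illustration and are unrelated to the study's data.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (true values must be nonzero)."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical prices, in arbitrary units.
prices_true = [100.0, 200.0, 300.0, 400.0]
prices_pred = [110.0, 190.0, 330.0, 360.0]

r2 = r_squared(prices_true, prices_pred)   # 0.946
err = mape(prices_true, prices_pred)       # 8.75 (percent)
```

An R² reported as 76.66% corresponds to 0.7666 on the scale returned by `r_squared`. A MAPE of 25.27% means predictions miss the listed price by about a quarter on average.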