Divo Dharma Silalahi
Data Science Department, Faculty Of Science And Informatics, Universitas Pertiba, Bangka Belitung

Published: 3 Documents

Robust Method with Cross-Validation in Partial Least Square Regression
Sibuea, Nuraini; Syamsudhuha, Syamsudhuha; Adnan, Arisman; Silalahi, Divo Dharma
Journal of Mathematics, Computations and Statistics Vol. 8 No. 1 (2025): Volume 08 Nomor 01 (April 2025)
Publisher : Jurusan Matematika FMIPA UNM

DOI: 10.35580/jmathcos.v8i1.4766

Abstract

Partial Least Squares Regression (PLSR) is a multivariate analysis technique for data in which the predictor variables are highly correlated or outnumber the samples. PLSR is not robust to outliers, which can disrupt the stability and accuracy of the model. Cross-validation is an important approach to improving model reliability, particularly for data that contain outliers. This study evaluates the effectiveness of K-fold cross-validation and nested cross-validation in a PLSR model using NIRS data from oil palm plantation soil that contain outliers. Outliers are first identified using RBF kernel PCA, after which K-fold cross-validation and nested cross-validation are applied to the PLSR model. The evaluation is based on the Root Mean Square Error (RMSE) and the Coefficient of Determination (R²). The results show that nested cross-validation performs better than K-fold cross-validation, yielding lower RMSE and higher R² both with and without outliers. K-fold cross-validation is more susceptible to overfitting, whereas nested cross-validation is more effective in mitigating the impact of outliers and improving model accuracy. The study concludes that nested cross-validation outperforms K-fold cross-validation in improving the prediction accuracy and stability of the PLSR model, especially for data containing outliers. It is recommended to use nested cross-
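The difference between the two validation schemes named in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the data are synthetic rather than NIRS spectra, and a one-predictor ridge regression (the hypothetical `fit_ridge`, with penalty `lam`) stands in for PLSR, with `lam` playing the role that the number of PLSR components would play as the tuned hyperparameter. All function names are illustrative assumptions. The point shown is structural: in nested cross-validation the hyperparameter is chosen by an inner K-fold loop that sees only the outer-training data, so the outer RMSE and R² are computed on samples that took no part in tuning.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def k_folds(indices, k, seed):
    """Shuffle the indices and deal them into k disjoint folds."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def fit_ridge(x, y, lam):
    """Stand-in model (NOT PLSR): one-predictor ridge regression with penalty lam."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / (sxx + lam)
    return slope, my - slope * mx


def predict(model, x):
    slope, intercept = model
    return [slope * a + intercept for a in x]


def cv_rmse(x, y, indices, lam, k, seed):
    """Plain K-fold estimate of RMSE for one fixed hyperparameter value."""
    errors = []
    for fold in k_folds(indices, k, seed):
        held_out = set(fold)
        train = [i for i in indices if i not in held_out]
        model = fit_ridge([x[i] for i in train], [y[i] for i in train], lam)
        errors.append(rmse([y[i] for i in fold], predict(model, [x[i] for i in fold])))
    return sum(errors) / len(errors)


def nested_cv(x, y, grid, outer_k=5, inner_k=3, seed=0):
    """Outer loop scores the model; inner loop tunes lam on outer-training data only."""
    all_idx = list(range(len(x)))
    scores = []
    for fold in k_folds(all_idx, outer_k, seed):
        held_out = set(fold)
        train = [i for i in all_idx if i not in held_out]
        best = min(grid, key=lambda lam: cv_rmse(x, y, train, lam, inner_k, seed + 1))
        model = fit_ridge([x[i] for i in train], [y[i] for i in train], best)
        truth = [y[i] for i in fold]
        pred = predict(model, [x[i] for i in fold])
        scores.append((best, rmse(truth, pred), r2(truth, pred)))
    return scores


# Synthetic data (an assumption, not the paper's data): y = 2x + 1 plus Gaussian noise.
rng = random.Random(42)
x = [i / 10 for i in range(60)]
y = [2 * a + 1 + rng.gauss(0, 0.5) for a in x]

results = nested_cv(x, y, grid=[0.0, 1.0, 10.0, 100.0])
for lam, fold_rmse, fold_r2 in results:
    print(f"lam={lam:<6} RMSE={fold_rmse:.3f} R2={fold_r2:.3f}")
```

Calling `cv_rmse` alone over the full index set and reporting the best score across the grid would correspond to plain K-fold tuning, where the same folds both select and score the hyperparameter; that reuse is one common explanation for the optimism the abstract attributes to K-fold cross-validation.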
Understanding Tourism Behaviour Through Exploratory Data Analysis with Machine Learning on Search Engine Data: Case Study in Bangka Belitung Islands, Indonesia
Divo Dharma Silalahi; Nidia Mindiyarti
International Journal of Artificial Intelligence Research Vol 8, No 1.1 (2024)
Publisher : STMIK Dharma Wacana

DOI: 10.29099/ijair.v8i1.1.1372

Abstract

Analyzing tourist behaviour through Google search data offers a dynamic, real-time approach to understanding travel preferences. This study employs Exploratory Data Analysis (EDA) alongside machine learning techniques such as hierarchical clustering, Principal Component Analysis (PCA), Strength Variables Index (SVI), heat map generation, and correlation matrix analysis to explore the key tourism drivers in Bangka Belitung. These drivers are categorized into demand-side factors (tourist preferences, curiosity, seasonality, and economic conditions) and supply-side factors (transportation, accommodation, activities, pricing, culinary tourism, and local attractions). The findings reveal that transportation and accommodation consistently emerge as the most influential drivers in both Bangka and Belitung, highlighting the importance of accessibility and lodging availability. Bangka emphasizes culinary experiences and price sensitivity, while Belitung is more influenced by economic conditions and seasonality. Peak tourism periods are identified during Chinese New Year in February, New Year, and the mid-year school holidays in June to July. In Belitung, culinary tourism and seasonal activities see increased interest during February and October, while Bangka shows steady interest in beach-related activities and culinary offerings throughout the year. Misalignment in supply-side factors, such as limited affordable accommodation or transportation options, can impact tourism performance during these periods. These insights offer practical recommendations for local governments, tourism boards, and businesses to refine marketing strategies, enhance tourist experiences, and optimize tourism infrastructure. Focusing on affordable travel and culinary experiences for Bangka, and on seasonal tourism and economic preferences for Belitung, can help maximize tourism potential and drive sustainable growth in the region.
STATISTICAL MODELING FOR DOWNSCALING USING PRINCIPAL COMPONENT REGRESSION AND DUMMY VARIABLES: A CASE OF SIAK DISTRICT
Adnan, Arisman; Alika, Elsa Riesta; Silalahi, Divo Dharma; Aulia, Felia Rizki; Erda, Gustriza
BAREKENG: Jurnal Ilmu Matematika dan Terapan Vol 20 No 2 (2026): BAREKENG: Journal of Mathematics and Its Application
Publisher : PATTIMURA UNIVERSITY

DOI: 10.30598/barekengvol20iss2pp1643-1658

Abstract

Indonesia, as a tropical country, has two primary seasons: the rainy season and the dry season. Meteorological shifts can considerably influence the agricultural sector, including oil palm cultivation. Consequently, the ability to predict rainfall has become a pivotal element in the broader effort to mitigate the adverse effects of climate change. This study employs statistical downscaling with Principal Component Regression (PCR) to model rainfall predictions. PCR addresses the multicollinearity that commonly occurs in Global Circulation Model (GCM) data, and it has been shown to stabilize the model structure and reduce the variance of the regression coefficients. The response variable is observed rainfall from LIBO Estate, owned by PT SMART Tbk (SMART Research Institute), for the period 2013 to 2022; the predictor variables are CMIP6 GCM simulation outputs. The initial PCR model produced RMSE values ranging from 97.06 to 131.69 and R² values ranging from 14.25% to 20.49%. Incorporating dummy variables substantially improved performance, lowering RMSE to 24.46–35.83 and raising R² to 89.02%–90.24%. These results indicate that PCR with dummy variables is an effective approach for improving the accuracy of rainfall modeling through statistical downscaling.
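The general idea of PCR with added dummy variables can be sketched as follows. This is an illustration under stated assumptions, not the authors' model: the data are synthetic, the helper `pcr_fit` is a hypothetical name, and the abstract does not say how the dummy variables were defined, so a made-up wet-season indicator (`wet`) is used purely for demonstration. The sketch standardizes collinear predictors, keeps the leading principal components, and fits ordinary least squares on the component scores, with and without the dummy column.

```python
import numpy as np


def pcr_fit(X, y, n_components, dummies=None):
    """Fit y on the leading principal components of X, plus optional dummy columns.

    Returns the in-sample fitted values.
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0)          # standardize predictors
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)  # rows of Vt are PC directions
    scores = Z @ Vt[:n_components].T                  # mutually uncorrelated PC scores
    columns = [np.ones(len(y)), scores]               # intercept + component scores
    if dummies is not None:
        columns.append(dummies)
    A = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)      # ordinary least squares
    return A @ coef


def rmse(y, fitted):
    return float(np.sqrt(np.mean((y - fitted) ** 2)))


def r2(y, fitted):
    return float(1 - np.sum((y - fitted) ** 2) / np.sum((y - y.mean()) ** 2))


# Synthetic stand-in data (assumption): 120 monthly records, 8 collinear
# predictors driven by 2 latent factors, and a response with a seasonal shift.
rng = np.random.default_rng(0)
n = 120
latent = rng.normal(size=(n, 2))
X = latent @ rng.normal(size=(2, 8)) + 0.05 * rng.normal(size=(n, 8))
wet = (np.arange(n) % 12 < 6).astype(float)           # hypothetical dummy variable
y = 150 + 20 * latent[:, 0] + 120 * wet + rng.normal(scale=15, size=n)

fit_base = pcr_fit(X, y, n_components=2)
fit_dummy = pcr_fit(X, y, n_components=2, dummies=wet)

print(f"PCR only:      RMSE={rmse(y, fit_base):.2f}  R2={r2(y, fit_base):.3f}")
print(f"PCR + dummies: RMSE={rmse(y, fit_dummy):.2f}  R2={r2(y, fit_dummy):.3f}")
```

Because the dummy model nests the plain PCR model, its in-sample R² can never be lower; whether the gain carries over to unseen data would need to be checked with held-out years or cross-validation.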