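Metode Robust K-Fold Cross Validation dengan Partial Least Square Regression pada Data Near Infrared Spectroscopy [Robust K-Fold Cross-Validation Method with Partial Least Square Regression on Near Infrared Spectroscopy Data]
Sibuea, Nuraini; Syamsudhuha, Syamsudhuha; Adnan, Arisman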
Seminar Nasional Teknologi Informasi Komunikasi dan Industri 2024: SNTIKI 16
Publisher : UIN Sultan Syarif Kasim Riau

Abstract

This study evaluates the performance of the Partial Least Square Regression (PLSR) model on data with and without outliers. To handle data containing outliers, k-fold cross-validation is applied to Near Infrared Spectroscopy (NIRS) data from oil palm plantation soil with respect to nitrogen (N) fertilizer. Before analysis, the data are pretreated with Standardized Normal Variate (SNV) to remove scattering effects. Outlier identification with RBF Kernel PCA flags observations 7, 8, 92, 93, and 95 as outliers. The results show that the presence of outliers significantly degrades the performance of classical PLSR, lowering R² and raising RMSE. Applying k-fold cross-validation to PLSR improves the model's robustness to outliers, increasing R² despite a slight increase in RMSE. The study concludes that k-fold cross-validation is more effective at handling data sets that contain outliers and gives more stable predictive performance than classical PLSR.
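
The pipeline named in this abstract (SNV pretreatment, then PLSR evaluated by k-fold cross-validation with RMSE and R²) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the PLSR is reduced to a single latent component, SNV uses the sample standard deviation, and folds are assigned by index modulo k. The outlier-detection step (RBF Kernel PCA) is not shown.

```python
import math

def snv(spectrum):
    """Standard Normal Variate: centre and scale ONE spectrum by its own
    mean and sample standard deviation (assumes a non-constant spectrum)."""
    n = len(spectrum)
    mean = sum(spectrum) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in spectrum) / (n - 1))
    return [(v - mean) / std for v in spectrum]

def fit_pls1(X, y):
    """Fit a one-component PLS1 model on mean-centred data.
    Assumption: one latent component only, chosen to keep the sketch short."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # Weight vector: direction of maximal covariance between X and y.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = math.sqrt(sum(v * v for v in w)) or 1.0
    w = [v / norm for v in w]
    # Scores and the regression coefficient of y on the scores.
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    tt = sum(v * v for v in t)
    q = sum(t[i] * yc[i] for i in range(n)) / tt if tt else 0.0
    return x_mean, y_mean, w, q

def predict_pls1(model, row):
    """Predict the response for one (already pretreated) spectrum."""
    x_mean, y_mean, w, q = model
    score = sum((row[j] - x_mean[j]) * w[j] for j in range(len(row)))
    return y_mean + q * score

def kfold_cv(X, y, k=5):
    """Return (RMSE, R²) computed from out-of-fold predictions."""
    n = len(X)
    preds = [0.0] * n
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        test = [i for i in range(n) if i % k == fold]
        model = fit_pls1([X[i] for i in train], [y[i] for i in train])
        for i in test:
            preds[i] = predict_pls1(model, X[i])
    y_bar = sum(y) / n
    ss_res = sum((y[i] - preds[i]) ** 2 for i in range(n))
    ss_tot = sum((v - y_bar) ** 2 for v in y)
    return math.sqrt(ss_res / n), 1.0 - ss_res / ss_tot
```

In use, each spectrum would be passed through `snv` before the rows are handed to `kfold_cv`, and the comparison described in the abstract would amount to running the evaluation once on the full data and once with the flagged observations removed.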

Robust Method with Cross-Validation in Partial Least Square Regression
Sibuea, Nuraini; Syamsudhuha, Syamsudhuha; Adnan, Arisman; Silalahi, Divo Dharma
Journal of Mathematics, Computations and Statistics Vol. 8 No. 1 (2025): Volume 08 Nomor 01 (April 2025)
Publisher : Jurusan Matematika FMIPA UNM

DOI: 10.35580/jmathcos.v8i1.4766

Abstract

Partial Least Squares Regression (PLSR) is a multivariate analysis technique used to handle data with highly correlated predictor variables or when the number of predictor variables exceeds the number of samples. PLSR is not robust to outliers, which can disrupt the stability and accuracy of the model. Cross-validation is an important approach to improve model reliability, particularly in data that contains outliers. This study aims to evaluate the effectiveness of K-fold cross-validation and nested cross-validation in a PLSR model using NIRS data from oil palm plantation soil that contains outliers. The methods used in this study include outlier identification using RBF kernel PCA, followed by the application of K-fold cross-validation and nested cross-validation in the PLSR model. The evaluation is based on the Root Mean Square Error (RMSE) and the Coefficient of Determination (R²). The results show that nested cross-validation performs better than K-fold cross-validation. Nested cross-validation results in lower RMSE and higher R², both with and without outliers. K-fold cross-validation is more susceptible to overfitting, whereas nested cross-validation is more effective in mitigating the impact of outliers and improving model accuracy. The conclusion of this study is that nested cross-validation outperforms K-fold cross-validation in improving prediction accuracy and the stability of the PLSR model, especially in data containing outliers. It is recommended to use nested cross-