Garuda - Garba Rujukan Digital

P-Index (2021 - 2026): 4.006

This author has published in these journals:

KURVA S JURNAL MAHASISWA Jurnal Konstruksia Jurnal Media Gizi Indonesia (MGI) QALAMUNA: Jurnal Pendidikan, Sosial, dan Agama AGREGAT Jurnal AKP JOURNAL OF APPLIED INFORMATICS AND COMPUTING Jurnal Teknik Informatika UNIKA Santo Thomas Fair Value: Jurnal Ilmiah Akuntansi dan Keuangan Jurnal Teknik Informatika (JUTIF) jurnal dikdas bantara Jurnal Teknik Informatika Unika Santo Thomas (JTIUST) Jurnal Multidisiplin Madani (MUDIMA) Jurnal Pengabdian Masyarakat Bhinneka International Journal of Civil Engineering and Infrastructure (IJCEI) Jurnal Pengabdian Masyarakat Bangsa Prosiding Seminar Nasional Rekayasa dan Teknologi (TAU SNAR-TEK) Mu'asyarah Media of Computer Science Prosiding Seminar Nasional CORISINDO

Andika Setiawan, Andika

Unknown Affiliation

Author-ID : 584300

Published: 29 Documents

Articles

Preventing Data Leakage in Classification via Integrated Machine Learning Pipelines: Preprocessing, Feature Transformation, and Hyperparameter Tuning
Ichwani, Arief; Kesuma, Rahman Indra; Setiawan, Andika; Wicaksono, Imam Eko; Hanifah, Raidah
Jurnal Teknik Informatika (Jutif) Vol. 7 No. 1 (2026): JUTIF Volume 7, Number 1, February 2026
Publisher : Informatika, Universitas Jenderal Soedirman

DOI: 10.52436/1.jutif.2026.7.1.5490

Data leakage in machine learning classification often leads to overfitting, inflated performance estimates, and poor reproducibility, which can undermine the reliability of deployed models and incur industrial losses. This paper addresses the leakage problem by proposing an integrated machine learning pipeline that strictly isolates training and evaluation processes across preprocessing, feature transformation, and model optimization stages. Experiments are conducted on the Titanic passenger survival dataset, where exploratory data analysis identifies data quality issues, followed by stratified train-test splitting to preserve class distribution. All preprocessing steps, including missing value imputation, categorical encoding, and feature scaling, are applied exclusively to the training data using a ColumnTransformer embedded within a unified Pipeline. A K-Nearest Neighbors (KNN) classifier is employed, with hyperparameters optimized via GridSearchCV and 3-fold cross-validation. Experimental results show that a baseline model without leakage control achieves only 72.62% test accuracy and exhibits a substantial overfitting gap. In contrast, the proposed pipeline-based approach improves generalization, achieving 78.21% test accuracy with an optimal configuration of k = 29 and Manhattan distance while significantly reducing overfitting. The main contribution of this work is the formulation of a reproducible, leakage-aware pipeline guideline that ensures unbiased evaluation and reliable generalization in classification tasks, providing practical methodological insights for both academic research and real-world machine learning applications.
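The kind of pipeline the abstract describes can be sketched with scikit-learn as follows. This is a minimal illustration, not the authors' code. It uses a small synthetic table in place of the Titanic dataset; the column names (`age`, `fare`, `sex`), the parameter grid, and the random seeds are all assumptions made for the example. Because the imputers, scaler, and encoder live inside the `Pipeline`, `GridSearchCV` refits them on the training portion of each fold, so no statistics from validation or test rows reach the model.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in data (hypothetical columns, NOT the Titanic dataset).
rng = np.random.default_rng(0)
n = 300
X = pd.DataFrame({
    "age": rng.normal(30, 10, n),
    "fare": rng.exponential(30, n),
    "sex": rng.choice(["female", "male"], n),
})
# Label depends on "sex" with 20% label noise.
y = ((X["sex"] == "female") ^ (rng.random(n) < 0.2)).astype(int)
X.loc[rng.random(n) < 0.1, "age"] = np.nan  # inject missing values

# Split first, stratified on the label, before any preprocessing is fitted.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Preprocessing is defined as part of the model, not applied to the data up front.
numeric = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])
categorical = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("encode", OneHotEncoder(handle_unknown="ignore")),
])
preprocess = ColumnTransformer([
    ("num", numeric, ["age", "fare"]),
    ("cat", categorical, ["sex"]),
])
model = Pipeline([("preprocess", preprocess), ("knn", KNeighborsClassifier())])

# 3-fold grid search; every fold refits the whole pipeline on its own training part.
param_grid = {
    "knn__n_neighbors": [5, 15, 29],
    "knn__metric": ["euclidean", "manhattan"],
}
search = GridSearchCV(model, param_grid, cv=3, scoring="accuracy")
search.fit(X_train, y_train)

# The held-out test split is touched only once, for the final estimate.
test_accuracy = search.score(X_test, y_test)
print(search.best_params_, round(test_accuracy, 4))
```

A leaky baseline would instead fit the imputers and scaler on the full table before splitting. The accuracies printed here come from synthetic data and are not comparable to the figures reported in the abstract.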

Co-Authors Adi Sucipto, Adi Afit Miranto agung nusantoro Agung, Ahmad Anggraria Jaya Aji, Abdul Alrando, Raja Aksana Anas Tamsuri Anggraini, Leslie Aprilianda, Mohamad Meazza Arif Ashari Astuti, Resti Dwi Badaruddin Badaruddin Badaruddin Bagaskara, Radhinka Basit Al Hanif Ba’its, Alfian Kafilah Cahyo Untoro, Meida Deby Puspitaningrum Demi Dama Yanti Desty Ervira Puspaningtyas Drantantiyas, Nike Dwi Grevika Dwi Yulina Abdi Jayanti Eko Prasetyo, Harwidyo Elityasari, Cindy Nur Faisal, Amir Farahdiba Farahdiba Febrianto, Andre Fil’aini, Raizummi Fitrawan, Mhd. Kadar G. Gunawan, G. Habeahan, Angelina Hanif, Farhan Harmiansyah Hartanto Tantriawan Harwidyo Eko Prasetyo Haryo Koco Buwono Heri Khoeri Herlina, Idra Himawan, Prasta Genie Ichwani, Arief Indarto Indarto Irwanto, Rachmad Kisna Pertiwi, Kisna Kuswidyanarko , Arief Listiani, Amalia Manurung, Jefri Marhamah . Mario, Frisel Martin Clinton Tosima Manullang, Martin Clinton Tosima Miranda, Anne Putri Mufidah, Zunanik Muhammad Yogi Saputra Mustakim, Fahmi Novriani, Shinta Nusyura Al Islami, Aulia Pratama, Rizky Nisa Putri, Arie Pradina Rahman Indra Kesuma Rahmat Kurniawan Raidah Hanifah Rakhman, Arkham Zahri Ramadhani, Uri Arta REZA PAHLEVI Satiawan, Budi Satya Soeratmodjo, Irnanda Satya Soerjatmodjo, Irnanda Siska Puspita Sari Sitorus, Iwan Romadhan Soerjatmodjo, Irnanda Satya Sofiana, Dini sutik, sutik Tuta, Gabriel Vanes Sabat Untoro, Meida Cahyo Wicaksana, Galih Wicaksono, Imam Eko Wijayanti, Rini Windasari, Liska Yulia, Alya Yulita, Winda Yuni Afriani, Yuni Yusmita, Yusmita