Journal : Jurnal Teknik Informatika (JUTIF)

Preventing Data Leakage in Classification via Integrated Machine Learning Pipelines: Preprocessing, Feature Transformation, and Hyperparameter Tuning
Ichwani, Arief; Kesuma, Rahman Indra; Setiawan, Andika; Wicaksono, Imam Eko; Hanifah, Raidah
Jurnal Teknik Informatika (Jutif) Vol. 7 No. 1 (2026): JUTIF Volume 7, Number 1, February 2026
Publisher : Informatika, Universitas Jenderal Soedirman

DOI: 10.52436/1.jutif.2026.7.1.5490

Abstract

Data leakage in machine learning classification often leads to overfitting, inflated performance estimates, and poor reproducibility, which can undermine the reliability of deployed models and cause losses in industrial settings. This paper addresses the leakage problem by proposing an integrated machine learning pipeline that strictly isolates training and evaluation processes across the preprocessing, feature transformation, and model optimization stages. Experiments are conducted on the Titanic passenger survival dataset: exploratory data analysis identifies data quality issues, and a stratified train-test split then preserves the class distribution. All preprocessing steps, including missing value imputation, categorical encoding, and feature scaling, are fitted exclusively on the training data using a ColumnTransformer embedded within a unified Pipeline. A K-Nearest Neighbors (KNN) classifier is employed, with hyperparameters optimized via GridSearchCV and 3-fold cross-validation. Experimental results show that a baseline model without leakage control achieves only 72.62% test accuracy and exhibits a substantial overfitting gap. In contrast, the proposed pipeline-based approach improves generalization, achieving 78.21% test accuracy with an optimal configuration of k = 29 and Manhattan distance while significantly reducing overfitting. The main contribution of this work is a reproducible, leakage-aware pipeline guideline that ensures unbiased evaluation and reliable generalization in classification tasks, providing practical methodological insights for both academic research and real-world machine learning applications.
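
The workflow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the data is a synthetic stand-in with hypothetical Titanic-like columns, and the imputation strategies, split ratio, random seeds, and grid values are assumptions. Only the overall structure (stratified split, ColumnTransformer inside a Pipeline, KNN, GridSearchCV with 3-fold cross-validation) comes from the abstract.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for a Titanic-like table (hypothetical columns and values).
rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    "age": rng.normal(30, 12, n),
    "fare": rng.exponential(30, n),
    "sex": rng.choice(["male", "female"], n),
    "embarked": rng.choice(["S", "C", "Q"], n),
})
df.loc[rng.choice(n, 40, replace=False), "age"] = np.nan  # inject missing ages
# Noisy label loosely tied to one feature, so there is something to learn.
y = ((df["sex"] == "female") ^ (rng.random(n) < 0.2)).astype(int)

# Split BEFORE any preprocessing; stratify to preserve the class distribution.
X_train, X_test, y_train, y_test = train_test_split(
    df, y, test_size=0.2, stratify=y, random_state=42
)

# Per-column-type preprocessing (strategies are assumptions).
numeric = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])
categorical = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("encode", OneHotEncoder(handle_unknown="ignore")),
])
preprocess = ColumnTransformer([
    ("num", numeric, ["age", "fare"]),
    ("cat", categorical, ["sex", "embarked"]),
])

# Preprocessing and classifier live in one Pipeline, so every CV fold
# refits the imputers, encoder, and scaler on that fold's training part only.
model = Pipeline([("preprocess", preprocess), ("knn", KNeighborsClassifier())])

# Illustrative grid; it includes k = 29 and Manhattan distance, the
# configuration the abstract reports as optimal on the real dataset.
param_grid = {
    "knn__n_neighbors": [5, 15, 29],
    "knn__metric": ["euclidean", "manhattan"],
}
search = GridSearchCV(model, param_grid, cv=3, scoring="accuracy")
search.fit(X_train, y_train)

# The held-out test set is touched once, after model selection is finished.
test_accuracy = search.score(X_test, y_test)
print(search.best_params_, round(test_accuracy, 3))
```

Because the transformers are steps of the estimator passed to GridSearchCV, no statistic computed from validation folds or from the test set can reach the fitted preprocessing. Scores on this synthetic data say nothing about the accuracies reported in the abstract.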