Differentiated Thyroid Cancer Recurrence Prediction Using Boosting Algorithms
Saritas, Mucahid Mustafa; Yildiz, Muslume Beyza; Cengel, Talha Alperen; Koklu, Murat
Jurnal Komputer Teknologi Informasi Sistem Informasi (JUKTISI) Vol. 4 No. 2 (2025): September 2025
Publisher : LKP KARYA PRIMA KURSUS

DOI: 10.62712/juktisi.v4i2.490

Abstract

This study aims to compare the performance of AdaBoost, Gradient Boosting, and CatBoost algorithms in predicting the recurrence risk of Differentiated Thyroid Cancer (DTC). DTC is the most common type of thyroid cancer, and due to its recurrence risk, accurate and effective prediction models are needed. In this study, a dataset containing clinical and pathological data of patients diagnosed with DTC was used. The performance of the models was evaluated using metrics such as accuracy, precision, recall, and F1 score. The results revealed that the CatBoost algorithm achieved the highest performance, with an accuracy of 98.70% and an F1 score of 98.69% on the test data. The Gradient Boosting algorithm ranked second with an accuracy of 97.40% and an F1 score of 97.40%, while the AdaBoost algorithm showed the lowest performance, with an accuracy of 96.10% and an F1 score of 96.14%. These findings indicate that the CatBoost algorithm outperforms the other algorithms in predicting DTC recurrence risk and is a suitable candidate for use in clinical decision support systems.
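The four evaluation metrics named in the abstract above can be illustrated with a minimal sketch. The labels below are toy values, not the study's data, and the function name `classification_metrics` is hypothetical. The abstract does not say how the scores were averaged across classes, so this sketch assumes the simplest case: a binary problem scored on the positive (recurrence) class.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for one positive class.

    Assumption: binary scoring on the positive class; the averaging
    scheme used in the study is not stated in the abstract.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against empty denominators (no predicted or no actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy example: 1 = recurrence, 0 = no recurrence (illustrative only).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Because F1 is the harmonic mean of precision and recall, it can differ slightly from accuracy even when both are high, as in the figures reported above.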
Clustering Performance on Heart Disease Data: Effects of Distance Metrics and Scaling
Akbas, Ibrahim; Taspinar, Yavuz; Koklu, Murat
Journal of Technology and System Information Vol. 3 No. 1 (2026): January
Publisher : Indonesian Journal Publisher

DOI: 10.47134/jtsi.v3i1.5336

Abstract

Cardiovascular diseases (CVD) are among the leading causes of morbidity and mortality worldwide, requiring advanced analytical approaches to identify early-stage risk groups and classify patient profiles in greater detail. The aim of this study is to reveal latent patient subgroups associated with CVD using unsupervised machine learning methods on clinical data. In this context, a dataset consisting of 11 clinical variables from 303 patients who visited the VA Medical Center in Long Beach, California, was analyzed. During the preprocessing stage, missing observations were eliminated, only numerical variables were used, and both z-score standardization and min–max normalization were applied to the data. Subsequently, hierarchical clustering analyses were performed using single, complete, and average linkage approaches based on Euclidean and cosine distance measures; the number of possible clusters for different distance–scaling combinations was evaluated using the Elbow and Silhouette measures. The results showed that the 4-cluster solution, particularly under the complete and average linkage methods, represented the data structure in the most clinically explanatory manner. The similarity between the clustering results obtained using the k-means algorithm with Euclidean distance on standardized data and cosine distance on normalized data was calculated as a Rand Index (RI) of 0.8179; this value demonstrated that the cluster structure was largely preserved despite different distance metrics and scaling strategies. The findings demonstrate that unsupervised learning-based clustering approaches provide a useful tool for defining meaningful risk classes within heterogeneous patient populations based on clinical datasets and for conducting comparative clinical evaluations between these classes.
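The two scaling steps and the Rand Index mentioned in the abstract above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, the inputs are toy values, and the z-score uses the population standard deviation because the abstract does not say which variant was applied.

```python
from itertools import combinations
from statistics import mean, pstdev


def z_score(values):
    """Standardize to zero mean and unit variance.

    Assumption: population standard deviation (pstdev).
    """
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        raise ValueError("constant column cannot be standardized")
    return [(v - mu) / sigma for v in values]


def min_max(values):
    """Rescale linearly so the smallest value maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("constant column cannot be normalized")
    return [(v - lo) / (hi - lo) for v in values]


def rand_index(labels_a, labels_b):
    """Share of sample pairs on which two clusterings agree.

    A pair counts as an agreement when both clusterings put the two
    samples together, or both put them apart. Cluster label values
    themselves do not need to match.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("label lists must have the same length")
    if len(labels_a) < 2:
        raise ValueError("need at least two samples")
    agree = total = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        agree += same_a == same_b
        total += 1
    return agree / total


# Toy cluster assignments for six patients (illustrative only).
clusters_euclidean = [0, 0, 1, 1, 2, 2]
clusters_cosine = [1, 1, 0, 0, 2, 0]
print(rand_index(clusters_euclidean, clusters_cosine))
print(min_max([2, 4, 6]))
```

The Rand Index ranges from 0 to 1, with 1 meaning the two clusterings group every pair of samples identically, which is why a value such as 0.8179 is read as a largely preserved structure.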
Classification of Sleep Disorders Using Machine Learning Algorithms
Ekim, Ufuk; Koklu, Murat
Journal of Technology and System Information Vol. 3 No. 1 (2026): January
Publisher : Indonesian Journal Publisher

DOI: 10.47134/jtsi.v3i1.5346

Abstract

This study aims to analyse the relationship between individuals' sleep health and lifestyle using machine learning algorithms. The Sleep Health and Lifestyle dataset used in the study includes variables such as age, gender, occupation, physical activity, stress level, and sleep duration. The data were cleaned and normalised during the pre-processing stage. Subsequently, individuals' sleep quality was classified using the K-Nearest Neighbour (KNN), Random Forest, Support Vector Machines (SVM), and Artificial Neural Networks (ANN) algorithms. Model performance was evaluated using metrics such as accuracy, F1-score, precision, and sensitivity. The 5-fold cross-validation method was used to evaluate the models' performance in a more reliable and generalisable manner. The results show that the ANN and Random Forest models achieve higher accuracy than the other algorithms. These findings reveal that lifestyle factors have a strong influence on predicting sleep quality.
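The 5-fold cross-validation mentioned in the abstract above can be sketched as follows. The function name `k_fold_indices` is hypothetical, and the shuffling with a fixed seed is an assumption: the abstract does not say whether the folds were shuffled or stratified by class.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation.

    Assumption: samples are shuffled once with a fixed seed and the
    folds are not stratified by class label.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Striding through the shuffled list keeps fold sizes within one of each other.
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [idx for j, fold in enumerate(folds) if j != i
                     for idx in fold]
        yield train_idx, test_idx


# Each sample is used for testing exactly once across the five folds.
for fold_no, (train_idx, test_idx) in enumerate(k_fold_indices(23, k=5), 1):
    print(fold_no, len(train_idx), len(test_idx))
```

In use, a model would be fitted on each `train_idx` subset and scored on the matching `test_idx` subset, and the five scores averaged to give the reported performance.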