Journal: Science in Information Technology Letters

Comparative analysis of decision tree and random forest classifiers for structured data classification in machine learning
Kinasih, Agnes Nola Sekar; Handayani, Anik Nur; Ardiansah, Jevri Tri; Damanhuri, Nor Salwa
Science in Information Technology Letters Vol 5, No 2 (2024): November 2024
Publisher : Association for Scientific Computing Electronics and Engineering (ASCEE)

DOI: 10.31763/sitech.v5i2.1746

Abstract

This study explores the application of machine learning classification techniques to improve data analysis outcomes. Its primary objective is to evaluate and compare the performance of Decision Tree and Random Forest classifiers on a structured dataset. The study uses K-Means clustering to segment the data, with the Elbow Method used to determine the optimal number of clusters, and then applies Decision Tree and Random Forest classifiers to investigate how accurately each method categorizes the data. The dataset, obtained from Kaggle, consists of 13 numeric attributes and 1,048,575 rows. The key results show that Random Forest outperforms Decision Tree in classification accuracy, precision, recall, and F1 score, providing a more robust model for data classification. The performance improvement observed for Random Forest, particularly on complex data, demonstrates its stronger ability to generalize across varied classes. The findings suggest that for applications requiring high accuracy and reliability, Random Forest is preferable to a single Decision Tree, especially when the dataset exhibits high variability. This research contributes to a deeper understanding of how different machine learning models can be applied to real-world classification problems and offers guidance on selecting the most appropriate model based on specific data characteristics.
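The abstract names three technical steps: choosing the number of K-Means clusters with the Elbow Method, segmenting the data, and scoring classifiers by accuracy, precision, recall, and F1. The plain-Python sketch below illustrates those steps under assumptions the abstract does not confirm: Lloyd's K-Means with a deterministic farthest-first initialization, an elbow read off the largest second difference of the inertia curve, and macro-averaged per-class scores. The helper names (`kmeans`, `elbow_k`, `classification_metrics`) and the toy data are hypothetical. The Decision Tree and Random Forest models themselves are omitted; in scikit-learn they correspond to `DecisionTreeClassifier` and `RandomForestClassifier`, whose predictions could be passed to `classification_metrics`.

```python
import math

# Teaching sketch with hypothetical helper names; not the authors' code.

DEMO_POINTS = [(0, 0), (0, 1), (10, 0), (10, 1), (0, 10), (0, 11)]  # three toy blobs


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length numeric tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100):
    """Lloyd's K-Means with a deterministic farthest-first initialization.

    Returns (centroids, labels, inertia); inertia is the within-cluster sum of
    squared distances that the Elbow Method plots against k.
    """
    # Farthest-first seeding (an assumption; k-means++ or random seeding are
    # common alternatives). The first point seeds cluster 0.
    centroids = [tuple(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        centroids.append(tuple(farthest))

    labels = None
    for _ in range(max_iter):
        # Assignment step: every point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: squared_distance(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the centroids have converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))

    inertia = sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, inertia


def elbow_k(inertias):
    """Return the k with the largest second difference of the inertia curve.

    inertias[i] is the inertia for k = i + 1 (at least three values needed).
    This "maximum bend" rule is one elbow heuristic among several.
    """
    best_k, best_bend = 1, -math.inf
    for i in range(1, len(inertias) - 1):
        bend = inertias[i - 1] - 2 * inertias[i] + inertias[i + 1]
        if bend > best_bend:
            best_k, best_bend = i + 1, bend
    return best_k


def classification_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall and F1.

    Macro averaging (unweighted mean over all labels seen) is an assumption.
    """
    pairs = list(zip(y_true, y_pred))
    classes = sorted(set(y_true) | set(y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precisions, recalls, f1s = [], [], []
    for c in classes:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(classes)
    return {"accuracy": accuracy, "precision": sum(precisions) / n,
            "recall": sum(recalls) / n, "f1": sum(f1s) / n}


if __name__ == "__main__":
    inertias = [kmeans(DEMO_POINTS, k)[2] for k in range(1, 6)]
    best_k = elbow_k(inertias)
    _, cluster_labels, _ = kmeans(DEMO_POINTS, best_k)
    print("inertia per k:", [round(v, 2) for v in inertias])
    print("elbow k:", best_k, "cluster labels:", cluster_labels)
    # Toy labels and predictions, only to exercise the metric function.
    print(classification_metrics([0, 0, 1, 1], [0, 1, 1, 1]))
```

The abstract does not say whether the K-Means cluster labels served as the classification targets; the demo only shows how such labels could be produced. Macro averaging weights every class equally, so on imbalanced data it can diverge noticeably from accuracy, which is one reason to report both.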