Journal of Data Science, Technology, and Artificial Intelligence
Vol. 1 No. 1 (2024): July 2024

Feature Importance and Binary Classification using PyCaret

Naswir, Ahmad Fadhil
Williem
Hasanul Fahmi

Article Info

Publish Date
23 Apr 2025

Abstract

The rapid advancement of machine learning (ML) techniques has facilitated the development of robust models for a variety of classification tasks. This study applies PyCaret, an open-source, low-code machine learning library, to feature importance analysis and binary classification on the Titanic dataset from Kaggle. The dataset was preprocessed by converting categorical features into numerical values and removing irrelevant columns. Multiple classification models were compared; the Gradient Boosting Classifier performed best, with an average accuracy of 81.52%. Additional evaluation metrics, including precision, recall, F1 score, and AUC, further supported the model's effectiveness. Feature importance analysis identified gender (sex), fare, and age as the most significant predictors of survival, consistent with historical accounts. The results demonstrate PyCaret's capability to streamline the ML workflow, providing valuable insights and enabling rapid experimentation. This study highlights the potential of binary classification and feature importance analysis for handling large-scale datasets, where the identified important features can serve as a baseline for implementing advanced algorithms such as deep learning.
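
As a rough illustration of two steps named in the abstract (encoding categorical features and dropping irrelevant columns, then scoring a binary classifier), the following plain-Python sketch may help. It does not use PyCaret and is not the authors' code. The set of dropped columns, the 0/1 encoding for `Sex`, the helper names `preprocess` and `binary_metrics`, and the sample values are all assumptions made for illustration; only the column names follow the Kaggle Titanic CSV.

```python
# Illustrative sketch only: plain-Python versions of two steps named in the abstract.
# Column names follow the Kaggle Titanic CSV; the sample row values are made up.

# Assumption: these columns are treated as irrelevant and dropped.
DROP_COLUMNS = {"PassengerId", "Name", "Ticket", "Cabin"}
# Assumption: a simple integer encoding for the categorical Sex column.
SEX_CODES = {"male": 0, "female": 1}


def preprocess(row):
    """Return a copy of row without the dropped columns and with Sex encoded."""
    cleaned = {key: value for key, value in row.items() if key not in DROP_COLUMNS}
    cleaned["Sex"] = SEX_CODES[cleaned["Sex"]]
    return cleaned


def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    if not pairs:
        raise ValueError("y_true and y_pred must not be empty")
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical passenger record, not taken from the dataset.
    sample = {
        "PassengerId": 1, "Survived": 0, "Pclass": 3, "Name": "passenger A",
        "Sex": "male", "Age": 22.0, "Fare": 7.25, "Ticket": "T1", "Cabin": "",
    }
    print(preprocess(sample))
    # Hypothetical true labels and predictions, used only to exercise the formulas.
    print(binary_metrics([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0]))
```

In PyCaret itself, comparable steps are typically driven through `setup` and `compare_models` from `pycaret.classification`; the sketch above only spells out the underlying arithmetic and does not call the library.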

Copyright © 2024

Journal Info

Abbrev

ditech

Subject

Computer Science & IT

Description

The Journal of Data Science, Technology, and Artificial Intelligence is a semi-annual publication released in January and July. It covers a wide range of topics within the realms of data science, technology, and artificial intelligence. This interdisciplinary journal is a platform for scholars, ...