Improving Multi-Class Classification on 5-Celebrity-Faces Dataset using Ensemble Classification Methods
Nurul Rismayanti; Aulia Putri Utami
Indonesian Journal of Data and Science, Vol. 4 No. 2 (2023)
Publisher: yocto brain

DOI: 10.56705/ijodas.v4i2.78

Abstract

This study compares the performance of a Random Forest Classifier and a Gaussian Naïve Bayes Classifier on a classification task. Accuracy, precision, recall, and F1-score were used to evaluate both models. The dataset used has specific characteristics that influence the evaluation results. The findings indicate that the Random Forest Classifier outperforms the Gaussian Naïve Bayes Classifier on most of the evaluation metrics, achieving higher accuracy and better precision, recall, and weighted F1-score. However, the Random Forest Classifier also has more outliers than the Gaussian Naïve Bayes Classifier when the results are visualized using boxplots. Model selection therefore involves a trade-off between higher performance and sensitivity to outliers. Further statistical testing and more advanced evaluation are required to interpret the results more fully. The study offers insight into how these two classification models compare and what that comparison implies in different contexts.
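The metrics named in this abstract can be illustrated with a minimal sketch. It assumes that "weighted" means each class's score is weighted by its share of the true labels, which is the common convention; the abstract does not spell this out. The function name and the toy labels are hypothetical and are not taken from the study.

```python
from collections import Counter


def weighted_scores(y_true, y_pred):
    """Return accuracy and support-weighted precision, recall, and F1."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)  # number of true samples per class
    n = len(y_true)
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / n
    precision = recall = f1 = 0.0
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        p_c = tp / (tp + fp) if tp + fp else 0.0
        r_c = tp / (tp + fn) if tp + fn else 0.0
        f_c = 2 * p_c * r_c / (p_c + r_c) if p_c + r_c else 0.0
        weight = support[label] / n  # class share of the true labels
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels for illustration only; not data from the study.
y_true = ["a", "a", "b", "b", "c", "c"]
y_pred = ["a", "b", "b", "b", "c", "a"]
scores = weighted_scores(y_true, y_pred)
print(scores)
```

Under this weighting, weighted recall always equals accuracy, so a reported gap between the two would suggest a different averaging scheme was used.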
M2SmallLint: software health monitoring tool
Hayatou Oumarou; Nurul Rismayanti
Indonesian Journal of Data and Science, Vol. 4 No. 2 (2023)
Publisher: yocto brain

DOI: 10.56705/ijodas.v4i2.90

Abstract

Developing error-free applications is a major challenge for computer scientists. Tools have been developed to address this problem, notably rule checkers and proof assistants. As a particular kind of error, a bug is by nature intangible, invisible, and difficult to trace. We propose to investigate the correlations between the alerts generated by rule checkers and the internal quality of the software system. In this first version of the work, we present M2SmallLint, a tool for visualizing and navigating source code properties in order to locate potential errors. The tool makes it possible to visualize software health.
Estimating Obesity Levels Using Decision Trees and K-Fold Cross-Validation: A Study on Eating Habits and Physical Conditions
Fadhila Tangguh Admojo; Nurul Rismayanti
Indonesian Journal of Data and Science, Vol. 5 No. 1 (2024)
Publisher: yocto brain

DOI: 10.56705/ijodas.v5i1.126

Abstract

This study uses machine learning to explore the determinants of obesity in populations from Mexico, Peru, and Colombia, applying a Decision Tree algorithm with 5-fold cross-validation. Our analysis of lifestyle and physical condition data from 2111 individuals yielded accuracy, precision, recall, and F1-scores that peaked in the third and fifth folds. The findings affirm that dietary habits and physical activity are substantial predictors of obesity levels. The variability in model performance across the folds underscores the importance of robust cross-validation for the model's generalizability. This research contributes to data science in public health by providing a viable model for obesity prediction and laying the groundwork for targeted health interventions. The study's insights are relevant to public health officials and policymakers as a step towards more sophisticated, data-driven approaches to combating obesity. The study recognizes, however, the inherent limitations of self-reported data and the need for broader datasets that encompass more diverse variables. Future research directions include the analysis of longitudinal data to establish causal relationships and the comparison of various machine learning models to optimize predictive performance.
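The 5-fold cross-validation mentioned in this abstract can be sketched as follows. This is only an illustration of how 2111 samples might be partitioned into five folds; the abstract does not say whether the authors shuffled or stratified the data, so the shuffling, the seed, and the function name here are assumptions.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs for shuffled k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # assumed; the abstract does not specify
    # Spread the remainder over the first folds so sizes differ by at most one.
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        start += size
        yield train_idx, test_idx


# 2111 samples and 5 folds are the figures given in the abstract.
folds = list(k_fold_indices(2111, k=5))
fold_sizes = [len(test) for _, test in folds]
print(fold_sizes)  # [423, 422, 422, 422, 422]
```

Each sample appears in exactly one test fold, so a model is trained five times on roughly 1689 samples and scored on the remaining 422 or 423. Per-fold scores that vary, as the abstract reports, reflect how sensitive the model is to the particular split.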