Contact Name
Huzain
Contact Email
huzain.azis@umi.ac.id
Phone
+628114484875
Journal Mail Official
ijodas.journal@gmail.com
Editorial Address
Jln. Paccerakkang, Kel. Berua, Kec. Biringkanaya, Kota Makassar, Propinsi Sulawesi Selatan, 90241
Location
INDONESIA
Indonesian Journal of Data and Science
Published by yocto brain
EISSN: 2715-9930
Core Subject: Science, Education
IJODAS provides an online medium for publishing scientific articles on research in Data Science, Data Mining, Data Communication, Data Security, and Data Representation.
Articles: 135 documents
Comparative Analysis of Random Forest and LSTM Models for Customer Churn Prediction Based on Customer Satisfaction and Retention
Gegeleso, Babajide; Ebiesuwa, Oluwaseun
Indonesian Journal of Data and Science Vol. 6 No. 2 (2025): Indonesian Journal of Data and Science
Publisher : yocto brain

DOI: 10.56705/ijodas.v6i2.244

Abstract

Forecasting and predicting customer churn is important for sustaining long-term customer relationships and enhancing profitability in competitive markets. This study compares the performance of Random Forest (RF) and Long Short-Term Memory (LSTM) models in predicting customer churn using a dataset of 2,850 customers. The dataset comprises behavioral, transactional, and satisfaction metrics. Key evaluation metrics include accuracy, precision, recall, F1-score, and AUC-ROC. The results show that while Random Forest offers a strong baseline with interpretable results, LSTM captures temporal patterns effectively and performs better at identifying subtle churn indicators, especially in sequential customer satisfaction data. LSTM achieved an accuracy of 88.6%, precision of 85.3%, recall of 82.5%, F1-score of 83.9%, and AUC-ROC of 0.92, while Random Forest achieved an accuracy of 85.2%, precision of 81.5%, recall of 77.0%, F1-score of 79.2%, and AUC-ROC of 0.89. These results suggest that LSTM is preferable for rapidly changing, high-volume datasets, whereas RF excels on less complex, sparse datasets.
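The reported F1-scores can be cross-checked against the reported precision and recall, since F1 is their harmonic mean. The short Python sketch below is only an arithmetic check on the numbers quoted in the abstract; it is not code from the study.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given as fractions in [0, 1])."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract.
reported = {
    "LSTM": (0.853, 0.825),
    "Random Forest": (0.815, 0.770),
}

for model, (p, r) in reported.items():
    print(f"{model}: F1 = {f1_score(p, r):.1%}")
```

Both recomputed values round to the F1-scores stated in the abstract, so the reported metrics are internally consistent.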
Enhanced NER Tagging Model using Relative Positional Encoding Transformer Model
Achir, Jerome Aondongu; Abdulkarim, Muhammed; Abdullahi, Mohammed
Indonesian Journal of Data and Science Vol. 6 No. 2 (2025): Indonesian Journal of Data and Science
Publisher : yocto brain

DOI: 10.56705/ijodas.v6i2.245

Abstract

Named Entity Recognition (NER) remains pivotal for structuring unstructured text, yet existing models struggle with long-range dependencies and domain generalisation, and they rely on large annotated datasets. To address these limitations, this paper introduces a hybrid architecture combining a transformer model enhanced with relative positional encoding and a rule-based refinement module. Relative positional encoding improves contextual understanding by capturing token relationships dynamically, while rule-based post-processing corrects inconsistencies in entity tagging. Evaluated on the Groningen Meaning Bank (GMB) and Wikipedia Location datasets, the proposed model achieves state-of-the-art performance, with validation accuracies of 98.91% on Wikipedia and 98.50% on GMB with rule-based refinement, surpassing the 94.0% reported in existing benchmark research. The relative positional encoding contributes 34.42% to the attention mechanism's magnitude, underscoring its efficacy in modelling token interactions. The results demonstrate that integrating transformer-based architectures with rule-based corrections significantly enhances entity classification accuracy, particularly in complex and morphologically diverse contexts. This work highlights the potential of hybrid approaches to optimise sequence labelling tasks across domains.
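Relative positional encoding can be formulated in several ways, and the abstract does not say which variant the authors use. As an illustration only, the NumPy sketch below adds one learned scalar bias per clipped relative distance (j - i) to the attention logits, a common simplification of relative attention. The function name, `rel_bias`, and `max_dist` are hypothetical.

```python
import numpy as np


def relative_attention(q, k, v, rel_bias, max_dist):
    """Single-head scaled dot-product attention with a scalar bias per clipped
    relative distance (j - i). rel_bias has length 2 * max_dist + 1."""
    n, d = q.shape
    scores = q @ k.T / np.sqrt(d)                       # content term, shape (n, n)
    pos = np.arange(n)
    rel = np.clip(pos[None, :] - pos[:, None], -max_dist, max_dist)  # rel[i, j] = j - i
    scores = scores + rel_bias[rel + max_dist]          # relative-position term
    scores = scores - scores.max(axis=1, keepdims=True) # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)       # row-wise softmax
    return weights @ v, weights


rng = np.random.default_rng(0)
n, d, max_dist = 5, 8, 2
q, k, v = (rng.standard_normal((n, d)) for _ in range(3))
rel_bias = rng.standard_normal(2 * max_dist + 1)        # would be learned in practice
out, w = relative_attention(q, k, v, rel_bias, max_dist)
print(out.shape)
```

Because the bias depends only on j - i, the same learned value applies to every token pair at a given distance, which is what lets the model reason about token relationships independently of absolute position.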
Integration of YOLOv8 and Instance Segmentation in the Chinese Sign Language (CSL) Recognition System
Wijaya, Mikel Ega; Handayani, Anik Nur
Indonesian Journal of Data and Science Vol. 6 No. 2 (2025): Indonesian Journal of Data and Science
Publisher : yocto brain

DOI: 10.56705/ijodas.v6i2.247

Abstract

This research aims to develop an advanced recognition system for Chinese Sign Language (CSL) by integrating YOLOv8 and instance segmentation techniques. Sign language is essential for communication in the deaf community, and although CSL has been standardized in China, recognizing complex hand movements remains a significant challenge. YOLOv8 is employed for real-time object detection, while instance segmentation provides a more detailed analysis of hand gestures. This integration seeks to improve hand gesture recognition under varying lighting and background conditions, which is crucial for more effective communication between the deaf community and wider society. The study evaluates system performance using common metrics: Mean Average Precision (mAP), precision, recall, and F1-score. The findings indicate that the non-segmentation model outperforms the segmentation model in precision, recall, and mAP, especially when trained with a larger dataset ratio. The non-segmentation model provides faster and more accurate detection, while the segmentation model, trained on the same amount of data, shows potential for more detailed gesture recognition. Although the segmentation model shows improvements in F1-score, the non-segmentation model remains superior in overall detection speed and accuracy. This research highlights the value of integrating YOLOv8 and instance segmentation for CSL recognition, with the non-segmentation model currently giving better results for supporting communication with the deaf community.
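mAP for detection and segmentation models is typically computed by matching predictions to ground truth at an Intersection-over-Union (IoU) threshold: box IoU for a detection model, mask IoU for an instance segmentation model. The sketch below computes both quantities, assuming boxes in (x1, y1, x2, y2) format and boolean masks. It is a minimal illustration, not the Ultralytics evaluation code.

```python
import numpy as np


def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def mask_iou(m1, m2):
    """IoU of two boolean segmentation masks of the same shape."""
    inter = np.logical_and(m1, m2).sum()
    union = np.logical_or(m1, m2).sum()
    return inter / union if union else 0.0


# A prediction shifted half a box to the right overlaps its ground truth with IoU 1/3,
# so it would count as a false positive at the common 0.5 threshold.
print(box_iou((0, 0, 10, 10), (5, 0, 15, 10)))
```

Mask IoU compares pixels rather than rectangles, which is why a segmentation model can be scored more strictly than a box detector on the same gesture.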
Performance Analysis of Random Forest and Naive Bayes Methods for Classifying Tomato Leaf Disease Datasets
Ananda, Rima; Lilis Nur Hayati; Irawati
Indonesian Journal of Data and Science Vol. 6 No. 2 (2025): Indonesian Journal of Data and Science
Publisher : yocto brain

DOI: 10.56705/ijodas.v6i2.252

Abstract

Tomato productivity is often disrupted by diseases such as early blight and late blight, which can significantly reduce crop yields. Early detection of these diseases is crucial to prevent greater losses. This study compares two machine learning classification methods, Random Forest and Naïve Bayes, for identifying diseases on tomato leaves. The dataset consists of 1,255 images obtained from Kaggle, divided into two classes: early blight (627 images) and late blight (628 images). The images were preprocessed and split into training and testing sets under three ratio scenarios (70:30, 80:20, and 90:10). Naïve Bayes achieved an accuracy of only 76.98%, while Random Forest achieved the highest accuracy, 92.86%, in the 90:10 scenario. Random Forest therefore proves more effective than Naïve Bayes in classifying tomato leaf diseases. Implementing this model can help farmers detect diseases more quickly and accurately, thereby increasing agricultural productivity.
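The three ratio scenarios are different train/test splits of the same 1,255 images. The plain-Python stratified split below shows one way to produce them while keeping the early/late blight balance in both sets. The label strings and the seed are hypothetical, and the study may have split the data differently (for example with scikit-learn's `train_test_split`).

```python
import random


def stratified_split(labels, train_ratio, seed=42):
    """Return (train_idx, test_idx), splitting each class separately so both
    sets keep roughly the original class balance."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    train, test = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        cut = round(len(idx) * train_ratio)  # per-class cut point
        train += idx[:cut]
        test += idx[cut:]
    return sorted(train), sorted(test)


# Class sizes from the abstract; label names are placeholders.
labels = ["early_blight"] * 627 + ["late_blight"] * 628
for ratio in (0.7, 0.8, 0.9):
    tr, te = stratified_split(labels, ratio)
    print(f"{round(ratio * 100)}:{round((1 - ratio) * 100)} -> train={len(tr)}, test={len(te)}")
```

Note that the 90:10 scenario leaves only about 126 test images, so a single split's accuracy can vary noticeably with the random seed.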
Optimization of Nglegena Javanese Script Recognition with Machine Learning Based on Zoning and Normalization of Feature Extraction
Graciello, Manuel Tanbica; Handayani, Anik Nur; Wibawa, Aji Prasetya
Indonesian Journal of Data and Science Vol. 6 No. 2 (2025): Indonesian Journal of Data and Science
Publisher : yocto brain

DOI: 10.56705/ijodas.v6i2.256

Abstract

Machine learning offers promising solutions for recognizing handwritten Javanese Nglegena script, which is crucial for preserving Indonesia's cultural heritage. This study applies four supervised learning algorithms, namely K-Nearest Neighbors (KNN), Naïve Bayes, Decision Tree, and Random Forest, to classify handwritten images of Nglegena Javanese script. Feature extraction uses a zoning technique in which each character image is divided into multiple zones (16, 25, 36, or 64) to capture local details. The extracted features are then normalized using Min-Max, Z-Score, or Binary normalization to bring them onto a common scale. The dataset, consisting of 600 images of Javanese Nglegena characters, is split into training and testing sets using various ratios. Experimental results show that combining Naïve Bayes classification with 36-zone feature extraction and Min-Max or Z-Score normalization achieves the highest accuracy, 65%. These findings demonstrate that optimizing zoning and normalization can significantly enhance the accuracy of machine learning models for Javanese script recognition. The research contributes to the development of Optical Character Recognition (OCR) technology for Javanese script, supporting the digital preservation of Indonesia's historical and cultural heritage.
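To make the zoning and normalization steps concrete, the NumPy sketch below computes the ink density of each cell in a k x k grid (16, 25, 36, and 64 zones correspond to k = 4, 5, 6, and 8) and applies per-feature Min-Max and Z-Score scaling across samples. Treating a zone feature as ink density and defining Binary normalization as "zone contains any ink" are assumptions, since the abstract specifies neither. The random images are placeholders, not the study's data.

```python
import numpy as np


def zoning_features(img, zones_per_side):
    """Ink density (mean of a 0/1 image) in each cell of a zones_per_side x zones_per_side grid."""
    h, w = img.shape
    row_groups = np.array_split(np.arange(h), zones_per_side)
    col_groups = np.array_split(np.arange(w), zones_per_side)
    return np.array([img[np.ix_(r, c)].mean() for r in row_groups for c in col_groups])


def min_max(X):
    """Scale each feature column to [0, 1]; constant columns become 0."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    return (X - lo) / np.where(hi > lo, hi - lo, 1.0)


def z_score(X):
    """Standardize each feature column to zero mean and unit variance."""
    mu, sd = X.mean(axis=0), X.std(axis=0)
    return (X - mu) / np.where(sd > 0, sd, 1.0)


def binary(X):
    """Hypothetical binary normalization: 1 if the zone contains any ink, else 0."""
    return (X > 0).astype(float)


rng = np.random.default_rng(0)
# Placeholder binarised 64x64 "character" images (1 = ink), not real data.
images = [(rng.random((64, 64)) > 0.7).astype(float) for _ in range(10)]
X = np.stack([zoning_features(img, 6) for img in images])  # 6 x 6 = 36 zones
X_minmax, X_z, X_bin = min_max(X), z_score(X), binary(X)
print(X.shape)
```

In a real pipeline the scaling statistics (min, max, mean, standard deviation) should be computed on the training split only and then reused on the test split.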
Comparative Analysis of OCR Methods Integrated with Fuzzy Matching for Food Ingredient Detection in Japanese Packaged Products
Muhammad Zaky Rahmatsyah; Jevri Tri Ardiansah; Anik Nur Handayani
Indonesian Journal of Data and Science Vol. 6 No. 2 (2025): Indonesian Journal of Data and Science
Publisher : yocto brain

DOI: 10.56705/ijodas.v6i2.257

Abstract

Advances in digital technology offer a solution to the difficulty foreign consumers face in understanding ingredient information on Japanese food packaging, which is written in Kanji, Hiragana, and Katakana. This study develops and evaluates an allergen detection method for Japanese food packaging based on Optical Character Recognition (OCR) and fuzzy matching. Three OCR methods (Google Vision OCR, PaddleOCR, and Tesseract OCR) were compared using precision, recall, F1-score, and confusion matrix metrics. The study began with collecting food product images from online sources, followed by text extraction using the three OCR methods. The extracted text was then cleaned and normalized before being matched against ground truth data using fuzzy matching. Testing was conducted on 10 product image samples with varying quality and lighting conditions. Google Vision OCR outperformed the others with an average F1-score of 1.00, followed by PaddleOCR (0.75) and Tesseract OCR (0.30). Google Vision was the most consistent in detecting allergens such as 乳 (milk), 小麦 (wheat), and 卵 (egg). These findings suggest that integrating OCR with fuzzy matching is effective for detecting allergens, even in the presence of textual variations and recognition errors. This study contributes to the development of automated food recommendation systems for foreign consumers, especially those with dietary restrictions due to allergies, religious beliefs, or personal preferences.
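Fuzzy matching tolerates OCR errors such as a single misread character. As a hedged sketch, the code below uses Python's standard `difflib.SequenceMatcher` ratio to match OCR tokens against a reference ingredient list after NFKC normalization, then maps matched ingredients to the allergens named in the abstract. The reference list, the sample OCR string, and the 0.6 threshold are hypothetical, and the study may have used a different fuzzy-matching library or score.

```python
import re
import unicodedata
from difflib import SequenceMatcher

# Allergen terms named in the abstract.
ALLERGENS = {"乳": "milk", "小麦": "wheat", "卵": "egg"}


def tokenize(ocr_text):
    """NFKC-normalize (folds full-width variants) and split on list separators."""
    text = unicodedata.normalize("NFKC", ocr_text)
    return [t for t in re.split(r"[、,・\s]+", text) if t]


def fuzzy_match(tokens, reference, threshold=0.6):
    """For each reference ingredient, keep the closest OCR token if its similarity passes the threshold."""
    matches = {}
    for ref in reference:
        best_tok, best_score = None, 0.0
        for tok in tokens:
            score = SequenceMatcher(None, ref, tok).ratio()
            if score > best_score:
                best_tok, best_score = tok, score
        if best_score >= threshold:
            matches[ref] = (best_tok, round(best_score, 2))
    return matches


def allergens_in(ingredients):
    """English names of allergens whose Japanese term appears in any matched ingredient."""
    return sorted({en for ing in ingredients for jp, en in ALLERGENS.items() if jp in ing})


reference = ["小麦粉", "脱脂粉乳", "鶏卵"]    # hypothetical ground-truth ingredients
ocr_text = "小麦扮、脱脂粉乳、砂糖"            # hypothetical OCR output with 粉 misread as 扮
matches = fuzzy_match(tokenize(ocr_text), reference)
print(matches, allergens_in(matches))
```

The misread 小麦扮 still matches 小麦粉 (ratio 0.67), so wheat is detected despite the OCR error; an exact string comparison would have missed it.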
Classification of Cavendish Banana Ripeness with CNN Method
Tjokorda Istri Agung Pandu Yuni Maharani; I Gusti Agung Indrawan; Gede Dana Pramitha; Christina Purnama Yanti; I Made Marthana Yusa
Indonesian Journal of Data and Science Vol. 6 No. 2 (2025): Indonesian Journal of Data and Science
Publisher : yocto brain

DOI: 10.56705/ijodas.v6i2.259

Abstract

Cavendish bananas are one of the most widely consumed tropical fruits in Indonesia due to their sweet taste and high nutritional content. However, as they ripen, their sugar content increases, which can be a problem for people with diabetes. To help them choose bananas at a suitable ripeness level, this study developed a Cavendish banana ripeness classification model using the ResNet50 Convolutional Neural Network (CNN) architecture. The banana images are divided into five ripeness categories: green, yellowish green, yellow, spotted yellow, and spotted brownish yellow. The model was trained with and without data augmentation, using two optimizers, Adam and SGD, and evaluated with k-fold cross-validation to obtain a reliable performance estimate. The ResNet50 model achieved the highest accuracy, 98%, when trained with data augmentation and the Adam optimizer at a learning rate of 0.0001.
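For readers comparing the two optimizers, the sketch below writes out a single Adam update (with the defaults from the original Adam paper: beta1 = 0.9, beta2 = 0.999, eps = 1e-8) next to plain SGD, using the abstract's 0.0001 learning rate as the default. It is a scalar NumPy illustration on a toy quadratic, not the study's training code; the toy loop uses a larger learning rate so that it converges in a modest number of steps.

```python
import numpy as np


def adam_step(param, grad, m, v, t, lr=1e-4, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; m and v are running moment estimates, t is the 1-based step."""
    m = beta1 * m + (1 - beta1) * grad            # first moment (momentum)
    v = beta2 * v + (1 - beta2) * grad ** 2       # second moment (squared gradients)
    m_hat = m / (1 - beta1 ** t)                  # bias correction
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v


def sgd_step(param, grad, lr=1e-4):
    """Plain stochastic gradient descent update."""
    return param - lr * grad


# Toy problem: minimize f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w, m, v = 0.0, 0.0, 0.0
for t in range(1, 1001):
    grad = 2 * (w - 3.0)
    w, m, v = adam_step(w, grad, m, v, t, lr=0.1)
print(round(w, 3))
```

Adam rescales each step by the running gradient magnitude, so its first step moves roughly `lr` regardless of gradient size, whereas SGD's step is proportional to the raw gradient; this is one reason the two optimizers can prefer different learning rates.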
Hybrid CNN-LSTM and Cox Model for Bipolar Risk Analysis Using Social Media Data
Amanda, Rizki; Aulia, Jasmine
Indonesian Journal of Data and Science Vol. 6 No. 2 (2025): Indonesian Journal of Data and Science
Publisher : yocto brain

DOI: 10.56705/ijodas.v6i2.265

Abstract

Introduction: Mental disorders such as bipolar disorder are becoming increasingly prominent, particularly with the rise of emotional expression on social media. Early detection remains a significant challenge due to the lack of non-invasive, real-time assessment methods. Methods: This study proposes a hybrid deep learning approach combining a Convolutional Neural Network–Long Short-Term Memory (CNN-LSTM) model and the Cox Proportional Hazards (Cox PH) model to analyze the risk and timing of bipolar disorder onset. A dataset of 3,511 tweets from 517 Twitter users was collected. The CNN-LSTM model classified bipolar risk levels from text data, while the Cox PH model estimated the time to a high-risk condition using behavioral features and the predicted risk labels. Results: The hybrid model demonstrated strong predictive performance. The risk label significantly influenced the time to a high-risk condition (hazard ratio = 5.39, p < 0.005). The model achieved a concordance index (C-index) of 0.816, indicating good discrimination in survival prediction. Conclusions: This case study highlights the potential of integrating deep learning and survival analysis for early bipolar disorder detection using social media data. The proposed non-invasive method can support mental health monitoring, while raising awareness of ethical and privacy considerations.
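The C-index measures how often the model ranks pairs of users correctly: among comparable pairs, the user who reaches the high-risk condition earlier should have the higher predicted risk. A minimal plain-Python version of Harrell's C is sketched below. It skips pairs with tied times for simplicity, and the example data are invented for illustration, not taken from the study.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C. A pair (i, j) is comparable when i had an observed event
    and times[i] < times[j]; it is concordant when i also has the higher risk.
    Ties in risk count as half-concordant; tied times are skipped."""
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        if not events[i]:              # censored subjects cannot anchor a pair
            continue
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / comparable if comparable else float("nan")


# Invented example: days until high-risk condition, event flag (0 = censored), predicted risk.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risk = [0.9, 0.7, 0.8, 0.1]
print(concordance_index(times, events, risk))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the reported 0.816 should be read.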
Classification of Bougainvillea Flower Varieties Using Variant of CNN: ResNet50
I Gede Agung Chandra Wijaya; I Gusti Agung Indrawan; I Nyoman Anom Fajaraditya; Ayu Gede Wildahlia; Ida Bagus Ary Indra Iswara
Indonesian Journal of Data and Science Vol. 6 No. 2 (2025): Indonesian Journal of Data and Science
Publisher : yocto brain

DOI: 10.56705/ijodas.v6i2.266

Abstract

Bougainvillea is a tropical ornamental plant renowned for its vibrant colors and variety of cultivars, yet classifying its varieties remains challenging due to their morphological similarities. This study aims to develop an automated classification system using the ResNet50 deep learning architecture to identify Bougainvillea flower varieties from images. The dataset consists of 700 images from seven distinct classes, captured under natural lighting using a smartphone camera. The research process includes image preprocessing (resizing to 224x224 pixels), geometric data augmentation to increase dataset diversity, and evaluation using K-Fold Cross Validation to ensure robust model assessment. The model was trained using transfer learning, and its performance on augmented and non-augmented datasets was compared using accuracy, precision, recall, and F1-score. The results show that augmentation significantly improved the model's performance, achieving an average accuracy of 99.67% on augmented data compared to 93.39% on non-augmented data. The augmented model also exhibited greater consistency across folds, with several folds achieving perfect scores. These findings highlight that combining ResNet50 with transfer learning and image augmentation produces a highly accurate and reliable Bougainvillea classification system. This research contributes to the field of AI-based plant phenotyping and lays the groundwork for future applications in horticulture, biodiversity conservation, and education. Future work should explore larger and more diverse datasets, investigate advanced architectures such as EfficientNet or Vision Transformers, and build real-time mobile classification tools for practical field use.
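K-Fold Cross Validation rotates which part of the data is held out, so every image is used for validation exactly once and the reported accuracy is an average over folds. The abstract does not state k, so the plain-Python sketch below assumes a hypothetical k = 5 for the 700 images (140 per fold); the function name and seed are illustrative only.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Yield (train_idx, val_idx) for k folds over n shuffled samples;
    fold sizes differ by at most one."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    fold_sizes = [n // k + (1 if f < n % k else 0) for f in range(k)]
    start = 0
    for size in fold_sizes:
        val = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        yield train, val
        start += size


for fold, (train, val) in enumerate(k_fold_indices(700, 5), start=1):
    print(f"fold {fold}: train={len(train)}, val={len(val)}")
```

For a seven-class dataset, a stratified variant (splitting each class separately) would keep the class balance identical across folds; the simple version above only guarantees equal fold sizes.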
Classification of Organic and Inorganic Waste Using ResNet50
Qinantha, I Kadek Mahesa Chandra; Indrawan, I Gusti Agung; Putra, I Putu Satria Udyana; Aristamy, I Gusti Ayu Agung Mas; Willdahlia, Ayu Gede
Indonesian Journal of Data and Science Vol. 6 No. 2 (2025): Indonesian Journal of Data and Science
Publisher : yocto brain

DOI: 10.56705/ijodas.v6i2.267

Abstract

Waste generation from organic and inorganic sources has become a growing environmental issue, especially in culturally unique regions like Bali, where traditional offerings contribute to organic waste volumes. Despite regulations such as Gianyar Regency Regulation No. 76 of 2023 mandating source-level separation, implementation on the ground remains inconsistent due to low public awareness and operational limitations. This study addresses the challenge by developing an automated image-based classification system using the ResNet50 deep learning architecture to distinguish between organic and inorganic waste. A total of 200 images (100 per class) were collected using smartphone cameras, and the dataset was expanded to 1,400 images through geometric data augmentation techniques such as rotation, flipping, and zooming. Images were resized to 224x224 pixels, and the model was evaluated using K-Fold Cross Validation to ensure stability. The model was trained using transfer learning and tested under two conditions, with and without augmentation, while tuning hyperparameters such as the learning rate (0.0001 and 0.00001) and the optimizer (Adam and SGD). The results demonstrate that augmentation significantly enhanced model performance: the augmented model achieved an average accuracy of 99.25%, precision of 99.32%, recall of 99.25%, and F1-score of 99.25%, compared to 89.88% accuracy for the non-augmented model. These findings confirm that ResNet50, combined with geometric augmentation and proper preprocessing, offers a robust, accurate, and scalable solution for waste classification. This research contributes to the advancement of AI-driven environmental technologies and offers a potential framework for smart waste management systems, with future directions including real-time deployment, multi-class classification, and expansion to more diverse, real-world datasets.
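The growth from 200 to 1,400 images is a factor of seven, which is consistent with keeping each original and adding six augmented variants. The NumPy sketch below shows one hypothetical set of six geometric transforms (two flips, three 90-degree rotations, and a center zoom) that yields that factor. The abstract names rotation, flipping, and zooming but not their exact parameters, so the transform list and the 1.2 zoom factor are assumptions.

```python
import numpy as np


def center_zoom(img, factor=1.2):
    """Zoom in by cropping the center and resizing back with nearest-neighbour sampling."""
    h, w = img.shape[:2]
    ch, cw = int(round(h / factor)), int(round(w / factor))
    top, left = (h - ch) // 2, (w - cw) // 2
    crop = img[top:top + ch, left:left + cw]
    rows = np.arange(h) * ch // h          # map each output row to a crop row
    cols = np.arange(w) * cw // w          # map each output column to a crop column
    return crop[rows][:, cols]


def augment(img):
    """Return the original plus six geometric variants (7 images in total)."""
    return [
        img,
        np.fliplr(img),       # horizontal flip
        np.flipud(img),       # vertical flip
        np.rot90(img, 1),     # 90-degree rotation
        np.rot90(img, 2),     # 180-degree rotation
        np.rot90(img, 3),     # 270-degree rotation
        center_zoom(img),     # zoom
    ]


rng = np.random.default_rng(0)
sample = rng.random((224, 224, 3))         # placeholder 224x224 RGB image
variants = augment(sample)
print(len(variants), 200 * len(variants))
```

Because the images are square (224x224), the 90-degree rotations keep the input shape unchanged; augmented copies of the same photo should also stay in the same cross-validation fold to avoid leakage between training and validation.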