Found 2 Documents

Image Preprocessing Approaches Toward Better Learning Performance with CNN
Tribuana, Dhimas; Hazriani; Arda, Abdul Latief
Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) Vol 8 No 1 (2024): February 2024
Publisher : Ikatan Ahli Informatika Indonesia (IAII)

DOI: 10.29207/resti.v8i1.5417

Abstract

Convolutional neural networks (CNNs) are at the forefront of computer vision, and their performance depends heavily on the quality of the input data, which is determined by the preprocessing method. An unsuitable preprocessing approach results in poor learning performance. This study critically examines the impact of advanced image preprocessing techniques on CNNs in facial recognition. Emphasizing the importance of data quality, we explore several preprocessing approaches, including noise reduction, histogram equalization, and image hashing. Our methodology involves feature visualization to improve facial feature discernment, training convergence analysis, and real-time model testing. The results demonstrate significant improvements in model performance with the preprocessed dataset: average accuracy, recall, precision, and F1 score gains of 4.17%, 3.45%, 3.45%, and 3.81%, respectively. Additionally, real-time testing shows a 21% performance increase and a 1.41% reduction in computing time. This study not only underscores the effectiveness of preprocessing in boosting CNN capabilities but also opens avenues for future research applying these methods to diverse image types and exploring various CNN architectures for a more comprehensive understanding.
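The abstract does not publish the authors' pipeline; as one illustration of the histogram-equalization step it mentions, here is a minimal NumPy sketch for an 8-bit grayscale image (the function name and array conventions are assumptions, not the paper's code):

```python
import numpy as np

def equalize_histogram(img: np.ndarray) -> np.ndarray:
    """Stretch contrast by mapping intensities through the normalized CDF."""
    # Per-intensity pixel counts for an 8-bit image.
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]  # first non-zero CDF value
    # Build a lookup table spreading the occupied range over [0, 255].
    lut = np.round((cdf - cdf_min) / (cdf[-1] - cdf_min) * 255).astype(np.uint8)
    return lut[img]

# A low-contrast patch gets stretched to the full intensity range.
img = np.array([[100, 100], [101, 102]], dtype=np.uint8)
out = equalize_histogram(img)
```

This assumes the image is not constant (otherwise the normalizing denominator is zero); a production version would guard that case.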
Information Extraction from Makassar Culinary Images Using Vision Transformers and Cahya GPT-2 (Visual Question Answering Case Study)
Sharief, Tirta Chiantalia; Hazriani; Syamsul; Anas; Yuyun
Indonesian Journal of Data and Science Vol. 6 No. 3 (2025): Indonesian Journal of Data and Science
Publisher : yocto brain

DOI: 10.56705/ijodas.v6i3.357

Abstract

This study examines the development of a Visual Question Answering (VQA) system to extract information from images of Makassar culinary specialties by combining the Vision Transformer (ViT) and Cahya_GPT-2 models. The main objective is to integrate visual and natural language understanding so that computers can recognize visual objects (food images) and generate relevant text descriptions. The research method uses an experimental approach, fine-tuning the pre-trained ViT model as a visual encoder and Cahya_GPT-2 as a text decoder. The dataset comprises images of Makassar culinary specialties such as Coto, Konro, Pisang Epe, Barongko, and Jalangkote with question-and-answer (QnA) annotations. Evaluation uses the ROUGE metric to assess the semantic match between the model's answers and the ground-truth answers. The results show that the developed multimodal model accurately understands the image context, with an average ROUGE-L score of 0.63, indicating good closeness between the model's answers and the annotations. In conclusion, the combination of ViT and Cahya_GPT-2 can be an effective approach for natural language-based visual information extraction, especially in the Indonesian local culinary domain.
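ROUGE-L, the metric the abstract reports, scores a candidate answer against a reference by their longest common subsequence (LCS). A minimal token-level sketch of the F-measure variant (the paper's exact tokenization and averaging scheme are not specified, so this is only illustrative):

```python
def rouge_l(candidate: list[str], reference: list[str]) -> float:
    """ROUGE-L F1 between two token sequences via LCS dynamic programming."""
    m, n = len(candidate), len(reference)
    # dp[i][j] = LCS length of candidate[:i] and reference[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            if candidate[i] == reference[j]:
                dp[i + 1][j + 1] = dp[i][j] + 1
            else:
                dp[i + 1][j + 1] = max(dp[i][j + 1], dp[i + 1][j])
    lcs = dp[m][n]
    if lcs == 0:
        return 0.0
    precision = lcs / m
    recall = lcs / n
    return 2 * precision * recall / (precision + recall)

# Identical answers score 1.0; partial overlap scores proportionally lower.
score = rouge_l("coto makassar beef soup".split(), "coto makassar beef soup".split())
```

An average of such per-sample scores over the QnA test set would yield a summary figure like the 0.63 reported above.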