Found 2 Documents

Mono Background and Multi Background Datasets Comparison Study for Indonesian Sign Language (SIBI) Letters Detection using YOLOv8
Andriyanto, Teguh; Handayani, Anik Nur; Ar Rosyid, Harits; Wiryawan, Muhammad Zaki; Azizah, Desi Fatkhi; Liang, Yeoh Wen
JOIV : International Journal on Informatics Visualization Vol 9, No 5 (2025)
Publisher : Society of Visual Informatics

DOI: 10.62527/joiv.9.5.3462

Abstract

The research in this paper focuses on the detection of Indonesian Sign Language System (SIBI) letters using the YOLOv8 object detection model. The study compares two datasets: one with a mono background (a simple, uniform background) and another with multi backgrounds (complex and varied backgrounds). The research aims to evaluate how the complexity of image backgrounds affects the performance of the YOLOv8 model in detecting SIBI letters. The study uses a dataset consisting of 24 SIBI letters (excluding J and Z due to the complexity of their gestures), sourced from Mendeley. The dataset was processed with and without data augmentation (rotation, brightness adjustment, blur, and noise) to test the model under various conditions. Four models were trained and tested: one using mono-background images, another using augmented mono-background images, a third using multi-background images, and a final model trained on augmented multi-background images. The results showed that the YOLOv8 model performed best with the multi-background dataset, achieving a precision of 0.995, a recall of 1.000, an F1 score of 0.997, and an mAP50 of 0.994. Adding augmentation made the model better at generalizing, but it increased training time. The study finds that multi-background datasets with augmentation make the model substantially better at detecting SIBI letters in real-world settings, making it a promising tool for projects that aim to improve communication for deaf people in Indonesia. The study suggests further research on augmentation techniques and larger datasets to improve detection accuracy.
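
As a rough illustration of the pipeline this abstract describes, the sketch below fine-tunes and validates a YOLOv8 model with the Ultralytics Python package. The dataset config name, hyperparameter values, and augmentation settings are assumptions for illustration only, not the authors' actual configuration; blur and noise augmentation would typically be applied when preparing the dataset itself.

```python
# Minimal sketch of a YOLOv8 train/evaluate cycle for SIBI letter detection.
# Assumes the Ultralytics package (pip install ultralytics) and a hypothetical
# dataset config "sibi_multi_bg.yaml" listing the 24 SIBI letter classes.
from ultralytics import YOLO

# Start from a pretrained YOLOv8 checkpoint and fine-tune on the SIBI dataset.
model = YOLO("yolov8n.pt")
model.train(
    data="sibi_multi_bg.yaml",  # hypothetical multi-background dataset config
    epochs=100,
    imgsz=640,
    degrees=15.0,   # random rotation, one of the augmentations mentioned
    hsv_v=0.4,      # brightness/value jitter
)

# Validate and report detection metrics comparable to those in the abstract.
metrics = model.val()
p, r = metrics.box.mp, metrics.box.mr           # mean precision / recall
f1 = 2 * p * r / (p + r)                        # e.g. 2*0.995*1.0/1.995 ≈ 0.997
print(f"precision={p:.3f} recall={r:.3f} F1={f1:.3f} mAP50={metrics.box.map50:.3f}")
```
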
Comparative Analysis of Speech-to-Text APIs for Supporting Communication of the Deaf Community
Handayani, Anik Nur; Hariyono, Hariyono; Nasih, Ahmad Munjin; Rochmawati, Rochmawati; Hitipeuw, Imanuel; Ar Rosyid, Harits; Ardiansah, Jevri Tri; Praja, Rafli Indar; Nurdiansyah, Ahmad; Azizah, Desi Fatkhi
Indonesian Journal of Data and Science Vol. 6 No. 3 (2025): Indonesian Journal of Data and Science
Publisher : yocto brain

DOI: 10.56705/ijodas.v6i3.327

Abstract

Hearing impairment can have a profound impact on the mental and emotional state of those who experience it, as well as hinder communication and delay direct access to information that would otherwise rely on interpreters. Assistive technology has advanced, particularly speech recognition systems that convert spoken language into written text (speech-to-text). However, implementation faces challenges related to the accuracy of each speech-to-text Application Programming Interface (API), which calls for an appropriate deep learning model. This study analyzes and compares the performance of speech-to-text API services (Deepgram API, Google API, and Whisper AI) based on Word Error Rate (WER) and Words Per Minute (WPM) to determine the most suitable API for a web-based real-time transcription system built with JavaScript and hosted on Glitch.com. The three API services were tested by calculating their error rates and transcription speeds, then evaluated on how low the error rate and how high the transcription speed were. On average, Whisper AI achieved a WER of 0% across all word categories, but its speed was lower than that of the other two APIs. Deepgram API displayed the best balance between accuracy and speed, with an average WER of 13.78% and 67 WPM. Google API performed stably, but its WER was slightly higher than Deepgram API's. In conclusion, based on the results, Deepgram API was deemed the most suitable for live transcription, as it produces fast transcriptions with comparatively low error rates, significantly increasing the accessibility of information for the deaf community.
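
For context on the two metrics the study uses, the sketch below shows one common way to compute WER (word-level edit distance divided by reference length) and WPM (transcribed words per elapsed minute). The function names and sample strings are illustrative and not taken from the paper.

```python
# Minimal sketch of the WER and WPM calculations used to benchmark
# speech-to-text output; function names and sample data are illustrative.

def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # Classic dynamic-programming edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

def wpm(hypothesis: str, seconds: float) -> float:
    """Words Per Minute: transcribed words divided by elapsed minutes."""
    return len(hypothesis.split()) / (seconds / 60)

# Example: compare a reference transcript against an API's output.
ref = "the quick brown fox jumps over the lazy dog"
hyp = "the quick brown fox jumped over the lazy dog"
print(f"WER = {wer(ref, hyp):.2%}")   # one substitution in nine words ≈ 11.11%
print(f"WPM = {wpm(hyp, 8.0):.0f}")   # 9 words transcribed in 8 s ≈ 68 WPM
```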