Articles

Maleo-Short: An "In-the-Wild" Indonesian Dataset for Speaker Diarization
Mardiana, Ardi; Muslimah, Dinda Desmonda; Bastian, Ade; Irawan, Eka Tresna
JOIN (Jurnal Online Informatika) Vol 11 No 1 (2026)
Publisher : Department of Informatics, UIN Sunan Gunung Djati Bandung

DOI: 10.15575/join.v11i1.1781

Abstract

Speaker diarization (SD), the task of partitioning an audio stream into speaker-homogeneous segments, is fundamental for analyzing multi-speaker recordings. Its application to “in-the-wild” data, such as content from the YouTube platform, poses significant challenges, including overlapped speech, ambient noise, and rapid speaker turns, thereby constituting an active research area. While numerous SD datasets are available, they predominantly focus on English and other high-resource languages. A notable scarcity of publicly accessible datasets exists for the Indonesian language, as extant corpora are primarily engineered for Automatic Speech Recognition (ASR). To address this resource deficit, this research introduces Maleo-Short, a new Indonesian multi-speaker dataset derived from YouTube. The dataset comprises 110 short conversational clips, with a total duration of 1 hour 32 minutes. A reliable ground truth was established through a meticulous manual annotation process using ELAN to generate precise speaker segmentation and transcription files. To validate its utility and assess its complexity, the dataset was evaluated using pre-trained baseline models. The empirical results confirm its status as a challenging benchmark, with the most effective models achieving a Diarization Error Rate (DER) of 32.64% and a Word Error Rate (WER) of 33.78%. Maleo-Short is presented as a valuable, publicly accessible resource intended to catalyze advancements in Indonesian speaker diarization research by facilitating the development and rigorous evaluation of SD systems on acoustically complex and realistic conversational data. Maleo-Short is available at https://doi.org/10.57967/hf/7944.
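
The abstract above reports a Word Error Rate without spelling out how the metric is computed. As a rough illustration, WER is commonly defined as the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and a system hypothesis, divided by the number of reference words. The sketch below assumes simple whitespace tokenization; the function name `word_error_rate` and the example sentences are hypothetical, and the paper's own scoring tool and text normalization are not described in the abstract.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: words are separated by whitespace and compared exactly;
    real evaluations usually normalize case and punctuation first.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One deleted word out of four reference words.
    print(word_error_rate("saya pergi ke pasar", "saya pergi pasar"))
```

Because insertions count as errors while the denominator is fixed by the reference length, WER can exceed 100% when the hypothesis is much longer than the reference.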

Global Sentiment Analysis of Video Surveillance Technology Using BERT
Syifaa Puspita Rahayu; Ade Bastian; Dadan Zaliluddin; Ii Sopiandi; Ardi Mardiana
Journal of Applied Information System and Informatic (JAISI) Vol 4, No 1 (2026): May 2026
Publisher : Department Information System, Siliwangi University

DOI: 10.37058/jaisi.v4i1.17249

Abstract

The significant imbalance between population growth and the availability of video surveillance systems in several major cities across different countries has raised concerns regarding the effectiveness of public space monitoring. As urban areas become increasingly dense and complex, CCTV is expected to function not only as a security support tool but also as part of a broader strategy for crime prevention, public safety, and urban management. However, the implementation of video surveillance also generates diverse public opinions, especially on social media platforms such as Twitter (X), where users actively express their views, support, concerns, and criticism. This study aims to identify public sentiment trends toward video surveillance and evaluate the performance of the BERT (Bidirectional Encoder Representations from Transformers) model in classifying these sentiments. The research uses social media data collected from various countries and applies BERT as a contextual natural language processing model. The findings show that most users expressed positive sentiments toward video surveillance, indicating that CCTV is generally perceived as beneficial for improving security and monitoring public spaces. In terms of model performance, BERT achieved its highest accuracy of 0.85 during the third trial at epoch 15. However, a slight decrease in accuracy occurred at epoch 20, indicating the possibility of overfitting when the model was trained for too long. These findings suggest that BERT is effective in capturing public opinion contextually and can be used as a valuable analytical tool to support evidence-based decision-making related to surveillance technology implementation in urban environments.

Maleo Emotion Audio Dataset Indonesia for Emotion Classification
Ardi Mardiana; Sri Mentari Widya Ningrum Permana; Ii Sopiandi; Ade Bastian; Eka Tresna Irawan
Kinetik: Game Technology, Information System, Computer Network, Computing, Electronics, and Control Vol. 11, No. 2, May 2026
Publisher : Universitas Muhammadiyah Malang

DOI: 10.22219/kinetik.v11i2.2474

Abstract

The limited availability of voice emotion corpora in Indonesian poses a challenge for the development of Speech Emotion Recognition (SER) systems, despite growing needs in sectors such as customer service and human-computer interaction. To address this, we developed the Maleo Emotion Audio Corpus, a collection of three-second audio clips with seven emotion labels (angry, neutral, disgusted, sad, happy, afraid, and surprised), sourced from YouTube. The audio data underwent preprocessing, feature extraction (MFCC, ZCR, energy, spectral roll-off, and spectral flux), and augmentation. The classification model was built using a 1D Convolutional Neural Network (CNN) architecture specifically adapted for the 3-second audio features, comprising four convolutional layers. Evaluation showed the model achieved 94.48% accuracy on the test data. Performance was balanced across classes, with F1-scores ranging from 0.87 for 'sad' to 0.98 for 'neutral', indicating that no single class dominated the results. These findings demonstrate that the developed corpus and model architecture have strong capability for recognizing emotions from Indonesian speech in a locally relevant context. The Maleo Emotion collection is available at https://doi.org/10.57967/hf/6144.
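
Two of the features listed in the abstract above, zero-crossing rate (ZCR) and energy, are simple enough to illustrate directly. The sketch below computes both per frame for a three-second clip using only the Python standard library. The 16 kHz sample rate, the 400-sample frame, the 160-sample hop, and all function names are assumptions made for illustration; the abstract does not state the authors' settings, and MFCC, spectral roll-off, and spectral flux are omitted here.

```python
import math
from typing import List, Sequence


def frame_signal(samples: Sequence[float], frame_length: int,
                 hop_length: int) -> List[Sequence[float]]:
    """Split a signal into overlapping frames, dropping any incomplete tail."""
    last_start = len(samples) - frame_length
    return [samples[start:start + frame_length]
            for start in range(0, last_start + 1, hop_length)]


def zero_crossing_rate(frame: Sequence[float]) -> float:
    """Fraction of adjacent sample pairs whose signs differ (zero counts as positive)."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def short_time_energy(frame: Sequence[float]) -> float:
    """Mean squared amplitude of the frame."""
    return sum(x * x for x in frame) / len(frame)


if __name__ == "__main__":
    sample_rate = 16_000   # assumed, not taken from the paper
    duration_s = 3         # clip length stated in the abstract
    # Synthetic stand-in for a clip: a 100 Hz sine wave with amplitude 1.
    clip = [math.sin(2 * math.pi * 100 * n / sample_rate)
            for n in range(sample_rate * duration_s)]

    frames = frame_signal(clip, frame_length=400, hop_length=160)
    zcr = [zero_crossing_rate(f) for f in frames]
    energy = [short_time_energy(f) for f in frames]
    print(len(frames), sum(zcr) / len(zcr), sum(energy) / len(energy))
```

Per-frame values like these form a sequence along the time axis, which is the kind of input a 1D CNN can convolve over; how the authors actually stack or summarize their features is not specified in the abstract.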