Articles

Maleo-Short: An "In-the-Wild" Indonesian Dataset for Speaker Diarization
Mardiana, Ardi; Muslimah, Dinda Desmonda; Bastian, Ade; Irawan, Eka Tresna
JOIN (Jurnal Online Informatika) Vol 11 No 1 (2026)
Publisher : Department of Informatics, UIN Sunan Gunung Djati Bandung

DOI: 10.15575/join.v11i1.1781

Abstract

Speaker diarization (SD), the task of partitioning an audio stream into speaker-homogeneous segments, is fundamental for analyzing multi-speaker recordings. Its application to “in-the-wild” data, such as content from the YouTube platform, poses significant challenges, including overlapped speech, ambient noise, and rapid speaker turns, thereby constituting an active research area. While numerous SD datasets are available, they predominantly focus on English and other high-resource languages. A notable scarcity of publicly accessible datasets exists for the Indonesian language, as extant corpora are primarily engineered for Automatic Speech Recognition (ASR). To address this resource deficit, this research introduces Maleo-Short, a new Indonesian multi-speaker dataset derived from YouTube. The dataset comprises 110 short conversational clips, with a total duration of 1 hour 32 minutes. A reliable ground truth was established through a meticulous manual annotation process using ELAN to generate precise speaker segmentation and transcription files. To validate its utility and assess its complexity, the dataset was evaluated using pre-trained baseline models. The empirical results confirm its status as a challenging benchmark, with the most effective models achieving a Diarization Error Rate (DER) of 32.64% and a Word Error Rate (WER) of 33.78%. Maleo-Short is presented as a valuable, publicly accessible resource intended to catalyze advancements in Indonesian speaker diarization research by facilitating the development and rigorous evaluation of SD systems on acoustically complex and realistic conversational data. Maleo-Short is available at https://doi.org/10.57967/hf/7944.
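The two metrics reported in the abstract can be illustrated with a minimal sketch. The abstract does not say which scoring tools, collar, or text normalization were used, so the function names and the plain whitespace tokenization below are assumptions for illustration only. WER is computed as word-level edit distance divided by the reference length, and DER as the sum of missed speech, false alarm, and speaker confusion time divided by the total reference speech time.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count.

    Assumption: words are split on whitespace with no further normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def diarization_error_rate(missed: float, false_alarm: float,
                           confusion: float, total_speech: float) -> float:
    """DER from error durations in seconds (hypothetical helper).

    The caller is assumed to have already aligned reference and
    hypothesis speaker labels and measured each error duration.
    """
    if total_speech <= 0:
        raise ValueError("total_speech must be positive")
    return (missed + false_alarm + confusion) / total_speech


if __name__ == "__main__":
    # One deleted word out of four reference words.
    print(word_error_rate("saya pergi ke pasar", "saya pergi pasar"))
    # 6 seconds of errors over 60 seconds of reference speech.
    print(diarization_error_rate(2.0, 1.0, 3.0, 60.0))
```

In practice, the speaker-label alignment that DER depends on is usually delegated to an established scoring toolkit; the helper above only shows the final ratio.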

Maleo Emotion Audio Dataset Indonesia For Emotion Classification
Mardiana, Ardi; Permana, Sri Mentari Widya Ningrum; Ii Sopiandi; Ade Bastian; Irawan, Eka Tresna
Kinetik: Game Technology, Information System, Computer Network, Computing, Electronics, and Control Vol. 11, No. 2, May 2026 (Article in Progress)
Publisher : Universitas Muhammadiyah Malang

DOI: 10.22219/kinetik.v11i2.2474

Abstract

The limited availability of Indonesian voice emotion datasets poses a challenge for the development of Speech Emotion Recognition (SER) systems, even though the need for such systems continues to grow in sectors such as customer service, education, and human-computer interaction. To address this challenge, this study developed the Maleo Emotion Audio Dataset, a collection of three-second audio clips labeled with seven emotion categories: angry, neutral, disgusted, sad, happy, afraid, and surprised. The data was collected from the YouTube platform and processed through preprocessing, feature extraction, and augmentation stages. The Maleo Emotion Dataset is available at https://huggingface.co/datasets/maleo-ai/maleo-emotion. The five main features extracted are Zero Crossing Rate, energy, Mel-Frequency Cepstral Coefficients (MFCC), spectral roll-off, and spectral flux. To enhance generalization, augmentation techniques such as pitch shifting, noise injection, and time stretching were applied. The classification model was built using a Convolutional Neural Network (CNN) architecture implemented in TensorFlow. Evaluation showed that the model achieved 94.48% accuracy on the test data, with balanced performance across all emotion categories. These results demonstrate that the developed dataset and model architecture can effectively recognize emotions from Indonesian speech in a locally relevant context.
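Two of the five features named in the abstract, Zero Crossing Rate and energy, are simple enough to sketch in plain Python. The abstract does not give frame sizes, hop lengths, or the extraction library used, so the function names and parameters below are assumptions for illustration and not the authors' pipeline.

```python
import math


def frame_signal(signal, frame_length, hop_length):
    """Split a sequence of samples into overlapping fixed-length frames.

    Trailing samples that do not fill a whole frame are dropped.
    """
    last_start = len(signal) - frame_length
    return [signal[start:start + frame_length]
            for start in range(0, last_start + 1, hop_length)]


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(frame, frame[1:])
                    if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def short_time_energy(frame):
    """Mean squared amplitude of the frame."""
    return sum(x * x for x in frame) / len(frame)


if __name__ == "__main__":
    # Synthetic stand-in for an audio clip: a 100 Hz sine at 8 kHz, 1 second.
    sample_rate = 8000
    signal = [math.sin(2 * math.pi * 100 * n / sample_rate)
              for n in range(sample_rate)]

    # Assumed framing: 400-sample frames with a 200-sample hop.
    frames = frame_signal(signal, frame_length=400, hop_length=200)
    features = [(zero_crossing_rate(f), short_time_energy(f)) for f in frames]
    print(len(features), features[0])
```

Per-frame values like these are typically stacked with the MFCC and spectral features into a single matrix before being passed to a classifier.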