Speaker diarization (SD), the task of partitioning an audio stream into speaker-homogeneous segments, is fundamental to analyzing multi-speaker recordings. Applying SD to “in-the-wild” data, such as content from the YouTube platform, is difficult because of overlapped speech, ambient noise, and rapid speaker turns, and it remains an active research area. Numerous SD datasets are available, but they focus predominantly on English and other high-resource languages. Publicly accessible SD datasets for Indonesian are notably scarce, since existing Indonesian corpora are designed primarily for Automatic Speech Recognition (ASR). To address this gap, this research introduces Maleo-Short, a new Indonesian multi-speaker dataset derived from YouTube. The dataset comprises 110 short conversational clips with a total duration of 1 hour 32 minutes. Reliable ground truth was established through careful manual annotation in ELAN, producing precise speaker segmentation and transcription files. To validate the dataset's utility and assess its difficulty, it was evaluated with pre-trained baseline models. The results confirm that Maleo-Short is a challenging benchmark: the best-performing models achieve a Diarization Error Rate (DER) of 32.64% and a Word Error Rate (WER) of 33.78%. Maleo-Short is a publicly accessible resource intended to advance Indonesian speaker diarization research by supporting the development and rigorous evaluation of SD systems on acoustically complex, realistic conversational data. Maleo-Short is available at https://doi.org/10.57967/hf/7944.