Journal of Applied Data Sciences
Vol 6, No 3: September 2025

Multi-Label Classification of Indonesian Voice Phishing Conversations: A Comparative Study of XLM-RoBERTa and ELECTRA

Hidayat, Ahmad
Madenda, Sarifuddin
Hustinawaty, Hustinawaty



Article Info

Publish Date
20 Jul 2025

Abstract

Mobile phones have become a primary means of communication, yet cybercriminals increasingly exploit them, particularly through voice phishing. Voice phishing is a form of social engineering fraud carried out over telephone conversations to illegally obtain personal or financial information. Its complexity continues to increase: a single conversation may involve several fraudulent schemes at once, so multi-label classification is needed to identify all fraud motives. Previous studies have predominantly used single-label approaches and foreign-language data, making them less relevant to the Indonesian language context, and they do not produce speaker-segmented output for conversational analysis. To address this gap, this study develops a multi-label voice phishing classification system specifically for Indonesian telephone conversations. Audio data were collected from open sources and simulated recordings, yielding 300 samples labeled with six categories: five phishing modes and one non-phishing category. The proposed system includes a preprocessing pipeline of noise reduction, speaker segmentation, automatic transcription, and text cleaning that preserves the context of two-way conversations. Two transformer-based models, XLM-RoBERTa and ELECTRA, are used to identify fraud schemes that may occur simultaneously within a single conversation. The dataset was split into training, validation, and testing sets using two split ratios, and several hyperparameter combinations were tested to find the best-performing configuration. Evaluation was conducted using a supervised learning approach and various performance metrics.
The experimental results show that XLM-RoBERTa achieved the highest average accuracy (97.04 ± 1.15%) and the highest average F1-score (92.66 ± 2.59%). These results support the use of multi-label classification for voice phishing detection in the Indonesian language context and contribute to more effective fraud identification in real-world telephony systems.
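
To make the multi-label setting concrete, the following is a minimal sketch of how per-label model scores might be turned into a set of labels for each conversation and then scored. It is an illustration under stated assumptions, not the authors' implementation: the label names, the 0.5 threshold, the example logits, and the choice of per-label accuracy and micro-averaged F1 are all hypothetical, since the abstract does not name the categories or define its metrics.

```python
import math

# Hypothetical label names: the abstract specifies five phishing modes and one
# non-phishing category but does not name them.
LABELS = ["mode_1", "mode_2", "mode_3", "mode_4", "mode_5", "non_phishing"]


def sigmoid(x):
    """Map a raw score (logit) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def predict_labels(logits, threshold=0.5):
    """Decide each label independently, so several can be active at once.

    The 0.5 threshold is an assumption made for this sketch.
    """
    return [1 if sigmoid(z) >= threshold else 0 for z in logits]


def label_accuracy(y_true, y_pred):
    """Fraction of individual label decisions that match the ground truth."""
    pairs = [(t, p) for tr, pr in zip(y_true, y_pred) for t, p in zip(tr, pr)]
    return sum(t == p for t, p in pairs) / len(pairs)


def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool true/false positives over all labels."""
    tp = fp = fn = 0
    for tr, pr in zip(y_true, y_pred):
        for t, p in zip(tr, pr):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Made-up logits for three conversations (one score per label).
logits = [
    [2.0, -1.5, 0.7, -3.0, -2.0, -4.0],   # two schemes in one call
    [-2.0, -2.5, -1.0, -3.0, -2.2, 3.1],  # non-phishing call
    [-0.3, 1.8, -2.0, 0.4, -1.0, -2.5],   # partly wrong prediction
]
y_true = [
    [1, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 1],
    [1, 1, 0, 0, 0, 0],
]
y_pred = [predict_labels(row) for row in logits]

for row in y_pred:
    print([name for name, flag in zip(LABELS, row) if flag])
print("label accuracy:", round(label_accuracy(y_true, y_pred), 4))
print("micro F1:", round(micro_f1(y_true, y_pred), 4))
```

Because each label is thresholded on its own sigmoid output rather than chosen by a single softmax, one conversation can receive several fraud labels at once, which is the property that distinguishes this setting from the single-label approaches mentioned above.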

Copyrights © 2025






Journal Info

Abbrev

JADS

Subject

Computer Science & IT; Control & Systems Engineering; Decision Sciences, Operations Research & Management

Description

One of the current hot topics in science is data: how can datasets be used in scientific and scholarly research in a more reliable, citable and accountable way? Data is of paramount importance to scientific progress, yet most research data remains private. Enhancing the transparency of the processes ...