Found 3 Documents

Comparison of LSTM, SVM, and naive bayes for classifying sexual harassment tweets
Lailatul Nikmah, Tiara; Ammar, Muhammad Zhafran; Allatif, Yusuf Ridwan; Husna, Rizki Mahjati Prie; Kurniasari, Putu Ayu; Bahri, Andi Syamsul
Journal of Soft Computing Exploration Vol. 3 No. 2 (2022): September 2022
Publisher : SHM Publisher

DOI: 10.52465/joscex.v3i2.85

Abstract

Twitter is an open and wide-reaching social media platform where anyone can freely express an opinion on any topic. Its content is diverse and largely unrestricted, and that openness is often misused, for example for verbal sexual harassment. This research aims to identify sexual harassment in Indonesian tweets through sentiment analysis with the LSTM, SVM, and naive Bayes methods, combined with text normalization. The study tested 2990 Indonesian-language tweets from 4 to 6 May 2022. In this data, tweets classified as sexual harassment outnumber those that are not, totaling 2026 tweets. With text normalization, LSTM reaches an accuracy of 84.62%, SVM 86.54%, and naive Bayes 85.45%. SVM with text normalization therefore achieves the highest accuracy of the three methods in classifying Indonesian sexual harassment tweets.
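The abstract does not say which text normalization steps were applied. The following is a minimal sketch of what such a step commonly looks like for informal Indonesian tweets, assuming lowercasing, removal of URLs, mentions, and hashtags, squeezing of elongated characters, and a slang lookup. The `SLANG` dictionary and the `normalize_tweet` function are hypothetical names introduced for illustration and are not taken from the paper.

```python
import re

# Hypothetical slang map for illustration only; the paper's actual
# normalization dictionary is not given in the abstract.
SLANG = {"yg": "yang", "gk": "tidak", "bgt": "banget"}


def normalize_tweet(text: str) -> str:
    """Return a cleaned, lowercased tweet with slang tokens expanded."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)   # drop URLs
    text = re.sub(r"[@#]\w+", " ", text)        # drop mentions and hashtags
    text = re.sub(r"(.)\1{2,}", r"\1", text)    # squeeze runs of 3+ identical chars
    text = re.sub(r"[^a-z0-9\s]", " ", text)    # drop punctuation and symbols
    tokens = [SLANG.get(tok, tok) for tok in text.split()]
    return " ".join(tokens)


if __name__ == "__main__":
    print(normalize_tweet("@user Cantikkkk bgt yg ini!!! https://t.co/abc"))
```

Squeezing only runs of three or more characters leaves legitimate double letters (as in "maaf") untouched, which is one reason this threshold is a common choice.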
Deep Learning-Based Detection of Online Gambling Promotion Spam in Indonesian YouTube Comments
Ammar, Muhammad Zhafran; Putra, Ricky Eka; Yamasari, Yuni
Journal of Applied Informatics and Computing Vol. 9 No. 6 (2025): December 2025
Publisher : Politeknik Negeri Batam

DOI: 10.30871/jaic.v9i6.11240

Abstract

Online gambling promotion has increasingly penetrated social media platforms, with YouTube comments becoming a frequent target for spam-based advertising. Such activities not only violate platform policies but also expose users to harmful content. Addressing this issue requires automated detection systems capable of handling noisy, informal, and highly imbalanced text data. This study investigates the effectiveness of four recurrent neural architectures (LSTM, GRU, BiLSTM, and BiGRU) for detecting gambling promotion comments in Indonesian YouTube data. To address class imbalance, four experimental scenarios were explored: the original distribution, undersampling, oversampling, and class weighting. Model performance was evaluated using accuracy, precision, recall, F1-score, ROC-AUC, and confusion matrix analysis. The results show that bidirectional models outperformed their unidirectional counterparts, with BiGRU achieving the best overall performance. When combined with class weighting, BiGRU reached 98% accuracy, an F1-score of 0.83, and a ROC-AUC of 0.971, demonstrating a superior ability to detect minority-class instances. Oversampling improved recall substantially but increased false positives, while undersampling reduced accuracy; class weighting provided the most balanced performance across metrics. These findings indicate that BiGRU with class weighting offers the most practical balance between accuracy, recall, and computational efficiency, making it well-suited for real-time moderation systems. The study provides a foundation for future research on transformer-based architectures and cross-platform spam detection in Indonesian social media environments.
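The abstract names class weighting but does not give the weighting formula. A minimal sketch, assuming the widely used inverse-frequency heuristic `n_samples / (n_classes * class_count)`, is shown below together with precision, recall, and F1 computed from confusion-matrix counts. The function names and the toy label distribution are illustrative assumptions, not the paper's data.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence, Tuple


def balanced_class_weights(labels: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Inverse-frequency weights: rarer classes receive larger weights."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Metrics for the positive class from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Toy data (assumption): 95 normal comments, 5 spam comments.
    labels = [0] * 95 + [1] * 5
    print(balanced_class_weights(labels))
    # A model that never predicts spam is 95% accurate on this toy set,
    # yet its F1 on the spam class is zero.
    print(precision_recall_f1(tp=0, fp=0, fn=5))
```

The toy case illustrates why the abstract reports F1 and ROC-AUC alongside accuracy: on imbalanced data, high accuracy alone says little about minority-class detection.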
Efficient hierarchical summarization of long legal documents using a lightweight transformer and divide and conquer strategy
Ammar, Muhammad Zhafran; Putra, Ricky Eka; Yamasari, Yuni
Journal of Soft Computing Exploration Vol. 7 No. 2 (2026): June 2026
Publisher : SHM Publisher

DOI: 10.52465/joscex.v7i2.5

Abstract

This research addresses the challenges of summarizing long and complex legal documents, which often exceed the input length limitations of transformer-based models and contain intricate legal reasoning structures. The purpose of this study is to develop an efficient and scalable summarization framework that preserves semantic fidelity and structural coherence in judicial summaries. To achieve this objective, a hybrid summarization pipeline is proposed by integrating a Bidirectional Encoder Representations from Transformers (BERT)-based extractive model with a hierarchical abstractive model based on Distilled Bidirectional and Auto-Regressive Transformers (DistilBART), combined with a Divide-and-Conquer strategy. The proposed method partitions long legal documents into smaller segments, processes each segment independently, and reconstructs them into a coherent final summary. Experiments were conducted on the Indian Legal Case Summarization dataset and evaluated using Recall-Oriented Understudy for Gisting Evaluation (ROUGE), BERTScore, and Cosine Similarity to assess both lexical overlap and semantic similarity. The results show that the hierarchical DistilBART model outperforms the extractive baseline, achieving a ROUGE-1 score of 0.3802 and a Cosine Similarity of 0.6917. These findings demonstrate that the proposed framework provides an effective solution for long-document summarization in the legal domain.
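The partition, summarize, and merge idea described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's pipeline: the `summarize` callable stands in for a real model such as DistilBART, segments are split by a simple word budget rather than by model tokens, and the recursive second pass over the merged summaries is an assumption the abstract does not confirm. All function names are hypothetical.

```python
from typing import Callable, List


def split_into_chunks(words: List[str], max_words: int) -> List[List[str]]:
    """Partition a word list into consecutive chunks of at most max_words."""
    return [words[i:i + max_words] for i in range(0, len(words), max_words)]


def summarize_long(text: str, summarize: Callable[[str], str], max_words: int) -> str:
    """Summarize text of any length with a summarizer limited to max_words."""
    words = text.split()
    if len(words) <= max_words:
        return summarize(text)  # base case: input fits the budget
    # Divide: summarize each segment independently.
    partials = [summarize(" ".join(chunk))
                for chunk in split_into_chunks(words, max_words)]
    merged = " ".join(partials)
    # Conquer: recurse on the merged summaries, but only if they actually
    # shrank, so a non-compressing summarizer cannot loop forever.
    if len(merged.split()) < len(words):
        return summarize_long(merged, summarize, max_words)
    return merged


def first_five_words(text: str) -> str:
    """Stand-in summarizer (assumption): keeps the first five words."""
    return " ".join(text.split()[:5])


if __name__ == "__main__":
    document = " ".join(f"w{i}" for i in range(100))
    print(summarize_long(document, first_five_words, max_words=20))
```

In practice the stand-in would be replaced by a call to an actual summarization model, and the word budget by that model's token limit.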