Articles

Handling Imbalance Dataset on Hoax Indonesian Political News Classification using IndoBERT and Random Sampling
Fathin, Muhammad Ammar; Sibaroni, Yuliant; Prasetyowati, Sri Suryani
JURNAL MEDIA INFORMATIKA BUDIDARMA Vol 8, No 1 (2024): Januari 2024
Publisher : Universitas Budi Darma

DOI: 10.30865/mib.v8i1.7099

Abstract

The rapid adoption of the internet in Indonesia, with over 200 million active users as of January 2022, has dramatically transformed information dissemination, particularly through social media and online platforms. These platforms, while democratizing information sharing, have also become hotbeds for misinformation and hoaxes, significantly affecting the political landscape, as seen in the Jakarta gubernatorial election from late 2016 to April 2017. Research by the Indonesian Telematics Society (MASTEL) revealed a high prevalence of hoax content, predominantly socio-political, underscoring the critical need to address this challenge. This research addresses the detection of hoaxes in Indonesian political news, focusing on classifying news as factual or hoax when the dataset is class-imbalanced. The dataset is significantly imbalanced, with 6,947 articles labeled as hoaxes and 20,945 as non-hoaxes. Using IndoBERT, a variant of the BERT framework pre-trained on the Indonesian language, the study assesses how effectively the model distinguishes factual from hoax news. This involves fine-tuning IndoBERT for text classification and exploring the impact of resampling techniques, namely Random Over Sampling (ROS) and Random Under Sampling (RUS), to address the class imbalance. The findings show that IndoBERT's accuracy is consistent across ROS and RUS, highlighting its effectiveness on imbalanced datasets, with hoax detection reaching 98.2% accuracy, 97.5% recall, 97.8% F1-score, and 97.2% precision. This is particularly relevant for tasks like misinformation detection, where data imbalance is common. The success of IndoBERT, a language-specific BERT model, in Indonesian text classification contributes to the understanding of BERT-based models in diverse linguistic contexts.
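
The two resampling strategies named in the abstract can be illustrated with a minimal sketch. It uses only the class counts the abstract reports (6,947 hoax, 20,945 non-hoax); the function names and the index-based interface are assumptions made for illustration, not the authors' code, and in practice resampling is normally applied to the training split only.

```python
import random
from collections import Counter


def group_by_label(labels):
    """Map each label to the list of sample indices that carry it."""
    groups = {}
    for idx, label in enumerate(labels):
        groups.setdefault(label, []).append(idx)
    return groups


def random_oversample(labels, seed=0):
    """Duplicate random minority samples until every class matches the largest class."""
    rng = random.Random(seed)
    groups = group_by_label(labels)
    target = max(len(idxs) for idxs in groups.values())
    selected = []
    for idxs in groups.values():
        selected.extend(idxs)
        # Draw with replacement to fill the gap to the majority size.
        selected.extend(rng.choices(idxs, k=target - len(idxs)))
    return selected


def random_undersample(labels, seed=0):
    """Keep a random subset of each class, sized to the smallest class."""
    rng = random.Random(seed)
    groups = group_by_label(labels)
    target = min(len(idxs) for idxs in groups.values())
    selected = []
    for idxs in groups.values():
        # Draw without replacement so no sample is repeated.
        selected.extend(rng.sample(idxs, target))
    return selected


# Class counts taken from the abstract; the label strings are placeholders.
labels = ["hoax"] * 6947 + ["non-hoax"] * 20945
ros_counts = Counter(labels[i] for i in random_oversample(labels))
rus_counts = Counter(labels[i] for i in random_undersample(labels))
```

Oversampling keeps every original article but repeats hoax samples, while undersampling discards roughly two thirds of the non-hoax articles, which is the trade-off the study compares.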

Sentiment Analysis on Twitter(X) Related to Relocating the National Capital using the IndoBERT Method using Extraction Features of Chi-Square
Arista, Dufha; Sibaroni, Yuliant; Prasetyo, Sri Suryani
JURNAL MEDIA INFORMATIKA BUDIDARMA Vol 8, No 1 (2024): Januari 2024
Publisher : Universitas Budi Darma

DOI: 10.30865/mib.v8i1.7198

Abstract

Sentiment analysis, also known as opinion mining, is a field that can be used to measure the share of positive and negative sentiment towards a person, company, institution, product, issue, or topic. Many topics are discussed on social media such as Twitter (X), ranging from the economy, politics, society, and culture to law. One of the most discussed topics on Twitter (X) is the relocation of Indonesia's capital city to East Kalimantan Province, which has drawn varied opinions from netizens. In this study, data on the relocation of the national capital was collected from Twitter (X) for the period January 1, 2022 to February 28, 2022. The method used is IndoBERT with Chi-Square feature selection. In the experiments, IndoBERT with Chi-Square feature selection performs well, with an overall accuracy of 94%, precision of 85%, recall of 91%, and F1 of 88.4% across all datasets.
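
As a rough illustration of Chi-Square feature scoring, the sketch below computes the standard 2x2 chi-square statistic between a term's presence and a class label. The abstract does not say how the scores are combined with IndoBERT, so the toy documents, labels, and function names here are all hypothetical.

```python
def chi_square(a, b, c, d):
    """Chi-square statistic for a 2x2 term/class contingency table.

    a: positive docs containing the term, b: other docs containing it,
    c: positive docs without the term,    d: other docs without it.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # A term present in all docs (or none) carries no class information.
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def score_terms(docs, labels, positive_label):
    """Score every term by its association with positive_label."""
    vocab = {tok for doc in docs for tok in doc}
    scores = {}
    for term in vocab:
        a = b = c = d = 0
        for doc, label in zip(docs, labels):
            has_term = term in doc
            is_pos = label == positive_label
            if has_term and is_pos:
                a += 1
            elif has_term:
                b += 1
            elif is_pos:
                c += 1
            else:
                d += 1
        scores[term] = chi_square(a, b, c, d)
    return scores


# Hypothetical toy corpus: each document is a set of tokens.
docs = [{"capital", "good"}, {"capital", "agree"},
        {"capital", "wasteful"}, {"capital", "reject"}]
labels = ["pos", "pos", "neg", "neg"]
scores = score_terms(docs, labels, "pos")
```

A term such as "capital" that appears in every document scores zero, so a selection step that keeps only the highest-scoring terms would drop it.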
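
Penerapan Metode Long Short-Term Memory dan Word2Vec dalam Analisis Sentimen Ulasan pada Aplikasi Ferizy [Application of the Long Short-Term Memory Method and Word2Vec in Sentiment Analysis of Reviews of the Ferizy Application]
Shyahrin, Mega Vebika; Sibaroni, Yuliant; Puspandari, Diyas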
Techno.Com Vol. 22 No. 4 (2023): November 2023
Publisher : LPPM Universitas Dian Nuswantoro

DOI: 10.33633/tc.v22i4.9205

Abstract

Transportation is important for people's daily mobility. Because it plays an important role and can make people's lives easier, the government has begun optimizing the development of transportation facilities and initiating digital innovation, including in sea transportation. Perseroan Terbatas Angkutan Sungai, Danau, dan Penyeberangan Indonesia (PT ASDP) launched the Ferizy application on the Google Play Store. In this innovation, public sentiment can help reveal satisfaction, shortcomings, suggestions, and criticism. Sentiment analysis is therefore needed to understand the intent of the reviews. This analysis extracts review data and then processes the textual data automatically to obtain the sentiment contained in the reviews. This study implements Long Short-Term Memory (LSTM) classification and Word2Vec feature extraction in its skip-gram and CBOW variants on a dataset of Ferizy application reviews. Testing the model yields an accuracy of 88.20% for the skip-gram variant and 74.20% for the CBOW variant.
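
The difference between the two Word2Vec variants compared in this abstract lies in how training examples are formed. The sketch below shows only that step, not the embedding training itself; the window size, function names, and the toy review are assumptions for illustration.

```python
def context_window(tokens, i, window):
    """Tokens within `window` positions of index i, excluding tokens[i]."""
    lo = max(0, i - window)
    hi = min(len(tokens), i + window + 1)
    return [tokens[j] for j in range(lo, hi) if j != i]


def skipgram_pairs(tokens, window=2):
    """Skip-gram: predict each context word from the center word."""
    return [(center, ctx)
            for i, center in enumerate(tokens)
            for ctx in context_window(tokens, i, window)]


def cbow_examples(tokens, window=2):
    """CBOW: predict the center word from all of its context words."""
    examples = []
    for i, center in enumerate(tokens):
        context = context_window(tokens, i, window)
        if context:
            examples.append((context, center))
    return examples


# Hypothetical tokenized review.
review = ["aplikasi", "ferizy", "sangat", "membantu"]
sg = skipgram_pairs(review, window=1)
cbow = cbow_examples(review, window=1)
```

Skip-gram produces one training pair per (center, context) combination, so rare words receive more individual updates than under CBOW, which averages the context into a single example per position.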

Impact of Feature Extraction on Multi-Aspect Sentiment Classification for Livin'byMandiri Using BiLSTM
Atikah, Balqis Sayyidahtul; Sibaroni, Yuliant; Puspandari, Diyas
Journal La Multiapp Vol. 5 No. 5 (2024): Journal La Multiapp
Publisher : Newinera Publisher

DOI: 10.37899/journallamultiapp.v5i5.1541

Abstract

Mobile applications are developing very rapidly, including applications in the financial sector. Livin'byMandiri is a mobile application used to transact online without needing to go to the bank, which makes it easy for customers to transact anywhere and anytime. Application reviews are user reviews that reflect the reputation of the application among the community. They can be found in many places, so many companies use them as a reference when developing their applications. However, people's opinions on apps vary and are influenced by many aspects. Therefore, aspect-based sentiment analysis can be applied to app reviews to get better results. This research focuses on analyzing the sentiment of Livin'byMandiri app reviews on the Google Play Store. The Bidirectional LSTM (Bi-LSTM) method is combined with TF-IDF and Word2Vec feature extraction. In the experiments, the best results are an accuracy of 81.18% and F1-score of 81.03% for the access aspect, an accuracy of 82.82% and F1-score of 82.74% for the service aspect, and an accuracy of 77.28% and F1-score of 77.19% for the convenience aspect. The experiments also show that feature extraction affects sentiment analysis: accuracy increases by more than 1% for each aspect when TF-IDF feature extraction, or the combination of TF-IDF and Word2Vec, is added to the initial model built using only the neural network embedding layer.

WORD EMBEDDING OPTIMIZATION IN SENTIMENT ANALYSIS OF REVIEWS ON MYTELKOMSEL APP USING LONG SHORT-TERM MEMORY AND SYNTHETIC MINORITY OVER-SAMPLING TECHNIQUE
Haziq, Muhammad Raffif; Sibaroni, Yuliant; Prasetyowati, Sri Suryani
Jurnal Teknik Informatika (Jutif) Vol. 5 No. 6 (2024): JUTIF Volume 5, Number 6, Desember 2024
Publisher : Informatika, Universitas Jenderal Soedirman

DOI: 10.52436/1.jutif.2024.5.6.2498

Abstract

Telkomsel is an internet service provider with a mobile application called MyTelkomsel, which lets users carry out online services independently. Users have their own opinions about the application and can express them in reviews. Sentiment analysis is therefore one way to find out public sentiment towards the application. In this research, the author builds a sentiment analysis system that uses the word embeddings Word2Vec, GloVe, and FastText to represent words as vectors, with classification by Long Short-Term Memory (LSTM) combined with the Synthetic Minority Over-sampling Technique (SMOTE), which can handle data imbalance. The data comes from user reviews of the MyTelkomsel application on the Google Play Store. This study compares the performance of the word embeddings in LSTM and LSTM-SMOTE classifiers. The results show that all three word embeddings perform better with the LSTM model than with the LSTM-SMOTE model. Overall, the combination of FastText and LSTM gave the best performance compared to the other five combinations, with an accuracy of 89.11%.
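
The core idea of SMOTE is to create new minority samples by interpolating between an existing minority sample and one of its nearest minority neighbours. The sketch below shows that step on small numeric vectors; the abstract does not state which representation the technique was applied to, so the 2-D points, the value of `k`, and the function name are assumptions.

```python
import math
import random


def smote(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points by interpolating between minority samples.

    Requires at least two minority samples, each a list of floats.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest minority neighbours of x by Euclidean distance.
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: math.dist(x, p),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from x to nb
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nb)])
    return synthetic


# Hypothetical minority-class feature vectors.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(minority, n_new=5)
```

Because every synthetic point lies on a segment between two real minority samples, the method adds variety that plain duplication does not, but it can also blur class boundaries, which may be one reason an LSTM without SMOTE can score higher.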

HATE SPEECH DETECTION USING GLOVE WORD EMBEDDING AND GATED RECURRENT UNIT
Ardana, Aulia Riefqi; Sibaroni, Yuliant
Jurnal Teknik Informatika (Jutif) Vol. 5 No. 6 (2024): JUTIF Volume 5, Number 6, Desember 2024
Publisher : Informatika, Universitas Jenderal Soedirman

DOI: 10.52436/1.jutif.2024.5.6.2557

Abstract

Social media has become a tool that makes it easier for people to exchange information. The freedom to share information has opened the door to more incidents of hate speech on social media. Hate speech detection is an interesting topic because, with the increasing use of social media, hate speech can spread quickly and trigger significant negative impacts, discrimination, and social conflict. This research aims to examine the effect of the GRU method, GloVe word embedding, and an informal word modifier algorithm on hate speech detection. In the detection system, word embedding with the Global Vector model (GloVe) converts words in text into numerical vectors that represent the meaning and context of the words, and a Gated Recurrent Unit (GRU) deep learning model processes the sequence of words to understand the sentence structure. GRU is chosen for its ability to capture long-term dependencies in textual data with higher computational efficiency than Long Short-Term Memory (LSTM). Classification of hate speech using GRU and GloVe achieves 90.7% accuracy and a 91% F1 score. Adding the informal word modifier algorithm raises these to 92.4% accuracy and 92.8% F1. In conclusion, the informal word modifier algorithm can improve evaluation scores in hate speech detection.

Public Sentiment Dynamics: Analysis of Twitter/X Data on the 2024 Indonesian Election with NB-SVM
Satyananda, Karuna Dewa; Sibaroni, Yuliant
JURNAL MEDIA INFORMATIKA BUDIDARMA Vol 8, No 3 (2024): Juli 2024
Publisher : Universitas Budi Darma

DOI: 10.30865/mib.v8i3.7710

Abstract

This research analyzes the dynamics of public sentiment towards three pairs of presidential candidates in the 2024 Indonesian Election. Twitter data was used as a source of information to gain a deeper understanding of the pattern of public sentiment during six crucial phases of the election. The data is analyzed periodically during the election period. Sentiment analysis was carried out using the Naïve Bayes-Support Vector Machine (NB-SVM) classification approach to understand the sentiment patterns that emerged in each phase. NB-SVM uses class frequency information from NB to weight features, then trains separate SVMs for each class on these weighted features, improving classification accuracy. Models using NB-SVM classification produce better accuracy than models using NB or SVM classification alone, with an average accuracy of 76%. For Pair 01, a dynamic pattern emerged: positive sentiment decreased during the debate and increased again later. For Pairs 02 and 03, no pattern emerged, for different reasons: sentiment was too stable for Pair 02 and unstable for Pair 03. While Pair 01 obtained the most positive sentiment, Pair 02 received the most negative, with an average of 65.19% during the election process. This research shows that the results of sentiment analysis on Twitter/X contradict the official results of the general election in Indonesia announced by the KPU.
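
The feature weighting step described in the abstract can be sketched with the log-count ratio commonly used in NB-SVM formulations. This is an illustration under assumptions: the smoothing value, the toy tweets, and the function names are hypothetical, the paper's exact weighting may differ, and the SVM training on the weighted features is omitted.

```python
import math


def log_count_ratio(docs, labels, positive_label, alpha=1.0):
    """Per-term log ratio of smoothed, normalized counts: positive class vs the rest."""
    vocab = sorted({tok for doc in docs for tok in doc})
    p = {t: alpha for t in vocab}  # smoothed counts in the positive class
    q = {t: alpha for t in vocab}  # smoothed counts in all other classes
    for doc, label in zip(docs, labels):
        counts = p if label == positive_label else q
        for tok in doc:
            counts[tok] += 1
    p_total = sum(p.values())
    q_total = sum(q.values())
    return {t: math.log((p[t] / p_total) / (q[t] / q_total)) for t in vocab}


def weight_features(doc, ratio):
    """Binary term presence scaled by the NB ratio; this is what the SVM would see."""
    return {tok: ratio[tok] for tok in set(doc)}


# Hypothetical tokenized tweets.
docs = [["great", "debate"], ["great", "program"],
        ["bad", "debate"], ["bad", "program"]]
labels = ["pos", "pos", "neg", "neg"]
ratio = log_count_ratio(docs, labels, "pos")
features = weight_features(docs[0], ratio)
```

Terms that occur equally often in both classes get a weight of zero, so the downstream SVM effectively ignores them and concentrates on class-indicative terms.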

Sentiment Analysis on TikTok App using Long Short-Term Memory (LSTM) with Stochastic Gradient Descent (SGD) Optimization
Rizky, Muhammad Zacky Faqia; Sibaroni, Yuliant; Prasetiyowati, Sri Suryani
JURNAL MEDIA INFORMATIKA BUDIDARMA Vol 8, No 3 (2024): Juli 2024
Publisher : Universitas Budi Darma

DOI: 10.30865/mib.v8i3.7699

Abstract

TikTok is currently one of the most popular social media apps. The platform contains content that is creative, educational, and innovative, as well as content that features lifestyle, cyberbullying, and inappropriate behavior. This diverse content can trigger both positive and negative sentiments. This research aims to analyze the sentiment of the TikTok application by integrating feature extraction techniques, feature expansion, and optimization algorithms to improve the performance of the Long Short-Term Memory (LSTM) model. This research uses a dataset of 15,049 TikTok app reviews from the Google Play Store. Sentiment analysis is performed through four scenarios: the first scenario uses the LSTM model as the basis for classification, the second scenario combines LSTM with Word2Vec as feature extraction to convert initially unstructured text data into a structured format, the third scenario integrates LSTM and Word2Vec with FastText as feature expansion to improve the quality of representation and the model's ability to understand complex contexts, and the fourth scenario adds the Stochastic Gradient Descent (SGD) optimization algorithm to help improve the performance of the LSTM model. The results show that, through the integration of feature extraction techniques, feature expansion, and optimization algorithms, the performance of LSTM increased by 7.44%. This research developed an effective method with positive outcomes and will contribute to the development of a sentiment analysis system designed to help policymakers and application developers address negative issues.

Multilabel Hate Speech Classification in Indonesian Political Discourse on X using Combined Deep Learning Models with Considering Sentence Length
Angger Saputra, Revelin; Sibaroni, Yuliant
Jurnal Ilmu Komputer dan Informasi Vol. 18 No. 1 (2025): Jurnal Ilmu Komputer dan Informasi (Journal of Computer Science and Informatio
Publisher : Faculty of Computer Science - Universitas Indonesia

DOI: 10.21609/jiki.v18i1.1440

Abstract

Hate speech, as public expression of hatred or offensive discourse targeting race, religion, gender, or sexual orientation, is widespread on social media. This study assesses BERT-based models for multi-label hate speech detection, emphasizing how text length impacts model performance. Models tested include BERT, BERT-CNN, BERT-LSTM, BERT-BiLSTM, and BERT with two LSTM layers. Overall, BERT-BiLSTM achieved the highest (82.00%) and best performance on longer texts (83.20%) with high and , highlighting its ability to capture nuanced context. BERT-CNN excelled in shorter texts, achieving the highest (79.80%) and an of 79.10%, indicating its effectiveness in extracting features in brief content. BERT-LSTM showed balanced and across text lengths, while BERT-BiLSTM, although high in r, had slightly lower on short texts due to its reliance on broader context. These results highlight the importance of model selection based on text characteristics: BERT-BiLSTM is ideal for nuanced analysis in longer texts, while BERT-CNN better captures key features in shorter content.

Spatio-temporal COVID-19 Spread Prediction: Comparing SVM with Time-Expanded Features and RNN Models
Gusti Aji, Raden Aria; Prasetiyowati, Sri Suryani; Sibaroni, Yuliant
Building of Informatics, Technology and Science (BITS) Vol 6 No 4 (2025): March 2025
Publisher : Forum Kerjasama Pendidikan Tinggi

DOI: 10.47065/bits.v6i4.6548

Abstract

Covid-19, which spread in early 2020, still needs to be monitored, considering the high growth rate of the pandemic at that time. An appropriate prediction model is needed because it can estimate the speed and extent of the spread for some time to come. This study develops a prediction model for classifying the future spread of Covid-19 using SVM with time-based feature expansion and RNN. The scenarios were designed to determine the effect of time-based feature expansion and kernel function on classification performance using time series and spatial data. The results show that SVM with time-based feature expansion performs best with a polynomial kernel, with an accuracy of 96.23%, a precision of 96.48%, a recall of 96.23%, and an F1-score of 96.21%. The SVM outperforms the RNN, which achieves an accuracy of 93.55%, a precision of 87.51%, a recall of 93.55%, and an F1-score of 90.43%. Spatial prediction using Kriging interpolation can provide an overview of the spread of COVID-19 in all villages in Bandung City. This research can provide much-needed information for policy makers and the community in managing future pandemic predictions and management strategies in the field of public health.
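
One common way to give a non-sequential classifier such as SVM access to temporal context is to append lagged observations as extra features. The sketch below shows that idea only; the abstract does not describe the exact expansion scheme, so the lag count, the function name, and the sample series are hypothetical.

```python
def expand_with_lags(series, n_lags):
    """Turn a time series into (features, target) rows.

    Each row's features are the n_lags values immediately before the target.
    """
    rows = []
    for t in range(n_lags, len(series)):
        features = list(series[t - n_lags:t])
        rows.append((features, series[t]))
    return rows


# Hypothetical per-period case counts for one area.
cases = [3, 5, 8, 13, 21]
rows = expand_with_lags(cases, n_lags=2)
```

Each row can then be passed to a standard classifier or regressor, whereas an RNN would consume the raw sequence directly; the number of lags controls how much history the SVM can see.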