Found 3 Documents

Virality classification from Twitter data using pre-trained language model and multi-layer perceptron Tedjasulaksana, Jeffrey Junior; Girsang, Abba Suganda
Indonesian Journal of Electrical Engineering and Computer Science Vol 35, No 3: September 2024
Publisher : Institute of Advanced Engineering and Science

DOI: 10.11591/ijeecs.v35.i3.pp1952-1962

Abstract

Twitter is a well-known text-based social media platform that is often used to disseminate content. According to Katadata, Indonesia ranked fifth in the world by number of Twitter users in 2023. Consequently, many people and organizations want their tweets to go viral. This research therefore aims to develop a model that categorizes the level of virality using Indonesian-language tweet data. Classifying virality involves several tasks: upsampling the data, predicting sentiment and emotion, and embedding the text. Upsampling was necessary because the dataset is imbalanced. Data upsampling, emotion prediction, and text embedding are performed with the Bidirectional Encoder Representations from Transformers (BERT) model, while sentiment prediction uses the Robustly Optimized BERT Pretraining Approach (RoBERTa). The text embeddings, sentiment, and emotion are combined with Twitter metadata, and all features are fed into a multi-layer perceptron (MLP) that classifies virality into three classes based on the number of retweets: low, medium, and high. The proposed method achieves an F1-score of 49% and an accuracy of 95%, outperforming the baseline model.
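The feature-fusion step described above (text embedding + sentiment + emotion + metadata, fed into an MLP over three retweet-based classes) can be sketched as follows. This is a minimal, untrained illustration, not the authors' implementation: the retweet thresholds, the sentiment and emotion label sets, the log-scaling of metadata, the hidden-layer size, and the tiny 4-dimensional stand-in for a BERT vector (real BERT-base embeddings are 768-dimensional) are all assumptions.

```python
import math
import random

# ASSUMPTION: hypothetical thresholds; the abstract only says the classes
# low / medium / high are derived from retweet counts.
LOW_MAX = 10
MEDIUM_MAX = 100
CLASSES = ["low", "medium", "high"]
# ASSUMPTION: illustrative label sets for the sentiment and emotion predictors.
SENTIMENTS = ["negative", "neutral", "positive"]
EMOTIONS = ["anger", "fear", "joy", "love", "sadness"]


def virality_label(retweets: int) -> str:
    """Bin a retweet count into one of the three virality classes."""
    if retweets <= LOW_MAX:
        return "low"
    if retweets <= MEDIUM_MAX:
        return "medium"
    return "high"


def one_hot(value: str, vocab: list[str]) -> list[float]:
    return [1.0 if v == value else 0.0 for v in vocab]


def fuse_features(embedding, sentiment, emotion, metadata):
    """Concatenate embedding, one-hot sentiment/emotion and metadata.

    Count-valued metadata is log-scaled (an assumption, not from the paper).
    """
    return (list(embedding)
            + one_hot(sentiment, SENTIMENTS)
            + one_hot(emotion, EMOTIONS)
            + [math.log1p(m) for m in metadata])


def dense(x, weights, bias):
    # One row of weights per output unit.
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(v):
    return [max(0.0, t) for t in v]


def softmax(v):
    m = max(v)  # subtract the max for numerical stability
    exps = [math.exp(t - m) for t in v]
    s = sum(exps)
    return [e / s for e in exps]


class TinyMLP:
    """One hidden ReLU layer and a softmax over the three classes (random, untrained weights)."""

    def __init__(self, n_in, n_hidden=16, n_out=3, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.gauss(0, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.gauss(0, 0.1) for _ in range(n_hidden)] for _ in range(n_out)]
        self.b2 = [0.0] * n_out

    def predict_proba(self, x):
        hidden = relu(dense(x, self.w1, self.b1))
        return softmax(dense(hidden, self.w2, self.b2))


# Hypothetical tweet: 4-dim embedding stand-in, predicted labels, and
# metadata [follower count, hashtag count] (both illustrative).
features = fuse_features([0.12, -0.40, 0.05, 0.33], "positive", "joy", [1520, 3])
model = TinyMLP(n_in=len(features))
probs = model.predict_proba(features)
print(len(features), CLASSES[probs.index(max(probs))], virality_label(250))
```

In practice the MLP weights would be learned with a cross-entropy loss on the labeled tweets; the random initialization here only shows how the fused vector flows through the network.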
Classifying Viral Twitter with Transformer Models and Multi-Layer Perceptron Tedjasulaksana, Jeffrey Junior; Gunawan, Alexander Agung Santoso
Engineering, Mathematics and Computer Science Journal (EMACS) Vol. 7 No. 1 (2025): EMACS
Publisher : Bina Nusantara University

DOI: 10.21512/emacsjournal.v7i1.11580

Abstract

This research explores the classification of virality levels in Indonesian tweets using advanced natural language processing techniques and machine learning algorithms. Transformer models, namely RoBERTa for sentiment analysis and XLNet for text embedding, are combined with Multi-Layer Perceptron (MLP) classifiers to predict tweet virality. The model incorporates emotion features and uses cost-sensitive methods to handle class imbalance, which yields robust performance. Sentiment analysis and emotion detection uncover intriguing correlations between tweet sentiment, emotion distribution, and virality level. Our findings highlight the efficacy of XLNet in capturing contextual nuances, where it outperforms BERTweet. Furthermore, integrating emotion features and cost-sensitive methods improves the model's predictive accuracy, offering valuable insights for marketers and businesses seeking to optimize their social media strategies. The proposed model achieves an accuracy of 95% and an F1-score of 59%.
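Two ideas from this abstract can be made concrete: a cost-sensitive loss for imbalanced classes, and why an accuracy of 95% can coexist with a much lower F1-score. The sketch below assumes inverse-frequency ("balanced") class weights and a macro-averaged F1; the abstract does not say which cost-sensitive scheme or F1 averaging the authors used. The test set and the always-"low" classifier are hypothetical.

```python
import math
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: w_c = n_samples / (n_classes * count_c)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


def weighted_cross_entropy(probs, labels, weights, classes):
    """Cost-sensitive loss: sum of -w_y * log p_y, divided by the sum of weights used."""
    total, norm = 0.0, 0.0
    for p, y in zip(probs, labels):
        w = weights[y]
        total += -w * math.log(p[classes.index(y)])
        norm += w
    return total / norm


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred, classes):
    """Unweighted mean of per-class F1 = 2TP / (2TP + FP + FN)."""
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


CLASSES = ["low", "medium", "high"]
# Hypothetical, heavily imbalanced test set: 95 low, 4 medium, 1 high.
y_true = ["low"] * 95 + ["medium"] * 4 + ["high"]
# A degenerate classifier that always predicts the majority class.
y_pred = ["low"] * 100

weights = balanced_class_weights(y_true)  # rare classes receive larger weights
print(accuracy(y_true, y_pred), round(macro_f1(y_true, y_pred, CLASSES), 3))  # → 0.95 0.325
```

The majority-class predictor reaches 95% accuracy yet scores zero F1 on the medium and high classes, which drags the macro average down. Weighting each sample's loss term by its class weight during training penalizes errors on rare classes more heavily, which is the general aim of cost-sensitive learning.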
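Klasifikasi Tingkat Laju Data Covid-19 Untuk Mitigasi Penyebaran Menggunakan Metode Modified K-Nearest Neighbor (MKNN) [Classification of COVID-19 Spread-Rate Levels for Transmission Mitigation Using the Modified K-Nearest Neighbor (MKNN) Method] Cholissodin, Imam; Evanita, Felicia Marvela; Tedjasulaksana, Jeffrey Junior; Wahyuditomo, Kukuh Wicaksono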
Jurnal Teknologi Informasi dan Ilmu Komputer Vol 8 No 3: June 2021
Publisher : Fakultas Ilmu Komputer, Universitas Brawijaya

DOI: 10.25126/jtiik.2021834400

Abstract

COVID-19, or Coronavirus Disease 2019, is a disease caused by a virus that can be transmitted through the respiratory tract in animals and humans. It has killed thousands of people across almost the entire world and has been declared a pandemic in many countries, including Indonesia. The first COVID-19 case in Indonesia was found on 2 March 2020. To handle the pandemic, the government applied social distancing, with people keeping more than 1 meter apart, and required health protocols for activities outside the home, as recommended by the World Health Organization (WHO). Low public compliance with social distancing and health protocols caused positive COVID-19 cases in Indonesia to rise significantly, leading to many deaths. In this study, we therefore build a system that classifies the COVID-19 spread rate for every province in Indonesia to support mitigation, using the Modified K-Nearest Neighbor (MKNN) method. The output is one of three spread-rate classes: a low spread rate, meaning mitigation is high; a medium spread rate, meaning mitigation is moderate; and a high spread rate, meaning mitigation is low. These classes are explained further in the research methodology section. By showing the spread-rate class of each province, the system aims to raise the Indonesian public's awareness of COVID-19 prevention. MKNN was chosen because it is a reasonably good classification method that validates and weights the training data, where each weight is determined by the fraction of neighbors sharing the same label out of the total number of neighbors.
The classification uses three parameters: the number of positive cases, the number of recoveries, and the number of deaths from COVID-19. The data come from the official website of the Ministry of Health of the Republic of Indonesia, https://infeksiemerging.kemkes.go.id/, with 374 training records from 12 May 2020 to 22 May 2020 and 136 test records from 23 May 2020 to 26 May 2020. The resulting accuracy is 97.79% with K = 3.
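The MKNN procedure in this abstract (validate each training point by the fraction of its neighbors that share its label, then use that validity to weight votes) can be sketched as below. This is a hedged illustration, not the authors' code: it assumes Euclidean distance, the same neighbor count for validation (H) and voting (K = 3), and the weight form validity / (distance + α) with α = 0.5, which is common in MKNN descriptions but is not stated in the abstract. The province records are hypothetical [positive, recovered, deaths] triples.

```python
import math
from collections import defaultdict

ALPHA = 0.5  # ASSUMPTION: smoothing constant in the weight formula


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def validity(train_X, train_y, h):
    """Fraction of each training point's h nearest neighbours (excluding itself) sharing its label."""
    vals = []
    for i, xi in enumerate(train_X):
        dists = sorted((euclidean(xi, xj), j) for j, xj in enumerate(train_X) if j != i)
        neighbours = [j for _, j in dists[:h]]
        vals.append(sum(train_y[j] == train_y[i] for j in neighbours) / h)
    return vals


def mknn_predict(x, train_X, train_y, val, k):
    """Weighted vote among the k nearest training points: weight = validity / (distance + ALPHA)."""
    dists = sorted((euclidean(x, xj), j) for j, xj in enumerate(train_X))
    scores = defaultdict(float)
    for d, j in dists[:k]:
        scores[train_y[j]] += val[j] / (d + ALPHA)
    return max(scores, key=scores.get)


# Hypothetical records: [positive cases, recovered, deaths] per province.
train_X = [
    [10, 8, 0], [12, 9, 1], [15, 10, 1],          # low spread rate
    [60, 30, 4], [70, 35, 5], [65, 28, 6],        # medium spread rate
    [200, 80, 15], [220, 90, 18], [210, 85, 20],  # high spread rate
]
train_y = ["low"] * 3 + ["medium"] * 3 + ["high"] * 3

val = validity(train_X, train_y, h=3)
print(mknn_predict([62, 31, 5], train_X, train_y, val, k=3))  # → medium
```

Real case counts differ by orders of magnitude between provinces, so normalizing the three features before computing distances is usually advisable. For reference, the reported 97.79% accuracy on 136 test records corresponds to 133 correct predictions (133/136 ≈ 0.9779).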