
Found 36 Documents

Virality classification from Twitter data using pre-trained language model and multi-layer perceptron
Tedjasulaksana, Jeffrey Junior; Girsang, Abba Suganda
Indonesian Journal of Electrical Engineering and Computer Science Vol 35, No 3: September 2024
Publisher : Institute of Advanced Engineering and Science

DOI: 10.11591/ijeecs.v35.i3.pp1952-1962

Abstract

Twitter is a well-known text-based social media platform that is often used to disseminate content. According to Katadata, Indonesia ranked fifth in the world by number of Twitter users in 2023, so many people and organizations want their tweets to go viral. This research therefore develops a model that uses Indonesian-language tweets to classify their level of virality. Classifying virality involves several tasks: data upsampling, sentiment and emotion prediction, and text embedding. Upsampling was necessary because the dataset is imbalanced. Data upsampling, emotion prediction, and text embedding are performed with the bidirectional encoder representations from transformers (BERT) model, while sentiment prediction uses the robustly optimized BERT pretraining approach (RoBERTa). The text embeddings, sentiment, and emotion are combined with Twitter metadata, and all features are fed into a multi-layer perceptron (MLP) model that classifies virality into three classes based on the number of retweets: low, medium, and high. The proposed method achieves an F1-score of 49% and an accuracy of 95%, outperforming the baseline model.
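The final stage described above (concatenate features, then classify with an MLP into three retweet-based classes) can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' model. The retweet thresholds in the hypothetical `virality_label` are invented, because the abstract does not give the class boundaries. The feature sizes and the untrained random weights are toy values. Fixed vectors stand in for the BERT/RoBERTa outputs, and training by backpropagation is omitted.

```python
import math
import random

CLASSES = ["low", "medium", "high"]

def virality_label(retweets, low_max=10, medium_max=100):
    """Bin a retweet count into a virality class (hypothetical thresholds)."""
    if retweets <= low_max:
        return "low"
    if retweets <= medium_max:
        return "medium"
    return "high"

def build_features(text_embedding, sentiment_probs, emotion_probs, metadata):
    """Concatenate embedding, sentiment, emotion, and metadata into one MLP input."""
    return [*text_embedding, *sentiment_probs, *emotion_probs, *metadata]

def init_layer(n_out, n_in, rng):
    """Small random weights and zero biases for one dense layer."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def mlp_forward(x, w1, b1, w2, b2):
    """One ReLU hidden layer, then a softmax over the three virality classes."""
    hidden = [max(0.0, sum(w * v for w, v in zip(row, x)) + b) for row, b in zip(w1, b1)]
    logits = [sum(w * h for w, h in zip(row, hidden)) + b for row, b in zip(w2, b2)]
    peak = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy inputs standing in for BERT/RoBERTa outputs and tweet metadata.
rng = random.Random(42)
x = build_features([0.1] * 8, [0.2, 0.5, 0.3], [0.1, 0.1, 0.6, 0.2], [3.0, 1.0])
w1, b1 = init_layer(16, len(x), rng)
w2, b2 = init_layer(len(CLASSES), 16, rng)
probs = mlp_forward(x, w1, b1, w2, b2)
prediction = CLASSES[probs.index(max(probs))]
```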
BUSINESS INTELLIGENCE (BI) PRESIDENTIAL CANDIDATES BASED ON SOCIAL NETWORK ANALYSIS (SNA) WITH TWITTER DATA
Ali, Ichsan; Girsang, Abba Suganda
Parameter: Journal of Statistics Vol. 4 No. 2 (2024)
Publisher : Fakultas Matematika dan Ilmu Pengetahuan Alam Universitas Tadulako

DOI: 10.22487/27765660.2024.v4.i2.17143

Abstract

The Twitter social network is widely used to discuss all kinds of topics, including politics. Analyzing online conversations on Twitter to map the popularity of political figures as candidates in the Indonesian presidential election is a popular and challenging research area. On Twitter, citizens can express themselves and communicate with political figures. Because Twitter conversation data is highly complex, business intelligence is needed to transform raw data into meaningful, useful information about the popularity of Indonesian presidential candidates. The analysis uses social network analysis (SNA), measuring degree centrality, eigenvector centrality, betweenness centrality, and closeness centrality. The presidential candidates in this study are Ganjar Pranowo (Twitter account “ganjarpranowo”), Puan Maharani (“puanmaharani_ri”), and Anies Baswedan (“aniesbaswedan”). The “aniesbaswedan” account excels in degree centrality and betweenness centrality. Based on the total number of interactions generated, it is the actor with the most influence on network interactions, and it also serves as a bridge between other actors in the network.
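Three of the centrality measures named above can be sketched in plain Python for an undirected interaction graph. This is an illustrative implementation, not the authors' tooling. The account names are hypothetical, and eigenvector centrality is omitted for brevity. Betweenness uses Brandes' algorithm for unweighted graphs with the usual undirected normalization, and closeness assumes a connected graph.

```python
from collections import deque

def degree_centrality(adj):
    """Fraction of other nodes each node is directly connected to."""
    n = len(adj)
    return {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}

def closeness_centrality(adj):
    """(n - 1) / sum of shortest-path distances; assumes a connected graph."""
    n = len(adj)
    result = {}
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total = sum(dist.values())
        result[s] = (n - 1) / total if total > 0 else 0.0
    return result

def betweenness_centrality(adj):
    """Brandes' algorithm for unweighted graphs, normalized for undirected graphs."""
    cb = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack, pred = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)  # number of shortest paths from s
        sigma[s] = 1
        dist = dict.fromkeys(adj, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    pred[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:  # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in pred[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                cb[w] += delta[w]
    n = len(adj)
    # Each unordered pair is counted twice; dividing by (n-1)(n-2) normalizes to [0, 1].
    scale = 1.0 / ((n - 1) * (n - 2)) if n > 2 else 0.0
    return {v: c * scale for v, c in cb.items()}

# Hypothetical star network: one account interacting with three others.
STAR = {"hub": ["u1", "u2", "u3"], "u1": ["hub"], "u2": ["hub"], "u3": ["hub"]}
deg = degree_centrality(STAR)
clo = closeness_centrality(STAR)
btw = betweenness_centrality(STAR)
```

In this toy star the hub scores 1.0 on all three measures. This mirrors how an account that generates the most interactions and links otherwise unconnected users ranks first on degree and betweenness.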
Automated multi-document summarization using extractive-abstractive approaches
Nasari, Maulin; Girsang, Abba Suganda
International Journal of Informatics and Communication Technology (IJ-ICT) Vol 13, No 3: December 2024
Publisher : Institute of Advanced Engineering and Science

DOI: 10.11591/ijict.v13i3.pp400-409

Abstract

This study presents a multi-document text summarization system that uses a hybrid approach combining extractive and abstractive methods. The goal of document summarization is to create a coherent and comprehensive summary that captures the essential information in the documents. Multi-document summarization is difficult because the input is long and may contain redundant information. To address this, the study first applies the TextRank algorithm to each document as an extractor that condenses the input sequence. The extractor retrieves the key sentences from each document, and these sentences are then aggregated and used as input to the abstractor. Bidirectional and auto-regressive transformers (BART) serves as the abstractor, condensing the primary sentences from each document into a more cohesive summary. The system was evaluated with the ROUGE metric, yielding ROUGE-1 and ROUGE-2 scores of 41.95 and 14.81, respectively.
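The extractive stage can be sketched as a small TextRank implementation in Python. The sketch follows the original TextRank sentence-similarity formulation, which divides the number of shared words by the sum of the log lengths of the two sentences. Scores come from a weighted PageRank update with damping factor 0.85. The tokenizer, sentence splitter, `top_k`, and iteration count are assumptions. The BART abstractor that would consume the output of the hypothetical `build_abstractor_input` is not shown.

```python
import math
import re

def split_sentences(document):
    return [s for s in re.split(r"(?<=[.!?])\s+", document.strip()) if s]

def tokenize(sentence):
    return re.findall(r"\w+", sentence.lower())

def similarity(a, b):
    """Original TextRank similarity: shared words over log sentence lengths."""
    ta, tb = tokenize(a), tokenize(b)
    if not ta or not tb:
        return 0.0
    denom = math.log(len(ta)) + math.log(len(tb))
    if denom <= 0:  # both sentences are single words
        return 0.0
    return len(set(ta) & set(tb)) / denom

def textrank(sentences, top_k=2, damping=0.85, iterations=50):
    """Score sentences with weighted PageRank; return the top_k in original order."""
    n = len(sentences)
    weights = [[similarity(sentences[i], sentences[j]) if i != j else 0.0
                for j in range(n)] for i in range(n)]
    out_weight = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):
        scores = [
            (1 - damping)
            + damping * sum(weights[j][i] / out_weight[j] * scores[j]
                            for j in range(n) if out_weight[j] > 0)
            for i in range(n)
        ]
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:top_k]
    return [sentences[i] for i in sorted(top)]

def build_abstractor_input(documents, top_k=2):
    """Extract key sentences per document and join them as input for the abstractor."""
    extracted = []
    for doc in documents:
        extracted.extend(textrank(split_sentences(doc), top_k))
    return " ".join(extracted)
```

Running the extractor per document before concatenation keeps every source represented in the abstractor's input.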
Transformer-based abstractive Indonesian text summarization
Aurelia, Miracle; Monica, Sheila; Girsang, Abba Suganda
International Journal of Informatics and Communication Technology (IJ-ICT) Vol 13, No 3: December 2024
Publisher : Institute of Advanced Engineering and Science

DOI: 10.11591/ijict.v13i3.pp388-399

Abstract

The volume of data created, captured, copied, and consumed worldwide has increased from 2 zettabytes in 2010 to over 97 zettabytes in 2020, with an estimated 181 zettabytes by 2025. Automatic text summarization (ATS) can convey the key points of a text and reduce the time needed to understand the information. This paper therefore aims to improve ATS performance in summarizing news articles. This work fine-tunes the BART model for abstractive summarization using the IndoSum, Liputan6, and augmented Liputan6 datasets. The Liputan6 dataset is augmented using ChatGPT, and Recall-Oriented Understudy for Gisting Evaluation (ROUGE) is used as the evaluation metric. For augmentation, ChatGPT generated abstractive summaries from 10% of the cleaned news articles in the Liputan6 training dataset, resulting in over 36 thousand samples for the model's fine-tuning. The BART model fine-tuned on the IndoSum, Liputan6, and augmented Liputan6 datasets achieves the best ROUGE-2 score, outperforming the ORACLE model, although ORACLE still has the best ROUGE-1 and ROUGE-L scores. These results indicate that fine-tuning the BART model on multiple datasets improves its performance on abstractive summarization tasks.
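ROUGE is the evaluation metric in both summarization studies, so a simplified sketch of ROUGE-N and ROUGE-L F1 may help. This version assumes lowercasing and whitespace tokenization with no stemming. Its numbers will therefore not match official ROUGE packages exactly, and scores are on a 0 to 1 scale rather than the 0 to 100 scale reported in the abstracts. The abstract also does not say whether the reported scores are F1 or recall.

```python
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def f1(precision, recall):
    return 0.0 if precision + recall == 0 else 2 * precision * recall / (precision + recall)

def rouge_n(candidate, reference, n=1):
    """F1 over clipped n-gram overlap between candidate and reference."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    overlap = sum((cand & ref).values())  # Counter & keeps the minimum count
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    return f1(precision, recall)

def lcs_length(a, b):
    """Longest common subsequence length via a rolling dynamic-programming row."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]

def rouge_l(candidate, reference):
    """F1 based on the longest common subsequence of tokens."""
    c, r = candidate.lower().split(), reference.lower().split()
    lcs = lcs_length(c, r)
    return f1(lcs / max(len(c), 1), lcs / max(len(r), 1))
```

For example, the candidate "the cat" against the reference "the cat sat" has unigram precision 1 and recall 2/3, giving a ROUGE-1 F1 of 0.8.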
Utilization of geocoding for mapping infrastructure impacts and mobility due to floods in Indonesia based on Twitter analytics
Taufiq, Muhammad Imam; Girsang, Abba Suganda
JPPI (Jurnal Penelitian Pendidikan Indonesia) Vol 10, No 3 (2024): JPPI (Jurnal Penelitian Pendidikan Indonesia)
Publisher : Indonesian Institute for Counseling, Education and Theraphy (IICET)

DOI: 10.29210/020244467

Abstract

Flooding, a frequent natural disaster in Indonesia, is caused by factors such as high-intensity rainfall, climate change, inadequate drainage, and urban infrastructure challenges, and it affects communities, infrastructure, and economic activity. The lack of accurate, centralized data hinders government efforts to identify affected areas and respond effectively. Named Entity Recognition (NER), a machine learning-based information extraction tool, makes it possible to geocode flood-related data from social media such as Twitter. This research develops an NER-based model to extract location information from Twitter and visualize flood impacts through geocoding. The method combines qualitative analysis, machine learning, and geospatial analysis to assess flood impacts using Twitter data. First, a qualitative analysis of tweets extracts flood-related keywords to identify patterns. NER then identifies locations, which geocoding converts into geographic coordinates for map visualization. The results show that extracting locations from flood-related tweets with the NER model and geocoding produces useful and accurate data. About 50% of the flood-related tweets included location tokens, which shows the importance of geographic information for understanding disaster impacts. Location extraction with the NER model proved effective, although some extracted location tokens did not match the actual geographic data, especially at finer-grained location levels. Nevertheless, the evaluation shows that 99.5% of the extracted locations correspond to valid locations, particularly within Indonesia. These results show that combining the NER model with geocoding is highly effective for analyzing flood impacts and offers significant benefits for disaster management and geospatial analysis based on social media data.
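The extraction-then-geocoding flow can be illustrated with a deliberately simplified Python sketch. Here a lookup in a small hypothetical gazetteer stands in for the trained NER model. The coordinates are rough approximations for illustration only, and the example tweets are invented. A real pipeline would call an NER model and a geocoding service instead.

```python
import re

# Hypothetical gazetteer with approximate coordinates (lat, lon), illustration only.
GAZETTEER = {
    "Jakarta": (-6.21, 106.85),
    "Bekasi": (-6.24, 106.99),
    "Bogor": (-6.60, 106.80),
}

def extract_locations(text, gazetteer=GAZETTEER):
    """Return gazetteer names found in the text, matching longer names first."""
    found = []
    remaining = text
    for name in sorted(gazetteer, key=len, reverse=True):
        pattern = re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)
        if pattern.search(remaining):
            found.append(name)
            remaining = pattern.sub(" ", remaining)  # avoid re-matching inside a longer name
    return found

def geocode(names, gazetteer=GAZETTEER):
    """Map location names to (name, lat, lon) points for plotting on a map."""
    return [(name, *gazetteer[name]) for name in names]

# Invented example tweets: one mentions places, one does not.
tweets = [
    "Banjir di Jakarta dan Bekasi pagi ini",
    "Hujan deras sejak semalam",
]
points = [geocode(extract_locations(t)) for t in tweets]
share_with_location = sum(1 for p in points if p) / len(tweets)
```

The `share_with_location` value is the same kind of statistic as the abstract's "about 50% of tweets included location tokens", computed here on toy data.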
Analysis of named-entity effect on text classification of traffic accident data using machine learning
Putra, Anugrah Dwiatmaja; Girsang, Abba Suganda
Indonesian Journal of Electrical Engineering and Computer Science Vol 25, No 3: March 2022
Publisher : Institute of Advanced Engineering and Science

DOI: 10.11591/ijeecs.v25.i3.pp1672-1678

Abstract

Given the rising number of traffic accidents in Indonesia, accident data still needs to be evaluated and analyzed. Traffic accident data has been categorized using word embeddings, but further work is needed to achieve better results. A few informative named entities are often enough to determine whether a text contains information about a traffic accident. Named entities are informative features that describe details of a text. This paper examines the influence of named entities on thematic text categorization. The data were collected by crawling Twitter. Preprocessing at the start of the process normalizes the text, removes uninformative content, and labels the specified entities. Using a support vector machine (SVM), three schemes were compared: (i) word embedding, (ii) the number of named-entity occurrences, and (iii) a combination of the two, called Hybrid. In tests on 1,885 data consisting of 788 accident data and 1,067 non-accident data, the Hybrid scheme produced an improvement, reaching a classification accuracy of 90.27 percent compared with the word embedding and named-entity occurrence schemes.
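The Hybrid scheme described above pairs a word-embedding representation with named-entity occurrence counts. The sketch below builds such a feature vector in plain Python under several assumptions. The text embedding is taken to be the mean of word vectors, which the abstract does not specify. The entity tag set `ENTITY_TYPES`, the toy two-dimensional embeddings, and the example tokens are hypothetical. The SVM that would consume these vectors is not shown.

```python
ENTITY_TYPES = ["LOC", "TIME", "VEH"]  # hypothetical tag set

def entity_counts(tagged_tokens, entity_types=ENTITY_TYPES):
    """Count occurrences of each entity type in (token, tag) pairs."""
    counts = dict.fromkeys(entity_types, 0)
    for _, tag in tagged_tokens:
        if tag in counts:
            counts[tag] += 1
    return [float(counts[t]) for t in entity_types]

def mean_embedding(tokens, embeddings, dim):
    """Average the vectors of known tokens; zeros if none are known."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def hybrid_features(tagged_tokens, embeddings, dim):
    """Hybrid scheme: word-embedding features followed by entity counts."""
    tokens = [tok for tok, _ in tagged_tokens]
    return mean_embedding(tokens, embeddings, dim) + entity_counts(tagged_tokens)

# Toy tagged tweet ("truck collision on the highway this morning") and toy embeddings.
TAGGED = [("tabrakan", "O"), ("truk", "VEH"), ("di", "O"),
          ("jalan", "LOC"), ("raya", "LOC"), ("pagi", "TIME")]
EMB = {"tabrakan": [1.0, 0.0], "truk": [0.0, 1.0]}
features = hybrid_features(TAGGED, EMB, dim=2)
```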
Multi-layer perceptron hyperparameter optimization using Jaya algorithm for disease classification
Novika, Andien Dwi; Girsang, Abba Suganda
Indonesian Journal of Electrical Engineering and Computer Science Vol 35, No 1: July 2024
Publisher : Institute of Advanced Engineering and Science

DOI: 10.11591/ijeecs.v35.i1.pp620-630

Abstract

This study introduces a hyperparameter optimization approach that enhances multilayer perceptrons (MLP) using the Jaya algorithm. Because hyperparameter tuning plays a crucial role in MLP performance, the Jaya algorithm, inspired by social behavior, emerges as a promising optimization technique that requires no algorithm-specific parameters. Applying Jaya systematically adjusts hyperparameter values dynamically, leading to notable improvements in convergence speed and model generalization. Quantitatively, the Jaya algorithm consistently converges at the first iteration, faster than conventional methods, and yields 7% higher accuracy on several datasets. This research contributes to hyperparameter optimization by offering a practical and effective solution for optimizing MLPs in diverse applications, with implications for improved computational efficiency and model performance.
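The Jaya update moves each candidate toward the population's current best solution and away from its current worst. Its only settings are population size and iteration count. Below is a minimal Python sketch of the standard update rule with greedy acceptance. It assumes continuous hyperparameters clipped to box bounds. The hypothetical `toy_validation_error` is a smooth function standing in for an MLP's validation error. In practice, discrete hyperparameters such as hidden-layer size would be rounded before each training run.

```python
import random

def jaya(objective, bounds, pop_size=10, iterations=100, seed=0):
    """Minimize objective over box bounds with the parameter-free Jaya update."""
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fit = [objective(x) for x in pop]
    history = [min(fit)]
    for _ in range(iterations):
        best = pop[fit.index(min(fit))]
        worst = pop[fit.index(max(fit))]
        for i in range(pop_size):
            candidate = []
            for j, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                x = pop[i][j]
                # Move toward the best solution and away from the worst one.
                v = x + r1 * (best[j] - abs(x)) - r2 * (worst[j] - abs(x))
                candidate.append(min(max(v, lo), hi))  # clip to the search box
            f = objective(candidate)
            if f < fit[i]:  # greedy acceptance: keep the candidate only if it improves
                pop[i], fit[i] = candidate, f
        history.append(min(fit))
    k = fit.index(min(fit))
    return pop[k], fit[k], history

# Hypothetical stand-in for MLP validation error over
# x[0] = -log10(learning rate) and x[1] = number of hidden units.
def toy_validation_error(x):
    return (x[0] - 3.0) ** 2 + ((x[1] - 64.0) / 64.0) ** 2

BOUNDS = [(1.0, 5.0), (8.0, 256.0)]
best_x, best_f, history = jaya(toy_validation_error, BOUNDS, seed=1)
```

Greedy acceptance makes the best objective value non-increasing across iterations, which is why `history` never goes up.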
Human detection in CCTV screenshot using fine-tuning VGG-19
Dewangga, Firdaus Angga; Girsang, Abba Suganda
International Journal of Informatics and Communication Technology (IJ-ICT) Vol 14, No 2: August 2025
Publisher : Institute of Advanced Engineering and Science

DOI: 10.11591/ijict.v14i2.pp645-652

Abstract

Closed-circuit television (CCTV) systems generate vast amounts of visual data that are crucial for security and surveillance. Effectively categorizing security levels is vital for maintaining asset security. This study proposes a practical approach for classifying CCTV screenshot images using transfer learning with the Visual Geometry Group (VGG-19) model, a convolutional neural network (CNN) that performs well in image classification. The classification task consists of categorizing screenshots into two classes: “humans present” and “no humans present.” The fine-tuned VGG-19 model attained 98% training accuracy, 98% validation accuracy, and 85% test accuracy on this task. To evaluate its performance, we compared the fine-tuned VGG-19 model with another method. The fine-tuned VGG-19 model handles screenshot images effectively, offering a valuable tool for CCTV image classification and contributing to stronger asset security strategies.
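The abstract does not detail the fine-tuning procedure. Transfer learning commonly keeps a pretrained backbone and trains a new classification head for the target classes, and the sketch below shows only that head. It is a logistic-regression classifier over precomputed feature vectors, written in plain Python. The 4-dimensional toy features are hypothetical stand-ins for VGG-19's 4096-dimensional fully connected features. This is a linear probe rather than full fine-tuning, which would also update some backbone layers.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_head(features, labels, lr=0.5, epochs=200):
    """Batch gradient descent on binary cross-entropy for a linear head."""
    dim, n = len(features[0]), len(features)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * dim, 0.0
        for x, y in zip(features, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for i in range(dim):
                gw[i] += err * x[i]
            gb += err
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b

def predict(w, b, x, threshold=0.5):
    """1 = "humans present", 0 = "no humans present"."""
    return 1 if sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= threshold else 0

# Toy feature vectors standing in for frozen-backbone outputs.
FEATURES = [[0.9, 0.8, 0.1, 0.0], [0.8, 0.9, 0.0, 0.1],
            [0.1, 0.0, 0.9, 0.8], [0.0, 0.1, 0.8, 0.9]]
LABELS = [1, 1, 0, 0]
w, b = train_head(FEATURES, LABELS)
accuracy = sum(predict(w, b, x) == y for x, y in zip(FEATURES, LABELS)) / len(LABELS)
```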
IndoBART optimization for question answer generation system with longformer attention
Andrew, Peter; Girsang, Abba Suganda
International Journal of Informatics and Communication Technology (IJ-ICT) Vol 14, No 2: August 2025
Publisher : Institute of Advanced Engineering and Science

DOI: 10.11591/ijict.v14i2.pp478-487

Abstract

Incorporating question answering (QA) systems holds great potential for addressing Indonesia's educational disparity between the large number of high school students and the limited number of teachers. This study aims to improve a QA model for Indonesian-language data by enhancing the Indonesian IndoBART model. The improvement incorporates Longformer's sliding-window attention mechanism into IndoBART, which increases the model's proficiency on long-sequence tasks such as question answering. The datasets used in this research are the multilingual TyDiQA dataset and a translated SQuAD v2 dataset. The evaluation indicates that the Longformer-IndoBART model outperforms its predecessor on TyDiQA, with an average 26% improvement across the F1, Exact Match, BLEU, and ROUGE metrics. However, it performed slightly worse on SQuAD v2, with an average decrease of 0.6% across all metrics.
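Longformer-style attention lets each token attend only to a fixed window of neighbors, plus a few globally attending tokens. As a result, attention cost grows linearly rather than quadratically with sequence length. The boolean mask below sketches that pattern in Python. The window size and the choice of global positions are assumptions; for QA, global attention is typically placed on the question tokens. Real implementations compute banded attention directly instead of materializing an n-by-n mask, so this is for illustration only.

```python
def sliding_window_mask(seq_len, window, global_positions=()):
    """mask[i][j] is True if token i may attend to token j."""
    half = window // 2  # tokens visible on each side of position i
    mask = [[abs(i - j) <= half for j in range(seq_len)] for i in range(seq_len)]
    for g in global_positions:
        for k in range(seq_len):
            mask[g][k] = True  # the global token attends to every position
            mask[k][g] = True  # every position attends to the global token
    return mask

local = sliding_window_mask(6, window=2)
with_global = sliding_window_mask(6, window=2, global_positions=(0,))
```

Without global tokens, each row has at most `window + 1` allowed entries, so the number of attended pairs scales with `seq_len * window` instead of `seq_len ** 2`.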
Solving Simulated Imbalanced Body Performance Data using A-SUWO and Tomek Link Algorithm
Febryan Grady; Joel Rizky Wahidiyat; Abba Suganda Girsang
Journal of Applied Engineering and Technological Science (JAETS) Vol. 6 No. 2 (2025): Journal of Applied Engineering and Technological Science (JAETS)
Publisher : Yayasan Riset dan Pengembangan Intelektual (YRPI)

DOI: 10.37385/jaets.v6i2.4738

Abstract

This research examines the impact of various sampling techniques on the performance of classification models on imbalanced datasets, using the body performance dataset as a case study. Many studies in this field analyze the effect of sampling techniques on model performance, but they often begin with imbalanced datasets and therefore lack a balanced baseline for comparison. This research addresses that gap by simulating an imbalanced dataset from an originally balanced one, which provides a reference point for evaluating the effectiveness of the sampling methods. The dataset is prepared in three versions: (1) the original, balanced distribution; (2) a simulated imbalanced distribution; and (3) a synthesized dataset produced by various sampling techniques. These techniques are oversampling with Adaptive Semi-Unsupervised Weighted Oversampling (A-SUWO), undersampling with Tomek Link, and hybrid sampling that combines both. The primary objective is to identify which sampling techniques yield model performance that most closely matches the performance observed on the original balanced dataset. Across all experiments with Decision Tree, Random Forest, and K-Nearest Neighbors (KNN) classifiers, both A-SUWO and Tomek Link led to overfitting, with a discernible gap between training and testing accuracy averaging 0.21304. Despite overfitting and general performance issues, undersampling with Tomek Link obtained the highest average test accuracy (0.65023), outperforming A-SUWO (0.62883) and the hybrid approach (0.63568). These findings highlight the importance of choosing appropriate sampling techniques to optimize model performance on imbalanced datasets.
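A Tomek link is a pair of samples from different classes that are each other's nearest neighbors. Tomek-link undersampling removes such pairs, or one member of each, to clean the class boundary. The plain-Python sketch below uses brute-force Euclidean nearest neighbors and drops only the majority-class member of each link. That removal policy is an assumption, chosen as a simple variant; the paper's exact settings are not given. A-SUWO is considerably more involved and is not sketched here.

```python
import math
from collections import Counter

def nearest_neighbor(X, i):
    """Index of the closest other sample to X[i] (Euclidean, brute force)."""
    best, best_d = None, math.inf
    for j, x in enumerate(X):
        if j != i:
            d = math.dist(X[i], x)
            if d < best_d:
                best, best_d = j, d
    return best

def tomek_links(X, y):
    """Pairs (i, j) of opposite-class samples that are mutual nearest neighbors."""
    nn = [nearest_neighbor(X, i) for i in range(len(X))]
    return [(i, j) for i, j in enumerate(nn) if i < j and nn[j] == i and y[i] != y[j]]

def remove_tomek_links(X, y):
    """Undersample by dropping the majority-class member of every Tomek link."""
    majority = Counter(y).most_common(1)[0][0]
    drop = {k for pair in tomek_links(X, y) for k in pair if y[k] == majority}
    keep = [k for k in range(len(X)) if k not in drop]
    return [X[k] for k in keep], [y[k] for k in keep]

# Toy 1-D data: samples 1 (class 0) and 2 (class 1) form a Tomek link.
X = [[0.0], [1.0], [1.2], [5.0], [6.0]]
y = [0, 0, 1, 1, 1]
X_res, y_res = remove_tomek_links(X, y)
```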