Articles

Found 22 Documents

Aspect-Based Sentiment Analysis of Transportation Electrification Opinions on YouTube Comment Data
Adilla, Rahmi Elfa; Huda, Muhammad; Aziz, Muhammad; Suadaa, Lya Hulliyyatus
Jurnal Aplikasi Statistika & Komputasi Statistik Vol 16 No 2 (2024): Jurnal Aplikasi Statistika & Komputasi Statistik
Publisher : Politeknik Statistika STIS

DOI: 10.34123/jurnalasks.v16i2.790

Abstract

Introduction/Main Objectives: This research aims to conduct an aspect-based sentiment analysis of opinions on transportation electrification in YouTube comment data. Background Problems: It is difficult to summarize the sentiment of the many YouTube user comments related to electric vehicles (EVs) by aspect; therefore, aspect-based sentiment analysis is needed for further analysis. Novelty: This study identifies five aspects of EVs and their sentiments simultaneously. The aspects are usefulness, ease of use, comfort, cost, and incentive policies. One of this study’s methods is transfer learning, which can overcome the shortcomings of deep learning for aspect-based sentiment classification on small datasets. Research Methods: The sentiment classification models used are a machine learning model, namely the support vector machine (SVM), and transfer learning models from pre-trained IndoBERT and mBERT. Finding/Results: Based on the experimental results, transfer learning from the IndoBERT model achieved the best performance, with an accuracy of 89.17% and an F1-score of 52.66%. The best IndoBERT model was then further developed with input in the form of a combination of the aspect and the comment sentence, which improved performance to an accuracy of 90% and an F1-score of 60.70%.
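
As a rough illustration of the pair-input setup described in this abstract, the sketch below feeds each aspect together with the comment sentence into a pre-trained IndoBERT classifier via Hugging Face Transformers. The checkpoint name, label set, and example text are assumptions for illustration, not the paper's exact configuration.

```python
# Minimal sketch: aspect-based sentiment classification with an aspect + comment
# sentence pair as input to a pre-trained IndoBERT encoder (assumed checkpoint).
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "indobenchmark/indobert-base-p1"   # assumed IndoBERT checkpoint
LABELS = ["negative", "neutral", "positive"]    # assumed sentiment label set

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=len(LABELS))

def predict_sentiment(aspect: str, comment: str) -> str:
    # Encode the aspect and the comment as a sentence pair (the best-performing
    # input format reported in the abstract).
    inputs = tokenizer(aspect, comment, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    return LABELS[int(logits.argmax(dim=-1))]

# Meaningful only after fine-tuning the classification head on labelled comments.
print(predict_sentiment("cost", "Electric cars are still too expensive for me."))
```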
Kajian Penerapan Machine Learning untuk Sistem Rekomendasi Mitra Statistik BPS
Septianugraha, Damar; Wilantika, Nori; Suadaa, Lya Hulliyyatus; Prasetyo, Rindang Bangun; Huraira, Sabit
Seminar Nasional Official Statistics Vol 2024 No 1 (2024): Seminar Nasional Official Statistics 2024
Publisher : Politeknik Statistika STIS

DOI: 10.34123/semnasoffstat.v2024i1.2211

Abstract

BPS routinely conducts censuses and surveys involving BPS partners in data collection and processing. Ensuring that these partners perform well is crucial to minimize the risk of moral hazard, which can negatively impact stakeholders. This research aims to integrate machine learning into an information system that recommends statistical partners based on classification results. The best model identified is XGBoost, which is integrated into the system to generate recommendations. System testing using black-box methods confirmed compliance in 41 scenarios. Additionally, the System Usability Scale (SUS) questionnaire yielded an average score of 65.5, indicating the system's potential and suitability for further development. The findings offer insights into utilizing partner characteristics and evaluation data in BPS's censuses and surveys, particularly for selecting assigned partners.
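
A minimal sketch of the classification step, assuming XGBoost over tabular partner features; the file path, feature names, and ranking logic are hypothetical, not the study's actual variables.

```python
# Sketch: train an XGBoost classifier on partner evaluation features and rank
# candidates by predicted probability of good performance.
# The CSV path and feature names are hypothetical.
import pandas as pd
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split

df = pd.read_csv("partner_evaluations.csv")                    # hypothetical dataset
X = df[["evaluation_score", "experience_years", "attendance_rate"]]
y = df["good_performance"]                                     # 1 = good, 0 = not

X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42)
model = XGBClassifier(n_estimators=200, max_depth=4, eval_metric="logloss")
model.fit(X_train, y_train)

# The recommendation list is simply the partners with the highest predicted scores.
df["score"] = model.predict_proba(X)[:, 1]
print(df.sort_values("score", ascending=False).head(10))
```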
Pengembangan Aplikasi Chatbot dengan Large Language Model untuk Text-to-SQL Generation
Nugraha, Gede Putra; Suadaa, Lya Hulliyyatus; Wilantika, Nori; Maghfiroh, Lutfi Rahmatuti
Seminar Nasional Official Statistics Vol 2024 No 1 (2024): Seminar Nasional Official Statistics 2024
Publisher : Politeknik Statistika STIS

DOI: 10.34123/semnasoffstat.v2024i1.2252

Abstract

The agricultural census query builder system has two modes: a query builder mode with an interface that facilitates the selection of tables, columns, and query criteria, and an SQL programming mode for executing SQL queries. The system provides a list of queries for basic anomaly checking nationwide, but advanced anomaly checking unique to each work unit requires writing SQL queries from scratch, which is inefficient. This research developed a chatbot application that translates user questions into SQL queries for data anomaly checking. The chatbot uses the large language model (LLM) GPT-4o and was built following the Rapid Application Development (RAD) model for rapid system development. Black-box testing and a System Usability Scale (SUS) usability test show results as expected by users, with an average SUS score of 84.17, which indicates that the chatbot application is acceptable.
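
A minimal sketch of the text-to-SQL step, assuming the OpenAI chat completions API with GPT-4o; the schema description and prompt wording are illustrative, not the application's actual prompt or schema.

```python
# Sketch: translate a natural-language anomaly-check request into SQL with GPT-4o.
# The table schema and prompt are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SCHEMA = ("Table farm_census(region_code TEXT, commodity TEXT, "
          "harvested_area REAL, production REAL)")

def to_sql(question: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system",
             "content": f"Translate the user's request into one SQL query. Schema: {SCHEMA}"},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

print(to_sql("Show regions where production is positive but harvested area is zero."))
```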
Named Entity Recognition pada Kueri Pencarian Statistik
Wildannissa Pinasti; Lya Hulliyyatus Suadaa
Jurnal Nasional Teknik Elektro dan Teknologi Informasi Vol 13 No 3: Agustus 2024
Publisher : Departemen Teknik Elektro dan Teknologi Informasi, Fakultas Teknik, Universitas Gadjah Mada

DOI: 10.22146/jnteti.v13i3.11580

Abstract

Search engines must understand user queries to provide relevant search results. They can enhance their understanding of user intent by employing named entity recognition (NER) to identify the entities in a query; knowing the types of entities in the query is an initial step toward helping search engines better understand search intent. In this research, a dataset was constructed from the search query history of the Statistics Indonesia (Badan Pusat Statistik, BPS) website, and NER in query modeling was employed to extract entities from search queries related to statistical datasets. The research stages included query data collection, preprocessing, labeling, NER in query modeling, and model evaluation. The conditional random field (CRF) model was employed in two scenarios: CRF with basic features and CRF with basic features plus part-of-speech (POS) features. The CRF model was used due to its well-known effectiveness in natural language processing (NLP), particularly for sequence labeling tasks such as NER. The basic CRF model and the CRF model with POS features achieved F1-scores of 0.9139 and 0.9110, respectively. A case study on a Linked Open Data (LOD) statistical dataset indicated that searches with synonym query expansion on the entities extracted by NER in query produced better search results than regular searches without query expansion. Incorporating additional POS tagging features did not significantly improve the model's performance; therefore, it is recommended that future research explore deep learning approaches.
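
A minimal sketch of the two CRF scenarios using sklearn-crfsuite; the feature template, toy query, and BIO labels are assumptions for illustration.

```python
# Sketch: CRF sequence labeling for NER on search queries (sklearn-crfsuite).
# Scenario 1 uses basic word features; scenario 2 adds a POS-tag feature.
import sklearn_crfsuite

def token_features(tokens, pos_tags, i):
    feats = {
        "word.lower": tokens[i].lower(),
        "word.isdigit": tokens[i].isdigit(),
        "prefix3": tokens[i][:3],
        "BOS": i == 0,
        "EOS": i == len(tokens) - 1,
    }
    if pos_tags is not None:                 # scenario 2: basic + POS features
        feats["pos"] = pos_tags[i]
    return feats

def query2features(tokens, pos_tags=None):
    return [token_features(tokens, pos_tags, i) for i in range(len(tokens))]

# Toy training pair: query tokens with illustrative BIO entity labels.
X_train = [query2features(["jumlah", "penduduk", "jawa", "barat", "2023"])]
y_train = [["B-STAT", "I-STAT", "B-LOC", "I-LOC", "B-TIME"]]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=100)
crf.fit(X_train, y_train)
print(crf.predict([query2features(["produksi", "padi", "bali", "2022"])]))
```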
Study of the Application of Text Augmentation with Paraphrasing to Overcome Imbalanced Data in Indonesian Text Classification
Sari, Mutiara Indryan; Suadaa, Lya Hulliyyatus
JOIN (Jurnal Online Informatika) Vol 10 No 1 (2025)
Publisher : Department of Informatics, UIN Sunan Gunung Djati Bandung

DOI: 10.15575/join.v10i1.1472

Abstract

Data imbalance in text classification often leads to poor recognition of minority classes, as classifiers tend to favor majority categories. This study addresses the data imbalance issue in Indonesian text classification by proposing a novel text augmentation approach using fine-tuned pre-trained models: IndoGPT2, IndoBART-v2, and mBART50. Unlike back-translation, which struggles with informal text, text augmentation using pre-trained models significantly improves the F1 score of minority labels, with fine-tuned mBART50 outperforming back-translation and the other models by balancing semantic preservation and lexical diversity. However, the approach faces limitations, including the risk of overfitting due to synthetic text's lack of natural variation, restricted generalizability from reliance on datasets such as ParaCotta, and the high computational costs associated with fine-tuning large models like mBART50. Future research should explore hybrid methods that integrate synthetic and real-world data to enhance text quality and diversity, as well as develop smaller, more efficient models to reduce computational demands. The findings underscore the potential of pre-trained models for text augmentation while emphasizing the importance of considering dataset characteristics, language style, and augmentation volume to achieve optimal results.
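
A minimal sketch of the augmentation step: paraphrase minority-class sentences with a fine-tuned sequence-to-sequence model loaded through Transformers. The checkpoint path is a placeholder for a paraphraser fine-tuned on paraphrase pairs such as ParaCotta, as described above; the sampling parameters are assumptions.

```python
# Sketch: augment minority-class examples by paraphrasing them with a fine-tuned
# seq2seq model. The checkpoint path is a placeholder; the study fine-tuned
# IndoGPT2, IndoBART-v2, and mBART50 as paraphrasers.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

CHECKPOINT = "path/to/finetuned-paraphraser"   # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
model = AutoModelForSeq2SeqLM.from_pretrained(CHECKPOINT)

def paraphrase(text: str, n: int = 3) -> list[str]:
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    outputs = model.generate(
        **inputs,
        num_return_sequences=n,
        do_sample=True,          # sampling encourages lexical diversity
        top_p=0.95,
        max_new_tokens=64,
    )
    return [tokenizer.decode(o, skip_special_tokens=True) for o in outputs]

# Each minority-class sentence yields n synthetic variants carrying the same label.
print(paraphrase("Pelayanan aplikasinya lambat sekali hari ini."))
```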
Model Klasifikasi Multilabel pada Publikasi Penelitian SDG dengan Pendekatan Multilevel dan Hierarki
Berliana Sugiarti Putri; Lya Hulliyyatus Suadaa; Efri Diah Utami
Jurnal Nasional Teknik Elektro dan Teknologi Informasi Vol 14 No 1: Februari 2025
Publisher : Departemen Teknik Elektro dan Teknologi Informasi, Fakultas Teknik, Universitas Gadjah Mada

DOI: 10.22146/jnteti.v14i1.16265

Abstract

The growing number of research publications complicates identifying which publications implement the sustainable development goals (SDGs). Categorization of research publications into SDG levels has not yet been conducted, although the Center for Research and Community Service (Pusat Penelitian dan Pengabdian Masyarakat, PPPM) of Politeknik Statistika (Polstat) STIS needs it to monitor lecturers' implementation of the SDGs. This study aimed to implement and evaluate problem transformation methods and machine learning classification algorithms with multilevel and hierarchical approaches to categorize research publications into SDG levels. The problem transformation methods used were binary relevance, label powerset (LP), and classifier chains; the classification algorithms were logistic regression (LR) and support vector machine (SVM). The inputs were titles, abstracts, and titles combined with abstracts. The best filter model, which classified data into SDGs versus non-SDGs, used titles and SVM, with an accuracy of 0.8634. The best level model for classifying data into SDG levels used titles, LP, and SVM with the multilevel approach. This level model classified data into the four pillars, the goals, the targets, and the indicators of the SDGs with accuracies of 0.8067, 0.7501, 0.6792, and 0.6194, respectively. Compared to the other inputs, which contain more comprehensive information, title inputs yielded the best accuracy, which is attributed to the simultaneous use of English and Indonesian. Future research can modify the model to use a single-language input to optimize the term frequency-inverse document frequency (TF-IDF) process, so that words with the same meaning in different languages are not treated as distinct important terms.
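
A minimal sketch of one of the configurations above: a problem-transformation method (classifier chains) over TF-IDF features with a linear SVM. The toy titles and the four-label set are illustrative only.

```python
# Sketch: multilabel SDG classification via problem transformation (classifier
# chains) over TF-IDF features with a linear SVM. Toy data for illustration.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multioutput import ClassifierChain
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

titles = [
    "Poverty reduction through social protection programs",
    "Renewable energy adoption in rural electrification",
    "Gender equality in access to higher education",
]
# Label columns: SDG1, SDG4, SDG5, SDG7 (illustrative subset of goals)
Y = np.array([[1, 0, 0, 0],
              [0, 0, 0, 1],
              [0, 1, 1, 0]])

model = make_pipeline(TfidfVectorizer(), ClassifierChain(LinearSVC()))
model.fit(titles, Y)
print(model.predict(["Solar power policy for remote villages"]))
```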
Prediction of Main Transportation Modes using Passive Mobile Positioning Data (Passive MPD)
Farhan, Muhammad; Suadaa, Lya Hulliyyatus; Sugiri; Munaf, Alfatihah Reno Maulani Nuryaningsih Soekri Putri; Pramana, Setia
Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) Vol 9 No 1 (2025): February 2025
Publisher : Ikatan Ahli Informatika Indonesia (IAII)

DOI: 10.29207/resti.v9i1.6128

Abstract

Indicators of the main mode of transportation used by domestic tourists during tourism trips cannot yet be estimated using Passive MPD, which is recorded based on the location of the base transceiver stations (BTS) that capture the cellular activity of domestic tourists. Previous research on identifying transportation modes from Passive MPD has shortcomings because it relies only on speed and travel time features. Meanwhile, Active MPD is recorded through active, real-time geo-positioning; research on it involves many more features, and it has a data structure similar to Passive MPD. Therefore, this research studies how the method used to identify transportation modes in Active MPD can be applied to Passive MPD as an approach to predicting the main mode of transportation. The results show that the transportation mode identification method for Active MPD can indeed be implemented on Passive MPD. The best accuracy of 83.56% was obtained by the LightGBM model using all features. However, the Multinomial Logistic Regression model, which uses only 10 selected features, is the most effective and efficient model, with an accuracy of 76.43% and a much shorter execution time.
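
A minimal sketch of the multiclass mode-prediction step with LightGBM; the file path and feature names are hypothetical stand-ins for the trip features derived from Passive MPD (for example, speed and travel time).

```python
# Sketch: multiclass transportation-mode classification with LightGBM.
# The CSV path and feature names are hypothetical trip-level features.
import pandas as pd
from lightgbm import LGBMClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("trips.csv")                       # hypothetical dataset
X = df[["avg_speed", "travel_time", "distance", "n_bts_switches"]]
y = df["mode"]                                      # e.g., land / sea / air

X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42)
model = LGBMClassifier(n_estimators=300, learning_rate=0.05)
model.fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, model.predict(X_test)))
```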
Automatic Classification of Multilanguage Scientific Papers to the Sustainable Development Goals Using Transfer Learning
Suadaa, Lya Hulliyyatus; Monika, Anugerah Karta; Putri, Berliana Sugiarti; Rimawati, Yeni
Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) Vol 9 No 3 (2025): June 2025
Publisher : Ikatan Ahli Informatika Indonesia (IAII)

DOI: 10.29207/resti.v9i3.6560

Abstract

The classification of scientific papers according to their relevance to the Sustainable Development Goals (SDGs) is a critical task for identifying the state of research development for each goal. However, with the growing volume of scientific literature published worldwide in multiple languages, manual categorization of these papers has become increasingly complex and time-consuming. Furthermore, the need for a comprehensive multilingual dataset to train effective models complicates the task, as obtaining such datasets for various languages is resource intensive. This study proposes a solution to this problem by leveraging transfer learning to automatically classify scientific papers into SDG labels. By fine-tuning the pretrained multilingual model mBERT on SDG publication datasets in a multilabel approach, we demonstrate that transfer learning can significantly improve classification performance compared to SVM, even with limited labelled data. Our approach enables the effective processing of scientific papers in different languages and facilitates the seamless mapping of research to SDG relevance, the four pillars of the SDGs, and the 17 goals of the SDGs. The proposed method addresses the scalability issue in SDG classification and lays the groundwork for more efficient systems that can handle the multilingual nature of modern scientific publications.
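
A minimal sketch of the multilabel setup with mBERT via Transformers, where the multi-label problem type gives a sigmoid head so each of the 17 goals is predicted independently; the example text and threshold are assumptions, and the output is meaningful only after fine-tuning on labelled SDG data.

```python
# Sketch: multilabel SDG classification with mBERT. The multi_label problem type
# uses a sigmoid/BCE head, so each goal is predicted independently.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "bert-base-multilingual-cased"   # mBERT checkpoint
NUM_LABELS = 17                               # the 17 SDG goals

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME, num_labels=NUM_LABELS,
    problem_type="multi_label_classification",
)

text = "Assessing rural electrification and its impact on poverty reduction."
inputs = tokenizer(text, return_tensors="pt", truncation=True)
with torch.no_grad():
    probs = torch.sigmoid(model(**inputs).logits)[0]

# Map the paper to every goal whose probability exceeds a chosen threshold.
predicted_goals = [i + 1 for i, p in enumerate(probs) if p > 0.5]
print(predicted_goals)   # meaningful only after fine-tuning on labelled data
```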
Multi-Source Data Fusion For Data Extraction and Integration of Scientific Publications in Academic Institution STIS
Maulidya, Luthfi; Suadaa, Lya Hulliyyatus; Wijayanto, Arie Wahyu; Ridho, Farid
Jurnal Nasional Pendidikan Teknik Informatika: JANAPATI Vol. 14 No. 2 (2025)
Publisher : Prodi Pendidikan Teknik Informatika Universitas Pendidikan Ganesha

DOI: 10.23887/janapati.v14i2.87050

Abstract

Scientific research publication data are among the most important data required by academic and research institutions: they serve as a reference for measuring lecturers' research performance, assessing study programs and university accreditation, identifying research trends, and planning research development policies and strategies. However, to fulfill these data needs, research data must be collected and integrated from various data sources due to the diversity of databases. One of the portals that provides scientific research publication data for universities in Indonesia is Sinta (Science and Technology Index). The research databases integrated in Sinta are Scopus, Web of Science (WoS), Garba Rujukan Digital (Garuda), and Google Scholar. However, there are limitations: some scientific research publication metadata in Sinta are still not covered, such as the Digital Object Identifier (DOI), abstract, authors' full names, publication/journal name, publication type, and number of citations. In addition, each data source has a different data format, which requires processing before the data can be integrated. Processing and integrating research data from different sources is very inefficient if done manually and without automation. Therefore, this study proposes a data engineering pipeline framework for the extraction and integration of scientific research publication data from various data sources using a multi-source data fusion method with the Unified Cube methodology, implemented through a web interface. We use Politeknik Statistika STIS, Jakarta, as a case study. The framework follows the data engineering lifecycle and a multi-source data fusion method based on abstraction levels for extracting and integrating scientific research publication data. The transformed data are then classified using rule-based classification. The results show that the accuracy of the framework was more than 90% and the accuracy of the classification results was 87.5%.
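
A minimal sketch of one fusion step in such a pipeline: concatenate publication records from two sources and deduplicate on DOI, falling back to a normalized title. The file and column names are hypothetical, not the actual Sinta, Scopus, or Garuda exports.

```python
# Sketch: merge publication metadata from two sources and deduplicate on DOI,
# falling back to a normalized title. File and column names are hypothetical.
import pandas as pd

scopus = pd.read_csv("scopus_export.csv")   # columns: doi, title, authors, year
garuda = pd.read_csv("garuda_export.csv")   # columns: doi, title, authors, year

def normalize_title(title: str) -> str:
    return "".join(ch for ch in str(title).lower() if ch.isalnum())

combined = pd.concat([scopus.assign(source="scopus"),
                      garuda.assign(source="garuda")], ignore_index=True)
combined["title_key"] = combined["title"].map(normalize_title)

# Use the DOI as the deduplication key; records without a DOI fall back to the
# normalized title. Sorting makes the choice of surviving record deterministic.
combined["dedup_key"] = combined["doi"].fillna(combined["title_key"])
integrated = combined.sort_values("source").drop_duplicates("dedup_key", keep="first")
print(len(integrated), "unique publications")
```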
OPTIMIZING LONG TEXT CLASSIFICATION PERFORMANCE THROUGH KEYWORD-BASED SENTENCE SELECTION: A CASE STUDY ON ONLINE NEWS CLASSIFICATION FOR INDONESIAN GDP GROWTH-RATE DETECTION
Sholawatunnisa, Dinda Pusparahmi; Suadaa, Lya Hulliyyatus
BAREKENG: Jurnal Ilmu Matematika dan Terapan Vol 18 No 2 (2024): BAREKENG: Journal of Mathematics and Its Application
Publisher : PATTIMURA UNIVERSITY

DOI: 10.30598/barekengvol18iss2pp1081-1094

Abstract

Efficiently managing lengthy textual data, particularly in online news, is crucial for enhancing the performance of long text classification. This study explores approaches to streamline the Gross Domestic Product (GDP) computation process by harnessing modern data analytics, Natural Language Processing (NLP), and online news sources. Leveraging online news data introduces real-time information, promising to improve the accuracy and timeliness of economic indicators like GDP. However, handling the complexity of extensive textual data poses a challenge that demands advanced NLP techniques. To address this, the research shifts from traditional word-weight-based methods to keyword-based extractive summarization, so that the selected sentences align precisely with keywords relevant to the research case, namely GDP growth-rate detection. The study emphasizes the necessity of adapting summarization methods to effectively capture information in specific research contexts. According to the classification results, the implemented sentence selection improved classification accuracy, with an average increase of 0.0226 for machine learning models and 0.0164 for transfer learning models. In terms of computational efficiency, sentence selection also accelerates processing time during hyperparameter tuning and fine-tuning, as observed using the same computational resources.
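
A minimal sketch of keyword-based sentence selection: keep only the sentences of a long news article that contain case-specific keywords, and pass the shortened text to the classifier. The keyword list and example article are illustrative, not the study's actual configuration.

```python
# Sketch: keyword-based sentence selection for long Indonesian news articles.
# Only sentences containing case-specific keywords are kept before classification.
import re

KEYWORDS = {"pdb", "pertumbuhan ekonomi", "produk domestik bruto"}  # illustrative

def select_sentences(article: str) -> str:
    sentences = re.split(r"(?<=[.!?])\s+", article)
    selected = [s for s in sentences if any(k in s.lower() for k in KEYWORDS)]
    return " ".join(selected) if selected else article   # fall back to full text

article = ("Pemerintah mengumumkan berbagai program baru. "
           "Pertumbuhan ekonomi Indonesia diperkirakan mencapai 5 persen. "
           "Cuaca di Jakarta hari ini cerah.")
# The shortened text is then fed to the machine learning / transfer learning classifier.
print(select_sentences(article))
```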