Sari, Mutiara Indryan
Unknown Affiliation

Published: 2 Documents
Articles

Study of the Application of Text Augmentation with Paraphrasing to Overcome Imbalanced Data in Indonesian Text Classification
Sari, Mutiara Indryan; Suadaa, Lya Hulliyyatus
JOIN (Jurnal Online Informatika) Vol 10 No 1 (2025)
Publisher : Department of Informatics, UIN Sunan Gunung Djati Bandung

DOI: 10.15575/join.v10i1.1472

Abstract

Data imbalance in text classification often leads to poor recognition of minority classes, as classifiers tend to favor majority categories. This study addresses data imbalance in Indonesian text classification by proposing a novel text augmentation approach that uses fine-tuned pre-trained models: IndoGPT2, IndoBART-v2, and mBART50. Unlike back-translation, which struggles with informal text, text augmentation using pre-trained models significantly improves the F1 score of minority labels, with fine-tuned mBART50 outperforming back-translation and the other models by balancing semantic preservation and lexical diversity. However, the approach has limitations, including the risk of overfitting because synthetic text lacks natural variation, restricted generalizability from reliance on datasets such as ParaCotta, and the high computational cost of fine-tuning large models like mBART50. Future research should explore hybrid methods that integrate synthetic and real-world data to enhance text quality and diversity, and should develop smaller, more efficient models to reduce computational demands. The findings underscore the potential of pre-trained models for text augmentation while emphasizing the importance of considering dataset characteristics, language style, and augmentation volume to achieve optimal results.
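As a rough illustration of the metric this abstract reports, the sketch below computes a per-label F1 score in plain Python. The toy labels, the predictions, and the function name `f1_for_label` are hypothetical and are not taken from the paper; the sketch only shows how a minority label can score poorly while the majority label looks fine.

```python
def f1_for_label(y_true, y_pred, label):
    """Return the F1 score for one label, treating it as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical imbalanced toy data: 8 majority examples, 2 minority examples.
y_true = ["majority"] * 8 + ["minority"] * 2
# The classifier mislabels one of the two minority examples as majority.
y_pred = ["majority"] * 9 + ["minority"]

majority_f1 = f1_for_label(y_true, y_pred, "majority")
minority_f1 = f1_for_label(y_true, y_pred, "minority")
print(round(majority_f1, 3), round(minority_f1, 3))
```

In this toy case a single missed minority example pulls the minority F1 well below the majority F1, which is the kind of gap that augmenting the minority class aims to narrow.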
PENGGUNAAN REMOTE SENSING DAN GOOGLE TRENDS UNTUK ESTIMASI PRODUK DOMESTIK BRUTO INDONESIA (The Use of Remote Sensing and Google Trends to Estimate Indonesia's Gross Domestic Product)
Kamal, Firhand Yusuf; Sari, Mutiara Indryan; Utami, Maulidya Fan Ghul Udzan; Kartiasih, Fitri
Equilibrium: Jurnal Penelitian Pendidikan dan Ekonomi Vol. 21 No. 02 (2024)
Publisher : Universitas Kuningan

DOI: 10.25134/equi.v21i02.9455

Abstract

Economic development is an important topic of study because it indicates a country's level of welfare. However, data that describe economic development, especially Gross Domestic Product (GDP), are not yet available in real time. Using big data, such as Night Time Light (NTL) and Google Trends, to estimate GDP is one solution to this problem. Even so, both types of big data have shortcomings as proxies for GDP. NTL data cannot distinguish whether the captured light comes from electricity or from transient sources such as fire and reflected light. Likewise, Google Trends keywords do not always represent people's behavior patterns consistently. This quantitative study evaluates and compares NTL and Google Trends data to see which type of data produces the best GDP estimates. The results show that NTL, Google Trends, and a combination of both can be used to predict GDP: the resulting models do not overfit and have MAPE values below 10%. The combination of the two data sources is the best choice for estimating GDP, with the best evaluation results, namely an RMSE of 15792.73 and a MAPE of 0.52%.

Keywords: google trends; gross domestic product; nighttime light; remote sensing
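For readers unfamiliar with the two evaluation metrics named in this abstract, the sketch below computes RMSE and MAPE in plain Python. It assumes the standard textbook definitions, since the abstract does not spell out the paper's exact formulation, and the numbers are made-up illustrative values, not GDP data from the study.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error: square root of the mean squared difference."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def mape(actual, predicted):
    """Mean absolute percentage error, in percent. Actual values must be non-zero."""
    ratios = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100 * sum(ratios) / len(ratios)


# Made-up values for illustration only.
actual = [100.0, 200.0, 300.0]
predicted = [110.0, 190.0, 300.0]

rmse_value = rmse(actual, predicted)
mape_value = mape(actual, predicted)
print(round(rmse_value, 3), round(mape_value, 3))
```

RMSE is expressed in the units of the target variable, while MAPE is a unit-free percentage, which is why a threshold such as "below 10%" can be stated for MAPE without reference to the scale of GDP.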