Found 27 Documents

Abstractive Text Summarization using Pre-Trained Language Model "Text-to-Text Transfer Transformer (T5)" Qurrota A’yuna Itsnaini; Mardhiya Hayaty; Andriyan Dwi Putra; Nidal A.M Jabari
ILKOM Jurnal Ilmiah Vol 15, No 1 (2023)
Publisher : Prodi Teknik Informatika FIK Universitas Muslim Indonesia

DOI: 10.33096/ilkom.v15i1.1532.124-131

Abstract

Automatic Text Summarization (ATS) applies text-processing technology to help humans produce a summary, or the key points, of documents in large quantities. We use Indonesian as the object language because few NLP research resources exist for it. This paper utilizes a PLM (Pre-Trained Language Model) from the transformer architecture, namely T5 (Text-to-Text Transfer Transformer), which was previously pre-trained on a larger dataset. Evaluation in this study was measured by comparing ROUGE (Recall-Oriented Understudy for Gisting Evaluation) scores between the reference summary and the model summary. Experiments fine-tuning the pre-trained t5-base model (220M parameters) on an Indonesian news dataset yielded relatively high ROUGE values, namely ROUGE-1 = 0.68, ROUGE-2 = 0.61, and ROUGE-L = 0.65. Although the evaluation scores were good, the resulting model has not achieved satisfactory results in terms of abstraction, where it did not work optimally. We also found several errors in the reference summaries of the dataset used.
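The ROUGE recall scores reported in this and the following abstracts can be computed from n-gram overlap and the longest common subsequence. A minimal sketch in plain Python (function names are illustrative, not from the paper, and real evaluations apply tokenization and averaging over a test set):

```python
from collections import Counter

def ngrams(tokens, n):
    """Return a Counter of n-grams from a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n(reference, candidate, n):
    """ROUGE-N recall: overlapping n-grams / n-grams in the reference."""
    ref, cand = ngrams(reference.split(), n), ngrams(candidate.split(), n)
    if not ref:
        return 0.0
    return sum((ref & cand).values()) / sum(ref.values())

def lcs_len(a, b):
    """Length of the longest common subsequence of two token lists."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            dp[i + 1][j + 1] = dp[i][j] + 1 if x == y else max(dp[i][j + 1], dp[i + 1][j])
    return dp[-1][-1]

def rouge_l(reference, candidate):
    """ROUGE-L recall: LCS length / reference length."""
    ref, cand = reference.split(), candidate.split()
    return lcs_len(ref, cand) / len(ref) if ref else 0.0
```

For example, `rouge_l("the cat sat on the mat", "the cat lay on the mat")` gives 5/6, since five of the six reference tokens appear in order in the candidate.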
The observed preprocessing strategies for doing automatic text summarizing Muhammad Farhan Juna; Mardhiya Hayaty
Computer Science and Information Technologies Vol 4, No 2: July 2023
Publisher : Institute of Advanced Engineering and Science

DOI: 10.11591/csit.v4i2.p119-126

Abstract

The explosion of digital information makes it challenging for humans to keep up with its rapid creation. Automatic text summarization can analyze a written document to extract meaningful information. To answer the question of how much impact preprocessing has on the quality of summaries produced by automatic text summarization, this research proposes 16 experimental settings in which an IndoBERT-based model is applied. We explicitly examine preprocessing strategies by testing different combinations of preprocessing techniques: data cleansing, stopword removal, stemming, and case folding. The results are then measured with the recall-oriented understudy for gisting evaluation (ROUGE). According to the findings, the optimal performance is achieved by combining data cleansing and case folding, with scores of 0.78, 0.60, and 0.68 for ROUGE-1, ROUGE-2, and ROUGE-L respectively.
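The 16 experimental settings correspond to every on/off combination of the four preprocessing steps. A minimal sketch, with an illustrative stopword list and a naive suffix-stripping stand-in for a real Indonesian stemmer:

```python
import itertools
import re

# Illustrative Indonesian stopword list; the study would use a full one.
STOPWORDS = {"yang", "dan", "di", "ke", "dari", "ini", "itu"}

def clean(text):   # data cleansing: drop punctuation and digits
    return re.sub(r"[^a-zA-Z\s]", " ", text)

def fold(text):    # case folding
    return text.lower()

def drop_stopwords(text):
    return " ".join(w for w in text.split() if w.lower() not in STOPWORDS)

def stem(text):    # naive suffix stripping, stand-in for a real stemmer
    return " ".join(re.sub(r"(nya|kan|an)$", "", w) for w in text.split())

STEPS = {"clean": clean, "fold": fold, "stopwords": drop_stopwords, "stem": stem}

def preprocess(text, enabled):
    """Apply the enabled steps in a fixed order, then normalize spaces."""
    for name in ["clean", "fold", "stopwords", "stem"]:
        if name in enabled:
            text = STEPS[name](text)
    return " ".join(text.split())

# 2^4 = 16 on/off combinations, matching the paper's 16 settings.
settings = [set(c) for r in range(5) for c in itertools.combinations(STEPS, r)]
```

For example, `preprocess("Ini Berita, dan Data!", {"clean", "fold", "stopwords"})` returns `"berita data"`.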
Hate speech detection on Indonesian text using word embedding method-global vector Mardhiya Hayaty; Arif Dwi Laksito; Sumarni Adi
IAES International Journal of Artificial Intelligence (IJ-AI) Vol 12, No 4: December 2023
Publisher : Institute of Advanced Engineering and Science

DOI: 10.11591/ijai.v12.i4.pp1928-1937

Abstract

Hate speech is defined as communication directed toward a specific individual or group that involves hatred or anger; language with strong arguments aimed at someone's opinion can cause social conflict. Because the number of Internet users globally, including in Indonesia, is continually rising, individuals have ample opportunity to voice their thoughts on online platforms. This study observes the impact of pre-trained global vector (GloVe) word embeddings on accuracy in classifying hate speech versus non-hate speech. Pre-trained GloVe (Indonesian text) with single- and multi-layer long short-term memory (LSTM) classifiers is more resistant to overfitting than trainable embeddings for hate-speech detection. The accuracy is 81.5% on a single-layer and 80.9% on a double-layer LSTM. Future work is to provide pre-training with both formal and non-formal language corpora; preprocessing to handle non-formal words is very challenging.
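Using a pre-trained GloVe embedding typically means parsing the GloVe-format vector file and building an embedding matrix for the classifier's embedding layer. A minimal sketch, with illustrative names (the actual corpus and LSTM setup are described in the paper):

```python
import io
import numpy as np

def load_glove(lines):
    """Parse GloVe-format text: one token followed by its vector per line."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        vectors[parts[0]] = np.asarray(parts[1:], dtype="float32")
    return vectors

def embedding_matrix(word_index, vectors, dim):
    """Row i holds the pre-trained vector for the word with index i.
    Out-of-vocabulary words (e.g. non-formal slang) keep a zero row,
    which is one source of the non-formal-word problem noted above."""
    matrix = np.zeros((len(word_index) + 1, dim), dtype="float32")
    for word, i in word_index.items():
        if word in vectors:
            matrix[i] = vectors[word]
    return matrix

# Toy two-dimensional "GloVe file" for illustration only.
glove = load_glove(io.StringIO("benci 1.0 0.5\nsaya 0.2 0.3\n"))
matrix = embedding_matrix({"saya": 1, "benci": 2, "gaul": 3}, glove, 2)
```

The resulting matrix would then initialize the (frozen or trainable) embedding layer ahead of the LSTM stack.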
ABSTRACTIVE-BASED AUTOMATIC TEXT SUMMARIZATION ON INDONESIAN NEWS USING GPT-2 Aini Nur Khasanah; Mardhiya Hayaty
JURTEKSI (Jurnal Teknologi dan Sistem Informasi) Vol 10, No 1 (2023): Desember 2023
Publisher : STMIK Royal

DOI: 10.33330/jurteksi.v10i1.2492

Abstract

Automatic text summarization is a challenging research area in natural language processing that aims to obtain important information quickly and precisely. There are two main approaches to text summarization: abstractive and extractive. Abstractive summarization generates new and more natural words, but its difficulty level is higher and more challenging. In previous studies, RNNs and their variants were among the most popular Seq2Seq models for text summarization. However, they have weaknesses in retaining memory; gradients vanish over long sentences, degrading summaries of lengthy texts. This research proposes a Transformer model with an attention mechanism that can fetch important information, solve parallelization problems, and summarize long texts. The Transformer model we propose is GPT-2, which uses decoders to predict the next word, using the pre-trained w11wo/indo-gpt2-small model applied to the IndoSum Indonesian dataset. Model performance is assessed with the ROUGE evaluation. The study's average recall results for R-1, R-2, and R-L were 0.61, 0.51, and 0.57, respectively. The summaries can paraphrase sentences, although some still reuse the original words from the text. Future work will increase the amount of data in the dataset to produce more newly paraphrased sentences.
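The decoder-only generation the abstract describes is an autoregressive loop: feed the sequence so far, pick the next token, append, repeat. A minimal greedy-decoding sketch with a toy stand-in for the model (the study itself uses the pre-trained w11wo/indo-gpt2-small checkpoint, not this toy scorer):

```python
import numpy as np

def greedy_decode(step_fn, prompt_ids, eos_id, max_new_tokens=20):
    """Greedy autoregressive decoding: at each step, step_fn scores the
    whole vocabulary given the tokens so far, and the argmax is appended
    until the end-of-sequence token or the length budget is reached."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        logits = step_fn(ids)        # scores over the vocabulary
        next_id = int(np.argmax(logits))
        ids.append(next_id)
        if next_id == eos_id:
            break
    return ids

# Toy "model": always prefers token (last_id + 1), capped at token 4 (EOS).
def toy_step(ids):
    logits = np.zeros(10)
    logits[min(ids[-1] + 1, 4)] = 1.0
    return logits
```

Here `greedy_decode(toy_step, [0], eos_id=4)` yields `[0, 1, 2, 3, 4]`; in practice, beam search or sampling replaces the argmax for more varied paraphrasing.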
Random and Synthetic Over-Sampling Approach to Resolve Data Imbalance in Classification Hayaty, Mardhiya; Muthmainah, Siti; Ghufran, Syed Muhammad
International Journal of Artificial Intelligence Research Vol 4, No 2 (2020): December 2020
Publisher : Universitas Dharma Wacana

DOI: 10.29099/ijair.v4i2.152

Abstract

A high accuracy value is one parameter of classification success in predicting classes: the higher the value, the more correct the class predictions. One way to improve accuracy is for the dataset to have a balanced class composition, which is complicated to ensure, especially in rare cases. This study used a blood donor dataset; the classification task predicts whether donors are eligible or ineligible, and in this case the imbalance ratio is quite high. This work increases the number of minority-class samples randomly and synthetically so that the amount of data in both classes is balanced. Applying SOS (synthetic over-sampling) and ROS (random over-sampling) succeeded in increasing the accuracy of recognizing the ineligible class from 12% to 100% with the KNN algorithm. In contrast, the naïve Bayes algorithm showed no change before and after the balancing process, remaining at 89%.
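The two balancing strategies differ in how new minority rows are made: ROS duplicates existing rows, while synthetic over-sampling interpolates between a row and one of its nearest minority neighbours (as in SMOTE). A minimal sketch, assuming numeric features; function names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_oversample(X_min, target_n):
    """ROS: duplicate minority samples at random until target_n rows."""
    idx = rng.integers(0, len(X_min), size=target_n - len(X_min))
    return np.vstack([X_min, X_min[idx]])

def synthetic_oversample(X_min, target_n, k=2):
    """SMOTE-style SOS: each new row lies between a minority sample and
    one of its k nearest minority neighbours."""
    new_rows = []
    for _ in range(target_n - len(X_min)):
        i = rng.integers(len(X_min))
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        neighbours = np.argsort(d)[1:k + 1]   # skip the point itself
        j = rng.choice(neighbours)
        gap = rng.random()                    # interpolation fraction in [0, 1)
        new_rows.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.vstack([X_min, new_rows])
```

After over-sampling the minority class to the majority-class size, the balanced data is fed to the classifier (KNN or naïve Bayes in the study) as usual.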
Implementation of Plagiarism-Detection Software and Google Classroom to Assist the Assessment of Student Assignments at SMK Nasional Berbah, Sleman Hayaty, Mardhiya
Jurnal ABDINUS : Jurnal Pengabdian Nusantara Vol 3 No 2 (2020): Volume 3 Nomor 2 Tahun 2020
Publisher : Universitas Nusantara PGRI Kediri

DOI: 10.29407/ja.v3i2.13812

Abstract

Student assignments and assessments are handled manually, so evaluation cannot be done objectively because many submissions are similar, or even identical, to other students' work. Copying other people's work is unlawful, yet students lack an understanding of what plagiarism is; education about it should therefore start early, especially in the world of knowledge, which by its nature greatly values the work of others. Plagiarism-detection software is needed to answer this challenge. This community service activity trained teachers to manage online student assignments and to check assignment documents with plagiarism software. The activity makes it easier for teachers to grade assignments and gives students an understanding of the originality of a work.