Journal : Knowledge Engineering and Data Science

Indonesian Sentence Boundary Detection using Deep Learning Approaches
Joan Santoso; Esther Irawati Setiawan; Christian Nathaniel Purwanto; Fachrul Kurniawan
Knowledge Engineering and Data Science Vol 4, No 1 (2021)
Publisher : Universitas Negeri Malang

DOI: 10.17977/um018v4i12021p38-48

Abstract

Detecting sentence boundaries is one of the crucial pre-processing steps in natural language processing. The task is needed because the border between one sentence and the next can be ambiguous. Because there are multiple separators and dynamic sentence patterns, treating every full stop as the end of a sentence is sometimes inappropriate. This research uses a deep learning approach to split each sentence from an Indonesian news document. Hence, there is no need to define any handcrafted features or rules. As in Part-of-Speech Tagging and Named Entity Recognition, we use sequence labeling to determine sentence boundaries. Two labels are used, namely O for a non-boundary token and E for the last token in a sentence. To do this, we used the Bi-LSTM approach, which has been widely used in sequence labeling. We show that our approach works for Indonesian text using pre-trained Indonesian embeddings, as in previous studies. This study achieved an F1-score of 98.49 percent, a significant improvement over previous studies.
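
The O/E labeling scheme described in this abstract can be illustrated with a short sketch. It shows only the decoding step that turns per-token labels into sentences. The function name `split_sentences` and the hand-written labels are assumptions standing in for the output of a trained tagger; the Bi-LSTM itself is not reproduced.

```python
from typing import List


def split_sentences(tokens: List[str], labels: List[str]) -> List[List[str]]:
    """Group tokens into sentences, closing a sentence at every 'E' label."""
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must have the same length")
    sentences: List[List[str]] = []
    current: List[str] = []
    for token, label in zip(tokens, labels):
        current.append(token)
        if label == "E":  # 'E' marks the last token of a sentence
            sentences.append(current)
            current = []
    if current:  # trailing tokens that never received an 'E' label
        sentences.append(current)
    return sentences


# Hypothetical labels: the full stop inside "Dr." is tagged O, so it does not split.
tokens = ["Dr.", "Budi", "tiba", ".", "Ia", "pulang", "."]
labels = ["O", "O", "O", "E", "O", "O", "E"]
sentences = split_sentences(tokens, labels)
```

Because the boundary decision lives entirely in the labels, an abbreviation such as "Dr." needs no special rule here; the tagger only has to label it O.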
Indonesian Language Term Extraction using Multi-Task Neural Network
Joan Santoso; Esther Irawati Setiawan; Fransiskus Xaverius Ferdinandus; Gunawan Gunawan; Leonel Hernandez
Knowledge Engineering and Data Science Vol 5, No 2 (2022)
Publisher : Universitas Negeri Malang

DOI: 10.17977/um018v5i22022p160-167

Abstract

The rapidly expanding size of data makes it difficult to extract information and store it as computerized knowledge. Relation extraction and term extraction play a crucial role in resolving this issue. Automatically finding a concealed relationship between terms that appear in the text can help people build computer-based knowledge more quickly. Term extraction is required as one of the components because identifying the terms that play a significant role in the text is the essential step before determining their relationships. To address this issue for the Indonesian language, we propose an end-to-end system capable of extracting terms from text. Our method combines two multilayer perceptron neural networks to perform Part-of-Speech (PoS) labeling and Noun Phrase Chunking. The two models were trained as a joint model. Our proposed method, with an F-score of 86.80%, can be considered a state-of-the-art algorithm for performing term extraction in the Indonesian language using noun phrase chunking.
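
As a rough illustration of how noun phrase chunking yields terms, the sketch below collects tokens tagged as part of a noun phrase into candidate terms. The BIO-style tags (`B-NP`, `I-NP`, `O`), the function name `extract_terms`, and the example tags are all assumptions; the abstract does not state the tagging scheme, and the joint neural model is not reproduced.

```python
from typing import List


def extract_terms(tokens: List[str], chunk_tags: List[str]) -> List[str]:
    """Join consecutive noun-phrase tokens into terms (assumed BIO tagging)."""
    terms: List[str] = []
    current: List[str] = []
    for token, tag in zip(tokens, chunk_tags):
        if tag == "B-NP":  # a new noun phrase starts here
            if current:
                terms.append(" ".join(current))
            current = [token]
        elif tag == "I-NP" and current:  # continue the open noun phrase
            current.append(token)
        else:  # 'O', or a stray 'I-NP' with no open phrase: close and skip
            if current:
                terms.append(" ".join(current))
            current = []
    if current:
        terms.append(" ".join(current))
    return terms


# Hypothetical chunker output for a short Indonesian sentence.
tokens = ["Jaringan", "saraf", "mengekstrak", "istilah", "penting"]
chunk_tags = ["B-NP", "I-NP", "O", "B-NP", "I-NP"]
terms = extract_terms(tokens, chunk_tags)
```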
Maximum Marginal Relevance and Vector Space Model for Summarizing Students' Final Project Abstracts
Gunawan Gunawan; Fitria Fitria; Esther Irawati Setiawan; Kimiya Fujisawa
Knowledge Engineering and Data Science Vol 6, No 1 (2023)
Publisher : Universitas Negeri Malang

DOI: 10.17977/um018v6i12023p57-68

Abstract

Automatic summarization reduces a text document with a computer program to create a summary that retains the essential parts of the original document. It is necessary for dealing with information overload as the amount of data increases. A summary presents the main contents of an article in a concise form, and its aim is to tell the reader the essence of the central idea. The simple concept of a summary is to take the essential parts of the entire contents of the article and present them back in summary form. The steps in this research start with the user selecting or searching for the text documents to be summarized, with keywords in the abstract as a query. The proposed approach performs text preprocessing on the documents: sentence breaking, case folding, word tokenizing, filtering, and stemming. The preprocessed text is weighted by term frequency-inverse document frequency (tf-idf), then scored for query relevance using the vector space model and for sentence similarity using cosine similarity. The next stage is maximum marginal relevance for sentence extraction. The proposed approach provides more comprehensive summarization than another approach. The test results are compared with manual summaries, producing an average precision of 88%, recall of 61%, and f-measure of 70%.
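
The pipeline in this abstract (tf-idf weighting, cosine similarity, then maximum marginal relevance) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the smoothed idf formula, the trade-off parameter `lam`, the function names, and the toy sentences are all assumptions, and preprocessing (case folding, filtering, stemming) is assumed to have been done already.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf_vectors(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Weight each term by tf * idf, using a smoothed idf (an assumption)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}
    return [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr_select(sentences: List[List[str]], query: List[str],
               k: int = 2, lam: float = 0.5) -> List[int]:
    """Pick k sentence indices, trading query relevance against redundancy."""
    vectors = tfidf_vectors(sentences + [query])
    sent_vecs, query_vec = vectors[:-1], vectors[-1]
    selected: List[int] = []
    candidates = list(range(len(sentences)))

    def score(i: int) -> float:
        relevance = cosine(sent_vecs[i], query_vec)
        redundancy = max(
            (cosine(sent_vecs[i], sent_vecs[j]) for j in selected), default=0.0
        )
        return lam * relevance - (1 - lam) * redundancy

    while candidates and len(selected) < k:
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy, already-preprocessed sentences; sentence 1 duplicates sentence 0.
sentences = [
    ["summary", "retains", "essential", "content"],
    ["summary", "retains", "essential", "content"],
    ["tfidf", "weights", "summary", "terms"],
]
query = ["summary", "content"]
chosen = mmr_select(sentences, query, k=2, lam=0.5)
```

With `lam=0.5`, the duplicate sentence is penalized for its redundancy, so the second pick is the less relevant but novel sentence. A higher `lam` shifts the balance back toward pure query relevance.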
Timbre Style Transfer for Musical Instruments Acoustic Guitar and Piano using the Generator-Discriminator Model
Nagari, Widean; Santoso, Joan; Setiawan, Esther Irawati
Knowledge Engineering and Data Science Vol 7, No 1 (2024)
Publisher : Universitas Negeri Malang

DOI: 10.17977/um018v7i12024p101-116

Abstract

Music style transfer is a technique for creating new music by combining the input song's content and the target song's style to produce a sound that humans can enjoy. This research is related to timbre style transfer, a branch of music style transfer, and focuses on using the generator-discriminator model. This method has been used in various studies in the music style transfer domain to train a machine learning model to replace the sound of instruments in a song with the sound of instruments from other songs. This work focuses on finding the best layer configuration in the generator-discriminator model for the timbre style transfer task. The dataset used for this research is the MAESTRO dataset. The metrics used in the testing phase are Contrastive Loss, Mean Squared Error, and Perceptual Evaluation of Speech Quality. Based on the results of the trials, it was concluded that the best model in this research was the model trained using column vectors from the mel-spectrogram. Suitable hyperparameters for the training process are a learning rate of 0.0005, a batch size greater than or equal to 64, and a dropout value of 0.1. The results of the ablation study show that the best layer configuration consists of 2 Bi-LSTM layers, 1 Attention layer, and 2 Dense layers.