Garuda - Garba Rujukan Digital

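Sistem Pencarian Ayat Al-Quran Berdasarkan Kemiripan Ucapan Menggunakan Algoritma Soundex dan Damerau-Levenshtein Distance [Al-Quran Verse Search System Based on Pronunciation Similarity Using the Soundex Algorithm and Damerau-Levenshtein Distance]
Puruhita Ananda Arsaningtyas; Moch. Arif Bijaksana; Said Al Faraby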
Jurnal Linguistik Komputasional Vol 1 No 2 (2018): Vol. 1, No. 2
Publisher : Indonesia Association of Computational Linguistics (INACL)

Corpus Quality Improvement to Improve the Quality of Statistical Translator Machines (Case Study of Indonesian Language to Java Krama)
Muhammad Gerdy Asparilla; Herry Sujaini; Rudy Dwi Nyoto
Jurnal Linguistik Komputasional Vol 1 No 2 (2018): Vol. 1, No. 2
Publisher : Indonesia Association of Computational Linguistics (INACL)

Language is a tool for interacting with the surrounding community, and the ability to use several languages makes it easier to communicate with people from different regions. Translators are therefore needed to widen knowledge of other languages. Statistical machine translation (SMT) is a machine translation approach in which translations are produced by statistical models whose parameters are estimated from the analysis of a parallel corpus. A parallel corpus is a pair of corpora containing sentences in one language and their translations in another. One way to improve translation quality is corpus optimization. This study examines the influence of corpus quality by filtering sentence pairs according to the quality of their translations. The filter is a minimum score that each sentence must reach when evaluated with the Bilingual Evaluation Understudy (BLEU) method. Testing compares translation accuracy before and after corpus optimization. The results show that corpus optimization improves translation quality for an Indonesian-to-Javanese Krama translation machine. With 15 test sentences from outside the corpus, BLEU increased by 10.53% on average; with 100 test sentences drawn from the optimized corpus, BLEU increased by 11.63% on average in automated testing and by 0.03% in testing by linguists. On this basis, a statistical Indonesian-to-Javanese machine translation system that uses corpus optimization can produce more accurate translations.
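The filtering step described in this abstract can be illustrated with a minimal sketch. The abstract does not say how the BLEU score of each sentence pair is obtained, so the sketch assumes that a baseline system has already translated each source sentence and that the output is scored against the corpus target side. The names `sentence_bleu`, `filter_corpus`, and `min_bleu` are hypothetical, the BLEU variant is a simplified unsmoothed one (n-grams up to length 2), and the sample strings are toy data rather than vetted translations.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, reference, max_n=2):
    """Simplified sentence-level BLEU (an assumption, not the paper's setup):
    geometric mean of clipped n-gram precisions for n = 1..max_n,
    multiplied by the brevity penalty. No smoothing is applied, so any
    n-gram order with zero overlap yields a score of 0.0."""
    cand, ref = candidate.split(), reference.split()
    if not cand or not ref:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        total = sum(cand_counts.values())
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if total == 0 or overlap == 0:
            return 0.0
        log_precisions.append(math.log(overlap / total))
    brevity = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return brevity * math.exp(sum(log_precisions) / max_n)


def filter_corpus(pairs, hypotheses, min_bleu):
    """Keep the (source, target) pairs whose baseline translation reaches
    the minimum BLEU score against the target side."""
    return [
        pair
        for pair, hyp in zip(pairs, hypotheses)
        if sentence_bleu(hyp, pair[1]) >= min_bleu
    ]


# Toy data: (Indonesian source, Javanese target) and baseline outputs.
pairs = [("saya makan nasi", "kula nedha sekul"), ("dia pergi", "piyambakipun tindak")]
hypotheses = ["kula nedha sekul", "dheweke lunga"]
kept = filter_corpus(pairs, hypotheses, min_bleu=0.5)
```

In this toy run only the first pair survives, because its baseline output matches the target exactly while the second shares no words with its target.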

Pembobotan Kata berdasarkan Kluster untuk Peringkasan Otomatis Multi Dokumen [Cluster-Based Word Weighting for Automatic Multi-Document Summarization]
Lukman Hakim; Fadli Husein Wattiheluw; Agus Zainal Arifin; Aminul Wahib
Jurnal Linguistik Komputasional Vol 1 No 2 (2018): Vol. 1, No. 2
Publisher : Indonesia Association of Computational Linguistics (INACL)

Multi-document summarization is a technique for extracting information. The summary consists of several sentences that aim to describe the contents of the documents as a whole in a relevant way. Several algorithms with various criteria have been proposed. In general, they comprise preprocessing, clustering, and representative sentence selection stages to produce highly relevant summaries. Clustering is one of the important stages, but existing research cannot determine the number of clusters to be formed. We therefore propose a clustering technique based on a cluster hierarchy. The technique measures the similarity between sentences using cosine similarity, and sentences are clustered based on their similarity values. The clusters with the highest similarity to each other are merged into one cluster, and merging continues until one cluster remains. Experiments on the Document Understanding Conference (DUC) 2004 dataset, using two scenarios with 132, 135, 137, and 140 clusters, produced fluctuating values. A smaller number of clusters does not guarantee a higher ROUGE-1 value. With the same number of clusters, the proposed method has a lower ROUGE-1 value than the previous method. This is because, with 140 clusters, the similarity values within each cluster decreased.
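The merge procedure outlined in this abstract (cosine similarity between sentences, repeated merging of the most similar clusters) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: sentences are represented as raw term-frequency vectors, the similarity between two clusters is taken to be the average pairwise cosine similarity (the abstract does not name a linkage criterion), and the function names and the `n_clusters` stopping parameter are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def cluster_similarity(c1, c2, vectors):
    """Average pairwise cosine similarity between two clusters
    (average linkage is an assumption of this sketch)."""
    sims = [cosine(vectors[i], vectors[j]) for i in c1 for j in c2]
    return sum(sims) / len(sims)


def agglomerate(sentences, n_clusters=1):
    """Start with one cluster per sentence and repeatedly merge the two
    most similar clusters until n_clusters remain. Returns a list of
    clusters, each a list of sentence indices."""
    vectors = [Counter(s.lower().split()) for s in sentences]
    clusters = [[i] for i in range(len(sentences))]
    while len(clusters) > max(n_clusters, 1):
        a, b = max(
            combinations(range(len(clusters)), 2),
            key=lambda p: cluster_similarity(clusters[p[0]], clusters[p[1]], vectors),
        )
        clusters[a] = clusters[a] + clusters[b]  # a < b, so index a stays valid
        del clusters[b]
    return clusters


sentences = [
    "the cat sat on the mat",
    "the cat lay on the mat",
    "stock prices fell sharply",
    "stock prices rose sharply",
]
two_clusters = agglomerate(sentences, n_clusters=2)
```

Stopping at a chosen `n_clusters` corresponds to cutting the hierarchy at a given level; with the default of 1 the loop runs until a single cluster remains, as the abstract describes.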

Uji Coba Korpus Data Wicara BPPT sebagai Data Latih Sistem Pengenalan Wicara Bahasa Indonesia [Trial of the BPPT Speech Data Corpus as Training Data for an Indonesian Speech Recognition System]
Made Gunawan; Elvira Nurfadhilah; Lyla Ruslana Aini; M. Teduh Uliniansyah; Gunarso; Agung Santosa; Juliati Junde
Jurnal Linguistik Komputasional Vol 1 No 2 (2018): Vol. 1, No. 2
Publisher : Indonesia Association of Computational Linguistics (INACL)

We present the results of speech recognition trials using the BPPT Speech Data Corpus developed in 2013 (KDW-BPPT-2013), which was funded from the 2013 DIPA budget. The corpus is used as both training data and test data. It contains utterances from 200 speakers: 50 adult males, 50 teenage males, 50 adult females, and 50 teenage females, each of whom uttered 250 sentences. The total duration of the speech data is about 92 hours. The trials were conducted using Kaldi and yielded a Word Error Rate (WER) of 2.52% for the GMM system and 1.64% for the DNN system.
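For readers unfamiliar with the metric reported here, the following minimal sketch shows how a word error rate is conventionally computed: the word-level edit distance (substitutions, deletions, insertions) between the recognizer output and the reference transcript, divided by the number of reference words. It is not Kaldi's scoring code; the function name `word_error_rate` and the sample sentences are assumptions made for illustration.

```python
def word_error_rate(reference, hypothesis):
    """WER = word-level Levenshtein distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(
                min(
                    prev[j] + 1,             # deletion of a reference word
                    curr[j - 1] + 1,         # insertion of a hypothesis word
                    prev[j - 1] + (r != h),  # match or substitution
                )
            )
        prev = curr
    return prev[-1] / len(ref)


# Toy example: one deleted word ("ke") and one substitution ("ini" -> "itu").
wer = word_error_rate("saya pergi ke pasar hari ini", "saya pergi pasar hari itu")
```

Here two errors against six reference words give a WER of 2/6, or about 33%.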

Building Monolingual Word Alignment For Indonesian Al-Quran Translation
Galih Rizky Prabowo; Moch Arif Bijaksana
Jurnal Linguistik Komputasional Vol 1 No 2 (2018): Vol. 1, No. 2
Publisher : Indonesia Association of Computational Linguistics (INACL)

Peringkasan Otomatis Multi Dokumen menggunakan Hirarki Kluster [Automatic Multi-Document Summarization Using Cluster Hierarchy]
Lukman Hakim; Fadli Husein Wattiheluw; Agus Zainal Arifin; Aminul Wahib
Jurnal Linguistik Komputasional Vol 1 No 2 (2018): Vol. 1, No. 2
Publisher : Indonesia Association of Computational Linguistics (INACL)

DOI: 10.26418/jlk.v1i2.86

