Location: Indonesia
Jurnal Linguistik Komputasional
EISSN: 2621-9336
Core Subject: Science
Jurnal Linguistik Komputasional (JLK) publishes original papers in computational linguistics, covering, but not limited to: Phonology, Morphology, Chunking/Shallow Parsing, Parsing/Grammatical Formalisms, Semantic Processing, Lexical Semantics, Ontology, Linguistic Resources, Statistical and Knowledge-Based Methods, POS Tagging, Discourse, Paraphrasing/Entailment/Generation, Machine Translation, Information Retrieval, Text Mining, Information Extraction, Summarization, Question Answering, Dialog Systems, Spoken Language Processing, and Speech Recognition and Synthesis.
Articles in issue Vol 2 No 1 (2019): 5 documents
Applying Cosine Similarity and TF-IDF Weighting to Detect Document Similarity
Muhammad Zidny Naf'an; Auliya Burhanuddin; Ade Riyani
Jurnal Linguistik Komputasional Vol 2 No 1 (2019): Vol. 2, No. 1
Publisher : Indonesia Association of Computational Linguistics (INACL)

Full PDF (476.917 KB) | DOI: 10.26418/jlk.v2i1.17

Abstract

Plagiarism is the act of taking part or all of another person's ideas, in the form of documents or text, without citing the source of the information. This study aims to detect the similarity of text documents using the cosine similarity algorithm and TF-IDF weighting, so that the result can be used to determine a plagiarism score. The documents compared are Indonesian-language abstracts. Preprocessing consists of case folding, tokenizing, stopword removal, and stemming. After preprocessing, TF-IDF weights are computed and the similarity between documents is measured with cosine similarity, yielding a similarity percentage. With stemming, similarity scores are on average 10% higher than without stemming. Documents with a high degree of similarity obtain scores above 50%, whereas documents with low similarity, or that are not plagiarized, obtain scores below 40%. Based on the experimental results, cosine similarity with TF-IDF weighting is able to produce a similarity score for each pair of compared documents.
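The pipeline described in this abstract (case folding, tokenizing, stopword removal, stemming, TF-IDF weighting, cosine similarity) can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the stopword list and the `crude_stem` suffix stripper are hypothetical placeholders (a real system would use a full Indonesian stopword list and a proper Indonesian stemmer), and the smoothed IDF `log((1 + N) / (1 + df)) + 1` is an assumption, chosen because plain `log(N / df)` gives zero weight to every term shared by both documents when only two documents are compared.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny Indonesian stopword list (illustration only).
STOPWORDS = {"dan", "yang", "di", "ke", "dari", "untuk", "ini", "itu", "dengan", "pada", "adalah"}


def crude_stem(token: str) -> str:
    """Hypothetical placeholder stemmer: strips one common suffix or particle.

    Not a real Indonesian stemmer; it only shows where stemming fits.
    """
    for suffix in ("nya", "kan", "lah"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def preprocess(text: str, stem: bool = True) -> list[str]:
    text = text.lower()                                  # case folding
    tokens = re.findall(r"[a-z]+", text)                 # tokenizing
    tokens = [t for t in tokens if t not in STOPWORDS]   # stopword removal
    if stem:
        tokens = [crude_stem(t) for t in tokens]         # stemming
    return tokens


def tfidf_vectors(docs: list[list[str]]) -> list[dict[str, float]]:
    """TF-IDF with raw term counts and a smoothed IDF (assumed variant)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}
    return [{t: c * idf[t] for t, c in Counter(doc).items()} for doc in docs]


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_percent(doc_a: str, doc_b: str, stem: bool = True) -> float:
    """Similarity of two documents as a percentage (0 to 100)."""
    u, v = tfidf_vectors([preprocess(doc_a, stem), preprocess(doc_b, stem)])
    return 100.0 * cosine(u, v)
```

On this toy setup, `similarity_percent("membacakan buku", "membaca buku")` scores about 100 with stemming, because `crude_stem` maps `membacakan` to `membaca`; with `stem=False` it drops to roughly 34. This mirrors, in miniature, the abstract's observation that stemming raises similarity scores.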
Employing Dependency Tree in Machine Learning Based Indonesian Factoid Question Answering System
Irfan Afif; Ayu Purwarianti
Jurnal Linguistik Komputasional Vol 2 No 1 (2019): Vol. 2, No. 1
Publisher : Indonesia Association of Computational Linguistics (INACL)

Full PDF (577.209 KB) | DOI: 10.26418/jlk.v2i1.9

Abstract

We propose using dependency tree information to increase the accuracy of Indonesian factoid question answering. We employed MSTParser and the Universal Dependencies corpus to build an Indonesian dependency parser. The dependency trees produced by this parser are used in the answer finder component of the Indonesian factoid question answering system in two ways: 1) as one of the features of a machine-learning-based answer finder, which classifies each term in the retrieved passage as part of a correct answer or not; and 2) as an additional heuristic rule applied after the machine learning step. For the machine learning technique, the complete feature set combines word-based, phrase-based, and dependency-relation-similarity-based calculations. On a dataset of 203 items, this approach improved accuracy over related work that used only phrase information. The best accuracy was 84.34% for correct-answer classification, and the best MRR was 0.954.
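The mean reciprocal rank (MRR) reported above averages, over all questions, the reciprocal of the rank at which the first correct answer appears. A minimal Python sketch of the standard definition follows; scoring a question with no correct answer in the returned list as 0 is the usual convention and an assumption here, since the abstract does not say how such cases were handled.

```python
from typing import Optional, Sequence


def mean_reciprocal_rank(first_correct_ranks: Sequence[Optional[int]]) -> float:
    """MRR = (1/|Q|) * sum over questions of 1 / rank_i.

    Each entry is the 1-based rank of the first correct answer for one
    question, or None if no correct answer was returned (contributes 0).
    """
    if not first_correct_ranks:
        raise ValueError("need at least one question")
    total = 0.0
    for rank in first_correct_ranks:
        if rank is not None:
            if rank < 1:
                raise ValueError("ranks are 1-based")
            total += 1.0 / rank
    return total / len(first_correct_ranks)
```

For example, `mean_reciprocal_rank([1, 2, None, 1])` gives (1 + 0.5 + 0 + 1) / 4 = 0.625. An MRR of 0.954 therefore means the correct answer was almost always ranked first.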
Identifying Abusive Content in Indonesian-Language Tweets
Ahmad Fathan Hidayatullah; Aufa Aulia Fadila; Kiki Purnama Juwairi; Royan Abida Nayoan
Jurnal Linguistik Komputasional Vol 2 No 1 (2019): Vol. 2, No. 1
Publisher : Indonesia Association of Computational Linguistics (INACL)

Full PDF (497.009 KB) | DOI: 10.26418/jlk.v2i1.15

Abstract

This study aims to identify tweets containing abusive or offensive content. We performed five steps: data collection, preprocessing, feature extraction, classification, and evaluation. We used Multinomial Naïve Bayes and a Support Vector Machine (SVM) with a linear kernel as classifiers. In our experiments, the linear-kernel SVM outperformed Multinomial Naïve Bayes overall: the SVM achieved an accuracy, precision, recall, and F1-score of 0.9928, 0.9914, 0.9946, and 0.9930, respectively, whereas Multinomial Naïve Bayes achieved 0.9834, 0.9912, 0.9762, and 0.9836. Nevertheless, the two algorithms perform almost equally, since the differences between their scores are small.
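The classification and evaluation steps can be illustrated with a small, self-contained Python sketch. It implements Multinomial Naïve Bayes with Laplace smoothing (the `alpha=1.0` default is an assumption, not taken from the paper) together with the four evaluation metrics. The linear-kernel SVM is not reimplemented here, and feature extraction is reduced to a plain bag of tokens because the abstract does not describe the paper's actual features. Label 1 is assumed to mean abusive.

```python
import math
from collections import Counter


class MultinomialNB:
    """Multinomial Naive Bayes over bag-of-words tokens, with Laplace smoothing."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha

    def fit(self, docs: list[list[str]], labels: list[int]) -> "MultinomialNB":
        self.classes = sorted(set(labels))
        self.vocab = {t for doc in docs for t in doc}
        self.log_prior, self.log_lik = {}, {}
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            self.log_prior[c] = math.log(len(class_docs) / len(docs))
            counts = Counter(t for d in class_docs for t in d)
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            self.log_lik[c] = {t: math.log((counts[t] + self.alpha) / denom) for t in self.vocab}
        return self

    def predict(self, doc: list[str]) -> int:
        # Tokens never seen during training are ignored.
        scores = {
            c: self.log_prior[c] + sum(self.log_lik[c][t] for t in doc if t in self.vocab)
            for c in self.classes
        }
        return max(scores, key=scores.get)


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for the positive (abusive) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1
```

As a consistency check, F1 = 2PR / (P + R) with the SVM's reported precision 0.9914 and recall 0.9946 gives about 0.9930, matching the abstract.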
Morphological Analysis for Handling Out-of-Vocabulary Words in an Indonesian Part-of-Speech Tagger Using a Hidden Markov Model
Febyana Ramadhanti; Yudi Wibisono; Rosa Ariani Sukamto
Jurnal Linguistik Komputasional Vol 2 No 1 (2019): Vol. 2, No. 1
Publisher : Indonesia Association of Computational Linguistics (INACL)

Full PDF (1222.973 KB) | DOI: 10.26418/jlk.v2i1.13

Abstract

Part-of-speech (PoS) tagging is a natural language processing (NLP) task that assigns a part-of-speech tag to each word in an input sentence. The hidden Markov model (HMM) is a probabilistic PoS tagging algorithm, so its performance depends heavily on the training corpus. The limited coverage of the training corpus and the large vocabulary of Indonesian give rise to the problem of out-of-vocabulary (OOV) words. This research compared an HMM PoS tagger using the Morphological Analysis (AM) method with an HMM PoS tagger without AM, using the same training and test corpora. The test corpus has an OOV rate of 30% and consists of 6,676 tokens (740 sentences). The baseline HMM achieved an accuracy of 97.54%, while the HMM with morphological analysis achieved a highest accuracy of 99.14%.
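The idea of backing off to morphological clues for OOV words can be sketched with a toy Viterbi decoder. Everything below is hypothetical: the three-tag tagset, the probability tables, and the prefix/suffix rules in `oov_emission` (Indonesian verbs often carry prefixes such as *me-*, *ber-*, or *di-*, and many nouns end in *-an*) are illustrative assumptions, not the rules or corpus used by the authors. With `use_morphology=False`, unknown words get a uniform emission score, mimicking a baseline HMM with no OOV handling.

```python
import math

# Toy HMM with hypothetical numbers: NN = noun, VB = verb, PR = pronoun.
TAGS = ["NN", "VB", "PR"]
START = {"NN": 0.4, "VB": 0.1, "PR": 0.5}
TRANS = {
    "NN": {"NN": 0.3, "VB": 0.5, "PR": 0.2},
    "VB": {"NN": 0.7, "VB": 0.1, "PR": 0.2},
    "PR": {"NN": 0.2, "VB": 0.7, "PR": 0.1},
}
EMIT = {
    "NN": {"buku": 0.5, "nasi": 0.5},
    "VB": {"membaca": 0.6, "makan": 0.4},
    "PR": {"saya": 0.6, "dia": 0.4},
}
KNOWN = {w for dist in EMIT.values() for w in dist}
UNSEEN = 1e-6  # floor for a known word under a tag it was never seen with


def oov_emission(word: str, tag: str) -> float:
    """Hypothetical affix rules standing in for morphological analysis."""
    if word.startswith(("me", "ber", "di")):
        scores = {"NN": 0.1, "VB": 0.8, "PR": 0.1}
    elif word.endswith("an"):
        scores = {"NN": 0.8, "VB": 0.1, "PR": 0.1}
    else:
        scores = {"NN": 0.6, "VB": 0.3, "PR": 0.1}
    return scores[tag]


def emission(word: str, tag: str, use_morphology: bool) -> float:
    if word in KNOWN:
        return EMIT[tag].get(word, UNSEEN)
    if use_morphology:
        return oov_emission(word, tag)
    return 1.0 / len(TAGS)  # baseline: no information about unknown words


def viterbi(words: list[str], use_morphology: bool = True) -> list[str]:
    """Most likely tag sequence, computed in log space to avoid underflow."""
    best = [{t: math.log(START[t] * emission(words[0], t, use_morphology)) for t in TAGS}]
    back: list[dict[str, str]] = [{}]
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in TAGS:
            prev = max(TAGS, key=lambda p: best[i - 1][p] + math.log(TRANS[p][t]))
            best[i][t] = (best[i - 1][prev] + math.log(TRANS[prev][t])
                          + math.log(emission(words[i], t, use_morphology)))
            back[i][t] = prev
    tag = max(TAGS, key=lambda t: best[-1][t])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = back[i][tag]
        path.append(tag)
    return path[::-1]
```

On this toy model, `viterbi(["dia", "tulisan"])` tags the unknown word `tulisan` as a noun because of its *-an* suffix, whereas `use_morphology=False` falls back on the transition probabilities alone and tags it as a verb.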
A Character N-Gram-Based Language Identification System for Javanese and Indonesian Text Documents
Lucia Dwi Krisnawati; Fidelia Vera Sentosa; Aditya Wikan Mahastama
Jurnal Linguistik Komputasional Vol 2 No 1 (2019): Vol. 2, No. 1
Publisher : Indonesia Association of Computational Linguistics (INACL)

Full PDF (974.848 KB) | DOI: 10.26418/jlk.v2i1.16

Abstract

Language identification is the process of automatically determining the language used in a piece of discourse. Language identification systems fall broadly into two kinds: spoken systems, which identify spoken language through acoustic or phoneme features, and systems based on grapheme features at various levels and linguistic categories. This study builds a language identification system designed to distinguish Javanese text from Indonesian and other languages. The language profiles were built from a corpus drawn from the Trawaca corpus and several online sources covering various topics, in order to enrich the vocabulary and increase the number of word types. The profile of each language category is built from character n-grams, keeping the 100 n-grams with the highest CF values. The distance between a language profile and a test document is computed with the Out-Of-Place (OOP) measure. The results show that the precision for Javanese documents reaches 0.96, while the precision for Indonesian documents reaches 0.86, and the overall identification accuracy reaches 0.85. Precision for Indonesian is much lower than for Javanese because the test set included Malaysian Malay documents, which are naturally identified as Indonesian.
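The profile-and-distance scheme described in this abstract can be sketched in Python in the style of the classic rank-order (out-of-place) method for n-gram text categorization. Assumptions, none of which come from the paper: n-grams of lengths 1 to 3 over lowercased text padded with spaces; profiles ranked by raw n-gram frequency (the abstract's CF value is not defined, so frequency ranking stands in for it); and a penalty equal to the profile size for any document n-gram missing from a language profile. Only the profile size of 100 follows the abstract.

```python
from collections import Counter


def char_ngrams(text: str, n_values=(1, 2, 3)) -> Counter:
    """Count character n-grams of the given lengths (1 to 3 assumed)."""
    padded = " " + " ".join(text.lower().split()) + " "
    grams = Counter()
    for n in n_values:
        for i in range(len(padded) - n + 1):
            grams[padded[i:i + n]] += 1
    return grams


def build_profile(text: str, top_k: int = 100) -> dict[str, int]:
    """Map each of the top_k most frequent n-grams to its rank (0 = most frequent)."""
    ranked = sorted(char_ngrams(text).items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]
    return {gram: rank for rank, (gram, _) in enumerate(ranked)}


def out_of_place(doc_profile: dict[str, int], lang_profile: dict[str, int]) -> int:
    """Sum of rank differences; n-grams absent from the language profile get the maximum penalty."""
    max_penalty = len(lang_profile)
    return sum(
        abs(rank - lang_profile[gram]) if gram in lang_profile else max_penalty
        for gram, rank in doc_profile.items()
    )


def identify(text: str, profiles: dict[str, dict[str, int]], top_k: int = 100) -> str:
    """Return the language whose profile has the smallest out-of-place distance."""
    doc_profile = build_profile(text, top_k)
    return min(profiles, key=lambda lang: out_of_place(doc_profile, profiles[lang]))
```

In practice each language profile would be built from a large per-language corpus (the authors used Trawaca and online sources), and a test document is assigned to the language whose profile is closest. Because Malay and Indonesian share many frequent n-grams, a scheme like this would place Malay documents close to the Indonesian profile, consistent with the precision drop the abstract reports.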
