Journal : Khazanah Informatika: Jurnal Ilmu Komputer dan Informatika

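Meningkatkan Peran Model Bahasa dalam Mesin Penerjemah Statistik (Studi Kasus Bahasa Indonesia-Dayak Kanayatn) [Improving the Role of the Language Model in Statistical Machine Translation (Case Study: Indonesian-Dayak Kanayatn)]
Sujaini, Herry
Khazanah Informatika Vol. 3 No. 2, December 2017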
Publisher : Universitas Muhammadiyah Surakarta

DOI: 10.23917/khif.v3i2.4398

Abstract

Statistical machine translation systems use a combination of one or more translation models and a language model. Although much research addresses improving the translation model, the problem of optimizing the language model for a particular translation task has received little attention. Typically, a trigram model is used as the standard language model in statistical machine translation systems. In this paper we apply four experimental strategies to examine the role of the language model in Indonesian-Dayak Kanayatn machine translation, and we show improvements over a baseline system that uses the standard language model.
Performance of Methods in Identifying Similar Languages Based on String to Word Vector
Sujaini, Herry
Khazanah Informatika Vol. 6 No. 1 April 2020
Publisher : Universitas Muhammadiyah Surakarta

DOI: 10.23917/khif.v6i1.8199

Abstract

Indonesia has a large number of local languages that share cognate words, and some of these languages closely resemble one another. Automatic identification within a family of similar languages is difficult, so it is necessary to determine which language identification method performs best at the task. This study attempts to identify Indonesian local languages using the String to Word Vector approach. A string vector refers to a collection of ordered words. In a string vector, a word is represented as an element or value, while the word becomes an attribute or feature in each numeric vector. Among the Naïve Bayes, SMO, J48, and ZeroR classifiers, SMO is the most accurate, with an accuracy of 95.7% for 10-fold cross-validation and 94.4% for a 60%:40% split. The best tokenizer for this classification is the Character N-Gram tokenizer. All classifiers except ZeroR show increased accuracy when using the Character N-Gram tokenizer compared to the Word tokenizer. The best features for this system are character trigrams and four-grams. Trigrams are preferred because they require less training data. The highest accuracy in the combination experiment is 0.965, obtained with IDF = FALSE and WC = TRUE, regardless of the TF setting.
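As a rough illustration of the character n-gram features the abstract describes, the following minimal Python sketch turns a string into a vector of character trigram counts. It is not the study's implementation: the function names `char_ngrams` and `to_count_vector`, the lowercasing step, and the toy input strings are all assumptions made for this example.

```python
from collections import Counter


def char_ngrams(text, n=3):
    """Return the overlapping character n-grams of a lowercased string."""
    s = text.lower()
    return [s[i:i + n] for i in range(len(s) - n + 1)]


def to_count_vector(text, vocabulary, n=3):
    """Map a string to a list of n-gram counts over a fixed vocabulary."""
    counts = Counter(char_ngrams(text, n))
    return [counts[gram] for gram in vocabulary]


# Toy strings chosen only for illustration; they are not data from the study.
docs = ["banana", "bandana"]

# The vocabulary is every distinct trigram seen in the toy documents.
vocabulary = sorted({gram for doc in docs for gram in char_ngrams(doc)})

# One count vector per document; each position corresponds to one trigram.
vectors = [to_count_vector(doc, vocabulary) for doc in docs]
```

Each position in a vector corresponds to one trigram in `vocabulary`, so two strings that share many character sequences end up with similar vectors. Vectors of this kind could then be passed to a classifier.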