Journal : JOIN (Jurnal Online Informatika)

Multi Rule-based and Corpus-based for Sundanese Stemmer
Sutedi, Ade; Nasrulloh, Muhammad Rikza; Elsen, Rickard
JOIN (Jurnal Online Informatika) Vol 7 No 2 (2022)
Publisher : Department of Informatics, UIN Sunan Gunung Djati Bandung

DOI: 10.15575/join.v7i2.846

Abstract

The purpose of this study is to develop a stemming method that combines several techniques: morphological rules (affix and pro-lexeme removal), syllable (canonical) patterns, and corpus data against which the final stemming result is compared. The algorithm first checks the number of characters in the string and removes affixes, then checks the syllable pattern of the stripped result, and finally compares it with the corpus data, which determines the final stem. In this study, the corpus data were taken from a Sundanese dictionary of single words, used as root words, and from a dataset extracted from an online Sundanese magazine. The results show that affix and pro-lexeme stripping removes the corresponding affixes and pro-lexemes, that comparing the stripped words against syllable patterns yields the base words quickly, and that using the corpus improves accuracy and reduces the over-stemming problems that occur during stemming.
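
The pipeline described in this abstract (length check, affix stripping, a shape check on the stripped result, then a corpus lookup) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the affix lists are a tiny hand-picked subset, `ROOT_WORDS` is a toy stand-in for the dictionary corpus, `min_length` is an arbitrary threshold, and the vowel test only stands in for the syllable-pattern step.

```python
# Illustrative sketch only. The affix lists, ROOT_WORDS, and min_length are
# assumptions made for this example, not the rule set or corpus from the paper.
PREFIXES = ("di", "ka", "pang", "sa")
SUFFIXES = ("keun", "an", "na")
ROOT_WORDS = {"dahar", "baca", "tulis"}  # toy stand-in for the dictionary corpus
VOWELS = set("aiueo")


def has_vowel(word):
    """Crude stand-in for the syllable-pattern check: require one vowel."""
    return any(ch in VOWELS for ch in word)


def candidates(word):
    """Return the word plus forms with one prefix and/or one suffix removed."""
    stems = [word]
    for prefix in PREFIXES:
        if word.startswith(prefix):
            stems.append(word[len(prefix):])
    for stem_so_far in list(stems):
        for suffix in SUFFIXES:
            if stem_so_far.endswith(suffix):
                stems.append(stem_so_far[:-len(suffix)])
    return stems


def stem(word, min_length=4):
    """Return a root word only when the corpus confirms a stripped candidate."""
    word = word.lower()
    if len(word) < min_length:  # very short strings are left untouched
        return word
    for cand in candidates(word):
        if len(cand) >= 2 and has_vowel(cand) and cand in ROOT_WORDS:
            return cand
    return word  # no confirmed root: keep the input rather than over-stem
```

Under these assumptions, `stem("dibacakeun")` strips `di` and `keun` and returns `baca`, while `stem("sapi")` is returned unchanged because the stripped candidate `pi` is not in the root-word set. The second case shows how a corpus lookup can guard against over-stemming.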

Sundanese Stemming using Syllable Pattern
Sutedi, Ade; Elsen, Rickard; Nasrulloh, Muhammad Rikza
JOIN (Jurnal Online Informatika) Vol 6 No 2 (2021)
Publisher : Department of Informatics, UIN Sunan Gunung Djati Bandung

DOI: 10.15575/join.v6i2.812

Abstract

Stemming is a technique that reduces a derived word to its root or base word. It is widely used in data processing tasks such as searching word indexes, translation, and information retrieval from documents in a database. In general, stemming uses the morphological pattern of a derived word to recover the original or root word. In previous research, this technique suffered from over-stemming and under-stemming problems. In this study, the stemming process is improved with a syllable (canonical) pattern based on the phonological rules of Sundanese. Stemming with syllable patterns achieved an accuracy of 89%, and execution on the test data yielded 95% of all the base words. This simple algorithm has the advantage of being able to adjust the position of the syllable pattern to the word being stemmed. Because of some data constraints (typos, loanwords, and words whose syllable pattern is non-deterministic), accuracy can be further improved by, for example, adjusting words and adding reference dictionaries. In addition, the algorithm has a drawback: it can cause over-stemming.
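
A syllable-pattern test of the kind this abstract describes might look like the sketch below. It is an assumption-laden illustration rather than the paper's rule set: the allowed syllable shapes are simplified to V, VC, CV, and CVC, the digraphs `eu`, `ng`, and `ny` are each treated as one sound, and the function names are hypothetical.

```python
import re

# Assumptions for illustration: "eu" counts as one vowel, "ng" and "ny" as one
# consonant each, and only the syllable shapes V, VC, CV, and CVC are allowed.
_UNITS = re.compile(r"eu|ng|ny|[a-zé]")
_VOWELS = {"a", "i", "u", "e", "é", "o", "eu"}
_SYLLABLE_SEQUENCE = re.compile(r"(?:C?VC?)+")


def cv_pattern(word):
    """Map a word to a consonant/vowel string, e.g. 'dahar' -> 'CVCVC'."""
    units = _UNITS.findall(word.lower())
    return "".join("V" if unit in _VOWELS else "C" for unit in units)


def fits_syllable_pattern(word):
    """True if the whole word splits into the allowed syllable shapes."""
    return _SYLLABLE_SEQUENCE.fullmatch(cv_pattern(word)) is not None
```

A stemmer could apply `fits_syllable_pattern` to each stripped candidate and reject those that fail, such as a form starting with a consonant cluster that the assumed shapes do not permit.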