Garuda - Garba Rujukan Digital

p-Index (2021-2026): 0.23

This author has published in the following journal: Jurnal Teknik Informatika (JUTIF)

Yayat Sudaryat

Universitas Pendidikan Indonesia, Indonesia

Author-ID : 10196485

Computer Science & IT

Published: 1 document

Articles

Constructing a Part-of-Speech Tagging based on Lexicon and Rule-based for Sundanese Corpus
Ade Sutedi; Ayu Latifah; Novan Rodiansyah; Yayat Sudaryat
Jurnal Teknik Informatika (Jutif) Vol. 7 No. 3 (2026): JUTIF Volume 7, Number 3, June 2026
Publisher : Informatika, Universitas Jenderal Soedirman

DOI: 10.52436/1.jutif.2026.7.3.5361

Part-of-Speech (POS) tagging is the process of annotating the word classes (nouns, verbs, adjectives, etc.) of the words in a sentence, and it serves as a basis for natural language processing and artificial intelligence. In this study, a word-class corpus and a set of word-class annotation rules were developed for Sundanese, a language with limited resources. The experiments were conducted on an annotated corpus of 104,696 tokens collected from Sundanese dictionaries, Sundanese literature (Carita Pondok, Guguritan, Mantra, Pupujian, Sisindiran, Sajak, and Wawacan), Babasan and Paribasa, and the social media platform X (Twitter). Annotation was carried out in several stages: manual annotation based on cross-lingual transfer from Indonesian POS tags to Sundanese POS tags, followed by adjustment according to Sundanese word-class rules. The study produced a POS-annotated corpus of Sundanese word-tag pairs and a baseline rule-based model, which was compared with Hidden Markov Model (HMM) and Conditional Random Field (CRF) models. The rule-based model achieves an F1-score of 0.867 and the CRF model an F1-score of 0.889, while the HMM model attains the highest F1-score, 1.000. Analysis of the POS distributions shows that nouns (KB) consistently dominate across all models, reflecting the noun-rich nature of Sundanese literary texts. The analysis also highlights the challenge of handling unknown words and the need for richer annotated resources, particularly with respect to tag interoperability with Universal POS standards. This research contributes to the development of NLP resources for low-resource languages and provides a methodological foundation for future Sundanese NLP applications.
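As a rough illustration of how a tagger built from a lexicon plus word-class rules can work, here is a minimal Python sketch. It is not the authors' model: the lexicon entries, the single prefix rule, and the tag labels KK and KS are hypothetical (only KB, noun, is named in the abstract), and tagging unknown words as KB is an assumed fallback heuristic suggested by the reported noun dominance.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicon (word -> tag). Only "KB" (noun) appears in the
# abstract; "KK" (verb) and "KS" (adjective) are assumed labels.
LEXICON: Dict[str, str] = {
    "imah": "KB",    # house
    "dahar": "KK",   # eat
    "geulis": "KS",  # beautiful
}

# Hypothetical, illustrative affix rule applied only to out-of-lexicon words;
# not a validated Sundanese morphological rule.
PREFIX_RULES: List[Tuple[str, str]] = [("nga", "KK")]

# Assumed fallback for unknown words: tag them as nouns.
DEFAULT_TAG = "KB"


def tag_word(word: str) -> str:
    """Tag one word: lexicon lookup first, then prefix rules, then fallback."""
    w = word.lower()
    if w in LEXICON:
        return LEXICON[w]
    for prefix, tag in PREFIX_RULES:
        # Require at least one character after the prefix.
        if w.startswith(prefix) and len(w) > len(prefix):
            return tag
    return DEFAULT_TAG


def tag_sentence(tokens: List[str]) -> List[Tuple[str, str]]:
    """Return (token, tag) pairs for an already tokenized sentence."""
    return [(t, tag_word(t)) for t in tokens]


if __name__ == "__main__":
    print(tag_sentence(["Imah", "geulis", "ngalamun", "buku"]))
```

Here `Imah` and `geulis` are resolved by the lexicon, `ngalamun` by the prefix rule, and the unknown `buku` falls through to the KB default. That fallback is exactly where such a tagger is weakest, which matches the abstract's point about unknown words.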

Co-Authors: Ade Sutedi, Ayu Latifah, Novan Rodiansyah
