Garuda - Garba Rujukan Digital

Published in: Data Science: Journal of Computing and Applied Informatics

Trienani Hariyanti

Universitas Teknologi Sumbawa

Author ID: 4235619

Computer Science & IT

Published: 1 document

Articles

Samawa Part of Speech Tagging using Brill Tagger
Trienani Hariyanti; Saori Aida; Hiroyuki Kameda
Data Science: Journal of Computing and Applied Informatics (JoCAI), Vol. 3 No. 2 (2019)
Publisher : Talenta Publisher

Ethnologue lists 7,097 living languages in the world. Most of them, however, are not present on the Internet as objects of research, which indicates a gap in language resources. One of these is Samawa, a language with over 500,000 native speakers that UNESCO identifies as endangered. So far, Samawa lacks the information, tools, and resources needed to sustain it. This paper contributes to natural language processing (NLP), a growing field of research, by addressing part-of-speech tagging for Samawa with a rule-based approach, the Brill tagger. The tagger was trained on a very limited Samawa corpus of 24,627 tokens, including punctuation marks, annotated with the 24 tags of our original tagset. We applied k-fold cross-validation (k = 5 and k = 10) to compare the Brill tagger's performance with Unigram, HMM, and TnT taggers. Using a combination of default, Unigram, Bigram, and Trigram taggers as its baseline, the Brill tagger achieved an accuracy above 95%, higher than the other taggers. This suggests that the Brill tagger can be used to extend the Samawa corpus automatically.
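
The abstract does not include code. As a rough illustration only, the stdlib-only Python sketch below shows the kind of pipeline it describes: a backoff baseline (trigram, then bigram, then unigram, then a default tag), Brill-style transformation rules learned greedily on top of that baseline, and k-fold cross-validation. It is a toy, not the authors' implementation. Every function name is hypothetical, the rule learner uses a single assumed template ("change tag FROM to TO when the previous tag is PREV") where real Brill taggers use many templates, and the default tag "NOUN" is an assumption rather than one of the 24 tags of the Samawa tagset.

```python
from collections import Counter, defaultdict

# A tagged sentence is a list of (word, tag) pairs.
# All names below are hypothetical; this is a toy sketch, not the paper's code.


def train_ngram(sents, n):
    """Map (previous tags, word) to the most frequent tag; n=1 is a unigram tagger."""
    counts = defaultdict(Counter)
    for sent in sents:
        tags = [t for _, t in sent]
        for i, (word, tag) in enumerate(sent):
            context = tuple(tags[max(0, i - n + 1):i])  # up to n-1 preceding gold tags
            counts[(context, word)][tag] += 1
    return {key: c.most_common(1)[0][0] for key, c in counts.items()}


def backoff_tag(words, models, default):
    """Tag left to right, trying higher-order models first, then the default tag."""
    tags = []
    for i, word in enumerate(words):
        tag = None
        for n, table in models:  # models ordered from highest n to lowest
            tag = table.get((tuple(tags[max(0, i - n + 1):i]), word))
            if tag is not None:
                break
        tags.append(tag if tag is not None else default)
    return tags


def apply_rule(tags, rule):
    """Rule (frm, to, prev): retag frm as to wherever the previous tag is prev.
    Matches are found on the input tags first, then changed together."""
    frm, to, prev = rule
    hits = [i for i, t in enumerate(tags)
            if t == frm and (tags[i - 1] if i > 0 else "<S>") == prev]
    out = list(tags)
    for i in hits:
        out[i] = to
    return out


def n_correct(predicted, gold):
    return sum(p == g for ps, gs in zip(predicted, gold) for p, g in zip(ps, gs))


def learn_rules(sents, initial, max_rules=10, min_gain=1):
    """Greedy Brill-style learning with one assumed template (previous tag)."""
    gold = [[t for _, t in s] for s in sents]
    current = [list(tags) for tags in initial]
    rules = []
    for _ in range(max_rules):
        # Candidate rules come only from positions that are currently wrong.
        candidates = {(c[i], g[i], c[i - 1] if i > 0 else "<S>")
                      for c, g in zip(current, gold)
                      for i in range(len(g)) if c[i] != g[i]}
        if not candidates:
            break
        base = n_correct(current, gold)
        # Net gain = tokens fixed minus tokens broken by applying the rule everywhere.
        gains = {r: n_correct([apply_rule(c, r) for c in current], gold) - base
                 for r in sorted(candidates)}
        best = max(gains, key=gains.get)
        if gains[best] < min_gain:
            break
        rules.append(best)
        current = [apply_rule(c, best) for c in current]
    return rules


def train_pipeline(train, default="NOUN", max_rules=10):
    # Baseline: trigram backs off to bigram, unigram, then the default tag.
    models = [(n, train_ngram(train, n)) for n in (3, 2, 1)]
    initial = [backoff_tag([w for w, _ in s], models, default) for s in train]
    return models, learn_rules(train, initial, max_rules)


def tag(words, models, rules, default="NOUN"):
    tags = backoff_tag(words, models, default)
    for rule in rules:  # rules are applied in the order they were learned
        tags = apply_rule(tags, rule)
    return tags


def k_folds(items, k):
    """Yield (train, test) splits; fold f tests on every k-th item starting at f."""
    for f in range(k):
        yield [x for i, x in enumerate(items) if i % k != f], items[f::k]


def cross_validate(sents, k, default="NOUN"):
    """Mean token accuracy of the baseline-plus-rules tagger over k folds."""
    scores = []
    for train, test in k_folds(sents, k):
        models, rules = train_pipeline(train, default)
        predicted = [tag([w for w, _ in s], models, rules, default) for s in test]
        gold = [[t for _, t in s] for s in test]
        scores.append(n_correct(predicted, gold) / sum(len(g) for g in gold))
    return sum(scores) / len(scores)
```

Calling cross_validate(corpus, 5) and cross_validate(corpus, 10) on a list of tagged sentences would mirror the k = 5 and k = 10 settings in the abstract. The HMM and TnT comparison taggers are not sketched here.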

Co-Authors: Hiroyuki Kameda; Saori Aida