Building of Informatics, Technology and Science
Vol 4 No 2 (2022): September 2022

POS Tagger Improvisation with the Addition of Foreign Word Labels on Telkom University News

Winkie Setyono (Telkom University, Bandung)
Donni Richasdy (Telkom University, Bandung)
Mahendra Dwifebri Purbolaksono (Telkom University, Bandung)



Article Info

Publish Date
22 Sep 2022

Abstract

News is a daily source of information for the public. A news article carries a great deal of information and is built from sentence structures. Each language, whether Indonesian or a foreign language, has its own sentence structure. Today, however, many media outlets mix Indonesian with foreign languages, producing sentence structures that differ from those of Bahasa Indonesia. Classifying the words in such text requires Part-of-Speech (POS) tagging, which determines the class of each word in a sentence by learning from a corpus of the language. With these new sentence structures, a POS tagger needs a larger corpus to learn from. The language structure affects the tagging results, and words that are missing from the corpus can reduce the tagger's accuracy. We set out to improve the tagging results by adding data from online media whose sentence structure differs from that of the Indonesian-language corpus. We added about 242 sentences with 7,043 tokens to the corpus, focusing on the Foreign Word (FW) tag, which totals 3,819 tags. After testing several scenarios, the POS tagger reached an accuracy of 94.7% using the Hidden Markov Model method, with an F1-score of 78% for the FW tag.
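
As a rough illustration of the approach the abstract describes, the following is a minimal sketch of a bigram Hidden Markov Model tagger with Viterbi decoding and a per-tag F1 score. It is not the authors' implementation: the toy corpus, the tag names other than FW, the add-one smoothing, and all function names (`train_hmm`, `viterbi`, `tag_f1`) are assumptions made for this example.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus, not the paper's data. FW marks foreign words.
TRAIN = [
    [("mahasiswa", "NN"), ("mengikuti", "VB"), ("workshop", "FW")],
    [("dosen", "NN"), ("membuka", "VB"), ("startup", "FW"), ("baru", "JJ")],
    [("mahasiswa", "NN"), ("membuat", "VB"), ("startup", "FW")],
    [("kampus", "NN"), ("mengadakan", "VB"), ("workshop", "FW"), ("baru", "JJ")],
]

def train_hmm(sentences):
    """Count tag-to-tag transitions and tag-to-word emissions."""
    trans = defaultdict(Counter)  # trans[prev_tag][tag]
    emit = defaultdict(Counter)   # emit[tag][word]
    vocab = set()
    for sent in sentences:
        prev = "<s>"              # sentence-start pseudo tag
        for word, tag in sent:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            vocab.add(word)
            prev = tag
    return trans, emit, sorted(emit), vocab

def log_prob(counter, key, n_outcomes):
    """Add-one smoothed log probability (an assumption of this sketch)."""
    return math.log((counter[key] + 1) / (sum(counter.values()) + n_outcomes))

def viterbi(words, model):
    """Return the most likely tag sequence for a list of words."""
    if not words:
        return []
    trans, emit, tags, vocab = model
    n_tags, n_words = len(tags), len(vocab) + 1  # +1 slot for unseen words
    # best[i][tag] = (log score of best path ending in tag, previous tag)
    best = [{t: (log_prob(trans["<s>"], t, n_tags)
                 + log_prob(emit[t], words[0], n_words), None) for t in tags}]
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            e = log_prob(emit[t], words[i], n_words)
            column[t] = max(
                (best[i - 1][p][0] + log_prob(trans[p], t, n_tags) + e, p)
                for p in tags
            )
        best.append(column)
    # Follow back-pointers from the best final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]

def tag_f1(gold, pred, tag):
    """F1 score for a single tag over aligned gold and predicted tags."""
    tp = sum(g == tag and p == tag for g, p in zip(gold, pred))
    fp = sum(g != tag and p == tag for g, p in zip(gold, pred))
    fn = sum(g == tag and p != tag for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

model = train_hmm(TRAIN)
print(viterbi(["mahasiswa", "membuka", "workshop"], model))
print(tag_f1(["NN", "VB", "FW", "FW"], ["NN", "FW", "FW", "NN"], "FW"))
```

In this sketch, adding more sentences that contain FW-tagged words changes both the emission counts for FW and the transition counts into and out of FW, which is one way to read the abstract's point that a larger corpus with mixed-language sentences helps the tagger.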

Copyrights © 2022






Journal Info

Abbrev

bits

Publisher

Subject

Computer Science & IT

Description

Building of Informatics, Technology and Science (BITS) is an open-access medium for publishing scientific articles that contain the results of research in information technology and computers. Papers submitted to this journal are first checked for plagiarism and peer-reviewed to maintain quality. ...