Anwar, Ndaru Syaiful
Unknown Affiliation

Published: 1 document

A Max-Margin Approach to Sentence Boundary Segmentation in Indonesian Paragraphs
Prasetya, Agung; Sari, Yayak Kartika; Anwar, Ndaru Syaiful
JIPI (Jurnal Ilmiah Penelitian dan Pembelajaran Informatika) Vol 10, No 4 (2025)
Publisher : STKIP PGRI Tulungagung

DOI: 10.29100/jipi.v10i4.9563

Abstract

This study presents a max-margin approach to sentence boundary segmentation in Indonesian paragraphs, a persistent challenge for natural language processing (NLP) applications. Conventional rule-based and sequential methods often struggle to disambiguate punctuation marks, particularly in contexts involving abbreviations, numerical expressions, hierarchical sentence structures, and direct quotations. To overcome these limitations, this research formulates sentence segmentation as a paragraph parsing task, enabling the model to capture both local boundary cues and global structural patterns within a paragraph. A manually annotated corpus of 12,000 paragraphs drawn from news articles, public documents, and academic texts was developed to cover diverse linguistic structures and punctuation variations. The proposed model combines local punctuation features, structural constraints derived from the Indonesian spelling standard (Ejaan yang Disempurnakan, EYD), and global paragraph coherence within a max-margin discriminative parsing framework. On the test set, the model achieves a precision of 0.93, a recall of 0.91, and an F1-score of 0.92, significantly outperforming a rule-based baseline. Error analysis further shows improvements on ambiguous cases such as abbreviations, numerical formatting, and direct quotations with nested punctuation. These findings indicate that a structured max-margin approach yields more reliable sentence boundary segmentation and can benefit downstream NLP tasks that require accurate sentence-level processing.
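The abstract gives no implementation details, so the following is only a minimal Python sketch of the max-margin idea in its simplest form, not the authors' method: a linear classifier trained on a regularized hinge loss (the soft-margin SVM objective, optimized here by per-example subgradient steps) that labels each candidate punctuation mark as a boundary or not. It deliberately omits the paper's structured paragraph parsing, EYD constraints, and global coherence terms. The feature names, the `ABBREVIATIONS` list, the toy training sentences, and the hyperparameters (`eta`, `lam`, `epochs`) are all hypothetical choices made for illustration.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical abbreviation list for illustration; not the paper's lexicon.
ABBREVIATIONS = {"dr", "prof", "no", "hlm", "dll", "dsb", "jl", "tn", "ny"}

Features = Dict[str, float]


def candidates(text: str) -> List[int]:
    """Positions of '.', '?' and '!' that might end a sentence."""
    return [i for i, ch in enumerate(text) if ch in ".?!"]


def features(text: str, i: int) -> Features:
    """Hypothetical local cues around the candidate at position i."""
    feats: Features = {"bias": 1.0, f"punct={text[i]}": 1.0}
    prev = re.findall(r"\w+$", text[:i])
    if prev and prev[0].lower() in ABBREVIATIONS:
        feats["prev_is_abbrev"] = 1.0
    if 0 < i < len(text) - 1 and text[i - 1].isdigit() and text[i + 1].isdigit():
        feats["between_digits"] = 1.0  # e.g. the thousands separator in 2.500
    nxt = text[i + 1:].lstrip()
    if not nxt:
        feats["end_of_text"] = 1.0
    elif nxt[0].isupper():
        feats["next_upper"] = 1.0
    elif nxt[0].islower():
        feats["next_lower"] = 1.0
    return feats


def score(w: Features, feats: Features) -> float:
    return sum(w.get(k, 0.0) * v for k, v in feats.items())


def make_examples(paragraphs: List[List[str]]) -> List[Tuple[Features, int]]:
    """Turn gold sentence lists into (features, +1/-1) pairs, one per candidate.

    Assumes every gold sentence ends with its punctuation mark (no closing quotes).
    """
    examples = []
    for sents in paragraphs:
        text = " ".join(sents)
        gold, pos = set(), 0
        for s in sents:
            pos += len(s)
            gold.add(pos - 1)  # index of the sentence-final character
            pos += 1           # skip the joining space
        for i in candidates(text):
            examples.append((features(text, i), 1 if i in gold else -1))
    return examples


def train(examples: List[Tuple[Features, int]], epochs: int = 200,
          eta: float = 0.1, lam: float = 0.01) -> Features:
    """Subgradient descent on lam/2 * ||w||^2 + hinge loss (soft-margin SVM)."""
    w: Features = {}
    for _ in range(epochs):
        for feats, y in examples:
            margin = y * score(w, feats)
            for k in w:  # gradient step on the L2 regularizer
                w[k] *= 1.0 - eta * lam
            if margin < 1.0:  # hinge active: push this example past the margin
                for k, v in feats.items():
                    w[k] = w.get(k, 0.0) + eta * y * v
    return w


def segment(text: str, w: Features) -> List[str]:
    """Split after every candidate that the linear model scores above zero."""
    sentences, start = [], 0
    for i in candidates(text):
        if score(w, features(text, i)) > 0:
            sentences.append(text[start:i + 1].strip())
            start = i + 1
    rest = text[start:].strip()
    if rest:
        sentences.append(rest)
    return sentences


# Toy gold data, invented for illustration only.
TRAIN = [
    ["Saya pergi ke pasar.", "Ibu membeli sayur."],
    ["Dr. Budi datang."],
    ["Harganya 2.500 rupiah."],
    ["Apa kabar?", "Baik."],
]

weights = train(make_examples(TRAIN))
print(segment("Prof. Ani mengajar. Mahasiswa mendengarkan.", weights))
```

On this toy data the learned weights keep "Prof." and the thousands separator in "1.250" inside their sentences. Because each candidate is scored independently, the sketch cannot enforce paragraph-level constraints; a structured formulation like the one the abstract describes would instead score a paragraph's segmentation jointly, which is what allows global patterns to influence individual boundary decisions.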