Garuda - Garba Rujukan Digital

JIPI (Jurnal Ilmiah Penelitian dan Pembelajaran Informatika)

Vol 10, No 4 (2025)

Prasetya, Agung (Unknown)
Sari, Yayak Kartika (Unknown)
Anwar, Ndaru Syaiful (Unknown)

Publish Date
14 Dec 2025

This study presents a max-margin–based approach for sentence boundary segmentation in Indonesian paragraphs, addressing a persistent challenge in Natural Language Processing applications. Conventional rule-based or sequential methods often struggle to distinguish ambiguous punctuation marks, particularly in contexts involving abbreviations, numerical expressions, hierarchical sentence structures, and direct quotations. To overcome these limitations, this research formulates sentence segmentation as a paragraph parsing task, enabling the model to capture both local boundary cues and global structural patterns within a paragraph. A manually annotated corpus of 12,000 paragraphs from news articles, public documents, and academic texts was developed to provide diverse linguistic structures and punctuation variations. The proposed model integrates local punctuation features, structural constraints from the Indonesian EYD standard, and global paragraph coherence through a max-margin discriminative parsing framework. Experimental results show that the model achieves strong performance on the test set, with a precision of 0.93, recall of 0.91, and F1-score of 0.92, significantly outperforming a rule-based baseline. Error analysis further highlights improvements in handling ambiguous cases such as abbreviations, numerical formatting, and direct quotations with nested punctuation. The findings demonstrate that a structured max-margin approach delivers more reliable sentence boundary segmentation and can enhance downstream NLP tasks requiring accurate sentence-level text processing.
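The max-margin idea in the abstract can be illustrated with a toy structured hinge loss. The sketch below is not the authors' model: the four features, the example paragraph, the weights, and the brute-force enumeration (standing in for a real paragraph parser) are all assumptions made for illustration. It treats every period in a paragraph as a boundary candidate, scores a whole labeling at once, and requires the gold labeling to beat every alternative by a margin equal to the number of candidates that alternative gets wrong.

```python
from itertools import product

# Hypothetical binary features for each candidate period (illustrative only):
# (next token is capitalized, follows a title abbreviation,
#  sits between digits, is the last character of the paragraph)
# Example paragraph: "Dr. Budi tiba pukul 10.30 WIB. Ia langsung rapat."
FEATURES = [
    (1, 1, 0, 0),  # "Dr."    title abbreviation, not a boundary
    (0, 0, 1, 0),  # "10.30"  time expression, not a boundary
    (1, 0, 0, 0),  # "WIB."   sentence boundary
    (0, 0, 0, 1),  # "rapat." sentence boundary
]
GOLD = (0, 0, 1, 1)  # 1 = sentence boundary, 0 = not a boundary


def hamming(gold, labels):
    """Cost of a labeling: number of candidates labeled differently from gold."""
    return sum(g != y for g, y in zip(gold, labels))


def score(weights, features, labels):
    """Linear score of a full labeling: sum of w.f over candidates marked as boundaries."""
    return sum(
        sum(w * f for w, f in zip(weights, feats))
        for feats, y in zip(features, labels)
        if y == 1
    )


def structured_hinge_loss(weights, features, gold):
    """Margin-rescaled structured hinge loss.

    max over all labelings y of [score(y) + cost(gold, y)] minus score(gold).
    Brute-force enumeration is only feasible for tiny paragraphs; it is an
    assumption here, replacing whatever inference the real parser uses.
    """
    augmented_best = max(
        score(weights, features, y) + hamming(gold, y)
        for y in product((0, 1), repeat=len(gold))
    )
    return augmented_best - score(weights, features, gold)


if __name__ == "__main__":
    untrained = [0.0, 0.0, 0.0, 0.0]
    hand_picked = [2.0, -3.0, -2.0, 2.0]  # assumed weights, not learned values
    print(structured_hinge_loss(untrained, FEATURES, GOLD))
    print(structured_hinge_loss(hand_picked, FEATURES, GOLD))
```

With all-zero weights the loss equals the number of candidates (4), because the worst labeling disagrees with gold everywhere. With the hand-picked weights every wrong labeling is separated from the gold one by at least its cost, so the loss drops to zero. Training a max-margin model amounts to finding weights that drive this loss down across the annotated corpus.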

Journal Info

JIPI (Jurnal Ilmiah Penelitian dan Pembelajaran Informatika)

Publisher

STKIP PGRI Tulungagung

Subject

Computer Science & IT Education

Description

JIPI (Jurnal Ilmiah Penelitian dan Pembelajaran Informatika), e-ISSN 2540-8984, was established to publish scientific work in the form of research or papers, particularly in the field of Information Technology. JIPI is a journal that is managed by the ...

A MAX-MARGIN APPROACH TO SENTENCE BOUNDARY SEGMENTATION IN INDONESIAN PARAGRAPHS
