Garuda - Garba Rujukan Digital

Transactions on Informatics and Data Science

Vol. 2 No. 1 (2025)

Subowo, Edy
Bukhori, Imam
Warto

Publication Date
07 Jun 2025

This study develops an annotated corpus and a deep learning-based Named Entity Recognition (NER) model for identifying legal entities in Indonesian corruption court decisions. The corpus was built from 450 Supreme Court decisions concerning the Anti-Corruption Law (Law No. 31/1999). The decisions were collected by web scraping, pre-annotated semi-automatically with regular expressions (regex), and validated by legal experts. A total of 12,000 entities of three types (Article, Law, and Sanction) were tagged in the IOB (Inside-Outside-Beginning) format, creating the first specialized dataset for Indonesian anti-corruption law. The NER model combines the pre-trained IndoBERT language model with a conditional random field (CRF) layer. It is fine-tuned to handle features of legal text such as hierarchical article references (paragraphs and clauses) and citations of amended laws joined by "jo." (juncto). In 10-fold cross-validation, the model achieved an F1-score of 92.3%, outperforming a standalone CRF (85.1%) and BiLSTM+CRF (88.7%), with its strongest results on ARTICLE entities (F1: 93.8%). Error analysis highlighted difficulties in recognizing SANCTIONS entities (F1: 87.4%), caused by variable sentence structure and the use of conjunctions. Deploying the model could speed up the analysis of judicial decisions, help identify violation patterns, and support sanction recommendation systems for law enforcement. The study also provides legal entity annotation guidelines that can be adapted to other legal domains. Future work should extend the approach to other laws (e.g., the ITE Law on electronic information and transactions, and the Criminal Code) through transfer learning, and integrate knowledge graphs to improve the detection of relations between entities.
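To make the semi-automatic annotation step concrete, the sketch below shows how regex matches over a decision text could be turned into draft IOB tags for expert review. It is a minimal illustration under stated assumptions, not the authors' pipeline: the patterns in ENTITY_PATTERNS, the tokenizer, the function name pre_annotate, the tag labels ARTICLE, LAW, and SANCTION, and the example sentence are all hypothetical. The patterns target common Indonesian citation forms: "Pasal" (article), "ayat" (paragraph), "huruf" (lettered clause), and "UU No. 31 Tahun 1999" (Law No. 31 of 1999).

```python
import re
from typing import List, Tuple

# Hypothetical surface patterns, for illustration only (not the study's regexes).
# Earlier patterns take priority when two matches overlap.
ENTITY_PATTERNS = [
    # ARTICLE: "Pasal 2", optionally "ayat (1)" (paragraph) and "huruf a" (lettered clause)
    ("ARTICLE", re.compile(r"Pasal\s+\d+(?:\s+ayat\s+\(\d+\))?(?:\s+huruf\s+[a-z])?")),
    # LAW: "UU No. 31 Tahun 1999" or "Undang-Undang Nomor 31 Tahun 1999"
    ("LAW", re.compile(r"(?:UU|Undang-Undang)\s+(?:No\.|Nomor)\s+\d+\s+Tahun\s+\d{4}")),
    # SANCTION: prison terms such as "pidana penjara 4 tahun" or "pidana penjara selama 4 tahun"
    ("SANCTION", re.compile(r"pidana\s+penjara(?:\s+selama)?\s+\d+\s+tahun")),
]

# Simple tokenizer: runs of word characters, or single punctuation marks.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def pre_annotate(text: str) -> List[Tuple[str, str]]:
    """Return draft (token, IOB tag) pairs derived from regex matches."""
    # 1. Collect non-overlapping character spans for each entity label.
    spans: List[Tuple[int, int, str]] = []
    for label, pattern in ENTITY_PATTERNS:
        for m in pattern.finditer(text):
            if all(m.end() <= s or m.start() >= e for s, e, _ in spans):
                spans.append((m.start(), m.end(), label))

    # 2. Tag each token: B- if it starts a span, I- if inside one, O otherwise.
    tagged: List[Tuple[str, str]] = []
    for tok in TOKEN_RE.finditer(text):
        tag = "O"
        for start, end, label in spans:
            if start <= tok.start() and tok.end() <= end:
                tag = ("B-" if tok.start() == start else "I-") + label
                break
        tagged.append((tok.group(), tag))
    return tagged


if __name__ == "__main__":
    # Invented example sentence (not taken from the corpus).
    sentence = (
        "Terdakwa melanggar Pasal 2 ayat (1) jo. Pasal 18 UU No. 31 Tahun 1999 "
        "dan dijatuhi pidana penjara 4 tahun."
    )
    for token, tag in pre_annotate(sentence):
        print(f"{token}\t{tag}")
```

In this sketch the connector "jo." stays tagged O, so the two articles it joins remain separate ARTICLE entities; whether the study's annotation guidelines treat "jo." this way is an assumption. Because the patterns are rigid, sanction phrases worded differently (for example, fines, or terms split by conjunctions) would be missed, which is one reason a regex pass is only a draft that still needs expert validation and a learned tagger.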


Journal Info

Transactions on Informatics and Data Science

Abbreviation

TIDS

Publisher

Universitas Islam Negeri Profesor Kiai Haji Saifuddin Zuhri Purwokerto

Subject

Computer Science & IT; Decision Sciences, Operations Research & Management; Engineering

Description

Transactions on Informatics and Data Science (TIDS), with ISSN: 3064-1772 (online), is a scientific journal that publishes the latest research in the fields of informatics and data science, focusing on both theoretical advances and practical applications. Published by the Department of Informatics, ...

Article Info

Corpus Development and NER Model for Identification of Legal Entities (Articles, Laws, and Sanctions) in Corruption Court Decisions in Indonesia