This study develops an annotated corpus and a deep learning-based Named Entity Recognition (NER) model to identify legal entities in Indonesian corruption court rulings. The corpus was constructed from 450 Supreme Court documents related to the Anti-Corruption Law (Law No. 31/1999), collected via web scraping, pre-annotated semi-automatically with regular expressions, and validated by legal experts. A total of 12,000 entities (Article, Laws, Sanctions) were tagged in IOB format, creating the first specialized dataset for Indonesian corruption law. The NER model combines the pre-trained language model IndoBERT with a CRF layer and is fine-tuned to handle the complexities of legal text, such as hierarchical article references (paragraphs, clauses) and citations of amended laws (jo.). In 10-fold cross-validation, the model achieved an F1-score of 92.3%, outperforming a standalone CRF (85.1%) and BiLSTM+CRF (88.7%), particularly in detecting ARTICLE entities (F1: 93.8%). Error analysis highlighted challenges in recognizing SANCTIONS entities (F1: 87.4%) due to variability in sentence structure and conjunctions. Deploying the model could accelerate the analysis of judicial decisions, help identify violation patterns, and support sanction recommendation systems for law enforcement. This research also provides legal entity annotation guidelines adaptable to other legal domains. Future work should extend to other laws (e.g., the ITE Law, the Criminal Code) via transfer learning and integrate knowledge graphs to enhance entity relation detection.
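As a rough illustration of the semi-automatic (regex) pre-annotation step, the sketch below tags whitespace tokens in IOB format. The two patterns, the `LAW` label name, and the helper `pre_annotate` are assumptions made for this example; the regular expressions and label set actually used in the study are not given here, and in the described workflow such output would still be validated by legal experts.

```python
import re

# Hypothetical patterns for illustration only; the study's own regexes are not published here.
PATTERNS = [
    # e.g. "Pasal 2 ayat (1)" or "Pasal 18" (article, optional paragraph and letter)
    ("ARTICLE", re.compile(r"Pasal\s+\d+(?:\s+ayat\s+\(\d+\))?(?:\s+huruf\s+[a-z])?")),
    # e.g. "Undang-Undang Nomor 31 Tahun 1999" (assumed label name: LAW)
    ("LAW", re.compile(r"Undang-Undang\s+Nomor\s+\d+\s+Tahun\s+\d{4}")),
]


def pre_annotate(text):
    """Return (token, tag) pairs in IOB format for whitespace-separated tokens."""
    # Record each token together with its character offsets in the text.
    tokens = [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", text)]
    tags = ["O"] * len(tokens)
    for label, pattern in PATTERNS:
        for match in pattern.finditer(text):
            # Only tokens lying fully inside the match are tagged; a token with
            # trailing punctuation (e.g. "(1),") is left as "O" for human review.
            inside = [
                i for i, (_, start, end) in enumerate(tokens)
                if start >= match.start() and end <= match.end() and tags[i] == "O"
            ]
            for position, i in enumerate(inside):
                tags[i] = ("B-" if position == 0 else "I-") + label
    return [(token, tag) for (token, _, _), tag in zip(tokens, tags)]


sentence = "melanggar Pasal 2 ayat (1) jo. Pasal 18 Undang-Undang Nomor 31 Tahun 1999"
for token, tag in pre_annotate(sentence):
    print(token, tag)
```

In this sketch the connector `jo.` stays outside both entities, so the two article references it links are tagged as separate ARTICLE spans.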
Copyright © 2025