The digitization of judicial records has produced data at a scale that traditional legal research methods cannot adequately handle. This paper describes the development and evaluation of an automated data mining framework that collects judicial decisions from the public decision directory of the Supreme Court of Indonesia, with the aim of building a data pipeline for analyzing civil litigation trends. The framework uses a multi-stage acquisition process: a custom Python script drives a headless Selenium WebDriver to navigate complex, JavaScript-rendered pages and handle asynchronous pagination, and the BeautifulSoup library parses the retrieved HTML and extracts case metadata. Extracted records are structured and stored in a CSV file to preserve data integrity when the scraping process is interrupted. The system mined 21,780 civil case records from 2024 at a rate of 12 decisions per minute, with a 75% success rate. The success rate was constrained by the website's responsiveness, which required a 120-second read timeout and persistent retries. A descriptive analysis with the Pandas library identified unlawful acts, breach of contract, and land disputes as the most frequent categories of civil litigation. This research offers a scalable model for legal informatics and a foundational dataset for future work, such as Natural Language Processing (NLP) on judicial texts.
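The acquisition loop summarized above (fetch a page with a long read timeout and retries, parse it, append the record to a CSV file, then count decisions by category) can be illustrated with a minimal Python sketch. It is not the authors' script. `fetch` and `parse` are injected callables standing in for the headless Selenium page load and the BeautifulSoup extraction step. The column names, the function names, and the choice to resume by skipping URLs already saved are hypothetical assumptions. `collections.Counter` stands in for the Pandas frequency count so the example runs with the standard library alone.

```python
import csv
import os
import time
from collections import Counter
from typing import Callable, Dict, Iterable, Optional, Set

# Hypothetical CSV schema; the paper does not list the columns it stores.
FIELDS = ["case_number", "category", "url"]

Fetch = Callable[[str, float], str]           # (url, read_timeout) -> page HTML
Parse = Callable[[str, str], Dict[str, str]]  # (html, url) -> one record


def fetch_with_retries(fetch: Fetch, url: str, read_timeout: float = 120.0,
                       max_retries: int = 3, backoff: float = 2.0) -> Optional[str]:
    """Return the page HTML, or None if every attempt fails."""
    for attempt in range(max_retries):
        try:
            return fetch(url, read_timeout)
        except Exception:  # e.g. a read timeout on a slow page
            if attempt < max_retries - 1:
                time.sleep(backoff * (attempt + 1))  # linear back-off
    return None


def load_saved_urls(path: str) -> Set[str]:
    """URLs already written by an earlier run, used to resume after a crash."""
    if not os.path.exists(path):
        return set()
    with open(path, newline="", encoding="utf-8") as f:
        return {row["url"] for row in csv.DictReader(f)}


def append_record(path: str, record: Dict[str, str]) -> None:
    """Append one row; closing the file after each row flushes it."""
    is_new = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow(record)


def crawl(urls: Iterable[str], fetch: Fetch, parse: Parse, path: str,
          backoff: float = 2.0) -> int:
    """Fetch, parse, and store each new URL; return the number of failures."""
    saved = load_saved_urls(path)
    failures = 0
    for url in urls:
        if url in saved:
            continue  # already stored by a previous run
        html = fetch_with_retries(fetch, url, backoff=backoff)
        if html is None:
            failures += 1
            continue
        append_record(path, parse(html, url))
        saved.add(url)
    return failures


def category_counts(path: str) -> Counter:
    """Decisions per category (a stdlib stand-in for pandas value_counts())."""
    with open(path, newline="", encoding="utf-8") as f:
        return Counter(row["category"] for row in csv.DictReader(f))
```

With a real scraper, `fetch` would load the decision page in the headless browser and `parse` would return one metadata dictionary per decision. Writing each row as soon as it is parsed, rather than at the end of the run, lets an interrupted crawl restart without losing rows already collected or storing the same URL twice.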
Copyright © 2025