Judicial outcome analysis has attracted growing attention in legal artificial intelligence research, yet empirical studies of Indonesian court decisions remain limited. This study presents an experimental evaluation of traditional machine learning and deep learning models for judicial outcome classification on Indonesian legal texts. The experiments were conducted on a curated dataset of 4,872 court decisions from 2018 to 2023, obtained from the official Direktori Putusan Mahkamah Agung Republik Indonesia. To prevent outcome leakage, all explicit ruling sections were removed before model training, and only the legal reasoning segments were used as input. Several models, including Logistic Regression, Support Vector Machine, Gradient Boosting, BiLSTM, and IndoBERT, were evaluated under identical experimental settings. The results show that ensemble-based methods, particularly Gradient Boosting, achieve strong and stable performance. Deep learning models achieve competitive but not consistently superior results under document length constraints. Error analysis indicates that misclassifications frequently arise from implicit judicial reasoning and ambiguous outcomes. This study provides an empirical benchmark for judicial outcome classification in Indonesian courts and highlights methodological limitations in legal NLP research related to document length, labeling granularity, and reproducibility.