Sentiment analysis is a key method for deriving insights from user-generated content, particularly for evaluating public satisfaction with digital health services. This study compares sentiment polarity classification models on 34,178 Indonesian-language reviews of SATUSEHAT Mobile, a national health application by the Indonesian Ministry of Health. The dataset was manually annotated into positive, neutral, and negative classes. Three model categories were evaluated: classical machine learning (Support Vector Machine, XGBoost), baseline neural networks (Multilayer Perceptron, Convolutional Neural Network), and pretrained transformer-based models (IndoBERT, XLM-RoBERTa). All models were trained using stratified 5-fold cross-validation and tested on a held-out set. Results show that the transformer-based models significantly outperform the other models on all metrics. IndoBERT achieved the highest weighted F1-score (0.8555), followed closely by XLM-RoBERTa (0.8552). Although the two models' average performance is nearly identical, XLM-RoBERTa exhibited the lowest performance variance across folds, making it the most stable and effective model overall. Friedman and Nemenyi tests confirmed that these differences are statistically significant. However, all models struggled to detect neutral sentiment because of class imbalance in the data. Although computationally more expensive than IndoBERT, XLM-RoBERTa offers superior robustness for sentiment classification of Indonesian health-related text. These findings support the integration of transformer-based sentiment monitoring into public health dashboards to enable timely, data-driven service improvements.
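As an illustration of the headline metric, the following minimal sketch computes a weighted F1-score from scratch and shows how it can stay high while a rare class is missed entirely, which is the pattern the abstract reports for neutral reviews. The labels are toy data invented for this example, and the helper names `per_class_f1` and `weighted_f1` are hypothetical; none of this is the study's data or code.

```python
from collections import Counter


def per_class_f1(y_true, y_pred, label):
    """F1-score for one class, treating that class as the positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        # No true positives: define F1 as 0 to avoid division by zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def weighted_f1(y_true, y_pred):
    """Average of per-class F1, weighted by each class's share of true labels."""
    support = Counter(y_true)
    total = len(y_true)
    return sum(
        per_class_f1(y_true, y_pred, label) * count / total
        for label, count in support.items()
    )


# Toy, imbalanced labels (assumption: invented purely for illustration).
y_true = ["positive"] * 5 + ["negative"] * 4 + ["neutral"] * 1
# A classifier that never predicts "neutral" and calls the neutral review positive.
y_pred = ["positive"] * 5 + ["negative"] * 4 + ["positive"]

print(round(weighted_f1(y_true, y_pred), 4))        # 0.8545
print(per_class_f1(y_true, y_pred, "neutral"))      # 0.0
```

In this toy case the weighted score is high even though the neutral class has an F1 of zero, so per-class scores are worth reporting alongside the weighted average when classes are imbalanced.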