Early identification of stress and depression among university students is essential for timely psychological intervention, yet traditional counseling relies largely on manual, self-initiated reporting and may therefore miss students who experience emotional distress but do not seek help. This study aimed to develop a text-based mental-health detection framework that uses transformer models supported by contextual labeling to analyze student-generated social-media content. The research was conducted in three stages: problem exploration with the Student Affairs Division; data collection from questionnaires and 993 social-media text entries; and comprehensive data preprocessing, including cleaning, normalization, deduplication, and lexicon-based weak labeling. The cleaned dataset was used to fine-tune two transformer architectures, RoBERTa for sequence classification and T5 for text-to-text classification, and to construct a majority-vote ensemble. Model performance was evaluated using accuracy, precision, recall, F1-score, and confusion matrices. The T5 model achieved the most balanced performance across all categories, particularly in distinguishing neutral from stress expressions, whereas RoBERTa and the ensemble exhibited a strong prediction bias toward a single class. These findings suggest that contextual preprocessing combined with transformer-based modeling can effectively support the automated detection of students' emotional states. The study concludes that transformer models, especially T5 with contextual labeling, offer a promising foundation for early-warning systems that can be integrated into university counseling services and further improved through expanded datasets, expert-validated annotations, and explainable-AI components.
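The preprocessing, weak-labeling, and ensembling steps summarized above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the study's implementation: the keyword lexicon, the three-label set (neutral, stress, depression), the normalization rules, and the tie-breaking policies are all hypothetical placeholders chosen for readability.

```python
import re
from collections import Counter

# HYPOTHETICAL lexicon: placeholder keywords, not the study's actual lexicon.
LEXICON = {
    "depression": {"hopeless", "worthless", "empty", "give up"},
    "stress": {"deadline", "overwhelmed", "pressure", "exhausted"},
}

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
SPACE_RE = re.compile(r"\s+")


def normalize(text: str) -> str:
    """Lowercase, remove URLs and @mentions, and collapse whitespace."""
    text = URL_RE.sub(" ", text.lower())
    text = MENTION_RE.sub(" ", text)
    return SPACE_RE.sub(" ", text).strip()


def preprocess(texts):
    """Normalize each entry, drop empty ones, and remove exact duplicates (order kept)."""
    seen = set()
    cleaned = []
    for raw in texts:
        t = normalize(raw)
        if t and t not in seen:
            seen.add(t)
            cleaned.append(t)
    return cleaned


def weak_label(text: str) -> str:
    """Label by counting lexicon keyword hits (substring match); 'neutral' if none.

    ASSUMPTION: ties between categories are resolved in favor of 'depression',
    because max() returns the first maximal element in the given order.
    """
    hits = {label: sum(kw in text for kw in kws) for label, kws in LEXICON.items()}
    if max(hits.values()) == 0:
        return "neutral"
    return max(("depression", "stress"), key=lambda label: hits[label])


def majority_vote(predictions):
    """Return the most frequent label among per-model predictions.

    ASSUMPTION: on a tie, the earliest model in the list (among tied labels) wins.
    """
    counts = Counter(predictions)
    top = max(counts.values())
    for p in predictions:
        if counts[p] == top:
            return p


docs = preprocess([
    "Deadline tomorrow and I feel overwhelmed https://t.co/x",
    "deadline tomorrow and I feel   overwhelmed",
    "Nice weather on campus today @friend",
])
labels = [weak_label(d) for d in docs]
print(docs)
print(labels)
```

One design point worth noting: with only two voters (for example, one RoBERTa and one T5 prediction per text), every disagreement is a tie, so the tie-breaking rule rather than the vote itself decides the ensemble output in those cases.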