Depression is a major mental health concern that requires early identification and timely intervention. Social media has become an important source of user-generated text that may reflect emotional distress, hopelessness, social withdrawal, and suicidal ideation. However, most existing depression detection studies focus on English or other high-resource languages, while research on low-resource languages such as Urdu remains limited. This study investigates depression severity classification in Urdu social media text using multilingual and confidence-aware natural language processing approaches. The dataset consists of 4,000 Twitter/X posts collected between January 2024 and April 2025 and annotated into four severity classes: none, mild, moderate, and severe. Each post is represented in three parallel textual forms: native Urdu script, Roman Urdu transliteration, and English translation. The dataset also includes label confidence scores, human verification indicators, cultural markers, and depression-related keywords. Several text representation scenarios were evaluated, including Urdu text, Roman Urdu text, English text, and combined multilingual features. Baseline machine learning models were developed using TF-IDF features with Logistic Regression, Linear Support Vector Machine, and Multinomial Naive Bayes classifiers. Confidence-aware learning was examined by incorporating label confidence scores as sample weights and by evaluating a high-confidence subset. All baseline models achieved perfect classification performance, with accuracy, macro F1-score, weighted F1-score, and Cohen’s kappa all equal to 1.000 across the evaluated scenarios. Taken at face value, these results indicate that the dataset contains highly separable linguistic patterns among depression severity classes. However, further inspection suggests that repeated or highly similar textual patterns may be making the reported performance overly optimistic.
Therefore, stricter validation with duplicate-free splitting and external datasets, together with evaluation of transformer-based models, is recommended for future work. This study provides a preliminary benchmark for multilingual depression severity classification in low-resource Urdu text and highlights the potential of AI-driven mental health informatics as a supportive early-warning tool rather than a clinical diagnostic system.
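
As an illustration of what duplicate-free splitting could look like, the following minimal Python sketch groups posts by a normalized form of their text and assigns whole groups to either the training or the test side. It is not the procedure used in this study. The function names (`normalize`, `group_split`, `leaked`), the normalization rules, and the toy posts are all assumptions introduced for illustration, and exact matching after normalization will catch only trivial repeats, not paraphrases.

```python
import random
import re
import unicodedata
from collections import defaultdict


def normalize(text):
    """Map superficially different copies of a post to one key (illustrative rules)."""
    text = unicodedata.normalize("NFKC", text).lower()
    text = re.sub(r"https?://\S+|@\w+", " ", text)  # drop URLs and @mentions
    # Keep letters, combining marks, and digits; turn everything else into spaces.
    kept = (ch if unicodedata.category(ch)[0] in "LMN" else " " for ch in text)
    return " ".join("".join(kept).split())


def group_split(posts, test_fraction=0.2, seed=0):
    """Split indices so that all copies of a normalized post land on the same side.

    Note: test_fraction applies to duplicate groups, not to individual posts.
    """
    groups = defaultdict(list)
    for idx, text in enumerate(posts):
        groups[normalize(text)].append(idx)
    keys = sorted(groups)  # fixed order before the seeded shuffle
    random.Random(seed).shuffle(keys)
    n_test = max(1, round(test_fraction * len(keys)))
    test = sorted(i for k in keys[:n_test] for i in groups[k])
    train = sorted(i for k in keys[n_test:] for i in groups[k])
    return train, test


def leaked(posts, train_idx, test_idx):
    """Return test indices whose normalized text also occurs in the training set."""
    train_keys = {normalize(posts[i]) for i in train_idx}
    return [i for i in test_idx if normalize(posts[i]) in train_keys]


# Hypothetical toy posts, not drawn from the dataset described above.
posts = [
    "I feel so tired today",
    "i feel so tired today!!",
    "Had a great day with friends",
    "had a great day with friends  http://example.com",
    "Nothing matters anymore",
    "Exams went fine",
]
train_idx, test_idx = group_split(posts, test_fraction=0.25, seed=0)
```

Running `leaked` on an existing train/test split is a quick way to check whether repeated posts cross the boundary. For near-duplicates, a similarity-based grouping (for example, on character n-grams) would be needed, and scikit-learn's `GroupShuffleSplit` offers a comparable group-wise split once group labels are available.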