Tuberculosis (TB) is one of the world's deadliest infectious diseases, and Indonesia is among the countries with the highest TB burden. Semarang City, a densely populated urban area, faces significant challenges in controlling TB, particularly among vulnerable populations. This study identifies significant risk factors for TB incidence in Semarang City using a binary logistic regression model. Descriptive analysis reveals class imbalance in the data, with the majority of patients categorized as "not indicated for TB." Chi-square tests show that shortness of breath, fever persisting for more than one month, diabetes mellitus, and household contact are significantly associated with TB incidence. The logistic regression model is significant overall (G statistic = 275.13; p-value = 1.23 × 10⁻⁵⁵), with shortness of breath and diabetes mellitus emerging as major risk factors based on odds ratio interpretation. However, despite an overall accuracy of 87.25%, the model performs very poorly in detecting the "indicated for TB" category (precision 36.36%; recall 2.05%; F1-score 3.88%). This poor performance on the "indicated for TB" class (coded "1") and the low pseudo-R² of 7% are likely related to data imbalance: cases in class "1" are far fewer than those in the "not indicated for TB" class (coded "0"), which biases predictions toward the majority class. In addition, the predictor variables do not provide sufficient information to distinguish class "1" from class "0", which further limits the model's ability to explain variability in the data.
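
As a quick consistency check on the reported metrics, the F1-score can be recomputed as the harmonic mean of precision and recall. The sketch below is illustrative only: the function name `f1_from_precision_recall` is hypothetical, and the inputs are the rounded percentages quoted above, not the study's underlying confusion matrix.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Return the F1-score, i.e. the harmonic mean of precision and recall."""
    if precision + recall == 0:
        # Convention: F1 is taken as 0 when both precision and recall are 0.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded values reported for the "indicated for TB" class.
precision = 0.3636  # 36.36%
recall = 0.0205     # 2.05%

f1 = f1_from_precision_recall(precision, recall)
print(f"F1-score: {f1:.2%}")  # prints "F1-score: 3.88%"
```

Because the harmonic mean is dominated by the smaller of its two inputs, the very low recall pulls the F1-score close to zero even though precision is moderate, which is why the high overall accuracy says little about how well TB cases are detected.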