Febrian Hikmah Nur Rohim
Department of Data Science, Universitas Muhammadiyah Semarang


Analysis of Suspected Factors in Tuberculosis Cases in Semarang City Using a Logistic Regression Model
Ihsan Fathoni Amri; Febrian Hikmah Nur Rohim; Muhammad Ivan Ardiansyah; Farid Sam Saputra; Supriyanto; Ariska Fitriyana Ningrum; Arman Mohammad Nakib
Scientific Journal of Computer Science Vol. 1 No. 1 (2025): January
Publisher : PT. Teknologi Futuristik Indonesia

DOI: 10.64539/sjcs.v1i1.2025.32

Abstract

Tuberculosis (TB) is one of the world's deadliest infectious diseases, and Indonesia is among the countries with the highest TB burden. Semarang City, a densely populated urban area, faces significant challenges in controlling TB, particularly among vulnerable populations. This study identifies significant risk factors for TB incidence in Semarang City using a binary logistic regression model. Descriptive analysis reveals class imbalance in the data: the majority of patients are categorized as "not indicated for TB." Chi-square tests show that shortness of breath, fever persisting for more than one month, diabetes mellitus, and household contact are significantly associated with TB incidence. The logistic regression model is significant overall (G statistic = 275.13; p-value = 1.23 × 10⁻⁵⁵), with shortness of breath and diabetes mellitus emerging as the major risk factors based on odds ratio interpretation. However, despite an overall accuracy of 87.25%, the model performs very poorly at detecting the "indicated for TB" category (precision 36.36%; recall 2.05%; F1-score 3.88%). This poor performance on category "1" ("indicated for TB") and the pseudo-R² value of 7% are likely related to the data imbalance: category "1" contains far fewer cases than category "0", which biases the model toward the majority class. In addition, the predictor variables do not carry enough information to distinguish category "1" from category "0", which further limits the model's ability to explain the variability in the data.
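
As a quick consistency check on the reported metrics, the F1-score can be recomputed as the harmonic mean of precision and recall. The snippet below is a minimal sketch, not the authors' code: the function name `f1_score` is hypothetical, and the only inputs are the rounded precision and recall values quoted in the abstract.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0  # convention: F1 is 0 when both inputs are 0
    return 2 * precision * recall / (precision + recall)


# Rounded values quoted in the abstract for the "indicated for TB" class.
precision = 0.3636
recall = 0.0205

f1 = f1_score(precision, recall)
print(f"F1 = {f1:.2%}")  # prints "F1 = 3.88%"
```

Because the harmonic mean is dominated by the smaller of its two inputs, the very low recall pulls F1 down to about 3.88% even though precision is roughly 36%. The 87.25% accuracy does not reflect this, since accuracy is driven mostly by the majority class.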