Scientific Journal of Computer Science
Vol. 1 No. 1 (2025): January

Analysis of Suspected Factors in Tuberculosis Cases in Semarang City Using a Logistic Regression Model

Ihsan Fathoni Amri (Department of Data Science, Universitas Muhammadiyah Semarang)
Febrian Hikmah Nur Rohim (Department of Data Science, Universitas Muhammadiyah Semarang)
Muhammad Ivan Ardiansyah (Department of Data Science, Universitas Muhammadiyah Semarang)
Farid Sam Saputra (Department of Data Science, Universitas Muhammadiyah Semarang)
Supriyanto (Department of Chemistry Education, Universitas Muhammadiyah Semarang)
Ariska Fitriyana Ningrum (Department of Data Science, Universitas Muhammadiyah Semarang)
Arman Mohammad Nakib (Artificial Intelligence, Nanjing University of Information Science & Technology)



Article Info

Publish Date
09 May 2025

Abstract

Tuberculosis (TB) is one of the world's deadliest infectious diseases, and Indonesia is among the countries with the highest TB burden. Semarang City, a densely populated urban area, faces significant challenges in controlling TB, particularly among vulnerable populations. This study identifies significant risk factors for TB incidence in Semarang City using a binary logistic regression model. Descriptive analysis reveals class imbalance in the data, with the majority of patients categorized as "not indicated for TB." Chi-square tests show that shortness of breath, persistent fever for more than one month, diabetes mellitus, and household contact are significantly associated with TB incidence. The logistic regression model is significant overall (G statistic = 275.13; p-value = 1.23 × 10⁻⁵⁵), with shortness of breath and diabetes mellitus emerging as major risk factors based on odds ratio interpretation. However, the model's performance in detecting the "indicated for TB" category is very low (precision 36.36%; recall 2.05%; F1-score 3.88%), despite an overall accuracy of 87.25%. The poor performance on the "indicated for TB" category (coded "1") and the pseudo-R² value of 7% are likely related to data imbalance: the number of cases in category "1" is much smaller than in category "0" ("not indicated for TB"), which biases the model toward the majority class. In addition, the predictor variables do not provide sufficient information to distinguish category "1" from category "0", which further limits the model's ability to explain the variability in the data.
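
As a quick consistency check on the figures above, the F1-score can be recomputed as the harmonic mean of the reported precision and recall. The sketch below is illustrative only and is not the authors' code; the function name `f1_score` and the variable names are assumptions, and the only inputs are the two percentages quoted in the abstract.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        # Convention: F1 is taken as 0 when both precision and recall are 0.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values quoted in the abstract for the "indicated for TB" class.
reported_precision = 0.3636  # 36.36%
reported_recall = 0.0205     # 2.05%

f1 = f1_score(reported_precision, reported_recall)
print(f"F1 = {f1:.2%}")
```

Because the harmonic mean is dominated by the smaller of its two inputs, the very low recall keeps the F1-score low regardless of precision, which is why overall accuracy alone gives a misleading picture on imbalanced data.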

Copyrights © 2025






Journal Info

Abbrev

sjcs

Publisher

Subject

Computer Science & IT

Description

The Scientific Journal of Computer Science (SJCS) (e-ISSN: 3110-3170) is a peer-reviewed and open-access scientific journal, managed and published by PT. Teknologi Futuristik Indonesia in collaboration with Universitas Qamarul Huda Badaruddin Bagu and Peneliti Teknologi Teknik Indonesia. The SJCS ...