
Enhancing Interpretable Multiclass Lung Cancer Severity Classification using TabNet
Norman, Maria Bernadette Chayeenee; Dewi, Ika Novita; Salam, Abu; Utomo, Danang Wahyu; Rakasiwi, Sindhu
Journal of Applied Informatics and Computing Vol. 9 No. 6 (2025): December 2025
Publisher : Politeknik Negeri Batam

DOI: 10.30871/jaic.v9i6.11417

Abstract

Lung cancer is a major cause of mortality worldwide. Early clinical detection is hindered by non-specific symptoms, so accurate diagnosis depends on extracting subtle patterns from complex tabular medical data. Traditional machine learning approaches often fail to capture intricate patterns in such heterogeneous datasets, which limits effective clinical decision support. This research applies TabNet, an interpretable deep learning architecture, to multiclass lung cancer severity prediction (low, medium, high). Using the Kaggle Lung Cancer dataset, the methodology relies on TabNet's attention-based feature selection for end-to-end processing of tabular data, which enables adaptive identification of key predictors and supports model interpretability. The model was trained with default configurations and validated through stratified 5-fold cross-validation, achieving 98.50% accuracy, a 0.98 F1-score, and a 0.9996 macro-AUC-ROC on the test set. Learning curves were stable, and the interpretability analysis highlighted 'Genetic Risk' and 'Shortness of Breath' as dominant factors. The results indicate that TabNet is a reliable, robust, and inherently interpretable solution with significant potential to improve the precision and transparency of lung cancer severity assessment in clinical practice.
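
The stratified 5-fold cross-validation mentioned in the abstract can be illustrated with a minimal, dependency-free sketch. This is not the authors' code: the function name `stratified_kfold`, the seed, and the class counts (30 low, 35 medium, 35 high) are hypothetical and chosen only to show how each fold keeps roughly the same class proportions as the full dataset.

```python
import random
from collections import Counter, defaultdict


def stratified_kfold(labels, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs whose class mix mirrors the full label list."""
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Deal each class's shuffled indices round-robin into the k folds.
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)

    # Each fold serves once as the validation set; the rest form the training set.
    for i in range(k):
        val_idx = sorted(folds[i])
        train_idx = sorted(
            idx for j, fold in enumerate(folds) if j != i for idx in fold
        )
        yield train_idx, val_idx


# Hypothetical severity labels (counts are illustrative, not from the paper).
labels = ["low"] * 30 + ["medium"] * 35 + ["high"] * 35
splits = list(stratified_kfold(labels, k=5))

for fold_no, (train_idx, val_idx) in enumerate(splits, start=1):
    val_mix = Counter(labels[i] for i in val_idx)
    print(fold_no, len(train_idx), len(val_idx), dict(val_mix))
```

In practice a library routine such as scikit-learn's `StratifiedKFold` would normally be used; the hand-rolled version above only makes the splitting logic visible.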

Domain Adaptation of BERT Models for Biomedical Entity Extraction from Indonesian Health News
Norman, Maria Bernadette Chayeenee; Dewi, Ika Novita; Ignasius, Darnell
Jurnal ELTIKOM : Jurnal Teknik Elektro, Teknologi Informasi dan Komputer Vol. 10 No. 1 (2026)
Publisher : P3M Politeknik Negeri Banjarmasin

DOI: 10.31961/eltikom.v10i1.2116

Abstract

Health-related news articles play an increasingly important role in public health monitoring. However, their unstructured linguistic style complicates the automatic extraction of biomedical information. Indonesian health news shows high lexical variation by combining medical terms, colloquial expressions, borrowed English words, and culturally specific symptom descriptions. This condition creates challenges for Named Entity Recognition (NER). To address the limited availability of domain-specific resources, this study compares four Transformer-based models, namely BERT, IndoBERT, RoBERTa, and BioBERT, for biomedical NER in Indonesian health news. A new BIO-annotated dataset consisting of 272 manually labeled articles was constructed and validated, achieving strong inter-annotator agreement (Cohen’s Kappa = 0.88). To reduce data limitations, an additional 103 articles were automatically annotated using the best-performing model, RoBERTa, through a semi-supervised approach. All models were fine-tuned under identical settings and evaluated at both BIO and entity levels. The results show that RoBERTa achieves the highest weighted F1-score (0.9543). However, its macro F1-score (0.3873) indicates uneven performance across entity classes because of severe label imbalance, with non-entity tokens dominating the dataset. This finding highlights the importance of emphasizing macro-level evaluation to better reflect entity recognition performance. RoBERTa consistently outperforms the other models, which may be explained by its robust architecture and adaptability to diverse linguistic patterns. In contrast, BioBERT underperforms because of cross-lingual and domain mismatch, as it is pretrained on English biomedical corpora and optimized for scientific text rather than journalistic language. The error analysis further identifies boundary inconsistencies and under-detection of low-frequency entities, especially in the drug and symptom categories.
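
The gap between a high weighted F1-score and a low macro F1-score under label imbalance can be reproduced on a toy example. The sketch below is illustrative only: the tag names (`O`, `B-DRUG`, `B-SYMPTOM`), the token counts, and the predictions are hypothetical and are not taken from the study's dataset. It computes per-class F1 one-vs-rest, then averages it with equal weights (macro) and with support weights (weighted).

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1}, computed one-vs-rest for every label seen."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1: every class counts equally."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def weighted_f1(y_true, y_pred):
    """Mean of per-class F1 weighted by each class's share of the true labels."""
    scores = per_class_f1(y_true, y_pred)
    support = Counter(y_true)
    return sum(scores[label] * support[label] for label in scores) / len(y_true)


# Hypothetical token-level tags: 95 non-entity tokens, 5 entity tokens.
y_true = ["O"] * 95 + ["B-DRUG"] * 3 + ["B-SYMPTOM"] * 2
# A model that finds one drug mention and labels everything else as "O".
y_pred = ["O"] * 95 + ["B-DRUG"] + ["O"] * 4

macro = macro_f1(y_true, y_pred)
weighted = weighted_f1(y_true, y_pred)
print(f"macro F1 = {macro:.4f}, weighted F1 = {weighted:.4f}")
# prints: macro F1 = 0.4931, weighted F1 = 0.9454
```

Because the dominant `O` class carries 95% of the weight, the weighted score stays high even though four of the five entity tokens are missed, which is the pattern the abstract describes.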