BAREKENG: Jurnal Ilmu Matematika dan Terapan
Vol 20 No 3 (2026): BAREKENG: Journal of Mathematics and Its Application

EXTRACTIVE CLINICAL NOTES SUMMARIZATION USING SINGLE MACHINE LEARNING, ENSEMBLE, AND STACKING APPROACHES

Junadhi Junadhi (Department Informatics Engineering, Universitas Sains dan Teknologi Indonesia, Indonesia)
Agustin Agustin (Department Informatics Engineering, Universitas Sains dan Teknologi Indonesia, Indonesia)
Deshinta Arrova Dewi (Center for Data Science and Sustainable Technologies, INTI International University, Malaysia)
Abhishek Saxena (Department of Computer Science & Technology, Manav Rachna University, India)



Article Info

Publish Date
08 Apr 2026

Abstract

Summarizing clinical notes is pivotal to supporting medical decision-making by presenting relevant information concisely and efficiently. However, the complexity of clinical language, the unstructured nature of the text, and the inherent class imbalance pose major challenges for the development of automatic summarization systems. This study develops a framework for extractive clinical notes summarization and compares the performance of single-model machine learning, simple ensembles, and stacking. A synthetic dataset comprising 2,000 clinical notes was segmented into 22,000 sentences, each labeled as important or not important according to a reference extractive summary. The methodology includes text preprocessing (normalization, expansion of medical abbreviations, tokenization, and stopword removal), feature extraction (TF-IDF, Named Entity Recognition, and structural features), and implementation of multiple models. Evaluation relies on Accuracy, Precision, Recall, and F1-score, complemented by Entity-F1, redundancy analysis, and latency per document. Experimental results show that the best single model, XGBoost, achieves an F1-score of 0.76, reflecting its ability to capture non-linear interactions among heterogeneous clinical text features under class imbalance, while simple ensembles further improve performance to 0.78. The most substantial gains are obtained with stacking, which reaches an F1-score of 0.80, precision of 0.83, and recall of 0.78. The confusion matrix indicates low false negatives, and the Precision–Recall curve (AP = 0.73) demonstrates consistent behavior under imbalanced data conditions. Overall, the findings establish stacking as the most effective approach for extractive summarization of clinical notes. Beyond theoretical relevance, the results carry practical implications for developing clinical decision support systems that are safe, efficient, and readily integrable into digital health services.
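
The F1-score reported for stacking can be checked against the reported precision and recall, since F1 is their harmonic mean. The snippet below is a minimal sketch of that arithmetic only. The function name `f1_score` is an illustrative choice rather than the authors' code, and it assumes the three reported values come from the same evaluation split.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall.

    Returns 0.0 when both inputs are zero to avoid division by zero.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall reported in the abstract for the stacking model.
stacking_f1 = f1_score(precision=0.83, recall=0.78)

# Formatted to two decimals for comparison with the reported F1-score.
print(f"{stacking_f1:.2f}")
```

With precision 0.83 and recall 0.78, the harmonic mean rounds to 0.80, which agrees with the F1-score stated in the abstract.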

Copyrights © 2026






Journal Info

Abbrev

barekeng

Subject

Computer Science & IT; Control & Systems Engineering; Economics, Econometrics & Finance; Energy; Engineering; Mathematics; Mechanical Engineering; Physics; Transportation

Description

BAREKENG: Jurnal Ilmu Matematika dan Terapan is a scientific journal that publishes articles reporting the results of research or studies in Pure Mathematics and Applied Mathematics. The focus and scope of BAREKENG: Jurnal Ilmu Matematika dan Terapan are as follows: - Pure ...