Articles

Addressing Intrinsic Data Characteristics Issues of Imbalance Medical Data Using Nature Inspired Percolation Clustering
Siddavatam, Kaikashan; Shinde, Subhash
Journal of Electronics, Electromedical Engineering, and Medical Informatics Vol 7 No 3 (2025): July
Publisher: Department of Electromedical Engineering, POLTEKKES KEMENKES SURABAYA

DOI: 10.35882/jeeemi.v7i3.835

Abstract

Data on diseases are generally skewed towards either positive or negative cases, depending on disease prevalence. Class imbalance can significantly degrade the performance of classification models, resulting in biased predictions and reduced accuracy for the underrepresented class. Intrinsic data characteristics, such as noise, outliers, and within-class imbalance, also affect classifier performance because they complicate the learning task. Contemporary imbalance-handling techniques combine clustering with SMOTE (Synthetic Minority Oversampling Technique) to generate realistic synthetic data that preserves the underlying data distribution, generalizes to unseen data, and mitigates overfitting to noisy points. Centroid-based clustering methods (e.g., K-means) often produce synthetic samples that are too tightly grouped or poorly spaced, whereas density-based methods (e.g., DBSCAN) may fail to generate enough meaningful synthetic samples in sparse regions. This work aims to develop a nature-inspired clustering method that, combined with SMOTE, generates synthetic samples that adhere to the underlying data distribution and preserve the sparsity among data points, thereby enhancing classifier performance. We propose PC-SMOTE, which leverages Percolation Clustering (PC), a novel clustering algorithm inspired by percolation theory. PC uses a connectivity-driven framework to handle irregular cluster shapes, varying densities, and sparse minority instances. The evaluation used a hybrid design: first, PC-SMOTE was assessed on synthetically generated data with variable spread and other parameters; second, it was evaluated on eight real medical datasets. PC-SMOTE performs especially well on the breast cancer, Parkinson's, and cervical cancer datasets, achieving AUC values between 96% and 99%, higher than those of the other two methods (K-means and DBSCAN combined with SMOTE).
These results demonstrate the effectiveness of PC-SMOTE on datasets with both low and high imbalance ratios; it often performs competitively with, or better than, K-means and DBSCAN combined with SMOTE in terms of AUC, F1-score, G-mean, and PR-AUC.
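
The abstract describes combining clustering with SMOTE so that synthetic minority samples are interpolated only between nearby points of the same group. The Python sketch below illustrates that general idea, assuming NumPy. The grouping function `connectivity_clusters` (link any two points closer than `radius`, then take connected components) is a hypothetical stand-in chosen only to illustrate a connectivity-driven grouping; it is not the authors' Percolation Clustering algorithm, and `cluster_smote`, `radius`, and `k` are illustrative names, not the paper's.

```python
import numpy as np


def connectivity_clusters(X, radius):
    """Label points by connected components of the radius graph.

    Two points are linked when their Euclidean distance is <= radius.
    Hypothetical stand-in for a connectivity-driven clustering; NOT the
    authors' Percolation Clustering.
    """
    X = np.asarray(X, dtype=float)
    n = len(X)
    labels = np.full(n, -1, dtype=int)
    # Pairwise distance matrix (adequate for small examples).
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        # Depth-first search over the radius graph from an unlabeled point.
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in np.flatnonzero(dists[i] <= radius):
                if labels[j] == -1:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


def cluster_smote(X_min, n_new, radius, k=3, seed=0):
    """SMOTE-style oversampling with neighbours restricted to the seed's cluster."""
    X_min = np.asarray(X_min, dtype=float)
    rng = np.random.default_rng(seed)
    labels = connectivity_clusters(X_min, radius)
    idx = np.arange(len(X_min))
    # Only points with at least one cluster-mate can seed interpolation,
    # so singleton clusters (isolated, possibly noisy points) are skipped.
    eligible = [i for i in idx if np.sum(labels == labels[i]) > 1]
    if not eligible:
        return np.empty((0, X_min.shape[1]))
    synthetic = []
    for _ in range(n_new):
        i = rng.choice(eligible)
        mates = idx[(labels == labels[i]) & (idx != i)]
        # The k nearest cluster-mates of the seed point.
        d = np.linalg.norm(X_min[mates] - X_min[i], axis=1)
        neighbours = mates[np.argsort(d)[:k]]
        j = rng.choice(neighbours)
        # Classic SMOTE step: a random point on the segment between i and j.
        gap = rng.random()
        synthetic.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.array(synthetic).reshape(-1, X_min.shape[1])


if __name__ == "__main__":
    X_min = np.array([[0.0, 0.0], [0.5, 0.0], [0.0, 0.5],  # dense minority group
                      [5.0, 5.0], [5.4, 5.0],              # second, smaller group
                      [10.0, 0.0]])                        # isolated point
    print(connectivity_clusters(X_min, radius=1.0))  # [0 0 0 1 1 2]
    print(cluster_smote(X_min, n_new=4, radius=1.0))
```

Restricting neighbours to the seed's own cluster keeps synthetic points out of the empty space between separate minority groups, and because a singleton cluster never seeds or receives interpolation, an isolated point is not oversampled. How the clusters themselves are formed is exactly where the paper's method differs from this stand-in.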

Unveiling anomalies in industrial control systems: a kernel SHAP-based approach with temporal convolution autoencoder
Oswal, Sangeeta; Shinde, Subhash; Murli, Vijayalaksmi
International Journal of Advances in Applied Sciences Vol 14, No 4: December 2025
Publisher: Institute of Advanced Engineering and Science

DOI: 10.11591/ijaas.v14.i4.pp1420-1432

Abstract

Industrial control systems (ICSs) are frequent targets of cyber-attacks, which can lead to undesirable consequences. ICSs operate without human supervision, making them vulnerable to adversaries. In recent years, numerous deep learning-based solutions have proven efficient at detecting anomalies in ICSs. However, these solutions generally cannot pinpoint the sensors and actuators that contributed to an anomaly. In this work, we use kernel Shapley additive explanations (SHAP) to explain anomalies detected by a temporal convolution autoencoder (TCAE). The proposed TCAE model handles long-term dependencies effectively and is computationally efficient on large datasets. A comprehensive explanation is provided for each identified attack: SHAP values are extracted and visualized to show which features contributed to the anomaly, helping experts respond to the attack and building user trust.
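
The abstract relies on kernel SHAP to attribute an anomaly score to individual sensors and actuators. Kernel SHAP estimates Shapley values through a weighted linear regression; with a single background (baseline) sample, the values it targets are the exact Shapley values that the sketch below computes by brute-force enumeration. This is a minimal illustration assuming NumPy. The toy score `reconstruction_error` and the normal operating point `mu` are hypothetical stand-ins for the TCAE's reconstruction error and are not taken from the paper.

```python
from itertools import combinations
from math import factorial

import numpy as np


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x relative to a single baseline input.

    Features outside a coalition are set to their baseline value. Enumeration
    costs O(2^d) evaluations, so it is only practical for a few features;
    kernel SHAP approximates the same quantities for larger d.
    """
    x = np.asarray(x, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    d = len(x)
    phi = np.zeros(d)

    def value(coalition):
        # Model output when only the coalition's features take their observed values.
        z = baseline.copy()
        idx = list(coalition)
        if idx:
            z[idx] = x[idx]
        return f(z)

    for i in range(d):
        others = [j for j in range(d) if j != i]
        for size in range(d):
            # Shapley weight |S|! (d - |S| - 1)! / d!
            weight = factorial(size) * factorial(d - size - 1) / factorial(d)
            for S in combinations(others, size):
                phi[i] += weight * (value(S + (i,)) - value(S))
    return phi


if __name__ == "__main__":
    # Hypothetical normal operating point of three sensors.
    mu = np.array([1.0, 2.0, 3.0])

    def reconstruction_error(z):
        # Toy stand-in: an "autoencoder" that always reconstructs mu.
        return float(np.sum((z - mu) ** 2))

    x = np.array([1.0, 4.0, 3.5])  # sensor 1 deviates most from normal
    phi = shapley_values(reconstruction_error, x, mu)
    print(np.round(phi, 6))
```

For this additive toy score, each sensor's attribution equals its own squared deviation, so sensor 1 receives the largest share, and the attributions always sum to the score difference between `x` and the baseline. A real TCAE score is not additive over features, which is why an attribution method such as SHAP is needed to split it.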