Articles

Found 2 Documents

Robust Few Shot Biological Pathology Classification via Optimized Contrastive MobileNetV2: A Transferable Model for Low Resource Medical Imaging
Nurul Adi Prawira; Muhammad Firmansyah; Dhendra Marutho; Achraf Ouhab
Journal of Intelligent Computing & Health Informatics Vol 7, No 1 (2026): March
Publisher : Universitas Muhammadiyah Semarang Press

DOI: 10.26714/jichi.v7i1.20179

Abstract

Artificial intelligence has revolutionized computational diagnostics; however, deploying reliable intelligent systems in extremely low-resource environments remains a critical structural challenge in health informatics. Conventional deep learning architectures, such as standard Convolutional Neural Networks (CNNs), are inherently data-hungry, making them prone to severe overfitting and catastrophic generalization failures when applied to rare biological pathologies. To overcome this limitation, we propose an Optimized Contrastive MobileNetV2 architecture embedded within a Few-Shot Learning (FSL) framework. By shaping the latent-space representation with a contrastive loss function, the proposed model learns discriminative metric distances rather than relying on the memorization of massive raw feature sets. To rigorously validate the algorithm, we use a highly constrained dataset of merely 120 biological pathogen samples as a cross-domain proxy testbed, simulating the extreme visual complexity and data scarcity typical of rare medical diagnostic scenarios. Extensive episodic evaluations demonstrate that the proposed methodology significantly outperforms conventional baselines: under a 10-shot learning paradigm, the contrastive architecture achieved a macro-averaged accuracy of 89.2% and an F1-score of 89.3%, remaining statistically robust against stochastic variation (p < 0.001). Furthermore, the use of depthwise separable convolutions restricts model complexity to approximately 3.4 × 10^6 parameters. Crucially, empirical evaluations confirm that the framework occupies only 13.5 MB of physical storage and achieves an inference latency of 12.5 ms per image. Ultimately, this study establishes a highly transferable, computationally efficient model ready for integration into intelligent clinical decision support systems and remote edge-computing health architectures.
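
As a rough illustration of the approach this abstract describes (a MobileNetV2 backbone trained with a contrastive loss to produce a metric embedding space), a minimal PyTorch sketch follows. The class name, embedding dimension, margin, and pretrained weights are illustrative assumptions, not details taken from the paper.

import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models

# Illustrative sketch only; hyperparameters are assumptions, not the paper's.
class ContrastiveMobileNetV2(nn.Module):
    """MobileNetV2 backbone projecting images into a unit-norm embedding space."""
    def __init__(self, embed_dim=128):
        super().__init__()
        base = models.mobilenet_v2(weights="IMAGENET1K_V1")
        self.backbone = base.features                  # depthwise-separable conv stack
        self.proj = nn.Linear(base.last_channel, embed_dim)

    def forward(self, x):
        h = self.backbone(x)                            # (B, 1280, H', W')
        h = h.mean(dim=(2, 3))                          # global average pooling
        return F.normalize(self.proj(h), dim=1)         # embeddings for metric distances

def contrastive_loss(z1, z2, same_class, margin=1.0):
    """Pull same-class pairs together; push different-class pairs
    at least `margin` apart in the embedding space."""
    d = F.pairwise_distance(z1, z2)
    pos = same_class * d.pow(2)
    neg = (1.0 - same_class) * F.relu(margin - d).pow(2)
    return (pos + neg).mean()

Episodic few-shot training would sample support/query pairs per class and minimize this loss, so test-time classification reduces to nearest-embedding matching rather than raw feature memorization.
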
Resource Efficient Semantic Retrieval Pipeline via Generative Captioning and Text-to-Text Transformers for Bridging the Modality Gap
Muhammad Firmansyah; Dhendra Marutho; Irwansyah Saputra; Eleni Vogiatzi
Journal of Intelligent Computing & Health Informatics Vol 6, No 2 (2025): September
Publisher : Universitas Muhammadiyah Semarang Press

DOI: 10.26714/jichi.v6i2.19240

Abstract

The rapid expansion of multimodal digital content necessitates robust information retrieval systems capable of bridging the semantic gap between visual and textual data. However, contemporary cross-modal models, such as CLIP, impose significant computational demands, rendering them impractical for real-time deployment in resource-limited environments. To address this efficiency challenge, this study introduces a lightweight retrieval pipeline that reconceptualizes cross-modal retrieval as a text-to-text task through generative transformation. The proposed methodology employs the Bootstrapping Language-Image Pre-training (BLIP) model to distill visual features into rich textual descriptions, which are subsequently encoded into dense semantic vectors using the T5 transformer architecture. Extensive experiments on the MSCOCO and Flickr30K datasets demonstrate that the proposed pipeline achieves a Semantic Average Recall (SAR@5) of 0.561, significantly surpassing traditional lexical (BM25) and dense (SBERT) baselines. Notably, while the computationally intensive CLIP model retains a slight advantage in absolute accuracy, our approach delivers approximately 90% of CLIP's semantic performance while improving inference throughput by 2.1× and reducing GPU memory consumption by 62%. These findings confirm that generative semantic distillation offers a scalable, cost-effective alternative to end-to-end multimodal systems, particularly for latency-sensitive applications requiring high semantic fidelity.
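
The two-stage pipeline this abstract outlines (caption images with BLIP, then embed captions and queries in a shared text space with T5) can be sketched with Hugging Face transformers. The checkpoints, mean pooling, and caption length below are assumptions for illustration; the paper's exact configuration is not given here.

import torch
from PIL import Image
from transformers import (BlipProcessor, BlipForConditionalGeneration,
                          T5Tokenizer, T5EncoderModel)

# Illustrative sketch; checkpoints and pooling are assumptions, not the paper's.
blip_proc = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
blip = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")
t5_tok = T5Tokenizer.from_pretrained("t5-base")
t5_enc = T5EncoderModel.from_pretrained("t5-base")

def caption(image: Image.Image) -> str:
    """Stage 1: distill visual content into a textual description."""
    inputs = blip_proc(images=image, return_tensors="pt")
    ids = blip.generate(**inputs, max_new_tokens=30)
    return blip_proc.decode(ids[0], skip_special_tokens=True)

@torch.no_grad()
def embed(text: str) -> torch.Tensor:
    """Stage 2: encode any text (caption or query) into a dense semantic vector."""
    tokens = t5_tok(text, return_tensors="pt")
    hidden = t5_enc(**tokens).last_hidden_state        # (1, seq_len, d_model)
    return hidden.mean(dim=1).squeeze(0)               # mean-pooled sentence vector

def score(query: str, image_caption: str) -> float:
    """Stage 3: cross-modal retrieval reduced to text-to-text cosine similarity."""
    q, c = embed(query), embed(image_caption)
    return torch.nn.functional.cosine_similarity(q, c, dim=0).item()

Because images can be indexed by their caption vectors offline, only the lightweight T5 text encoder runs per query, which is where the throughput and memory savings over an end-to-end model like CLIP would come from.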