Analysis of diabetes mellitus gene expression data using two-phase biclustering method
Kafi, Rahmat Al; Bustamam, Alhadi; Mangunwardoyo, Wibowo
Jurnal Ilmiah Matematika Vol 8, No 2 (2021)
Publisher : Universitas Ahmad Dahlan

DOI: 10.26555/konvergensi.v0i0.22111

Abstract

This research aims to find biclusters in Type 2 Diabetes Mellitus gene expression data, whose samples come from obese and lean people, using two-phase biclustering. The first phase uses Singular Value Decomposition to decompose the gene expression matrix into gene-based and condition-based matrices. The second phase uses K-means to cluster each of these matrices, forming several clusters from each. The silhouette method is then applied to determine the optimal number of clusters and to measure the quality of the grouping. In the experiments, the Type 2 Diabetes Mellitus dataset with 668 selected genes produced six optimal biclusters, formed from 2 clusters on the gene-based matrix and 3 clusters on the sample-based matrix, with silhouette values of 0.7361615 and 0.7050163, respectively.
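
The two phases described in this abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes a toy 6 x 6 matrix with planted structure, uses only the leading singular vector pair (obtained by power iteration) in place of a full SVD, and runs a simple one-dimensional K-means. All function names and the toy data are hypothetical.

```python
import math

def leading_singular_vectors(matrix, iterations=100):
    """Approximate the leading left/right singular vectors by power iteration."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    v = [1.0 / math.sqrt(n_cols)] * n_cols
    for _ in range(iterations):
        u = [sum(row[j] * v[j] for j in range(n_cols)) for row in matrix]             # A v
        w = [sum(matrix[i][j] * u[i] for i in range(n_rows)) for j in range(n_cols)]  # A^T (A v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    u = [sum(row[j] * v[j] for j in range(n_cols)) for row in matrix]
    sigma = math.sqrt(sum(x * x for x in u))
    return [x / sigma for x in u], v

def kmeans_1d(values, k, iterations=50):
    """K-means on scalars with a deterministic start spread over the sorted values."""
    ordered = sorted(values)
    centres = [ordered[(2 * c + 1) * len(ordered) // (2 * k)] for c in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda c: abs(x - centres[c])) for x in values]
        for c in range(k):
            members = [x for x, lab in zip(values, labels) if lab == c]
            if members:
                centres[c] = sum(members) / len(members)
    return labels

def mean_silhouette(values, labels):
    """Average silhouette width; singleton clusters score 0 by convention."""
    clusters = sorted(set(labels))
    scores = []
    for i, x in enumerate(values):
        own = [abs(x - y) for j, y in enumerate(values) if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)
            continue
        a = sum(own) / len(own)
        b = min(
            sum(abs(x - y) for j, y in enumerate(values) if labels[j] == c)
            / sum(1 for lab in labels if lab == c)
            for c in clusters if c != labels[i]
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

def choose_clusters(values, candidates=(2, 3, 4)):
    """Pick the candidate k with the highest mean silhouette, skipping empty clusters."""
    best = None
    for k in candidates:
        labels = kmeans_1d(values, k)
        if len(set(labels)) < k:
            continue
        score = mean_silhouette(values, labels)
        if best is None or score > best[0]:
            best = (score, k, labels)
    return best

# Toy data (hypothetical): two gene groups and three sample groups.
gene_levels = [5.0, 5.0, 5.0, 1.0, 1.0, 1.0]
sample_levels = [1.0, 1.0, 4.0, 4.0, 9.0, 9.0]
expression = [[g * s for s in sample_levels] for g in gene_levels]

# Phase 1: decompose into a gene-based and a sample-based vector.
gene_vector, sample_vector = leading_singular_vectors(expression)

# Phase 2: cluster each side, choosing k by silhouette.
_, row_k, row_labels = choose_clusters(gene_vector)
_, col_k, col_labels = choose_clusters(sample_vector)

# Each (gene cluster, sample cluster) pair forms one bicluster.
biclusters = [
    ([i for i, lab in enumerate(row_labels) if lab == r],
     [j for j, lab in enumerate(col_labels) if lab == c])
    for r in range(row_k) for c in range(col_k)
]
print(row_k, col_k, len(biclusters))
```

On this toy matrix the sketch selects 2 gene clusters and 3 sample clusters, so the pairing yields 2 x 3 = 6 biclusters, the same counting that links the cluster numbers to the six biclusters reported in the abstract.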

CONV1D-LSTM-BASED QSAR CLASSIFICATION MODEL FOR BACE1 INHIBITORS: A COMPREHENSIVE APPROACH WITH DESALTING, PAINS FILTERING AND DRUG-LIKENESS ANALYSIS
Nugroho, Trianto Haryo; Bustamam, Alhadi
Multidiciplinary Output Research For Actual and International Issue (MORFAI) Vol. 5 No. 3 (2025): Multidiciplinary Output Research For Actual and International Issue
Publisher : RADJA PUBLIKA

DOI: 10.54443/morfai.v5i3.3023

Abstract

In recent years, the discovery of Beta-Secretase 1 (BACE1) enzyme inhibitors for more effective Alzheimer’s therapy has become a major focus, making in silico research to identify new inhibitors with minimal side effects increasingly essential. Ligand-Based Virtual Screening (LBVS) using Quantitative Structure–Activity Relationship (QSAR) methods offers a fast and cost-effective alternative to experimental assays. In this study, we propose a Conv1D-LSTM-based QSAR model as a novel approach for classifying BACE1 enzyme inhibitors, where Conv1D is employed for encoding molecular data and LSTM is used to classify compounds as active or inactive. The model is complemented by drug-likeness analysis based on Lipinski's Rule of Five to evaluate the therapeutic potential of candidate molecules. The dataset used includes 711 molecular structures, consisting of 278 active and 433 inactive compounds. Experimental results demonstrate that our model achieves a classification accuracy of 79.13%, with a sensitivity of 73.02%, specificity of 83.08%, and a Matthews Correlation Coefficient (MCC) of 56.38%.
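
The evaluation metrics named in this abstract all derive from a binary confusion matrix, and Lipinski's Rule of Five reduces to four threshold checks. The sketch below shows both under stated assumptions: the confusion-matrix counts and descriptor values are hypothetical illustrations, not the paper's data, and the function names are invented for this example.

```python
import math

def classification_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity, specificity and MCC from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)   # true positive rate (active compounds recovered)
    specificity = tn / (tn + fp)   # true negative rate (inactive compounds rejected)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "mcc": mcc}

def lipinski_violations(mol_weight, log_p, h_donors, h_acceptors):
    """Count Rule of Five violations: MW > 500 Da, logP > 5, donors > 5, acceptors > 10."""
    return sum([mol_weight > 500, log_p > 5, h_donors > 5, h_acceptors > 10])

# Hypothetical counts for a 100-compound test split.
metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
print({name: round(value, 4) for name, value in metrics.items()})

# Hypothetical descriptor values for two candidate molecules.
print(lipinski_violations(350.0, 2.5, 2, 5))
print(lipinski_violations(650.0, 6.1, 2, 5))
```

MCC ranges from -1 to 1, so a figure such as 56.38% corresponds to a coefficient of about 0.56. A common reading of the Rule of Five treats a molecule as drug-like when it has at most one violation.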

Reconstruction of the Phi-2 Method for Question-Answering Related to Diabetes Disease Using the MedAlpaca Dataset
Ridho, Muhammad; Bustamam, Alhadi; Adnan, Risman
Jambura Journal of Biomathematics (JJBM) Volume 6, Issue 3: September 2025
Publisher : Department of Mathematics, Universitas Negeri Gorontalo

DOI: 10.37905/jjbm.v6i3.30506

Abstract

This study focuses on the reconstruction of the Phi-2 method for text-based question-answering systems related to diabetes using the MedAlpaca dataset. The aim is to enhance accuracy in diabetes question-answering applications. We use LoRA to fine-tune the model, improving its ability to handle complex medical queries. The MedAlpaca dataset, which contains a diverse range of medical questions and answers, provides a robust foundation for training and testing the model. The results show that fine-tuning with MedAlpaca significantly improves on the base Phi-2 model: accuracy rises from 14.81% to 49.37% on MedMCQA and reaches 92.83% on PubMedQA and 38.78% on MedQA. The fine-tuned model also surpasses other leading models such as BioBERT (89.90%) and GatorTron (90.87%). The results highlight the effectiveness of incorporating domain-specific datasets like MedAlpaca to boost model performance. This advancement points to promising directions for future research, including expanding datasets and refining fine-tuning techniques to further improve automated medical question-answering systems.
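
For readers unfamiliar with LoRA, the core idea is to freeze a weight matrix W and train only a low-rank update, so the effective weight is W + (alpha / r) * B A. The sketch below illustrates that update in plain Python. The dimensions, the scaling factor `alpha`, the rank, and all matrix values are assumptions chosen for illustration; the abstract does not state the study's configuration.

```python
def matmul(a, b):
    """Plain-Python matrix product of a (n x m) and b (m x p)."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def lora_effective_weight(w, b, a, alpha, rank):
    """Frozen weight plus the scaled low-rank update: W + (alpha / rank) * B A."""
    delta = matmul(b, a)
    scale = alpha / rank
    return [[w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
            for i in range(len(w))]

# Hypothetical sizes: a 4 x 3 layer adapted with rank 1.
d_out, d_in, rank, alpha = 4, 3, 1, 2.0
w = [[float(i + j) for j in range(d_in)] for i in range(d_out)]   # frozen weights
a = [[0.5, -0.5, 1.0]]                                            # rank x d_in, trainable
b_zero = [[0.0] for _ in range(d_out)]                            # d_out x rank, zero at the start
b_trained = [[1.0], [0.0], [0.0], [0.0]]                          # stand-in for a trained B

at_init = lora_effective_weight(w, b_zero, a, alpha, rank)
after_training = lora_effective_weight(w, b_trained, a, alpha, rank)

full_params = d_out * d_in            # parameters updated by full fine-tuning
lora_params = rank * (d_out + d_in)   # parameters updated by LoRA
print(at_init == w, after_training[0], full_params, lora_params)
```

Because B starts at zero, the adapted layer initially behaves exactly like the base model. The saving in trainable parameters grows with layer size, since the count scales with r * (d_out + d_in) rather than d_out * d_in.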

COMPARISON OF MISSING VALUE IMPUTATION USING MEAN, BAYESIAN KNN, AND NON-BAYESIAN KNN ON TEP GENE EXPRESSION DATA
Mastika, Mastika; Siswantining, Titin; Bustamam, Alhadi
MEDIA STATISTIKA Vol 18, No 1 (2025): Media Statistika
Publisher : Department of Statistics, Faculty of Science and Mathematics, Universitas Diponegoro

DOI: 10.14710/medstat.18.1.61-72

Abstract

Analysis of gene expression data, particularly cancer data, often faces challenges due to missing values. One approach to overcome this is data imputation. This study evaluates the performance of three imputation methods, namely mean imputation, K-Nearest Neighbors (KNN), and KNN with Bayesian optimization using Gaussian Process modeling, on Tumor Educated Platelets (TEP) gene expression data. Missing values were introduced under a Missing Completely at Random (MCAR) mechanism at increasing rates of 5%, 10%, 15%, and so on up to 60%, and performance was evaluated using three metrics: Mean Absolute Error (MAE), Mean Squared Error (MSE), and Normalized Root Mean Squared Error (NRMSE). The results show that the three methods perform similarly, with MAE, MSE, and NRMSE values differing only at small decimal scales. Although Bayesian optimization is expected to improve the accuracy of KNN, the resulting improvement on this dataset is not significant. These findings indicate that simple methods such as mean imputation and KNN remain competitive on TEP data, which is highly sparse: 14,020,496 of its 16,512,496 values (approximately 84.91%) are zeros.
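
The evaluation protocol in this abstract (mask values at random, impute, score against the held-back truth) can be sketched as follows. Only mean imputation is shown; the KNN and Bayesian-optimized variants are omitted. The helper names and toy matrix are hypothetical, and NRMSE is normalized here by the range of the original values, which is one of several conventions and an assumption rather than the paper's stated choice.

```python
import math
import random

def mask_mcar(matrix, rate, rng):
    """Hide each cell independently with probability `rate` (MCAR)."""
    masked = [row[:] for row in matrix]
    positions = []
    for i, row in enumerate(masked):
        for j in range(len(row)):
            if rng.random() < rate:
                row[j] = None
                positions.append((i, j))
    return masked, positions

def mean_impute(masked):
    """Fill missing cells with the mean of the observed values in the same column."""
    n_cols = len(masked[0])
    means = []
    for j in range(n_cols):
        observed = [row[j] for row in masked if row[j] is not None]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if row[j] is None else row[j] for j in range(n_cols)]
            for row in masked]

def imputation_errors(original, imputed, positions):
    """MAE, MSE and range-normalized RMSE over the masked positions only."""
    diffs = [imputed[i][j] - original[i][j] for i, j in positions]
    mae = sum(abs(d) for d in diffs) / len(diffs)
    mse = sum(d * d for d in diffs) / len(diffs)
    flat = [x for row in original for x in row]
    value_range = max(flat) - min(flat)
    nrmse = math.sqrt(mse) / value_range if value_range else float("nan")
    return mae, mse, nrmse

# Hand-masked toy example so the arithmetic is easy to follow.
original = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
hand_masked = [[None, 2.0], [3.0, 4.0], [5.0, 6.0]]
imputed = mean_impute(hand_masked)           # cell (0, 0) becomes (3 + 5) / 2
mae, mse, nrmse = imputation_errors(original, imputed, [(0, 0)])
print(imputed[0][0], mae, mse, nrmse)

# Random MCAR masking at a few of the rates mentioned in the abstract.
rng = random.Random(0)
for rate in (0.05, 0.10, 0.15):
    masked, positions = mask_mcar(original, rate, rng)
    if positions:  # tiny matrices may end up with nothing masked
        print(rate, imputation_errors(original, mean_impute(masked), positions))

# Sparsity figure quoted in the abstract.
zero_share = 14_020_496 / 16_512_496 * 100
print(round(zero_share, 2))
```

With roughly 85% of values equal to zero, column means and nearest-neighbour averages both sit close to zero for most cells, which is consistent with the abstract's observation that the methods score almost identically.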

Evaluation And Selection Of Optimal Deep Learning Architecture For Predicting The Endpoint In High Shear Wet Granulation For Antacid Tablet Production
Maulana, Irvan; Yanuar, Arry; Sutriyo, Sutriyo; Bustamam, Alhadi
Eduvest - Journal of Universal Studies Vol. 4 No. 6 (2024): Journal Eduvest - Journal of Universal Studies
Publisher : Green Publisher Indonesia

DOI: 10.59188/eduvest.v4i5.1274

Abstract

Objective: The purpose of this research was to evaluate and select the best architecture among a native convolutional neural network (CNN), MobileNetV2, ResNet50V2, and EfficientNetB0 for predicting the endpoint of the high shear wet granulation process, with accuracy as the main evaluation metric. Methods: The dataset was captured from an industrial camera using static image analysis and was manually labeled as “NOT READY” or “READY” according to the traditional endpoint method based on the mixer’s ampere point in the granulator. The dataset contained a total of 180 images, which were split between training and validation sets. A native CNN and the TensorFlow Keras application programming interface (API) were used, with MobileNetV2, EfficientNetB0, and ResNet50V2 as base feature encoders. Hyperparameters, such as final Fully Connected (FC) layer width, dropout rate, and learning rate, were optimized for binary classification using Keras hyper tuning. Results: The native CNN performed best. It was also faster than the three other models at inference, taking only 20-30 ms per step, although it required 9000 ms for training, the longest of all the models. It achieved an accuracy of 98% and a validation accuracy of 97%. Conclusion: After being trained on previously labeled data, the system was able to determine from live camera images when a wet granulation process had reached its endpoint. The native CNN was the best model, offering the fastest runtime performance and the highest accuracy.