Moi, Sim Hiew
Unknown Affiliation

Published: 3 Documents
Articles

The Effects of Imbalanced Datasets on Machine Learning Algorithms in Predicting Student Performance
Sujon, Khaled Mahmud; Hassan, Rohayanti; Khairudin, Alif Ridzuan; Moi, Sim Hiew; Mohd Shafie, Muhammad Luqman; Saringat, Zainuri; Erianda, Aldo
JOIV : International Journal on Informatics Visualization Vol 8, No 3-2 (2024): IT for Global Goals: Building a Sustainable Tomorrow
Publisher : Society of Visual Informatics

DOI: 10.62527/joiv.8.3-2.2449

Abstract

Predictive analytics technologies are becoming increasingly popular in higher education institutions. Students' grades are one of the most critical performance indicators educators can use to predict academic achievement. Over the last several decades, academics have developed numerous techniques and machine learning approaches for predicting student grades. Although much work has been done, a practical model is still lacking, particularly when dealing with imbalanced datasets. This study examines the impact of imbalanced datasets on the accuracy and reliability of machine learning models in predicting student performance. First, it compares the performance of two popular machine learning algorithms, Logistic Regression (LR) and Random Forest (RF), in predicting student grades. Second, it examines how imbalanced datasets affect these algorithms' performance metrics and generalization capabilities. Results indicate that RF, with an accuracy of 98%, outperforms LR, which achieved 91% accuracy. Furthermore, the performance of both models is significantly affected by imbalanced datasets. In particular, LR struggles to accurately predict minority classes, while RF also faces difficulties, though to a lesser extent. Addressing class imbalance is crucial because it notably affects model bias and prediction accuracy. This is especially important for higher education institutions aiming to improve the accuracy of student grade predictions, and it underscores the need for balanced datasets to build robust predictive models.
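The abstract's point that overall accuracy can hide poor minority-class prediction can be illustrated with a minimal sketch. It uses made-up pass/fail labels (hypothetical, not the study's data) and a trivial majority-class predictor instead of the paper's LR or RF models, and compares overall accuracy with per-class and macro-averaged recall. The function names are illustrative only.

```python
# Minimal sketch with HYPOTHETICAL data: overall accuracy can look high on an
# imbalanced grade distribution while recall on the minority class is poor.
# This is not the study's dataset, nor its Logistic Regression / Random Forest models.
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_recall(y_true, y_pred):
    """Recall (correct predictions / actual members) for every class in y_true."""
    support = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {cls: hits[cls] / n for cls, n in support.items()}


def macro_recall(y_true, y_pred):
    """Unweighted mean of per-class recall, so each class counts equally."""
    recalls = per_class_recall(y_true, y_pred)
    return sum(recalls.values()) / len(recalls)


if __name__ == "__main__":
    # Hypothetical labels: 90 "pass" students and 10 "fail" students.
    y_true = ["pass"] * 90 + ["fail"] * 10
    # A degenerate model that always predicts the majority class.
    y_pred = ["pass"] * 100
    print(accuracy(y_true, y_pred))          # 0.9
    print(per_class_recall(y_true, y_pred))  # {'pass': 1.0, 'fail': 0.0}
    print(macro_recall(y_true, y_pred))      # 0.5
```

Here 90% accuracy coexists with zero recall on the "fail" class, which is why class-aware metrics are useful alongside accuracy when datasets are imbalanced.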

Transformer in mRNA Degradation Prediction
Yit, Tan Wen; Hassan, Rohayanti; Zakaria, Noor Hidayah; Kasim, Shahreen; Moi, Sim Hiew; Khairuddin, Alif Ridzuan; Amnur, Hidra
JOIV : International Journal on Informatics Visualization Vol 7, No 2 (2023)
Publisher : Society of Visual Informatics

DOI: 10.30630/joiv.7.2.1165

Abstract

The unstable properties and the advantages of mRNA vaccines have encouraged many experts worldwide to tackle the degradation problem. Machine learning models have been widely applied in bioinformatics and the healthcare field to gain insights from biological data. Thus, machine learning plays an important role in predicting the degradation rate of mRNA vaccine candidates. Stanford University held the OpenVaccine Challenge on Kaggle to gather top solutions to this problem, using the mean column-wise root mean square error (MCRMSE) as the main performance metric. The Nucleic Transformer, proposed by other researchers, is a deep learning solution that combines a self-attention mechanism with a Convolutional Neural Network (CNN). This paper aims to enhance the existing Nucleic Transformer's performance by using the AdaBelief or RangerAdaBelief optimizer together with a proposed decoder that consists of a normalization layer between two linear layers. Experimental results show that the enhanced Nucleic Transformer outperforms the existing solution. In this study, the AdaBelief optimizer performs better than the RangerAdaBelief optimizer, even though the latter incorporates Ranger's advantages. The advantages of the proposed decoder are evident only when data are limited. When data are sufficient, performance may be similar, but the proposed decoder still outperforms the linear decoder if and only if the AdaBelief optimizer is used. Overall, the combination of the AdaBelief optimizer and the proposed decoder performs best, with performance boosts of 2.79% and 1.38% in public and private MCRMSE, respectively.
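MCRMSE computes the root mean square error separately for each target column and then averages those per-column values, i.e. MCRMSE = (1/N_t) Σ_j sqrt((1/n) Σ_i (y_ij − ŷ_ij)²). The sketch below is a plain-Python illustration of that formula on toy numbers; it does not reproduce which columns or sequence positions the OpenVaccine Challenge actually scored, and the function name is an assumption for illustration.

```python
import math


def mcrmse(y_true, y_pred):
    """Mean column-wise root mean square error (illustrative sketch).

    y_true, y_pred: equal-shaped lists of rows, one value per target column.
    RMSE is computed for each column independently, then averaged over columns.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and have the same number of rows")
    n_rows = len(y_true)
    n_cols = len(y_true[0])
    column_rmses = []
    for j in range(n_cols):
        mse = sum((y_true[i][j] - y_pred[i][j]) ** 2 for i in range(n_rows)) / n_rows
        column_rmses.append(math.sqrt(mse))
    return sum(column_rmses) / n_cols


if __name__ == "__main__":
    # Toy data: column 0 errors are (-1, 1) -> RMSE 1; column 1 errors are (-2, 2) -> RMSE 2.
    truth = [[0.0, 1.0], [2.0, 3.0]]
    preds = [[1.0, 3.0], [1.0, 1.0]]
    print(mcrmse(truth, preds))  # 1.5
```

Averaging per-column RMSEs, rather than pooling all errors into one RMSE, keeps each degradation target equally weighted in the final score.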

An Improved Approach of Iris Biometric Authentication Performance and Security with Cryptography and Error Correction Codes
Moi, Sim Hiew; Yong, Pang Yee; Hassan, Rohayanti; Asmuni, Hishammuddin; Mohamad, Radziah; Weng, Fong Cheng; Kasim, Shahreen
JOIV : International Journal on Informatics Visualization Vol 6, No 2-2 (2022): A New Frontier in Informatics
Publisher : Society of Visual Informatics

DOI: 10.30630/joiv.6.2-2.1091

Abstract

One of the most challenging parts of integrating biometrics and cryptography is the variation among identifiers acquired from the same user. Because of environmental noise or different devices, iris features may differ when they are acquired at different times. This research focuses on improving the performance of iris biometric authentication and encrypting the binary code generated from the acquired identifiers. The proposed biometric authentication system incorporates the concepts of non-repudiation and privacy, which are critical to the success of a biometric authentication system. The iris was chosen as the biometric identifier because of its high accuracy and its permanence throughout an individual’s lifetime. This study seeks a method of reducing the noise and errors caused by dissimilarities between separate biometric acquisitions. In our method, we used Reed-Solomon error-correcting codes to reduce dissimilarities and noise in iris data. Reed-Solomon is a block-based error-correcting code that can be easily decoded and has excellent burst-error correction capabilities. Two distance metrics, Hamming distance and weighted Euclidean distance, were used to measure the accuracy of the iris pattern matching identification process. The experiments were conducted on the CASIA 1.0 iris database. The results showed a False Acceptance Rate of 0%, a False Rejection Rate of 1.54%, and a Total Success Rate of 98.46%. The proposed approach appears to be more secure, as it provides low rates of both false rejections and false acceptances.
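The matching and evaluation steps named in the abstract can be sketched in plain Python: a fractional Hamming distance between binary iris codes, a generic weighted Euclidean distance between feature vectors, and False Acceptance / False Rejection Rates for a distance threshold. This is a minimal sketch under stated assumptions: the weights, the threshold, and all scores are hypothetical, the paper's exact weighting scheme is not given in the abstract, and the Reed-Solomon encoding and decoding step is not shown.

```python
import math


def hamming_distance(code_a, code_b):
    """Fractional Hamming distance: share of positions where two bit sequences differ."""
    if len(code_a) != len(code_b):
        raise ValueError("iris codes must have the same length")
    return sum(a != b for a, b in zip(code_a, code_b)) / len(code_a)


def weighted_euclidean(x, y, weights):
    """Generic form sqrt(sum_i w_i * (x_i - y_i)^2); the weights here are an assumption."""
    if not (len(x) == len(y) == len(weights)):
        raise ValueError("vectors and weights must have the same length")
    return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(x, y, weights)))


def far_frr(genuine_distances, impostor_distances, threshold):
    """Error rates for a matcher that ACCEPTS when distance <= threshold.

    FAR: fraction of impostor comparisons wrongly accepted.
    FRR: fraction of genuine comparisons wrongly rejected.
    """
    far = sum(d <= threshold for d in impostor_distances) / len(impostor_distances)
    frr = sum(d > threshold for d in genuine_distances) / len(genuine_distances)
    return far, frr


if __name__ == "__main__":
    print(hamming_distance([1, 0, 1, 1], [1, 1, 1, 0]))            # 0.5
    print(weighted_euclidean([0.0, 0.0], [3.0, 4.0], [1.0, 1.0]))  # 5.0
    # Hypothetical distances and threshold, for illustration only.
    print(far_frr([0.10, 0.20, 0.40], [0.45, 0.50], threshold=0.35))  # (0.0, 0.3333333333333333)
```

Lowering the threshold trades a lower FAR for a higher FRR, so a reported pair such as 0% FAR and 1.54% FRR always refers to one chosen operating point.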