Rohayanti Hassan
Unknown Affiliation

Published : 11 Documents
Articles

Found 9 Documents
Journal : JOIV : International Journal on Informatics Visualization

A Hybrid Approach for Malicious URL Detection Using Ensemble Models and Adaptive Synthetic Sampling
Sujon, Khaled Mahmud; Hassan, Rohayanti; Zainodin, Muhammad Edzuan; Salamat, Mohamad Aizi; Kasim, Shahreen; Alanda, Alde
JOIV : International Journal on Informatics Visualization Vol 9, No 5 (2025)
Publisher : Society of Visual Informatics

DOI: 10.62527/joiv.9.5.4627

Abstract

Malicious URLs pose a significant cybersecurity threat, often leading to phishing attacks, malware infections, and data breaches. Early detection of these URLs is crucial for preventing security vulnerabilities and mitigating potential losses. In this paper, we propose a novel approach for malicious URL detection by combining ensemble learning methods with ADASYN-based oversampling to address the class imbalance typically found in malicious URL datasets. We evaluated three popular machine learning classifiers, including XGBoost, Random Forest, and Decision Tree, and incorporated ADASYN (Adaptive Synthetic Sampling) to handle the class-imbalanced nature of our selected dataset. Our detailed experiments demonstrate that the application of ADASYN can significantly increase the performance of the predictive model across all metrics. For instance, XGBoost saw a 2.2% improvement in accuracy, Random Forest achieved a 1.0% improvement in recall, and Decision Tree displayed a 3.0% improvement in F1-score. The Decision Tree model, in particular, showed the most substantial improvements, particularly in recall and F1-score, indicating better detection of malicious URLs. Finally, our findings in this research highlighted the potential of ensemble learning, enhanced by ADASYN, for improving malicious URL detection and demonstrated its applicability in real-world cybersecurity applications.
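The oversampling idea described in this abstract can be illustrated with a simplified, pure-Python sketch of ADASYN. This is not the authors' pipeline: the function name `adasyn_sketch`, the neighbourhood size `k`, and the toy feature vectors are all assumptions made for illustration, and a real experiment would more likely rely on an established library implementation.

```python
import math
import random

def adasyn_sketch(minority, majority, k=3, seed=0):
    """Simplified ADASYN: return synthetic minority points (lists of floats).

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    n_needed = len(majority) - len(minority)  # aim for a balanced set
    if n_needed <= 0 or len(minority) < 2:
        return []

    labelled = [(p, 1) for p in minority] + [(p, 0) for p in majority]

    # Step 1: for each minority point, measure the share of majority points
    # among its k nearest neighbours. Harder-to-learn points score higher.
    ratios = []
    for p in minority:
        neighbours = sorted(
            (q for q in labelled if q[0] is not p),
            key=lambda q: math.dist(p, q[0]),
        )[:k]
        ratios.append(sum(1 for _, label in neighbours if label == 0) / k)

    total = sum(ratios)
    if total == 0:
        weights = [1 / len(minority)] * len(minority)
    else:
        weights = [r / total for r in ratios]

    # Step 2: give each minority point a share of the synthetic samples in
    # proportion to its weight, then interpolate towards minority neighbours.
    synthetic = []
    for p, w in zip(minority, weights):
        count = round(w * n_needed)
        others = sorted(
            (q for q in minority if q is not p),
            key=lambda q: math.dist(p, q),
        )[:k]
        for _ in range(count):
            q = rng.choice(others)
            lam = rng.random()
            synthetic.append([a + lam * (b - a) for a, b in zip(p, q)])
    return synthetic

# Toy 2-D feature vectors (hypothetical): 3 malicious vs 7 benign URLs.
malicious = [[0.0, 0.0], [1.0, 1.0], [0.5, 0.2]]
benign = [[2.0, 2.0], [1.2, 1.1], [0.9, 1.2], [3.0, 3.0],
          [2.5, 2.5], [3.0, 2.0], [2.0, 3.0]]
new_points = adasyn_sketch(malicious, benign)
```

Because the per-point counts are rounded, the number of synthetic samples can differ slightly from the exact class gap; the sketch accepts that in exchange for brevity.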
The Effects of Imbalanced Datasets on Machine Learning Algorithms in Predicting Student Performance
Sujon, Khaled Mahmud; Hassan, Rohayanti; Khairudin, Alif Ridzuan; Moi, Sim Hiew; Mohd Shafie, Muhammad Luqman; Saringat, Zainuri; Erianda, Aldo
JOIV : International Journal on Informatics Visualization Vol 8, No 3-2 (2024): IT for Global Goals: Building a Sustainable Tomorrow
Publisher : Society of Visual Informatics

DOI: 10.62527/joiv.8.3-2.2449

Abstract

Predictive analytics technologies are becoming increasingly popular in higher education institutions. Students' grades are one of the most critical performance indicators educators can use to predict their academic achievement. Academics have developed numerous techniques and machine-learning approaches for predicting student grades over the last several decades. Although much work has been done, a practical model is still lacking, particularly when dealing with imbalanced datasets. This study examines the impact of imbalanced datasets on machine learning models' accuracy and reliability in predicting student performance. First, the study compares the performance of two popular machine learning algorithms, Logistic Regression and Random Forest, in predicting student grades. Second, it examines the impact of imbalanced datasets on these algorithms' performance metrics and generalization capabilities. Results indicate that the Random Forest (RF) algorithm, with an accuracy of 98%, outperforms Logistic Regression (LR), which achieved 91% accuracy. Furthermore, the performance of both models is significantly impacted by imbalanced datasets. In particular, LR struggles to accurately predict minority classes, while RF also faces difficulties, though to a lesser extent. Addressing class imbalance is crucial because it notably affects model bias and prediction accuracy. This is especially important for higher education institutions aiming to enhance the accuracy of student grade predictions, and it emphasizes the need for balanced datasets to achieve robust predictive models.
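A small sketch can show why overall accuracy is a weak signal on imbalanced data, which is the effect this abstract reports. The grade labels, class sizes, and the always-predict-majority "model" below are hypothetical and are not taken from the study.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def per_class_recall(y_true, y_pred):
    """Recall for each class: correct predictions / true instances."""
    support = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / support[label] for label in support}

# Hypothetical imbalanced grades: 95 "pass" students and 5 "fail" students.
y_true = ["pass"] * 95 + ["fail"] * 5
# A degenerate model that always predicts the majority class.
y_pred = ["pass"] * 100

overall = accuracy(y_true, y_pred)           # high despite missing every "fail"
recalls = per_class_recall(y_true, y_pred)   # exposes the minority-class failure
```

Here the overall accuracy is 0.95 while recall on the "fail" class is 0.0, so reporting per-class metrics alongside accuracy gives a more honest picture of a model trained on skewed grades.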
Exploring Current Methods and Trends in Text Summarization: A Systematic Mapping Study
Ahmad Raddi, Muhammad Faris Faisal; Hassan, Rohayanti; Zakaria, Noor Hidayah; Sahid, Mohd Zanes; Omar, Nurul Aswa; Firosha, Ardian
JOIV : International Journal on Informatics Visualization Vol 8, No 3-2 (2024): IT for Global Goals: Building a Sustainable Tomorrow
Publisher : Society of Visual Informatics

DOI: 10.62527/joiv.8.3-2.1654

Abstract

This paper presents a systematic mapping study of the current methods and trends in text summarization, a challenging task in natural language processing that aims to condense information from one or multiple documents into a concise and coherent summary. The paper focuses on applying text summarization for the Malay language, which has received less attention than other languages. The paper employs a three-phased quality assessment procedure to filter and analyze 27 peer-reviewed publications from seven prominent digital libraries, covering 2016 to 2024. The paper addresses two research questions: (1) What is the extent of research on text summarization, especially for the Malay language and the education domain? and (2) What are the current methods and approaches employed in text summarization, with a focus on addressing specific problems and language contexts? The paper synthesizes and discusses the findings from the literature review and provides insights and recommendations for future research directions in text summarization. The paper contributes to advancing knowledge and understanding of the state-of-the-art techniques and challenges in text summarization, particularly for the Malay language.
Entropy Based Method for Malicious File Detection
Edzuan Zainodin, Muhammad; Zakaria, Zalmiyah; Hassan, Rohayanti; Abdullah, Zubaile
JOIV : International Journal on Informatics Visualization Vol 6, No 4 (2022)
Publisher : Society of Visual Informatics

DOI: 10.30630/joiv.6.4.1265

Abstract

Ransomware is by no means a recent invention, having existed as far back as 1989, yet it still poses a real threat in the 21st century. Given the increasing number of computer users in recent years, this threat will only continue to grow, affecting more victims and increasing the losses incurred by the people and organizations hit by a successful attack. In most cases, victims of such attacks are left with only two courses of action: pay the ransom or lose their data. One behavior shared by all crypto ransomware strains is that they attempt to encrypt the victims’ files at some point during execution. This paper demonstrates a technique that can identify when these encrypted files are being generated and is independent of the ransomware strain. Previous research has highlighted the difficulty of differentiating between compressed and encrypted files using Shannon entropy, as both file types exhibit similar values. Among the experiments described in this study, one showed a unique characteristic for the Shannon entropy of encrypted file header fragments, which was used to differentiate between encrypted files and other high-entropy files such as archives. To overcome this drawback, this study proposed an approach for test case generation by enhancing the entropy-based threat tree model, which would improve malicious file identification. The file identification was enhanced by combining three entropy algorithms, and the test case was generated based on the threat tree model. This approach was then evaluated using accuracy measurements: True Positive, True Negative, False Positive, and False Negative. A promising result is expected. This method solves the challenge of leveraging file entropy to distinguish compressed and archived files from ransomware-encrypted files in a timely manner.
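As a minimal sketch of the quantity this abstract relies on, the following computes the Shannon entropy (in bits per byte) of a byte string and of its leading fragment. The helper names and the `header_len` value are assumptions for illustration; the abstract does not state the fragment size or the three entropy algorithms that were combined.

```python
import math
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    n = len(data)
    counts = Counter(data)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def header_entropy(data: bytes, header_len: int = 64) -> float:
    """Entropy of the first header_len bytes (header_len is a hypothetical choice)."""
    return shannon_entropy(data[:header_len])

# Uniformly distributed bytes reach the 8-bit maximum; repetitive data scores 0.
uniform = bytes(range(256))
repetitive = b"a" * 256
high = shannon_entropy(uniform)
low = shannon_entropy(repetitive)
```

Whole-file entropy alone places encrypted and compressed files close together, which is why the abstract turns to header fragments as a distinguishing feature.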
Transformer in mRNA Degradation Prediction
Yit, Tan Wen; Hassan, Rohayanti; Zakaria, Noor Hidayah; Kasim, Shahreen; Moi, Sim Hiew; Khairuddin, Alif Ridzuan; Amnur, Hidra
JOIV : International Journal on Informatics Visualization Vol 7, No 2 (2023)
Publisher : Society of Visual Informatics

DOI: 10.30630/joiv.7.2.1165

Abstract

The unstable properties and the advantages of the mRNA vaccine have encouraged many experts worldwide to tackle the degradation problem. Machine learning models have been widely applied in bioinformatics and the healthcare field to gain insights from biological data. Thus, machine learning plays an important role in predicting the degradation rate of mRNA vaccine candidates. Stanford University held the OpenVaccine Challenge competition on Kaggle to gather top solutions to these problems, with the mean columnwise root mean squared error (MCRMSE) as the main performance metric. The Nucleic Transformer has been proposed by other researchers as a deep learning solution that combines a self-attention mechanism with a Convolutional Neural Network (CNN). Hence, this paper aims to enhance the existing Nucleic Transformer performance by utilizing the AdaBelief or RangerAdaBelief optimizer with a proposed decoder that consists of a normalization layer between two linear layers. Based on the experimental results, the enhanced Nucleic Transformer outperforms the existing solution. In this study, the AdaBelief optimizer performs better than the RangerAdaBelief optimizer, even though the latter possesses Ranger’s advantages. The advantages of the proposed decoder can only be shown when there is limited data. When the data is sufficient, the performance might be similar but still better than the linear decoder if and only if the AdaBelief optimizer is used. As a result, the combination of the AdaBelief optimizer with the proposed decoder performs the best, with 2.79% and 1.38% performance boosts in public and private MCRMSE, respectively.
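The MCRMSE metric named in this abstract averages the root mean squared error of each target column. The sketch below assumes predictions arrive as equal-length rows of numbers; the function name and the toy values are illustrative only.

```python
import math

def mcrmse(y_true, y_pred):
    """Mean of the per-column RMSE over all target columns.

    y_true and y_pred are lists of rows; every row has one value per column.
    """
    n_rows = len(y_true)
    n_cols = len(y_true[0])
    column_rmse = []
    for j in range(n_cols):
        mse = sum((y_true[i][j] - y_pred[i][j]) ** 2 for i in range(n_rows)) / n_rows
        column_rmse.append(math.sqrt(mse))
    return sum(column_rmse) / n_cols

# Toy example: column 0 is off by 3 everywhere, column 1 by 4 everywhere.
truth = [[0.0, 0.0], [0.0, 0.0]]
guess = [[3.0, 4.0], [3.0, 4.0]]
score = mcrmse(truth, guess)  # per-column RMSEs are 3 and 4, so the mean is 3.5
```

Lower values are better, so a reported performance boost corresponds to a reduction in this score.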
Cardio-Respiratory Motion Prediction Analysis: A Systematic Mapping Study
Mohd Fuaad, Nur Atiqah; Hassan, Rohayanti; Ahmad, Johanna; Kasim, Shahreen; Erianda, Aldo
JOIV : International Journal on Informatics Visualization Vol 9, No 6 (2025)
Publisher : Society of Visual Informatics

DOI: 10.62527/joiv.9.6.4814

Abstract

Cardio-respiratory motion prediction analysis is a crucial medical application for enhancing the precision and effectiveness of medical imaging and patient diagnosis, particularly in the cardiac and respiratory context. This systematic mapping study reviews 23 selected research papers to provide a comprehensive overview of emerging trends and future directions in the field. It also highlights challenges and limitations frequently encountered in cardio-respiratory motion prediction and identifies the key machine learning, deep learning, and computational paradigm methodologies, examining how frequently each is applied. In addition, the study analyses the number of performance metrics used alongside validation techniques, which are essential for assessing the accuracy and reliability of the predictive models. Furthermore, it explores the most utilized data types and imaging modalities in this domain, such as X-ray, CT, MRI, and ultrasound, discussing their respective advantages and limitations. Ethical considerations, including patient privacy, data security, informed consent, and the potential for bias, are also addressed. This study aims to deepen the understanding of the landscape of cardio-respiratory motion prediction and to guide future research and the development of more effective, reliable predictive models that enhance medical imaging and patient care, providing valuable insights for researchers, practitioners, and technologists in the field.
An Improved Approach of Iris Biometric Authentication Performance and Security with Cryptography and Error Correction Codes
Moi, Sim Hiew; Yong, Pang Yee; Hassan, Rohayanti; Asmuni, Hishammuddin; Mohamad, Radziah; Weng, Fong Cheng; Kasim, Shahreen
JOIV : International Journal on Informatics Visualization Vol 6, No 2-2 (2022): A New Frontier in Informatics
Publisher : Society of Visual Informatics

DOI: 10.30630/joiv.6.2-2.1091

Abstract

One of the most challenging parts of integrating biometrics and cryptography is the intra-user variation in identifiers acquired from the same user. Due to noise in the environment or different devices, features of the iris may differ when it is acquired at different times. This research focuses on improving the performance of iris biometric authentication and encrypting the binary code generated from the acquired identifiers. The proposed biometric authentication system incorporates the concepts of non-repudiation and privacy. These concepts are critical to the success of a biometric authentication system. Iris was chosen as the biometric identifier due to its high accuracy and its permanent presence throughout an individual’s lifetime. This study seeks a method of reducing the noise and error caused by the dissimilarity between biometric acquisitions. In our method, we used Reed-Solomon error-correction codes to reduce dissimilarities and noise in iris data. The code is a block-based error-correcting code that can be easily decoded and has excellent burst correction capabilities. Two distance metrics, Hamming distance and weighted Euclidean distance, were used to measure the accuracy of the iris pattern matching identification process. The experiments were conducted with the CASIA 1.0 iris database. The results showed that the False Acceptance Rate is 0%, the False Rejection Rate is 1.54%, and the Total Success Rate is 98.46%. The proposed approach appears to be more secure, as it is able to provide a low rate of false rejections and false acceptances.
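To make the matching step concrete, the sketch below computes a normalized Hamming distance between two binary iris codes and derives false acceptance and false rejection rates from lists of distances. The bit strings, the decision threshold, and the function names are hypothetical; the abstract does not give the threshold or code length used in the experiments.

```python
def hamming_distance(code_a: str, code_b: str) -> float:
    """Fraction of bit positions at which two equal-length codes differ."""
    if len(code_a) != len(code_b):
        raise ValueError("iris codes must have the same length")
    return sum(a != b for a, b in zip(code_a, code_b)) / len(code_a)

def error_rates(genuine_distances, impostor_distances, threshold):
    """Return (FAR, FRR) when a match is accepted if distance <= threshold."""
    far = sum(d <= threshold for d in impostor_distances) / len(impostor_distances)
    frr = sum(d > threshold for d in genuine_distances) / len(genuine_distances)
    return far, frr

# Hypothetical 8-bit codes: two captures of the same eye and one other eye.
enrolled = "10110010"
same_eye = "10110110"   # one bit flipped by acquisition noise
other_eye = "01001101"  # every bit differs

d_genuine = hamming_distance(enrolled, same_eye)    # 1 of 8 bits differ
d_impostor = hamming_distance(enrolled, other_eye)  # 8 of 8 bits differ
far, frr = error_rates([d_genuine], [d_impostor], threshold=0.3)
```

Error correction applied before matching would aim to shrink the genuine distances, which is what lowers the false rejection rate at a fixed threshold.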
Investigation on Java Mutation Testing Tools
Abbas, Sara Tarek ElSayed; Hassan, Rohayanti; Halim, Shahliza Abd; Kasim, Shahreen; Ramlan, Rohaizan
JOIV : International Journal on Informatics Visualization Vol 6, No 2-2 (2022): A New Frontier in Informatics
Publisher : Society of Visual Informatics

DOI: 10.30630/joiv.6.2-2.1090

Abstract

Software testing is one of the most significant phases within the software development life cycle since software bugs can be costly and traumatic. However, the traditional software testing process is not enough on its own, as some undiscovered faults might still exist due to the test cases’ inability to detect all underlying faults. Among the various techniques proposed for assessing the efficiency of test suites is mutation testing, one of the most effective approaches according to many researchers. Nevertheless, there is not enough research on how well mutation testing tools adhere to the theory of mutation or how well their mutation operators perform the tasks they were developed for. This research paper presents an investigative study on two mutation testing tools for the Java programming language, namely PIT and µJava. The study aims to point out the weaknesses and strengths of each tool by performing mutation testing on four different open-source Java programs and to identify the better mutation tool of the two. The study further aims to identify and compare the mutation operators of each tool by calculating the mutation score. That is, the operators’ performance is evaluated with the mutation score, on the presumption that the larger the number of killed mutants, the higher the mutation score, and thus the more effective the mutation operator and the affiliated tool.
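The mutation score used for this comparison is conventionally the percentage of mutants killed by the test suite. The sketch below follows that common definition and optionally excludes equivalent mutants; whether the study excluded them is not stated in the abstract, so the `equivalent` parameter and the sample counts are assumptions.

```python
def mutation_score(killed: int, total: int, equivalent: int = 0) -> float:
    """Percentage of non-equivalent mutants killed by the test suite."""
    scorable = total - equivalent
    if scorable <= 0:
        raise ValueError("no scorable mutants")
    if not 0 <= killed <= scorable:
        raise ValueError("killed must be between 0 and the scorable count")
    return 100.0 * killed / scorable

# Hypothetical run: 60 mutants generated, 45 killed by the tests.
raw_score = mutation_score(killed=45, total=60)
# Same run, assuming 10 mutants were judged equivalent to the original program.
adjusted_score = mutation_score(killed=45, total=60, equivalent=10)
```

Comparing tools by this score is only fair when both generate mutants for the same programs and test suites, which is why the study fixes the set of open-source programs.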
A Microarray Data Pre-processing Method for Cancer Classification
Hui, Tay Xin; Kasim, Shahreen; Md Fudzee, Mohd Farhan; Abdullah, Zubaile; Hassan, Rohayanti; Erianda, Aldo
JOIV : International Journal on Informatics Visualization Vol 6, No 4 (2022)
Publisher : Society of Visual Informatics

DOI: 10.30630/joiv.6.4.1523

Abstract

The development of microarray technology has led to significant improvements and research in various fields. With the help of machine learning techniques and statistical methods, it is now possible to organize, analyze, and interpret large amounts of biological data to uncover significant patterns of interest. Exploiting microarray data remains a great challenge for many researchers. Raw gene expression data are usually affected by missing values, noise, incompleteness, and inconsistency. Hence, processing the data before applying it to cancer classification is important. To extract the biological significance of microarray gene expression data, data pre-processing is a necessary step to obtain valuable information for further analysis and to address important hypotheses. This study presents a detailed description of a data pre-processing method for cancer classification. The proposed method consists of three phases: data cleaning, transformation, and filtering. A combination of the GenePattern software tool and RStudio was used to implement the proposed data pre-processing method. The proposed method was applied to six gene expression datasets (lung, stomach, liver, kidney, thyroid, and breast cancer) to demonstrate its feasibility for cancer classification. A comparison was made to illustrate the differences between the datasets before and after data pre-processing.