Rohayanti Hassan, Rohayanti
Unknown Affiliation

Published : 11 Documents
Articles

Found 11 Documents

Automatic construction of generic stop words list for hausa text Bichi, Abdulkadir Abubakar; Samsudin, Ruhaidah; Hassan, Rohayanti
Indonesian Journal of Electrical Engineering and Computer Science Vol 25, No 3: March 2022
Publisher : Institute of Advanced Engineering and Science

DOI: 10.11591/ijeecs.v25.i3.pp1501-1507

Abstract

Stop words are the words with the highest frequencies in a document that carry no significant information. They are characterized by having common relations within a cluster. They are the noise of the text and are evenly distributed over a document. Removing stop words improves the performance and accuracy of information retrieval algorithms and of machine learning at large. It saves storage space by reducing the vector space dimension and helps in effective document indexing. This research generated a list of Hausa stop words automatically using an aggregated method that combines frequency and statistical methods. The experiments were conducted on a Hausa corpus collected as primary data, consisting of 841 Hausa news articles totalling 646,862 words, and a final list of 81 distinct Hausa stop words was generated.
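
The abstract does not spell out the aggregated method, so the following is only a minimal sketch of the frequency side of the idea: rank words by corpus frequency and keep those that also appear in most documents. The function name, the `min_doc_ratio` cut-off, and the toy English sentences are assumptions for illustration, not the authors' procedure or data.

```python
from collections import Counter

def candidate_stop_words(documents, top_n=10, min_doc_ratio=0.8):
    """Rank words by corpus frequency, keeping only widely distributed ones."""
    term_freq = Counter()  # total occurrences of each word in the corpus
    doc_freq = Counter()   # number of documents containing each word
    for text in documents:
        tokens = text.lower().split()
        term_freq.update(tokens)
        doc_freq.update(set(tokens))
    n_docs = len(documents)
    widespread = [
        word
        for word, _ in term_freq.most_common()
        if doc_freq[word] / n_docs >= min_doc_ratio
    ]
    return widespread[:top_n]

# Toy corpus (illustrative only, not Hausa data).
docs = [
    "the cat sat on the mat",
    "the dog ate the bone",
    "a bird saw the cat",
]
print(candidate_stop_words(docs, top_n=2, min_doc_ratio=0.6))
```

A real pipeline would add proper tokenization and a statistical criterion on top of the raw counts.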
Real-time smart driver sleepiness detection by eye aspect ratio using computer vision Kai Yuen, Simon Chong; Zakaria, Noor Hidayah Binti; Eg Su, Goh; Hassan, Rohayanti; Kasim, Shahreen; Sutikno, Tole

Publisher : Institute of Advanced Engineering and Science

DOI: 10.11591/ijeecs.v34.i1.pp677-686

Abstract

The purpose of this study is to determine the optimal eye aspect ratio (EAR) for a prototype capable of using computer vision techniques to detect driver sleepiness based on eyelid size changes. The prototype, developed with Raspberry Pi and OpenCV, provides a real-time evaluation of the driver's level of alertness. The prototype can accurately determine the onset of sleepiness by monitoring and detecting instances of prolonged eyelid closure. Because eye aspect ratios vary between individuals, the system's accuracy may be compromised. For the first experiment, the research focuses on determining the optimal EAR threshold of the proposed prototype using a sample of 20 participants in three age groups: 20 to 30, 31 to 40, and 41 to 50 years old. The study also examines the effects of various environmental conditions, such as dark or nighttime settings, and of wearing spectacles. The optimal EAR threshold value, as determined by the first experiment, is 0.225 after testing 20 participants with and without eyeglasses in low and bright lighting and 7 participants with a 0.225 EAR threshold in dark and bright lighting environments. The result shows 100% precision.
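
As a rough illustration of how an EAR threshold can be applied, the sketch below uses the commonly cited six-landmark EAR formula, EAR = (|p2 - p6| + |p3 - p5|) / (2 |p1 - p4|), and flags sleepiness when the EAR stays below the threshold for several consecutive frames. The 0.225 threshold is the value reported in the abstract; the landmark coordinates, the `min_frames` value, and the function names are assumptions, and the prototype's actual implementation may differ.

```python
import math

EAR_THRESHOLD = 0.225  # threshold reported in the abstract

def eye_aspect_ratio(eye):
    """eye: six (x, y) landmarks p1..p6; p1 and p4 are the eye corners."""
    p1, p2, p3, p4, p5, p6 = eye
    vertical = math.dist(p2, p6) + math.dist(p3, p5)
    horizontal = math.dist(p1, p4)
    return vertical / (2.0 * horizontal)

def is_sleepy(ear_values, threshold=EAR_THRESHOLD, min_frames=3):
    """True if EAR stays below the threshold for min_frames frames in a row.

    min_frames is a hypothetical value chosen for this example.
    """
    run = 0
    for ear in ear_values:
        run = run + 1 if ear < threshold else 0
        if run >= min_frames:
            return True
    return False

# Made-up landmark coordinates for an open and a nearly closed eye.
open_eye = [(0, 0), (1, 1), (2, 1), (3, 0), (2, -1), (1, -1)]
closed_eye = [(0, 0), (1, 0.3), (2, 0.3), (3, 0), (2, -0.3), (1, -0.3)]
print(eye_aspect_ratio(open_eye), eye_aspect_ratio(closed_eye))
```

In practice the landmarks would come from a facial landmark detector running on each video frame.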
A Hybrid Approach for Malicious URL Detection Using Ensemble Models and Adaptive Synthetic Sampling Sujon, Khaled Mahmud; Hassan, Rohayanti; Zainodin, Muhammad Edzuan; Salamat, Mohamad Aizi; Kasim, Shahreen; Alanda, Alde
JOIV : International Journal on Informatics Visualization Vol 9, No 5 (2025)
Publisher : Society of Visual Informatics

DOI: 10.62527/joiv.9.5.4627

Abstract

Malicious URLs pose a significant cybersecurity threat, often leading to phishing attacks, malware infections, and data breaches. Early detection of these URLs is crucial for preventing security vulnerabilities and mitigating potential losses. In this paper, we propose a novel approach for malicious URL detection by combining ensemble learning methods with ADASYN-based oversampling to address the class imbalance typically found in malicious URL datasets. We evaluated three popular machine learning classifiers, namely XGBoost, Random Forest, and Decision Tree, and incorporated ADASYN (Adaptive Synthetic Sampling) to handle the class imbalance in our selected dataset. Our detailed experiments demonstrate that applying ADASYN can significantly increase the performance of the predictive model across all metrics. For instance, XGBoost saw a 2.2% improvement in accuracy, Random Forest achieved a 1.0% improvement in recall, and Decision Tree displayed a 3.0% improvement in F1-score. The Decision Tree model showed the most substantial improvements, especially in recall and F1-score, indicating better detection of malicious URLs. Our findings highlight the potential of ensemble learning, enhanced by ADASYN, for improving malicious URL detection and demonstrate its applicability in real-world cybersecurity applications.
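
To make the oversampling step more concrete, here is a simplified, dependency-free sketch of the general ADASYN idea: minority points whose neighbourhoods are dominated by the majority class receive more synthetic samples, each created by interpolating towards a nearby minority point. The function name, parameters, and toy 2-D data are assumptions for illustration; this is not the authors' pipeline, and a production system would more likely rely on an established library implementation.

```python
import math
import random

def adasyn_sketch(minority, majority, k=3, beta=1.0, seed=0):
    """Return synthetic minority points, favouring hard-to-learn regions."""
    rng = random.Random(seed)
    # Number of synthetic points needed to reach the requested balance level.
    n_needed = int((len(majority) - len(minority)) * beta)
    labelled = [(p, True) for p in minority] + [(p, False) for p in majority]

    # r_i: share of majority points among the k nearest neighbours of x_i.
    ratios = []
    for i, p in enumerate(minority):
        others = [item for j, item in enumerate(labelled) if j != i]
        nearest = sorted(others, key=lambda item: math.dist(p, item[0]))[:k]
        ratios.append(sum(1 for _, is_min in nearest if not is_min) / k)

    total = sum(ratios)
    if total == 0 or n_needed <= 0:
        return []

    synthetic = []
    for i, (p, r) in enumerate(zip(minority, ratios)):
        n_for_point = round(r / total * n_needed)
        peers = [q for j, q in enumerate(minority) if j != i]
        peers = sorted(peers, key=lambda q: math.dist(p, q))[:k]
        for _ in range(n_for_point):
            q = rng.choice(peers)
            lam = rng.random()  # interpolation factor in [0, 1)
            synthetic.append(tuple(a + lam * (b - a) for a, b in zip(p, q)))
    return synthetic

# Toy data: three "easy" minority points and one surrounded by the majority.
minority = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (3.0, 3.0)]
majority = [(3.1 + 0.1 * i, 3.1 + 0.1 * j) for i in range(3) for j in range(3)]
print(len(adasyn_sketch(minority, majority)))
```

In this toy setup all synthetic samples are generated from the single minority point that sits among majority points, which is the adaptive behaviour that distinguishes ADASYN from uniform oversampling.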
The Effects of Imbalanced Datasets on Machine Learning Algorithms in Predicting Student Performance Sujon, Khaled Mahmud; Hassan, Rohayanti; Khairudin, Alif Ridzuan; Moi, Sim Hiew; Mohd Shafie, Muhammad Luqman; Saringat, Zainuri; Erianda, Aldo
JOIV : International Journal on Informatics Visualization Vol 8, No 3-2 (2024): IT for Global Goals: Building a Sustainable Tomorrow
Publisher : Society of Visual Informatics

DOI: 10.62527/joiv.8.3-2.2449

Abstract

Predictive analytics technologies are becoming increasingly popular in higher education institutions. Students' grades are one of the most critical performance indicators educators can use to predict their academic achievement. Academics have developed numerous techniques and machine-learning approaches for predicting student grades over the last several decades. Although much work has been done, a practical model is still lacking, particularly when dealing with imbalanced datasets. This study examines the impact of imbalanced datasets on machine learning models' accuracy and reliability in predicting student performance. First, the study compares the performance of two popular machine learning algorithms, Logistic Regression and Random Forest, in predicting student grades. Second, it examines the impact of imbalanced datasets on these algorithms' performance metrics and generalization capabilities. Results indicate that the Random Forest (RF) algorithm, with an accuracy of 98%, outperforms Logistic Regression (LR), which achieved 91% accuracy. Furthermore, the performance of both models is significantly impacted by imbalanced datasets. In particular, LR struggles to accurately predict minority classes, while RF also faces difficulties, though to a lesser extent. Addressing class imbalance is crucial because it notably affects model bias and prediction accuracy. This is especially important for higher education institutions aiming to enhance the accuracy of student grade predictions, emphasizing the need for balanced datasets to achieve robust predictive models.
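
A small worked example may help show why overall accuracy can hide poor minority-class prediction on an imbalanced dataset. The labels and counts below are invented for illustration and are not taken from the study.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def recall_for(y_true, y_pred, label):
    """Fraction of samples with the given true label that were predicted correctly."""
    relevant = [(t, p) for t, p in zip(y_true, y_pred) if t == label]
    return sum(t == p for t, p in relevant) / len(relevant)

# Hypothetical imbalanced grade data: 95 "pass" and 5 "fail" students.
y_true = ["pass"] * 95 + ["fail"] * 5
# A degenerate model that always predicts the majority class.
y_pred = ["pass"] * 100

print(accuracy(y_true, y_pred))            # 0.95
print(recall_for(y_true, y_pred, "fail"))  # 0.0
```

The model looks strong on accuracy yet never identifies a failing student, which is why per-class metrics are usually reported alongside accuracy for imbalanced data.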
Exploring Current Methods and Trends in Text Summarization: A Systematic Mapping Study Ahmad Raddi, Muhammad Faris Faisal; Hassan, Rohayanti; Zakaria, Noor Hidayah; Sahid, Mohd Zanes; Omar, Nurul Aswa; Firosha, Ardian
JOIV : International Journal on Informatics Visualization Vol 8, No 3-2 (2024): IT for Global Goals: Building a Sustainable Tomorrow
Publisher : Society of Visual Informatics

DOI: 10.62527/joiv.8.3-2.1654

Abstract

This paper presents a systematic mapping study of the current methods and trends in text summarization, a challenging task in natural language processing that aims to condense information from one or multiple documents into a concise and coherent summary. The paper focuses on applying text summarization for the Malay language, which has received less attention than other languages. The paper employs a three-phased quality assessment procedure to filter and analyze 27 peer-reviewed publications from seven prominent digital libraries, covering 2016 to 2024. The paper addresses two research questions: (1) What is the extent of research on text summarization, especially for the Malay language and the education domain? and (2) What are the current methods and approaches employed in text summarization, with a focus on addressing specific problems and language contexts? The paper synthesizes and discusses the findings from the literature review and provides insights and recommendations for future research directions in text summarization. The paper contributes to advancing knowledge and understanding of the state-of-the-art techniques and challenges in text summarization, particularly for the Malay language.
Entropy Based Method for Malicious File Detection Edzuan Zainodin, Muhammad; Zakaria, Zalmiyah; Hassan, Rohayanti; Abdullah, Zubaile
JOIV : International Journal on Informatics Visualization Vol 6, No 4 (2022)
Publisher : Society of Visual Informatics

DOI: 10.30630/joiv.6.4.1265

Abstract

Ransomware is by no means a recent invention, having existed as far back as 1989, yet it still poses a real threat in the 21st century. Given the increasing number of computer users in recent years, this threat will only continue to grow, affecting more victims and increasing the losses incurred by the people and organizations impacted by a successful attack. In most cases, victims of such attacks are left with only two courses of action: pay the ransom or lose their data. One behavior shared by all crypto ransomware strains is that they attempt to encrypt the victims' files at some point during execution. This paper demonstrates a technique that can identify when these encrypted files are being generated and is independent of the ransomware strain. Previous research has highlighted the difficulty of differentiating between compressed and encrypted files using Shannon entropy, as both file types exhibit similar values. One of the experiments described in this study showed a unique characteristic in the Shannon entropy of encrypted file header fragments, which was used to differentiate between encrypted files and other high-entropy files such as archives. To overcome this drawback, this study proposed an approach for test case generation by enhancing the entropy-based threat tree model, which would improve malicious file identification. The file identification was enhanced by combining three entropy algorithms, and the test case was generated based on the threat tree model. This approach was then evaluated using accuracy measurements: True Positive, True Negative, False Positive, False Negative. A promising result is expected. This method solves the challenge of leveraging file entropy to distinguish compressed and archived files from ransomware-encrypted files in a timely manner.
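
For readers unfamiliar with the measure, the following minimal sketch computes byte-level Shannon entropy, H = sum over byte values of p * log2(1/p), and applies it to a leading fragment of a file. The `fragment_size` default and the helper name `header_entropy` are assumptions for illustration; the abstract does not state the fragment length or the three entropy algorithms that were combined.

```python
import math
import os
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte, between 0.0 and 8.0."""
    if not data:
        return 0.0
    n = len(data)
    return sum((c / n) * math.log2(n / c) for c in Counter(data).values())

def header_entropy(data: bytes, fragment_size: int = 256) -> float:
    """Entropy of the leading fragment only (fragment_size is hypothetical)."""
    return shannon_entropy(data[:fragment_size])

print(shannon_entropy(b"A" * 1024))        # one repeated symbol: 0.0
print(shannon_entropy(bytes(range(256))))  # uniform over all byte values: 8.0
print(shannon_entropy(os.urandom(4096)))   # random bytes: close to 8
```

Encrypted and compressed payloads both tend to score near the top of this range, which is the ambiguity the abstract describes and the reason header fragments are examined separately.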
Transformer in mRNA Degradation Prediction Yit, Tan Wen; Hassan, Rohayanti; Zakaria, Noor Hidayah; Kasim, Shahreen; Moi, Sim Hiew; Khairuddin, Alif Ridzuan; Amnur, Hidra
JOIV : International Journal on Informatics Visualization Vol 7, No 2 (2023)
Publisher : Society of Visual Informatics

DOI: 10.30630/joiv.7.2.1165

Abstract

The unstable properties and the advantages of the mRNA vaccine have encouraged many experts worldwide to tackle the degradation problem. Machine learning models have been widely implemented in bioinformatics and the healthcare fields to gain insights from biological data. Thus, machine learning plays an important role in predicting the degradation rate of mRNA vaccine candidates. Stanford University has held an OpenVaccine Challenge competition on Kaggle to gather top solutions to these problems, with the mean column-wise root mean squared error (MCRMSE) used as the main performance metric. The Nucleic Transformer has been proposed by different researchers as a deep learning solution that utilizes a self-attention mechanism and a Convolutional Neural Network (CNN). Hence, this paper aims to enhance the existing Nucleic Transformer performance by utilizing the AdaBelief or RangerAdaBelief optimizer with a proposed decoder that consists of a normalization layer between two linear layers. Based on the experimental results, the enhanced Nucleic Transformer outperforms the existing solution. In this study, the AdaBelief optimizer performs better than the RangerAdaBelief optimizer, even though the latter possesses Ranger's advantages. The advantages of the proposed decoder can only be shown when there is limited data. When the data is sufficient, the performance might be similar but still better than the linear decoder if and only if the AdaBelief optimizer is used. As a result, the combination of the AdaBelief optimizer with the proposed decoder performs the best, with 2.79% and 1.38% performance boosts in public and private MCRMSE, respectively.
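
As a reference for the metric, MCRMSE takes the root mean squared error of each target column separately and then averages those values across columns. The sketch below is a plain-Python illustration on made-up numbers, not the competition's scoring code.

```python
import math

def mcrmse(y_true, y_pred):
    """Average of per-column RMSE values.

    y_true and y_pred are lists of rows; every row has one value per target column.
    """
    n_rows = len(y_true)
    n_cols = len(y_true[0])
    total = 0.0
    for j in range(n_cols):
        mse = sum((y_true[i][j] - y_pred[i][j]) ** 2 for i in range(n_rows)) / n_rows
        total += math.sqrt(mse)
    return total / n_cols

# Two samples, two target columns (values invented for illustration).
y_true = [[1.0, 2.0], [3.0, 4.0]]
y_pred = [[1.0, 4.0], [3.0, 2.0]]
print(mcrmse(y_true, y_pred))  # column RMSEs are 0.0 and 2.0, so the result is 1.0
```

Lower values are better, so the percentage boosts quoted above correspond to reductions in this score.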
Cardio-Respiratory Motion Prediction Analysis: A Systematic Mapping Study Mohd Fuaad, Nur Atiqah; Hassan, Rohayanti; Ahmad, Johanna; Kasim, Shahreen; Erianda, Aldo
JOIV : International Journal on Informatics Visualization Vol 9, No 6 (2025)
Publisher : Society of Visual Informatics

DOI: 10.62527/joiv.9.6.4814

Abstract

Cardio-respiratory motion prediction analysis is a crucial medical application for enhancing the precision and effectiveness of medical imaging and patient diagnosis, particularly in the cardiac and respiratory context. This systematic mapping study reviews 23 selected research papers to provide a comprehensive overview of emerging trends and future directions in the field. It also highlights challenges and limitations frequently encountered in cardio-respiratory motion prediction and identifies key machine learning, deep learning, and computational paradigm methodologies, examining their application frequencies. In addition, the study analyses the number of performance metrics used alongside validation techniques, which are essential for assessing the accuracy and reliability of the predictive models. Furthermore, it explores the most utilized data types and imaging modalities in this domain, such as X-ray, CT, MRI, and ultrasound, discussing their respective advantages and limitations. Ethical considerations, including patient privacy, data security, informed consent, and the potential for bias, are also addressed. This study aims to deepen the understanding of the landscape of cardio-respiratory motion prediction, guiding future research and the development of more effective, reliable predictive models to enhance medical imaging and patient care, and providing valuable insights for researchers, practitioners, and technologists in the field.
An Improved Approach of Iris Biometric Authentication Performance and Security with Cryptography and Error Correction Codes Moi, Sim Hiew; Yong, Pang Yee; Hassan, Rohayanti; Asmuni, Hishammuddin; Mohamad, Radziah; Weng, Fong Cheng; Kasim, Shahreen
JOIV : International Journal on Informatics Visualization Vol 6, No 2-2 (2022): A New Frontier in Informatics
Publisher : Society of Visual Informatics

DOI: 10.30630/joiv.6.2-2.1091

Abstract

One of the most challenging parts of integrating biometrics and cryptography is the intra-user variation in acquired identifiers. Due to noise in the environment or different devices, features of the iris may differ when it is acquired at different times. This research focuses on improving the performance of iris biometric authentication and encrypting the binary code generated from the acquired identifiers. The proposed biometric authentication system incorporates the concepts of non-repudiation and privacy. These concepts are critical to the success of a biometric authentication system. Iris was chosen as the biometric identifier due to its high accuracy and its permanent presence throughout an individual's lifetime. This study seeks to find a method of reducing the noise and error associated with the dissimilarity between biometric acquisitions. In our method, we used Reed-Solomon error-correction codes to reduce dissimilarities and noise in iris data. The code is a block-based error-correcting code that can be easily decoded and has excellent burst correction capabilities. Two different distance metric functions, Hamming distance and weighted Euclidean distance, were used to measure the accuracy of the iris pattern matching identification process. The experiments were conducted with the CASIA 1.0 iris database. The results showed that the False Acceptance Rate is 0%, the False Rejection Rate is 1.54%, and the Total Success Rate is 98.46%. The proposed approach appears to be more secure, as it is able to provide a low rate of false rejections and false acceptances.
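
The matching and evaluation steps can be illustrated with a short sketch: a normalized Hamming distance between two binary iris codes, and False Acceptance / False Rejection Rates computed from genuine and impostor comparison distances at a decision threshold. The bit strings, distances, and threshold below are invented for illustration; the abstract does not give the threshold or code length used in the experiments.

```python
def hamming_distance(code_a: str, code_b: str) -> float:
    """Fraction of differing bits between two equal-length binary strings."""
    if len(code_a) != len(code_b):
        raise ValueError("iris codes must have the same length")
    return sum(a != b for a, b in zip(code_a, code_b)) / len(code_a)

def error_rates(genuine_distances, impostor_distances, threshold):
    """Return (FAR, FRR) for a rule that accepts when distance <= threshold."""
    frr = sum(d > threshold for d in genuine_distances) / len(genuine_distances)
    far = sum(d <= threshold for d in impostor_distances) / len(impostor_distances)
    return far, frr

print(hamming_distance("10110", "10011"))  # 2 of 5 bits differ: 0.4

# Hypothetical comparison scores and threshold.
genuine = [0.10, 0.20, 0.40]
impostor = [0.45, 0.50, 0.30]
far, frr = error_rates(genuine, impostor, threshold=0.35)
print(far, frr)
```

The reported figures are consistent with a Total Success Rate computed as 100% minus the sum of FAR and FRR (100 - 0 - 1.54 = 98.46), although the abstract does not state the formula explicitly.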
Investigation on Java Mutation Testing Tools Abbas, Sara Tarek ElSayed; Hassan, Rohayanti; Halim, Shahliza Abd; Kasim, Shahreen; Ramlan, Rohaizan
JOIV : International Journal on Informatics Visualization Vol 6, No 2-2 (2022): A New Frontier in Informatics
Publisher : Society of Visual Informatics

DOI: 10.30630/joiv.6.2-2.1090

Abstract

Software testing is one of the most significant phases within the software development life cycle since software bugs can be costly and traumatic. However, the traditional software testing process is not enough on its own, as some undiscovered faults might still exist due to the test cases' inability to detect all underlying faults. Among the various techniques proposed for measuring test suite effectiveness is mutation testing, one of the most effective approaches according to many researchers. Nevertheless, there is not enough research on how well the mutation testing tools adhere to the theory of mutation or how well their mutation operators perform the tasks they were developed for. This research paper presents an investigative study on two different mutation testing tools for the Java programming language, namely PIT and µJava. The study aims to point out the weaknesses and strengths of each tool by performing mutation testing on four different open-source Java programs to identify the best mutation tool among them. The study further aims to identify and compare the mutation operators of each tool by calculating the mutation score. That is, the operators' performance is evaluated with the mutation score, on the presumption that the greater the number of killed mutants, the higher the mutation score, and thus the more effective the mutation operator and the affiliated tool.
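
The mutation score mentioned above is commonly defined as the percentage of non-equivalent mutants that the test suite kills. The sketch below uses that common definition with invented counts; the abstract does not say how equivalent mutants were handled, so the `equivalent` parameter is an assumption.

```python
def mutation_score(killed: int, total: int, equivalent: int = 0) -> float:
    """Percentage of non-equivalent mutants killed by the test suite."""
    effective = total - equivalent
    if effective <= 0:
        raise ValueError("no non-equivalent mutants to score")
    return 100.0 * killed / effective

# Hypothetical counts for one program under one tool.
print(mutation_score(killed=45, total=60, equivalent=10))  # 90.0
```

Comparing this score per operator and per tool is one way to carry out the kind of evaluation the study describes.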