Articles

Found 3 Documents
Journal : Journal of Applied Data Sciences

Early Detection of Female Type-2 Diabetes using Machine Learning and Oversampling Techniques
Al-Dabbas, Lana; Abu-Shareha, Ahmad Adel
Journal of Applied Data Sciences Vol 5, No 3: SEPTEMBER 2024
Publisher : Bright Publisher

DOI: 10.47738/jads.v5i3.298

Abstract

Early diabetes prediction is crucial, as it can save numerous lives and prevent diabetes-related complications. Existing experiments on diabetes prediction are constrained by the limited number of diabetes and non-diabetes samples in the available datasets. Various techniques have been implemented, focusing on the classification technique to improve the accuracy of prediction results. Notably, oversampling has been implemented using SMOTE, which improved the results but is limited by its naïve interpolation strategy. In this paper, a framework for diabetes prediction is developed, integrating an advanced oversampling technique, SVMSMOTE, with various machine-learning algorithms to achieve the best performance. The proposed framework aims to overcome the problems of inaccurate data and limited samples using preprocessing and oversampling techniques. These techniques are integrated with other data mining and machine learning algorithms to improve the performance of diabetes prediction. The framework consists of four main stages: data exploration, data preprocessing, data oversampling, and classification. The experiments were conducted on the Pima Indian diabetes dataset, which comprises 768 samples and 9 columns. The results showed that the proposed framework achieved an accuracy of 91%, an improvement over classification without oversampling, which achieved 90%. By comparison, the best result reported in the literature was an accuracy of 85.5%; the proposed framework thus improves on existing frameworks by approximately 6.4%. The proposed framework also achieved the best F-measure, 0.879, using the XGBoost classifier with SVMSMOTE. The best recall, 0.931, was achieved using RF with SVMSMOTE. Finally, the best precision, 0.918, was achieved using RF without oversampling.
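SMOTE-family oversamplers like the one this abstract builds on generate synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority neighbours; SVMSMOTE additionally restricts this to borderline samples found by an SVM. A minimal NumPy sketch of the core interpolation idea only (not the paper's SVMSMOTE pipeline; the function name and toy data are illustrative):

```python
import numpy as np

def smote_like_oversample(X_min, n_new, k=3, rng=None):
    """Generate synthetic minority samples by interpolating each chosen
    sample toward one of its k nearest minority neighbours (SMOTE idea)."""
    rng = np.random.default_rng(rng)
    n = len(X_min)
    out = []
    for _ in range(n_new):
        i = rng.integers(n)
        # distances from sample i to every other minority sample
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        d[i] = np.inf                      # exclude the sample itself
        nbrs = np.argsort(d)[:k]           # its k nearest minority neighbours
        j = rng.choice(nbrs)
        lam = rng.random()                 # interpolation factor in [0, 1)
        out.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.array(out)

# toy minority class: 5 points in 2-D
X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]])
X_new = smote_like_oversample(X_min, n_new=10, rng=0)
print(X_new.shape)  # (10, 2)
```

Because every synthetic point lies on a segment between two real minority points, oversampling never leaves the minority region's convex hull, which is exactly the naïvety that SVM-guided variants try to mitigate.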
ARP Spoofing Attack Detection Model in IoT Network using Machine Learning: Complexity vs. Accuracy
Alsaaidah, Adeeb; Almomani, Omar; Abu-Shareha, Ahmad Adel; Abualhaj, Mosleh M; Achuthan, Anusha
Journal of Applied Data Sciences Vol 5, No 4: DECEMBER 2024
Publisher : Bright Publisher

DOI: 10.47738/jads.v5i4.374

Abstract

Spoofing attacks targeting the Address Resolution Protocol (ARP) are common cyber-attacks in IoT environments. In such an attack, the attacker sends fake messages over a local area network to deceive users and interfere with the traffic to and from them. Detecting such attacks requires continuously monitoring network gateways and routers to capture and analyze the transmitted traffic. However, this traffic data poses three major problems: 1) much of the data is irrelevant to ARP attacks, 2) spoofing can be implemented in a vast number of patterns, and 3) the data must be processed quickly to minimize the delay introduced by the processing stage. Accordingly, this paper proposes a detection approach using supervised machine learning algorithms. The focus of this paper is the tradeoff between speed and accuracy, offering various solutions based on the demanded quality. Various algorithms were tested to find a solution that balances time requirements and accuracy. Results are reported using the full feature set and with various feature selection techniques, and using both simple classifiers and ensemble learning algorithms. The proposed approach is evaluated on an IoT network intrusion dataset (IoTID20) collected from different IoT devices. The results showed that the highest accuracy, 99.74%, was obtained using the RF classifier with a subset of features produced by the wrapper technique, with a running time of 305 milliseconds. If time is more critical for a given application, DT can be used with the whole feature set, achieving an accuracy of 99.41% with a running time of 11 milliseconds.
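The wrapper technique mentioned above scores candidate feature subsets by the accuracy of a classifier trained on them, rather than by a filter statistic. A minimal sketch of greedy forward wrapper selection, using a simple nearest-centroid classifier as the scoring model (the function names and toy data are illustrative; the paper's pipeline applies the idea with RF/DT on IoTID20):

```python
import numpy as np

def centroid_acc(X, y, feats):
    """Accuracy of a nearest-centroid classifier restricted to `feats`
    (training accuracy, used here only as the wrapper's score)."""
    Xf = X[:, feats]
    c0, c1 = Xf[y == 0].mean(0), Xf[y == 1].mean(0)
    pred = (np.linalg.norm(Xf - c1, axis=1)
            < np.linalg.norm(Xf - c0, axis=1)).astype(int)
    return (pred == y).mean()

def greedy_wrapper(X, y, n_keep):
    """Forward wrapper selection: repeatedly add the single feature
    that most improves the classifier's score."""
    selected = []
    while len(selected) < n_keep:
        remaining = [f for f in range(X.shape[1]) if f not in selected]
        best = max(remaining, key=lambda f: centroid_acc(X, y, selected + [f]))
        selected.append(best)
    return selected

# toy data: feature 0 separates the classes, features 1-2 are noise
rng = np.random.default_rng(0)
y = np.array([0] * 50 + [1] * 50)
X = rng.normal(size=(100, 3))
X[:, 0] += 3 * y                    # make feature 0 informative
print(greedy_wrapper(X, y, n_keep=1))   # feature 0 is picked first
```

Because each candidate subset requires retraining the classifier, wrapper selection is the slow-but-accurate end of the complexity/accuracy tradeoff the abstract describes.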
A Framework for Diabetes Detection Using Machine Learning and Data Preprocessing
Abu-Shareha, Ahmad Adel; Qutaishat, Haneen; Al-Khayat, Asma
Journal of Applied Data Sciences Vol 5, No 4: DECEMBER 2024
Publisher : Bright Publisher

DOI: 10.47738/jads.v5i4.363

Abstract

People with diabetes are at an increased risk of developing other complications, such as heart disease and nerve damage. Therefore, diabetes prediction is crucial to reducing the severe consequences of this disease. This study proposed a comprehensive framework for diabetes prediction that maximizes the information extracted from available diabetes datasets, which include historical records, laboratory tests, and demographic data. The proposed framework implements a data imputation technique to fill in missing values and adopts feature selection methods to remove less important features for better diabetes classification. An oversampling technique and a parameter tuning approach were used to increase the number of samples and fine-tune the parameters for training the machine learning algorithms. Various machine learning algorithms, including Neural Networks, Logistic Regression, Support Vector Machines, and Random Forest, were used for prediction and evaluated using both train-test split and cross-validation. The experiments were conducted on the Pima Indian Diabetes dataset using various evaluation metrics, including accuracy, precision, recall, and F-measure. The results showed that the Random Forest algorithm, particularly when fine-tuned with Grid Search Cross Validation, outperformed the other algorithms, achieving an accuracy of 0.99. This demonstrates the robustness and effectiveness of the proposed framework, which outperformed state-of-the-art approaches in accuracy.
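Grid Search Cross Validation, as used above, evaluates every parameter combination by averaging validation accuracy across k folds and keeps the best setting. A minimal NumPy sketch of that loop, tuning the `k` of a plain k-NN classifier rather than Random Forest hyperparameters (all names and the toy data are illustrative, not the paper's setup):

```python
import numpy as np
from itertools import product

def knn_predict(Xtr, ytr, Xte, k):
    """Plain k-nearest-neighbours majority vote."""
    preds = []
    for x in Xte:
        nbrs = np.argsort(np.linalg.norm(Xtr - x, axis=1))[:k]
        preds.append(np.bincount(ytr[nbrs]).argmax())
    return np.array(preds)

def grid_search_cv(X, y, grid, n_folds=5, rng=0):
    """Exhaustive grid search with k-fold cross-validation: for every
    parameter setting, average validation accuracy over the folds."""
    idx = np.random.default_rng(rng).permutation(len(X))
    folds = np.array_split(idx, n_folds)
    best_params, best_score = None, -1.0
    for params in (dict(zip(grid, vals)) for vals in product(*grid.values())):
        scores = []
        for f in range(n_folds):
            te = folds[f]
            tr = np.concatenate([folds[g] for g in range(n_folds) if g != f])
            acc = (knn_predict(X[tr], y[tr], X[te], **params) == y[te]).mean()
            scores.append(acc)
        score = np.mean(scores)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# toy data: two classes separated along feature 0
rng = np.random.default_rng(0)
y = np.array([0] * 40 + [1] * 40)
X = rng.normal(size=(80, 2))
X[:, 0] += 3 * y
params, score = grid_search_cv(X, y, {'k': [1, 3, 5]})
print(params, round(score, 3))
```

Each candidate setting is scored only on held-out folds, which is what lets the tuned model's accuracy be reported without leaking training data into the selection step.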