OYELAKIN, Akinyemi
Unknown Affiliation

Published : 3 Documents

Tree-based Machine Learning Ensembles and Feature Importance Approach for the Identification of Intrusions in UNR-IDD Dataset OYELAKIN, Akinyemi
INDONESIAN JOURNAL ON DATA SCIENCE Vol. 2 No. 1 (2024): Indonesian Journal on Data Science
Publisher : Lembaga Penelitian dan Pengabdian Kepada Masyarakat Universitas Achmad Yani Yogyakarta

DOI: 10.30989/ijds.v2i1.1302

Abstract

The detection of intrusions in network data using machine learning techniques has gained considerable attention over the past decades. One of the key problems in the network security domain is the limited availability of representative datasets for testing and evaluation. Despite several efforts by researchers to release datasets for benchmarking attack detection models, some of the released datasets still suffer from one limitation or another. In response, researchers at the University of Nevada released the UNR-IDD dataset, which is argued to be free from some of the limitations of earlier datasets. This study proposes tree-based ensemble approaches for building binary intrusion identification models from the UNR-IDD dataset. Decision Tree algorithms are used as base classifiers in the Extra Trees, Random Forest and AdaBoost-based intrusion detection models. The experimental results indicate that the three ensembles performed better when feature selection was used than when all features were applied. For instance, the Extra Trees model achieved an accuracy of 0.97, a precision of 0.98, a recall of 0.98 and an F1-score of 0.98. Similarly, the Random Forest model achieved an accuracy of 0.98, a precision of 0.98, a recall of 0.99 and an F1-score of 0.98. The AdaBoost-based model achieved an accuracy of 0.96, a precision of 0.96, a recall of 0.99 and an F1-score of 0.98. The Random Forest intrusion classification model achieved slightly better overall results than the other models. It is concluded that the three homogeneous ensemble models achieve very promising results when feature importance is used as the attribute selection method.
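The four scores quoted in the abstract follow the standard binary classification definitions. As a minimal sketch of how such scores are derived from predictions, the following uses a hypothetical helper named `binary_metrics` and made-up toy labels; neither the function nor the data comes from the paper or from the UNR-IDD dataset.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for a binary task.

    Illustrative helper only; `positive` marks the intrusion class.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    # Guard every division so empty or one-sided inputs return 0.0.
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy labels (1 = intrusion, 0 = normal); invented for illustration.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]

scores = binary_metrics(y_true, y_pred)
for name, value in scores.items():
    print(f"{name}: {value:.2f}")
```

Because F1 is the harmonic mean of precision and recall, it always lies between the two, which is a quick consistency check for any reported set of scores.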
ON THE COMPREHENSIVE ANALYSES OF CTU-13 BOTNET DATASET FOR CYBER SECURITY RESEARCHES Gbenga, Jimoh Rasheed; OYELAKIN, Akinyemi
INDONESIAN JOURNAL ON DATA SCIENCE Vol. 3 No. 1 (2025): Indonesian Journal On Data Science
Publisher : Lembaga Penelitian dan Pengabdian Kepada Masyarakat Universitas Achmad Yani Yogyakarta

DOI: 10.30989/ijds.v3i1.1554

Abstract

Attackers use malware to launch attacks on the internet and on corporate networks. Over the years, machine learning techniques have proved promising for classifying these attacks because they can identify unknown threats. Botnets are networks of compromised devices and are powerful threat vectors against modern systems because their command and control (C2) characteristics make them very difficult to detect. Attack detection models are generally built using intrusion datasets, and a comprehensive study of the benchmark datasets used in intrusion detection research can provide actionable insights to other researchers. Previous studies have analysed datasets for building intrusion detection systems, but less attention has been paid to datasets used specifically for botnet detection. This study presents an overview of a popular botnet dataset named CTU-13 and then carries out a detailed exploratory analysis of it. The study also seeks to establish whether the dataset is representative enough for machine learning-based botnet detection studies. All thirteen scenarios in the dataset were used in the experiments, and the exploratory analyses were carried out on each scenario to gain a better understanding of the patterns and characteristics of its data. The information obtained from the overview and exploratory analyses provides actionable insights on how to use the dataset more effectively for botnet classification. The challenges of using the captures of the dataset were also identified. In particular, the exploratory investigation of the thirteen captures of the CTU-13 dataset revealed that the dataset has very complex patterns, contains mixed data types and suffers from a severe class imbalance problem. The results of the exploratory analyses can guide future cyber security research, and improved machine learning-based botnet detection models can be built by addressing these issues in the dataset.
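Two of the issues named in the abstract, mixed data types and class imbalance, can be surfaced with very simple checks. The sketch below is only an illustration of that kind of exploratory step: the helper names (`class_distribution`, `imbalance_ratio`, `column_types`), the field names and the rows are all hypothetical and are not taken from CTU-13 or from the paper's analysis.

```python
from collections import Counter


def class_distribution(labels):
    """Return the share of each class label as a fraction of all rows."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def imbalance_ratio(labels):
    """Return the majority-to-minority class count ratio."""
    counts = Counter(labels)
    if not counts:
        raise ValueError("labels must not be empty")
    return max(counts.values()) / min(counts.values())


def column_types(rows):
    """Map each field name to the set of Python type names seen in it."""
    types = {}
    for row in rows:
        for name, value in row.items():
            types.setdefault(name, set()).add(type(value).__name__)
    return types


# Invented flow records; field names are assumptions for illustration.
rows = [
    {"duration": 1.2, "protocol": "tcp", "total_bytes": 300, "label": "background"},
    {"duration": 0.4, "protocol": "udp", "total_bytes": 120, "label": "background"},
    {"duration": 3.1, "protocol": "tcp", "total_bytes": 900, "label": "background"},
    {"duration": 0.0, "protocol": "icmp", "total_bytes": 60, "label": "background"},
    {"duration": 2.7, "protocol": "tcp", "total_bytes": 450, "label": "background"},
    {"duration": 0.9, "protocol": "udp", "total_bytes": 75, "label": "background"},
    {"duration": 5.5, "protocol": "tcp", "total_bytes": 2048, "label": "botnet"},
    {"duration": 1.0, "protocol": "tcp", "total_bytes": 512, "label": "normal"},
]

labels = [row["label"] for row in rows]
print(class_distribution(labels))
print(imbalance_ratio(labels))
print(column_types(rows))
```

A large majority-to-minority ratio suggests that plain accuracy will be misleading and that resampling or class weighting may be needed, while columns holding strings alongside numeric columns need encoding before most classifiers can use them.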