How Effective are Different Machine Learning Algorithms in Predicting Legal Outcomes in South Africa? Khosa, Joe; Mashao, Daniel; Olanipekun, Ayorinde; Harley, Charis
Journal of Applied Data Sciences Vol 5, No 4: DECEMBER 2024
Publisher : Bright Publisher

DOI: 10.47738/jads.v5i4.215

Abstract

This study examines the effectiveness of different machine learning algorithms in predicting legal outcomes in South Africa's judiciary system. Considering the advancement of artificial intelligence in the legal sector, this research assesses the effectiveness of various machine learning algorithms within the legal domain. Text classification is performed using Logistic Regression, Random Forest, and K-Nearest Neighbours, with datasets obtained from a state legal firm in South Africa. The datasets undergo careful cleansing and pre-processing, including tokenization and lemmatization. The models are evaluated using accuracy. The findings show that the Logistic Regression model attained an accuracy of 75.05%, while the Random Forest algorithm achieved 75.08%. The K-Nearest Neighbours algorithm performed suboptimally, with an accuracy of 62.76%. This study provides valuable insights for legal professionals by addressing a specific research question about the successful application of machine learning in South Africa's legal sector. The results indicate the feasibility of using machine learning to predict the outcomes of criminal legal cases, and the study highlights the importance of implementing machine learning responsibly and ethically within the legal field. These results enhance our understanding of legal outcome prediction and establish a foundation for future investigations in this dynamic area of study. A limitation of this study is that the data was obtained from a single law firm in South Africa.
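The pipeline the abstract describes, vectorizing pre-processed case text and feeding it to a Logistic Regression classifier, can be sketched as follows. This is a minimal illustration, not the study's code: the case summaries, outcome labels, and the use of TF-IDF in place of the paper's tokenization/lemmatization step are all placeholder assumptions.

```python
# Hypothetical sketch of a text-classification pipeline for legal
# outcomes: TF-IDF features feeding a Logistic Regression classifier.
# The case texts and outcome labels below are invented placeholders,
# not data from the study.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training data: case summaries paired with placeholder outcomes.
texts = [
    "accused found in possession of stolen goods",
    "charges withdrawn due to insufficient evidence",
    "defendant convicted of theft after trial",
    "matter struck from the roll, no evidence led",
]
labels = ["guilty", "not_guilty", "guilty", "not_guilty"]

# TF-IDF vectorization stands in for the tokenization/lemmatization
# pre-processing described in the abstract.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)

prediction = model.predict(["no evidence presented, charges withdrawn"])[0]
print(prediction)
```

Swapping `LogisticRegression` for `RandomForestClassifier` or `KNeighborsClassifier` in the same pipeline reproduces the three-model comparison the study reports.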
An Improved Prediction of Transparent Conductor Formation Energy using PyCaret: An Open-Source Machine Learning Library Olanipekun, Ayorinde Tayo; Mashao, Daniel
Journal of Applied Data Sciences Vol 5, No 4: DECEMBER 2024
Publisher : Bright Publisher

DOI: 10.47738/jads.v5i4.202

Abstract

Designing innovative materials is necessary to solve vital energy, health, environmental, social, and economic challenges. Transparent conductors are compounds that combine low absorption in the visible range with good electrical conductivity, both essential properties for such conductors. Technological devices such as photovoltaic cells, transistors, and sensors rely on the combination of these two properties, which is central to optoelectronic applications. However, few compounds exhibit both the outstanding conductivity and the transparency required of transparent conducting materials. Kaggle hosted an open big-data competition organized by the Novel Materials Discovery (NOMAD) project to address the importance of finding new materials with this ideal functionality. The competition sought the best machine learning (ML) model to predict formation enthalpy (indicating stability) for a dataset of 3,000 (AlxGayInz)2O3 compounds, where the fractions x, y, and z are subject to the constraint x + y + z = 1. Here we present a prediction using PyCaret, an open-source machine learning library in Python, to summarize the top-ranked ML algorithms. The gradient boosting regressor (GBR) model performed best, with MAE 0.0281, MSE 0.0018, and R2 0.84. The research shows that machine learning can significantly accelerate the discovery and optimization of materials while reducing computational cost and time. Low-code tools such as PyCaret enhance machine learning applications in materials science, paving the way for more efficient materials discovery processes.
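The model family the abstract ranks first, a gradient boosting regressor scored by MAE, MSE, and R2, can be sketched directly in scikit-learn (PyCaret automates this comparison across many such models). The synthetic features standing in for the x, y, z composition fractions and the toy target function are assumptions for illustration, not the NOMAD dataset.

```python
# Illustrative sketch of the top-ranked model family: a gradient
# boosting regressor evaluated with MAE, MSE and R^2. Synthetic data
# stands in for the NOMAD formation-energy dataset.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(300, 3))  # stand-in for composition fractions
# Toy target: a noisy linear combination, purely for demonstration.
y = 1.5 * X[:, 0] - 0.8 * X[:, 1] + 0.3 * X[:, 2] + rng.normal(0, 0.05, 300)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
gbr = GradientBoostingRegressor(random_state=0).fit(X_train, y_train)
pred = gbr.predict(X_test)

mae = mean_absolute_error(y_test, pred)
mse = mean_squared_error(y_test, pred)
r2 = r2_score(y_test, pred)
print(f"MAE={mae:.4f}  MSE={mse:.4f}  R2={r2:.2f}")
```

In PyCaret itself, the equivalent workflow is roughly `setup(data, target=...)` followed by `compare_models()`, which ranks candidate regressors by these same metrics.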
Sentimental Analysis of Legal Aid Services: A Machine Learning Approach Khosa, Joe; Mashao, Daniel; Olanipekun, Ayorinde
Journal of Applied Data Sciences Vol 6, No 2: MAY 2025
Publisher : Bright Publisher

DOI: 10.47738/jads.v6i2.521

Abstract

Legal Aid services in South Africa, administered by Legal Aid South Africa (SA), aim to provide essential legal representation to vulnerable individuals lacking financial resources. Despite its significant role, there is a pervasive public perception that the quality of these state-funded services is substandard, often leading to negative attitudes towards the organization. This research employs sentiment analysis to evaluate client perceptions of Legal Aid SA's services, using a dataset of 5,246 entries from Twitter and the internal client feedback system between 2019 and 2024. The study utilizes various machine learning algorithms, including Naive Bayes, Stochastic Gradient Descent (SGD), Random Forest, Support Vector Classification (SVC), Logistic Regression, and Extreme Gradient Boosting (XGBoost), to analyze sentiment polarity and classify feedback into positive, neutral, and negative sentiments. Model performance was assessed using accuracy, precision, recall, and F1 scores. The SVC and XGBoost models demonstrated superior performance, achieving testing accuracies of 90.10% and 90.00%, respectively. In contrast, Naive Bayes and Logistic Regression lagged, with test accuracies of 82.00% and 85.00%, respectively. The findings reveal that most responses are either neutral or positive, suggesting a predominantly favourable impression of Legal Aid services. This research not only aims to enhance Legal Aid SA's service offerings but may also provide valuable insights for similar organizations globally.
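The best-performing setup the abstract reports, an SVC classifying feedback into three sentiment classes, can be sketched as below. The feedback strings and labels are invented placeholders, not entries from the Legal Aid SA dataset, and TF-IDF is an assumed feature representation.

```python
# Minimal sketch of a three-class sentiment pipeline: TF-IDF features
# feeding a Support Vector Classifier that labels feedback as positive,
# neutral or negative. All strings below are invented placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

feedback = [
    "the attorney was helpful and resolved my case quickly",
    "excellent service, very professional staff",
    "my application is still being processed",
    "the office confirmed receipt of my documents",
    "nobody returned my calls, very poor service",
    "I waited months with no assistance at all",
]
sentiment = ["positive", "positive", "neutral",
             "neutral", "negative", "negative"]

# A linear kernel is a common default for sparse text features.
clf = make_pipeline(TfidfVectorizer(), SVC(kernel="linear"))
clf.fit(feedback, sentiment)

label = clf.predict(["great help from the staff, thank you"])[0]
print(label)
```

The other algorithms in the comparison (Naive Bayes, SGD, Random Forest, Logistic Regression, XGBoost) slot into the same pipeline by replacing the final estimator.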
Analyzing the Impact of Company Location, Size, and Remote Work on Entry-Level Salaries: A Linear Regression Study Using Global Salary Data Khosa, Joe; Mashao, Daniel; Subekti, Fajar
International Journal of Informatics and Information Systems Vol 7, No 3: September 2024
Publisher : International Journal of Informatics and Information Systems

DOI: 10.47738/ijiis.v7i3.215

Abstract

This research explores the key factors influencing entry-level salaries in the global labor market of 2024, emphasizing the roles of company location, organizational size, and the extent of remote work in shaping compensation levels. Drawing on the Global Salary 2024 dataset from Kaggle, which comprises over 5,600 observations across multiple industries and geographic regions, the study applies a multiple linear regression model executed in Python via Google Colab to quantitatively examine salary disparities. The results indicate that company location and size significantly affect entry-level earnings, underscoring how regional economic contexts, cost-of-living variations, and organizational capacity continue to drive wage formation. Conversely, the remote work ratio exhibits a negligible and statistically insignificant effect, implying that flexibility in work arrangements has yet to translate into measurable financial value for early-career professionals. Furthermore, introducing job title as a control variable enhances the model’s explanatory power, reaffirming the influence of individual skill specialization and job function in determining compensation outcomes. These findings reinforce human capital theory while extending it by incorporating contextual and organizational dimensions relevant to the digital labor economy. For job seekers, the study offers data-driven insights to guide career decisions and salary expectations across regions, while employers may utilize the results to formulate fair and competitive pay strategies in an increasingly interconnected workforce. Ultimately, this study provides a comprehensive understanding of how structural and individual factors interact to shape entry-level salary dynamics in the modern digital era.
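The multiple linear regression described above, salary regressed on categorical predictors such as location and company size plus a numeric remote-work ratio, can be sketched as follows. The column names, country and size codes, and salary figures are illustrative placeholders, not the actual fields or rows of the Kaggle Global Salary 2024 dataset.

```python
# Hedged sketch of a multiple linear regression on salary with
# one-hot-encoded categorical predictors. All columns and rows are
# illustrative placeholders, not the Kaggle dataset.
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

df = pd.DataFrame({
    "company_location": ["US", "ZA", "US", "DE", "ZA", "DE"],
    "company_size":     ["L",  "S",  "M",  "L",  "M",  "S"],
    "remote_ratio":     [100,  0,    50,   0,    100,  50],
    "salary_usd":       [95000, 40000, 80000, 70000, 45000, 52000],
})

# One-hot encode the categorical predictors; pass remote_ratio through
# unchanged as a numeric regressor.
pre = make_column_transformer(
    (OneHotEncoder(handle_unknown="ignore"),
     ["company_location", "company_size"]),
    remainder="passthrough",
)
reg = make_pipeline(pre, LinearRegression())
reg.fit(df.drop(columns="salary_usd"), df["salary_usd"])

pred = reg.predict(pd.DataFrame({
    "company_location": ["US"], "company_size": ["M"], "remote_ratio": [50],
}))[0]
print(round(pred))
```

Adding a one-hot-encoded job-title column to the transformer mirrors the study's use of job title as a control variable; fitted coefficients can then be read off the `LinearRegression` step to compare predictor effects.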
Cyber Attack Pattern Analysis Based on Geo-location and Time: A Case Study of Firewall and IDS/IPS Logs Mashao, Daniel; Harley, Charis
Journal of Current Research in Blockchain Vol. 2 No. 1 (2025): Regular Issue March
Publisher : Bright Institute

DOI: 10.47738/jcrb.v2i1.26

Abstract

Cyber attacks are a growing concern for organizations worldwide, requiring continuous monitoring and analysis to detect patterns and anticipate future threats. This study explores the temporal and geographical patterns of cyber attacks using log data from firewall and IDS/IPS systems, with a focus on understanding attack trends based on severity levels and monthly variations. The analysis revealed an almost even distribution of attacks, with 13,183 low severity, 13,435 medium severity, and 13,382 high severity incidents. This emphasizes the need for holistic defense strategies that address all levels of threats. Through time-series analysis, including the ARIMA model, we forecasted future attack trends, highlighting the consistency of cyber threats over time and identifying potential periods of increased activity. The monthly trend analysis showed fluctuations, with a notable peak of 906 attacks in March 2020 and a decrease to 825 attacks in April 2020, suggesting the influence of external factors such as global events. The ARIMA model provided accurate forecasts, indicating a steady rate of future attacks and underscoring the importance of continuous vigilance. While the ARIMA model captured linear trends effectively, future work should explore non-linear models, such as Long Short-Term Memory (LSTM) networks, to uncover deeper, more complex patterns in the data. This research provides critical insights into the nature of cyber attacks, offering organizations a data-driven approach to improving their cybersecurity measures. Future studies should focus on enhancing forecasting models and integrating real-time data to better anticipate emerging threats.
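The forecasting step described above relies on ARIMA; its autoregressive core, regressing each month's attack count on the previous month's and rolling the fit forward, can be sketched with plain least squares. The monthly counts below are synthetic stand-ins loosely echoing the ~825-906 range reported, and a full ARIMA fit (e.g. via statsmodels) would also include differencing and moving-average terms not shown here.

```python
# Hedged sketch of the autoregressive core of ARIMA: fit an AR(1)
# model y_t = c + phi * y_{t-1} by ordinary least squares, then make a
# one-step-ahead forecast. Monthly counts are synthetic placeholders.
import numpy as np

rng = np.random.default_rng(1)
# Synthetic monthly attack counts hovering around a steady level.
counts = 860 + rng.normal(0, 20, size=24).cumsum() * 0.1 \
             + rng.normal(0, 15, 24)

# Build the regression: current month's count against previous month's.
y_prev, y_curr = counts[:-1], counts[1:]
A = np.column_stack([np.ones_like(y_prev), y_prev])
c, phi = np.linalg.lstsq(A, y_curr, rcond=None)[0]

# One-step-ahead forecast for the next month.
forecast = c + phi * counts[-1]
print(round(forecast, 1))
```

A steady series like this yields a forecast close to the recent level, which is the "steady rate of future attacks" behaviour the abstract describes; the LSTM extension the authors propose would replace this linear recurrence with a learned non-linear one.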