Journal : Journal of Applied Data Sciences

Unveiling Criminal Activity: a Social Media Mining Approach to Crime Prediction
Armoogum, Sheeba; Dewi, Deshinta Arrova; Armoogum, Vinaye; Melanie, Nicolas; Kurniawan, Tri Basuki
Journal of Applied Data Sciences Vol 5, No 3: SEPTEMBER 2024
Publisher : Bright Publisher

DOI: 10.47738/jads.v5i3.350

Abstract

Social media platforms have become breeding grounds for abusive comments, necessitating the use of machine learning to detect harmful content. This study aims to predict abusive comments within a Mauritian context, focusing specifically on comments written in Mauritian Kreol, a language with limited natural language processing tools. The objective was to build and evaluate four machine learning models—Decision Tree, Random Forest, Naïve Bayes, and Support Vector Machine (SVM)—to accurately classify comments as abusive or non-abusive. The models were trained and tested using k-fold cross-validation, and the Decision Tree model outperformed others with 100% precision and recall, while Random Forest followed with 99% accuracy. Naïve Bayes and SVM, although achieving 100% precision, had lower recall rates of 35% and 16%, respectively, due to imbalanced data in the training set. Pre-processing steps, including stop-word removal and a custom Kreol spell checker, were key in enhancing model performance. The study provides a novel contribution by applying machine learning in a Mauritian context, demonstrating the potential of AI in detecting abusive language in underrepresented languages. Despite limitations such as the absence of a Kreol lemmatization tool and incomplete coverage of Kreol spelling variations, the models show promise for wider application in social media crime detection. Future research could explore expanding this approach to other languages and domains of social media crimes.
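
The combination of 100% precision with low recall reported for Naïve Bayes and SVM is typical of a classifier that flags very few comments on imbalanced data. A minimal sketch of that arithmetic follows; the confusion counts are illustrative assumptions chosen to reproduce a 35% recall, not figures taken from the paper.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical counts: 100 abusive comments, of which the model flags
# only 35, and every flagged comment is truly abusive (no false positives).
precision, recall = precision_recall(tp=35, fp=0, fn=65)
print(f"precision={precision:.0%} recall={recall:.0%}")
```

With no false positives the precision stays perfect even though most abusive comments are missed, which is why recall is the more informative number here.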

Clustering the Unlabeled Data Using a Modified Cat Swarm Optimization
Dewi, Deshinta Arrova; Kurniawan, Tri Basuki; Zakaria, Mohd Zaki; Armoogum, Sheeba
Journal of Applied Data Sciences Vol 5, No 3: SEPTEMBER 2024
Publisher : Bright Publisher

DOI: 10.47738/jads.v5i3.349

Abstract

This paper presents a modified version of the Cat Swarm Optimization (CSO) algorithm aimed at addressing the limitations of traditional clustering methods in handling complex, high-dimensional datasets. The primary objective of this research is to improve clustering accuracy and stability by eliminating the mixture ratio (MR), setting the counts of dimensions to change (CDC) to 100%, and incorporating a new search equation in the tracing mode of the CSO algorithm. To evaluate the performance of the modified algorithm, five classic datasets from the UCI Machine Learning Repository, namely Iris, Cancer, Glass, Wine, and Contraceptive Method Choice (CMC), were used. The proposed algorithm was compared against K-Means and the original CSO. Performance metrics such as intra-cluster distance, standard deviation, and F-measure were used to assess the quality of clustering. The results demonstrated that the modified CSO consistently outperformed the competing algorithms. For example, on the Iris dataset, the modified CSO achieved a best intra-cluster distance of 96.78 and an F-measure of 0.786, compared to 97.12 and 0.781 for K-Means. Similarly, for the Wine dataset, the modified CSO reached a best intra-cluster distance of 16399, improving on the 16768 recorded by K-Means. In conclusion, the modifications introduced to the CSO algorithm significantly enhance its clustering performance across diverse datasets, producing tighter and more accurate clusters with improved stability. These findings suggest that the modified CSO is a robust and effective tool for data clustering tasks, particularly in high-dimensional spaces. Future work will focus on dynamic parameter tuning and testing the scalability of the algorithm on larger and more complex datasets.
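
The abstract does not define intra-cluster distance. One common reading in the clustering literature, assumed here, is the sum of Euclidean distances from each point to its nearest cluster centre, where lower is better. The sketch below uses made-up toy points and centroids; the function name is hypothetical and this is not the authors' code.

```python
import math


def intra_cluster_distance(points, centroids):
    """Sum of Euclidean distances from each point to its nearest centroid."""
    return sum(min(math.dist(p, c) for c in centroids) for p in points)


# Toy data (not from the paper): two tight pairs of points,
# each pair sitting one unit above and below its centroid.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
centroids = [(0.0, 1.0), (10.0, 1.0)]
total = intra_cluster_distance(points, centroids)
print(total)
```

Under this reading, a lower total such as 96.78 versus 97.12 on Iris means the centres sit slightly closer to their assigned points.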

Breast Cancer Prediction Using Metrics-Based Classification
Armoogum, Sheeba; Dewi, Deshinta Arrova; Kezhilen, Motean; Trinawarman, Dedi
Journal of Applied Data Sciences Vol 5, No 3: SEPTEMBER 2024
Publisher : Bright Publisher

DOI: 10.47738/jads.v5i3.351

Abstract

Breast cancer remains the most prevalent form of cancer among women, with rising mortality rates worldwide. Early detection and accurate classification are crucial for improving patient outcomes, but manual detection methods are often time-consuming, complex, and prone to inaccuracies. This study aims to develop a machine learning (ML)-based desktop application to automate the detection and classification of breast cancer, thereby improving the efficiency and accuracy of diagnosis. Various ML algorithms, including Random Forest, Decision Tree, Support Vector Machine, Logistic Regression, Gaussian Naive Bayes, and K-nearest Neighbors, were employed to build classification models. The Wisconsin Diagnostic Breast Cancer (WDBC) dataset was used, and pre-processing techniques such as data cleaning, over-sampling, and feature selection were applied to optimize model performance. Experimental results demonstrate that the Random Forest classifier outperformed the other models, achieving an accuracy of 95.54%, precision of 96.72%, recall (sensitivity) of 95.16%, specificity of 96%, and an F1-score of 95.93%. These results highlight the potential of ML techniques in enhancing breast cancer diagnosis by offering a more reliable and efficient classification process. Future work could focus on improving feature selection techniques and applying the model to more diverse datasets for broader applicability.
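
The reported F1-score can be checked against the reported precision and recall, since F1 is their harmonic mean. The short sketch below uses only the two percentages quoted in the abstract; the function name is illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Precision 96.72% and recall 95.16%, as quoted for Random Forest.
f1 = f1_score(0.9672, 0.9516)
print(f"{f1:.2%}")  # prints 95.93%
```

The result agrees with the 95.93% F1-score stated in the abstract.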