Journal of Applied Data Sciences
Vol 5, No 3: SEPTEMBER 2024

Unveiling Criminal Activity: A Social Media Mining Approach to Crime Prediction

Armoogum, Sheeba
Dewi, Deshinta Arrova
Armoogum, Vinaye
Melanie, Nicolas
Kurniawan, Tri Basuki

Article Info

Publish Date
23 Sep 2024

Abstract

Social media platforms have become breeding grounds for abusive comments, which makes machine learning a practical necessity for detecting harmful content. This study predicts abusive comments in a Mauritian context, focusing on comments written in Mauritian Kreol, a language with few natural language processing tools. Four machine learning models were built and evaluated: Decision Tree, Random Forest, Naïve Bayes, and Support Vector Machine (SVM). The objective was to classify comments accurately as abusive or non-abusive. The models were trained and tested using k-fold cross-validation. The Decision Tree model performed best, with 100% precision and recall, followed by Random Forest with 99% accuracy. Naïve Bayes and SVM also achieved 100% precision, but their recall was much lower, at 35% and 16% respectively, a gap attributed to imbalanced data in the training set. Pre-processing steps, including stop-word removal and a custom Kreol spell checker, were key to improving model performance. The study makes a novel contribution by applying machine learning in a Mauritian context, demonstrating the potential of AI to detect abusive language in underrepresented languages. Despite limitations such as the absence of a Kreol lemmatization tool and incomplete coverage of Kreol spelling variations, the models show promise for wider application in social media crime detection. Future research could extend this approach to other languages and to other categories of social media crime.
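
The combination of 100% precision with low recall reported for Naïve Bayes and SVM is typical of a classifier that flags very few comments on an imbalanced dataset. The following is a minimal sketch of how the two metrics are computed. It is not the authors' code: the function name `precision_recall` and the toy labels are assumptions made for illustration only, and the counts are invented rather than taken from the study's data.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall) for the given positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Invented, imbalanced toy labels (NOT the study's data):
# 20 abusive comments (1) and 80 non-abusive comments (0).
y_true = [1] * 20 + [0] * 80
# A conservative classifier flags only 7 of the 20 abusive comments
# and never flags a non-abusive one.
y_pred = [1] * 7 + [0] * 13 + [0] * 80

precision, recall = precision_recall(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f}")  # precision=1.00 recall=0.35
```

In this toy case every flagged comment is truly abusive, so precision is perfect, yet 13 of the 20 abusive comments go undetected. This is why precision alone can overstate how useful a model is when the abusive class is rare.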

Copyright © 2024

Journal Info

Abbrev

JADS

Publisher

Subject

Computer Science & IT; Control & Systems Engineering; Decision Sciences, Operations Research & Management

Description

One of the current hot topics in science is data: how can datasets be used in scientific and scholarly research in a more reliable, citable and accountable way? Data is of paramount importance to scientific progress, yet most research data remains private. Enhancing the transparency of the processes ...