Computer Science and Information Technologies
Vol 4, No 2: July 2023

An ensemble approach for the identification and classification of crime tweets in the English language

Tooba Siddiqui (Department of Computer Science NED University, Karachi)
Saman Hina (Department of Computer Science NED University, Karachi)
Raheela Asif (Department of Software Engineering NED University, Karachi)
Saad Ahmed (Department of Computer Science IQRA University)
Munad Ahmed (Research Department MSM360.pk)

Article Info

Publish Date
01 Jul 2023

Abstract

Twitter is a popular social media platform that supports short posts limited to 280 characters. Users tweet about many topics, such as movie reviews, customer service, meals they just ate, and awareness posts. Tweets that carry information about crime scenes are crime tweets. Because crime tweets are crucial and informative, they need to be classified separately. Identifying and classifying crime tweets is a challenging task that has recently attracted researchers' interest, and researchers have tried different approaches to it. This research uses an ensemble approach for the identification and classification of crime tweets. The Tweepy and Twint libraries, which use contrasting methods for extracting tweets from Twitter, were used to collect the datasets. Several ensemble approaches were applied. The best performance, an accuracy of 96.2% on the testing dataset, was obtained with a term frequency-inverse document frequency (TF-IDF) vectorizer and a soft weighted voting classifier that combines logistic regression (LR), support vector machine (SVM), k-nearest neighbor (KNN), decision tree (DT), and random forest (RF) classifiers with weights of 1, 2, 1, 1, and 1, respectively.
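The soft weighted voting step named in the abstract can be illustrated with a minimal sketch. It assumes each of the five classifiers outputs a probability per class and that the ensemble takes the weighted average of those probabilities, which is the usual meaning of soft voting; the abstract does not give implementation details. Only the weights (1, 2, 1, 1, 1) come from the abstract. The function name `soft_weighted_vote` and all probability values are hypothetical and chosen for illustration.

```python
from typing import Dict, List, Sequence

# Weights as stated in the abstract: LR=1, SVM=2, KNN=1, DT=1, RF=1.
WEIGHTS: Dict[str, float] = {"LR": 1, "SVM": 2, "KNN": 1, "DT": 1, "RF": 1}


def soft_weighted_vote(probas: Dict[str, Sequence[float]],
                       weights: Dict[str, float]) -> List[float]:
    """Return the weighted average of per-classifier class probabilities."""
    total = sum(weights[name] for name in probas)
    n_classes = len(next(iter(probas.values())))
    return [
        sum(weights[name] * p[k] for name, p in probas.items()) / total
        for k in range(n_classes)
    ]


# Hypothetical probabilities for one tweet over (non-crime, crime).
# These numbers are made up for illustration and are not from the paper.
example = {
    "LR": [0.60, 0.40],
    "SVM": [0.20, 0.80],
    "KNN": [0.55, 0.45],
    "DT": [0.70, 0.30],
    "RF": [0.40, 0.60],
}

avg = soft_weighted_vote(example, WEIGHTS)
# Index of the class with the highest averaged probability.
predicted = max(range(len(avg)), key=avg.__getitem__)
print(avg, predicted)
```

In this made-up case three of the five classifiers lean towards non-crime, so a plain majority vote would pick non-crime, while the weighted average of probabilities picks crime because the SVM is both confident and double-weighted.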

Copyright © 2023

Journal Info

Abbrev

csit

Subject

Computer Science & IT Engineering

Description

Computer Science and Information Technologies (ISSN 2722-323X, e-ISSN 2722-3221) is an open access, peer-reviewed international journal that publishes original research articles, review papers, and short communications that will have an immediate impact on the ongoing research in all areas of Computer ...