This research examines toxicity, sentiment, and network dynamics in digital interactions, following the Cross-Industry Standard Process for Data Mining (CRISP-DM) methodology. With the rise of user-generated content, understanding and managing these elements is crucial for effective digital strategies. Data preprocessing, which corresponds to the CRISP-DM data preparation phase, involves text cleaning and feature engineering. Toxicity is measured with six metrics: Toxicity, Severe Toxicity, Identity Attack, Insult, Profanity, and Threat. In the CRISP-DM modeling phase, a Support Vector Machine (SVM) classifies sentiment polarity, and the Synthetic Minority Over-sampling Technique (SMOTE) addresses class imbalance. Social Network Analysis (SNA) techniques are also applied to study network structure, in keeping with the CRISP-DM modeling phase; the network data are processed to compute key SNA metrics, namely Diameter, Density, Reciprocity, Centralization, and Modularity. The findings show a Toxicity score of 0.06194 and a Severe Toxicity score of 0.00730. Identity Attack stands at 0.01107, Insult at 0.03803, Profanity at 0.04905, and Threat at 0.01359. The sentiment classifier achieves an accuracy of 97.94%, with a precision of 98.07%, a recall of 99.86%, and an F-measure of 98.96% for the positive class. The SNA metrics show a diameter of 4, a density of 0.000266, a reciprocity of 0.000000, a centralization of 0.001468, and a modularity of 0.999400.
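As a consistency check on the reported classification results, the following minimal sketch recomputes the positive-class F-measure from the stated precision and recall. It assumes the F-measure is the balanced F1 score (the harmonic mean of precision and recall); the abstract does not state the weighting, and the function name `f_measure` is illustrative only.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall.

    Assumption: the abstract's "f-measure" is F1 (beta = 1).
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Positive-class values reported in the abstract, expressed as fractions.
precision = 0.9807
recall = 0.9986

f1 = f_measure(precision, recall)
print(f"F-measure: {f1 * 100:.2f}%")  # prints "F-measure: 98.96%"
```

Under that assumption, the recomputed value agrees with the reported 98.96%.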
Copyright © 2024