Suicide is a pressing public health concern that affects both young people and adults. The widespread use of mobile devices and social networking has made data easier to gather, allowing researchers to assess the patterns, concepts, emotions, and opinions expressed on these platforms. This study aims to detect suicidal inclinations using a Reddit dataset, identifying people who express thoughts of suicide by analysing their posts. It evaluates several machine learning classification models, namely linear SVC, random forest, and ensemble learning, in combination with the feature extraction approaches TF-IDF, Bag of Words, and VADER. The ensemble model is a voting classifier, in which the predicted class is the one with the highest probability. The results suggest that ensemble learning with TF-IDF 2-grams yields the highest F1-score, 0.9315. The effectiveness of TF-IDF 2-grams can be attributed to their capacity to capture more contextual information and preserve local word order.
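The two ideas in the abstract, that 2-grams retain word order lost by single-word counts and that a voting ensemble picks the class with the highest probability, can be illustrated with a minimal sketch. It uses only the Python standard library and is not the study's implementation: the helper names (`ngram_counts`, `soft_vote`), the example sentences, and the per-model probabilities are all hypothetical, and averaging probabilities across models (soft voting) is an assumption about how the vote is combined.

```python
from collections import Counter


def ngram_counts(text, n):
    """Count word n-grams in a whitespace-tokenised, lower-cased text."""
    tokens = text.lower().split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def soft_vote(probabilities):
    """Average per-class probabilities over models; return the winning class index.

    `probabilities` is a list with one entry per model, each a list of
    class probabilities in the same class order (an assumed input format).
    """
    n_models = len(probabilities)
    n_classes = len(probabilities[0])
    averaged = [
        sum(model[c] for model in probabilities) / n_models
        for c in range(n_classes)
    ]
    return max(range(n_classes), key=lambda c: averaged[c])


# Two made-up sentences containing exactly the same words in a different order.
a = "i do not want to die i want to live"
b = "i do not want to live i want to die"

same_unigrams = ngram_counts(a, 1) == ngram_counts(b, 1)  # word order is lost
same_bigrams = ngram_counts(a, 2) == ngram_counts(b, 2)   # word order is visible

# Hypothetical probabilities from three models for classes [0, 1].
winner = soft_vote([[0.4, 0.6], [0.7, 0.3], [0.2, 0.8]])
```

In this sketch the unigram counts of the two sentences are identical while the bigram counts differ, which is the kind of order information a 2-gram representation can expose to a classifier. TF-IDF weighting of those counts is omitted for brevity.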
Copyrights © 2024