Insomnia is a widespread sleep disorder with a significant impact on physical health, mental health, and productivity. Early detection remains difficult, however, because its symptoms are hard to identify directly. This study uses a historical dataset of 13,950 tweets from 4,286 Twitter accounts (January 1–April 30, 2025) to predict potential insomnia with Natural Language Processing (NLP) and machine learning methods. Insomnia labels are assigned through an expert-verified, keyword-based approach, followed by preprocessing, temporal analysis, and sentiment analysis. Two classification models are used: Support Vector Machine (SVM), which is well suited to separating classes in high-dimensional data, and Long Short-Term Memory (LSTM), which is well suited to capturing sequential patterns and temporal context. In preliminary results, SVM reached 89% accuracy, with stronger recall on the non-insomnia class (precision 0.80, recall 0.97) than on the insomnia class (precision 0.92, recall 0.82). LSTM reached 90% accuracy and performed better on the insomnia class (precision 0.98, recall 0.86), while remaining comparable to SVM on the non-insomnia class (precision 0.81, recall 0.96). Because the two models had complementary strengths, they were combined with a probabilistic ensemble averaging method, which yielded 92% accuracy and improved both classes (non-insomnia: precision 0.82, recall 0.99; insomnia: precision 1.00, recall 0.88), making the ensemble more reliable than either single model for detecting potential insomnia.
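The probabilistic ensemble averaging step can be illustrated with a minimal sketch. The abstract does not specify the weights, the decision threshold, or how SVM scores were converted to probabilities, so everything here is an assumption: the function names `average_probabilities` and `predict_labels`, the equal weighting, the 0.5 threshold, and the example probabilities are all hypothetical. The sketch also assumes both models already output a probability of the insomnia class for each tweet (for an SVM this typically requires a calibration step such as Platt scaling).

```python
from typing import List, Sequence


def average_probabilities(
    p_svm: Sequence[float],
    p_lstm: Sequence[float],
    weight_svm: float = 0.5,  # assumed equal weighting; not stated in the abstract
) -> List[float]:
    """Weighted average of two models' insomnia-class probabilities."""
    if len(p_svm) != len(p_lstm):
        raise ValueError("Both models must score the same number of samples.")
    if not 0.0 <= weight_svm <= 1.0:
        raise ValueError("weight_svm must lie in [0, 1].")
    weight_lstm = 1.0 - weight_svm
    return [weight_svm * a + weight_lstm * b for a, b in zip(p_svm, p_lstm)]


def predict_labels(probs: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Map averaged probabilities to labels: 1 = insomnia, 0 = non-insomnia."""
    return [1 if p >= threshold else 0 for p in probs]


# Hypothetical probabilities for four tweets (illustrative values only).
svm_probs = [0.40, 0.90, 0.20, 0.55]
lstm_probs = [0.70, 0.80, 0.10, 0.35]

avg = average_probabilities(svm_probs, lstm_probs)
labels = predict_labels(avg)
```

In the first and fourth tweets the two models disagree when thresholded individually; averaging resolves each case in favor of the more confident model, which is one way an ensemble can outperform either model alone.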
Copyrights © 2025