The rapid expansion of digital communication platforms has increased the circulation of phishing messages, disinformation, and extreme opinions, creating urgent challenges for cybersecurity and social stability. This study proposes a hybrid CNN–BiLSTM–Transformer framework for the early detection of harmful digital text. The model combines convolutional feature extraction, bidirectional sequential modeling, and self-attention to capture local lexical patterns, contextual relations, and long-range semantic dependencies, respectively. The model was evaluated using accuracy, precision, recall, F1-score, and ROC analysis, with CNN, LSTM, and RoBERTa as baselines. The proposed hybrid model achieved the highest classification accuracy, 95.0%, compared with 86.0% for CNN, 88.0% for LSTM, and 91.0% for RoBERTa. It also obtained 90.0% precision, 93.0% recall, and a 91.5% F1-score, indicating that it limits false positives while maintaining strong detection sensitivity. In robustness testing, the F1-score declined only moderately, from 95.0% under normal conditions to 92.0% under noisy and 90.0% under adversarial text conditions. These findings indicate that the proposed hybrid architecture provides an effective and robust approach for supporting automated Cyber Early Warning Systems in detecting harmful digital content.
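As a reading aid, the reported metrics can be related with a few lines of arithmetic. The sketch below is illustrative only and is not the study's evaluation code: the helper names `f1_score` and `relative_drop` are hypothetical, and the numeric inputs are the figures quoted in the abstract, taken as reported. It computes the F1-score as the harmonic mean of precision and recall, and expresses the robustness results as relative decreases from the normal-condition score.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0  # convention: F1 is 0 when both precision and recall are 0
    return 2 * precision * recall / (precision + recall)


def relative_drop(baseline: float, degraded: float) -> float:
    """Relative decrease of `degraded` with respect to `baseline`."""
    return (baseline - degraded) / baseline


# Precision and recall as quoted in the abstract (assumed to be test-set values).
precision, recall = 0.90, 0.93
f1 = f1_score(precision, recall)
print(f"F1 from precision and recall: {f1:.3f}")

# Robustness F1-scores as quoted in the abstract, keyed by text condition.
robustness_f1 = {"normal": 0.95, "noisy": 0.92, "adversarial": 0.90}
for condition, score in robustness_f1.items():
    drop = relative_drop(robustness_f1["normal"], score)
    print(f"{condition}: F1={score:.2f}, relative drop={drop:.1%}")
```

With these inputs, the harmonic mean of 90.0% precision and 93.0% recall rounds to the reported 91.5%, and the noisy and adversarial conditions correspond to relative F1 decreases of roughly 3% and 5%.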