Cybersecurity is crucial to maintaining the integrity and availability of information systems, especially web servers, which are exposed to many types of attacks and anomalies. This research investigates the application of transfer learning to the classification of cyber attacks and anomalies on web servers. Transfer learning allows pre-trained models to be adapted to new tasks with limited data, offering an efficient way to detect malicious activity and unusual patterns in web server logs. The goal is to improve detection accuracy while reducing the time and resources required to train models from scratch. The study uses a bi-layer classification approach in which two pre-trained Transformer models, BERT and RoBERTa, are adapted through transfer learning to detect cyber attacks and anomalies in web server log data. The process comprises preprocessing the log data, extracting relevant features, and fine-tuning BERT to classify known attacks in the first layer, followed by RoBERTa in the second layer to detect unusual or unknown behaviors. Model performance is evaluated using accuracy, precision, recall, and F1-score, and the results are compared with those of RoBERTa and BERT used individually to highlight the advantages of the bi-layer transfer learning approach. The proposed bi-layer classification method improves performance in detecting cyber attacks and anomalies compared with using RoBERTa or BERT alone. By combining both models, the system is anticipated to achieve higher accuracy, better precision in identifying true threats, improved recall across a wider range of attacks, and a more balanced F1-score. This layered approach leverages the strengths of both RoBERTa and BERT, enabling more robust and reliable threat detection with fewer false positives and false negatives than single-model implementations.
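The control flow of the bi-layer scheme described above can be illustrated with a minimal Python sketch. It is not the study's implementation: the two layers are hypothetical keyword and length rules standing in for the fine-tuned BERT and RoBERTa classifiers, and the names `classify`, `binary_metrics`, `toy_layer1`, `toy_layer2`, and the label strings are assumptions made for illustration. Only the cascade (layer 1 for known attacks, layer 2 for everything layer 1 does not recognize) and the four evaluation metrics are taken from the text.

```python
from typing import Callable, Dict, Optional, Sequence

# Hypothetical interfaces: a real system would wrap a fine-tuned BERT
# (layer 1) and a RoBERTa model (layer 2) behind these signatures.
Layer1 = Callable[[str], Optional[str]]  # known-attack label, or None
Layer2 = Callable[[str], bool]           # True if the line looks anomalous


def classify(line: str, layer1: Layer1, layer2: Layer2) -> str:
    """Run the cascade: known attacks first, then anomaly detection."""
    label = layer1(line)
    if label is not None:
        return label
    return "anomaly" if layer2(line) else "normal"


def binary_metrics(y_true: Sequence[str], y_pred: Sequence[str],
                   negative: str = "normal") -> Dict[str, float]:
    """Accuracy, precision, recall and F1, treating any label other
    than `negative` as the positive (threat) class."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        t_pos, p_pos = t != negative, p != negative
        if t_pos and p_pos:
            tp += 1
        elif p_pos:
            fp += 1
        elif t_pos:
            fn += 1
        else:
            tn += 1
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / total if total else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def toy_layer1(line: str) -> Optional[str]:
    """Toy stand-in for the known-attack classifier (assumption)."""
    lowered = line.lower()
    if "union select" in lowered:
        return "sqli"
    if "<script" in lowered:
        return "xss"
    return None


def toy_layer2(line: str) -> bool:
    """Toy stand-in for the anomaly detector (assumption)."""
    return len(line) > 200 or "%00" in line


# Made-up log lines with made-up labels, for illustration only.
samples = [
    ("GET /index.html HTTP/1.1", "normal"),
    ("GET /item?id=1 UNION SELECT password FROM users", "sqli"),
    ("GET /search?q=<script>alert(1)</script>", "xss"),
    ("GET /" + "A" * 300, "anomaly"),
]
predictions = [classify(line, toy_layer1, toy_layer2) for line, _ in samples]
print(binary_metrics([label for _, label in samples], predictions))
```

Because the layers are passed in as plain callables, the same cascade can be evaluated with either layer disabled, which mirrors the single-model comparison mentioned in the abstract.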