JEECS (Journal of Electrical Engineering and Computer Sciences)
Vol. 11 No. 1 (2026): JEECS (Journal of Electrical Engineering and Computer Sciences) - In press

Self-Supervised Log Anomaly Detection with LogBERT-Style Transformers: Full Empirical Evaluation on a Reproducible SynHDFS Benchmark

Xin, Qi



Article Info

Publish Date
14 May 2026

Abstract

Log-based anomaly detection is a core problem in AIOps because system logs provide fine-grained evidence of failures, performance regressions, and security incidents. Recent work has shown that self-supervised sequence modeling generalizes substantially better than purely frequency-based detectors, especially when labeled anomalies are scarce. This paper presents a LogBERT-style transformer framework for session-level log anomaly detection and reports a complete, reproducible experimental evaluation. Because large archived log datasets could not be downloaded in our experimental environment, we construct a fallback benchmark, SynHDFS-6k, which mimics HDFS-style block workflows by composing normal execution patterns and injecting five realistic anomaly types. SynHDFS-6k contains 6,000 sessions with a fixed 5.0% anomaly rate (300 anomalous sessions) and a vocabulary of 20 event templates. We train a two-layer transformer encoder with masked language modeling on normal sessions only and derive an anomaly score from the pseudo log-likelihood (PLL), computed by masking each token position once. We compare against unigram and bigram probabilistic models, PCA reconstruction error, a one-class SVM, an isolation forest, a DeepLog-style GRU next-event predictor, and a supervised logistic regression upper bound. On the SynHDFS-6k test split, the proposed LogBERT-PLL achieves Precision = 0.615, Recall = 0.533, F1 = 0.571, ROC-AUC = 0.898, and PR-AUC = 0.594. We additionally analyze transformer scoring strategies (PLL mean, PLL top-k, PLL max, random masking, and Mahalanobis distance on the CLS embedding), report runtime and model-capacity trade-offs, and quantify detection behavior per anomaly type. The study provides an end-to-end blueprint for transformer-based self-supervised log anomaly detection under a fully specified protocol and highlights strengths and limitations that inform deployment on real-world HDFS logs.
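The PLL scoring step described above (mask each position once, then aggregate the per-event negative log-probabilities as a mean, top-k mean, or max) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the trained two-layer transformer is abstracted as a callable returning log P(x_i | session with position i masked), and a toy neighbour-count model stands in for it purely so the example runs. The names `pll_score`, `token_nlls`, `fit_neighbor_model`, the `strategy` values, and the default `k=3` are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

# Assumed model interface: log P(event at position i | session with position i masked)
MaskedLogProb = Callable[[Sequence[str], int], float]


def token_nlls(session: Sequence[str], masked_log_prob: MaskedLogProb) -> List[float]:
    """Mask each position exactly once and return its negative log-probability."""
    return [-masked_log_prob(session, i) for i in range(len(session))]


def pll_score(session: Sequence[str], masked_log_prob: MaskedLogProb,
              strategy: str = "mean", k: int = 3) -> float:
    """Aggregate per-event NLLs into one session score (higher = more anomalous)."""
    nlls = token_nlls(session, masked_log_prob)
    if strategy == "mean":      # PLL mean: average NLL over all positions
        return sum(nlls) / len(nlls)
    if strategy == "topk":      # PLL top-k: average of the k least likely events
        top = sorted(nlls, reverse=True)[:k]
        return sum(top) / len(top)
    if strategy == "max":       # PLL max: the single least likely event
        return max(nlls)
    raise ValueError(f"unknown strategy: {strategy}")


def fit_neighbor_model(normal_sessions: Sequence[Sequence[str]],
                       alpha: float = 0.1) -> MaskedLogProb:
    """Toy stand-in for the transformer (hypothetical): predicts a masked event
    from its left and right neighbours with add-alpha smoothing."""
    counts: Dict[Tuple[str, str], Counter] = defaultdict(Counter)
    vocab = set()
    for s in normal_sessions:
        padded = ["<s>", *s, "</s>"]
        for j in range(1, len(padded) - 1):
            counts[(padded[j - 1], padded[j + 1])][padded[j]] += 1
            vocab.add(padded[j])
    size = len(vocab) + 1  # one extra slot for templates never seen in training

    def masked_log_prob(session: Sequence[str], i: int) -> float:
        padded = ["<s>", *session, "</s>"]
        ctx = counts.get((padded[i], padded[i + 2]), Counter())
        total = sum(ctx.values())
        return math.log((ctx[session[i]] + alpha) / (total + alpha * size))

    return masked_log_prob
```

Because training uses normal sessions only, a reordered session such as `["E1", "E3", "E2", "E4"]` places every event in an unseen context and scores higher than the normal order under all three strategies. The max and top-k variants react to a single out-of-place event, whereas the mean dilutes it over long sessions.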
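For contrast with the transformer, the bigram probabilistic baseline named above can be illustrated as a smoothed transition model fitted on normal sessions, scoring a session by its average negative log-likelihood per transition. This is only a sketch: the abstract does not state the smoothing scheme or normalisation, so add-alpha smoothing, the `<s>`/`</s>` boundary markers, and the class name `BigramScorer` are assumptions.

```python
import math
from collections import Counter, defaultdict
from typing import Sequence


class BigramScorer:
    """Hypothetical bigram baseline: add-alpha smoothed event-transition model."""

    def __init__(self, alpha: float = 0.1):
        self.alpha = alpha
        self.trans = defaultdict(Counter)  # previous event -> counts of next event
        self.vocab = set()

    def fit(self, normal_sessions: Sequence[Sequence[str]]) -> "BigramScorer":
        for s in normal_sessions:
            padded = ["<s>", *s, "</s>"]
            self.vocab.update(padded[1:])  # every symbol that can follow another
            for prev, nxt in zip(padded, padded[1:]):
                self.trans[prev][nxt] += 1
        return self

    def score(self, session: Sequence[str]) -> float:
        """Mean negative log-likelihood per transition (higher = more anomalous)."""
        padded = ["<s>", *session, "</s>"]
        size = len(self.vocab) + 1  # reserve mass for unseen next events
        nll = 0.0
        for prev, nxt in zip(padded, padded[1:]):
            row = self.trans.get(prev, Counter())
            p = (row[nxt] + self.alpha) / (sum(row.values()) + self.alpha * size)
            nll -= math.log(p)
        return nll / (len(padded) - 1)
```

A bigram model sees only adjacent pairs, so it can miss anomalies that are locally plausible but globally inconsistent, which is the kind of long-range context a masked transformer is intended to capture.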
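The reported metrics can be computed from session scores and binary labels (1 = anomalous) as in the sketch below. The abstract does not state how the decision threshold was chosen or which PR-AUC estimator was used, so the explicit `threshold` argument and the use of average precision for PR-AUC are assumptions. ROC-AUC is computed as the probability that a random anomalous session outscores a random normal one, with ties counted as one half. The function names are hypothetical.

```python
from typing import Sequence, Tuple


def prf_at_threshold(scores: Sequence[float], labels: Sequence[int],
                     threshold: float) -> Tuple[float, float, float]:
    """Precision, recall, F1 when sessions with score >= threshold are flagged."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Rank-based ROC-AUC: P(anomaly score > normal score), ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """PR-AUC estimated as average precision (ties broken by sort order)."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    tp, ap = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            tp += 1
            ap += tp / rank  # precision at each rank where an anomaly is recovered
    return ap / sum(labels)
```

As a consistency check on the reported figures, F1 = 2PR / (P + R) with P = 0.615 and R = 0.533 gives 0.6556 / 1.148 ≈ 0.571, matching the stated F1.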

Copyright © 2026






Journal Info

Abbrev

jeecs

Subject

Computer Science & IT; Control & Systems Engineering; Electrical & Electronics Engineering

Description

We aim to promote high-quality research in Electrical Engineering and Computer Sciences among academics and practitioners alike, covering power systems, electrical engineering, industrial automation, mechatronics, computer sciences, informatics, and information systems. This journal is dedicated to the ...