Zhang, Hanqi
Unknown Affiliation

LLM-Driven CI Failure Diagnosis and Automated Repair: From GitHub Actions Logs to Patch Recommendation
Zhang, Hanqi
JTIE : Journal of Technology Informatics and Engineering, Vol. 4 No. 1 (2025): April
Publisher : University of Science and Computer Technology

DOI: 10.51903/jtie.v4i1.484

Abstract

Continuous Integration (CI) pipelines surface regressions early but also produce long, noisy logs. Diagnosing a failing GitHub Actions run and drafting a safe repair patch is time-consuming, especially when the cause is dependency drift or a configuration error. We study a practical CI-repair pipeline decomposed into three measurable tasks: (1) coarse failure-type classification, (2) retrieval-based repair (log similarity → reuse of the closest historical fix diff), and (3) constrained patch generation that emits a unified diff via template+slot filling. The pipeline follows the schema and task framing of JetBrains-Research’s lca-ci-builds-repair dataset from Long Code Arena (212 samples). Because runtime restrictions in our environment prevent downloading the original Hugging Face-hosted parquet files, all quantitative results in this paper are evaluated on a locally generated proxy dataset, CI-Repair-Sim212, which matches the benchmark’s field schema and evaluation protocol. On CI-Repair-Sim212, failure-type classification reaches a ceiling (Macro-F1=1.000), whereas repair-pattern prediction remains harder (Macro-F1=0.796 with log+workflow). For patch recommendation, retrieval achieves Token-F1@1=0.898 and Pattern@1=0.783 when logs are combined with workflow context, and constrained generation further improves diff similarity to Token-F1=0.923. Across tasks, adding workflow YAML context yields consistent gains. These results motivate hybrid CI assistants that prioritize retrieval when near-duplicate failures exist and fall back to constrained generation when no close match is available.
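To make the retrieval step and the Token-F1 metric concrete, the following is a minimal sketch, not the paper's implementation. It assumes a regex tokenizer, Jaccard similarity over log tokens, and a multiset-overlap definition of Token-F1; the abstract does not specify any of these. The names `tokenize`, `token_f1`, `retrieve_fix`, and the toy `HISTORY` records are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercased word-like tokens (hypothetical tokenizer, an assumption)."""
    return re.findall(r"[A-Za-z0-9_.]+", text.lower())


def token_f1(predicted_diff, reference_diff):
    """F1 over the multiset overlap of tokens in two diffs (assumed definition)."""
    pred = Counter(tokenize(predicted_diff))
    ref = Counter(tokenize(reference_diff))
    overlap = sum((pred & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def jaccard(a, b):
    """Jaccard similarity between the token sets of two log texts."""
    sa, sb = set(tokenize(a)), set(tokenize(b))
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def retrieve_fix(query_log, history):
    """Reuse the fix diff of the historical failure with the most similar log."""
    best = max(history, key=lambda item: jaccard(query_log, item["log"]))
    return best["diff"]


# Toy historical failures; invented for illustration only.
HISTORY = [
    {"log": "ModuleNotFoundError: No module named requests",
     "diff": "+requests==2.31.0"},
    {"log": "yaml syntax error in workflow file line 12",
     "diff": "-  run test\n+  run: test"},
]

if __name__ == "__main__":
    suggestion = retrieve_fix("ModuleNotFoundError: No module named numpy", HISTORY)
    print(suggestion)
    print(token_f1(suggestion, "+requests==2.31.0"))
```

Under the same assumptions, workflow context could be added by concatenating the workflow YAML to the log text before tokenizing. A hybrid assistant could fall back to constrained generation when the best similarity score is below a chosen threshold.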