Analyzing Systemic Failures in IT Incident Management: Insights from Post-Mortem Analysis Arifiansyah, Faris; Handayati, Yuanita
Eduvest - Journal of Universal Studies Vol. 5 No. 4 (2025): Eduvest - Journal of Universal Studies
Publisher : Green Publisher Indonesia

DOI: 10.59188/eduvest.v5i4.51031

Abstract

The reliability of IT systems is critical for fintech companies, where service disruptions can lead to significant financial losses and reputational damage. Despite established incident management frameworks, recurring IT incidents persist, indicating systemic weaknesses in prevention, detection, and response. This study aims to identify the root causes of significant IT incidents, assess detection and resolution challenges, and provide actionable recommendations to enhance incident management. Using a qualitative approach, the research analyzed 26 post-mortem reports from an Indonesian fintech company (August 2023–2024), employing thematic analysis to categorize systemic failures. Findings revealed that 80% of incidents stemmed from internal changes, primarily due to inadequate testing, weak deployment controls, and misconfigured production settings, while 69% lacked proactive alerts, delaying detection. Incident response inefficiencies, such as slow escalations and insufficient post-fix monitoring, further prolonged resolution times. The study highlights the need for stricter change validation, standardized alerting mechanisms, and automated deployment checks to mitigate disruptions. These insights offer practical guidance for fintech and technology companies to reduce incident frequency, improve detection capabilities, and optimize response efficiency. The research contributes to the broader IT incident management field by empirically validating failure patterns in fintech environments and proposing data-driven solutions. Future research could explore AI-driven automation and organizational factors influencing incident handling.

Analyzing Systemic Failures in IT Incident Management: Insights from Post-Mortem Analysis Arifiansyah, Faris
Eduvest - Journal of Universal Studies Vol. 5 No. 5 (2025): Eduvest - Journal of Universal Studies
Publisher : Green Publisher Indonesia

DOI: 10.59188/eduvest.v5i5.51192

Abstract

The reliability of IT systems is crucial for technology-driven businesses, as service disruptions can lead to financial losses, operational inefficiencies, and customer dissatisfaction. Despite having an incident management framework, organizations still experience recurring IT incidents, indicating systemic weaknesses in incident prevention, detection, and response. This study aims to identify the systemic root causes of major IT incidents and to assess challenges in incident detection and resolution. By identifying recurring failure patterns, the research seeks to provide insights into improving IT incident management processes. The study takes a qualitative approach, applying thematic analysis to post-mortem reports of 26 major IT incidents that occurred at PT INUSA, a fintech company in Indonesia, between August 2023 and August 2024. Tags were assigned to categorize systemic failure points, and patterns were extracted to highlight deficiencies in software operations and incident management processes. Findings show that 80% of incidents were triggered by internal changes, with recurring issues such as insufficient testing, ineffective deployment and change control processes, and missing or misconfigured production settings. Additionally, 69% of incidents lacked proactive alerts, particularly on transaction success rates, CPU utilization, and system health metrics, leading to delayed detection. Incident response inefficiencies, including delayed incident reporting and slow debugging processes, further prolonged recovery times. The study highlights critical weaknesses in IT incident management and recommends improvements such as enhanced automated testing, stricter deployment validation, and standardized monitoring mechanisms. These insights provide guidance for fintech and technology companies to reduce incident frequency, improve detection capabilities, and optimize response efficiency.