The rapid development of information technology has led to exponential growth in the data generated by sectors such as healthcare, social media, information systems, and other digital activities. This growth has given rise to the concept of big data, which cannot be processed effectively with conventional data processing technologies; distributed computing platforms are therefore required to handle large-scale data storage and processing efficiently. Apache Hadoop is one of the most widely used big data technologies owing to its distributed architecture, which supports scalability, parallel processing, and fault tolerance. This study aims to analyze the architecture of Apache Hadoop and to explain the role of each of its components in supporting large-scale data processing. The research method is a qualitative literature study, conducted through a review of books, scientific articles, and related publications on Hadoop. The results indicate that Hadoop consists of three main components: the Hadoop Distributed File System (HDFS) as the distributed storage layer, MapReduce as the programming model for parallel data processing, and Yet Another Resource Negotiator (YARN), which handles cluster resource management and job scheduling. The integration of these components enables Hadoop to manage large-scale data reliably and in a distributed manner. However, Hadoop's batch-oriented processing model is less suitable for real-time workloads, so complementary technologies should be considered according to application requirements.
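To illustrate the MapReduce programming model referred to above, the sketch below outlines the classic word-count job written against Hadoop's MapReduce Java API: the map phase emits (word, 1) pairs for each token in the input, and the reduce phase sums the counts per word. This is a minimal sketch in the style of the standard Hadoop tutorial example; the class names and input/output paths are illustrative, and details may vary across Hadoop versions.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Illustrative word-count job (class names are hypothetical, not from the paper).
public class WordCount {

  // Map phase: tokenize each input line and emit (word, 1) pairs.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {

    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reduce phase: sum all counts emitted for the same word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {

    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // local pre-aggregation on each mapper
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // input directory (e.g., on HDFS)
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // output directory (must not exist)
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

In a typical deployment, such a job is packaged as a JAR and submitted with the `hadoop jar` command; HDFS supplies the input splits to the map tasks, and YARN allocates the containers in which the map and reduce tasks run, which is how the three components described in this study work together.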