Hypertension is a major public health problem with steadily rising prevalence, and it is widely discussed on the social media platform X. Because social media data arrive as a continuous, fast-changing stream, they call for a Big Data processing approach that operates in real time and scales with load. This study implements a streaming-based Big Data architecture (the Kappa Architecture) using Apache Kafka and Apache Spark to process and analyze conversations about hypertension on X in real time. The proposed system integrates the X API as the data source, Apache Kafka as the immutable event log and streaming backbone, Apache Spark Structured Streaming as the real-time processing engine, and MongoDB as the serving layer. The research methodology comprises a literature review, system design, streaming-based data collection, real-time text cleaning and feature extraction, and a performance evaluation based on three metrics: throughput, latency, and success rate. A total of 10,000 tweets were collected over a two-month period and processed through a single unified streaming pipeline. The results show that the system established a consistent end-to-end processing workflow, ingesting and processing data in real time without separate batch and speed layers. The system achieved an average throughput of 19.23 tweets per second, a latency of approximately 520 seconds, and a success rate of 100%. The study concludes that the Kappa Architecture is effective, stable, and scalable for real-time processing and analysis of social media data when monitoring public health issues such as hypertension.
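The three evaluation metrics named above can be illustrated with a minimal Python sketch. The abstract does not give the study's own formulas, so the definitions here are assumptions: throughput is the number of successfully stored tweets divided by elapsed wall-clock time, latency is the mean time from ingestion to storage, and success rate is the percentage of ingested tweets that reach the serving layer. The `Record` class and the `evaluate` function are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Record:
    """One tweet's trip through the pipeline (hypothetical structure)."""
    ingested_at: float          # seconds; when the tweet entered the pipeline
    stored_at: Optional[float]  # seconds; when it reached the serving layer, None if it failed


def evaluate(records: Sequence[Record]) -> dict:
    """Compute throughput, mean latency, and success rate under the assumed definitions."""
    if not records:
        raise ValueError("no records to evaluate")

    # Only tweets that reached the serving layer count as successes.
    done = [r for r in records if r.stored_at is not None]
    if not done:
        return {"throughput": 0.0, "mean_latency": None, "success_rate": 0.0}

    # Elapsed time runs from the first ingestion to the last successful write.
    elapsed = max(r.stored_at for r in done) - min(r.ingested_at for r in records)

    return {
        "throughput": len(done) / elapsed if elapsed > 0 else float("inf"),  # tweets per second
        "mean_latency": sum(r.stored_at - r.ingested_at for r in done) / len(done),  # seconds
        "success_rate": 100.0 * len(done) / len(records),  # percent
    }


if __name__ == "__main__":
    # Toy data, not taken from the study.
    sample = [Record(0.0, 2.0), Record(1.0, 3.0), Record(2.0, None), Record(3.0, 4.0)]
    print(evaluate(sample))
```

As a quick arithmetic check under this assumed definition of throughput, 10,000 tweets at 19.23 tweets per second corresponds to about 520 seconds of elapsed processing time.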
Copyrights © 2026