Machine learning (ML) systems are increasingly deployed in cloud-native environments where scalability, portability, and resource efficiency are essential. Containerization with Docker, often orchestrated by Kubernetes, is a natural fit for ML inference services that must scale, migrate across hosts, and use resources efficiently. However, the performance of ML inference services in containerized cloud environments remains underexplored. This study investigates how various ML models perform when served from containers and identifies the major factors affecting inference performance. Several models are implemented with Python-based frameworks and deployed as microservices in Docker containers, and experiments are performed by sending simultaneous prediction requests from multiple users to the deployed models. The study establishes baseline benchmarks that quantify the impact of containerization on inference speed and efficiency. These results provide practical guidance for building scalable AI systems and lay the groundwork for future work, such as optimizing ML deployment pipelines, incorporating privacy-preserving inference techniques, and improving container orchestration for AI workloads.
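The experimental setup described above can be sketched in Python. This is a minimal, hypothetical illustration, not the paper's actual harness: it assumes a containerized model exposes an HTTP `/predict` endpoint on a published port, simulates simultaneous users with a thread pool, and summarizes the observed latencies. The endpoint URL, payload shape, and user count are placeholder assumptions.

```python
import concurrent.futures
import json
import time
import urllib.request

# Assumed endpoint of a model microservice published by a Docker container;
# the URL and feature payload are illustrative placeholders.
ENDPOINT = "http://localhost:8000/predict"
PAYLOAD = json.dumps({"features": [5.1, 3.5, 1.4, 0.2]}).encode()


def send_request(_):
    """Send one prediction request and return its latency in seconds."""
    req = urllib.request.Request(
        ENDPOINT, data=PAYLOAD, headers={"Content-Type": "application/json"}
    )
    start = time.perf_counter()
    with urllib.request.urlopen(req) as resp:
        resp.read()
    return time.perf_counter() - start


def summarize(latencies):
    """Reduce a list of per-request latencies to mean and p95 statistics."""
    ordered = sorted(latencies)
    return {
        "mean_s": sum(ordered) / len(ordered),
        "p95_s": ordered[int(0.95 * (len(ordered) - 1))],
    }


def benchmark(num_users=50):
    """Simulate num_users simultaneous clients against the deployed model."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=num_users) as pool:
        latencies = list(pool.map(send_request, range(num_users)))
    return summarize(latencies)
```

A thread pool is sufficient here because each simulated user is I/O-bound, waiting on the network; the container under test, not the load generator, is the component being measured.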
Copyright © 2025