Tuberculosis (TB) remains a major global health problem, especially in resource-limited settings where diagnostic tools are often inadequate. Conventional TB detection methods are slow and lack sensitivity, particularly for early-stage or low-bacterial-load cases. This study introduces a multimodal deep learning model for classifying Mycobacterium tuberculosis (MTB) that integrates sputum image segmentation across three color spaces: RGB; hue, saturation, and value (HSV); and CIELAB. The model uses YOLOv8 for real-time bounding-box detection and segmentation of TB bacteria, and grades the results according to the International Union Against Tuberculosis and Lung Disease (IUATLD) scale. The approach achieves an accuracy of 92.24% and a forecasting mean absolute percentage error (MAPE) of 0.23%. These findings indicate that the multimodal approach improves diagnostic accuracy and speed, providing a reliable tool for early TB detection.
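To make the two quantities named above more concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that grading starts from a count of acid-fast bacilli (AFB) over a number of microscope fields, and that MAPE is computed between reference and predicted counts; the abstract specifies neither. The function names `iuatld_grade` and `mape` are hypothetical. The grading thresholds follow the commonly cited IUATLD scale (scanty: 1-9 AFB per 100 fields; 1+: 10-99 per 100 fields; 2+: 1-10 per field; 3+: more than 10 per field), simplified here by applying them to the average count and ignoring the scale's minimum-field requirements.

```python
from typing import Sequence


def iuatld_grade(afb_count: int, fields_examined: int = 100) -> str:
    """Map an AFB count to an IUATLD-style grade.

    Simplifying assumption (not from the source): the count is normalised
    to AFB per 100 fields and the thresholds are applied to that average.
    """
    if afb_count < 0 or fields_examined <= 0:
        raise ValueError("afb_count must be >= 0 and fields_examined > 0")
    if afb_count == 0:
        return "negative"
    per_100_fields = afb_count * 100 / fields_examined
    if per_100_fields < 10:       # 1-9 AFB per 100 fields
        return "scanty"
    if per_100_fields < 100:      # 10-99 AFB per 100 fields
        return "1+"
    if per_100_fields <= 1000:    # 1-10 AFB per field on average
        return "2+"
    return "3+"                   # more than 10 AFB per field on average


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(iuatld_grade(45))                   # grade for 45 AFB in 100 fields
    print(mape([100.0, 200.0], [99.0, 202.0]))
```

In a pipeline like the one described, the AFB count would presumably come from the number of YOLOv8 detections per image; that connection is an assumption here.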