The rapid expansion of large-scale Internet of Things (IoT) ecosystems generates massive volumes of heterogeneous multimodal data, creating challenges in scalability, data integration, privacy protection, and real-time intelligence. Traditional centralized learning architectures struggle with communication bottlenecks, privacy regulations, and the complexity of processing diverse modalities such as sensor signals, audio, video, text, and location streams. Federated learning (FL) provides a decentralized alternative, but existing FL models remain limited in handling multimodal inputs, coping with non-IID data distributions, and resisting adversarial threats. This study proposes a Federated Multimodal Learning Framework that combines probabilistic representation encoding, hierarchical mixture-of-experts fusion, cross-modal consistency regularization, and communication-efficient update scheduling. The framework enables distributed IoT devices to collaboratively learn multimodal representations without sharing raw data, which supports compliance with GDPR, HIPAA, and other privacy legislation. A probabilistic multimodal embedding mechanism reduces information leakage while supporting reliable cross-modal interactions, even when modalities are missing or imbalanced. Experimental results show that the proposed framework outperforms existing multimodal FL approaches: it achieves higher model accuracy, reduces communication costs by 40-70%, maintains privacy protection with minimal performance degradation, and is more robust to adversarial attacks. It also improves multimodal fusion quality, aligning heterogeneous data streams more effectively under federated constraints.
Overall, this research delivers a scalable, privacy-preserving, and adaptive solution for intelligent computing in modern IoT environments, providing a foundation for real-world applications in smart cities, industrial automation, healthcare monitoring, and next-generation distributed AI systems.