Abstract: Rice drying is a crucial post-harvest stage that affects the quality, shelf life, and economic value of rice. Conventional methods, such as sun drying and timer-based systems, remain widely used but adapt poorly to changing weather, which often lowers product quality. This study developed an automated rice drying control system based on multimodal deep learning that integrates visual images with weather sensor data. A YOLOv5 model detected grain conditions with 95% accuracy, while LSTM and Transformer models applied to the sensor data achieved accuracies of 90% and 93%, respectively. Multimodal integration raised control accuracy to 96% by driving an automatic roof opening/closing mechanism that responds to both weather conditions and grain moisture status. In testing, the system was more efficient than the baseline method, with an average drying time of 12 hours, a moisture content accuracy of approximately 96%, and 30% lower yield loss. These findings highlight the potential of multimodal deep learning to support precision agriculture and modernize post-harvest processing in Indonesia, and they open opportunities for developing similar systems for other food commodities in support of sustainable food security.
Keywords: Rice Drying, Intelligent Control System, Multimodal Deep Learning, Sensor Data, Visual Imagery
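The abstract states that the roof opening/closing mechanism responds to both weather conditions and grain moisture status, but it does not describe how the two modalities are combined. The sketch below illustrates one simple possibility, a rule-based late fusion, under stated assumptions: the class and function names (`ModalityOutputs`, `fuse_and_decide`, `RoofAction`), the two input probabilities, and the threshold values are all hypothetical and are not taken from the study, which may use a learned fusion model instead.

```python
from dataclasses import dataclass
from enum import Enum


class RoofAction(Enum):
    OPEN = "open"
    CLOSE = "close"


@dataclass
class ModalityOutputs:
    # Hypothetical per-modality outputs (assumptions, not the paper's interface):
    # p_grain_dry: probability that the imaged grain is dry enough,
    #              e.g. derived from an image detector such as YOLOv5.
    # p_rain:      probability of rain in the next time window,
    #              e.g. from a sequence model over weather-sensor readings.
    p_grain_dry: float
    p_rain: float


def fuse_and_decide(out: ModalityOutputs,
                    rain_threshold: float = 0.5,
                    dry_threshold: float = 0.8) -> RoofAction:
    """Rule-based late fusion with assumed, illustrative thresholds."""
    for name, p in (("p_grain_dry", out.p_grain_dry), ("p_rain", out.p_rain)):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"{name} must be a probability in [0, 1], got {p}")
    # Weather has priority: close the roof whenever rain is likely.
    if out.p_rain >= rain_threshold:
        return RoofAction.CLOSE
    # Grain judged dry enough: close to avoid over-drying and protect the harvest.
    if out.p_grain_dry >= dry_threshold:
        return RoofAction.CLOSE
    # Otherwise keep the roof open so drying continues.
    return RoofAction.OPEN


if __name__ == "__main__":
    print(fuse_and_decide(ModalityOutputs(p_grain_dry=0.3, p_rain=0.7)))
    print(fuse_and_decide(ModalityOutputs(p_grain_dry=0.3, p_rain=0.1)))
```

Giving the weather signal priority over the grain signal is a design choice made here for safety (rain damage is harder to undo than a short pause in drying); a learned fusion layer over both models' features would replace these fixed thresholds.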