Store sales forecasting based on historical data has been widely studied; however, most conventional approaches rely on a single time series and cannot readily capture the complex influence of external factors. Prior work suggests that deep learning can improve forecasting accuracy over traditional statistical methods, but it remains unclear to what extent multimodal integration, combining time series, economic, and categorical data, can enhance predictive performance in a dynamic retail context. This study develops and evaluates a multimodal deep learning model for store sales forecasting built with the Keras Functional API. The methodology combines daily transaction data, oil prices, holidays, and store information, which then pass through preprocessing, feature engineering, normalization, and time-window construction. Four architectures were compared (LSTM, 1D CNN, CNN+RNN, and Multiscale CNN), with performance measured by Mean Absolute Error (MAE). The results indicate that multimodal integration yields a significant improvement over single-source data, with the 1D CNN achieving the best performance (MAE = 57.4318). The discussion shows that integrating external variables such as oil prices and holidays makes predictions more robust, whereas the main remaining challenges are high computational cost and limited model interpretability. This study concludes that the multimodal deep learning approach contributes to the sales forecasting literature and has practical implications for the retail sector in inventory management, promotional planning, and data-driven decision-making.
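The normalization, time-window construction, and MAE evaluation steps can be illustrated with a minimal sketch. It assumes min-max normalization fitted on the training period only, a fixed-length sliding window, and a one-day-ahead target. The abstract does not specify the scaler, window length, forecast horizon, or exact feature columns, so these choices, the helper names (`min_max_scale`, `make_windows`, `mae`), and the toy numbers are hypothetical. The categorical store information, which a Keras Functional API model would usually receive through a separate input branch, is not shown.

```python
import numpy as np


def min_max_scale(train, values):
    """Scale `values` to [0, 1] using the min/max of `train` only (assumed scaler; avoids leakage)."""
    lo, hi = train.min(axis=0), train.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # guard against constant columns
    return (values - lo) / span


def make_windows(features, target, window):
    """Build sliding windows for one-step-ahead forecasting (hypothetical helper).

    features: array of shape (T, F), e.g. columns [sales, oil_price, is_holiday]
    target:   array of shape (T,), the series to predict
    Returns X of shape (T - window, window, F) and y of shape (T - window,),
    where y[i] is the target on the day immediately after window i.
    """
    n = len(features) - window
    if n <= 0:
        raise ValueError("series is shorter than or equal to the window length")
    X = np.stack([features[i:i + window] for i in range(n)])
    y = target[window:window + n]
    return X, y


def mae(y_true, y_pred):
    """Mean Absolute Error: average of |y_true - y_pred|."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))


# Toy multimodal daily data (synthetic values, not from the study).
sales = np.array([10.0, 12.0, 11.0, 13.0, 15.0, 14.0, 16.0])
oil = np.array([50.0, 51.0, 49.0, 52.0, 53.0, 52.0, 54.0])
holiday = np.array([0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0])
features = np.column_stack([sales, oil, holiday])  # shape (7, 3)

scaled = min_max_scale(features[:5], features)  # fit scaler on the first 5 days only
X, y = make_windows(scaled, sales, window=3)    # model inputs are scaled; target stays in raw units

# Naive "tomorrow equals the last observed day" baseline, used only to demonstrate MAE.
naive = sales[2:6]  # last raw sales value inside each of the 4 windows
print(X.shape, y.shape, mae(y, naive))  # prints: (4, 3, 3) (4,) 1.75
```

Fitting the scaler on the training period alone keeps information from later days out of the inputs; the resulting `X` has the `(samples, timesteps, features)` layout that Keras recurrent and `Conv1D` layers expect.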