Forest fires have become a serious global threat, with significant impacts on ecosystems, communities, and economies. Although remote sensing technology shows potential, time delays, limited sensor coverage, and low resolution reduce its effectiveness for real-time forest fire detection. Social media, by contrast, can serve as a multimodal sensor that provides multilingual text data with rapid, global coverage. However, extracting the location and time of forest fires from such text remains challenging because of limitations in available datasets and in model generalization. This study aims to develop a multilingual named entity recognition (NER) model that identifies the location and time entities of forest fires in social media texts such as tweets. Using a transfer learning approach, the XLM-RoBERTa architecture was fine-tuned on the general-purpose Nergrit corpus, whose 19 entity types were relabeled into 3 main entity types: location, date, and time. This approach significantly improves the model's ability to generalize to the disaster domain across multiple languages and noisy social media texts. With a fine-tuning accuracy of 98.58% and a maximum validation accuracy of 96.50%, the model offers disaster management agencies a novel capability to detect forest fires in a scalable, globally inclusive manner, enhancing disaster response and mitigation efforts.
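
The relabeling step, in which a fine-grained tag set is collapsed into location, date, and time, can be illustrated with a minimal sketch. The abstract does not state which of the 19 source entity types map to which target type, so the mapping below, the tag names (`GPE`, `FAC`, `DAT`, `TIM`, and so on), the BIO tagging scheme, and the helper name `relabel` are all assumptions made for illustration, not the study's actual configuration.

```python
# Hypothetical mapping from fine-grained source tags to 3 coarse types.
# The source-side tag names are assumptions, not taken from the study.
COARSE_MAP = {
    "GPE": "LOC",   # geopolitical entity -> location
    "LOC": "LOC",   # generic location    -> location
    "FAC": "LOC",   # facility            -> location
    "DAT": "DATE",  # date expression     -> date
    "TIM": "TIME",  # time expression     -> time
}


def relabel(tags):
    """Collapse BIO tags to the coarse label set.

    Entity types missing from COARSE_MAP are mapped to "O" (outside).
    An I- tag that does not continue a span of the same coarse type
    is promoted to B- so the output remains a valid BIO sequence.
    """
    out = []
    prev = None  # coarse type of the previous token, or None if it was "O"
    for tag in tags:
        prefix, _, etype = tag.partition("-")
        coarse = COARSE_MAP.get(etype)
        if tag == "O" or coarse is None:
            out.append("O")
            prev = None
            continue
        if prefix == "I" and prev != coarse:
            prefix = "B"
        out.append(f"{prefix}-{coarse}")
        prev = coarse
    return out


if __name__ == "__main__":
    fine = ["B-PER", "I-PER", "O", "B-GPE", "I-GPE", "B-DAT", "O", "B-TIM", "I-TIM"]
    print(relabel(fine))
```

Discarding unmapped types rather than keeping them as a fourth "other" class is one possible design choice; it keeps the label space small, at the cost of losing entities that might serve as context for the classifier.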