Large language models (LLMs) have evolved rapidly and are highly effective at tasks such as text generation, question answering, and context-driven analysis. However, Islamic studies imposes unique requirements, where textual authenticity, diverse jurisprudential interpretations, and deep semantic nuance are critical, and these pose challenges for general-purpose LLMs. This article reviews the evolution of neural language models, comparing the historical progression of general-purpose LLMs with that of emerging Islamic-specific LLMs. We discuss the technical foundations of modern Transformer architectures and examine how recent systems such as GPT-4, DeepSeek, and Mistral have expanded LLM capabilities. The paper also highlights the limitations of standard evaluation metrics such as perplexity and BLEU in capturing doctrinal, ethical, and interpretative accuracy. To address these gaps, we propose specialized evaluation metrics that assess doctrinal correctness, internal consistency, and overall reliability. Finally, we outline a research roadmap for developing robust, ethically aligned, and jurisprudentially precise Islamic LLMs.