Found 3 Documents

Enhancing Autonomous GIS with DeepSeek-Coder: an open-source large language model approach
Nguyen, Kim-Son; Nguyen, The-Vinh; Nguyen, Van-Viet; Thi, Minh-Hue Luong; Nguyen, Huu-Khanh; Nguyen, Duc-Binh
International Journal of Electrical and Computer Engineering (IJECE) Vol 16, No 1: February 2026
Publisher : Institute of Advanced Engineering and Science

DOI: 10.11591/ijece.v16i1.pp423-436

Abstract

Large language models (LLMs) have paved the way for geographic information systems (GIS) that can solve spatial problems with minimal human intervention. However, current commercial LLM-based GIS solutions impose many limitations on researchers, such as proprietary APIs, high operational costs, and internet connectivity requirements, making them inaccessible in resource-constrained environments. To overcome these limitations, this paper introduced the LLM-Geo framework with the DS-GeoAI platform, integrating the DeepSeek-Coder model (the open-source, lightweight version deepseek-coder-1.3b-base) running directly on Google Colab. This approach eliminates API dependence, thus reducing deployment costs, and ensures data independence and sovereignty. Despite having only 1.3 billion parameters, DeepSeek-Coder proved highly effective, generating accurate Python code for complex spatial analysis and achieving a success rate comparable to commercial solutions. After an automated debugging step, the system achieved 90% accuracy across three case studies. With its strong error-handling capabilities and intelligent sample data generation, DS-GeoAI proved highly adaptable to real-world challenges. Quantitative results showed a cost reduction of up to 99% compared to API-based solutions, while expanding access to advanced geo-AI technology for organizations with limited resources.
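
The abstract mentions an automated debugging step but does not describe how it works. As a rough illustration only, the sketch below shows one common pattern for such a step: run the generated code, capture the traceback on failure, and hand both back to a repair function before retrying. The names `run_with_repair` and `stub_repair` are hypothetical and are not taken from DS-GeoAI; in a real system the repair function would call the model, and generated code would be executed in a sandbox.

```python
import traceback


def run_with_repair(code, repair, max_attempts=3):
    """Execute `code`; on failure, ask `repair(code, error_text)` for a fix and retry.

    Returns the resulting namespace and the number of attempts used.
    NOTE: exec() on generated code is unsafe outside a sandbox; this is a sketch.
    """
    for attempt in range(1, max_attempts + 1):
        namespace = {}
        try:
            exec(code, namespace)
            return namespace, attempt
        except Exception:
            if attempt == max_attempts:
                raise RuntimeError(f"code still failing after {max_attempts} attempts")
            # Feed the failing code and its traceback to the repair step.
            code = repair(code, traceback.format_exc())


def stub_repair(code, error_text):
    # Hypothetical stand-in for a model call: fixes the one typo in the demo below.
    return code.replace("aera", "area")


broken = "area = 3 * 4\nresult = aera * 2\n"
namespace, attempts = run_with_repair(broken, stub_repair)
print(namespace["result"], attempts)
```

Bounding the number of attempts keeps a persistently failing script from looping forever, which matters when each repair is a model call.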
Automated data exploration with mutual information in natural language to visualization
Luong, Hue Thi-Minh; Nguyen, Vinh-The; Nguyen, Van-Viet; Nguyen, Kim-Son; Nguyen, Huu-Khanh
IAES International Journal of Artificial Intelligence (IJ-AI) Vol 15, No 1: February 2026
Publisher : Institute of Advanced Engineering and Science

DOI: 10.11591/ijai.v15.i1.pp129-139

Abstract

Transcribing natural language to visualization (NL2VIS) has been investigated for years but still suffers from several fundamental limitations (e.g., feature selection). Although large language models (LLMs) are good candidates, they incur computational cost and their decisions are hard to trace. To alleviate this problem, we introduced an alternative information-theoretic framework that utilized mutual information (MI) to quantify the statistical relationship between utterances and database features. In our approach, kernel density estimation (KDE) and neural estimation techniques were utilized to estimate MI and to optimize a diversity-promoting objective balancing feature relevance and redundancy. We also introduced the information coverage ratio (ICR) to quantify the amount of information content preserved in feature selection decisions. In our experiments, we found that the proposed approach improved information-theoretic metrics, with an F1-score of 0.863 and an ICR of 0.891. We observed that these improvements did not come at the cost of traditional benchmarks: validity reached 88.9%, legality 85.2%, and chart-type accuracy 87.6%. Moreover, significance tests (p < 0.001) and large effect sizes (Cohen’s d > 0.8) further supported that these improvements were meaningful for feature selection. Thus, this study provides a mathematical framework for applications requiring analytical validity that extends beyond NL2VIS to other machine learning contexts.
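
To make the relevance-versus-redundancy idea concrete, here is a minimal sketch under stated assumptions. It substitutes a simple plug-in MI estimate on discrete values for the KDE and neural estimators named in the abstract, and it uses a generic greedy mRMR-style score (relevance minus weighted mean redundancy), which is an assumption rather than the paper's actual objective. The function names and the `redundancy_weight` parameter are illustrative.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in nats for two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # Sum of p(x,y) * log( p(x,y) / (p(x) p(y)) ) over observed pairs.
    return sum((c / n) * log(c * n / (px[x] * py[y])) for (x, y), c in pxy.items())


def select_features(features, target, k, redundancy_weight=1.0):
    """Greedily pick k features: high MI with target, low MI with features already chosen."""
    selected, remaining = [], list(features)

    def score(name):
        relevance = mutual_information(features[name], target)
        if not selected:
            return relevance
        redundancy = sum(
            mutual_information(features[name], features[s]) for s in selected
        ) / len(selected)
        return relevance - redundancy_weight * redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: the target is fully determined by "a" and "b"; "a_copy" duplicates "a".
features = {
    "a": [0, 0, 1, 1],
    "a_copy": [0, 0, 1, 1],
    "b": [0, 1, 0, 1],
    "noise": [0, 0, 0, 0],
}
target = [0, 1, 2, 3]
chosen = select_features(features, target, k=2)
print(chosen)
```

In the toy data, `a_copy` is as relevant as `a` but adds no new information, so the redundancy penalty steers the second pick toward `b`.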
From Feature Description to UML Architecture: A Novel Framework for Automated Reasoning and Multimodal Evaluation of Component and Deployment Diagram
Nguyen, Van-Viet; Nguyen, Huu-Khanh; Nguyen, Kim-Son; Luong, Thi Minh-Hue; Bui, Anh-Tu; Vu, Duc-Quang; Nguyen, The-Vinh
Journal of Information Systems Engineering and Business Intelligence Vol. 12 No. 1 (2026): February
Publisher : Universitas Airlangga

Abstract

Background: Unified Modeling Language (UML) is fundamental to software architecture, yet the automated generation of high-level diagrams remains underexplored. Specifically, Component and Deployment diagrams pose significant challenges due to their high abstraction and complex architectural dependencies, which are difficult to infer from natural language descriptions alone. Objective: This study aimed to develop and validate a novel, end-to-end framework to bridge the gap between natural language feature descriptions and executable UML architectural diagrams. The primary goal was to fully automate the pipeline, from requirement generation to robust, multimodal validation of the final visual outputs. Methods: A quantitative study was conducted using a three-stage automated pipeline. First, LLaMA 3.2-1B-Instruct generated diverse feature descriptions. Second, DeepSeek-R1-Distill-Qwen-32B performed advanced reasoning to synthesize executable PlantUML code for Component and Deployment diagrams. Finally, a novel multimodal validation framework was introduced, employing an ensemble of three vision-language models (Qwen2.5-VL-3B, LLaMA-3.2-11B-Vision, and Aya-Vision-8B) to quantitatively assess the fidelity of the generated diagrams against their source descriptions. Results: Our framework demonstrated high fidelity in accurately capturing both system modularity (Component diagrams) and runtime allocation (Deployment diagrams). The reasoning-driven synthesis by DeepSeek-R1 significantly outperformed baseline models in generating architecturally correct diagrams. The multimodal evaluation pipeline effectively reduced scoring bias by integrating diverse validation perspectives. A key outcome is the creation of a systematically generated benchmark dataset of architectural diagrams. Conclusion: This study successfully establishes the viability of an AI-driven pipeline for automated UML architecture generation and validation. It provides three key contributions: the first fully automated pipeline for this task, a novel multimodal validation method, and a public benchmark dataset. This work lays a foundation for practical, AI-powered software architecture modeling. Future work should extend this framework to encompass behavioral UML diagrams.