The rapid acceleration of Industry 4.0 has fundamentally reshaped industrial competency demands, producing a "skill mismatch" that contributes to structural unemployment in Indonesia. Addressing it requires effective labor market analysis, yet traditional analyses rely on rigid, retrospective survey methodologies that cannot capture these fast-paced dynamics in real time. This study addresses this gap with a novel data-driven pipeline that validates 2,688 web-scraped job advertisements against official national manufacturing registries: Statistics Indonesia (BPS) and the Mandatory Labor Report (WLKP). This registry-based validation safeguards data integrity by filtering out the 51.7% of postings that could not be verified, so that the analysis draws exclusively on legitimate firms within the verified manufacturing sector. A semantic approach using the Gemini Large Language Model (LLM) was implemented to extract the unstructured data, normalize it to the ESCO taxonomy, and categorize it. Unlike traditional NLP metrics, which often fail to capture functional relevance, the LLM-based approach preserves professional context. While automated exact matching against the rigid ESCO framework yielded low accuracy (24.3% for job titles; 9.8% for skills), expert validation confirmed high semantic accuracy of 81.5% and 85%, respectively. Strategic insights reveal a dual-track workforce structure: employers seek technical dexterity from vocational graduates for operational roles and strategic oversight from higher education graduates. Demand centers on operational excellence, with specialized digital skills varying by sector, such as CATIA for high-precision engineering in the automotive sector and Optitex for 3D digital workflows in the apparel industry.
This framework serves as an industrial demand blueprint for curriculum-industry alignment, while offering a synthesized scientific interpretation of the underlying labor market patterns.
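The abstract does not specify how postings are matched to the BPS and WLKP registries or how exact-match accuracy is scored. The Python sketch below illustrates both ideas under stated assumptions: registry validation is approximated as a lookup on normalized company names, and exact matching as verbatim label comparison. The function names, the legal-form token list, and the fictitious firms are hypothetical and purely illustrative; this is not the authors' implementation.

```python
import re
from dataclasses import dataclass

# Hypothetical legal-form tokens stripped before matching (assumption: registry
# and job-board names differ mainly in these affixes and in punctuation).
LEGAL_FORMS = {"pt", "tbk", "cv", "persero"}


def normalize_company(name: str) -> str:
    """Lowercase, drop punctuation, and remove common Indonesian legal-form tokens."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    return " ".join(t for t in tokens if t not in LEGAL_FORMS)


@dataclass
class Posting:
    company: str
    title: str


def validate_against_registry(postings, registry_names):
    """Keep postings whose normalized company name appears in the registry.

    Returns the kept postings and the share of postings filtered out.
    """
    if not postings:
        return [], 0.0
    registry = {normalize_company(n) for n in registry_names}
    kept = [p for p in postings if normalize_company(p.company) in registry]
    return kept, 1 - len(kept) / len(postings)


def exact_match_accuracy(predicted, reference):
    """Share of items whose predicted label equals the reference label verbatim."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not predicted:
        return 0.0
    return sum(p == r for p, r in zip(predicted, reference)) / len(predicted)


# Toy data with fictitious firm names (illustrative only).
registry = ["PT Contoh Manufaktur Tbk", "PT Garmen Nusantara"]
postings = [
    Posting("PT. Contoh Manufaktur", "Operator Produksi"),
    Posting("Garmen Nusantara", "Pattern Maker"),
    Posting("PT Tidak Terdaftar", "Admin Staff"),
    Posting("Agen Rekrutmen XYZ", "Helper"),
]
kept, dropped_share = validate_against_registry(postings, registry)
print(len(kept), dropped_share)  # 2 0.5
print(exact_match_accuracy(
    ["machine operator", "pattern maker"],
    ["production machine operator", "pattern maker"],
))  # 0.5
```

Verbatim comparison counts "machine operator" against "production machine operator" as a miss even though the two are semantically close, which illustrates why exact-match scores against ESCO can sit well below expert-judged semantic accuracy.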
Copyright © 2026