In response to the growing importance of English for Specific Purposes (ESP) in vocational higher education, multimodal teaching strategies have emerged as essential approaches to address diverse learning needs in the 21st century. This study explores the preparedness of English instructors in Indonesian vocational colleges to integrate multimodal strategies into their instructional practices. Using a convergent mixed-methods research design, the study collected data from ESP instructors across various majors via a semi-structured online questionnaire. The findings reveal that while most instructors demonstrate a positive disposition toward multimodal teaching and recognize its value in enhancing learner engagement, motivation, and 21st-century skills, significant challenges persist. These include limited access to digital tools, time constraints in lesson preparation, uneven student digital literacy, and difficulty in assessing multimodal outputs. Moreover, although multimodal teaching aligns with the Merdeka Curriculum, some instructors remain uncertain about how the curriculum articulates multimodal literacy. The study also highlights a preparedness gap stemming from inadequate training and institutional support. Nevertheless, instructors reported using a range of multimodal techniques, such as interactive media, group discussions, and digital projects, to foster communicative and task-based learning. The results underscore the need for structured professional development, institutional investment in digital infrastructure, and context-sensitive pedagogical models to enable sustainable multimodal integration. This research contributes to the growing body of literature on multimodal pedagogy by offering insights into instructors’ perceptions, challenges, and the enabling conditions needed to promote effective multimodal English teaching in vocational education settings.