The increasingly multimodal design of English as a Foreign Language (EFL) textbooks has not been accompanied by sufficient empirical investigation into how visual elements systematically construct meaning and how students interpret these elements in classroom contexts. This study examines how multimodal literacy is realized through visual grammar in the English Grade X textbook used at MAN 2 Deli Serdang, Indonesia, and explores how students interpret its multimodal features. The research employed a qualitative descriptive design. Data were collected through a multimodal content analysis of selected textbook units, based on Kress and van Leeuwen's (2006) visual grammar framework, and through semi-structured interviews with ten EFL students. The analysis focused on the representational, interpersonal, and compositional meanings constructed through visual, linguistic, and spatial modes. The findings reveal that the textbook constructs meaning systematically through narrative and conceptual visual representations, through the strategic use of gaze and social distance to build interpersonal relations, and through compositional arrangements that highlight information value and salience. Students reported that these multimodal elements helped them understand abstract concepts, increased their engagement, and enhanced the perceived relevance of the learning materials. However, variations in their interpretations indicate that visual meaning-making is shaped by students' prior knowledge and literacy experience. These results suggest that multimodal design contributes significantly to the pedagogical effectiveness of EFL textbooks and should be integrated purposefully into instructional material development. The study contributes to multimodal literacy research by combining visual grammar analysis with students' interpretive perspectives in the Indonesian EFL context.
Copyright © 2026