The digitalization of elementary education demands that reading and writing instruction go beyond verbal texts and integrate visual, auditory, spatial, and interactive modes. This study synthesizes the direction in which Indonesian language learning media in elementary schools are being transformed and formulates an integrative framework for cognitively friendly multimodal literacy. It uses a Systematic Literature Review adapted from the PRISMA flow. Literature from 2021 to 2026 was searched through Google Scholar, ERIC, ScienceDirect, and SINTA using combinations of the keywords multimodal literacy, reading and writing, elementary school, and digital media. Of the 86 initial records, 21 peer-reviewed articles met the inclusion criteria and were analyzed through thematic content analysis. The synthesis yields four main findings: a shift in media from tools to environments for the production of meaning, the importance of aligning modes with literacy goals, the need to manage cognitive load in multimedia design, and the role of teachers as pedagogical mediators and curators of digital resources. Integrating multimodal social semiotics with the Cognitive Theory of Multimedia Learning provides a basis for balancing representational richness against students' information-processing capacity. The findings also indicate that the success of multimodal media depends on teacher readiness, device access, and relevance to the school's local context. The transformation of reading and writing media should therefore be directed toward critical, creative, inclusive, and measurable learning designs, so that students become not only users of media but also producers of multimodal digital texts.