Multimodal texts have received attention for their meaning affordances, which can facilitate learning and raise awareness of ideological meanings. However, how learners learn to make meaning by integrating intermodal relations between language and visual images, especially in the context of learning English as a foreign language (EFL), remains under-researched. This study investigated the emerging process of meaning-making in digital factual storytelling practices at a senior high school in Indonesia. Fifty-six students participated, and each produced a digital storytelling (DST) video on autobiography. The videos were analyzed using an intersemiotic complementarity framework to reveal the occurrences and typical patterns of intermodal meaning-making. The analysis focused on describing the emerging modes of making experiential meaning through intermodal verbal-visual relations. The findings show that meaning-making was predominantly constructed in an expository manner, through verbs within clauses used for identifying and describing. The meanings emerging from the multimodal affordances allow storytellers to follow the potential meanings projected by the images. Selecting digital images helps students explore words when telling a story, shifting the conventional genre of autobiography. This study indicates the need to emphasize the purpose of a text and to provide multimodal features that support the achievement of communicative purposes.