ENGAGING THE AUDIENCE IN ONLINE YOUTUBE SCIENCE DISSEMINATION VIDEOS: A LOOK AT THE UPTAKE OF MULTIMODAL ENGAGEMENT STRATEGIES

This study deals with YouTube science dissemination videos as a pedagogical tool for ESP and EMI courses. These videos maximize the reach of scientific content, bringing it closer to the general public. However, in this process, YouTube science videos need to be adapted to non-specialized audiences to make their content more accessible, entertaining, and engaging. The introductions to these videos are paramount for their success, as they tend to contain many engagement techniques. In this study we explore the potential of multimodal engagement strategies by looking at audience uptake. Seven introductions to YouTube science dissemination videos were selected and annotated with the software Multimodal Analysis Video for engagement strategies and for embodied and filmic modes. The selected clips were then presented to a group of higher education students to elicit their opinion regarding their engagement potential. As a second step, engagement was related to the use of semiotic modes, thus triggering a reflection on the relevance of multimodal literacy. Our results suggest that filmic modes are an essential component in the creation of successful, fast-paced, attention-getting videos. They also bring to the fore the importance of multimodal awareness in science communication and the potential of well-orchestrated ensembles to engage online audiences. Implications for ESP and EMI courses are discussed.


INTRODUCTION
Influenced by a general globalization of the economy and knowledge, Higher Education (HE) has undergone a process of internationalization in the last two decades, resulting in many universities opening up to international students and promoting exchange programs (Fortanet-Gómez, 2013). This process has been of particular significance in Europe (Orduna-Nocito & Sánchez-García, 2022), and digital learning has been identified as a key asset for the future of student and teacher mobility (de Wit & Hunter, 2015). Together with access to technology, de Wit and Hunter (2015) also identify language proficiency as one of the barriers to be removed for HE internationalization to succeed. In this scenario, many researchers have described the emergence of English as a Medium of Instruction (EMI) (Pérez-Llantada, 2018) and the consolidation of English for Specific Purposes (ESP) (Aguilar, 2018) in European universities. This has led to an "Englishization" of education and institutions (Lanvers & Hultgren, 2018). This educational context has brought new challenges and the need to rethink current teacher training programs. In the case of EMI, content lecturers may now find themselves teaching in English to international and intercultural groups of students for the first time (Dafouz, 2018); and for both EMI and ESP lecturers, current research calls for a digitalization of lectures that acknowledges the importance of multimodal discourse and raises awareness of the intricacies involved in conveying meaning. In this sense, recent studies have explored YouTube science dissemination videos as complex multimodal resources (Valeiras-Jurado & Bernad-Mechó, 2022) with great engagement potential (Bernad-Mechó & Valeiras-Jurado, 2023), which can be employed in ESP and EMI classrooms to make real scientific content accessible to undergraduate students (Girón-García & Fortanet-Gómez, 2023).
Thus, this paper sets out to investigate YouTube science dissemination videos as an engaging educational resource to be introduced in HE, in both ESP and EMI contexts, and to contribute to the multimodal digitalization of the classroom. To do so, a multimodal approach is followed that draws on concepts from different schools of thought. Following Multimodal Discourse Analysis (O'Halloran, 2021), the semiotic resources (modes) used in these videos and their combinations in multimodal ensembles are explored. Furthermore, the concept of modal density from Multimodal Interaction Analysis (Bernad-Mechó, 2022; Norris, 2022) is applied. Modal density refers to the number of modes employed at a given moment and/or their intensity. Thus, a fragment will be modally dense if many modes are used simultaneously or if any of them takes on a salient role over the rest of the modes employed in that fragment. Moreover, this investigation adopts a perspective that has frequently been neglected: the audience uptake of the engagement potential of the materials, i.e., how capable they are of retaining the students' attention.
In this scenario, online videos have proved to be one of the preferred options for disseminating science (Erviti & Stengler, 2016; Kousha et al., 2012; León & Bourk, 2018). These videos bring about the challenge of adapting content to a non-specialized audience, a process that has been referred to as recontextualization in previous literature (Luzón Marco, 2019). In the case of science popularization through online videos, recontextualization means using new multimodal ensembles, i.e., combinations of semiotic modes, to convey scientific content in a way that is both accessible and engaging for the audience. This process of recontextualization can be realized by means of different strategies. Luzón Marco (2019) identifies four groups of these strategies, each with a different function: i) building credibility; ii) building persuasive arguments; iii) tailoring information to the assumed knowledge of the audience; and iv) engaging the audience. This last group is particularly relevant to this study, since this paper focuses on engagement uptake in online science videos. Carter-Thomas and Rowley-Jolivet (2020) further investigated such engagement strategies in the genre of Three Minute Thesis presentations. Among other things, they identify catchy titles, visual impact, various personalization devices, questions, humor and "street cred" (i.e., a common framework based on shared cultural values rather than scientific know-how) as engagement devices. As Luzón Marco (2019: 171) states, "the scientific discourse of formal academic genres is recontextualized in online science videos, harnessing the multimodal affordances of digital video". Online science videos thus bring academic language closer to the general audience using a number of recontextualization strategies that not only facilitate understanding but also make content engaging.
In this sense, as Girón-García and Fortanet-Gómez (2023) indicate, online science videos can be particularly useful as pedagogical materials. According to these authors, science popularization videos are, in fact, frequently introduced in the ESP classroom as they trigger previous knowledge on scientific topics, increase motivation and engagement, and contribute to the digitalization of the lectures.

YouTube as a platform for science dissemination
Among the many digital platforms for the dissemination of knowledge, YouTube has recently become a stage for educational purposes and science communication, transcending its initial entertainment and commercial purposes (Allgaier, 2020; Geipel, 2018). In fact, Osterrieder (2013) argues that YouTube affords fast and amplified dissemination of scientific content, thus reaching wider audiences. In this line, Snickars and Vonderau (2009) define YouTube as a database for archiving knowledge, which makes it an entity that fosters participatory culture (Boy et al., 2020; Burgess & Green, 2009). This expansion in the availability of and access to science, however, has also brought a rich offer of videos and, consequently, fierce competition for views. In this regard, much research has been conducted to discern the aspects that make YouTube science videos attractive. Welbourne and Grant (2016), for instance, look at factors affecting the popularity of videos. They argue that professionally generated videos usually receive more views than user-generated ones; however, the line between amateur and professional productions is blurry (Allgaier, 2020), and content creators seem to favor direct approaches that are closer to the audience and give a more amateur impression (Muñoz Morcillo et al., 2016), even if they are professionally produced. Welbourne and Grant (2016) also argue that higher rates of speech, i.e., speakers who speak faster, increase views. Muñoz Morcillo et al. (2016) look at the engagement potential of science dissemination presenters and bring to the fore the importance of narrative features, such as charismatic first-person accounts, the use of suspense, or climaxes, and of non-verbal features controlled both by the speaker (such as direct gaze to the camera) and by the production company (such as the type of editing, types of shots, lighting, and visual and sound prompts). Valeiras-Jurado and Bernad-Mechó (2022) refer to the latter as 'filmic modes'.
They explore the complexity of multimodal ensembles in YouTube science dissemination videos and highlight the central role played by these modes in the conveyance of meaning.
From a more statistical point of view, Saurabh and Gautam (2019) have explored analytic metrics on YouTube educational channels. They argue that the number of subscribers, likes, playlists in which videos are included, shares, and comments are key to the success of these channels. They also note that most videos are between 7 and 20 minutes long and that, although the length of the video seems to be relevant, a more important aspect is the percentage of the video that is viewed: the most successful videos retain attention for an average of 30-40% of their duration. Attention retention is, therefore, a key aspect to be taken into account by producers. Sabich and Steinberg (2017) distinguish three main windows of opportunity to engage the audience: the opening of the video, its development, and its end. They also identify a number of strategies (both verbal and non-verbal) to achieve this, such as direct gaze to the camera, the use of gestures, or the ways in which the main topic is presented, among others. Along the same lines, Muñoz Morcillo et al. (2016) reflect on the importance of catchy introductions to grab the attention of the audience in the very first seconds. Because research closely examining introductions is scarce (with the exception of the aforementioned studies), the engagement potential of introductions is precisely the main focus of the present paper.

The use of digital videos as pedagogical materials
The growing consumption of science on the Internet implies that the ways in which knowledge is transmitted have evolved, and this is especially true in the field of higher education (León & Bourk, 2018; Querol-Julián & Crawford Camiciottoli, 2019). Consequently, greater accessibility of teaching materials and resources in open repositories (i.e., Open Educational Resources, OER) has become necessary. These include, among others, web-based text resources, video-recorded lectures, online books and academic journals, and YouTube videos (Girón-García & Fortanet-Gómez, 2023). In the present study we focus on the latter. According to Erviti and León (2016), the combination of science and technology has become remarkably popular on YouTube. For this reason, such content has attracted the attention of recent scholarly research that seeks to unveil how science popularization takes place online, what the pedagogical potential of these online materials is, and how they can be used in the teaching of ESP in HE.
One of the studies looking into the potential of online science popularization is that of Muñoz Morcillo et al. (2016), who identify different types of online videos, such as edited talk, documentary, reportage, essay film, animation/cartoon, live drawing, fictional film and mockumentary. Among the recurrent resources they find in these videos, they cite the use of a narrator as voice-over and the use of close-ups as personalizing devices. Furthermore, Boy et al. (2020) suggest a different classification of videos that focuses on their modal configurations:
• presentation films, in which the speaker talks to the camera in a medium close-up shot and answers a number of scientific questions. In addition to embodied modes produced with the body, such as spoken language, gestures and facial expressions, these videos show filmic modes such as background images and visual effects;
• expert films, in which an expert discusses a topical field of research. These videos combine the speaker's image with moving image material;
• animation films, which show artificial moving images to illustrate a process, a problem, an issue, or a scientific theory; and
• narrative explanatory films, which may contain elements from the previous types, combining narrative and informative elements. They mainly use moving image material and have the highest number of cuts.
The videos that concern the present study could be classified as falling between the presentation and the narrative types. Understanding how (multimodal) communication is created within these types of videos, and being able to replicate it, should be one of the concerns of present-day lecturers. As Jewitt (2008) states, literacy is no longer a merely linguistic skill, and being multimodally literate is key to communicating successfully in the digital era.
Thus, a multimodally literate lecturer should have the skills to understand the multimodal ensembles at play in online science discourse and should be able to orchestrate and employ those semiotic resources themselves. In other words, they should understand the meaning potential and the affordances intrinsic to each mode in digital content production (O'Halloran & Lim, 2011). Furthermore, online science videos, in all their variety, provide an invaluable tool to meet the needs of today's ESP students, who need to find the content in these videos engaging and motivating. Exploring how this is made possible through the use of multimodal engagement strategies would be an important step forward in this line of research and would contribute to training lecturers to become aware of the multimodal nature of discourse and to be ready for present-day teaching.

Engagement
Getting and maintaining the attention of YouTube audiences, as argued above, is one of the main objectives of any content creator who wants their videos to succeed. This is even more the case in science dissemination: when communicating to non-specialized audiences, there is not only a knowledge gap but also an interest gap, both of which need to be bridged by communicators (Luzón Marco, 2019). For lecturers, likewise, motivating students is an essential part of the teaching process. In all these cases, engagement becomes central. Hyland (2005: 176) defines engagement as "an alignment dimension of interaction where writers acknowledge and connect to others, recognizing the presence of their readers, pulling them along with their argument, focusing their attention, acknowledging their uncertainties, including them as discourse participants and guiding them to interpretations". Engagement has been widely researched in digital academic settings such as Three Minute Theses (Carter-Thomas & Rowley-Jolivet, 2020; Jiang & Qiu, 2021), TED talks (Xia & Hafner, 2021) and even YouTube (Tafesse, 2020). In the present paper, our take on engagement is related to the capability of producers to attract and retain audiences' attention. In other words, we consider a YouTube science dissemination video engaging if it motivates viewers to keep on watching. Bernad-Mechó and Valeiras-Jurado (2023) devise a taxonomy for the study of engagement in science dissemination videos based on five main strategies:
-Emphasis: parts of the message are highlighted, for example, through intonation, repetition of some words, use of visuals, etc.
-Attention getting: speakers and producers aim to obtain and maintain the attention of the audience. Devices like visual and sound effects, or even gestures, may be used.
-Dialogic involvement: audiences are involved in a real or fictional dialogue with speakers or presenters. This is done by referring directly to them, either verbally or using gestures and/or other non-verbal resources.
-Humor: jokes, irony and other devices are employed. Non-verbal resources such as paralanguage, facial expression or visual and sound effects may contribute to the humorous effect.
-Control of responses: potential responses of the audience are predicted by speakers and steered in the desired direction. This is done, for example, by presenting information as assumed using intonation, facial expression, head movements, gestures, etc.
These authors also describe how these strategies are applied systematically in a YouTube video and how engagement is intrinsically multimodal, as combinations of embodied and filmic modes are skillfully constructed to realize these engagement strategies. All these papers focus mainly on describing how engagement occurs in online contexts; research that considers audiences directly, however, is scarce. Khan (2017) surveys 1,143 YouTube users and describes two main types of engaged users: active users, who turn to YouTube as a source of relaxing entertainment and who comment, share, like and dislike; and passive content consumers, also driven by relaxing entertainment motives, who may read comments looking for information but do not interact. All in all, Khan's (2017) study looks at engagement in terms of interaction and social participation on the platform. However, the reasons that lead the audience to maintain attention on videos are not explored. The present paper aims to fill that gap by looking at audience uptake and exploring the audience's perspective on how they are drawn into watching science dissemination videos. Furthermore, the multimodal nature of communication is taken into account to obtain a full overview of how engagement occurs. Against this background, two research questions were formulated to guide the study:
RQ1: What semiotic resources and engagement strategies contribute to making science dissemination videos appealing to HE students?
RQ2: How do the results of the reported audience uptake relate to those of the multimodal discourse analysis of engagement?

METHODOLOGY
To address these research questions, seven introductions to YouTube science dissemination videos were selected for analysis. As argued above, introductions are essential in getting and maintaining the audience's attention. These introductions are clearly marked in narrative terms and are commonly separated from the rest of the video by opening bumpers. All the data belonged to medium-length (7-to-12-minute) episodes uploaded to different YouTube channels managed by the Public Broadcasting Service (PBS) and SciShow. PBS is an American non-profit organization with a marked educational aim, which produces audio-visual content broadcast both on free television and online. SciShow is an audiovisual production company that manages a number of YouTube channels and podcasts to popularize science. Both entities are partially Patreon-funded¹ and produce similar types of science dissemination videos aimed at a general audience. The specific clips were selected because of the engagement potential shown in previous studies (...). Once the dataset was selected, the introductions were multimodally annotated using the software Multimodal Analysis Video (MAV) (O'Halloran et al., 2012). This software allows researchers to mark occurrences of given semiotic resources across the dataset and then obtain quantitative data on their use. For the annotation, we created a new library using Valeiras-Jurado and Bernad-Mechó's (2022) framework for the analysis of multimodality in online videos (see Table 2). Two main types of modes seem to be relevant in YouTube science dissemination videos. On the one hand, presenters need to be charismatic (Muñoz Morcillo et al., 2016), and this may be achieved multimodally through embodied modes.
Although the notion of embodiment is not clear-cut (Norris, 2004), for the purposes of this paper, embodied modes refer to those semiotic instantiations realized using the presenters' bodies; in this case: spoken language, paralanguage, gestures, gaze, proxemics, head movement, and facial expression. Furthermore, as argued in the introduction, YouTube videos may use a range of visuals (Boy et al., 2020) and editing resources (Muñoz Morcillo et al., 2016) as essential intrinsic features. Thus, filmic modes (semiotic resources that are controlled in the production and postproduction processes) were also considered: type of shot, angle, mise-en-scène (use of background), use of cuts, music, visual prompts, and sound and visual effects. In addition, assumed engagement potential was annotated using Bernad-Mechó and Valeiras-Jurado's (2023) framework for the analysis of engagement strategies in online videos (see Table 3). In particular, the strategies emphasis, attention getting, dialogic involvement, humor, and control of responses were taken into account. After the annotation of the dataset, the State Machine tool within MAV was used to obtain quantitative data on the modal density of each of the embodied and filmic resources, i.e., the percentage of the clip's duration during which a given mode is present, as well as on frequency, i.e., the number of occurrences within a clip. Finally, percentages for the occurrence of engagement strategies were also worked out. After the initial multimodal analysis, a task was carried out following Ruiz-Madrid and Valeiras-Jurado's (2020) methodology to discern HE students' uptake of the actual engagement potential of the introductions. These authors asked students to evaluate research pitches through activities designed to foster reflection on the use of semiotic modes and modal density. In this way, information on students' modal preferences was obtained.
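The two measures described above, density as the percentage of a clip during which a mode is present and frequency as a count of occurrences, can be illustrated with a short calculation. The sketch below is a minimal illustration under stated assumptions, not the internal implementation of MAV's State Machine tool: it assumes each annotation is stored as a hypothetical (start, end) pair in seconds, merges overlapping annotations so that simultaneous occurrences are counted only once (the convention adopted for total percentages in note 4), and expresses untimed modes such as cuts as instances per minute. All function names and the toy values are illustrative.

```python
from typing import List, Optional, Tuple

# Hypothetical representation: one annotation = (start, end) in seconds.
Interval = Tuple[float, float]


def covered_seconds(intervals: List[Interval]) -> float:
    """Time covered by the annotations, counting overlapping stretches once."""
    total = 0.0
    run_start: Optional[float] = None
    run_end: Optional[float] = None
    for start, end in sorted(intervals):
        if run_end is None or start > run_end:
            # A new, non-overlapping run begins: close the previous one.
            if run_end is not None:
                total += run_end - run_start
            run_start, run_end = start, end
        else:
            # Overlapping or touching annotation: extend the current run.
            run_end = max(run_end, end)
    if run_end is not None:
        total += run_end - run_start
    return total


def modal_density(intervals: List[Interval], clip_seconds: float) -> float:
    """Percentage of the clip during which the mode is present."""
    return 100.0 * covered_seconds(intervals) / clip_seconds


def instances_per_minute(n_instances: int, clip_seconds: float) -> float:
    """Frequency measure for modes that are not quantified in time (e.g. cuts)."""
    return n_instances * 60.0 / clip_seconds


if __name__ == "__main__":
    # Toy, hypothetical gesture annotations for an 80-second introduction.
    gestures = [(0.0, 10.0), (5.0, 20.0), (30.0, 40.0)]
    print(modal_density(gestures, 80.0))   # 37.5
    print(instances_per_minute(6, 80.0))   # 4.5
```

In this toy case the three annotations cover 30 seconds once the overlap between 5 and 10 seconds is merged, giving a density of 37.5%; simply adding the three durations (35 seconds) would overstate it as 43.75%.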
Twenty-three fourth-year students from the BA in English at Universitat Jaume I, Spain, took part in the study. These students were all between 20 and 25 years old and were enrolled in the course "Usages of Written English", in which notions of multimodal discourse analysis are introduced, so they were already familiar with, for instance, embodied and filmic modes. The students were asked to fill out a questionnaire consisting of two parts (see Appendix 1). First, the students were shown the seven introductions in the dataset and were asked to order them according to how likely they were to keep on watching the videos. They were also asked to briefly justify their choices. This part was done individually. Next, the students worked together in eight groups (self-arranged in previous classes) and were asked to reach a consensus about the engagement potential of the three best-voted introductions. From a qualitative point of view, the students were asked to reflect openly on the modes contributing to making the introductions engaging and on the ways in which engagement occurred. Then, to obtain quantitative results, the students rated the importance they attached to each of the embodied and filmic modes under scrutiny using a 5-point Likert scale. The session was followed by a brief explanation of each of the engagement strategies considered in the multimodal analysis (Table 3). Finally, also using a Likert scale, the students rated the extent to which the engagement strategies were present in the introductions and were asked, in an open question, about the relationship between modes and engagement strategies, i.e., which modes make certain strategies happen.
With these data, several analyses were carried out: the introductions the students found most engaging were identified; word clouds were created and examples provided to describe the reasons for choosing the most engaging introductions and the use of modes and engagement in the clips; and quantitative data were obtained on the relevance of embodied and filmic modes and engagement strategies. Lastly, the students' uptake was compared to the multimodal analyses conducted using MAV, and the results are discussed. To do this, the modal density obtained with MAV is compared to the average quantitative results from the Likert scales.

RESULTS AND DISCUSSION
In this section the results of the questionnaire distributed among the students are presented and interpretations of students' answers are provided. The students' answers contained in this questionnaire are further compared with the results of the initial multimodal analyses carried out on the selected introductions, in which embodied modes, filmic modes and engagement strategies were annotated. This comparison offers new insights into the audience's uptake of what is engaging. In other words, the results shed light on the way multimodal engagement strategies are perceived.

Vol. 11(2)(2023): 302-327

Ethnographic analysis: questionnaire
The first two questions inquired into which videos were most engaging and why. The students answered these questions individually. The video ranked as the most engaging was the one dealing with Linguistics, followed by the video about Anthropology, with the one about Biology in third place. Among the reasons provided by the students, the most recurrent were 1) that the topic is familiar (which explains why Linguistics was ranked the highest) and 2) that the video is visually appealing. Other frequently mentioned reasons were:
-the video was easy to understand
-it is dynamic
-it has an active presenter
-it features a great deal of storytelling
-it leaves the viewer with the feeling of wanting to know more (i.e., a cliffhanger)
The next questions were answered in groups, and at this point the students were asked to focus on the three videos ranked as the most engaging. The first question in this section of the questionnaire inquired about the use of modes and their contribution to making the video engaging. The students' answers showed a clear preference for filmic modes. Interestingly, as the following representative quotes illustrate, they also valued coherence across modes (i.e., that the combination of modes works seamlessly towards the expected communicative aim):
GROUP 1: "The drawings, the sounds, moving images, and colors. The fact that they connect each other with the presenter"
GROUP 2: "background has movement, not static"
The second question asked students to identify the engagement strategies used in the three videos. It is important to note that at this point students were familiar with Multimodal Discourse Analysis, but they had not yet received any explicit training on engagement strategies. Students were not provided with a framework for engagement strategies so as not to limit their perceptions of what is engaging.
No intervention from the authors was considered necessary, as the students would later be introduced to the framework of engagement strategies. As a consequence, it was sometimes difficult for them to distinguish clearly between a mode and a strategy. Be that as it may, their answers emphasized a preference for filmic modes and modal coherence. In addition, they also valued content tailored to the assumed knowledge of the audience, as indicated in the following answers:
GROUP 2: "keep in mind who your public is (to use one kind of terminologies or others)"
GROUP 6: "Easy language to make the viewer understand the video."
GROUP 7: "Use of examples to make the topic feel close to the audience."
Furthermore, they appreciated a fast pace to retain attention and efforts to interact with the audience. These trends are visually represented in the word cloud in Figure 2.

Figure 2. Word Cloud representing the students' view on engagement strategies
The following two questions had quantitative answers and asked students to quantify the use of modes and strategies respectively, rating their frequency on a Likert scale where 1 equals never and 5 equals always. For this quantification, students were provided with the analysis framework introduced in the methodology (Tables 2 and 3) in the form of a list of modes and strategies. Table 4 below summarizes the results for each of the videos. Interestingly, the video ranked as featuring the most intensive use of both modes and strategies is the one dealing with Anthropology (4.5/5 for embodied modes, 4.11/5 for filmic modes, and 4.3/5 for engagement strategies). The video on Linguistics was ranked second, and the video on Biology third. These results suggest a strong correlation between modal density (i.e., a more intensive use of modes) and perceived engagement: the videos voted first and second in the qualitative questions are also rated as featuring an intensive use of modes in the quantitative questions. In addition, they reveal a positive bias towards Linguistics as a discipline, which was ranked as the most engaging in the qualitative questions even though it does not feature the highest modal density in the quantitative results. This is not surprising, since the students are majoring in English Studies and naturally perceive this topic as closer to their interests. Likewise, a certain lack of multimodal awareness can be perceived in the fact that modal density does not seem to have a determining effect on students' initial appreciations of the videos and is only noticed after explicit training (i.e., once students are provided with an analysis framework). This is notable in the fact that embodied modes are quantified as relatively salient in the quantitative results (averaging 4.5/5 in the case of [...]).
Table 4. Importance attached to embodied and filmic modes and engagement strategies for the three best-ranked introductions
The final question asked students to reflect on the modal realizations of strategies. They seem to easily link certain strategies to certain modes, noticing that some modes are to some extent specialized for specific strategies. In other words, they show awareness of the affordances of each semiotic mode. For example, they associate the strategy attention getting with filmic modes and the strategy emphasis with both filmic and embodied modes, while the strategies dialogic involvement, control of responses and humor are linked to embodied modes. The following quote from the questionnaire illustrates this finding:
GROUP 2: "To emphasize we use embodied modes, but we can also use sounds and visual effects. For getting attention filmic modes are so relevant, such as visual prompts and sound and visual effects. Dialogic involvement and control of responses are aspects covered by the spoken language, as humour. However, humour can also be effected by facial expressions."
In fact, these views are closely aligned with previous multimodal studies of engagement in science dissemination videos (Bernad-Mechó & Valeiras-Jurado, 2023; Xia & Hafner, 2021) and with the multimodal analysis of the excerpts carried out by the authors. A detailed interpretation of the results of this analysis in light of the students' views follows in the next section. Table 5 summarizes the modal density results of the multimodal analysis of the two excerpts rated by students as the most engaging: Linguistics and Anthropology. Based on the MAV annotations, the density of each mode and each strategy is calculated as a percentage of the total duration of the clip. For modes that cannot be quantified in time (tempo, cuts, sound effects and visual effects), the number of instances per minute is considered.
4 Please note that many categories within embodied and filmic modes and within engagement strategies may overlap. For example, it is possible to raise one's eyebrows and smile at the same time, or to have a text-based and an image-based visual prompt at the same time. Similarly, a given instance may be both humorous and emphatic. Consequently, total percentages do not refer to the sum of the individual categories but to the total amount of time in which a given mode is used.
Table 5. MAV analysis of modal density in the use of embodied and filmic modes, and engagement strategies

The results show that the video featuring the highest modal density is the one on Anthropology. This is consistent with the answers provided by the students, who rated the use of embodied modes in this video at 4.5/5 and the use of filmic modes at 4.1/5. Overall, the students rated most modes as more relevant than our multimodal analysis shows in terms of density. Even acknowledging that the multimodal analysis allows for more accuracy than a 1-5 Likert scale, some differences are remarkable. For example, the multimodal analysis shows gestures being used 41.25% of the time in the Anthropology video, while the students rate them at 4.5/5 (which would correspond to 90% in terms of relevance). Similarly, according to the multimodal analysis, head movements are used 21.25% of the time, while the students again rate them at 4.5/5. Facial expression follows a similar trend (37.97% vs. 4.6/5). This seems to indicate that some modes are perceived as more salient regardless of their actual density in the ensembles. The trend is still present for filmic modes, although there seems to be more agreement in this case. For example, the analysis reveals a 97.47% use of visual prompts in the Anthropology video, which students rate at 5/5 in terms of importance. This high presence of visual prompts is also consistent with the students' qualitative answers, which show a preference for visual elements in videos. Interestingly, the trend is reversed in the case of music, which is present 98.86% of the time in Linguistics and 98.77% in Anthropology, but which students rate at only 3.25/5 and 3.75/5, respectively. This seems to indicate that music as a mode takes a background role and that its contribution to engagement is more notable in combination with other modes, when it becomes part of a multimodal ensemble.
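The comparisons in this paragraph read a mean Likert rating as a percentage by dividing it by the scale maximum (4.5/5 read as 90%). The Python sketch below reproduces that conversion for the Anthropology figures quoted above and reports the gap between perceived and measured use in percentage points. The function names are hypothetical, and the mapping simply follows the one used in the text.

```python
def likert_to_percent(mean_score, scale_max=5):
    """Read a mean Likert rating as a percentage, as the text does (4.5/5 -> 90%)."""
    return round(mean_score / scale_max * 100, 2)


def perception_gaps(rows):
    """Perceived minus measured use, in percentage points (positive = over-perceived)."""
    return {mode: round(likert_to_percent(rating) - density, 2)
            for mode, (density, rating) in rows.items()}


# Anthropology introduction: (MAV density in %, students' mean rating out of 5),
# copied from the figures reported in the text.
anthropology = {
    "gestures": (41.25, 4.5),
    "head movements": (21.25, 4.5),
    "facial expression": (37.97, 4.6),
    "visual prompts": (97.47, 5.0),
    "music": (98.77, 3.75),
}

# List modes from most over-perceived to most under-perceived.
for mode, gap in sorted(perception_gaps(anthropology).items(),
                        key=lambda item: item[1], reverse=True):
    print(f"{mode:<18}{gap:+8.2f}")
```

Dividing by 5 maps the lowest rating ("never") to 20% rather than 0%; a (score - 1)/4 rescaling would be a stricter alternative, although for these figures it would not change the direction of any of the gaps discussed above.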
When the figures resulting from the analysis of the two videos are compared, the difference in modal density pointed out earlier becomes even clearer. Regarding embodied modes, the analysis indicates that the Anthropology video makes approximately twice as much use of facial expressions (20.29% in Linguistics vs. 37.97% in Anthropology), head movements (10.23% vs. 21.25%) and gestures (21.84% vs. 41.25%), and more than three times as much use of prominence (11.49% vs. 38.75%). These differences across videos, although also reflected in the students' answers, are somewhat more diluted there. Concerning filmic modes, the analysis also reveals a considerably higher use of visual effects (10.34 vs. 16.50 instances/min.) and visual prompts (79.55% vs. 97.47%) in the Anthropology video, a difference that the students note in their answers to approximately the same extent as the analysis.
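The size comparisons in this paragraph are ratios of each Anthropology figure to its Linguistics counterpart. A quick Python check using only the values reported above gives ratios of roughly 1.9 to 2.1 for facial expressions, head movements and gestures, and roughly 3.4 for prominence; the dictionary layout and function name are simply illustrative choices.

```python
# (Linguistics, Anthropology) values as reported in the text.
figures = {
    "facial expressions (%)": (20.29, 37.97),
    "head movements (%)": (10.23, 21.25),
    "gestures (%)": (21.84, 41.25),
    "prominence (%)": (11.49, 38.75),
    "visual effects (per min.)": (10.34, 16.50),
    "visual prompts (%)": (79.55, 97.47),
}


def anthropology_to_linguistics_ratios(pairs):
    """How many times larger each Anthropology figure is than its Linguistics counterpart."""
    return {name: round(anth / ling, 2) for name, (ling, anth) in pairs.items()}


for name, ratio in anthropology_to_linguistics_ratios(figures).items():
    print(f"{name:<27}x{ratio:.2f}")
```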
From the previous results it may be argued that, among the modes perceived visually, filmic modes seem to play a leading role in both videos. Students note a higher presence of visual prompts and visual effects than of gestures, head movements and facial expression, which is also confirmed by the multimodal analysis. They also perceive differences in modal density more accurately when dealing with visual, filmic modes. Among aural modes, on the other hand, paralanguage (embodied) is ranked higher than music (filmic), and also higher than sound effects in the case of Anthropology, even though the multimodal analysis shows a higher presence of music in both videos.
When it comes to the overall use of strategies, the analysis shows that it is the Linguistics video that features a higher percentage of engagement (76.14% vs. 65%). The students rate overall engagement in the Linguistics video at 3.9/5. This corresponds to approximately 79%, which is fairly close to the 76.14% obtained in the analysis. The results for the Anthropology video are more divergent: the multimodal analysis rates it at 65%, while the students rate it at 4.37/5 (equivalent to 87.4%). Interestingly, the answers to these quantitative questions are not consistent with the qualitative answers: students voted the video on Linguistics as the most engaging (which the multimodal analysis confirms), but they ranked the use of engagement strategies as higher in Anthropology.
What these results seem to suggest is that quantity, in terms of both modes and strategies, is not the only factor determining engagement in online videos. There seems to be a threshold that influences engagement: once this threshold is reached, students may perceive videos as engaging. For instance, the engagement ratings for the introduction to the Biology video are rather low (3.18/5), even though it was regarded as the third most engaging introduction in the dataset. Once the videos are deemed attractive, it is probably the type of strategies employed and their modal realizations, rather than their modal density, that make a difference.
A comparison of the students' answers with the results of the multimodal analysis also reveals some interesting differences. One is that the students identified some humor in the Linguistics video that the analysis does not show. This is probably due to disagreement as to what constitutes humor, which is indeed an elusive concept, and points to the need to define taxonomies clearly. Another remarkable difference is that the students see more attention getting in the Anthropology video, while the analysis reveals a higher presence in the Linguistics video. A possible explanation for this is that attention getting relies more on visual, filmic modes in Anthropology, and these seem to have a more notable effect than embodied and aural modes. This claim, however, should be explored further in future research.
Concerning the density of each strategy, in the case of Linguistics the students list attention getting, emphasis, and dialogic involvement in order of frequency, which is consistent with the multimodal analysis. In the case of Anthropology, emphasis and attention getting are again the two most frequent strategies, according to both the students and the analysis. However, the students perceive more humor than dialogic involvement, while the analysis indicates the opposite. Once more, lack of clarity regarding the taxonomy might have influenced this result, but another plausible explanation is that humor, although less frequent than dialogic involvement, is perceived as more salient.

CONCLUSIONS
This paper has contributed to expanding previous research on YouTube science dissemination videos by exploring audience uptake in relation to the multimodal nature of such videos and to engagement. One of its most significant results is that visual modes emerge as a key aspect in getting and maintaining attention, which establishes a link between the relevance of modes and engagement potential. Also, when ethnographic and multimodal analyses are compared, the impact of modes on the audience becomes clearer. In particular, the results show that, in general, visual filmic modes are seen as more salient than aural filmic modes. Within embodied modes, the trend seems to be reversed: it is aural modes such as paralanguage that have the greater perceived impact on the audience. However, when modal density is examined in a fine-grained multimodal analysis, discrepancies are found between the ways in which some modes are perceived and their actual density. This is particularly noticeable in embodied modes, which are seen as highly relevant although their actual density is not as high. Norris (2004) refers to this as "modal intensity", in relation to instances where one mode takes on a special role and stands out over the rest of the modes. The same concept of intensity seems to apply to engagement strategies. An example is humor, which, although not relevant in terms of density, is perceived as particularly noteworthy. Also, the coherent realization of each strategy through filmic modes seems to be a more determining factor than density; this is the case with attention getting, which is remembered more clearly when realized through modes like visual prompts and visual effects. Finally, the quantitative analysis of the students' impressions points towards the existence of a threshold for engagement: the introductions need to reach a minimum degree of interest for the students to perceive them as fully engaging. Once this threshold is reached, the modal realizations of the strategies and the salience of each mode appear to play a more determining role.
These results have multiple pedagogical applications. In a context of ongoing internationalization of HE, where digital literacy is of paramount importance (de Wit & Hunter, 2015), being aware of the digital affordances of online materials becomes crucial. Science dissemination videos offer ample possibilities in EMI and ESP contexts (Girón-García & Fortanet-Gómez, 2023), and their engagement capacity is demonstrated in this study. Studies on the use of science dissemination videos in educational contexts could inform teacher and researcher training programs. This would, in turn, contribute to building the multimodal literacy of the participants in such programs and to achieving more efficient communication in EMI and ESP environments. For this purpose, lecturers first need to examine how semiotic resources are used for engagement purposes in science dissemination videos. This could help them make better-informed choices regarding the pedagogical use of these videos. Likewise, science dissemination videos may be used as models that help students transition to the role of research disseminators.
As is the case with most multimodal research, this study is limited in scope: only seven introductions were analyzed. Multimodal methodologies require highly detailed annotations, and the lack of tools for automated analysis prevents the use of larger corpora. Furthermore, it is important to point out that the background of the students participating in this research might have influenced their perceptions of engagement. In this regard, further studies including more varied audiences might help verify these results. Finally, this study has focused on the perceptions of engagement and the modal density of the introductions to science dissemination videos, but has not taken into consideration the verbal content of such introductions. This line of research could benefit from a deeper narrative analysis to unveil how engagement is raised and maintained, not only in the introductions but also throughout entire videos.