
The role of semantically related gestures in the language comprehension of simultaneous interpreters in noise

Pages 584-608 | Received 03 Apr 2023, Accepted 16 Apr 2024, Published online: 29 Apr 2024

ABSTRACT

Manual co-speech gestures can facilitate language comprehension, especially in adverse listening conditions. However, we do not know whether gestures influence simultaneous interpreters’ language comprehension in adverse listening conditions, and if so, whether this influence is modulated by interpreting experience, or by active simultaneous interpreting (SI). We exposed 24 interpreters and 24 bilinguals without interpreting experience to utterances with semantically related gestures, semantically unrelated gestures, or without gestures while engaging in comprehension (interpreters and bilinguals) or in SI (interpreters only). Tasks were administered in clear and noisy speech. Accuracy and reaction time were measured, and participants’ gaze was tracked. During comprehension, semantically related gestures facilitated both groups’ processing in noise. Facilitation was not modulated by interpreting experience. However, when interpreting noisy speech, interpreters did not benefit from gestures. This suggests that the comprehension component of SI, and specifically its crossmodal information processing, differs from other types of language comprehension.

1. Introduction

Manual co-speech gestures generally facilitate spoken language comprehension both in monolingual first language (L1) and second language (L2) settings (Dahl & Ludvigsen, Citation2014; Sueyoshi & Hardison, Citation2005; see Hostetter, Citation2011, for a review), although there is evidence of some individual differences in sensitivity to gestures (e.g. Özer & Göksun, Citation2020). Gestures have been shown to be especially conducive to comprehension in challenging language use, such as in adverse listening conditions (Drijvers & Özyürek, Citation2017, Citation2020; Holle et al., Citation2010; Obermeier et al., Citation2012; Rogers, Citation1978). A particularly challenging type of language use is Simultaneous Interpreting (SI) which involves simultaneous processing and comprehension of (spoken) language input in one language (Seeber, Citation2017) and production of language output in another. As such, SI is considered an instance of extreme language use (Hervais-Adelman et al., Citation2015). Recently, the rapid development of Remote Simultaneous Interpreting (RSI) has led to increasingly adverse listening conditions for interpreters, specifically because of poor sound quality (Caniato, Citation2021; CAPE, Citation2021; Garone, Citation2021). It stands to reason that noisy speech (i.e. speech that is more difficult to comprehend due to disturbances such as background noise or parallel speech) would make SI even more challenging, and that gestures might facilitate language comprehension in that context. Thus, one could expect simultaneous interpreters to benefit from co-speech gestures during language comprehension just like L1 and L2 speakers do, especially in adverse listening conditions. However, the fact that interpreters also produce verbal output while comprehending may modulate the effect of gestures on language comprehension. For instance, since interpreters are already engaging in two tasks, they may have fewer resources left to process visual cues. That said, no empirical data is available on the potential influence of gestures on language comprehension in SI in adverse listening conditions.

In this study, we therefore explored whether gestures facilitate language comprehension during SI in noisy speech, as compared to clear speech. Specifically, we compared comprehension of utterances accompanied by semantically related (representational) and semantically unrelated (in this case pragmatic) manual co-speech gestures during SI and during comprehension in simultaneous interpreters and in a bilingual group with no interpreting experience.

1.1. The role of gestures in language comprehension

A large body of evidence suggests that manual co-speech gestures, the natural manual movements language users make when they speak, can be communicative and facilitate spoken language comprehension in L1 settings (for reviews, see Hostetter, Citation2011; Kelly, Citation2017). Some theories suggest that these gestures are co-orchestrated with speech (Kendon, Citation2004), and even form an integral part of language (Kelly, Citation2017; Kelly, Ozyürek, et al., Citation2010; McNeill, Citation1985, Citation1992, Citation2005). Empirical priming studies lend support to this last theory. Participants’ processing of speech-gesture pairs that are either congruent (e.g. speech: “chop”; gesture: chop) or incongruent (e.g. speech: “chop”; gesture: twist) has revealed a bidirectional influence of gesture and speech, even when one modality is irrelevant to the experimental task (Kelly, Creigh, et al., Citation2010; Kelly et al., Citation2015). In these experiments, incongruent (and irrelevant) gestures disrupted comprehenders’ ability to identify speech targets, and incongruent (irrelevant) speech disrupted their ability to identify gesture targets. Crucially, the influence of gesture and speech was comparable across modalities. The Integrated-Systems Hypothesis, which was developed on this empirical basis, posits that gestures necessarily influence the processing of speech in comprehension, and that speech necessarily influences the processing of gestures (Kelly, Creigh, et al., Citation2010). This close link between gesture and speech has been further corroborated by research on the integration of speech and gesture in the brain. Event-related potential (ERP) studies suggest that the neural correlates of semantic processing of speech-accompanying gestures are similar to those of spoken words (Emmorey & Özyürek, Citation2014). For instance, Özyürek et al. (Citation2007) demonstrated that gestures and words that are incongruent with the previous sentence context elicit identical N400 effects (the N400 is a component associated with semantic processing, more specifically with the degree of integration difficulty with the preceding context: the easier a word can be integrated into a sentence, the smaller the amplitude of the N400; Holle & Gunter, Citation2007). Moreover, the processing of gesture and speech temporally overlaps during language comprehension, suggesting that the brain treats gesture and speech similarly during semantic integration (Özyürek et al., Citation2007). fMRI studies have also shown that gesture perception in a speech context involves the recruitment of brain areas known to be sensitive to semantic processing of linguistic information (Özyürek, Citation2018). More broadly, a network of brain regions that integrates gesture and speech has been identified (see Özyürek, Citation2014, for an overview). Taken together, these results support the assumption of a tight link between gestures and speech in L1 comprehension.

Evidence suggests that gesture and speech also interact with each other in L2 comprehension. In Sueyoshi and Hardison (Citation2005), access to gestures was associated with better language comprehension in low-proficiency L2 speakers, although it did not necessarily benefit comprehension in advanced L2 speakers. That said, the study did not include native speakers as a baseline, which makes it difficult to interpret. In Dahl and Ludvigsen (Citation2014), L2 speakers presented language comprehension patterns comparable to those of native speakers when they had access to gestures. Investigating the processing of meaning derived from gestures, Ibáñez et al. (Citation2010) confirmed the neuronal integration of visual cues and semantic processing in their L2 population (both high- and low-proficiency speakers), and showed that highly proficient L2 speakers presented neuronal responses similar to those of native speakers, while there were only modest effects in the low-proficiency group. Another study, on scientific information uptake in advanced L2 speakers (Kang et al., Citation2013), suggested that this population relied more on representational gestures than native speakers did, and that gestures helped them compensate for spoken information they might not fully follow. All these studies tested different proficiency levels and relied on different methodologies, which may explain the divergent results; to summarise, although gestures might not play exactly the same role as in L1 settings, they do seem to contribute to L2 comprehension.

However, most of these studies, both in L1 and L2 settings, investigated representational gestures, that is, gestures representing the properties of entities and events talked about, drawing on iconicity or similarity of shape, size, and movement (McNeill, Citation1992). For example, a speaker might make a large, circular gesture while saying, “It was a big, round one” (Church et al., Citation2007, p. 138). Such representational gestures typically express information that is semantically related to concurrent speech. Other gestures can express pragmatic aspects of speech such as rhythm, speech acts, stance, or aspects of discourse structure (Kendon, Citation1995, Citation2004). For example, a speaker might rotate both forearms outwards with extended fingers to a “palm up” position to display incapacity, powerlessness or indifference (Debras, Citation2017). Such pragmatic gestures have a more complex semantic relationship to concurrent speech than representational gestures, and their impact on semantic comprehension is less clear.

1.1.1. Gestures in adverse listening conditions

Co-speech gestures, particularly representational ones, have also been shown to be conducive to comprehension in adverse listening conditions. Manipulating noise levels, Rogers (Citation1978) showed that listeners attend more to (all) gestures when the spoken channel is not sufficient for comprehension. This was corroborated by an experimental study by Rimé et al. (Citation1988), cited in Gullberg and Holmqvist (Citation1999), using spoken material in languages more or less intelligible to the participants and investigating their fixations on the speaker’s gestures. The results indicated that the quality of speech influences listeners’ visual attention to gestures: the noisier the speech channel or the lower the comprehension, the more gestures were fixated. In a study on iconic gestures in noise, Holle et al. (Citation2010) found that participants who had access to speakers’ gestures displayed significantly better speech comprehension, and that the benefit of gestures was larger with a moderate signal-to-noise ratio compared to a good signal-to-noise ratio. What is more, visual cues benefitted comprehenders at all tested levels of signal-to-noise ratio, even when speech was barely intelligible.

Drijvers and Özyürek (Citation2017) found that representational gestures enhanced speech comprehension in native speakers in adverse listening conditions. Listeners benefitted from having both visible articulatory information and representational gestures present, as compared with having just visible articulatory information present, or having only auditory information present. In a similar set-up, Drijvers and Özyürek (Citation2020) found that the enhancement from gestures was similar, albeit smaller, for non-native listeners. In an ERP study, Obermeier et al. (Citation2012) found that disambiguating gesture information was taken into account by participants when ambiguous speech was embedded in babble noise but not in clear speech. The study reported in Drijvers et al. (Citation2019) showed a clear gestural enhancement effect in both native and non-native participants (of upper intermediate level), especially in degraded conditions. This enhancement was stronger for native listeners. While non-native listeners gazed more at (representational) gestural information than native listeners, interestingly, their gaze allocation to gestures did not predict gestural benefit during degraded speech comprehension. This suggests that visual attention to gestures and comprehension do not necessarily go hand in hand during L2 comprehension of noisy speech. In sum, (representational) co-speech gestures seem to facilitate language comprehension in adverse listening conditions for a range of different language users, even if the strength of the effect differs depending on skills.

1.2. Multimodality in SI and RSI

SI combines concurrent spoken language comprehension and production in two distinct languages. Arguably, its language comprehension component is likely to share common features with other (mono- and bilingual) language comprehension (Seeber, Citation2017). If it does, then interpreters would be expected to benefit from access to speakers’ co-speech gestures just like other language users. But it is also possible that the concurrent speech production typical of SI modulates any potential effect of co-speech gestures on language comprehension. Some previous studies have compared listening tasks to simultaneous interpreting tasks, revealing that, during SI as compared to “pure” listening comprehension, word recall was poorer (Gerver, Citation1974), pupil dilation was higher, implying a more difficult task (Hyönä et al., Citation1995), and comprehension was less accurate (Gieshoff, Citation2018) (but see Isham, Citation1994).

Simultaneous interpreters process auditory input while also having access to speakers’ visual cues, including their gestures (Galvão, Citation2013; Gieshoff, Citation2018; Seeber, Citation2017). In fact, they believe that seeing the speaker is necessary for successful interpretation (Bühler, Citation1985). More specifically, they consider manual gestures and facial expressions to be the most important visual cues to facilitate understanding and provide emphasis (Rennert, Citation2008). What is more, the fact that interpreters must be able to see the speakers, including their gestures, is enshrined in the working conditions issued by the International Association of Conference Interpreters (AIIC, Citation2007) and in ISO standards (International Organization for Standardization, Citation2016b, Citation2017). However, the potential role of gestures in SI remains to be examined empirically.

There is some evidence concerning the impact of seeing speakers (including their gestures but also proxemics, facial expressions, etc.) on SI (Anderson, Citation1994; Bacigalupe, Citation1999; Balzani, Citation1990; Rennert, Citation2008; Tommola & Lindholm, Citation1995). In most of these studies, however, seeing the speaker was not associated with a significant effect on performance. Additionally, the set-ups used did not make it possible to ascertain whether participants were (visually) attending to the visual stimuli, or which of the many visual cues present were processed. Therefore, it remains unclear whether interpreters specifically process speakers’ co-speech gestures, and if they do, whether co-speech gestures enhance language comprehension.

The issue has recently become very topical. In the wake of the Covid-19 pandemic, Remote Simultaneous Interpreting (RSI) has expanded very rapidly, leading to adverse listening conditions for interpreters (Caniato, Citation2021; CAPE, Citation2021). Practitioners are now increasingly confronted with echo and loud noise (Garone, Citation2021). At the same time, visual access to the speaker is no longer guaranteed (Spinolo & Chmiel, Citation2021), with speakers often being asked to turn off their camera when the sound quality is inadequate, although interpreters request continued visual access to speakers (AIIC, Citation2020). While these adverse listening conditions might be a temporary issue eventually resolved by technological improvements, RSI is likely to remain part of the interpreting landscape even after the Covid-19 pandemic (Seeber & Fox, Citation2021). That is, noisy speech will continue to affect simultaneous interpreters. Crucially, there is currently a gap in the literature concerning the potential impact of co-speech gestures on SI of noisy speech. It is therefore of practical importance to investigate whether co-speech gestures can facilitate interpreters’ language comprehension in this context, as they might constitute an important tool for interpreters during RSI.

1.2.1. Using eye-tracking to investigate multimodal processing in SI

Eye-tracking techniques allow for minimally-invasive recordings of perceivers’ visual behaviour towards gestures (Gullberg & Holmqvist, Citation1999). In gesture studies, eye-tracking has been used to probe the perception and processing of gestures (Beattie et al., Citation2010; Drijvers et al., Citation2019; Gullberg & Holmqvist, Citation1999, Citation2006; Gullberg & Kita, Citation2009; Özer et al., Citation2023). Gullberg and Holmqvist (Citation1999, Citation2006) notably established that in face-to-face interaction addressees look at a speaker’s face most of the time while gestures are mainly perceived through peripheral vision. That said, some characteristics, such as gestural holds and speakers’ fixations of their own gestures, attract addressees’ overt fixations more frequently.

In interpreting studies, eye-tracking has been used to investigate multimodality in SI (Seeber, Citation2011; Stachowiak-Szymczak, Citation2019). In an eye-tracking experiment, Stachowiak-Szymczak (Citation2019) found that visual input (pictures in that case) and auditory input were integrated during SI. Seeber (Citation2011) conducted an eye-tracking experiment relating interpreters’ fixations on visual information to speech. Small numbers (from 1 to 10) were manually gestured as they were uttered in speech, while large numbers were displayed on a screen next to the speaker. Although the face dominated as a locus of attention during SI, the data revealed that interpreters do attend to both gestured numbers and numbers displayed on a screen. More recently, Arbona et al. (Citation2023) used eye-tracking to probe the influence of representational gestures on comprehension during SI. Although the speaker’s face attracted the bulk of fixations, both experienced interpreters and bilingual controls attended to the speaker’s gesture space, and overt visual attention was modulated by the semantic speech-gesture relationship, with gestures semantically related to speech attracting more attention than semantically unrelated gestures. They also found that semantically related gestures were associated with faster reaction times (RTs) than semantically unrelated gestures, and this effect was not modulated by the SI task or interpreting experience. They concluded that co-speech gestures are part and parcel of bilingual language comprehension, including during SI.

To summarise, gestures and speech are tightly linked and influence each other, and representational gestures have been shown to facilitate language comprehension, especially in adverse listening conditions. SI involves concurrent language comprehension and production in two distinct languages. If representational gestures have the potential to facilitate language comprehension during SI, interpreters should be expected to benefit from access to such gestures like other perceivers. The recent rise of RSI, which comes with noisy speech, raises the question of whether co-speech gestures may facilitate language comprehension also during SI in noisy speech. The studies on visual access in SI conducted hitherto have presented a host of concomitant visual cues, and/or have not isolated the contribution of manual gestures. They have therefore not made it possible to establish a direct link between gestural visual cues and language comprehension in SI. The study by Arbona et al. (Citation2023), which specifically investigated semantically related co-speech gestures, suggests that these gestures do (positively) influence language comprehension during SI. However, that study probed clear speech, meaning that we lack empirical data on the potential influence of co-speech gestures on interpreters’ language comprehension in noisy as compared to clear speech.

1.3. The current study: overall research question and rationale

The current study aimed to investigate the potential facilitatory effects of semantically related representational gestures on simultaneous interpreters’ language comprehension in noisy as compared to in clear speech. The first experiment looked at task-contingent differences in the processing of audiovisual signals. It compared how interpreters comprehend (and integrate) audiovisual signals during SI and comprehension. More specifically, it examined the effect of gestures that are semantically related to concomitant speech (representational gestures; target gesture condition), gestures that are semantically unrelated to speech (pragmatic gestures; control gesture condition), and the absence of gestures (no-gesture condition) on comprehension in noisy as compared to clear speech. We expected that when gestures are semantically related to speech, their combination would be more easily processed than when gestures are not semantically related to speech, since in the latter case, the speech-gesture meaning relationship is less clear.

The second experiment looked at experience-contingent differences in processing audiovisual signals in noisy and clear speech. It compared processing during comprehension in two groups with different SI experience: an experimental group of professional simultaneous interpreters and a comparison group of bilinguals (professional translators). The aim was to disambiguate the findings from the first experiment and to determine whether interpreting experience modulates the processing and integration of gestures.

We tested two groups of 24 participants each. This sample size is in line with the interpreting studies literature. Moreover, given that the International Association of Conference Interpreters counts about three thousand members worldwide, all languages considered (AIIC, Citation2019b), this sample size enables us to draw conclusions about the studied population. Finally, a similar previous study using the same sample size yielded significant results (Arbona et al., Citation2023).

2. Experiment 1

In Experiment 1, we asked the following questions:

1. Do simultaneous interpreters integrate gestural information differently during comprehension in noisy speech as compared to clear speech?
2. Does the task performed (SI vs. comprehension) affect this integration differently in noisy compared to clear speech?
3. Do simultaneous interpreters visually attend to gestures differently in noisy compared to clear speech?
4. Does that visual attention correlate with their comprehension in noisy compared to in clear speech?

We hypothesised a facilitatory influence of semantically related gestures on comprehension in clear speech, and a potentially larger facilitative effect of semantically related gestures on comprehension in noisy speech; i.e. more integration of semantically related gestures during noisy speech. We also predicted potential differences in gesture integration depending on task, and potential differences in visual attention to gestures depending on noise condition.

2.1. Materials and method

2.1.1. Participants

Twenty-four professional conference interpreters participated in the studyFootnote1 (see Table 1). They were recruited via an e-mailFootnote2 describing the eligibility criteria. Interested individuals were invited to sign up for the experiment. Participants completed an adapted version of the Language Experience and Proficiency Questionnaire (Marian et al., Citation2007). All participants had normal or corrected-to-normal vision and reported no language disorders. Participants’ L1 was Spanish (A languageFootnote3), their L2 English (A, B or C languageFootnote4). Twenty-two of the 24 participants were either members of the International Association of Conference Interpreters (AIIC), or accredited by international organisations such as the United Nations, or both. The two remaining participants were professional conference interpreters based in Geneva.

Table 1. Background information provided in the language background questionnaire.

All participants gave written informed consent. The experiment was approved by the Ethics Committee of the Faculty of Translation and Interpreting at the University of Geneva. No participant was involved in the norming of the stimuli.

2.1.2. Task and materials

Participants were asked to either simultaneously interpret (SI task) or to watch (comprehension task) short video clips of a speaker uttering two sentences (e.g. “Look at the terrace! Last Monday, the girl picked the lemon”). The second sentence was either accompanied by a semantically related gesture, a semantically unrelated gesture, or no gesture. The video clip contained either clear speech or noisy speech. Participants then saw two drawings of an action verb on a screen: one target (e.g. picking a lemon) and one distractor (e.g. squeezing a lemon).

In a forced-choice task, participants had to select the drawing corresponding to the video by pressing a button. This picture-matching task was used to probe processing during both comprehension and SI. Accuracy and response times were recorded – there was no time limit to press the button. Since drawings express meaning differently from both speech and gesture, they enabled us to implicitly probe gesture content. Using eye-tracking, we also measured overt visual attention to gestures, operationalised as total visual dwell time (i.e. the total duration of gaze on a particular area of interest) on the speaker’s gesture space. This corresponded to a pre-defined area of interest in front of the speaker going from the speaker’s shoulders to her hips, since this is where speakers usually gesture (McNeill, Citation1992, p. 86). Monitoring what participants look at during the stroke (the meaningful part of the gesture, which displays the shape and the dynamics of the movement in the clearest way; Kendon, Citation2004) enabled us to examine the extent to which overt visual attention to gestures correlated with response accuracy and reaction times.

2.1.2.1. Speech

We created a first set of 30 utterances following one of two patterns: adverbial phrase of time, agent, action and patient (e.g. “Last Monday, the girl picked the lemon.”), or adverbial phrase of time, agent, action, preposition and indication of location (e.g. “Two weeks ago, the boy swung on the rope”). The main verb was the target word. A short introductory sentence (e.g. “Look at the terrace”) was added to ensure interpreters would start interpreting simultaneously before the onset of the target verb. The sentences had to meet several criteria. First, they were chosen so that it would be difficult to predict the upcoming words. Although such sentences do not necessarily reflect actual conference interpreting settings, they allowed us to prevent interpreters from using interpreting strategies, e.g. resorting to a set of usual, often-repeated phrases for opening or closing a meeting. Second, we had to ensure that sentences would be comparable across a series of psycholinguistic criteria (e.g. verb frequency, sentence structure, length and plausibility, as described below). Third, it was important to build sentences which referred to situations that could be realistically drawn, as we used pictures to assess comprehension. This made it difficult to use the kind of abstract content often discussed in conference interpreting settings.

We then created a second set of 30 sentences replacing the verbs with equally plausible target verb candidates (e.g. “Last Monday, the girl squeezed the lemon.” or “Two weeks ago, the boy climbed up the rope”). The resulting 60 sentences (word count: M = 11.7, SD = 0.9) were assigned to two matched stimulus lists. Target verb frequencies were obtained from the Corpus of Contemporary American English (Davies, Citation2008) and two lists of sentences were created to balance verb frequency. Mean verb frequency amounted to 61,417 (SD = 68,789) in List A and to 62,822 (SD = 79,292) in List B, with no significant difference across lists, p > .9. Sentence plausibility was rated separately by 26 Spanish speakers and 28 English speakers (n raters = 54) on a 6-point Likert-type scale (from 1, “very implausible” to 6, “very plausible”). Mean sentence plausibility (the average of the Spanish and the English rating) was 2.9 (SD = 1.0) in List A and 2.9 (SD = 0.8) in List B, with no significant difference across lists, p > .6. Raters who assessed sentence plausibility did not participate in the norming of pictures. All sentences used in the experiment are reported in Appendix B.
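
To make the list-balancing checks concrete, a minimal sketch in R is given below. The data frame and column names (stimuli, list, verb_frequency, plausibility) are our own illustrative assumptions, and a Welch t-test is only one way of obtaining comparisons of this kind; the paper does not specify which test was used.

# Hypothetical data frame `stimuli`, one row per sentence, with columns
# `list` (A/B), `verb_frequency` and `plausibility` (mean rating).
t.test(verb_frequency ~ list, data = stimuli)  # verb frequency across lists A and B
t.test(plausibility ~ list, data = stimuli)    # mean plausibility across lists A and B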

2.1.2.2. Gestures

We scripted manual gestures to accompany the sentences in the semantically related gesture condition and in the semantically unrelated gesture condition. Semantically related gestures were representational gestures corresponding to the action content of the target verb. For example, for “squeeze the lemon”, the speaker performed a squeezing gesture: “right hand: hand half open in front of speaker, palm facing down, then fist closing with a rotation of the wrist” (see Figure 1(a)). For motion verbs, semantically related gestures depicted path rather than manner of movement (i.e. they showed only a trajectory, e.g. going up, but did not provide information about manner of motion such as wiggling the fingers to indicate climbing).

Figure 1. Examples of gestures. (a) Character-viewpoint semantically related gesture for “squeezing”. (b) Semantically unrelated gesture for “squeezing”. (c) Character-viewpoint semantically related gesture for “slicing”. (d) Observer-viewpoint semantically related gesture for “swinging”.


Semantically unrelated gestures were instantiated as pragmatic gestures (Kendon, Citation2004) with no direct semantic relationship to the target verb. We used six manual handshapes: the Grappolo family (albeit for just one practice item; Kendon, Citation2004, pp. 229–238), the Open Hand Prone and Open Hand Supine families (Kendon, Citation2004, pp. 248–283), the “slice gesture”, the “power grip” (Streeck, Citation2008), and the “flick of the hand” (McNeill, Citation1992, p. 16). For example, for “squeeze the lemon”, the speaker performed the gesture called “Open hand prone (‘palm down’) – vertical palm”: after a backward and upward movement, the speaker’s palm and forearm end up in a vertical position and the palm of the hand faces directly away from the speaker (Figure 1(b)).

All 60 sentences were recorded audiovisually by a right-handed female speaker of North-American English in a sound-proof recording studio under controlled lighting conditions. Three versions of each stimulus sentence pair (including a short introductory sentence as described above) were recorded: one in which the speaker did not gesture while uttering the sentences (no-gesture condition), one in which the speaker performed a pragmatic hand gesture while uttering the target verb (semantically unrelated gesture condition), and one in which she performed a representational hand gesture while uttering the target verb (semantically related gesture condition). Sentences were read from a prompter placed in front of the speaker. The intended gestural movement for each clip was described to the speaker, but she was asked to perform her own version of it so that the gestures would be as natural as possible. All gestures were performed with the speaker’s dominant (right) hand. The mean duration of the audiovisual recordings was 4.8 s (SD = 0.4, range 3.7–5.9). Horizontally flipped versions of each video clip were created using Adobe Premiere Pro, so that the speaker also seemed to be gesturing with her non-dominant hand. This was done to balance out a potential right-hand bias.

Once the clips had been recorded, they were played frame by frame using AvidemuxFootnote5 and gestures were coded for several features to ensure that these were evenly distributed across the lists and conditions (see Appendix A).

Semantically related gestures were categorised either as “character-viewpoint” or “observer-viewpoint”, following McNeill (Citation1992). A character-viewpoint gesture incorporates the speaker’s body into gesture space, with the speaker’s hands representing the hands of a character: e.g. the speaker might move her hand up and down as if she were slicing meat herself (Figure 1(c)). In contrast, an observer-viewpoint gesture excludes the speaker’s body from gesture space, and the hands play the part of the character as a whole: the speaker might move her hand from left to right with a swinging movement to depict a character swinging on a rope (Figure 1(d)). Both lists contained the same number of observer-viewpoint and character-viewpoint gestures.

Gestures were further coded for their timing relative to speech to ascertain that the stroke coincided temporally with the spoken verb form. Verb duration was determined for each video clip by identifying verb onset and offset (as well as preposition offset for the sentences including an adverbial phrase of time, agent, action, preposition and indication of location; for this category of sentences, preposition duration was counted in verb duration). Mean verb duration was comparable between semantically related items (M = 508 ms, SD = 116) and semantically unrelated items (M = 510 ms, SD = 126). However, the mean verb duration of no-gesture items (M = 460 ms, SD = 126) was significantly shorter than that of both semantically related gesture items (p < .05) and semantically unrelated gesture items (p < .05), possibly because the coordination of speech and gesture slowed down production. Verb duration was comparable across lists.

Gesture stroke duration was determined for all gestures and included post-stroke-holds, when present. Mean stroke duration did not differ significantly between semantically related (M = 583 ms, SD = 122) and unrelated gestures (M = 618 ms, SD = 149). Stroke duration did not differ significantly either across lists (cf. Appendix A).

We further coded gestures for “single” or “repeated stroke”. In single stroke gestures the stroke is performed once, while in repeated gestures the stroke is repeated twice. Semantically related and unrelated items were comparable: both categories included 67% single stroke gestures (40 items) and 33% repeated stroke gestures (20 items). Lists were also balanced.

The place of gestural articulation was coded following an adapted version of McNeill’s schema of gesture space (McNeill, Citation1992, p. 89) as in Gullberg and Kita (Citation2009). The “center-center” and “center” categories were merged into one “center” category, while the “upper periphery”, “lower periphery”, etc., were merged into one “periphery” category. No gestures were performed in the “extreme periphery” category. Place of articulation was thus coded as either “center”, “periphery” or “center-periphery”. Lists were balanced in terms of place of articulation, but semantically related and unrelated items differed, as semantically unrelated items were mostly articulated in the “center-periphery” area (65%, 39 items) whereas semantically related gestures were mostly performed centrally (60%, 36 gestures).

Gestures were also coded for the complexity of trajectory. Straight lines in any direction were coded as a “simple trajectory” and more complex patterns were coded as “complex trajectories” (e.g. when the stroke included a change of direction). Semantically related and unrelated items were comparable: both categories included 80% simple trajectories (48 gestures) and 20% complex trajectories (12 gestures). Lists were also balanced.

All gestures used in the experiment are described in Appendix B.

2.1.2.3. Noise

We edited the video clips with Adobe Premiere Pro to create parallel versions with noise. We used demodulated multi-speaker babble created by Amos (Citation2020); this corresponds to a constant noise with the same frequency spectrum as six-speaker babble. The multi-speaker babble (ICRA 7) was originally taken from the ICRA databaseFootnote6 (Dreschler et al., Citation2001). The number of speakers in this noise was not pre-tested. It was not possible to set one single absolute signal-to-noise ratio since we were working with video clips from different recording sessions and varying stress within and across video clips. Each video clip was therefore edited individually: the demodulated multi-speaker babble was added as a supplementary track in each timeline and noise levels were adjusted according to the audio characteristics of each video clip, using the audio track mixer. More specifically, we used the visual representation of levels (dynamic peaks) of each video clip’s original sound to ensure that the dynamic peak of the noise would not exceed the dynamic peak of the original audio. Importantly, while the target verbs were the most stressed words in most of the clips, the less stressed words had to remain audible; we therefore set noise levels according to the least audible words in the video clips. As a result, while the noisy clips included a constant, distracting noise, all words were still detectable. The average signal-to-noise ratio of the video clips was −5.7 dB. A pilot study was run with three L1 Spanish interpreting trainees to avoid floor and ceiling effects, i.e. to ensure that the noise level made processing more difficult but still allowed participants to engage in comprehension and simultaneous interpreting. The analysis of the behavioural data and of the interpreting recordings confirmed that both tasks were manageable at the chosen noise level, and that performance was neither at floor nor at ceiling.
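
As a rough illustration of what an average signal-to-noise ratio of −5.7 dB corresponds to, the following R sketch computes an RMS-based SNR for a single clip. The vectors speech and babble are hypothetical waveforms; the actual levels were set per clip in Adobe Premiere Pro using dynamic peaks, not computed this way.

# RMS-based SNR in dB for two hypothetical, equally long waveforms.
rms <- function(x) sqrt(mean(x^2))
snr_db <- function(speech, babble) 20 * log10(rms(speech) / rms(babble))
# A noise RMS roughly twice the speech RMS gives an SNR of about -6 dB.
snr_db(speech = rnorm(48000, sd = 1), babble = rnorm(48000, sd = 2))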

2.1.2.4. Pictures

Black-and-white line drawings corresponding to the actions depicted in the target verbs were taken from the IPNP database (Szekely et al., Citation2004). However, since more drawings were needed, most of the pictures used in the present experiment were created by an artist using the same style. The drawings were normed for name and concept agreement, familiarity and visual complexity as in Snodgrass and Vanderwart (Citation1980) by 11 L1 English speakers. Pictures that did not yield satisfactory measures were redrawn and normed by 10 L1 English speakers. Ten L1 Spanish speakers were then asked to norm the selected pictures using the same measures. A sweepstake incentive of 50 CHF (for each language group) was made available.

Raters were asked to identify pictures as briefly and unambiguously as possible by typing in the first description (a verb) that came to mind. Concept agreement, which takes into account synonyms (e.g. “cut” and “carve” are acceptable answers for the target “slice”) was calculated as in Snodgrass and Vanderwart (Citation1980). Only picture pairs with concept agreement of over 70% were used. The same raters judged the familiarity of each picture, that is the extent to which they came in contact with or thought about the concept (rather than the way it was drawn). Concept familiarity was rated on a 5-point Likert-type scale (from 1 = “very unfamiliar” to 5 = “very familiar”). The same raters rated the complexity of each picture, that is the amount of detail or intricacy of the drawings (rather than the complexity of the action represented). Picture visual complexity was rated on a 5-point Likert-type scale (from 1 = “very simple” to 5 = “very complex”). As shown in Appendix A, lists were balanced in terms of concept agreement, concept familiarity, and visual complexity of the picture.

We created 12 blocks to accommodate the gesture conditions (semantically related gesture, semantically unrelated gesture, no gesture) and the noise conditions (clear speech, noisy speech), and to counterbalance gesture handedness (right/left hand) and target picture position (right/left side). Each block comprised four practice trials and 30 critical trials. Trial-type order was randomised in each block. Each session consisted of four blocks, two assigned to the SI task and two to the comprehension task. The task order was counterbalanced across participants. Each participant saw List A and List B twice, such that the target items of List A served as distractor items for List B and vice versa, but never saw the same individual trial twice.
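
The factorial structure behind the 12 blocks can be sketched in R as follows. This is only one reading of the design: it assumes that gesture condition, noise condition and gesture handedness were the crossed factors (3 × 2 × 2 = 12) and that target picture position was counterbalanced within blocks; the factor names are ours.

# Hypothetical enumeration of the 12 block types.
blocks <- expand.grid(
  gesture    = c("related", "unrelated", "none"),
  noise      = c("clear", "noisy"),
  handedness = c("right", "left")
)
nrow(blocks)  # 12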

2.1.3. Apparatus

Experimental tasks were completed in an ISO4043-compliant mobile interpreting booth (International Organization for Standardization, Citation2016a), programmed in SR Experiment-Builder® and deployed on a Mac Mini®. Visual stimuli were presented on a 23″ (58.4 cm) HP E232 display with a refresh rate of 60 Hz, located approximately 75 cm from the participants. Auditory stimuli were played over an LBB 3443 Bosch headset. Eye-movement data were acquired with an SR Research EyeLink® 1000 desktop-mounted remote eye-tracking system with a sampling rate of 500 Hz. The eye-tracker camera was located in front of the monitor, leaving a distance of approximately 60 cm between participants’ eyes and the eye-tracker. Participants’ spoken interpretations were recorded using a Bosch DCN-IDESK-D interpreting console and fed back into the EyeLink to generate time-aligned stereophonic recordings of stimulus audio output and participant audio input. The input device for non-verbal responses was a VPixx Technologies RESPONSEPixx HANDHELD 5-button response box.

2.1.4. Procedure

Each session consisted of four blocks and lasted approximately one hour. Each block started with a standard 9-point calibration of the eye-tracker. After validation, participants completed a practice-trial session. During and at the end of the practice session, participants could ask clarification questions. Participants were then instructed to launch the critical trials by pressing a button on the response box. Participants had timed three-minute breaks between blocks. During interpreted blocks, the experimenter monitored whether participants were interpreting the trial sentences simultaneously and, if necessary, reminded them to do so. No feedback was given during the experiment. The experimenter monitored the eye-tracking display and recalibrated when necessary.

2.1.4.1. Comprehension – picture-matching task

In the comprehension task, participants were asked to “keep looking at the screen while the video [was] being played” to enable the eye-tracker to follow their gaze. They were instructed to use the response box to “choose the picture that best correspond[ed] to the video” between two pictures. Upon launch of a trial, participants saw a short video clip as described in the Task and materials section. This was followed by a blank screen (2,000 ms) upon which two pictures were presented, on the left and right side of the screen, respectively. Once a picture was selected, a drift correction was performed to proceed to the next trial. The procedure is illustrated in Figure 2.

Figure 2. Trial sequence during comprehension and SI.


2.1.4.2. SI – picture-matching task

In the SI task, participants were asked to “start interpreting as soon as possible when the video start[ed]”, so that they would be engaged in simultaneous interpreting by the time the target verb was uttered. They were also instructed to “keep looking at the screen while the video [was] being played” to enable the eye-tracker to follow their gaze. They were asked to use the response box to “choose the picture that best correspond[ed] to the video” between two pictures. Upon launch of a trial, the participants saw a short video clip as described in the Task and materials section. This was followed by a blank screen (5,000 ms, which gave participants time to complete their interpretation) upon which two pictures were presented, on the left and right side of the screen, respectively. Once a picture was selected, a drift correction was performed to proceed to the next trial. The procedure is illustrated in Figure 2.

2.1.5. Analysis

The analyses of the three dependent variables, response accuracy, reaction time (RT) and dwell time, were conducted separately and implemented in R (R Core Team, Citation2013) using the lme4 package (Bates et al., Citation2015).

Practice trials were not included in the analyses. Trials in which participants had not interpreted, only partially interpreted, or had not finished interpreting the stimuli by the onset of the picture-selection task were also excluded from the analysis, which led to the removal of 21.8% of interpreted trials (314 trials, 10.9% of the whole dataset). As the comprehension task did not involve any output (apart from the forced-choice during the picture-matching task), all trials were entered in the analysis (but see trimming procedure below).

2.1.5.1. Accuracy

Accuracy data were analysed using generalised linear mixed models (GLMM) with the optimx optimiser (Nash, Citation2014). The dataset was trimmed before the analyses: responses deviating from the overall RT mean by more than 3 (overall) SDs were considered outliers, which led to the removal of 1.8% (20 trials) of the data points in the SI dataset and 2.4% (35 trials) of the comprehension dataset. Overall, 13% (377 trials) of all trials were excluded from the accuracy analyses.
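
A minimal sketch of this trimming step in R, assuming a data frame trials with a reaction-time column rt in milliseconds (names are ours, not the authors’ code):

# Exclude responses deviating from the overall RT mean by more than 3 overall SDs.
rt_mean <- mean(trials$rt, na.rm = TRUE)
rt_sd   <- sd(trials$rt, na.rm = TRUE)
trials_trimmed <- subset(trials, abs(rt - rt_mean) <= 3 * rt_sd)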

GLMM analyses were conducted to test the relationship between accuracy and the fixed effects task (2 levels, comprehension and simultaneous interpreting), noise (2 levels, clear speech, noisy speech) and semantic match between speech and gesture (3 levels, semantically related gesture, semantically unrelated gesture, no gesture). Interaction terms were set between task type, noise condition and semantic match. Subjects and items were entered as random effects with by-subject and by-item random intercepts since this was the maximal random structure supported by the data.
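
Under this specification, the accuracy model could be fitted in lme4 roughly as sketched below. The data frame and variable names (trials_trimmed, accuracy, task, noise, gesture, subject, item) are assumptions for illustration; selecting the optimx optimiser via glmerControl is one common way of using it with lme4.

library(lme4)
library(optimx)

# Accuracy (0/1) as a function of task, noise and speech-gesture semantic match,
# with all interactions and by-subject and by-item random intercepts.
m_acc <- glmer(
  accuracy ~ task * noise * gesture + (1 | subject) + (1 | item),
  data    = trials_trimmed,
  family  = binomial,
  control = glmerControl(optimizer = "optimx",
                         optCtrl   = list(method = "nlminb"))
)
summary(m_acc)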

2.1.5.2. Reaction time

Linear mixed-effects model (LMM) analyses were run on RT data. The significance of effects was determined by assessing whether the associated t-statistics had absolute values ≥ 2. Only accurate trials were used for the RT analyses, resulting in the exclusion of 2.9% (33 trials) of RT data points in the SI subset and 3.3% (48 trials) in the comprehension subset. Then, the dataset was trimmed before completing the analyses using the same approach as for the Accuracy data. Overall, the excluded trials amounted to 15% (438 trials) of the dataset.

RTs were log-transformed and then analysed using an LMM with the same fixed-effects structure as the GLMM. Subjects and items were entered as random effects with by-subject and by-item random intercepts.
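
A corresponding sketch for the RT analysis, again with assumed variable names (trials_rt holding accurate, trimmed trials only); effects are treated as significant when |t| ≥ 2, as stated above.

library(lme4)

# Log-transformed RTs with the same fixed-effects structure and random intercepts.
m_rt <- lmer(
  log(rt) ~ task * noise * gesture + (1 | subject) + (1 | item),
  data = trials_rt
)
summary(m_rt)  # inspect t-values; |t| >= 2 treated as significant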

2.1.5.3. Dwell time

LMM analyses were run on dwell time data. The same values as for RT data were used to determine the significance of effects. Dwell time analyses were only performed on the two conditions that contained any gestures (66.6% of the data). Only accurate trials were used, resulting in the exclusion of 2.2% (16 trials) in the SI subset and 2.9% (28 trials) of the comprehension subset. Trials for which dwell time data were missing, that is 2.7% (26 trials) of the SI subset, and 2% (15 trials) of the comprehension subset, were also excluded. Overall, the excluded trials amounted to 44% (1,254 trials) of the dataset.

Two areas of interest were created, one comprising the speaker’s head, the other one including gesture space, from the speaker’s shoulders to her hips. Dwell time was computed in % in each of the areas of interest during the gesture stroke.

We tested the relationship between dwell time and the fixed effects task (2 levels, comprehension and SI), noise (2 levels, clear speech and noisy speech) and semantic match between speech and gesture (2 levels, semantically related gesture and semantically unrelated gesture). Interaction terms were set between task type, noise condition and semantic match. Subjects and items were entered as random effects with by-subject and by-item random intercepts, as this was the maximal random structure supported by the data.
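
A sketch of the dwell-time model under the same assumptions, restricted to the two gesture-present conditions; dwell_pct is our assumed name for the percentage of the stroke spent looking at the gesture space.

library(lme4)

# Gesture-present, accurate trials only (hypothetical data frame `trials_accurate`).
gesture_trials <- subset(trials_accurate, gesture != "no gesture")
m_dwell <- lmer(
  dwell_pct ~ task * noise * gesture + (1 | subject) + (1 | item),
  data = gesture_trials
)
summary(m_dwell)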

2.2. Results

2.2.1. Accuracy

Accuracy scores (Table 2(A)) were close to ceiling in both tasks and across all conditions.

Table 2. (A) Mean response accuracy in per cent. (B) Mean RT in ms.

The GLMMs (full models including all interactions of interest, as well as reduced models including one interaction at a time) did not reveal any significant interactions or effects.

2.2.2. Reaction time

Mean reaction times are presented in Table 2(B). The LMMs (full models including all interactions of interest, as well as reduced models including one interaction at a time) did not reveal any significant interactions or effects.

2.2.3. Dwell time

Mean dwell times on the speaker’s gesture space are presented in Figure 3 as a function of task, gesture, and noise.

Figure 3. Mean dwell time on the speaker’s gesture space in per cent as a function of task, gesture, and noise. Panel A: Dwell time with semantically related gestures during SI. Panel B: Dwell time with semantically unrelated gestures during SI. Panel C: Dwell time with semantically related gestures during comprehension. Panel D: Dwell time with semantically unrelated gestures during comprehension.


The comprehension task, the clear speech condition and the control gesture condition were set as baselines. The interaction between task, noise condition and gesture condition was not significant. The result of the likelihood-ratio test used to compare the full to a first reduced model (task + gesture condition * noise condition) was significant (χ2 (3) = 11.77, p = .008), suggesting that the full model was a better fit for the data. The output of the full model revealed that dwell time was significantly affected by task (SI: β = −6.09, SE = 1.91, t = −3.19) and noisy speech (β = −5.24, SE = 1.78, t = −2.94).

We fitted an alternative reduced model, task * noise condition + gesture condition. This time, the result of the likelihood-ratio test used to compare the full to the reduced model was not significant. The reduced model revealed significant effects of task, gesture and noise conditions and of an interaction of task and noise, such that SI, noisy speech and the interaction of SI with noisy speech were associated with shorter dwell time, and that semantically related gestures were associated with longer dwell time (SI: β = −5.53, SE = 1.33, t = −4.15; noisy speech: β = −5.69, SE = 1.25, t = −4.54; semantically related gesture: β = 2.20, SE = 0.94, t = 2.34; SI * noisy speech: β = 5.87, SE = 1.90, t = 3.09).

The third reduced model we fitted (task * gesture condition + noise condition) was not as good a fit to the data as the full model (χ2 (3) = 10.14, p = .02).

The second reduced model was the best-fitting model, with the lowest AIC and BIC (full model: AIC = 14,289, BIC = 14,348; reduced model 2: AIC = 14,285, BIC = 14,328). The interaction of SI with noisy speech is plotted in Figure 4.
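
The model comparisons reported above can be reproduced, under the same assumed variable names as in the earlier sketches, with likelihood-ratio tests and information criteria:

library(lme4)

gesture_trials <- subset(trials_accurate, gesture != "no gesture")
m_full <- lmer(dwell_pct ~ task * noise * gesture + (1 | subject) + (1 | item),
               data = gesture_trials)
m_red2 <- lmer(dwell_pct ~ task * noise + gesture + (1 | subject) + (1 | item),
               data = gesture_trials)

anova(m_red2, m_full)  # likelihood-ratio test (models are refitted with ML)
AIC(m_full, m_red2)    # the reduced model has the lower AIC here
BIC(m_full, m_red2)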

Figure 4. Interaction of noise condition and task influencing dwell time in percentages on the speaker’s gesture space.


2.3. Discussion

Accuracy and RTs were not affected by task type, gesture condition, noise condition or any interaction thereof. Thus, it appears that gestures did not have any effect on interpreters’ accuracy and RTs either in comprehension or in SI. This is in line with most studies on the influence of visual access to the speaker in SI (Anderson, Citation1994; Bacigalupe, Citation1999; Balzani, Citation1990; Rennert, Citation2008; Tommola & Lindholm, Citation1995), although these studies did not probe the influence of gestures specifically on SI. However, it contrasts with the results reported in Arbona et al. (Citation2023). Moreover, we did not observe any effect of gestures in clear versus noisy speech, in contrast to findings in the literature (Drijvers & Özyürek, Citation2020; Holle et al., Citation2010; Rogers, Citation1978). This is unexpected, especially as the stimulus set and the method were similar in the present study and in Arbona et al. (Citation2023); the only notable differences lay in the language profiles of the tested participants and the addition of the noise condition. We have no reason to believe that the language profiles influenced the results. Interestingly, in the current experiment, we had to exclude 21.8% of all interpreted trials because participants had not interpreted, had only partially interpreted, or had not finished interpreting the stimuli by the onset of the picture-selection task. Stimuli that took interpreters more time to interpret or generated incomplete interpretations may have been associated with more difficulty. This might have caused higher error rates and slower RTs in the picture-matching task. While some interpreted trials also had to be excluded in Arbona et al. (Citation2023), they represented only 13.5% of interpreted trials. In the present experiment, we had to exclude a larger fraction of interpreted trials, the majority of which corresponded to the noisy speech condition (179 excluded trials as compared to 135 in clear speech). This may have obscured potential accuracy or RT effects. Another possible explanation for the discrepancy between the present results and the literature probing the effect of gestures in adverse listening conditions is that the noise level was too low to elicit significant differences across listening conditions. Although we had piloted this noise level on three interpreting trainees, this population is not entirely comparable to experienced interpreters (see, for example, Moser-Mercer et al., Citation2000; Nour et al., Citation2019; Riccardi, Citation1996). Notably, accuracy in the present study is at ceiling even in the condition without any gestures, which makes it difficult to detect potential facilitatory effects of gestures. While it is not easy to find a noise level that is high enough for gestures to contribute to comprehension yet low enough to still allow simultaneous interpreting, alternative noise levels should be tested in further research to ensure that an appropriate noise level is used.

That said, participants looked significantly longer at the speaker’s gesture space when the speaker performed semantically related gestures than semantically unrelated gestures in both tasks. Moreover, interpreters attended to (all) gestures significantly longer during comprehension than during SI. All these results fit with the findings reported in Arbona et al. (Citation2023), and may reflect task demands in SI.

Lastly, interpreters attended less to (all) gestures in noisy speech compared to clear speech, and the mean dwell time on gesture space was lowest when interpreters simultaneously interpreted noisy speech. It is possible that articulatory movements of the mouth are deemed more informative than gestures in noisy speech, especially during SI. This is in line with the literature on visual attention to the face, irrespective of gesture, which demonstrates increased attention to the mouth in noisy speech compared to in clear speech (Drijvers et al., Citation2019; Król, Citation2018; Rennig et al., Citation2020; Vatikiotis-Bateson et al., Citation1998). However, the areas of interest used in the current study (face and gesture space) do not allow us to conduct such a fine-grained analysis within the “face” region.

The first experiment did not bring to light any processing differences between the comprehension task and the SI task in terms of accuracy and reaction time. Therefore, engaging in an SI task did not modulate multimodal language comprehension. However, interpreters might have honed their cognitive resources through their expertise in SI, and interpreting experience may have affected the interpreters’ behaviour in both tasks. In other words, bilinguals without interpreting experience might behave differently, since interpreting expertise has been shown to positively influence cognitive performance, such as dual-task performance (Strobach et al., Citation2015) and cognitive flexibility (Yudes et al., Citation2011). To investigate this, a second experiment compared experienced interpreters to a group of bilinguals with no interpreting experience.

3. Experiment 2

In the second experiment, we compared the performance of an experimental group, consisting of the interpreters from the first experiment, with that of a comparison group of bilinguals without interpreting experience (professional translators) on a comprehension task. Simultaneous interpreters were compared to professional translators because the two groups are likely to be similar in terms of language proficiency and age, and both are used to working with two languages. Translators are language professionals who render written text in another language; the main differences from simultaneous interpreters are that translators work with written text rather than with speech in real time, and that there is no simultaneity requirement. We addressed the following questions:

5. Do bilinguals integrate gestural information during comprehension in the same way as interpreters in noisy speech as compared to in clear speech?

6. Do bilinguals visually attend to gestures in speech in the same way as interpreters in noisy as compared to clear speech?

7. Does that visual attention correlate with their comprehension in the same way as in interpreters in noisy compared to clear speech?

We predicted potential differences in the integration of gestural information and visual attention to gestures between bilinguals and interpreters in noisy speech as compared to in clear speech.

3.1. Materials and method

3.1.1. Participants

Twenty-four translators working from English into Spanish participated in the experiment (Footnote 7). They were recruited via an e-mail describing the eligibility criteria, and interested individuals were invited to sign up for the experiment. They completed a questionnaire similar to the one used in the first experiment, with adapted questions regarding their professional background. The group included 4 participants who were trained translators now working in related fields (e.g. lecturer, researcher) or who only worked occasionally as translators. The remaining 20 participants were experienced professional translators (M = 12 years, 11 months; SD = 8 years, 6 months). Eleven participants had previously received training in conference interpreting, either as a stand-alone course or as part of their translator training. However, 8 of them had never worked as professional interpreters, while 3 had practised interpreting very occasionally or for a limited period (one or two years) dating back 4–15 years. Although interpreting experience was therefore not homogeneous in the comparison group, it was considered negligible compared to that of the experimental group. All participants had normal or corrected-to-normal vision and reported no language disorders. Their L1 was Spanish (Footnote 8), their L2 English (Footnote 9). The experimental and the comparison groups were matched for factors pertaining to background and language experience (see Table 3). The groups were comparable on all tested variables: age, professional experience, languages spoken, and language ability and exposure in Spanish and English.

Table 3. Background information provided in the language background questionnaire, and comparison of groups (t-test for numerical variables, Wilcoxon test for ordinal variables): *: p < 0.05, **: p < 0.01, ***: p < .001.

All participants gave written informed consent. The experiment was approved by the Ethics Committee of the Faculty of Translation and Interpreting at the University of Geneva. No participant was involved in the norming of the stimuli.

3.1.2. Design, task, materials, procedure

Participants completed the comprehension task only. Materials, apparatus, procedure and instructions were the same as in the first experiment. Participants completed four blocks, following the same rotation as in the first experiment. However, since there was only one task, the analysis included only two of the blocks, those corresponding to the comprehension blocks in the first experiment. The sessions lasted approximately 50 min. Thus, time on task for bilinguals was slightly shorter than for the interpreters (viewing/listening trials were slightly shorter since no margin had to be added for interpretations). They saw each list twice, like the interpreters, although they completed the same task four times whereas the interpreters completed two different tasks twice.

3.1.3. Analysis

The dependent variables were the same as in the first experiment (response accuracy, RT and dwell time), and analyses were conducted separately for each dependent variable.

3.1.3.1. Accuracy

The dataset was trimmed before completing the analyses. Responses more than 3 SDs above or below the overall RT mean were considered outliers, which led to the removal of 2.4% (35 trials) of the bilinguals’ dataset and 2.4% (35 trials) of the interpreters’ dataset. GLMM analyses were conducted using the optimx optimiser (Nash, Citation2014) to test the relationship between accuracy and the fixed effects group (2 levels, interpreter or bilingual status), noise (2 levels, clear speech and noisy speech) and semantic match between speech and gesture (3 levels, semantically related gesture, semantically unrelated gesture, no gesture). Interaction terms were set between group membership, noise condition and semantic match. Subjects and items were entered as random effects with by-subject and by-item random intercepts, since this was the maximal random structure supported by the data.
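
To make the model specification concrete, the sketch below illustrates how such a GLMM could be set up in R with the lme4 package (Bates et al., Citation2015) and the optimx optimiser (Nash, Citation2014), both listed in the references. This is an illustrative reconstruction rather than the authors’ analysis script: the data are simulated, and all object and column names (d, subject, item, group, noise, gesture, rt, accuracy) are assumptions.

    # Illustrative sketch, not the authors' script: trimming and accuracy GLMM
    library(lme4)
    library(optimx)

    # Simulated stand-in for the per-trial data (names are assumptions)
    set.seed(1)
    d <- expand.grid(subject = factor(1:48), item = factor(1:30),
                     noise = c("clear", "noisy"),
                     gesture = c("none", "related", "unrelated"))
    d$group <- ifelse(as.integer(d$subject) <= 24, "interpreter", "bilingual")
    d$rt <- rlnorm(nrow(d), meanlog = 7.3, sdlog = 0.4)  # reaction times in ms
    d$accuracy <- rbinom(nrow(d), 1, 0.97)               # near-ceiling accuracy

    # Outlier trimming: drop trials more than 3 SDs from the overall RT mean
    m <- mean(d$rt); s <- sd(d$rt)
    d_trim <- subset(d, rt > m - 3 * s & rt < m + 3 * s)

    # Logistic GLMM: group x noise x gesture interaction,
    # by-subject and by-item random intercepts, optimx optimiser
    acc_mod <- glmer(accuracy ~ group * noise * gesture +
                       (1 | subject) + (1 | item),
                     data = d_trim, family = binomial,
                     control = glmerControl(optimizer = "optimx",
                                            optCtrl = list(method = "nlminb")))
    summary(acc_mod)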

3.1.3.2. Reaction time

Only accurate trials were analysed, resulting in the exclusion of 3.2% (46 trials) of the bilinguals’ data points and 3.3% (48 trials) of the interpreters’ dataset. Then, the dataset was trimmed before completing the analyses using the same approach as for the Accuracy data. Overall, the excluded trials amounted to 5% (144 trials) of the total.

RTs were log-transformed and analysed using an LMM with the same fixed-effects structure as the GLMM. Subjects and items were entered as random effects with by-subject and by-item random intercepts.
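
Continuing the illustrative sketch above, and under the same assumptions about the simulated data and variable names, the RT analysis could be specified as follows:

    # Accurate trials only; log-transformed RTs with the same fixed effects
    rt_dat <- subset(d_trim, accuracy == 1)
    rt_mod <- lmer(log(rt) ~ group * noise * gesture +
                     (1 | subject) + (1 | item),
                   data = rt_dat)
    summary(rt_mod)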

3.1.3.3. Dwell time

Dwell time analyses were only performed on the two conditions that contained gestures (66.6% of the data). Only accurate trials were used, resulting in the exclusion of 2.9% (28 trials) of the bilinguals’ dataset and 2.9% (28 trials) of the interpreters’ dataset. Trials for which dwell time data were missing (0.4% of the bilinguals’ data points, 4 trials, and 2% of the interpreters’ dataset, 15 trials) were also excluded. Overall, the excluded trials amounted to 35% (1,016 trials) of the dataset. The same areas of interest as in the first experiment were used. Dwell time percentage data was analysed using an LMM to test the relationship between dwell time and the fixed effects group (2 levels, interpreter or bilingual status), noise (2 levels, clear speech and noisy speech) and semantic match between speech and gesture (2 levels, semantically related gesture or semantically unrelated gesture). Interaction terms were set between group, noise condition and semantic match. Subjects and items were entered as random effects with by-subject and by-item random intercepts.
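
A corresponding sketch for the dwell-time model is given below (lme4 is assumed to be loaded as in the sketches above). It assumes, purely for illustration, that the per-trial dwell-time percentage is computed as the share of eye-tracking samples falling inside the gesture-space area of interest; the sample-level data frame and its column names (samples, in_gesture_aoi, sample_idx) are invented for the example and do not describe the actual preprocessing pipeline.

    # Illustrative dwell-time sketch (gesture-present conditions only)
    set.seed(2)
    samples <- expand.grid(subject = factor(1:48), item = factor(1:20),
                           noise = c("clear", "noisy"),
                           gesture = c("related", "unrelated"),
                           sample_idx = 1:25)
    samples$group <- ifelse(as.integer(samples$subject) <= 24,
                            "interpreter", "bilingual")
    samples$in_gesture_aoi <- runif(nrow(samples)) < 0.15  # gaze in the AOI?

    # Per-trial dwell time as a percentage of samples on the gesture space
    dwell <- aggregate(in_gesture_aoi ~ subject + item + group + noise + gesture,
                       data = samples, FUN = function(x) 100 * mean(x))
    names(dwell)[names(dwell) == "in_gesture_aoi"] <- "dwell_pct"

    # LMM: group x noise x gesture, by-subject and by-item random intercepts
    dwell_mod <- lmer(dwell_pct ~ group * noise * gesture +
                        (1 | subject) + (1 | item),
                      data = dwell)
    summary(dwell_mod)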

3.2. Results

3.2.1. Accuracy

Accuracy scores, presented in Table 4 (A), were close to ceiling in both groups and in all conditions.

Table 4. (A) Mean response-accuracy in per cent. (B) Mean RT in ms.

With the no-gesture condition, the clear speech condition and the bilingual group set as baselines, the full GLMM with interactions did not reveal any effects of noise condition, gesture condition or group membership. We fitted a first reduced model (group membership + gesture condition * noise condition), which revealed only a main effect of semantically related gestures (β = 1.07, SE = 0.50, Z = 2.14, p = .03). The likelihood-ratio test comparing the full model to this reduced model was not significant (χ2 (5) = 2.35, p = .80), favouring the more parsimonious reduced model. We fitted an alternative reduced model (group membership * gesture condition + noise condition), which did not reveal any effects. The likelihood-ratio test comparing the full model to this reduced model was not significant (χ2 (5) = 3.32, p = .65), again favouring the reduced model. We fitted a last reduced model (gesture condition + group membership * noise condition), which revealed only a main effect of semantically related gestures (β = 0.81, SE = 0.33, Z = 2.43, p = .02). The likelihood-ratio test comparing the full model to this reduced model was not significant (χ2 (6) = 1.43, p = .96), again favouring the reduced model. Of the three, the third reduced model was the best-fitting, with the lowest AIC and BIC (reduced model 1: AIC = 627.1, BIC = 680.6; reduced model 2: AIC = 628.1, BIC = 681.6; reduced model 3: AIC = 624.2, BIC = 671.7). Since this model had revealed only a significant effect of semantically related gestures, such that trials with semantically related gestures were associated with higher accuracy, we fitted a minimal model with only gesture condition as a fixed effect. This model confirmed that accuracy was significantly higher in trials with semantically related gestures (β = 0.81, SE = 0.33, Z = 2.44, p = .01). The likelihood-ratio test comparing this minimal model to a null model without any fixed effects was significant (χ2 (2) = 6.63, p = .04), indicating that gesture condition improved model fit. Compared to reduced model 3, the minimal model with only gesture condition as a fixed effect was the best fitting, with the lowest AIC and BIC (reduced model 3: AIC = 624.2, BIC = 671.7; model with only gesture condition: AIC = 621.8, BIC = 651.5).
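
The nested-model comparison procedure used throughout this section can be illustrated with the hypothetical objects from the accuracy sketch in Section 3.1.3.1: anova() on nested (G)LMMs performs the likelihood-ratio test, and AIC() and BIC() return the information criteria used to select among candidate models. The model names below are illustrative, not the authors’ own.

    # Illustrative nested-model comparisons (reusing acc_mod and d_trim above)
    reduced3 <- glmer(accuracy ~ gesture + group * noise +
                        (1 | subject) + (1 | item),
                      data = d_trim, family = binomial)
    minimal  <- glmer(accuracy ~ gesture + (1 | subject) + (1 | item),
                      data = d_trim, family = binomial)
    null_mod <- glmer(accuracy ~ 1 + (1 | subject) + (1 | item),
                      data = d_trim, family = binomial)

    anova(acc_mod, reduced3)   # likelihood-ratio test: full vs. reduced model 3
    anova(null_mod, minimal)   # likelihood-ratio test: gesture-only vs. null
    AIC(reduced3, minimal)     # information criteria for model selection
    BIC(reduced3, minimal)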

The GLMMs did not reveal any significant interactions or effects with semantically unrelated gestures as baseline.

3.2.2. Reaction time

Mean reaction times are presented in Table 4 (B). With the no-gesture condition, the clear speech condition and the bilingual group as baselines, the full LMM with interaction terms did not reveal any effects of noise condition, gesture condition or group membership. We then fitted a first reduced model (group membership + gesture condition * noise condition), which did not reveal any significant effects; the likelihood-ratio test comparing this model to the full model was not significant (χ2 (5) = 8.75, p = .12), favouring the reduced model. We fitted a second reduced model (group membership * gesture condition + noise condition), which revealed a significant effect of semantically related gestures (β = −0.08, SE = 0.02, t = −3.30). The likelihood-ratio test comparing this model to the full model was not significant (χ2 (5) = 3.64, p = .60), again favouring the reduced model. We fitted a last reduced model (gesture condition + group membership * noise condition), which also revealed a significant effect of semantically related gestures (β = −0.03, SE = 0.02, t = −2.02). The likelihood-ratio test comparing this model to the full model was not significant (χ2 (6) = 10.43, p = .11), again favouring the reduced model. The second reduced model had the lowest AIC (reduced model 1: AIC = 2282.7, BIC = 2341.8; reduced model 2: AIC = 2277.6, BIC = 2336.7; reduced model 3: AIC = 2282.4, BIC = 2335.6). Since this model had revealed a significant effect of semantically related gestures, we fitted a minimal model with only gesture condition as a fixed effect. This model confirmed that RTs were significantly shorter in trials with semantically related gestures (β = −0.03, SE = 0.02, t = −2.02). The likelihood-ratio test comparing this minimal model to a null model without any fixed effects was significant (χ2 (2) = 11.76, p = .003), indicating that gesture condition improved model fit. Compared to reduced model 2, the minimal model with only gesture condition had a comparable AIC and a lower BIC (reduced model 2: AIC = 2277.6, BIC = 2336.7; model with only gesture condition: AIC = 2277.9, BIC = 2313.4), and was retained as the final model.

Setting the semantically unrelated gesture condition as baseline to explore potential effects of semantically related gestures relative to semantically unrelated gestures (while keeping the other baselines), the full LMM with interaction terms did not reveal any effects of noise condition, gesture condition or group membership. We then fitted a first reduced model (group membership + gesture condition * noise condition), which only revealed a significant effect of semantically related gestures (β = −0.08, SE = 0.02, t = −3.30); the likelihood-ratio test comparing this model to the full model was not significant (χ2 (5) = 8.75, p = .12), favouring the reduced model. We fitted a second reduced model (group membership * gesture condition + noise condition), which again revealed a significant effect of semantically related gestures (β = −0.08, SE = 0.02, t = −3.64). The likelihood-ratio test comparing this model to the full model was not significant (χ2 (5) = 3.64, p = .60), again favouring the reduced model. We fitted a last reduced model (gesture condition + group membership * noise condition), which also revealed a significant effect of semantically related gestures (β = −0.06, SE = 0.02, t = −3.41). The likelihood-ratio test comparing this model to the full model was not significant (χ2 (6) = 10.43, p = .11), again favouring the reduced model. All reduced models thus revealed a significant effect of semantically related gestures, such that these gestures were associated with faster RTs. The second reduced model had the lowest AIC (reduced model 1: AIC = 2282.7, BIC = 2341.8; reduced model 2: AIC = 2277.6, BIC = 2336.7; reduced model 3: AIC = 2282.4, BIC = 2335.6). Since this model had only revealed a significant effect of semantically related gestures, we fitted a minimal model with only gesture condition as a fixed effect. This model confirmed that RTs were significantly shorter in trials with semantically related gestures (β = −0.06, SE = 0.02, t = −3.41). The likelihood-ratio test comparing this minimal model to a null model without any fixed effects was significant (χ2 (2) = 11.76, p = .003), indicating that gesture condition improved model fit. Compared to reduced model 2, the minimal model with only gesture condition had a comparable AIC and a lower BIC (reduced model 2: AIC = 2277.6, BIC = 2336.7; model with only gesture condition: AIC = 2277.9, BIC = 2313.4), and was retained as the final model.
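
Re-running a model with the semantically unrelated gesture condition as the reference level only requires re-levelling the gesture factor; a sketch using the illustrative objects defined earlier follows.

    # Illustrative re-levelling: semantically unrelated gestures as baseline
    rt_dat$gesture <- relevel(factor(rt_dat$gesture), ref = "unrelated")
    rt_mod_unrel <- lmer(log(rt) ~ group * noise * gesture +
                           (1 | subject) + (1 | item),
                         data = rt_dat)
    summary(rt_mod_unrel)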

3.2.3. Dwell time

Mean dwell times on the speaker’s gesture space are presented in Figure 5 as a function of group, gesture, and noise.

Figure 5. Mean dwell time on the speaker’s gesture space in per cent as a function of group, gesture and noise. Panel A: Interpreters’ dwell time with semantically related gestures. Panel B: Interpreters’ dwell time with semantically unrelated gestures. Panel C: Bilinguals’ dwell time with semantically related gestures. Panel D: Bilinguals’ dwell time with semantically unrelated gestures.


The bilingual group, the clear speech condition and the control gesture condition were set as baselines. The interaction between group membership, noise condition and gesture condition was not significant. The likelihood-ratio test comparing the full model to a first reduced model (group membership + gesture condition * noise condition) was also not significant, favouring the reduced model. The reduced model revealed that noise had a significant effect on dwell time (noisy speech: β = −5.62, SE = 1.38, t = −4.07).

We fitted an alternative reduced model (group membership * noise condition + gesture condition). Again, the likelihood-ratio test comparing the full model to the reduced model was not significant, favouring the reduced model. The reduced model revealed significant effects of gesture and noise conditions (noisy speech: β = −6.12, SE = 1.37, t = −4.47; semantically related gestures: β = 2.04, SE = 0.97, t = 2.10).

The third reduced model we fitted (group membership * gesture condition + noise condition) was likewise favoured over the full model by the likelihood-ratio test, and revealed significant effects of noise and gesture conditions (noisy speech: β = −5.95, SE = 0.97, t = −6.13; semantically related gesture: β = 2.94, SE = 1.37, t = 2.15).

We fitted another model with noise and gesture conditions as fixed effects (with an interaction term) to explore potential effects without group membership. However, the interaction of noise and gesture conditions was not significant, and the likelihood-ratio test comparing this model to a reduced model without the interaction term was not significant either, favouring the reduced model. The reduced model revealed significant effects of gesture and noise conditions (noisy speech: β = −5.95, SE = 0.97, t = −6.12; semantically related gesture: β = 2.03, SE = 0.97, t = 2.10). This model was also the best-fitting of the four models, with the lowest AIC and BIC (reduced model 1: AIC = 16,611, BIC = 16,655; reduced model 2: AIC = 16,611, BIC = 16,655; reduced model 3: AIC = 16,610, BIC = 16,654; model with the two fixed effects only: AIC = 16,608, BIC = 16,641).

3.2.4. Posthoc analysis

The analyses reveal that semantically related gestures had a different effect in the first experiment (interpreters engaged in SI and in comprehension) than in the second experiment (interpreters and bilinguals engaged in comprehension). We therefore compared trends in the three subsets (interpreters in SI, interpreters in comprehension, bilinguals in comprehension) to probe whether one subset might be driving this difference. Regarding accuracy, bilinguals (semantically related gestures: M = 98.3, SD = 13; no gesture: M = 96.6, SD = 18.1) arguably behaved more similarly to interpreters engaging in comprehension (semantically related gestures: M = 98.5, SD = 12.1; no gesture: M = 96.6, SD = 18.1) than to interpreters during SI (semantically related gestures: M = 98.2, SD = 13.5; no gesture: M = 97.5, SD = 15.7). As for RTs, with semantically unrelated gestures as baseline, we found the same pattern (bilinguals: semantically related gestures: M = 1,503 ms, SD = 677; semantically unrelated gestures: M = 1,672, SD = 877; interpreters engaging in comprehension: semantically related gestures: M = 1,564, SD = 785; semantically unrelated gestures: M = 1,601, SD = 812; interpreters during SI: semantically related gestures: M = 1,436, SD = 520; semantically unrelated gestures: M = 1,433, SD = 525). The effect went in the same direction in interpreters engaging in comprehension (t = −1.28) and in bilinguals (t = −3.45), whereas it was reversed during SI (t = 0.60). However, this pattern was different with the no-gesture condition as baseline: the results of the bilinguals (semantically related gestures: M = 1,503, SD = 677; no gesture: M = 1,663, SD = 854) were more similar to those of the SI task (semantically related gestures: M = 1,436, SD = 520; no gesture: M = 1,470, SD = 542) than to those of interpreters engaging in comprehension (semantically related gestures: M = 1,564, SD = 785; no gesture: M = 1,530, SD = 740). The effect went in the same direction in bilinguals (t = −3.11) and in interpreters engaged in SI (t = −0.61), whereas it was reversed in interpreters engaging in comprehension (t = 0.47). Thus, two out of three effects showed a pattern that was closer between bilinguals and interpreters engaged in comprehension, whereas SI was associated with less variability across conditions and flatter trends. The data reported in this section is summarised in Table 5.

Table 5. Posthoc analysis. (A) Mean response-accuracy in per cent. (B) Mean RT in ms.

3.3. Discussion

The second experiment did not show any significant differences in comprehension or overt visual attention during comprehension between interpreters and bilinguals, which suggests that interpreting experience did not influence attention to and processing of gestures in clear or noisy speech.

Noise did not influence accuracy or RT, but gesture condition did. Both groups were significantly faster and more accurate when presented with semantically related gestures than without gesture. Semantically unrelated gestures did not have the same effect. Moreover, semantically related gestures were associated with faster RTs (in both groups) than semantically unrelated gestures. This finding, which is in line with Arbona et al. (Citation2023), suggests that both in interpreters and in bilinguals, processing during comprehension is sensitive to gestures’ semantic relationship with the spoken utterance.

However, similarly to the first experiment, in the absence of a significant difference in terms of accuracy and/or RT between clear speech and noisy speech, and given that especially accuracy is at ceiling in both groups even in the condition without gestures, it is possible that the chosen noise level was too low to elicit significant differences across listening conditions and to allow gestures to contribute to language comprehension. This would explain the discrepancy between the present results and previous research (e.g. Drijvers & Özyürek, Citation2020; Holle et al., Citation2010; Rogers, Citation1978).

That said, noisy speech was associated with less attention to (all) gestures. Participants seem to favour the face as the locus of attention in adverse listening conditions, perhaps because they glean information from articulatory mouth movements (although the predefined areas of interest are too coarse-grained to allow us to probe this question in detail). The literature certainly shows a preference for fixations on the mouth in noisy speech (Drijvers et al., Citation2019; Król, Citation2018; Rennig et al., Citation2020; Vatikiotis-Bateson et al., Citation1998). Moreover, and in line with Arbona et al. (Citation2023), both groups attended to semantically related gestures significantly longer than to semantically unrelated gestures, which suggests that participants’ visual attention patterns, too, are sensitive to the semantic relationship between gesture and speech, both in clear and noisy speech.

4. General discussion

This study aimed to probe the potential effect of manual co-speech gestures on simultaneous interpreters’ language comprehension in noisy as compared to clear speech. The first question was whether simultaneous interpreters integrate gestural information differently during language comprehension in noisy and clear speech. Since SI is considered an instance of extreme language use (Hervais-Adelman et al., Citation2015), noisy speech could make the task even more challenging, and gestures might facilitate language comprehension in that context. This is particularly relevant in the context of Remote Simultaneous Interpreting, which is currently often characterised by degraded sound (Seeber & Fox, Citation2021). However, in the first experiment, gestures did not affect interpreters’ language comprehension in clear or in noisy speech in either task. Therefore, the integration of gestural information during interpreters’ language comprehension does not seem to differ during noisy as compared to clear speech. This is in contrast to a series of studies conducted on co-speech gestures and comprehension of noisy speech (e.g. Drijvers & Özyürek, Citation2020; Holle et al., Citation2010; Rogers, Citation1978). More generally, according to the Integrated-Systems Hypothesis (Kelly, Creigh, et al., Citation2010), gestures would be expected to influence speech processing. This discrepancy between results may be explained by the fact that we investigated SI, as opposed to “pure” comprehension, and/or by the fact that we recruited experienced interpreters as participants. Another possibility is that the noise level tested in the current paper was too low to elicit significant differences across listening conditions. Indeed, in both experiments, accuracy was at ceiling even in the condition without gestures, so it may have been difficult to detect potential contributions of gestures to language comprehension. Further research should test alternative noise levels in similar set-ups to exclude this possibility.

The second question was whether the integration of gestural information during language comprehension is affected differently by task or by interpreting experience in adverse listening conditions as opposed to in clear speech. In the first experiment, co-speech gestures did not influence interpreters’ comprehension, and task did not affect integration differently in noisy speech compared to in clear speech. In the second experiment, however, both interpreters and bilinguals without interpreting experience were affected by gestures: during comprehension, processing was sensitive to the semantic relevance of gestures in the utterance. Crucially, interpreting experience did not affect the integration of gestural information differently in clear as compared to noisy speech (with the caveat about the noise level mentioned above).

These findings notwithstanding, semantically related gestures had a different effect in the first experiment (interpreters engaged in SI and in comprehension) than in the second (interpreters and bilinguals engaged in comprehension). In a posthoc analysis, we compared trends in the three subsets (interpreters in SI, interpreters in comprehension, bilinguals in comprehension) to probe whether one subset might be driving this difference. We found that bilinguals and interpreters engaged in comprehension showed similar patterns in two out of three cases, whereas SI was associated with less variability across conditions and flatter trends. It would therefore seem that the SI task is driving the differences in the effect of co-speech gestures on language comprehension, both in clear and in noisy speech; that is, the SI task modulates the effect that co-speech gestures have on language comprehension. The fact that simultaneous interpreters produce a verbal response while comprehending other linguistic input may reduce the influence of co-speech gestures on comprehension in this context. The multimodal comprehension component in SI thus seems to have features that distinguish it from other (bilingual) comprehension tasks. This may explain why studies on the effect of visual access to the speaker on SI have found null effects (Anderson, Citation1994; Bacigalupe, Citation1999; Balzani, Citation1990; Rennert, Citation2008; Tommola & Lindholm, Citation1995), although they did not look at gestures specifically. In any case, the results suggest that in Remote Simultaneous Interpreting, manual gestures are unlikely to improve comprehension in noisy speech.

That said, further research is needed to better understand how noisy speech may influence multimodal processing during SI. Indeed, in a study investigating the effect of co-speech gestures in clear speech (Arbona et al., Citation2023), semantically related gestures were found to facilitate interpreters’ processing both during SI and comprehension. In other words, although the current study did not reveal any effect of noisy speech on comprehension, it might be that noise had an overall effect on the interpreters. Many interpreter participants reported that they found the task difficult because of the noisy speech. We can speculate that, beyond a certain level of difficulty, co-speech gestures may no longer be helpful to interpreters. Carefully varying noise levels could help shed more light on multimodal processing during SI. This would also allow us to exclude the possibility that the noise level we tested in the present paper was too low to elicit significant differences across listening conditions and thus did not allow gestures to contribute to language comprehension.

On the other hand, interpreting experience did not seem to affect the integration of gestural information. This is in contrast to other findings, according to which interpreting experience might positively influence cognitive performance, e.g. dual-task performance (Strobach et al., Citation2015) or cognitive flexibility (Yudes et al., Citation2011), suggesting that interpreters might be better than other bilinguals at integrating visual and auditory input in parallel. However, we are not aware of any previous SI study that has systematically investigated co-speech gestures, specifically. This leads us to two conclusions: First, co-speech gestures are indeed integrated; they are part and parcel of the processing of noisy speech during comprehension. Second, during comprehension, semantically related co-speech gestures have a facilitatory effect on both interpreters’ and bilinguals’ processing of noisy speech. These findings are in line with the literature (e.g. Drijvers & Özyürek, Citation2020; Holle et al., Citation2010; Rogers, Citation1978) and more globally with the Integrated-Systems Hypothesis, which predicts an influence of co-speech gestures on speech processing (Kelly, Creigh, et al., Citation2010), although it does not make predictions about noisy speech.

The last question was whether simultaneous interpreters and bilinguals without SI experience visually attend to gestures differently in noisy and clear speech and whether that attention correlates with their comprehension. Most of the participants’ overt visual attention was focused on the speaker’s face rather than on her gesture space, as in previous eye-tracking studies of visual attention to gestures (Gullberg & Holmqvist, Citation2006; Gullberg & Kita, Citation2009). That said, both bilinguals and interpreters attended to the speaker’s gesture space, and overt visual attention was modulated by gestural characteristics, in line with previous findings (Beattie et al., Citation2010; Gullberg & Holmqvist, Citation1999, Citation2006; Gullberg & Kita, Citation2009; Özer et al., Citation2023). Crucially, all participants looked significantly longer at the speaker’s gesture space in clear speech compared to in noisy speech, and interpreters looked at the gesture space less when simultaneously interpreting noisy speech. This may seem counter-intuitive, since gestures might have been expected to help compensate for poorer sound quality, especially in a complex task such as SI. However, this result is perhaps not so surprising. It may be that interpreters engaged in SI, especially in noisy speech, prefer fixating the speaker’s face to glean verbal speech information from the speaker’s mouth (see Jesse et al., Citation2000). More generally, all participants in the current study might have favoured fixating the face to access articulatory mouth movements, in the same way that listeners fixate the mouth rather than other areas of the speaker’s face when processing noisy speech (Drijvers et al., Citation2019; Król, Citation2018; Rennig et al., Citation2020; Vatikiotis-Bateson et al., Citation1998). However, our areas of interest do not allow us to assess which facial cues participants might have attended to, or whether they actually fixated gesturing hands directly or rather some other part of the speaker’s gesture space. Further research is needed to probe visual attention to the speaker’s face during SI more precisely. This would be particularly relevant in combination with data on the potential effect of articulatory mouth movements on SI, especially in noisy speech.

Interestingly, overt visual attention was not always correlated with language comprehension. During the comprehension task, participants fixated the gesture space longer with semantically related gestures compared to with semantically unrelated gestures, and had shorter reaction times with semantically related gestures compared to with semantically unrelated gestures. In other words, semantically related gestures attracted more overt visual attention to the speaker’s gesture space, possibly because they were deemed informative, and were associated with faster RTs, whereas semantically unrelated gestures did not attract so much overt visual attention to the gesture space, perhaps because they were not considered as useful, and were not associated with an acceleration effect. Thus, processing seemed to dovetail with the comprehension outcome. However, in interpreters, while visual attention was sensitive to semantically related gestures, noisy speech and SI, language comprehension was not. Thus, although these variables influenced processing as measured through overt visual attention, the outcome in terms of language comprehension (accuracy and RTs) did not show any sign of modulation. This is in line with studies showing that fixations on gestures or signs do not necessarily lead to information uptake, while information uptake can take place without fixations (Emmorey et al., Citation2008; Gullberg & Kita, Citation2009; Özer et al., Citation2023). This suggests that while eye-tracking measures do not necessarily reflect language comprehension in the same way as accuracy and RTs, in combination with other measures, they can help shed more light on multimodal language comprehension in SI.

4.1. Conclusions

This study shows that during the processing of noisy speech, co-speech gestures that are semantically related to the content of speech can have a facilitatory effect on language comprehension in interpreters and other bilinguals, regardless of interpreting experience. However, the SI task seems to modulate this effect, such that interpreters show no facilitation effect during SI of noisy speech. Although alternative noise levels should be carefully tested in similar set-ups to exclude the possibility that effects could not be detected because the noise level was too low, taken together, these results suggest that multimodal language comprehension during SI operates differently from other (bilingual) multimodal comprehension. This opens up important new avenues of study for better understanding multimodal language processing during “extreme language use”.

Supplemental material


Disclosure statement

No potential conflict of interest was reported by the author(s).

Data availability statement

The data that support the findings of this study are available under the following DOI: 10.17605/OSF.IO/8XQ72.

Notes

1 One additional participant could not be tested as calibration of the eye-tracker could not be performed.

2 E-mail addresses were found either through the AIIC registry of conference interpreters (https://aiic.org/site/dir/interpreters) or via a professional secretariat list.

3 The “A” language is the interpreter’s first language (or the language they speak best); it is an active language into which they work from all their other working languages. A “B” language is also an active language, in which the interpreter is perfectly fluent, but is not their first language. A “C” language is a passive language which the interpreter understands perfectly but into which they do not work; they will interpret from this language into their active language(s) (AIIC, Citation2019a, The AIIC A-B-C, https://aiic.org/site/world/about/profession/abc).

4 Three participants had two A languages: Spanish and English.

5 We did not annotate the video clips with a software such as Elan. Rather, we reported all measures coded using Avidemux in a master spreadsheet and excluded the video clips from the stimuli set if they displayed components which were not usable with the rest of the set (e.g. actress hesitations during the utterance, target verbs and gesture strokes not temporally aligned, gestures performed very peripherally, very long or very short gestures/target verbs, or very long or very short video clips overall compared to the rest of the stimuli set, etc.). This way, we ensured that the stimuli set was coherent and that the different subcategories were comparable.

7 One additional participant could not be tested as calibration could not be performed.

8 n = 20 as this question related to the translators’ language combination.

9 Same as above.

10 Number of years participants spent in a country or region where the relevant language is spoken.

11 See above.

12 Number of years participants spent in a country or region where the relevant language is spoken.

13 See above.

References

  • AIIC. (2007). Checklist for simultaneous interpretation. https://aiic.org/document/4395/Checklist%20for%20simultaneous%20interpretation%20equipment%20-%20ENG.pdf
  • AIIC. (2019a). The AIIC A-B-C. https://aiic.org/site/world/about/profession/abc
  • AIIC. (2019b). Inside AIIC. https://aiic.org/site/world/about/inside
  • AIIC. (2020). AIIC Covid-19 Distance Interpreting Recommendations for Institutions and DI Hubs. https://aiic.org/document/4839/AIIC%20Recommendations%20for%20Institutions_27.03.2020.pdf
  • Amos, R. (2020). Prediction in interpreting [Doctoral dissertation, University of Geneva]. Archive ouverte. https://archive-ouverte.unige.ch/unige:148890
  • Anderson, L. (1994). Simultaneous interpretation: Contextual and translation aspects. In S. Lambert & B. Moser-Mercer (Eds.), Bridging the gap: Empirical research in simultaneous interpretation (pp. 101–120). John Benjamins.
  • Arbona, E., Seeber, K., & Gullberg, M. (2023). Semantically related gestures facilitate language comprehension during simultaneous interpreting. Bilingualism: Language and Cognition, 26(2), 425–439. https://doi.org/10.1017/S136672892200058X
  • Bacigalupe, L. A. (1999). Visual contact in simultaneous interpretation: Results of an experimental study. In A. Alvarez Lugris & A. Fernandez Ocampo (Eds.), Anovar/anosar. Estudios de traduccion e interpretacion (Vol. 1, pp. 123–137). Universidade de Vigo.
  • Balzani, M. (1990). Le contact visuel en interprétation simultanée: résultats d’une expérience (Français–Italien). In L. Gran & C. Taylor (Eds.), Aspects of applied and experimental research on conference interpretation (pp. 93–100). Campanotto.
  • Bates, D., Mächler, M., Bolker, B., & Walker, S. (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67(1), 1–48. https://doi.org/10.18637/jss.v067.i01
  • Beattie, G., Webster, K., & Ross, J. (2010). The fixation and processing of the iconic gestures that accompany talk. Journal of Language and Social Psychology, 29(2), 194–213. https://doi.org/10.1177/0261927X09359589
  • Bühler, H. (1985). Conference interpreting: A multichannel communication phenomenon. Meta: Journal des Traducteurs, 30(1), 49–54. https://doi.org/10.7202/002176ar
  • Caniato, A. (2021). RSI sound myth buster: Ten misconceptions that result in RSI sounding terrible. https://aiic.org/site/blog/RSI-sound-myth-buster?language=fr_FR&
  • CAPE. (2021). CAPE survey confirms continued Parliamentary interpreters’ health and safety risks a year into the pandemic. https://www.acep-cape.ca/en/news/cape-survey-confirms-continued-parliamentary-interpreters-health-and-safety-risks-year
  • Church, R. B., Garber, P., & Rogalski, K. (2007). The role of gesture in memory and social communication. Gesture, 7(2), 137–158. https://doi.org/10.1075/gest.7.2.02bre
  • Dahl, T. I., & Ludvigsen, S. (2014). How I see what you're saying: The role of gestures in native and foreign language listening comprehension. The Modern Language Journal, 98(3), 813–833. https://doi.org/10.1111/modl.12124
  • Davies, M. (2008). The corpus of contemporary American English (COCA): One billion words, 1990–2019. http://corpus.byu.edu/coca/
  • Debras, C. (2017). The shrug: Forms and meanings of a compound enactment. Gesture, 16(1), 1–34. https://doi.org/10.1075/gest.16.1.01deb
  • Dreschler, W. A., Verschuure, H., Ludvigsen, C., & Westermann, S. (2001). ICRA noises: Artificial noise signals with speech-like spectral and temporal properties for hearing instrument assessment. International Collegium for Rehabilitative Audiology, 40(3), 148–157. https://doi.org/10.3109/00206090109073110
  • Drijvers, L., & Özyürek, A. (2017). Visual context enhanced: The joint contribution of iconic gestures and visible speech to degraded speech comprehension. Journal of Speech, Language, and Hearing Research, 60(1), 212–222. https://doi.org/10.1044/2016_JSLHR-H-16-0101
  • Drijvers, L., & Özyürek, A. (2020). Non-native listeners benefit less from gestures and visible speech than native listeners during degraded speech comprehension. Language and Speech, 63(2), 209–220. https://doi.org/10.1177/0023830919831311
  • Drijvers, L., Vaitonytė, J., & Özyürek, A. (2019). Degree of language experience modulates visual attention to visible speech and iconic gestures during clear and degraded speech comprehension. Cognitive Science, 43(10), e12789. https://doi.org/10.1111/cogs.12789
  • Emmorey, K., & Özyürek, A. (2014). Language in our hands: Neural underpinnings of sign language and co-speech gesture. In M. S. Gazzaniga & G. R. Mangun (Eds.), The cognitive neurosciences (5th ed., pp. 657–666). MIT Press.
  • Emmorey, K., Thompson, R., & Colvin, R. (2008). Eye gaze during comprehension of American sign language by native and beginning signers. The Journal of Deaf Studies and Deaf Education, 14(2), 237–243. https://doi.org/10.1093/deafed/enn037
  • Galvão, E. Z. (2013). Hand gestures and speech production in the booth: Do simultaneous interpreters imitate the speaker? In C. Carapinha & I. A. Santos (Eds.), Estudos de linguística (Vol. II, pp. 115–130). Imprensa da Universidade de Coimbra.
  • Garone, A. (2021). Perceived impact of remote simultaneous interpreting on auditory health public data set. Mendeley Data, V1. https://doi.org/10.17632/jvmf9gzpw8.1
  • Gerver, D. (1974). Simultaneous listening and speaking and retention of prose. Quarterly Journal of Experimental Psychology, 26(3), 337–341. https://doi.org/10.1080/14640747408400422
  • Gieshoff, A.-C. (2018). The impact of audio-visual speech input on work-load in simultaneous interpreting [Doctoral dissertation, Johannes Gutenberg-Universität Mainz]. Gutenberg Open Science. https://openscience.ub.uni-mainz.de/bitstream/20.500.12030/2182/1/100002183.pdf
  • Gullberg, M., & Holmqvist, K. (1999). Keeping an eye on gestures: Visual perception of gestures in face-to-face communication. Pragmatics & Cognition, 7(1), 35–63. https://doi.org/10.1075/pc.7.1.04gul
  • Gullberg, M., & Holmqvist, K. (2006). What speakers do and what addressees look at: Visual attention to gestures in human interaction live and on video. Pragmatics & Cognition, 14(1), 53–82. https://doi.org/10.1075/pc.14.1.05gul
  • Gullberg, M., & Kita, S. (2009). Attention to speech-accompanying gestures: Eye movements and information uptake. Journal of Nonverbal Behavior, 33(4), 251–277. https://doi.org/10.1007/s10919-009-0073-2
  • Hervais-Adelman, A., Moser-Mercer, B., & Golestani, N. (2015). Brain functional plasticity associated with the emergence of expertise in extreme language control. Neuroimage, 114, 264–274. https://doi.org/10.1016/j.neuroimage.2015.03.072
  • Holle, H., & Gunter, T. C. (2007). The role of iconic gestures in speech disambiguation: ERP evidence. Journal of Cognitive Neuroscience, 19(7), 1175–1192. https://doi.org/10.1162/jocn.2007.19.7.1175
  • Holle, H., Obleser, J., Rueschemeyer, S. A., & Gunter, T. C. (2010). Integration of iconic gestures and speech in left superior temporal areas boosts speech comprehension under adverse listening conditions. Neuroimage, 49(1), 875–884. https://doi.org/10.1016/j.neuroimage.2009.08.058
  • Hostetter, A. B. (2011). When do gestures communicate? A meta-analysis. Psychological Bulletin, 137(2), 297–315. https://doi.org/10.1037/a0022128
  • Hyönä, J., Tommola, J., & Alaja, A.-M. (1995). Pupil dilation as a measure of processing load in simultaneous interpretation and other language tasks. The Quarterly Journal of Experimental Psychology, 48(3), 598–612. https://doi.org/10.1080/14640749508401407
  • Ibáñez, A., Manes, F., Escobar, J., Trujillo, N., Andreucci, P., & Hurtado, E. (2010). Gesture influences the processing of figurative language in non-native speakers: ERP evidence. Neuroscience Letters, 471(1), 48–52. https://doi.org/10.1016/j.neulet.2010.01.009
  • International Organization for Standardization. (2016a). Simultaneous interpreting – Mobile booths – Requirements (ISO Standard no. 4043:2016). https://www.iso.org/standard/67066.html
  • International Organization for Standardization. (2016b). Simultaneous interpreting – Permanent booths – Requirements (ISO Standard no. 2603:2016). https://www.iso.org/standard/67065.html
  • International Organization for Standardization. (2017). Simultaneous interpreting – Quality and transmission of sound and image input – Requirements (ISO Standard no. 20108:2017). https://www.iso.org/standard/67062.html
  • Isham, W. P. (1994). Memory for sentence form after simultaneous interpretation: Evidence both for and against deverbalization. In S. Lambert & B. Moser-Mercer (Eds.), Bridging the gap: Empirical research in simultaneous interpretation (pp. 191–211). John Benjamins.
  • Jesse, A., Vrignaud, N., Cohen, M. M., & Massaro, D. W. (2000). The processing of information from multiple sources in simultaneous interpreting. Interpreting. International Journal of Research and Practice in Interpreting, 5(2), 95–115. https://doi.org/10.1075/intp.5.2.04jes
  • Kang, S., Hallman, G. L., Son, L. K., & Black, J. B. (2013). The different benefits from different gestures in understanding a concept. Journal of Science Education and Technology, 22(6), 825–837. https://doi.org/10.1007/s10956-012-9433-5
  • Kelly, S. D. (2017). Exploring the boundaries of gesture-speech integration during language comprehension. In R. B. Church, M. W. Alibali, & S. D. Kelly (Eds.), Why gesture? How the hands function in speaking, thinking and communicating (pp. 243–265). John Benjamins. https://doi.org/10.1075/gs.7.12kel
  • Kelly, S. D., Creigh, P., & Bartolotti, J. (2010a). Integrating speech and iconic gestures in a Stroop-like task: Evidence for automatic processing. Journal of Cognitive Neuroscience, 22(4), 683–694. https://doi.org/10.1162/jocn.2009.21254
  • Kelly, S. D., Healey, M., Özyürek, A., & Holler, J. (2015). The processing of speech, gesture, and action during language comprehension. Psychonomic Bulletin & Review, 22(2), 517–523. https://doi.org/10.3758/s13423-014-0681-7
  • Kelly, S. D., Ozyürek, A., & Maris, E. (2010b). Two sides of the same coin: Speech and gesture mutually interact to enhance comprehension. Psychological Science, 21(2), 260–267. https://doi.org/10.1177/0956797609357327
  • Kendon, A. (1995). Gestures as illocutionary and discourse structure markers in Southern Italian conversation. Journal of Pragmatics, 23(3), 247–279. https://doi.org/10.1016/0378-2166(94)00037-F
  • Kendon, A. (2004). Gesture: Visible action as utterance. Cambridge University Press.
  • Król, M. E. (2018). Auditory noise increases the allocation of attention to the mouth, and the eyes pay the price: An eye-tracking study. PLoS One, 13(3), e0194491. https://doi.org/10.1371/journal.pone.0194491
  • Marian, V., Blumenfeld, H. K., & Kaushanskaya, M. (2007). The Language Experience and Proficiency Questionnaire (LEAP-Q): Assessing language profiles in bilinguals and multilinguals. Journal of Speech, Language, and Hearing Research, 50(4), 940–967. https://doi.org/10.1044/1092-4388(2007/067)
  • McNeill, D. (1985). So you think gestures are nonverbal? Psychological Review, 92(3), 350–371. https://doi.org/10.1037/0033-295X.92.3.350
  • McNeill, D. (1992). Hand and mind: What gestures reveal about thought. University of Chicago Press.
  • McNeill, D. (2005). Gesture and thought. University of Chicago Press. https://doi.org/10.7208/chicago/9780226514642.001.0001
  • Moser-Mercer, B., Frauenfelder, U., Casado, B., & Künzli, A. (2000). Searching to define expertise in interpreting. In B. Englund Dimitrova & K. Hyltenstam (Eds.), Language processing and simultaneous interpreting: Interdisciplinary perspectives (pp. 107–131). John Benjamins.
  • Nash, J. C. (2014). On best practice optimization methods in R. Journal of Statistical Software, 60(2), 1–14. https://doi.org/10.18637/jss.v060.i02
  • Nour, S., Struys, E., & Stengers, H. (2019). Attention network in interpreters: The role of training and experience. Behavioral Sciences, 9(4), 43. https://doi.org/10.3390/bs9040043
  • Obermeier, C., Dolk, T., & Gunter, T. C. (2012). The benefit of gestures during communication: Evidence from hearing and hearing-impaired individuals. Cortex, 48(7), 857–870. https://doi.org/10.1016/j.cortex.2011.02.007
  • Özer, D., & Göksun, T. (2020). Visual-spatial and verbal abilities differentially affect processing of gestural vs. spoken expressions. Language, Cognition and Neuroscience, 35(7), 896–914. https://doi.org/10.1080/23273798.2019.1703016
  • Özer, D., Karadöller, D. Z., Özyürek, A., & Göksun, T. (2023). Gestures cued by demonstratives in speech guide listeners’ visual attention during spatial language comprehension. Journal of Experimental Psychology: General, 152(9), 2623–2635. https://doi.org/10.1037/xge0001402
  • Özyürek, A. (2014). Hearing and seeing meaning in speech and gesture: Insights from brain and behaviour. Philosophical Transactions of the Royal Society B: Biological Sciences, 369(1651), 20130296. https://doi.org/10.1098/rstb.2013.0296
  • Özyürek, A. (2018). Role of gesture in language processing: Toward a unified account for production and comprehension. In S.-A. Rueschemeyer & M. G. Gaskell (Eds.), Oxford handbook of psycholinguistics (2nd ed., pp. 592–607). Oxford University Press.
  • Özyürek, A., Willems, R. M., Kita, S., & Hagoort, P. (2007). On-line integration of semantic information from speech and gesture: Insights from event-related brain potentials. Journal of Cognitive Neuroscience, 19(4), 605–616. https://doi.org/10.1162/jocn.2007.19.4.605
  • R Core Team. (2013). R: A language and environment for statistical computing. http://www.R-project.org/
  • Rennert, S. (2008). Visual input in simultaneous interpreting. Meta, 53(1), 204–217. https://doi.org/10.7202/017983ar
  • Rennig, J., Wegner-Clemens, K., & Beauchamp, M. S. (2020). Face viewing behavior predicts multisensory gain during speech perception. Psychonomic Bulletin & Review, 27(1), 70–77. https://doi.org/10.3758/s13423-019-01665-y
  • Riccardi, A. (1996). Language-Specific strategies in simultaneous interpreting. In C. Dollerup & V. Appel (Eds.), New horizons – teaching translation and interpreting (pp. 213–222). John Benjamins.
  • Rimé, B., Boulanger, B., & d'Ydewalle, G. (1988, August 28–September 2). Visual attention to the communicator's nonverbal behavior as a function of the intelligibility of the message [paper presentation]. 24th International Congress of Psychology, Sydney, Australia.
  • Rogers, W. T. (1978). The contribution of kinesic illustrators toward the comprehension of verbal behavior within utterances. Human Communication Research, 5(1), 54–62. https://doi.org/10.1111/j.1468-2958.1978.tb00622.x
  • Seeber, K. G. (2011, May 12-14). Multimodal input in simultaneous interpreting. An eye-tracking experiment [paper presentation]. 1st International Conference TRANSLATA, Translation & Interpreting Research: Yesterday – Today – Tomorrow, Innsbruck, Austria.
  • Seeber, K. G. (2017). Multimodal processing in simultaneous interpreting. In J. W. Schwieter & A. Ferreira (Eds.), The handbook of translation and cognition (pp. 461–475). Wiley Blackwell. https://doi.org/10.1002/9781119241485.ch25
  • Seeber, K. G., & Fox, B. (2021). Distance conference interpreting. In M. Albl-Mikasa, & E. Tiselius (Eds.), The Routledge handbook of conference interpreting (pp. 491–507). Routledge. https://doi.org/10.4324/9780429297878
  • Snodgrass, J. G., & Vanderwart, M. (1980). A standardized set of 260 pictures: Norms for name agreement, image agreement, familiarity, and visual complexity. Journal of Experimental Psychology: Human Learning and Memory, 6(2), 174–215. https://doi.org/10.1037/0278-7393.6.2.174
  • Spinolo, N., & Chmiel, A. (2021). Inside the virtual booth: the impact of remote interpreting settings on interpreter experience and performance. https://www.aiic.org.br/site/blog/the-virtual-booth
  • Stachowiak-Szymczak, K. (2019). Eye movements and gestures in simultaneous and consecutive interpreting. Springer.
  • Streeck, J. (2008). Gesture in political communication: A case study of the democratic presidential candidates during the 2004 primary campaign. Research on Language and Social Interaction, 41(2), 154–186. https://doi.org/10.1080/08351810802028662
  • Strobach, T., Becker, M., Schubert, T., & Kühn, S. (2015). Better dual-task processing in simultaneous interpreters. Frontiers in Psychology, 6, 1590–1590. https://doi.org/10.3389/fpsyg.2015.01590
  • Sueyoshi, A., & Hardison, D. M. (2005). The role of gestures and facial cues in second language listening comprehension. Language Learning, 55(4), 661–699. https://doi.org/10.1111/j.0023-8333.2005.00320.x
  • Szekely, A., Jacobsen, T., D'Amico, S., Devescovi, A., Andonova, E., Herron, D., Lu, C. C., Pechmann, T., Pléh, C., Wicha, N., Federmeier, K., Gerdjikova, I., Gutierrez, G., Hung, D., Hsu, J., Iyer, G., Kohnert, K., Mehotcheva, T., Orozco-Figueroa, A., … Bates, E. (2004). A new on-line resource for psycholinguistic studies. Journal of Memory and Language, 51(2), 247–250. https://doi.org/10.1016/j.jml.2004.03.002
  • Tommola, J., & Lindholm, J. (1995). Experimental research on interpreting: Which dependent variable? In J. Tommola (Ed.), Topics in interpreting research (pp. 121–133). University of Turku.
  • Vatikiotis-Bateson, E., Eigsti, I. M., Yano, S., & Munhall, K. G. (1998). Eye movement of perceivers during audiovisual speech perception. Perception & Psychophysics, 60(6), 926–940. https://doi.org/10.3758/BF03211929
  • Yudes, C., Macizo, P., & Bajo, T. (2011). The influence of expertise in simultaneous interpreting on non-verbal executive processes. Frontiers in Psychology, 2, 309–309. https://doi.org/10.3389/fpsyg.2011.00309