Research Article

Optimising virtual object position for efficient eye-gaze interaction in Hololens2

Mahya Mirbagheri & Tom Chau
Article: 2337765 | Received 22 Nov 2023, Accepted 18 Mar 2024, Published online: 14 Apr 2024

ABSTRACT

Our study explored eye-tracking technology in the Hololens2 HMD. We assessed the effectiveness of eye-gaze interactions in a 3D environment, particularly in text entry applications. Existing recommendations for the placement and size of virtual objects are often followed without empirical validation. Therefore, we evaluated text entry target selection rates within the manufacturer's specified optimal gaze interaction zone. We measured the spatial accuracy and precision of eye gaze data and optimised target positions for enhanced text entry performance. By establishing confidence ellipses covering 95% of gaze points per target, we derived an Area of Interest (AOI) recalibration function. Applying a receiver operating characteristic-based method, we quantified the recalibrated tracker's performance at various AOIs. Our results indicate that the optimal recalibrated AOI radius is 0.036 m and the ideal object-plane distance from the eye-plane is 2.25 m. This recalibrated specification allows users to interact efficiently with Hololens2 via eye movements, achieving target selection rates exceeding 90%.

1. Introduction

In the past decade, many augmented reality head-mounted display (AR HMD) headsets have become available to researchers. Text entry is an essential task in these headsets. Previous studies have proposed several techniques for text entry in AR HMDs using combinations of head and hand control (Hang Lee et al. Citation2019; Xu et al. Citation2019). However, these mid-air text entry techniques have induced hand/arm fatigue during experiments (Lu et al. Citation2021). Moreover, these body movements often cannot be performed reliably by individuals with physical impairments. Since some very recent HMDs have incorporated eye-tracking technology, eye-gaze based text entry in AR HMD has become a topic of increasing interest.

Hololens2 is one of the recent AR HMDs with integrated eye-tracking technology for low-effort text entry. Nevertheless, modest target selection rate (TSR) and confounding eye saccades remain challenges to eye-gaze based text entry in this headset (Zhao and Li Citation2021). TSR in this study is defined as the number of correctly selected targets divided by the total number of activated virtual objects in a distinct text entry plane and depends heavily on the spatial accuracy of the eye-tracker.

A saccade herein refers to a rapid and conjugate eye movement that voluntarily shifts the eyes from one target to another while typing (Goffart Citation2009). Many previous studies have tried to reduce saccades in eye gaze-based applications by optimising the distance between keyboard characters, text entry layout and the distance from the plane of the eyes to the text entry plane (Richard and Howell Citation2002; Hansen et al. Citation2003; Huckauf and Mario Citation2008; Rajanna and Paulin Hansen Citation2018; Hang Lee et al. Citation2019; Benligiray et al. Citation2019). For example, a previous study (Benligiray et al. Citation2019) found that a circular text entry layout outperforms conventional rectangular layouts in terms of minimising saccades and maximising the number of words typed per minute. The reduction in size of and distance between keyboard characters may also limit the number of eye saccades (Rajanna and Paulin Hansen Citation2018). However, there is still a trade-off between increasing TSR and decreasing eye saccades. Therefore, a quantitative assessment of eye-tracking performance is critical to optimising target positioning.

Researchers often resort to the manufacturer's specifications rather than empirically characterising eye tracking performance. According to the manufacturer, for an optimal experience, virtual objects (e.g. virtual keys) intended for eye-gaze based interaction in Hololens2 should be positioned between 1.25 m and 5 m from the plane of the eyes, whereas content such as text should be situated closer than 1 m from the plane of the eyes for readability. Finally, the recommended optimal viewing angle for content is between 0° and 35° below the horizon, as shown in Figure 1 (left panel).

Figure 1. (Left) The manufacturer's reported optimal zone for Hololens2. (Right) Image geometry for Hololens2 showing the Cartesian coordinates of a single target (red dot) in the target plane, the target view angle (θ), the distance from the eye plane to the target plane (z), the head gaze centre (o) in the target plane, and the position (O) of the eyes in the eye plane.

Prior research has predominantly evaluated the quality (accuracy and precision) of eye gaze data by focusing on the view angle (θ) towards a target (Figure 1, right panel). This often involves assuming a fixed distance (z) between the target and the eye planes (eye-to-target plane distance), with only the x and y positions of the target varying (Aziz and Komogortsev Citation2022; Wei et al. Citation2023; Lei et al. Citation2023). To the best of our knowledge, only a limited number of studies have attempted to quantify the quality of eye gaze data in augmented reality headsets, considering variations in all three Cartesian components (x, y, z) while maintaining a consistent view angle (Bozkir et al. Citation2023).

In this study, we assessed the impact of varying the placement of virtual objects on TSR in the context of spatially constrained gaze interactions typical of gaze-based text entry within an HMD. We anticipated that TSR would change when the eye-to-target plane distance varies, despite a constant view angle. The results of our study suggest that achieving a high TSR within the optimal zone reported by the manufacturer is still a significant challenge, especially when it comes to spatially constrained gaze interactions. Subsequently, we determined the ideal target position in relation to the eyes by examining metrics such as spatial accuracy and precision as indicators of eye tracking data quality. Additionally, we analysed how these metrics change with post hoc recalibration. We identified disparities between the manufacturer’s specified signal quality and our empirical results.

2. Method

2.1. Participants

Twenty typically developed adults (9 males) aged between 18 and 35 years, without any cognitive, physical, or ocular impairment, were recruited. All reported having either normal or corrected-to-normal vision. Five participants wore glasses during the experiment, as the device is glasses-friendly, and eight participants had prior experience using HMD headsets. However, none of the participants had experience with augmented reality environments. All participants provided informed written consent. The protocol was approved by the research ethics board of Holland Bloorview Kids Rehabilitation Hospital and the University of Toronto.

2.2. Stimuli and data collection

The HoloLens 2 device provides a 52° diagonal field of view, records eye gaze data at a sampling rate of 30 Hz and has a nominal spatial accuracy of 1.5°. In this study, we made use of both head and eye tracking data provided by Hololens2. We developed the tasks in a 3D environment with Unity 2021.3 LTS. The virtual objects resided on a 2D plane that remained at the head gaze centre to prevent participants from searching for a target sphere outside of their initial field of view, as doing so would introduce a secondary search task. This experiment employed a within-subjects design with two eye gaze-based interaction variables: the target area of interest (AOI) and the eye-to-target plane distance. Experiment 1 investigated the effect of these two variables on TSR at a consistent view angle. In Experiment 2, the quality (i.e. spatial accuracy and precision) of the eye gaze data was assessed to optimise the target area of interest and the eye-to-target plane distance. Participation in both experiments, including preparation time, totalled about one hour.

2.2.1. Experiment 1: the effect of target area of interest and eye-to-target plane distance on TSR

The participants donned the headset while sitting 1 metre away from a white wall. The built-in calibration of Hololens2 was automatically invoked. A plane containing 30 transparent interactive virtual objects with keyboard characters was displayed to the user, as shown in Figure 2.

Figure 2. Experimental setup. (a) Virtual location of 4 different target planes at different eye-to-target plane distances, corresponding to Tasks 1–4. (b) Depiction of 4 differently sized target AOIs corresponding to Tasks 5–8. (c) The virtual object layout on a target plane with a specific AOI for every target.

To accommodate tracking errors, the target AOI was defined as an invisible area around every object. Eye gaze activated the target when gaze coordinates fell within the target's AOI continuously for 2 sec. The AOI was invisible to the user; only the contents were observable. The contents were white to minimally contrast with the white wall background and emulate the worst-case, low foreground-background contrast scenario. In Figure 2, the background is displayed in black for clarity; in the experiment, however, the background was a white wall. For each trial, the participant was cued with a randomly selected object's content at the centre of the target plane for 8 seconds. During this time, the participant sought the cued target among the virtual objects and dwelled for at least 2 seconds over the target's AOI to activate it. To confirm that readability did not affect the TSR, participants verbally named the cue when it appeared.
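
The dwell-activation logic can be sketched as follows. This is an illustrative Python example under the stated 2 s dwell and invisible-AOI assumptions, not the study's actual Unity implementation; the class name DwellSelector is our own.

```python
import math

class DwellSelector:
    """Illustrative dwell-based target activation (not the study's Unity implementation)."""

    def __init__(self, targets, aoi_radius=0.025, dwell_time=2.0):
        self.targets = targets          # {label: (x, y)} target centres in metres
        self.aoi_radius = aoi_radius    # radius of the invisible AOI around each target
        self.dwell_time = dwell_time    # required continuous dwell, in seconds
        self.current = None             # label of the AOI the gaze currently falls in
        self.dwell_start = None         # timestamp at which the current dwell began

    def update(self, gaze_x, gaze_y, timestamp):
        """Feed one gaze sample; return the target label once a continuous dwell completes."""
        hit = None
        for label, (tx, ty) in self.targets.items():
            if math.hypot(gaze_x - tx, gaze_y - ty) <= self.aoi_radius:
                hit = label
                break
        if hit != self.current:         # gaze left the AOI or entered a new one: restart the timer
            self.current, self.dwell_start = hit, timestamp
            return None
        if hit is not None and timestamp - self.dwell_start >= self.dwell_time:
            self.current, self.dwell_start = None, None
            return hit                  # target activated
        return None
```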

Trials with each of the 30 virtual objects proceeded as above, in random order, iterating through all 8 tasks (Table 1). The displayed content and the activated targets were automatically recorded, yielding a total of 3200 data points across all trials and participants. Including rest periods of 30 seconds between tasks, this experiment lasted between 16 and 36 minutes. To simulate an actual application, users were allowed to move their heads naturally. The task was repeated with various target AOI sizes and eye-to-target plane distances, as illustrated in Figure 2; the corresponding position parameters are listed in Table 1.

Table 1. Tasks 1–4 parameters for testing various eye-to-target plane distances at a fixed target AOI, corresponding to Figure 2(a); Tasks 5–8 parameters for testing various target AOIs at a fixed eye-to-target plane distance, corresponding to Figure 2(b).

With a constant target AOI diameter (0.05 m), the target visual angle decreased in tasks 1–4 from 1.9° to 0.76°, corresponding to increases in the eye-to-target plane distance from 1.5 to 3.75 m. Subsequently, in tasks 5–8, the plane distance was fixed at 1.5 m, and the target visual angle varied according to Table 1, corresponding to variations in the target AOI from 0.05 to 0.02 m.
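
For reference, the quoted end-point visual angles follow from simple geometry relating the AOI diameter d and the eye-to-target plane distance z (a sketch of our own, assuming the target lies near the line of sight):

```python
import math

def visual_angle_deg(d, z):
    """Visual angle (degrees) subtended by an AOI of diameter d at plane distance z (metres)."""
    return math.degrees(2 * math.atan(d / (2 * z)))

print(round(visual_angle_deg(0.05, 1.50), 2))   # ~1.91 deg (task 1)
print(round(visual_angle_deg(0.05, 3.75), 2))   # ~0.76 deg (task 4)
```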

Table 2. Mean accuracy and precision for the 30 different targets across all subjects for tasks 1–4.

We measured TSR across various eye-to-target plane distances and target AOIs in search of the combination that would yield the smallest decrease in TSR for a decrease in visual angle (i.e. either a decrease in AOI or a target plane closer to the eye plane). We empirically defined the optimal target location (eye-to-target plane distance and view angle) as that which yielded a TSR above 90% with the least view angle, to minimise the magnitude of the preceding eye saccade.

2.2.2. Experiment 2: eye gaze data quality assessment

We estimated the quality (spatial accuracy and precision) of the eye gaze data for the best 4 tasks as determined in Experiment 1. For Experiment 2, the letter targets were replaced with red fixation targets in the locations shown in Figure 3.

Figure 3. The location of fixation targets on an x-y plane for Experiment 2. Note that during the experiment, only one fixation target was shown at a time.

In each trial, a single fixation target (in red) was randomly displayed for 3 seconds against a white background. The participant was asked to dwell on the fixation target currently shown. After 300 ms during which no fixation targets were shown, the next unique random fixation target was presented. Unlike the previous experiment, head movements were restricted by asking participants to remain in constant contact with a chinrest. The above was repeated for all 29 fixation targets.

For each trial, we extracted 100 eye gaze samples over a 3.3 sec interval (3 sec fixation and 300 ms rest). Eye blinks yielded null samples during this interval. The total data collection time of Experiment 2 was approximately 7 minutes, yielding roughly 240,000 eye gaze samples. The tasks were ordered consistently for all participants, but fixation targets within each task were randomised.

Due to the varying durations of blinks across trials, the number of non-null samples for every fixation target also fluctuated (F = 25.98, p < 0.001).

The non-null samples recorded for one participant are exemplified in Figure 4. The blue dots in Figure 4 represent the collected eye gaze samples (eye tracker data), whereas the red 'x' markers represent the fixation targets (head tracker data). Spatial variance of the fixation targets was observed despite the use of the chinrest, owing to slight head movements. The unfilled red circles represent the AOIs.

Figure 4. The Hololens head and eye tracker data from 1 user. The red x’s correspond to the position of the fixation targets (head tracking data) while the blue dots are the position of gaze samples (eye tracker data) in the x-y plane. The areas of interest (AOIs; red circles) are overlaid for illustrative purposes only.

2.3. Artifact removal

A gaze tracker can fail to estimate the gaze direction as a consequence of incorrect calibration, physical features of the eye such as shape and colour, external factors such as glasses or makeup, or even blinking. These tracking faults may lead to data loss and introduce noticeable spikes in the eye gaze signal, negatively impacting eye gaze ray tracking, and may vary in severity across different regions of tracking or change over time due to lighting, pupil size, or slippage of the head-mounted display in AR/VR settings. Although we could not prevent these artefacts, we mitigated their impact as follows.

2.3.1. Eye saccade removal

A segment of data after each fixation activation was removed, given that participants needed time to fixate on the next target. In so doing, we limited the effect of eye saccade landing error on our subsequent analyses. The number of saccade samples may vary while gazing at different fixation targets depending, in part, on the number of blinks, the velocity of the moving eyes and the distance between successive fixation targets. Herein, we applied a Bayesian online learning algorithm to partition the eye gaze data into dwell and saccade samples (Tafaj et al. Citation2012). This algorithm has been favourably evaluated in various studies (Kasneci Citation2013; Enkelejda et al. Citation2015; Kübler et al. Citation2015) and assumes that the distances between consecutive gaze samples while dwelling at a fixation target follow a Gaussian distribution, while distances between gaze samples pertaining to a saccade belong to a different Gaussian distribution. The parameters of these Gaussian distributions (mean and standard deviation) were determined via maximum likelihood estimation for each individual fixation target. An eye gaze sample landing within one standard deviation of the mean of the fixation-sample distribution was classified as an eye dwell. This method does not require the setting of arbitrary thresholds or limiting experimental conditions. Segmentation of gaze samples was thus target-specific rather than naively uniform across targets.
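
As a simplified sketch of the one-standard-deviation rule applied per target (our own illustration, not a reproduction of the full Bayesian online clustering algorithm of Tafaj et al.), the labelling can be expressed as:

```python
import numpy as np

def label_dwell_samples(gaze_xy, target_xy):
    """gaze_xy: (N, 2) gaze samples recorded for one fixation target; target_xy: (2,) target position.
    Returns a boolean mask: True for dwell samples, False for presumed saccade samples."""
    offsets = np.linalg.norm(gaze_xy - target_xy, axis=1)   # offset of each sample from the target
    mu, sigma = offsets.mean(), offsets.std(ddof=1)         # maximum-likelihood Gaussian parameters
    return np.abs(offsets - mu) <= sigma                    # within one SD of the mean -> dwell
```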

2.3.2. Head movement removal

We also mitigated the influence of head position on eye gaze estimation accuracy. The resting eye and head, even when using a chinrest, rarely display perfect stability (Skavenski et al. Citation1979). Although our intention was to keep both the participant and the target stationary in the world reference frame using a chinrest, Figure 4 demonstrates noticeable oscillation among the fixation targets and eye gaze samples. This oscillation arises from the movement of the target plane in each frame, indicating the necessity for online head movement correction. The Hololens2 eye gaze samples take into account estimates of head and eye positions in the world reference frame, as shown in Figure 5.

Figure 5. Change of the target plane position in the world reference frame as a consequence of head movement. However, the fixation target and eye gaze sample remained fixed in the target plane reference frame because of Hololens2-inbuilt online head movement correction, which accounted for head motion and the corresponding change in user eye position.

To distinguish between movements originating from the head and those originating from the eyes, we utilised the Solver Handler tool within the Microsoft Mixed Reality Toolkit. This tool ensured that the centre of the target plane remained aligned with the direction of the ray cast from the head, enabling the target plane to synchronise its movement with the head. This approach also kept the fixation point position consistently aligned with the point of the ray cast from the head. Consequently, by stabilising alterations in the target plane and establishing a mapping of all subsequent movements to the initial position, we effectively eliminated the influence of head movement.
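
Conceptually, the correction re-expresses world-frame points in a target-plane frame anchored to the head pose on every frame. The sketch below is our own Python illustration of that idea; in the study itself this behaviour was handled by the Solver Handler tool in Unity.

```python
import numpy as np

def to_plane_frame(point_world, head_pos, head_rot, plane_distance):
    """Re-express a world-frame point (gaze hit or fixation target) in the head-anchored target-plane frame.

    point_world: (3,) world coordinates; head_pos: (3,) head position;
    head_rot: (3, 3) rotation matrix mapping head-frame axes to world axes;
    plane_distance: eye-to-target plane distance along the head's forward (+z) axis."""
    plane_centre = head_pos + head_rot @ np.array([0.0, 0.0, plane_distance])  # plane follows the head ray
    local = head_rot.T @ (point_world - plane_centre)                          # rotate into head-aligned axes
    return local[:2]                                                           # (x, y) in the target-plane frame
```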

2.4. Evaluation

2.4.1. Spatial accuracy and precision

During target fixation, we calculated tracking accuracy and precision. Accuracy was quantified in terms of the mean Cartesian distance (Friedman et al. Citation2021), as well as the mean difference in view angle (Holmqvist et al. Citation2012), between the fixation target and the estimated eye gaze positions. To improve accuracy, it is possible to recalibrate the system by mapping the target AOI to the AOI of the actual eye gaze position using either explicit (Flatla et al. Citation2011; Fares et al. Citation2013; Pfeuffer et al. Citation2013; Ramirez Gomez and Gellersen Citation2018) or implicit (Fares et al. Citation2013; Sidenmark and Lundström Citation2019) procedures.

Precision was estimated as the spread of the eye gaze samples when the user was fixating on the target. Specifically, precision was quantified both as the mean distance and mean difference in view angle between each eye gaze sample and the mean of the cluster of gaze samples under consideration. Techniques such as filtering (Špakov Citation2012; Casiez et al. Citation2012; Maria Feit et al. Citation2017) can be used to address poor precision. When a user fixated on the centre of a target, the captured eye gaze data were assumed to be normally distributed in the x and y directions around a mean, offset from the target centre (reflected in the accuracy measure) and with a certain covariance (captured by the precision measure) (Lei et al. Citation2023). This distribution of eye gaze data was used to derive the appropriate target AOI. The confidence ellipse was used to visualise the AOI superimposed over the normally distributed samples and thereby provide a 2D confidence interval. A 95% confidence ellipse has been recommended for robust and smooth eye gaze interaction (Aziz and Komogortsev Citation2022). The centre of the confidence ellipse was the mean of the corresponding eye gaze data, and thus termed the gaze centre for the cognate fixation target.
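
A sketch of these accuracy, precision and confidence-ellipse computations (our own, assuming the dwell samples for one target are provided as an (N, 2) array of target-plane coordinates in metres) is:

```python
import numpy as np

def accuracy_precision(gaze_xy, target_xy, plane_distance):
    """Cartesian and angular accuracy/precision for one target's dwell samples."""
    centre = gaze_xy.mean(axis=0)                                  # gaze centre for this target
    acc_m = np.linalg.norm(gaze_xy - target_xy, axis=1).mean()     # accuracy: mean offset to the target
    prec_m = np.linalg.norm(gaze_xy - centre, axis=1).mean()       # precision: mean spread about the centre
    to_deg = lambda d: np.degrees(np.arctan(d / plane_distance))   # offset -> view-angle difference
    return acc_m, to_deg(acc_m), prec_m, to_deg(prec_m)

def confidence_ellipse(gaze_xy, level=5.991):
    """95% confidence ellipse of the gaze samples (5.991 = chi-square quantile, 2 dof, p = 0.95).
    Returns the ellipse centre, semi-axis lengths and orientation (radians)."""
    centre = gaze_xy.mean(axis=0)
    cov = np.cov(gaze_xy, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)                         # eigenvalues in ascending order
    semi_axes = np.sqrt(level * eigvals)
    angle = np.arctan2(eigvecs[1, -1], eigvecs[0, -1])             # orientation of the major axis
    return centre, semi_axes, angle
```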

2.4.2. Recalibration

The offset between the centre of the confidence ellipse and the position of the corresponding fixation target necessitated a function to recalibrate the target AOI for unknown samples, even though the built-in manufacturer's calibration had been performed at the start of the experiment. The most common choice for recalibration is a first-order (linear) polynomial function:

(1) $X_r = [x_{r_1}, \ldots, x_{r_{30}}],\quad Y_r = [y_{r_1}, \ldots, y_{r_{30}}],\quad X_o = [x_{o_1}, \ldots, x_{o_{30}}],\quad Y_o = [y_{o_1}, \ldots, y_{o_{30}}]$
(2) $X_r = A_x X_o + B_x Y_o + C_x$
(3) $Y_r = A_y X_o + B_y Y_o + C_y$

where the coefficients $A_x, B_x, C_x$ and $A_y, B_y, C_y$ were estimated using linear regression. $X_o, Y_o$ and $X_r, Y_r$ are arrays of the positions of the 30 original (i.e. from recorded gaze samples) and recalibrated target AOIs, respectively.
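
A least-squares sketch of this fit (our own; `Xo`, `Yo`, `Xr`, `Yr` are assumed to be the 30-element arrays of Equation 1) is:

```python
import numpy as np

def fit_recalibration(Xo, Yo, Xr, Yr):
    """Estimate the coefficients of Equations (2) and (3) by ordinary least squares."""
    design = np.column_stack([Xo, Yo, np.ones_like(Xo)])        # columns: Xo, Yo, intercept
    (Ax, Bx, Cx), *_ = np.linalg.lstsq(design, Xr, rcond=None)  # Xr = Ax*Xo + Bx*Yo + Cx
    (Ay, By, Cy), *_ = np.linalg.lstsq(design, Yr, rcond=None)  # Yr = Ay*Xo + By*Yo + Cy
    return (Ax, Bx, Cx), (Ay, By, Cy)

def recalibrate(x, y, coeffs_x, coeffs_y):
    """Map an original AOI position to its recalibrated position."""
    Ax, Bx, Cx = coeffs_x
    Ay, By, Cy = coeffs_y
    return Ax * x + Bx * y + Cx, Ay * x + By * y + Cy
```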

The receiver operating characteristic (ROC) quantified the recalibrated tracker's performance at different target AOI thresholds. Herein, the target AOI threshold (ρ) refers to a unique radius of a circular AOI centred around the recalibrated position $(x_{r_i}, y_{r_i})$ of the ith target. We defined positive and negative cases for the original and recalibrated AOIs of the ith fixation target as follows:

  • A participant's gaze centre (average eye gaze position for the ith target) was considered a positive case with respect to the recalibrated AOI if the offset between that gaze centre and $(x_{r_i}, y_{r_i})$ was less than ρ.

  • A participant's gaze centre was considered a positive case with respect to the original AOI if the offset between that gaze centre and $(x_{o_i}, y_{o_i})$ was less than ρ.

  • A participant's gaze centre was considered a negative case with respect to the recalibrated AOI if the offset between that gaze centre and $(x_{r_i}, y_{r_i})$ was greater than ρ.

  • A participant's gaze centre was considered a negative case with respect to the original AOI if the offset between that gaze centre and $(x_{o_i}, y_{o_i})$ was greater than ρ.

The true positive rate for the ith target was thus defined as the number of positive cases for the ith target divided by the total number of participants, while the false positive rate was the number of negative cases divided by the total number of participants. The area under the curve (AUC) was used to evaluate the quality of the recalibration.
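
One possible sketch of this ROC construction is given below. It is our own reading, combining the case definitions above with the negative-case convention described in the limitations section (gaze centres belonging to the other, non-active targets serve as negative cases); the AUC is approximated with the trapezoidal rule.

```python
import numpy as np

def roc_points(gaze_centres, aoi_centres, radii):
    """gaze_centres: (P, T, 2) per-participant gaze centres; aoi_centres: (T, 2) AOI centres."""
    _, T, _ = gaze_centres.shape
    # offsets[p, i, j]: distance from participant p's gaze centre for target i to target j's AOI centre
    offsets = np.linalg.norm(gaze_centres[:, :, None, :] - aoi_centres[None, None, :, :], axis=3)
    same = np.eye(T, dtype=bool)
    tpr, fpr = [], []
    for rho in radii:
        inside = offsets < rho
        tpr.append(inside[:, same].mean())     # own-target gaze centres captured by the AOI
        fpr.append(inside[:, ~same].mean())    # other-target gaze centres wrongly captured
    return np.array(fpr), np.array(tpr)

def auc(fpr, tpr):
    """Area under the ROC curve via the trapezoidal rule."""
    order = np.argsort(fpr)
    x, y = fpr[order], tpr[order]
    return float(np.sum((x[1:] - x[:-1]) * (y[1:] + y[:-1]) / 2.0))
```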

3. Results

3.1. Target selection rate

Figure 6 depicts the TSR for tasks 1–8 across different target view angles in Experiment 1.

Figure 6. Target Selection Rate (TSR) for various view angles, resulting from different eye-to-target plane distances and target AOIs.

View angles stem from different eye-to-target plane distances and target AOIs. The TSR for a fixed target AOI (tasks 1–4) exceeded 90% for view angles above 1.53°. The standard deviation in TSR increased with decreasing view angle. The optimal target view angle and distance were determined to be 1.53° and 2.25 m, respectively. As tasks 1–4 yielded better TSRs than tasks 5–8, the target planes and AOIs of the former were used in Experiment 2.

3.2. Saccade removal

Figure 7 shows the eye gaze behaviour of Subject 9 while a fixation target was shown.

Figure 7. Eye gaze samples collected for one target for Subject 9. a) Density of offsets between eye gaze samples and the target (Mean = 0.01090 m). b) the corresponding distribution in the X-Y plane. c) the offset plotted against temporally ordered eye gaze samples.

One hundred eye gaze samples were recorded as non-blink samples. Each eye gaze sample was labelled as either saccadic or dwell-related. Upon inspecting the positively skewed histogram of offsets between eye gaze samples and the target (Figure 7(a)), the classification threshold was set as one standard deviation from the mean (solid vertical black line). Samples within the pink area (one standard deviation around the mean) were deemed dwell-related, while those beyond were designated as saccadic.

Figure 7(b,c) portrays the demarcation of eye gaze samples from two different perspectives. Figure 7(b) shows the eye gaze samples in a 2-dimensional plane, with those outside the pink boundary (dwell region) being considered as part of the preceding eye saccade. Figure 7(c) shows the offset across eye gaze samples while the fixation target was visible. The pink highlighted area corresponds to that in Figure 7(a).

3.3. Accuracy and precision

Artifacts introduced by alterations in lighting, pupil size, or head gear slippage can have a detrimental impact on eye-gaze based interaction (Drewes et al. Citation2012; Niehorster et al. Citation2020; Holmqvist et al. Citation2022). However, in this study, the tracking quality remained stable for each participant. A multivariate analysis of variance revealed no significant differences in accuracy or precision between the first and last target fixations for each participant (F = 2.3240, p > 0.05). The Cartesian and polar measures of accuracy and precision for all 30 targets across participants are presented in Table 2.

Despite the constant diameter of the target AOI (0.05 m), measures of precision and accuracy exhibited variation. While the offset measure for precision (second-to-last column of Table 2) deteriorated with increasing eye-to-target plane distance, these values were relatively small, ranging from 4.2% to 10% of the target AOI diameter. Therefore, we replaced the cluster of eye gaze samples proximal to a target with the average gaze position (gaze centre) of those samples.

The Cartesian accuracy, represented by the mean offset between the fixation target and the gaze samples (5th column of Table 2), diminished with increasing eye-to-target plane distance. For accurate target selection, the radius of the target AOI must be sufficiently large to compensate for gaze tracking errors. In this experiment, the target AOI radius was 0.025 m, and the mean offset between fixation target and recorded eye gaze positions (5th column of Table 2) for eye-to-target plane distances of 2.25 m and below was less than the target AOI radius (<0.025 m). The optimal target position was that which maximised both accuracy and eye-to-target plane distance, i.e. minimised the spatial extent of inter-target eye saccades. Accordingly, the optimal eye-to-target plane distance was 2.25 m, yielding accuracy measures of 0.0201 m and 0.51°, which is consistent with the best target position of Experiment 1 in terms of TSR.

A reduction in the view angle (4th column, Table 2) due to objects being positioned at a greater distance from the eye plane led to an increase in the angular measures of eye gaze accuracy (6th column, Table 2) and precision (last column, Table 2). However, the Cartesian accuracy decreased. The discrepancy between the accuracies cited by the manufacturer and those recorded in this study within the manufacturer's 'optimal zone', across different eye-to-target plane distances, raises concerns about the accuracies reported in other studies (Kapp et al. Citation2021; Aziz and Komogortsev Citation2022).

3.4. Area of interest

Figure 8 presents confidence ellipses contoured around participants' gaze centres at four different eye-to-target plane distances.

Figure 8. 95% confidence ellipses (purple outlines) fitted to the eye gaze data of each participant (grey dots), for all fixation targets (red dots), at each plane distance (specified at the top of each graph). A solid cross indicates the gaze centre.

Clearly, the amount of intersection between confidence ellipses of neighbouring targets was directly related to eye-to-target plane distance. At an eye-to-target plane distance at or below 2.25 m, overlaps were negligible and accurate target selection was enabled.

As shown in Figure 8, the eye gaze centre (black cross) was generally shifted to the left of the fixation targets (red dots) due to negative eye tracker offsets in both vertical and horizontal directions. Furthermore, the absolute value of this offset increased as the eye-to-target plane distance lengthened. The standard deviation of the offset also increased as the target plane distance was extended, indicating a greater degree of instability in the amount of offset. At all eye-to-target plane distances, the horizontal offset was significantly greater than the vertical offset. For instance, at a target plane distance of 2.25 m, the mean horizontal offset measured 0.0137 m while the vertical offset was only 0.0050 m. Mean offsets between the fixation targets (red dots in Figure 8) and the corresponding eye gaze centres (black crosses in Figure 8) are summarised in Figure 9 for all the eye-to-target plane distances.

Figure 9. Vertical and horizontal offset between fixation targets and the eye gaze center at different eye-to-target plane distances.

3.5. Recalibration

The coefficients of the linear transformation between original (Xo, Yo) and recalibrated (Xr, Yr) gaze sample coordinates (Equations 2 and 3) are shown in Table 3.

Table 3. The coefficients of Equations (2) and (3) for different eye-to-target plane distances (tasks 1–4); the confidence intervals around the coefficients were bounded by ±5 × 10⁻³ m.

We note that coefficients Ax and Ay, as well as Bx and By, decreased as the eye-to-target plane distance increased. At the same time, the intercepts of both equations (Cx and Cy) increased in magnitude as the eye-to-target plane distance increased. These observations confirm that the dependence of the recalibrated position (Xr, Yr) on the original position (Xo, Yo) of the target was reduced, as expected, since the offset between these positions increased with greater interplane separation, necessitating a larger correction (Cx and Cy). Clearly, to maintain accuracy of eye tracking, recalibration is necessary as the eye-to-target plane distance increases. Note that recalibration did not change the location of the original target displayed on screen but rather adjusted the original target AOI, which was invisible to the user.

3.6. Performance evaluation

Using the coefficients in Table 3, the recalibration equations for the optimal eye-to-target plane distance of 2.25 m are given as

(4) $X_r = 0.9606\,X_o + 0.0040\,Y_o - 0.0137$
(5) $Y_r = 0.0167\,X_o + 0.9792\,Y_o - 0.0050$

The intercept or horizontal correction in Equation 4, i.e. −0.0137 m, is substantially larger in magnitude than the vertical correction in Equation 5, i.e. −0.0050 m (F = 636.16, p < 0.001), which corroborates the observation that horizontal rather than vertical offsets dominated in Figures 8 and 9. The coefficients for Yo in Equation 4 and Xo in Equation 5 were an order of magnitude smaller than those of the other independent variables and hence, we replaced the bivariate polynomial Equations 4 and 5 with univariate approximations:

(6) $X_r \approx X_o - 0.0137$
(7) $Y_r \approx Y_o - 0.0050$
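
As a usage sketch (our own), the simplified recalibration of Equations (6) and (7) amounts to shifting each invisible AOI by the fitted intercepts while the displayed target stays in place; a gaze sample then activates the target if it falls within a chosen AOI radius ρ of the shifted centre. The intercept values and signs below follow the reconstruction above for the 2.25 m plane.

```python
import math

def recalibrated_aoi_centre(x_o, y_o, dx=-0.0137, dy=-0.0050):
    """Shift the original AOI centre by the fitted intercepts (Equations 6 and 7)."""
    return x_o + dx, y_o + dy

def activates(gaze_x, gaze_y, x_o, y_o, rho):
    """True if a gaze sample falls within radius rho of the recalibrated AOI centre."""
    x_r, y_r = recalibrated_aoi_centre(x_o, y_o)
    return math.hypot(gaze_x - x_r, gaze_y - y_r) <= rho
```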

Figure 10 compares the ROC curves determined with the recalibrated and original AOIs. The AOI threshold, ρ, ranged from 0.01 m to 0.06 m to construct the ROC curves.

Figure 10. ROC curves based on the original and recalibrated target AOI.

As depicted in Figure 10, the ROC's AUC improved after recalibration, approaching a value closer to one. The optimal ρ was found to be approximately 0.036 m, where the curve was nearest to the point (0, 1). Consequently, the optimal target Area of Interest (AOI) was determined to be a circular AOI positioned at the recalibrated centre, with a radius of 0.036 m (equivalent to a 0.072 m diameter). With this tuned AOI, participants' gaze centres for a given target were more accurately associated with the correct corresponding fixation target. The equivalent 0.072 m diameter indicates the necessity of increasing the target AOI diameter from 0.05 m (as used in Experiment 2) to 0.072 m.

4. Discussion

The efficiency of eye-gaze interaction in a 3D environment with 2D virtual objects depends on their placement. This is particularly important for augmented reality applications such as text entry, navigation systems, museum tours, and educational tools, where users are typically presented with many virtual objects. The current study maximised target selection rate in a spatially constrained gaze interaction experiment in Hololens2 by optimising the target position based on the manufacturer's specifications. With typically developed adults, we determined the following optimal settings for virtual object placement to maximise TSR: a target view angle of 1.53°, an eye-to-target plane distance of 2.25 m, and a recalibrated target AOI diameter of 0.072 m. With these settings, TSRs exceeding 90% were generally achievable by participants. Tracking quality did not significantly change for a given participant over the duration of the experiment, suggesting minimal, if any, effect of eye fatigue.

4.1. Maximizing TSR

Results obtained in Experiment 1 (Figure 6) confirm that for every pair of tasks at the same view angle, i.e. (tasks 1 and 5), (tasks 2 and 6), (tasks 3 and 7) and (tasks 4 and 8), accuracy varied as a function of the target's Cartesian coordinates (x, y, z). For example, the TSR obtained in task 2 was higher than the threshold of 90%. However, at the same view angle, the TSR achieved in task 6 (same view angle but smaller AOI and shorter eye-to-target plane distance) was below the threshold. These results emphasise the necessity of considering the target's 3-dimensional location when assessing the quality of an eye tracker. We also found that increases in the eye-to-target plane distance reduced the TSR less drastically than did a reduction in the target AOI. Therefore, increasing eye-to-target plane distance is preferred over target AOI reduction as a strategy to minimise eye saccades.

4.2. Accuracy and precision

The manufacturer specifies an accuracy of 1.5° within the reported optimal region. However, our empirical findings (Table 2 and Figure 9) revealed variations in accuracy across different distances within the same region. Specifically, at the optimal eye-to-target plane distance reported in this study (2.25 m), the accuracy prior to recalibration was 0.51°, which is much better than the 1.5° cited by the manufacturer. However, the manufacturer did not provide details regarding the specific methods employed to obtain their reported accuracy. It is possible that their assessment incorporated additional factors, such as user comfort, to determine optimal accuracy.

Literature indicates that accuracy decreases (offset between target and gaze positions increases) as the distance from the calibration point increases. The latter refers to the distance at which the calibration routine’s fixation targets are displayed (Barz et al. Citation2016; Microsoft Citation2020b). Consequently, we hypothesise that the calibration routine for HoloLens 2 is designed to be approximately 2.25 metres away from the user. This is further corroborated by the manufacturer’s recommendation of maintaining a 2-metre interaction distance (Microsoft Citation2020a). The observed increase in accuracy (decrease in the mean difference in view angle between target and gaze positions) could be attributed to reduced vergence-accommodation conflict (Kramida Citation2016), especially since Hololens2 provides only a singular combined gaze ray.

A previous study (Aziz and Komogortsev Citation2022) also assessed the Hololens2 eye gaze data quality, reporting an accuracy of 6.47° while only considering the mean difference in view angle between target and gaze positions. However, we found that accuracy varied between 0.44° and 0.58° depending on the eye-to-target plane distance. The better accuracy achieved in our study is, in part, attributable to our approach to reducing head movements and saccades. In our study, the target plane's centre remained fixed in the direction of the ray cast from the head, allowing the plane to move as the head moved, which enabled better control of the head movement component of eye gaze shifts. In contrast, previous studies (Kapp et al. Citation2021; Aziz and Komogortsev Citation2022) only used a chinrest to mitigate head movement. However, Figure 4 shows that even with a chinrest, the raw fixation samples still exhibited fluctuations due to head movement. Furthermore, unlike previous studies, we explicitly and dynamically estimated the number of saccade and fixation samples for each target, yielding more accurate estimates of the gaze centre. Recall that the number of saccade samples between targets fluctuated (F = 25.98, p < 0.001) due to factors such as the frequency of blinks and eye movement speed between consecutive targets.

Holmqvist et al. (Citation2011) observed that premium eye gaze tracking systems generally achieve precision levels below 0.10°, while less advanced devices may exhibit precision values as high as 1°. Numerous studies deploying these latter devices have employed filters to reduce gaze sample dispersion (Richard and Howell Citation2002; Hansen et al. Citation2003; Huckauf and Mario Citation2008; Rajanna and Paulin Hansen Citation2018; Hang Lee et al. Citation2019; Benligiray et al. Citation2019). However, the precision measurements in our study fell within the range of 0.154° to 0.210°, obviating the need for filtering and, in fact, allowing us to ignore gaze sample dispersion altogether for targets at view angles in excess of 0.21°. Previous studies have reported similar precision values for Hololens 2, namely 0.24° (Kapp et al. Citation2021) and 0.14° (Aziz and Komogortsev Citation2022), when only considering the mean difference in view angle between target and gaze positions at a single eye-to-target plane distance.

Comparing amongst other HMD eye trackers, Macinnes et al. (Citation2018) evaluated the accuracy (difference in view angle between target and gaze positions) and precision of three devices using multiple target locations with seated participants. They reported accuracies and precisions for the Pupil Labs 120 Hz Binocular glasses (0.84°, 0.16°), the SensoMotoric Instruments (SMI) Eye Tracking Glasses 2 (1.21°, 0.19°) and the Tobii Pro Glasses 2 (1.42°, 0.34°). Our angular results (Table 2) for the HoloLens 2, prior to recalibration, yielded an accuracy of 0.51° and a precision of 0.18°. Our empirically determined precision is thus on par with those of other HMD eye trackers, while our accuracy is better than that reported by Macinnes et al. (Citation2018) for other eye-tracking HMDs, suggesting that the eye tracking data from HoloLens 2 can be effectively used in research experiments. However, one drawback is that the Hololens 2 sampling rate of 30 Hz is lower than that of the devices tested by Macinnes et al.

4.3. Optimal AOI

The optimal AOI denotes an invisible region that activates the target virtual object when dwelled upon. The user only sees the target in its original position within the text entry interface. After head movement and saccade removal, by applying the confidence ellipse and ROC methods, the optimal plane distance was confirmed to be 2.25 m, with a target AOI diameter of 0.072 m. Additionally, the system should be recalibrated by shifting the AOI according to Equations 4 and 5.

The drastically different intercepts in the recalibration equations (−0.0137, −0.0050) were indicative of asymmetrical spatial distributions of eye gaze data. This finding corroborates Aziz and Komogortsev (Citation2022), who also arrived at axially dependent recalibration shifts in Hololens2. Additionally, variations in spatial distribution have been observed in other eye trackers; for instance, Feit et al. (Citation2017) reported poor accuracy and precision towards the lower and right screen edges in certain eye-tracking systems. Our study's results, particularly the univariate dependence of the recalibrated coordinates (as indicated by a single dominant coefficient in Equations 4 and 5), suggest that variations in the horizontal and vertical offsets in the gaze samples were independent. This observation aligns with those of previous studies (Huang et al. Citation2019; Yu et al. Citation2019; Lei et al. Citation2023).

Figure 10 illustrates a notable post-recalibration enhancement in eye-tracking accuracy. The corresponding optimal diameter of the circular AOI was 0.072 m. We note that increasing the target AOI up to 0.072 m markedly elevated true positives, but further expansion of the target size would run the risk of also inflating false positives. Herein lies the inherent trade-off that necessitated the optimisation of the target AOI.

4.4. Limitations and future work

Despite attempts to mimic real-life conditions by minimising the contrast between the keyboard characters and the background and by allowing unconstrained head movements in Experiment 1, the laboratory environment is ultimately limited. For example, as in other studies, the target was a virtual object rather than a single point, but we assumed that the user fixated on the centre of the circular area of interest. This optimistically biased the measures of tracking accuracy. Experiment 2 was also conducted in a laboratory environment using high contrast between fixation targets and background, which may have yielded higher data quality than would be obtained outside the lab. Additionally, participants were seated comfortably, in the absence of distractions. Future research should aim to evaluate eye-tracking data quality in more representative environments.

Certain participants experienced notably inferior accuracy compared to other participants in Experiment 2. Multiple device recalibration attempts confirmed that built-in calibration was not the culprit. The headset was simply unable to correctly detect the position of the pupils of these participants. Future research ought to assess eye gaze data quality in Hololens2 considering specific participant characteristics such as eye colour and eye shape (Mahanama et al. Citation2022; Robinson Citation2022).

The definition of negative and positive cases during target activation can impact the ROC. When a given fixation target was activated, we designated all gaze samples associated with the 29 non-active targets as negative cases as these targets were at considerable distance from the activated target. However, this definition does render the likelihood of correctly predicting negatives very high, which in turn results in very low false positives. Future research may restrict negative cases to only neighbouring fixation targets.

5. Conclusion

We assessed the eye-tracking capabilities of the Hololens2 in 20 typically developed adults to gauge its potential for gaze-based text entry. Our experiments identified the optimal position and area of interest of virtual objects for gaze-based interaction in immersive mixed environments catering to small eye movements, and fast and accurate target selection. Future studies should consider an online auto-recalibration task with the optimal specifications determined in this study and test with a more heterogeneous user population.

Credit authorship contribution statement

Mahya Mirbagheri: Acquisition of data, Analysis and/or interpretation of data, Writing – original draft, Writing – review & editing. Tom Chau: Analysis and/or interpretation of data, Writing – original draft, Writing – review & editing.

Disclosure statement

No potential conflict of interest was reported by the authors.

Data availability statement

The original ethics approval for this study did not include data sharing.

Additional information

Funding

Mahya Mirbagheri received funding for this research from the Toronto Rehabilitation Institute and Holland Bloorview Kids Rehabilitation Hospital. In addition, a grant was received from the Natural Sciences and Engineering Research Council of Canada [Grant no. RGPIN-2019-06033]. All authors approved the version of the manuscript to be published.

Notes on contributors

Mahya Mirbagheri

Mahya Mirbagheri is a PhD student at the University of Toronto and a researcher at Holland Bloorview Kids Rehabilitation Hospital. She has contributed to the field of Human-Computer Interaction (HCI), focusing on enhancing Augmented Reality (AR) and Artificial Intelligence (AI) for better human interaction. Her work has been particularly influential in developing communication solutions for youth with communication needs.

Tom Chau

Tom Chau is a Professor in the Institute of Biomedical Engineering and Distinguished Senior Scientist at Holland Bloorview Kids Rehabilitation Hospital. His research focuses on the investigation of novel access pathways to facilitate communication for children and youth with severe physical impairments.

References

  • Aziz S, Komogortsev O. 2022. An assessment of the eye tracking signal quality captured in the HoloLens 2. 2022 Symposium on Eye Tracking Research and Applications (ETRA ’22). Association for Computing Machinery. New York, NY, USA. Article 5. p. 1–12; doi:10.1145/3517031.3529626.
  • Barz M, Daiber F, Bulling A. 2016. Prediction of gaze estimation error for error-aware gaze-based interfaces. Proceedings of the Ninth Biennial ACM Symposium on Eye Tracking Research & Applications; New York, NY, USA: ACM Press. p. 275–278.
  • Benligiray B, Topal C, Akinlar C. 2019. SliceType: fast gaze typing with a merging keyboard. J Multimodal User Interface. 13(4):321–334. doi: 10.1007/s12193-018-0285-z.
  • Bozkir E, Özdel S, Wang M, David-John B, Gao H, Butler K, Jain E, Kasneci E. 2023. Eye-tracked virtual reality: a comprehensive survey on methods and privacy challenges. doi:10.48550/arXiv.2305.14080.
  • Casiez G, Roussel N, Vogel D. 2012. 1 € filter: a simple speed-based low-pass filter for noisy input in interactive systems. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI ’12); New York, NY, USA: Association for Computing Machinery. p. 2527–2530. doi:10.1145/2207676.2208639.
  • Drewes J, Masson GS, Montagnini A. 2012. Shifts in reported gaze position due to changes in pupil size: ground truth and compensation. Proceedings of the Symposium on Eye Tracking Research and Applications (ETRA ’12); New York, NY, USA: Association for Computing Machinery. p. 209–212. doi:10.1145/2168556.2168596.
  • Enkelejda K, Gjergji K, Kübler TC, Wolfgang R. 2015. Online recognition of fixations, saccades, and smooth pursuits for automated analysis of traffic hazard perception. In: Koprinkova-Hristova P, editors. Artificial Neural Networks, Springer Series in Bio-/Neuroinformatics. Vol. 4, p. 411–434. doi:10.1007/978-3-319-09903-3_20.
  • Fares R, Fang S, Komogortsev O. 2013. Can we beat the mouse with MAGIC? Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI ’13). Association for Computing Machinery; New York, NY, USA. 1387–1390. doi:10.1145/2470654.2466183.
  • Flatla DR, Gutwin C, Nacke LE, Bateman S, Mandryk RL. 2011. Calibration games: making calibration tasks enjoyable by adding motivating game elements. Proceedings of the 24th annual ACM symposium on User interface software and technology (UIST ’11); New York, NY, USA: Association for Computing Machinery. p. 403–412. doi:10.1145/2047196.2047248.
  • Friedman L, Lohr D, Hanson T, Komogortsev OV. 2021. Angular offset distributions during fixation are, more often than not, multimodal. J Eye Mov Res. 14(3): doi:10.16910/jemr.14.3.2.
  • Goffart L. 2009. Saccadic eye movements. Oxford: Academic Press. doi:10.1016/B978-008045046-9.01101-3.
  • Hang Lee L, Yung Lam K, Pan Yau Y, Braud T, Hui P. 2019. Hibey: Hide the keyboard in augmented reality. IEEE International Conference on Pervasive Computing and Communications. PerCom, Kyoto, Japan. p. 1–10. doi:10.1109/PERCOM.2019.8767420.
  • Hansen JP, Johansen AS, Hansen DW, Itoh K, Mashino S 2003. Language technology in a predictive, restricted on-screen keyboard with ambiguous layout for severely disabled people. Proceedings of the EACL workshop on language modeling for text entry methods. doi:10.3115/1628195.1628203.
  • Holmqvist K, Lee Örbom S, Hooge IT, Niehorster DC, Alexander RG, Andersson R, Benjamins JS, Blignaut P, Brouwer A-M, Chuang LL. 2022. Eye tracking: empirical foundations for a minimal reporting guideline. Behav Res Methods. 55(1):364–416. doi:10.3758/s13428-021-01762-8.
  • Holmqvist K, Nyström M, Andersson R. 2011. Eye tracking: a comprehensive guide to methods and measures. Oxford, UK: Oxford University Press.
  • Holmqvist K, Nyström M, Mulvey F. 2012. Eye tracker data quality: what it is and how to measure it. Proceedings of the Symposium on Eye Tracking Research and Applications (ETRA ’12). New York, NY, USA: Association for Computing Machinery. p. 45–52. doi:10.1145/2168556.2168563.
  • Huang J, Tian F, Li N, Fan X 2019. Modeling the uncertainty in 2D moving target selection. Proceedings of the 32nd annual ACM symposium on user interface software and technology; New Orleans, Louisiana, USA. p. 1031–1043.
  • Huckauf A, Mario HU 2008. Gazing with pEyes: towards a universal input for various applications. Proceedings of the 2008 symposium on Eye tracking research & applications (ETRA ’08). New York, NY, USA: Association for Computing Machinery. p. 51–54. doi:10.1145/1344471.1344483.
  • Kapp S, Barz M, Mukhametov S, Sonntag D, Kuhn J. 2021. ARETT: Augmented reality eye tracking toolkit for head mounted displays. Sensors. 21(6):2234. doi: 10.3390/s21062234.
  • Kasneci E. 2013. Towards the automated recognition of assistance need for drivers with impaired visual field [ Ph.D. Dissertation]. 72074 Tübingen: University of Tübingen, Wilhelmstr. 32. http://tobias-lib.uni-tuebingen.de/volltexte/2013/7033.
  • Kramida G. 2016. Resolving the vergence-accommodation conflict in head-mounted displays. IEEE Trans Vis Comput Graph. 22(7):1912–1931. doi: 10.1109/TVCG.2015.2473855.
  • Kübler TC, Sippel K, Fuhl W, Schievelbein G, Aufreiter J, Rosenberg R, Rosenstiel W, Kasneci E. 2015. Analysis of eye movements with Eyetrace. International Joint Conference on Biomedical Engineering Systems and Technologies. Springer. p. 458–471. doi:10.1145/3171221.3171287.
  • Lei T, Chen J, Chen J, et al. 2023. Modeling the gaze point distribution to assist eye-based target selection in head-mounted displays. Neural Comput Applic. doi:10.1007/s00521-023-08705-8.
  • Lu X, Yu D, Liang H-N, Goncalves J. 2021. Itext: hands-free text entry on an imaginary keyboard for augmented reality systems. In: The 34th Annual ACM Symposium on User Interface Software and Technology (UIST ’21). New York, NY, USA: Association for Computing Machinery; pp. 815–825.
  • Macinnes JJ, Iqbal S, Pearson J, Johnson EN. 2018. Wearable eye-tracking for research: Automated dynamic gaze mapping and accuracy/precision comparisons across devices. BioRxiv. 4.
  • Mahanama B, Jayawardana Y, Rengarajan S, Jayawardena G, Chukoskie L, Snider J, Jayarathna S. 2022. Eye movement and pupil measures: a review. Front Comput Sci. 3:733531. doi: 10.3389/fcomp.2021.733531.
  • Maria Feit A, Williams S, Toledo A, Paradiso A, Kulkarni H, Kane S, Ringel Morris M. 2017. Toward everyday gaze input: accuracy and precision of eye tracking and implications for design. In Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems (CHI ’17). New York, NY: Association for Computing Machinery. p. 1118–1130. doi:10.1145/3025453.3025599.
  • Microsoft. Comfort. Available online: (accessed on 25 November 2020a). https://docs.microsoft.com/de-de/windows/mixed-reality/design/comfort.
  • Microsoft. EyesPose Class. Available online: (accessed on 17 November 2020b). https://docs.microsoft.com/de-de/uwp/api/windows.perception.people.eyespose?view=winrt-19041.
  • Niehorster DC, Santini T, Hessels RS, Hooge ITC, Kasneci E, Nyström M. 2020. The impact of slippage on the data quality of head-worn eye trackers. Behav Res. 52(3):1140–1160. doi: 10.3758/s13428-019-01307-0.
  • Pfeuffer K, Vidal M, Turner J, Bulling A, Gellersen H. 2013. Pursuit calibration: making gaze calibration less tedious and more flexible. Proceedings of the 26th annual ACM symposium on User interface software and technology (UIST ’13). New York, NY: Association for Computing Machinery. p. 261–270. doi:10.1145/2501988.2501998.
  • Rajanna V, Paulin Hansen J. 2018. Gaze typing in virtual reality: impact of keyboard design, selection method, and motion. In Proceedings of the 2018 ACM Symposium on Eye Tracking Research & Applications (ETRA ’18). New York, NY: Association for Computing Machinery. p. 1–10. Article 15. doi:10.1145/3204493.3204541.
  • Ramirez Gomez A, Gellersen H. 2018. Smooth-i: smart re-calibration using smooth pursuit eye movements. Proceedings of the 2018 ACM Symposium on Eye Tracking Research & Applications (ETRA ’18). New York, NY: Association for Computing Machinery. p. 1–5. Article 10. doi:10.1145/3204493.3204585.
  • Richard B, Howell I. 2002. Zooming interfaces! Enhancing the performance of eye controlled pointing devices. Proceedings of the 5th International ACM Conference on Assistive Technologies(ASSETS’02). New York, NY: ACM. p. 119–126. doi:10.1145/638249.638272.
  • Robinson DA. 2022. Properties of rapid eye movements. Prog Brain Res. 267(1):271–286.
  • Sidenmark L, Lundström A. 2019. Gaze behaviour on interacted objects during hand interaction in virtual reality for eye tracking calibration. Proceedings of the 11th ACM Symposium on Eye Tracking Research & Applications (ETRA ’19). New York, NY, USA: Association for Computing Machinery. Article 6. p. 1–9. doi:10.1145/3314111.3319815.
  • Skavenski AA, Hansen RM, Steinman RM, Winterson BJ. 1979. Quality of retinal image stabilization during small natural and artificial body rotations in man. Vision Res. 19(6):675–683. doi: 10.1016/0042-6989(79)90243-8.
  • Špakov O. 2012. Comparison of eye movement filters used in HCI. Proceedings of the Symposium on Eye Tracking Research and Applications (ETRA ’12). Association for Computing Machinery. New York, NY, USA. 281–284. doi:10.1145/2168556.2168616.
  • Tafaj E, Kasneci G, Rosenstiel W, Bogdan M. 2012. Bayesian online clustering of eye movement data. Proceedings of the Symposium on Eye Tracking Research and Applications (ETRA ’12). New York, NY, USA. Association for Computing Machinery. 285–288. doi:10.1145/2168556.2168617.
  • Wei Y, Shi R, Yu D, Wang Y, Li Y, Yu L, Liang H-N. 2023. Predicting gaze-based target selection in augmented reality headsets based on eye and head endpoint distributions. Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (CHI ’23). New York, NY, USA: Association for Computing Machinery. Article 283. p. 1–14. doi:10.1145/3544548.3581042.
  • Xu W, Liang H-N, He A, Wang Z. 2019. Pointing and selection methods for text entry in augmented reality head mounted displays. IEEE International Symposium on Mixed and Augmented Reality (ISMAR). Beijing, China. p. 279–288. doi:10.1109/ISMAR.2019.00026.
  • Yu D, Liang HN, Lu X, Fan K, Ens B. 2019. Modeling endpoint distribution of pointing selection tasks in virtual reality environments. ACM Trans Graph. 38(6):1–13. doi: 10.1145/3355089.3356544.
  • Zhao S, Li Y. 2021. Understanding eye-tracking performance and biases for text entry on head-mounted AR displays. Proc ACM Hum-Comput Interact. 5(ISS):1–13. CSCW2, Article 384 (May 2021). doi: 10.1145/3488544.