Sports Performance

Diving into a pool of data: Using principal component analysis to optimize performance prediction in women’s short-course swimming

Pages 519-526 | Received 26 Oct 2023, Accepted 17 Apr 2024, Published online: 05 May 2024

ABSTRACT

This study aimed to optimise performance prediction in short-course swimming through Principal Component Analysis (PCA) and multiple regression. All women’s freestyle races at the European Short-Course Swimming Championships were analysed. Established performance metrics were obtained, including start, free-swimming, and turn performance metrics. PCA was conducted to reduce redundant variables, and a multiple linear regression was performed in which the criterion was swimming time. A practical tool, the Potential Predictor, was developed from the regression equations to facilitate performance prediction. Bland and Altman analyses with 95% limits of agreement (95% LOA) were used to assess agreement between predicted and actual swimming performance. There was very strong agreement between predicted and actual swimming performance. The mean bias for all race distances was less than 0.1 s, with wider LOAs for the 800 m (95% LOA −7.6 to +7.7 s) but tighter LOAs for the other races (95% LOAs −0.6 to +0.6 s). Free-Swimming Speed (FSS) and turn performance were identified as Key Performance Indicators (KPIs) in the longer distance races (200 m, 400 m, 800 m). Start performance emerged as a KPI in sprint races (50 m and 100 m). The successful implementation of PCA and multiple regression provides coaches with a valuable tool to uncover individual potential and empowers data-driven decision-making in athlete training.

Introduction

A fundamental goal for performance analysts, sports scientists, and coaches is to identify the key performance indicators (KPIs) that significantly impact athletes’ performances in their particular disciplines (M. D. Hughes & Bartlett, 2002; M. Hughes et al., 2019; O’Donoghue, 2009; Reichmuth et al., 2021). By accurately pinpointing these essential metrics, coaches can optimise training strategies and enhance overall performance (M. Hughes et al., 2019; O’Donoghue, 2009). This is of particular importance for female swimmers, because women are an underrepresented study population in elite sport. For instance, only 39% of the study participants in three highly ranked sports science journals were women (Costello et al., 2014).

In swimming, a global sport, a multitude of performance data can be provided to coaches and athletes. For example, the timings at the starting blocks as well as the first few 5 m splits represent a swimmer’s reaction times and skills from the starting blocks (Da Silva et al., 2019). In sprint races in particular, start performance is highly related to overall performance (Born et al., 2022). With increasing race distance, the effect of start performance decreases while the effect of turn performance increases (Born et al., 2021b). As such, turn time is commonly recorded because a swimmer’s ability to execute a swift turn might be related to the final performance (Polach et al., 2021). This might be especially pertinent in short-course races held in 25 m pools, which involve twice the number of turns for a given event compared with long-course, Olympic-style races (50 m pool length) (Cuenca-Fernández et al., 2022). In addition to the acyclic parts of the swim race (start and turns), race analyses reveal multiple parameters for the free-swimming phase. For example, prior research has analysed performance metrics such as stroke rate, distance per stroke, split times and changes in swimming velocity over the time course of the race (Born et al., 2022).

However, with a plethora of performance metrics, there arises the challenge of dealing with overlapping covariances, leading to a “data tsunami”. In such cases, it becomes arduous for coaches and athletes to discern which parameters contribute most to overall performance and can be considered KPIs. A common technique employed to address this complexity is principal component analysis (PCA). The significance of PCA lies in its ability to reduce the dimensionality of the data and identify the most important variables (KPIs) that contribute to an athlete’s performance. By condensing the information into a smaller set of meaningful components, coaches can focus on the most influential factors that affect race times. This streamlining process helps coaches and athletes to better understand and interpret the complex relationships between different variables (O’Donoghue, 2008; Rojas-Valverde et al., 2020; Weaving et al., 2019). To help with data reduction, PCA has been utilised previously within sports such as swimming (Burkhardt et al., 2023), skeleton (Colyer et al., 2017) or rugby (Parmar et al., 2018). Additionally, multiple regression analysis can be used to pinpoint KPIs and construct a precise performance prediction model.

While some previous studies have developed performance prediction models in swimming (Heazlewood, 2006; Wu et al., 2022), these studies used retrospective swimming times from previous competitions to determine the improvement rate and then used this rate to predict future swimming performances. As such, these prediction models did not consider the significance of various KPIs (e.g., start or turn performance) that contribute to overall swim performance and may not help coaches and swimmers to identify individual potentials and develop race and training strategies. Performance prediction based on KPIs would provide coaches with a valuable tool to continuously monitor athletes’ performances over time and would serve as a pivotal training guide. Forecasting athletes’ performances will empower coaches with critical insights, facilitating more informed training decisions, enabling athletes to fine-tune their preparation based on data-driven insights, and ultimately enhancing their competitive edge in the world of swimming. Furthermore, the majority of the existing literature in swimming focuses on men, with less attention given to their female counterparts.

Therefore, the aim of this study was to optimise performance prediction in women’s short-course swimming through the application of data reduction techniques (PCA) and multiple regression. The study pursued three primary objectives: firstly, to identify KPIs tailored to different distances in short-course freestyle swimming; secondly, to construct a precise performance prediction model; and thirdly, to provide a practical tool that empowers coaches and athletes to accurately predict race performance and individual potentials based on these identified KPIs.

Methods

Experimental approach to the problem

A retrospective analysis was performed on performance data obtained from the 2019 European Short-Course Swimming Championships. Video analysis was used to evaluate various established performance metrics. Using a combination of data-reduction techniques (PCA) and multiple regression, KPIs were identified and performance prediction models were constructed. A practical tool was developed in Microsoft Excel that empowers coaches and athletes to accurately predict race performance based on these identified KPIs, which can be used to fine-tune training based on data-driven insights.

Subjects

The sample included all individual freestyle races of the female swimmers competing at the 2019 European Short-Course Swimming Championships in Glasgow, Scotland (N = 118; 800 m: Observations = 29; World Aquatic Points = 820 ± 77; 400 m: Observations = 43; World Aquatic Points = 816 ± 64; 200 m: Observations = 48; World Aquatic Points = 801 ± 67; 100 m: Observations = 61; World Aquatic Points = 763 ± 84; 50 m: Observations = 53; World Aquatic Points = 767 ± 79). All swimmers who participate at events hosted by the European Swimming Association LEN (Ligue Européenne de Natation) agree to be video monitored for television broadcasting and race analysis of the participating nations. The study was pre-approved by the leading institution’s internal review board (registration number: 098-LSP-191119) and was in accordance with the latest version of the code of conduct of the World Medical Association for studies involving human subjects (Helsinki Declaration).

Data collection

A twelve-camera system (Spiideo, Malmö, Sweden) was employed to monitor all races. Among these cameras, ten focused on individual swimmers, each capturing one of the ten lanes (V59 PTZ, Axis Communications AB, Lund, Sweden). Additionally, two cameras with a fixed view were placed at a 90° angle to the swimming lanes, one at the 5 m mark and the other at the 20 m mark, to oversee the start and turn sections for all lanes. Video footage was gathered at a sampling rate of 50 Hz. The video footage was manually post-processed (Kinovea 0.9.1; Joan Charmant & Contrib., https://kinovea.org/) to determine split times, stroke rate (SR), distance per stroke (DPS), and the duration from the starting beep to the 5 m, 10 m, and 15 m marks (Start5, Start10, Start15). For every turn, the duration from 5 m prior to the moment of wall contact (in5), the duration from the wall contact point to 5 m after the turn (out5), and the duration from the wall contact point to 10 m after the turn (out10) were determined. The top of the head was used as the reference point to determine the time until 5 m, 10 m, and 15 m after the start as well as 5 m before, and 5 m and 10 m after, wall contact during the turn. Reliability of the data analysis has previously been determined with an intra-class correlation coefficient of 0.98 ± 0.04 (Born et al., 2021a, 2021b, 2022). Race times were provided by the official timekeeper of the championship (Microplus Informatica, Marene CN, Italy).

Data analyses

Free-swimming speed (FSS) was calculated from the middle 10 m section of each lap from the difference between split time, out10 and in5 (Equation 1). FSS was not calculated for the first lap given the influence of the start on swimming speed.

(1) $\text{Free-swimming speed} = \dfrac{10}{\text{SplitTime} - (\text{out10} + \text{in5})}$
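As a minimal sketch of Equation 1, assuming a 25 m lap in which out10 covers the first 10 m after the wall and in5 the last 5 m before it, free-swimming speed could be computed as follows. The function name and the example values are illustrative assumptions and are not taken from the study data.

```python
def free_swimming_speed(split_time: float, out10: float, in5: float) -> float:
    """Mean speed (m/s) over the middle 10 m of a 25 m lap (Equation 1)."""
    free_swim_time = split_time - (out10 + in5)
    if free_swim_time <= 0:
        raise ValueError("out10 + in5 must be shorter than the lap split time")
    return 10.0 / free_swim_time


# Hypothetical lap: 15.0 s split, 5.2 s out10, 3.3 s in5, so 6.5 s for the middle 10 m
print(round(free_swimming_speed(15.0, 5.2, 3.3), 3))  # 1.538
```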

Stroke rate (SR; Hz) was calculated by dividing 60 by the time of a single stroke. Distance-per-stroke (DPS; m) was calculated by multiplying stroke time and section speed during the free-swimming phase. In addition, Start5to10 and Start10to15 were calculated from the time differences between Start10 and Start5 as well as Start15 and Start10, respectively. Out5to10 was calculated from the time difference between the turn’s out10 and out5. The average of each metric was calculated across all laps for each race. These additional variables were calculated because they provide information related to performance during the underwater phase after the start (Start5to10 and Start10to15) or the turn (Out5to10). Analyses of the underwater phase following the start or turns have been included in previous literature and can have a large impact on overall performance (Cuenca-Fernández et al., 2022; Veiga & Roig, 2016).
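The derived metrics described above are simple differences, products and averages. The sketch below illustrates them under the assumption that all inputs are times in seconds and speeds in m/s; the function names and example values are hypothetical.

```python
def sectional_times(start5, start10, start15, out5, out10):
    """Sectional times (s) derived from cumulative start and turn times."""
    return {
        "Start5to10": start10 - start5,
        "Start10to15": start15 - start10,
        "Out5to10": out10 - out5,
    }


def distance_per_stroke(stroke_time, section_speed):
    """DPS (m) as stroke time (s) multiplied by section speed (m/s)."""
    return stroke_time * section_speed


def race_mean(per_lap_values):
    """Average of a per-lap metric across all laps of a race."""
    return sum(per_lap_values) / len(per_lap_values)


# Hypothetical values, not taken from the study data.
print(sectional_times(start5=1.6, start10=3.9, start15=6.5, out5=2.1, out10=5.2))
print(distance_per_stroke(stroke_time=1.25, section_speed=1.6))
print(race_mean([5.2, 5.3, 5.4]))
```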

Race times were categorised into distinct classifications based on performance outcomes: Did Not Qualify (DNQ): swimmers who did not progress beyond the heats and did not qualify for any further rounds; Qualified (Q): swimmers who successfully qualified for either the semi-final (QSF) or the final (QF); Medallists (M): swimmers who achieved podium positions and won medals in their respective events. It is noteworthy that the 200 m, 400 m and 800 m events at the European Short-Course Swimming Championships involve no semi-finals, which requires even the strongest swimmers in the field to perform at their best in the heats to ensure qualification for the final.

Development of the potential predictor

A practical tool, hereafter referred to as the Potential Predictor, was developed using Microsoft Excel (Supplementary File 1). The Potential Predictor was designed to utilise the identified KPIs for each race distance. It incorporates the predefined race classifications (DNQ, QSF, QF, and M), allowing coaches to estimate race performance times using KPIs and compare athlete performances against predicted outcomes based on these thresholds.

Statistical analyses

To identify variables with a high degree of covariance (≥0.8), a covariance matrix was computed for all z-scored data. The covariance analyses for each race distance (800 m, 400 m, 200 m, 100 m, and 50 m) generally revealed high covariances among the variables, with some exceptions. In the 800 m, 400 m and 200 m races, Start5, Start10to15, and in5 showed lower covariances. In contrast, the 100 m race had high covariances for all variables except Start10to15. Lastly, the 50 m race showed lower covariances for FSS, Start5, in5, and Start10to15. Subsequently, a Principal Component Analysis (PCA) was conducted on all remaining metrics (i.e., those with high covariances) in order to eliminate redundant variables that captured similar information. The Kaiser-Meyer-Olkin measure was used to verify the sampling adequacy of the data, with a value of .5 used as a threshold for acceptability (Kaiser, 1974). The Bartlett test of sphericity was also used to determine the suitability of the data for PCA, with significance accepted at p ≤ .05. Orthogonal rotation (varimax) was used to improve the identification and interpretation of factors (Hair et al., 2010). Principal Components (PCs) with eigenvalues greater than 1 were extracted (2 PCs in all cases). The rotated component matrix was consulted to determine the component loadings and thereby which factors contributed most to the variation in the swimming performance data. The most heavily loaded (most strongly related) variable on each component was then retained, along with the original variables that did not display a high degree of covariance, to be used as predictor variables in a stepwise multiple linear regression analysis, in which the criterion was swim time performance. Entered variables remained in the model if a significant R2 change (p < .05) was reported.
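The data-reduction step was run in SPSS; the following NumPy sketch only illustrates the general idea (z-scoring, eigenvalue-greater-than-1 extraction, varimax rotation, and retention of the most heavily loaded variable per component). It runs on synthetic data with two built-in latent factors, omits the Kaiser-Meyer-Olkin and Bartlett checks, and uses a commonly used varimax iteration; all names and values are assumptions for illustration.

```python
import numpy as np


def varimax(loadings, max_iter=100, tol=1e-6):
    """Orthogonal varimax rotation of a (variables x components) loading matrix."""
    n_vars, n_comp = loadings.shape
    rotation = np.eye(n_comp)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        target = rotated**3 - rotated @ np.diag((rotated**2).sum(axis=0)) / n_vars
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt
        if s.sum() < criterion * (1 + tol):
            break
        criterion = s.sum()
    return loadings @ rotation


def pca_retained_variables(data):
    """Return unrotated loadings, rotated loadings, and the top variable index per PC."""
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    cov = np.cov(z, rowvar=False)            # covariance matrix of z-scored data
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > 1.0                     # extract PCs with eigenvalue > 1
    loadings = eigvecs[:, keep] * np.sqrt(eigvals[keep])
    rotated = varimax(loadings)
    top = np.abs(rotated).argmax(axis=0)     # most heavily loaded variable per PC
    return loadings, rotated, top


# Synthetic data: columns 0-2 share one latent factor, columns 3-5 another.
rng = np.random.default_rng(0)
latent = rng.normal(size=(60, 2))
noise = 0.3 * rng.normal(size=(60, 6))
data = np.column_stack([latent[:, 0]] * 3 + [latent[:, 1]] * 3) + noise

loadings, rotated, top = pca_retained_variables(data)
print("components extracted:", rotated.shape[1])
print("retained variable index per component:", top.tolist())
```

Because the rotation is orthogonal, each variable's communality (the row sum of squared loadings) is the same before and after rotation, which is a quick sanity check on any varimax implementation.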
A K-fold cross-validation technique was used to provide a rigorous assessment of model stability, as previously applied in similar research (Colyer et al., 2017). Briefly, this involves splitting the data into K roughly equal-size parts, fitting a regression model to K − 1 parts, and validating this model against the kth part. This process is then repeated for k = 1, 2, …, K. In the current study, each part comprised data for 1 observation (i.e., leave-one-out cross-validation). Prediction errors were calculated for K iterations and combined to provide an overall standard error of the estimate (SEE). The SEE of the original multiple regression analysis was compared with the SEE when the K-fold validation method was applied. Further, the correlation between predicted and actual swimming time was computed and compared with the R2 value of the multiple regression. It has been previously suggested that a model can be considered stable if the R2 difference does not exceed .10 (Kleinbaum et al., 2013). Finally, the unstandardized β coefficients from the linear regressions were used to form prediction equations. The predicted and actual swimming performances, along with the 95% limits of agreement (LOA) and the 95% confidence intervals (95% CI) of the LOAs, were subsequently analysed using methods described by Bland and Altman (Bland & Altman, 1999).
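A minimal sketch of the leave-one-out validation described above is given below, using ordinary least squares in NumPy on synthetic data. The text does not specify how the cross-validated SEE was computed, so the same n − p − 1 formula is assumed for both SEE values; function names and data are illustrative.

```python
import numpy as np


def fit_ols(X, y):
    """Ordinary least squares; returns [intercept, slope_1, ..., slope_p]."""
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def predict(coef, X):
    return coef[0] + X @ coef[1:]


def leave_one_out_predictions(X, y):
    """Predict each observation from a model fitted to all the others."""
    n = len(y)
    preds = np.empty(n)
    for i in range(n):
        train = np.arange(n) != i
        preds[i] = predict(fit_ols(X[train], y[train]), X[i:i + 1])[0]
    return preds


def see(y, y_hat, n_predictors):
    """Standard error of the estimate with n - p - 1 degrees of freedom (assumed)."""
    return np.sqrt(np.sum((y - y_hat) ** 2) / (len(y) - n_predictors - 1))


# Synthetic data for illustration only (not the championship data).
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 2))
y = 120.0 - 4.0 * X[:, 0] + 1.5 * X[:, 1] + rng.normal(scale=0.3, size=40)

fitted = predict(fit_ols(X, y), X)
cv_pred = leave_one_out_predictions(X, y)

r2_model = np.corrcoef(y, fitted)[0, 1] ** 2
r2_cv = np.corrcoef(y, cv_pred)[0, 1] ** 2
print(f"SEE model {see(y, fitted, 2):.3f}, SEE leave-one-out {see(y, cv_pred, 2):.3f}")
print(f"R2 difference {abs(r2_model - r2_cv):.4f} (stable if <= 0.10)")
```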

All statistical analyses were performed using SPSS Statistics (Version 29; IBM Corporation, NY).

Results

Table 1 displays the rotated component matrix, showing the variables with the highest component loadings on each PC. PC1 was most strongly correlated with FSS for the 800 m, 200 m and 100 m races. PC1 was most strongly correlated with Start10 for the 400 m and Start15 for the 50 m. PC2 was most strongly correlated with SR for the 200 m, 100 m and 50 m races, FSS for the 400 m and Start10 for the 800 m. These variables were, therefore, retained to be used in the stepwise multiple linear regression analyses, along with the original variables without a high degree of covariance.

Table 1. The explained variance of two principal components and the varimax rotated component matrix for all freestyle swimming races.

Stepwise multiple linear regressions revealed the KPIs for each race distance. The unstandardized β coefficients were then used to form the following regression equations. The regression equations are reported with the SEE from the multiple regression analysis; the SEE following K-fold validation; the R2 value of the multiple regression model and the difference in R2 between the multiple regression model and the relationship between actual and predicted swimming time. In all instances, the differences in SEE between the original model and the SEE from the K-fold validation were minimal and the R2 differences were < 0.1 indicating model stability.

SwimTime800 = 626.817 − 198.693*FSS + 49.825*in5

(SEE = 4.03; SEE k-fold validation = 3.91; R2 = 0.941; R2 difference = 0.006).

SwimTime400 = 344.285 − 97.939*FSS + 16.978*in5

(SEE = 0.39; SEE k-fold validation = 0.38; R2 = 0.997; R2 difference = 0.003).

SwimTime200 = 161.803 − 41.955*FSS + 8.267*in5

(SEE = 0.12; SEE k-fold validation = 0.15; R2 = 0.999; R2 difference = 0.001).

SwimTime100 = 89.659 − 21.033*FSS + 1.670*Start10to15

(SEE = 0.32; SEE k-fold validation = 0.33; R2 = 0.978; R2 difference = 0.003).

SwimTime50 = 4.123 + 2.229*Start15 + 2.155*in5

(SEE = 0.30; SEE k-fold validation = 0.30; R2 = 0.882; R2 difference = 0.05).
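The regression equations above can be applied directly. The sketch below encodes the reported intercepts and coefficients, assuming times in seconds and FSS in m/s; the function name, the dictionary layout and the example swimmer are illustrative and do not reproduce the Excel-based Potential Predictor.

```python
# Intercepts and unstandardized coefficients as reported in the regression equations.
EQUATIONS = {
    800: (626.817, {"FSS": -198.693, "in5": 49.825}),
    400: (344.285, {"FSS": -97.939, "in5": 16.978}),
    200: (161.803, {"FSS": -41.955, "in5": 8.267}),
    100: (89.659, {"FSS": -21.033, "Start10to15": 1.670}),
    50: (4.123, {"Start15": 2.229, "in5": 2.155}),
}


def predict_swim_time(distance: int, **kpis: float) -> float:
    """Predicted race time (s) for a race distance (m) from its two KPIs."""
    intercept, coefs = EQUATIONS[distance]
    missing = set(coefs) - set(kpis)
    if missing:
        raise ValueError(f"missing KPI(s): {sorted(missing)}")
    return intercept + sum(beta * kpis[name] for name, beta in coefs.items())


# Hypothetical 200 m swimmer: FSS of 1.60 m/s and a mean in5 of 3.00 s
print(round(predict_swim_time(200, FSS=1.60, in5=3.00), 2))  # 119.48
```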

Figure 1 displays Bland and Altman plots, illustrating the agreement between predicted and actual swimming performance across all freestyle races. The results indicate a consistently very strong agreement between predicted and actual swimming performance for all race distances. Specifically, for the 800 m race, the predicted swimming performance showed a slight overprediction with a mean bias of 0.03 seconds compared to actual swimming performance, but with wider LOA (95% LOA: −7.6 to +7.7 seconds; 95% CI: −1.29 to +1.22 s; Panel A). For the 400 m race, the predicted swimming performance exhibited a slight underprediction with a mean bias of −0.05 seconds compared to actual swimming performance and tight LOA (95% LOA: −0.5 to +0.4 seconds; 95% CI: −0.02 to +0.11 s; Panel B). In the case of the 200 m race, the predicted swimming time demonstrated an excellent agreement with actual swimming performance, showing a negligible mean bias of 0.00 seconds and tight LOA (95% LOA: −0.2 to +0.2 seconds; 95% CI: −0.03 to +0.03 s; Panel C). Similarly, for both the 100 m and 50 m races, the predicted swimming times displayed very strong agreement with actual swimming performance, each having a mean bias of 0.00 seconds and tight LOA (95% LOA: −0.6 to +0.6 seconds; 95% CI: −0.07 to +0.07 s; Panels D and E, respectively).
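For readers who wish to reproduce this kind of agreement analysis on their own data, a minimal sketch of the mean bias and 95% limits of agreement is shown below. It assumes differences are taken as predicted minus actual and that the limits are bias ± 1.96 standard deviations of the differences; the example times are hypothetical.

```python
import numpy as np


def bland_altman(predicted, actual):
    """Mean bias and 95% limits of agreement for predicted minus actual."""
    diff = np.asarray(predicted, dtype=float) - np.asarray(actual, dtype=float)
    bias = diff.mean()
    sd = diff.std(ddof=1)  # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical predicted and actual race times (s)
predicted = [119.6, 120.4, 121.0, 118.9, 122.3]
actual = [119.5, 120.5, 121.0, 118.7, 122.5]
bias, lower, upper = bland_altman(predicted, actual)
print(f"bias {bias:.2f} s, 95% LOA {lower:.2f} to {upper:.2f} s")
```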

Figure 1. Bland and Altman plots with 95% limits of agreement displaying the agreement between predicted and actual swim time for the 800 m (panel a), 400 m (panel b), 200 m (panel c), 100 m (panel d), and 50 m (panel e) freestyle races.


Table 2 displays the mean swimming performance and the mean of the two main KPIs for each race distance for each performance classification (DNQ, QSF, QF and M).

Table 2. Mean swimming time and KPIs for the performance classifications for all race distances.

Discussion

This study focused on optimising performance prediction in short-course swimming by utilising data reduction techniques (PCA) and multiple regression. The study pursued three primary objectives: firstly, to identify KPIs tailored to different distances in short-course women’s freestyle swimming; secondly, to construct a performance prediction model; and thirdly, to provide a practical tool that empowers coaches and athletes to accurately predict race performance based on these identified KPIs.

The main findings of this study were: 1) FSS was identified as a KPI for all races except one (50 m), clearly displaying the importance of fast free-swimming speeds for short-course freestyle swimming performance across a range of distances; 2) in5 time was identified as a KPI for all races except one (100 m), displaying the importance of a fast turn for short-course freestyle swimming performance across a range of distances; 3) starting times were only KPIs for sprint races (50 m and 100 m). Additionally, this article provides a practical tool (Potential Predictor) that empowers coaches and swimmers to predict race times with 95% LOA and to see how specific alterations in the KPIs affect race time (refer to supplementary material). Forecasting athletes’ performances can empower coaches with critical insights, facilitating more informed training decisions, enabling athletes to fine-tune their preparation based on data-driven insights, and ultimately enhancing their competitive edge in the world of swimming.

PCA and methodological approach

PCA is valuable due to its capacity to decrease the complexity of data and pinpoint KPIs crucial for an athlete’s success. Through consolidating information into a concise set of components, coaches can prioritise the most impactful variables affecting athletic performance, particularly race times. This simplification aids in enhancing comprehension of the intricate interplay among various factors for both coaches and athletes. In addition, prior research has highlighted the practicality of PCA as a dimension-reduction and visualisation tool to assist in decision making and communication in sports (Weaving et al., 2019).

The multiple regression analysis approach in the present study complements PCA by establishing quantifiable relationships between the identified KPIs and race times. Through this method, coaches and athletes gain insights into the specific impacts of each KPI and how changes in one of them may affect overall race performance. By understanding the individual contributions of various KPIs, coaches can tailor training programs to prioritize areas of improvement, target specific weaknesses, and develop more effective training strategies and race preparation. It is noteworthy that KPIs and prediction equations likely differ for long-course swimming, different strokes and for men. As such, coaches should use caution when using these data to predict swimming performance in contexts other than women’s short-course freestyle events.

Free-swimming speed, SR, DPS

FSS emerged as a crucial KPI in all races except the 50 m. This finding suggests that in the 50 m event, swimmers can swim at similar speeds, but those with a better start or approach to the wall are the ones who excel. This is in line with previous research findings, where start (r = 0.91, p < 0.001) and turn performances (r = 0.94, p < 0.001) were closely correlated with the total race time, especially in short-course 50 m races (Born et al., 2021b). Further, the identification of FSS as a KPI for the longer distance events (100–800 m) is in line with prior research, which has consistently identified it as a significant factor distinguishing between top-tier finalists and slower competitors in individual medley events (Born et al., 2022). Additionally, other studies have underscored FSS as a major KPI for overall swimming performance (Tor et al., 2014). While recent research has shed light on the importance of acyclic phases, such as starts and turns, for race outcomes (Born et al., 2021b), the free-swimming segments continue to account for a substantial portion of total race time, ranging from 42.5% to 44.1% (Born et al., 2022). This firmly establishes the pivotal role of FSS in overall race performance, as reaffirmed by our study. Notably, in long-course races with fewer turns, the significance of FSS probably becomes even more pronounced.

As FSS is the result of the interplay between SR and DPS, enhancements in either or both of these variables can lead to improved FSS. Intriguingly, SR displayed the strongest correlation with PC2 in the 200 m, 100 m, and 50 m races, despite not significantly contributing to predictive power in multiple regression analyses. In contrast, DPS featured in the PCA for all race distances but never showed the strongest correlation with any PC. Consequently, DPS was omitted from the multiple regression analyses due to its substantial covariance with other metrics. However, it’s important to note that strategies aimed at improving either SR or DPS have the potential to result in a faster FSS and, consequently, enhanced performance across various race distances. For instance, strengthening muscular power (Toussaint & Vervoorn, 1990; Wirth et al., 2022) and refining stroke technique (Wirth et al., 2022) can contribute to achieving a swifter FSS. Importantly, previous research has demonstrated that maximal strength training notably improves DPS, while incorporating resisted sprint swimming can enhance SR (Crowley et al., 2017). It’s crucial to highlight that recent research findings revealed the fastest 50 m swim times were not achieved by maximizing either SR or DPS alone, but rather by an effective combination of the two (Morais et al., 2023). The findings of our current study further reinforce this evidence, as FSS was identified as a KPI for all race distances except the 50 m event. Notably, in no instance were SR or DPS identified as standalone KPIs. While our study considered average SR and DPS across the race, it’s important to acknowledge the potential existence of various stroke patterns for optimizing FSS, which may even vary between different sections of a race (Morais et al., 2023). Consequently, while the PCA may not have identified SR and DPS as KPIs, these parameters should still be carefully evaluated.

Start and turn performances

Start and turn performances were identified as KPIs for the 50 m freestyle race. This emphasis on the importance of the acyclic phases, especially in shorter race distances, is consistent with prior research findings (Born et al., 2021b). Notably, the contribution of start performance to the overall race time was most significant in shorter races, reaching as high as 25% (Born et al., 2021b). The heightened importance of acyclic phases in shorter races can be attributed to the unique characteristics of water. Unlike solid surfaces found on land, water offers little resistance for swimmers’ hands and feet to push against, resulting in reduced propelling efficiency (Zamparo et al., 2020). However, the starting block and pool wall offer a solid foundation, enabling swimmers to harness their explosive power to its fullest potential. Previous studies have demonstrated a strong correlation between leg strength and power and the 15 m start time in well-trained swimmers (Thng et al., 2020; West et al., 2011). Furthermore, the utilization of undulating kicking techniques aids in maintaining velocity during the underwater phase, facilitating the smooth transfer of the high speeds achieved during the push-off to the subsequent free-swimming phase (Ruiz-Navarro et al., 2022; Veiga & Roig, 2016). Nevertheless, it’s crucial to acknowledge that swimmers attain their highest velocities during the start and turn segments, gradually decelerating as they transition into the free-swimming phase of each lap (Tor et al., 2015; Veiga & Roig, 2016).

While swimmers compete in their designated lanes with wave-breaking lane ropes in place, as mandated by international swimming regulations (https://www.worldaquatics.com/rules/competition-regulations), they are still susceptible to the wave drag generated by nearby competitors. Consequently, swimmers typically employ a positive pacing strategy, with a focus on achieving a rapid first lap to secure a leading position ahead of their rivals (Simbaña-Escobar et al., 2018). Because sprint races result in the highest velocities and consequently the greatest wave production, positioning oneself ahead of competitors is of utmost importance for success. This notion is reinforced by the present study, which identifies start performance as a KPI for both the 50 m and 100 m races. However, the dynamics of front-end and back-end speed, as well as the influence of various positions relative to neighbouring swimmers, remain relatively unexplored in the field of research. These aspects warrant the attention of future studies to provide a more comprehensive understanding of the complexities involved in competitive swimming.

The KPIs shifted towards FSS and turn performance as the race distances increased. This observation aligns with previous research findings that highlight the evolving contribution over time, where start performance decreases in significance, while turns and the free-swimming phase gain prominence (Born et al., 2021b, 2022). In particular, the approach to the wall during the turn (in5) was identified as a crucial KPI. The last 5 metres leading up to the wall (in5) are inherently linked to FSS. However, they also encompass a pivotal aspect of the turn, namely, the body rotation just before contacting the wall (David et al., 2022). In the case of tumble turns, commonly used in freestyle races, this involves precisely timing the initiation of the rotation while approaching the wall at full swimming speed. This initiation of the rotation plays a pivotal role in determining the precise position and distance to the pool wall, preparing for the subsequent push-off that marks the beginning of the underwater phase for the following lap (Puel et al., 2012; Weimar et al., 2019). During practice sessions, swimmers routinely execute numerous turns, whether training in a 25 m or 50 m pool. Nevertheless, considering the intricacies of the turn, it is imperative to place particular emphasis on refining the in5 phase while maintaining race-specific velocities. This targeted focus on in5 at race pace velocities is vital for mastering the complexities of the turn and can significantly impact overall race performance.

Practical applications

Supplementary File 1, the “Potential Predictor”, is a user-friendly Excel spreadsheet tailored for coaches and athletes, empowering them to forecast performance accurately. By utilising this tool, coaches and athletes can conveniently input their data into the predictor fields to obtain their projected race times.

To use this tool effectively, coaches should carefully consider the specific KPIs associated with each race distance. These KPIs should be collected under optimal conditions, such as selecting the best results from multiple trials. Subsequently, these gathered KPIs can be entered into the Potential Predictor to ascertain an individual swimmer’s potential, along with the lower and upper 95% LOA. Coaches should carefully consider the error margins (95% LOAs) around the predicted time (mean bias). Coaches can manipulate one or more KPIs to assess their impact on future race outcomes. Moreover, the Potential Predictor offers benchmark thresholds from data derived from the championships. This feature enables coaches and athletes to evaluate their performance and potential in comparison to their peers, providing valuable insights into their standings within the competitive landscape.
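The kind of "what if" manipulation described above can be illustrated with the 200 m equation and its reported 95% LOA of ±0.2 s. This is a sketch under stated assumptions, not a reproduction of the Excel tool: the swimmer's KPI values are hypothetical, and the LOA are simply placed symmetrically around the predicted time.

```python
# 200 m equation as reported in the Results: 161.803 - 41.955*FSS + 8.267*in5
def swim_time_200(fss: float, in5: float) -> float:
    return 161.803 - 41.955 * fss + 8.267 * in5


LOA_200 = 0.2  # reported 95% LOA half-width for the 200 m (s)

# Hypothetical swimmer, then the same swimmer with a 0.05 s faster mean in5
baseline = swim_time_200(fss=1.60, in5=3.00)
faster_turn = swim_time_200(fss=1.60, in5=2.95)
gain = baseline - faster_turn

print(f"baseline {baseline:.2f} s "
      f"(95% LOA {baseline - LOA_200:.2f} to {baseline + LOA_200:.2f} s)")
print(f"0.05 s faster in5 -> {gain:.2f} s faster predicted race time")
```

Because the model is linear, the predicted change is simply the coefficient multiplied by the change in the KPI. Such one-at-a-time changes hold the other KPI fixed, which may not be realistic in practice, since effort invested in one race section can affect another.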

In our Bland and Altman analyses for the 400 m and 200 m swimming races, we encountered an important consideration: the differences between the actual and predicted swimming times did not follow a normal distribution. This non-normal distribution is the reason for the “inverted U” shape observed in the corresponding Bland and Altman plots. Traditionally, a non-parametric approach to Bland and Altman analyses involves the removal of outliers from the data, for example, by eliminating the upper and lower 5% of data points (Bland & Altman, Citation1999). However, we made a deliberate choice not to employ this method. Our intention was to ensure that our performance predictions encompassed a wide range of potentials, including both swimmers competing to advance from the heats and those in contention for medals. It is worth noting that Bland and Altman themselves acknowledged that the use of this non-parametric approach has little impact on the limits of agreement (Bland & Altman, Citation1999). In our specific analyses, the limits of agreement for the 400 m and 200 m events remained at ±0.6 seconds regardless of the method employed. As a result, we conducted our analyses for the 400 m and 200 m events using standard procedures.
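For readers who wish to run this kind of comparison on their own data, the following Python sketch computes the standard (parametric) limits of agreement as mean bias ± 1.96 standard deviations of the differences, alongside a percentile-based alternative. The percentile variant is one common way to obtain non-parametric limits and is an assumption of this sketch, not a description of the exact procedure discussed above. The function names and the four example race times are invented for illustration.

```python
import numpy as np


def bland_altman_limits(actual, predicted):
    """Return (mean bias, lower LOA, upper LOA) using bias +/- 1.96 * SD."""
    diff = np.asarray(predicted, dtype=float) - np.asarray(actual, dtype=float)
    bias = diff.mean()
    sd = diff.std(ddof=1)  # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def percentile_limits(actual, predicted, lower=2.5, upper=97.5):
    """Return empirical percentiles of the differences (non-parametric)."""
    diff = np.asarray(predicted, dtype=float) - np.asarray(actual, dtype=float)
    low, high = np.percentile(diff, [lower, upper])
    return low, high


# Invented example data in seconds; not taken from the championships.
actual_times = [120.0, 121.0, 122.0, 123.0]
predicted_times = [120.2, 120.8, 122.2, 122.8]

bias, loa_low, loa_high = bland_altman_limits(actual_times, predicted_times)
pct_low, pct_high = percentile_limits(actual_times, predicted_times)

print(f"bias {bias:+.2f} s, parametric LOA {loa_low:+.2f} to {loa_high:+.2f} s")
print(f"percentile limits {pct_low:+.2f} to {pct_high:+.2f} s")
```

Comparing the two intervals on the same data shows how much the choice of method affects the limits of agreement for a given dataset.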

Finally, it is essential to acknowledge that the predictor metrics identified in this study were selected because the employed methods identified them as the statistically strongest KPIs. However, this does not imply that other metrics or variables are unimportant for successful performance. A holistic approach still considers multiple factors for comprehensive evaluation. Additionally, KPIs cannot be assessed independently: a larger effort put into one race section may impair performance in another phase of the race.

In conclusion, the main findings of this study emphasise the crucial role of free-swimming speed and turn performance in women’s short-course freestyle races and demonstrate that start performance is of particular importance for sprint races (i.e., 50 m and 100 m). Furthermore, the successful application of PCA and multiple regression offers valuable insights for coaches to optimise athlete training and improve race outcomes. The use of race-specific KPI-based predictions simplifies decision-making regarding athlete training and race strategies and supports the optimisation of athlete performance in swimming competitions.

Supplemental material

Potential Predictor.xlsx

Acknowledgments

We would like to express our gratitude to all competitors. There were no specific grants or funding for the present study. All data are available on request from the corresponding author.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Supplemental material

Supplemental data for this article can be accessed online at https://doi.org/10.1080/02640414.2024.2346670

Additional information

Funding

The author(s) reported there is no funding associated with the work featured in this article.

References

  • Bland, J. M., & Altman, D. G. (1999). Measuring agreement in method comparison studies. Statistical Methods in Medical Research, 8(2), 135–160. https://doi.org/10.1177/096228029900800204
  • Born, D.-P., Kuger, J., Polach, M., & Romann, M. (2021a). Start and turn performances of elite male swimmers: Benchmarks and underlying mechanisms. Sports Biomechanics, 23(4), 1–19. https://doi.org/10.1080/14763141.2021.1872693
  • Born, D.-P., Kuger, J., Polach, M., & Romann, M. (2021b). Turn fast and win: The importance of acyclic phases in top-elite female swimmers. Sports, 9(9), 122. https://doi.org/10.3390/sports9090122
  • Born, D.-P., Romann, M., & Stöggl, T. (2022). Start fast, swim faster, turn fastest: Section analyses and normative data for individual medley. Journal of Sports Science and Medicine, 21, 233. https://doi.org/10.52082/jssm.2022.233
  • Burkhardt, D., Born, D.-P., Singh, N. B., Oberhofer, K., Carradori, S., Sinistaj, S., & Lorenzetti, S. (2023). Key performance indicators and leg positioning for the kick-start in competitive swimmers. Sports Biomechanics, 22(6), 752–766. https://doi.org/10.1080/14763141.2020.1761435
  • Colyer, S. L., Stokes, K. A., Bilzon, J. L., Cardinale, M., & Salo, A. I. (2017). Physical predictors of elite skeleton start performance. International Journal of Sports Physiology and Performance, 12(1), 81–89. https://doi.org/10.1123/ijspp.2015-0631
  • Costello, J. T., Bieuzen, F., & Bleakley, C. M. (2014). Where are all the female participants in sports and exercise medicine research? European Journal of Sport Science, 14(8), 847–851. https://doi.org/10.1080/17461391.2014.911354
  • Crowley, E., Harrison, A. J., & Lyons, M. (2017). The impact of resistance training on swimming performance: A systematic review. Sports Medicine, 47(11), 2285–2307. https://doi.org/10.1007/s40279-017-0730-2
  • Cuenca-Fernández, F., Ruiz-Navarro, J. J., Polach, M., Arellano, R., & Born, D.-P. (2022). Turn performance variation in European elite short-course swimmers. International Journal of Environmental Research and Public Health, 19(9), 5033. https://doi.org/10.3390/ijerph19095033
  • Da Silva, J. K., Dos Santos, P. S., Favaro, S. O., Lirani, L. D. S., & Osiecki, R. (2019). Reaction time on swimming block start in competitors swimmers on world swimming championship. Journal of Physical Education & Sport, 19(2), 376–380. https://doi.org/10.7752/jpes.2019.s2056
  • David, S., Grove, T., Mv, D., Koster, P., & Beek, P. J. (2022). Improving tumble turn performance in swimming—The impact of wall contact time and tuck index. Frontiers in Sports and Active Living, 4, 936695. https://doi.org/10.3389/fspor.2022.936695
  • Hair, J. F., Anderson, R. E., Babin, B. J., & Black, W. C. (2010). Multivariate data analysis: A global perspective (Vol. 7). Pearson.
  • Heazlewood, T. (2006). Prediction versus reality: The use of mathematical models to predict elite performance in swimming and athletics at the Olympic Games. Journal of Sports Science and Medicine, 5(4), 480.
  • Hughes, M. D., & Bartlett, R. M. (2002). The use of performance indicators in performance analysis. Journal of Sports Sciences, 20(10), 739–754. https://doi.org/10.1080/026404102320675602
  • Hughes, M., Franks, I. M., & Dancs, H. (2019). Essentials of performance analysis in sport. Routledge.
  • Kaiser, H. F. (1974). An index of factorial simplicity. Psychometrika, 39(1), 31–36. https://doi.org/10.1007/BF02291575
  • Kleinbaum, D. G., Kupper, L. L., Nizam, A., & Rosenberg, E. S. (2013). Applied regression analysis and other multivariable methods. Cengage Learning.
  • Morais, J. E., Barbosa, T. M., Bragada, J. A., Nevill, A. M., & Marinho, D. A. (2023). Race analysis and determination of stroke frequency–stroke length combinations during the 50-M Freestyle Event. Journal of Sports Science and Medicine, 22, 156. https://doi.org/10.52082/jssm.2023.156
  • O’Donoghue, P. (2008). Principal components analysis in the selection of key performance indicators in sport. International Journal of Performance Analysis in Sport, 8(3), 145–155. https://doi.org/10.1080/24748668.2008.11868456
  • O’Donoghue, P. (2009). Research methods for sports performance analysis. Routledge.
  • Parmar, N., James, N., Hearne, G., & Jones, B. (2018). Using principal component analysis to develop performance indicators in professional rugby league. International Journal of Performance Analysis in Sport, 18(6), 938–949. https://doi.org/10.1080/24748668.2018.1528525
  • Polach, M., Thiel, D., Kreník, J., & Born, D.-P. (2021). Swimming turn performance: The distinguishing factor in 1500 m world championship freestyle races? BMC Research Notes, 14(1), 1–7. https://doi.org/10.1186/s13104-021-05665-x
  • Puel, F., Morlier, J., Avalos, M., Mesnard, M., Cid, M., & Hellard, P. (2012). 3D kinematic and dynamic analysis of the front crawl tumble turn in elite male swimmers. Journal of Biomechanics, 45(3), 510–515. https://doi.org/10.1016/j.jbiomech.2011.11.043
  • Reichmuth, D., Olstad, B. H., & Born, D.-P. (2021). Key performance indicators related to strength, endurance, flexibility, anthropometrics, and swimming performance for competitive aquatic lifesaving. International Journal of Environmental Research and Public Health, 18(7), 3454. https://doi.org/10.3390/ijerph18073454
  • Rojas-Valverde, D., Pino-Ortega, J., Gómez-Carmona, C. D., & Rico-González, M. (2020). A systematic review of methods and criteria standard proposal for the use of principal component analysis in team’s sports science. International Journal of Environmental Research and Public Health, 17(23), 8712. https://doi.org/10.3390/ijerph17238712
  • Ruiz-Navarro, J. J., Cuenca-Fernández, F., Sanders, R., & Arellano, R. (2022). The determinant factors of undulatory underwater swimming performance: A systematic review. Journal of Sports Sciences, 40(11), 1243–1254. https://doi.org/10.1080/02640414.2022.2061259
  • Simbaña-Escobar, D., Hellard, P., & Seifert, L. (2018). Modelling stroking parameters in competitive sprint swimming: Understanding inter-and intra-lap variability to assess pacing management. Human Movement Science, 61, 219–230. https://doi.org/10.1016/j.humov.2018.08.002
  • Thng, S., Pearson, S., Rathbone, E., & Keogh, J. W. (2020). The prediction of swim start performance based on squat jump force-time characteristics. PeerJ, 8, e9208. https://doi.org/10.7717/peerj.9208
  • Tor, E., Pease, D. L., & Ball, K. A. (2015). Comparing three underwater trajectories of the swimming start. Journal of Science and Medicine in Sport, 18(6), 725–729. https://doi.org/10.1016/j.jsams.2014.10.005
  • Tor, E., Pease, D. L., Ball, K. A., & Hopkins, W. G. (2014). Monitoring the effect of race-analysis parameters on performance in elite swimmers. International Journal of Sports Physiology and Performance, 9(4), 633–636. https://doi.org/10.1123/ijspp.2013-0205
  • Toussaint, H. M., & Vervoorn, K. (1990). Effects of specific high resistance training in the water on competitive swimmers. International Journal of Sports Medicine, 11(3), 228–233. https://doi.org/10.1055/s-2007-1024797
  • Veiga, S., & Roig, A. (2016). Underwater and surface strategies of 200 m world level swimmers. Journal of Sports Sciences, 34(8), 766–771. https://doi.org/10.1080/02640414.2015.1069382
  • Weaving, D., Beggs, C., Dalton-Barron, N., Jones, B., & Abt, G. (2019). Visualizing the complexity of the athlete-monitoring cycle through principal-component analysis. International Journal of Sports Physiology and Performance, 14(9), 1304–1310. https://doi.org/10.1123/ijspp.2019-0045
  • Weimar, W., Sumner, A., Romer, B., Fox, J., Rehm, J., Decoux, B., & Patel, J. (2019). Kinetic analysis of swimming flip-turn push-off techniques. Sports, 7(2), 32. https://doi.org/10.3390/sports7020032
  • West, D. J., Owen, N. J., Cunningham, D. J., Cook, C. J., & Kilduff, L. P. (2011). Strength and power predictors of swimming starts in international sprint swimmers. Journal of Strength & Conditioning Research, 25(4), 950–955. https://doi.org/10.1519/JSC.0b013e3181c8656f
  • Wirth, K., Keiner, M., Fuhrmann, S., Nimmerichter, A., & Haff, G. G. (2022). Strength training in swimming. International Journal of Environmental Research and Public Health, 19(9), 5369. https://doi.org/10.3390/ijerph19095369
  • Wu, P.-Y., Garufi, L., Drovandi, C., Mengersen, K., Mitchell, L. J. G., Osborne, M. A., & Pyne, D. B. (2022). Bayesian prediction of winning times for elite swimming events. Journal of Sports Sciences, 40(1), 24–31. https://doi.org/10.1080/02640414.2021.1976485
  • Zamparo, P., Cortesi, M., & Gatta, G. (2020). The energy cost of swimming and its determinants. European Journal of Applied Physiology, 120(1), 41–66. https://doi.org/10.1007/s00421-019-04270-y