Research Article

A comparative evaluation of clustering methods and data sampling techniques in the prediction of reservoir landslide deformation state

Received 26 Aug 2023, Accepted 31 Mar 2024, Published online: 12 Apr 2024
 

ABSTRACT

Landslides with step-wise deformation characteristics are widely distributed throughout the Three Gorges Reservoir (TGR) region of China. Predicting the deformation state of these landslides is essential to landslide early warning and risk management. Machine learning-based prediction of landslide deformation state combines two tasks: clustering and imbalanced classification. This paper compares three prevalent clustering methods, namely K-means, Density-Based Spatial Clustering of Applications with Noise (DBSCAN), and Gaussian Mixture Model (GMM), in the clustering step. It also evaluates three widely used data sampling techniques, namely Synthetic Minority Oversampling Technique (SMOTE), SMOTE-Edited Nearest Neighbors (SMOTE-ENN), and ADAptive SYNthetic Sampling (ADASYN), in the imbalanced classification step. The Baijiabao and Bazimen landslides in the TGR region, both of which exhibit step-wise deformation, serve as case studies. Results indicate that DBSCAN and GMM exhibit significant advantages in the clustering step. Meanwhile, the combined models that integrate oversampling techniques with classification algorithms perform exceptionally well in imbalanced classification. These algorithms are recommended for predicting the deformation states of step-wise landslides in the TGR region. Such machine learning-based predictive models can support early warning systems aimed at mitigating landslide risks.
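
To illustrate the oversampling idea named in the abstract, the following is a minimal, simplified sketch of the interpolation step at the core of SMOTE, written with the Python standard library only. It is not the authors' implementation. The function name `smote_sketch`, the two features, and all sample values are hypothetical and chosen purely for illustration; in practice an established implementation such as `imblearn.over_sampling.SMOTE` would normally be used.

```python
import math
import random


def smote_sketch(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points, each interpolated between a randomly
    chosen minority sample and one of its k nearest minority neighbours.
    Simplified: the base sample is drawn at random for every new point."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point, excluding itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


# Hypothetical two-feature samples (made-up values, not the paper's data):
# many "stable" months (majority) and few "rapid deformation" months (minority).
data_rng = random.Random(42)
majority = [(data_rng.uniform(0, 50), data_rng.uniform(-2, 2)) for _ in range(40)]
minority = [(data_rng.uniform(80, 150), data_rng.uniform(-8, -4)) for _ in range(6)]

# Oversample the minority class until both classes have the same size.
new_points = smote_sketch(minority, n_new=len(majority) - len(minority))
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))
```

Because every synthetic point lies on a segment between two existing minority samples, the new points stay inside the region already occupied by the minority class. SMOTE-ENN and ADASYN build on this step by, respectively, cleaning the resampled set with edited nearest neighbours and weighting where new points are generated.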

Acknowledgments

The dataset is provided by the National Field Observation and Research Station of Landslides in the TGR Area of the Yangtze River.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Notes

1 To determine the number of runs needed for the model statistics to converge, we ran the random forest classifier multiple times and analyzed the stability of two key statistics, the mean and the standard deviation. This empirical analysis showed that both statistics stabilized within 100 runs. We therefore consider 100 runs sufficient for the model statistics to converge.
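
A stability check of this kind can be sketched as follows. This is only an illustration under stated assumptions: the per-run scores are synthetic stand-ins drawn from a normal distribution rather than results from the paper, and the helper names `running_stats` and `runs_until_stable`, the tolerance `tol`, and the `window` length are all hypothetical choices.

```python
import random
import statistics


def running_stats(scores):
    """Return (mean, std) of the first n scores for n = 2..len(scores)."""
    return [
        (statistics.fmean(scores[:n]), statistics.stdev(scores[:n]))
        for n in range(2, len(scores) + 1)
    ]


def runs_until_stable(scores, tol=0.005, window=10):
    """First run count n at which the running mean and running std each vary
    by less than tol over `window` consecutive run counts starting at n.
    Returns None if no such n exists."""
    stats = running_stats(scores)
    for i in range(len(stats) - window + 1):
        chunk = stats[i:i + window]
        means = [m for m, _ in chunk]
        stds = [s for _, s in chunk]
        if max(means) - min(means) < tol and max(stds) - min(stds) < tol:
            return i + 2  # stats[i] summarises the first i + 2 runs
    return None


# Synthetic stand-in for one score per run (assumed values, not real results).
rng = random.Random(0)
scores = [rng.gauss(0.85, 0.03) for _ in range(100)]
stable_at = runs_until_stable(scores)
print("stable from run count:", stable_at)
```

If the function returns a run count well below the total number of runs, the mean and standard deviation can reasonably be treated as converged for that tolerance; a return value of `None` suggests that more runs or a looser tolerance would be needed.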

Additional information

Funding

This work was supported by the Natural Science Foundation of Jiangsu Province [Grant Number BK20220421] and the State Key Program of the National Natural Science Foundation [Grant Number 42230702].
