Canadian Journal of Remote Sensing
Journal canadien de télédétection
Volume 50, 2024 - Issue 1
Research Article

Multi-Scale Dense Graph Attention Network for Hyperspectral Classification


Article: 2333424 | Received 27 Sep 2023, Accepted 15 Mar 2024, Published online: 09 May 2024

Abstract

In recent years, numerous deep learning-based methods have gained increasing attention in hyperspectral classification, particularly the Graph Neural Network, which exhibits superior capabilities in structural description. However, a single graph structure is not suitable for hyperspectral feature representation. We therefore propose a novel multi-scale graph network structure, the Multi-Scale Dense Graph Attention network (MSDesGATnet), for hyperspectral classification. First, semi-supervised local Fisher discriminant analysis and superpixel segmentation are employed for dimensionality reduction and multi-scale graph construction, respectively. Second, spectral-spatial convolution is applied to extract shallow features from the image. Subsequently, an improved graph self-attention network is applied sequentially to the graph at each scale, and the graphs at different scales are densely connected through spatial feature alignment modules built from two matrix multiplications. Finally, the combined pixel-level feature map from multiple graph spaces is derived, and spectral-spatial convolution fuses the abundant feature maps for hyperspectral classification. Experimental results on various hyperspectral datasets demonstrate the superiority of our MSDesGATnet over many state-of-the-art methods. The code is available at https://github.com/l7170/MSDesGAT.git.


Introduction

Hyperspectral imagery (HSI) combines spatial information with hundreds of continuous, narrow spectral bands, enabling the extraction of more discriminative features for precise land cover classification. However, three main limitations persist in HSI classification: (1) redundancy in the spectral features of hyperspectral data; (2) limited availability of training samples; (3) significant spatial variability of spectral signatures (Camps-Valls and Bruzzone Citation2005). To reduce the impact of high-dimensional redundancy, the focus lies primarily on obtaining a low-dimensional representation of the high-dimensional data samples while retaining most of the intrinsic information; techniques such as Principal Component Analysis, Fisher Discriminant Analysis (FDA), local FDA (Li et al. Citation2012), manifold learning, graph representation (Li et al. Citation2016), and semi-supervised local Fisher discriminant analysis (Li et al. Citation2017) have been employed. For HSI classification with limited sample size, methods such as sample augmentation (Li et al. Citation2018), semi-supervised learning (Aydemir and Bilgin Citation2019), adversarial networks (Huang et al. Citation2022), few-shot learning (Liu et al. Citation2022), superpixel segmentation-based approaches (Zheng et al. Citation2019), pseudo-labeling combined with pre-trained models (Yao, Hong, et al. Citation2023), and enhanced prototype contrastive learning (Liu et al. Citation2023) are utilized. To address the third limitation, Support Vector Machines (Mercier and Lennon Citation2003) and Random Forest algorithms (Amini et al. Citation2014), along with approaches incorporating spatial information such as Markov random fields (Tarabalka et al. Citation2010), random walks (Cao and Wang Citation2017), and guided filtering (Kang et al. Citation2014; Liao et al. Citation2018), have achieved significant success in traditional machine learning-based HSI classification over the past decade.

In recent years, deep learning methods have demonstrated promising performance in HSI classification. Notably, Convolutional Neural Networks (CNNs) such as 1D-CNN, 2D-CNN, 3D-CNN (Audebert et al. Citation2019), and hybrid CNNs (Roy et al. Citation2020; Datta et al. Citation2022) have been successful in effectively extracting spatial and spectral features at the pixel level; these features are then fed into fully connected networks for classification. At the same time, multimodal approaches have become a trend: methods using multimodal data (Yao, Hong, et al. Citation2023) together with advanced supervised classifiers such as transformers (Zhang et al. Citation2023; Yao, Zhang, et al. Citation2023) or unsupervised methods (UCSL; Yao, Zhang, et al. Citation2023) have achieved significant results. However, CNNs with fixed-size convolution kernels can only process regular grid data, making it challenging to fit ground objects of different sizes in HSIs. To address this, some researchers proposed the multiscale feature fusion network based on global weighting (MSFGW; Wang et al. Citation2023) and non-gridding multi-level concatenated atrous pyramid convolution (NG-APC; Zhang et al. Citation2019), which employ atrous convolution kernels to process multiscale objects. While different convolution kernels can be combined (MJ-DNN; Liu et al. Citation2022), the extracted features remain spatially scale-invariant. Consequently, some scholars have suggested HSI classification models based on Graph Neural Networks (GNNs). GNNs offer the flexibility to organize nodes that describe the relationships among ground objects in the spatial or spectral structure. For instance, Spectral-Spatial Graph Convolutional Networks (SSGCN; Qin et al. Citation2019) treat each superpixel in each spectral band as a graph node, albeit at the cost of redundant computations. Among GNN frameworks, the Graph Convolutional Network (GCN) can aggregate the spectral features of ground objects and effectively mitigate the spatial variability of HSI spectral signatures (Hong et al. Citation2021; Hu et al. Citation2022). However, GCN relies on a fixed graph structure and is ill-suited to HSI scenes containing ground objects of different sizes, which need to be described by a multi-scale graph structure. Moreover, as the number of GCN layers increases, the over-smoothing problem arises; it has recently been alleviated by advanced GCN-based methods using the DropEdge technique and residual connections (Yu et al. Citation2023), which show strong generalization ability. Inspired by the multi-head self-attention (MHSA) mechanism, the graph attention network (GAT) learns weights for different neighbor nodes during training to enhance classification accuracy. Moreover, GAT suits multiple graph structures (Zhao et al. Citation2022).

Recently, methods have been proposed for single-scale (CEGCN; Liu et al. Citation2021) or multi-scale feature extraction using GNNs. The attention multihop graph and multiscale convolutional fusion network (AMGCFN; Zhou et al. Citation2023), the multi-scale receptive fields graph attention neural network (MRGAT; Ding et al. Citation2023), and the multiscale graph convolutional network (MSGCN; Zhao et al. Citation2021) use different hop counts to achieve multi-scale feature extraction. Though they achieve multi-scale feature extraction, they operate within the same graph space; the effectiveness of feature extraction then depends on the initial graph space, and a node defined in a single graph space tends to be impure. Once multi-scale features are extracted in different graph spaces, misaligned superpixel sizes across those spaces become inevitable. The local aggregation and global attention network (LAGAN; Chen et al. Citation2023) uses two superpixel fusion strategies, minimum fusion and maximum fusion, which can fuse two superpixels of different sizes, but at the cost of losing some original pixel-level features, and it cannot achieve perfect alignment. GACP proposes a method for converting pixel-level features to superpixel-level features, but it can only perform the conversion within a single graph space and cannot handle transformations across multiple graph spaces. Using dense connections in GNNs can make the extracted features richer, as demonstrated by AMGCFN and the deep hybrid multi-graph neural network (DHMG; Yao, Zhi-Li, et al. Citation2023): AMGCFN achieves dense connections between different scales, while DHMG achieves dense connections between different feature extraction methods. However, both methods extract features in the same graph space, and densely connecting features extracted from different graph spaces can yield even richer features. After obtaining rich feature maps, further feature fusion can be performed, as in MSFGW and multiple spatial features extraction-fusion (MSFs-EF; Liao and Wang Citation2020), which fuse multiple feature maps; for this fusion, methods such as extracting joint spatial-spectral information can be further used (Huang et al. Citation2022).

Taking inspiration from the notable accomplishments of spectral-spatial convolution, GAT, and DenseNet in HSI classification, we present a multi-scale densely connected graph attention network tailored to address the limitations of other approaches rooted in the multi-scale concept and dense graph networks. This network, known as MSDesGATnet, introduces a novel graph node feature alignment module (GNFA) that enables dense connections across scales while preserving the original pixel-level features. Simultaneously, it establishes strong associations across different graph spaces. Leveraging the GNFA module and DenseNet, MSDesGATnet outperforms other GNN-based multi-graph methods such as the dynamic multi-scale graph convolutional network classifier (DMSGer; Yang et al. Citation2022). Our final feature maps exhibit exceptional noise resistance and form an exponentially large collection of feature maps, which is evident in both the classification accuracy and the resultant classification maps. The key contributions of this paper are as follows:

  1. To combine deep graph node features of different scales and implement a DenseNet architecture, we have developed the Graph Node Feature Alignment modules (GNFA) that utilize two matrix multiplications as spatial transformations at each cross-scale connection. To the best of our knowledge, this is the first instance of integrating a multi-scale graph with DenseNet for HSI classification.

  2. SELF and superpixel segmentation are employed for dimensionality reduction and multi-scale graph construction, respectively. Furthermore, this saves both memory and time.

  3. To address the variability of spectral signatures in HSI scenes, we put forward a groundbreaking end-to-end multi-scale graph-dense network integrated with an MHSA mechanism. By leveraging the amalgamated features within the multi-scale graph spaces, we enhance the robustness and accuracy of semi-supervised HSI classification.

  4. MSDesGATnet exhibits superior accuracy across various publicly accessible datasets, effectively demonstrating its robustness in the face of node impurities. This unique attribute enables fellow researchers to leverage our framework when exploring graph node feature fusion algorithms with new datasets, eliminating the need for additional time-consuming searches for original datasets with compatible resolutions.

Relevant methods

In this section, we introduce the three existing algorithms utilized in our proposed approach. For data preprocessing, we employ the semi-supervised local Fisher discriminant analysis (Semi-LFDA, also known as SELF) and simple linear iterative clustering (SLIC) algorithms; the former reduces the data dimensionality, while the latter constructs the graph spaces. In the feature iteration phase, we incorporate the GATv2 algorithm (Brody et al. Citation2021).

Semi-LFDA algorithm

The original hyperspectral image comprises numerous channels, often dozens or even hundreds, which may contain considerable noise. Hence, it is necessary to reduce the redundant dimensions of the data while retaining the bands that can benefit the classification task. To leverage the advantages of both supervised and unsupervised dimensionality reduction, Semi-LFDA is employed: in SELF, the labeled data contribute to the projection via LFDA, and PCA preserves the global structure of the unlabeled data.

Assume labeled samples $X_L \in \mathbb{R}^{d \times n_L}$ and unlabeled samples $X_U \in \mathbb{R}^{d \times n_U}$, with $X = [X_L, X_U]$ and $n = n_L + n_U$. Then $X_L$ is used to compute the local between-class scatter matrix $S^{(lb)}$ and the local within-class scatter matrix $S^{(lw)}$ of the labeled samples, and subsequently the total scatter matrix $S^{(t)}$ is derived. Next, $S^{(lb)}$ and $S^{(lw)}$ are regularized as
[1] $S^{(rlb)} = (1-\beta)\,S^{(lb)} + \beta\,S^{(t)}, \qquad S^{(rlw)} = (1-\beta)\,S^{(lw)} + \beta\,I_d$
where $\beta$ determines the balance between PCA and LFDA. To build the transformation matrix, we retain the eigenvectors $\varphi_k$ whose associated eigenvalues $\lambda_k$ contribute to a cumulative sum exceeding a threshold such as 99% in the generalized eigenproblem
[2] $S^{(rlb)}\,\varphi_k = \lambda_k\,S^{(rlw)}\,\varphi_k$

These eigenvectors constitute the transformation matrix. We employ it to reduce the data dimensionality, subsequently feeding the processed image into our network together with the SLIC algorithm.
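As an illustration, the following is a minimal sketch of the regularization and eigendecomposition steps in Equations (1)-(2). The scatter matrices are assumed precomputed following Li et al. (2017), and the function name and default values are our assumptions, not the released code's.

import numpy as np
from scipy.linalg import eigh

def self_projection(S_lb, S_lw, S_t, beta=0.5, energy=0.99):
    """Return the d x r SELF transformation matrix (Equations (1)-(2))."""
    d = S_lb.shape[0]
    # Equation (1): blend the supervised (LFDA) and unsupervised (PCA) scatter.
    S_rlb = (1.0 - beta) * S_lb + beta * S_t
    S_rlw = (1.0 - beta) * S_lw + beta * np.eye(d)
    # Equation (2): generalized eigenproblem S_rlb @ phi = lambda * S_rlw @ phi.
    vals, vecs = eigh(S_rlb, S_rlw)
    order = np.argsort(vals)[::-1]          # sort eigenvalues descending
    vals, vecs = vals[order], vecs[:, order]
    # Keep eigenvectors whose eigenvalues cover >= 99% of the cumulative sum.
    cum = np.cumsum(vals) / np.sum(vals)
    r = int(np.searchsorted(cum, energy)) + 1
    return vecs[:, :r]

The reduced data are then obtained as $X^{\top} T$ for an $n \times d$ sample matrix, where $T$ is the returned $d \times r$ matrix.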

SLIC algorithm

Utilizing the dimensionality-reduced data, we conduct superpixel segmentation on the dataset. Each superpixel resulting from the segmentation is treated as a node, with its feature being the average of the pixel features within that region, as depicted in Figure 1. Substituting superpixels for pixels can significantly reduce computational requirements and complexity. Additionally, MSDesGATnet relies on the SLIC algorithm, repeatedly initialized with different numbers of cluster centers, to generate a multi-scale graph.

Figure 1. SLIC algorithm for graph construction.
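For concreteness, here is a hedged sketch of single-scale graph construction with SLIC; the compactness value and helper name are illustrative assumptions, and `img` denotes the (H, W, r) cube after SELF reduction.

import numpy as np
from skimage.segmentation import slic

def build_graph(img, n_segments):
    seg = slic(img, n_segments=n_segments, compactness=0.1,
               channel_axis=-1, start_label=0)
    m = int(seg.max()) + 1
    # Node feature = mean of the pixel features inside each superpixel.
    feats = np.stack([img[seg == v].mean(axis=0) for v in range(m)])
    # Adjacency: superpixels sharing a horizontal or vertical border.
    A = np.zeros((m, m), dtype=np.float32)
    hor = seg[:, :-1] != seg[:, 1:]
    ver = seg[:-1, :] != seg[1:, :]
    A[seg[:, :-1][hor], seg[:, 1:][hor]] = 1.0
    A[seg[:-1, :][ver], seg[1:, :][ver]] = 1.0
    A = np.maximum(A, A.T)
    np.fill_diagonal(A, 1.0)   # self-loops, used later by the attention layer
    return seg, feats, A

One graph is built per scale, e.g. graphs = [build_graph(img, n) for n in seed_counts], with seed_counts being the per-scale cluster-center numbers.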

GATv2 algorithm

In a GNN, the features of each node are aggregated and updated. To better enable MSDesGATnet to extract features in multi-scale graph space, we use dynamic graph attention. Assuming $[H_1, H_2, \ldots, H_n]$ represents the features of $n$ nodes, the node features $H_i^k$ are updated according to
[3] $H_i^{k+1} = \alpha_{i,i}\,W H_i^k + \sum_{j \in N(i)} \alpha_{i,j}\,W H_j^k$
where $W$ is a learnable parameter matrix and $N(i)$ denotes the neighboring nodes of node $i$. The attention coefficient $\alpha$ is calculated as
[4] $\alpha_{i,j} = \dfrac{\exp\!\left(a^{\top}\,\mathrm{LeakyReLU}\!\left(W[H_i^k \,\|\, H_j^k]\right)\right)}{\sum_{j' \in N(i) \cup \{i\}} \exp\!\left(a^{\top}\,\mathrm{LeakyReLU}\!\left(W[H_i^k \,\|\, H_{j'}^k]\right)\right)}$

To ensure numerical stability and robustness, the attention coefficients were normalized and the attention mechanism was extended to multi-head attention.

By employing the MHSA mechanism, GATv2 accomplishes the update of node features at the superpixel level. In our experiments, each single-scale graph undergoes two sequential GATv2 layers: the first layer uses 4 attention heads, while the second uses a single head, as illustrated in Figure 2.
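The following is a minimal single-head GATv2 layer consistent with Equations (3)-(4) and the formulation of Brody et al. (2021), written against a dense adjacency matrix for clarity; it is a sketch, not the authors' implementation. The paper's first GAT layer would concatenate four such heads.

import torch
import torch.nn as nn
import torch.nn.functional as F

class GATv2Layer(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        # Splitting W into source/target parts is equivalent to W [h_i || h_j].
        self.W_src = nn.Linear(in_dim, out_dim, bias=False)
        self.W_dst = nn.Linear(in_dim, out_dim, bias=False)
        self.a = nn.Linear(out_dim, 1, bias=False)

    def forward(self, H, A):
        # H: (m, in_dim) node features; A: (m, m) adjacency with self-loops.
        Hs, Hd = self.W_src(H), self.W_dst(H)
        # Dynamic attention score: a^T LeakyReLU(W_src h_i + W_dst h_j).
        e = self.a(F.leaky_relu(Hs.unsqueeze(1) + Hd.unsqueeze(0), 0.2)).squeeze(-1)
        e = e.masked_fill(A == 0, float('-inf'))   # restrict to N(i) and i
        alpha = torch.softmax(e, dim=1)            # Equation (4)
        return alpha @ Hd                          # Equation (3)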

Figure 2. Composition of GAT layer in MSDesGATnet.

Proposed MSDesGATnet framework

In this section, we present our proposed MSDesGATnet architecture for hyperspectral classification, as depicted in Figure 3. It comprises two main components: graph node feature alignment modules and fusion feature modules. The former facilitates the transformation of data between graph space and pixel space, while the latter concatenates the data processed by GATv2 to the input and iterates this process, similar to DenseNet. Additionally, we incorporate spectral-spatial convolution to fuse the rich feature maps obtained from multiple scales.

Figure 3. Flowchart of the proposed MSDesGATnet.

The input to the entire MSDesGATnet is derived from the image data after SELF dimensionality reduction. Let $X \in \mathbb{R}^{n \times r}$ represent the reduced-dimensional data, consisting of $n$ pixels, each carrying an $r$-dimensional feature vector.

Graph node feature alignment modules

When employing multi-scale graph neural networks for feature extraction, it is crucial to address the transfer of node features across graphs of different scales. As the boundaries of superpixel segmentations at different scales often do not align precisely, node features cannot be fused directly across scales. Therefore, we propose the GNFA module, which transfers node features through a sequential pixel-node-pixel transformation.

The relationship between nodes and pixels can be represented by the transformation matrix $Q \in \mathbb{R}^{n \times m}$. For instance, if we obtain $m$ nodes through superpixel segmentation and there are $n$ pixels in total, we construct $Q$ as
[5] $Q_{nm} = \begin{cases} 1 & \text{if } x_n \in V_m \\ 0 & \text{otherwise} \end{cases}$
where $x_1, x_2, \ldots, x_n$ is the sequence of pixels and $V_1, V_2, \ldots, V_m$ is the sequence of nodes. This equation states that if a pixel $x_n$ belongs to node $V_m$, the value of $Q$ at position $(n, m)$ is 1; otherwise, it is 0. As depicted in Figure 4, node $V_2$ includes $x_3, x_4, x_{11}, x_{12}, x_{13}$, so the corresponding entries in the $V_2$ column of $Q$ are 1, while the remaining entries are 0. Similarly, the values in each column of $Q$ $(V_1, V_2, \ldots, V_m)$ can be determined from the pixels each node includes.

Figure 4. Acquire the values of the Q matrix.

Assuming $X$ represents the pixel-level space and $H \in \mathbb{R}^{m \times r}$ represents the node-level space, we can convert the pixel-level space into the node-level graph space as an encoding step:
[6] $H = \hat{Q}^{\top} X$

where $\top$ denotes the transpose operator and $\hat{Q}$ serves as the regularized form of $Q$, since each node feature is defined as the average of the pixels associated with the node:
[7] $\hat{Q} = Q\,D_Q^{-1}$
where $D_Q$ is the diagonal matrix of superpixel sizes, whose inverse $D_Q^{-1} \in \mathbb{R}^{m \times m}$ is defined as
[8] $D_Q^{-1} = \mathrm{diag}\!\left(\left(\textstyle\sum_j Q_{j1}\right)^{-1}\!, \ldots, \left(\textstyle\sum_j Q_{jm}\right)^{-1}\right)$

Similar to Equation (6), the $Q$ matrix can also be used as a decoding step to transform node-level features back to pixel-level features:
[9] $X = Q\,\big(\mathrm{GATv2}(H, A)\big)$
where $\mathrm{GATv2}(\cdot)$ represents the output of the GAT layers, which update the node features $H$, and the adjacency matrix $A$ represents the current-scale graph structure. The output of the GAT layers is multiplied on the left by the transformation matrix $Q$ to obtain the pixel-level features $X$.

The GNFA module is constructed from the encoding and decoding steps explained above. Figure 5 offers an intuitive visualization of how pixel features are altered within a GNFA module. Initially, each node feature is determined as the average of its pixel features during the encoding step. The node features then undergo aggregation and updating through two GAT layers. Finally, the updated node features replace the pixel-level features during the decoding step. To align two sets of node features from graphs of different scales, two GNFA modules can be concatenated. In our MSDesGATnet, the first GNFA module operates on the larger-scale graph and the second on the smaller-scale graph: the node features of the larger-scale graph are converted into pixel-level features through the decoding step of the first module, and these pixel-level features, embedding latent information from the larger-scale graph, are then re-encoded into the node features of the smaller-scale graph.
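Putting the pieces together, here is a hedged sketch of one GNFA module reusing the GATv2Layer sketched earlier. The structure (encode, 4-head GAT layer, single-head GAT layer, decode) follows Figures 2 and 5, while tensor types and names are our assumptions; Q and Q_hat are dense torch tensors here for brevity.

import torch
import torch.nn as nn

class GNFA(nn.Module):
    def __init__(self, dim, heads=4):
        super().__init__()
        # Multi-head attention: run `heads` single-head layers and concatenate.
        self.heads = nn.ModuleList(GATv2Layer(dim, dim) for _ in range(heads))
        self.out = GATv2Layer(dim * heads, dim)

    def forward(self, X, Q, Q_hat, A):
        H = Q_hat.t() @ X                                   # encode, Eq. (6)
        H = torch.cat([g(H, A) for g in self.heads], dim=1) # 4-head GAT layer
        H = self.out(H, A)                                  # single-head layer
        return Q @ H                                        # decode, Eq. (9)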

Figure 5. Graph node feature alignment modules.

The number of nodes is determined by the initial seed count of the SLIC algorithm. To enhance efficiency, we pre-compute the transformation matrices $Q_k$ and $\hat{Q}_k$ for each scale $k$, as illustrated in Figure 6. These pre-computed matrices enable the GNFA modules in our MSDesGATnet to be constructed more efficiently.

Figure 6. Q matrices obtained at multiple scales.

Multiple graphs dense connection network

In this research, we incorporate the DenseNet concept into our multiple-graph dense connection network to effectively extract multi-scale features. Similar to DenseNet, we can obtain deep features with a larger receptive field while preserving shallow features by stacking multiple GNN layers. Specifically, as depicted in Figure 7, $X_k$ denotes the input of the $k$th GNFA module, and $X_{k+1}$ is obtained as
[10] $X_{k+1} = \left(Q_k\,\mathrm{GATv2}(\hat{Q}_k^{\top} X_k,\, A_k)\right) \,\big\Vert\, X_k$
where $\Vert$ denotes channel-wise concatenation.

Figure 7. Multiple graphs dense connection network.

$Q_k$ and $\hat{Q}_k$ are obtained through SLIC segmentation, and $A_k$ is the adjacency matrix of the scale-$k$ graph. These matrices can be pre-computed and stored in GPU memory to ensure efficient training. In the GNFA modules, we keep the input and output dimensions the same; as a result, the feature dimension of $X_{k+1}$ is twice that of $X_k$. In MSDesGATnet, we utilize GNFA modules at four scales, yielding high-dimensional features whose dimension is $2^4 = 16$ times the original.
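A sketch of the dense connection loop of Equation (10) follows; it assumes each GNFA module's width matches its stage input, which doubles at every scale, and that the per-scale matrices were pre-computed as described above.

import torch

def dense_multiscale(X, gnfas, Qs, Q_hats, As):
    # gnfas: list of GNFA modules; Qs/Q_hats/As: per-scale precomputed tensors.
    for gnfa, Q, Q_hat, A in zip(gnfas, Qs, Q_hats, As):
        # Decoded pixel-level output concatenated with the input (Eq. (10)).
        X = torch.cat([gnfa(X, Q, Q_hat, A), X], dim=1)
    return X   # feature dim = 2**len(gnfas) times the input dim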

Spectral-spatial convolution module

In MSDesGATnet, the spectral-spatial convolution (SSConv) module is utilized twice: first to extract shallow features at the pixel level, and second to fuse the high-dimensional features obtained from the multi-scale GNN. As depicted in Figure 8, the SSConv module comprises two sequential submodules: the spectral convolution submodule and the spatial convolution submodule.

Figure 8. Spectral-spatial convolution for multi-scale feature fusion.

In the spectral convolution submodule, the input undergoes batch normalization as the first step. Then, a 1 × 1 convolution is applied to extract features in the channel dimension. Lastly, a LeakyReLU activation layer is employed for non-linear mapping. As for the spatial convolution submodule, two consecutive 3 × 3 depth-wise convolutions are utilized, with a LeakyReLU layer in between.
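A hedged sketch of the SSConv module as just described (batch normalization, a 1 × 1 spectral convolution, LeakyReLU, then two 3 × 3 depth-wise spatial convolutions with a LeakyReLU in between); channel counts and any hyperparameters not stated in the text are assumptions.

import torch.nn as nn

def ssconv(in_ch, out_ch):
    return nn.Sequential(
        # Spectral submodule: mix information across channels only.
        nn.BatchNorm2d(in_ch),
        nn.Conv2d(in_ch, out_ch, kernel_size=1),
        nn.LeakyReLU(),
        # Spatial submodule: per-channel 3x3 depth-wise convolutions.
        nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1, groups=out_ch),
        nn.LeakyReLU(),
        nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1, groups=out_ch),
    )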

Exponential feature maps under the permutation graph space

In our algorithm framework, we have multiple graph spaces, e.g. $G_1$ and $G_2$. The data unit $X \in \mathbb{R}^{H \times W \times C}$ is defined by the tensor size of the original dataset, and the output of each GNFA module can be expressed as an integer multiple of $X$. We use MHSA to extract features from $X$ within a graph space $G_x$, denoted $X(G_x)$. Upon analysis, when $X$ passes sequentially through two graph spaces $G_1$ and $G_2$ under the influence of GNFA, the output comprises $X$, $X(G_1)$, $X(G_2)$, and $X(G_1|G_2)$. Analyzing the output of three consecutive graph spaces (adding $G_3$) yields even more diverse feature maps: $X$, $X(G_3)$, $X(G_1)$, $X(G_1|G_3)$, $X(G_2)$, $X(G_2|G_3)$, $X(G_1|G_2)$, $X(G_1|G_2|G_3)$. Through the dense-connection operation, the final result can be written as
[11] $X_{k+1} = \big\Vert_{G \subseteq \{G_1, G_2, \ldots, G_k\}}\, X(G)$

Equation (11) indicates that the feature maps are drawn from all sub-graph spaces of the full set of graph spaces. At the same time, the GNFA module ensures that node features in one graph space can be represented on the nodes of another graph space:
[12] $H_{G_y} = \mathrm{GNFA}(Q_x, \hat{Q}_y, H_{G_x}) = \hat{Q}_y^{\top}\,Q_x\,H_{G_x}$

Overall, the GNFA modules and dense connections equip MSDesGATnet to generate an output encompassing exponentially many rich features (one for every combination of the $k$ graph spaces). This implies that even if some superpixels in a certain space are impure, their influence on the final output feature map is relatively minor; compared to methods based on a single space, MSDesGATnet therefore possesses a stronger anti-interference capability. Moreover, SSConv integrates these abundant feature maps, and a weighted loss function tackles the imbalanced number of samples among ground-truth categories. Table 1 presents the pseudo-code of MSDesGATnet.

Table 1. Pseudo code of MSDesGATnet.
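As an illustrative check of the counting argument above, enumerating the subsets of k graph spaces confirms the 2^k feature maps; this is a hypothetical snippet for illustration, not part of the released code.

from itertools import combinations

spaces = ["G1", "G2", "G3"]
subsets = [c for r in range(len(spaces) + 1) for c in combinations(spaces, r)]
print(len(subsets))   # 8 == 2**3 maps: X, X(G1), ..., X(G1|G2|G3)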

Experiments and analysis

In this section, we choose four public hyperspectral datasets for classification to validate the effectiveness of our proposed MSDesGATnet, namely Indian pines (IP), Kennedy Space Center (KSC), Pavia University (PU), and WHU-Hi-HongHu. Additionally, we select five state-of-the-art algorithms for comparison. We also conduct ablation studies and multi-scale sensitivity analyses.

Dataset description

IP dataset

The Indian Pines dataset was captured by the AVIRIS sensor (0.4-2.5 μm) in the USA. The HSI is 145 × 145 pixels and comprises 220 bands. The ground truth of Indian Pines has 16 categories. Figure 9 displays the false color map and ground truth of Indian Pines, respectively.

Figure 9. IP: (a) false color map (b) ground truth.

KSC dataset

The KSC dataset was captured by the AVIRIS sensor at the Kennedy Space Center in Florida on March 23, 1996. After the removal of water vapor noise, 176 bands remain, with a spatial resolution of 18 meters. The dataset comprises a total of 13 categories and 614 × 512 pixels. Figure 10 displays the false color map and ground truth, respectively.

Figure 10. KSC: (a) false color map (b) ground truth.

PU dataset

The Pavia University dataset is a portion of the hyperspectral image of Pavia, Italy, captured by the German airborne ROSIS (Reflective Optics System Imaging Spectrometer) in 2003. The imager continuously captures 115 bands in the 0.43-0.86 μm wavelength range. Due to noise, 12 of these bands are eliminated, so the commonly used image consists of the remaining 103 spectral bands. The image size is 610 × 340 pixels, and the ground truth contains nine categories. Figure 11 displays the false color map and ground truth of PU, respectively.

Figure 11. PU: (a) false color map (b) ground truth.

WHU-Hi-HongHu dataset

The WHU-Hi-HongHu dataset was gathered by a DJI Matrice 600 Pro drone in Wuhan, Hubei Province, China. It encompasses 22 distinct land cover types, including different varieties of crops such as Chinese cabbage and cabbage. The images measure 940 × 475 pixels and comprise 270 channels in the 400-1000 nm wavelength range. The spatial resolution is 0.043 meters, classifying it as a high-resolution (H2) hyperspectral dataset. Figure 12 presents the false color map and ground truth, respectively.

Figure 12. WHU-Hi-HongHu: (a) false color map (b) ground truth.

Experimental results and quantitative discussion

Experimental settings

The symbols of MSDesGATnet's hyper-parameters are defined as follows: M represents the multi-scale parameters; lr stands for the learning rate; L denotes the number of GNFA modules; T signifies the number of training epochs. We set different hyper-parameters for the datasets, as shown in Table 2. Across our experiments on multiple datasets, we found that using about three scales makes the combined accuracy-time metric optimal.

Table 2. The hyperparameter settings of MSDesGAT.

For fairness, all comparison methods and MSDesGATnet operate on the data after SELF dimensionality reduction. The number of channels after SELF is 95 for IP, 6 for PU, 106 for KSC, and 109 for WHU-Hi-HongHu, respectively.

All the experimental algorithms are implemented in Python 3.9 and PyTorch 1.12.1 with the Adam optimizer on an RTX 3090 GPU.

Cross-validation and comparison methods

For each dataset, 25 samples of each class were selected as the training set and used in a 5-fold cross-validation scheme; the results of each fold are shown in the tables. The test set includes all labeled samples, and the reported test results are the averages of the 5-fold cross-validation, along with the standard deviation.
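One plausible reading of the sampling protocol is sketched below, with our own helper name and the assumption that unlabeled pixels are marked -1; the authors' exact fold construction may differ.

import numpy as np

def sample_train_indices(labels, per_class=25, seed=0):
    rng = np.random.default_rng(seed)
    idx = []
    for c in np.unique(labels[labels >= 0]):
        pool = np.flatnonzero(labels == c)   # assumes pool size >= per_class
        idx.append(rng.choice(pool, size=per_class, replace=False))
    return np.concatenate(idx)

# e.g. one draw per fold: folds = [sample_train_indices(y, seed=s) for s in range(5)]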

To assess the performance of MSDesGATnet, we compare it with five other state-of-the-art methods: three CNN-based methods, (1) the Dynamic Convolution Neural Network (DCNN; Makantasis et al. Citation2015), (2) the Double-Branch Dual-Attention Network (DBDAnet; Li et al. Citation2020), and (3) the Spectral-Spatial Residual Network (SSRN; Zhong et al. Citation2018); and two GNN-based methods, (1) the Edge Labeling Graph Neural Network (EGNN; Hu et al. Citation2022) and (2) the Attention Multihop Graph and Multiscale Convolutional Fusion Network (AMGCFN; Zhou et al. Citation2023). We use three evaluation metrics, namely overall accuracy (OA), average accuracy (AA), and the Kappa coefficient (κ).
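For reference, the three metrics can be computed from a confusion matrix as follows (a standard sketch, not tied to the authors' code).

import numpy as np

def oa_aa_kappa(C):
    # C[i, j] counts test pixels of true class i predicted as class j.
    C = C.astype(float)
    total = C.sum()
    oa = np.trace(C) / total                          # overall accuracy
    aa = np.mean(np.diag(C) / C.sum(axis=1))          # mean per-class accuracy
    pe = (C.sum(axis=0) @ C.sum(axis=1)) / total**2   # chance agreement
    kappa = (oa - pe) / (1.0 - pe)
    return oa, aa, kappa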

The quantitative and visual results of IP

Table 3 provides quantitative experimental results on the Indian Pines dataset, including the precision of each class along with the corresponding OA, AA, and κ; the maximum accuracy in each row is highlighted. The results show that the OA, κ, and AA of MSDesGATnet were higher than those of the other methods, with relatively small standard deviations. GNN-based methods perform similarly to CNN-based methods overall, but multi-scale methods such as AMGCFN and our proposed MSDesGAT perform better than the single-scale EGNN, even with the large number of continuously distributed crops in the IP dataset. Comparing AMGCFN and MSDesGAT, MSDesGAT improves the AA by about 10%: although both methods use multiple scales, the number of feature maps extracted by MSDesGAT grows exponentially, and the final feature fusion module can fuse richer features than AMGCFN. Based on Table 3, we also find that MSDesGATnet can classify classes with similar spectral features, such as C5, C6, and C7, with high accuracy; this is due to the attention mechanism of MSDesGAT, which achieves effects similar to DBDAnet and AMGCFN.

Table 3. OAs, AAs and Kappas obtained by all methods in IP. The bolded value in each row indicates the best performance.

Figure 13 demonstrates the classification maps of all methods. Compared with the ground truth, the classification map achieved by MSDesGATnet is closer to the ground truth and smoother, owing to the multiple graph encoding and decoding processes. The classification results of MSDesGAT are also more in line with practical needs, i.e., the crops in the IP dataset are spatially continuous within classes and discrete between classes. It is therefore reasonable to infer that MSDesGAT is more accurate and robust on the IP benchmark dataset.

Figure 13. Classification maps of all methods on Indian Pines. (a) False-color map. (b) Ground truth. (c) DCNN (OA = 71.03%). (d) DBDAnet (OA = 65.01%). (e) SSRN (OA = 76.27%). (f) EGNN (OA = 70.76%). (g) AMGCFN (OA = 72.4%). (h) MSDesGATnet (OA = 94.3%).

The quantitative and visual results of KSC

As shown in Table 4, MSDesGATnet boasts the highest OA on the KSC dataset. Although AMGCFN also achieved a high AA, the difference was not significant; however, MSDesGATnet demonstrated superior accuracy across more categories. At the same time, both the CNN-based and GNN-based methods perform well given the small number of labeled samples: the 25 selected samples account for 2% to 10% of the total samples per class, and the samples of each class are centrally distributed, which favors capture by the convolution kernels of the CNN methods.

Table 4. OAs, AAs and Kappas obtained by all methods in KSC. The bolded value in each row indicates the best performance.

Figure 14b presents the detailed classification outcomes of MSDesGATnet on KSC. Misclassification occurred between C4 and C5 and between C2 and C3; the ground truth reveals that C4/C5 and C2/C3 lie in close proximity to each other and form small targets. Facing the problem of superpixel impurity, AMGCFN mitigates it by introducing a CNN module, while MSDesGAT mitigates it through its densely connected multi-scale graph space feature fusion; thus AMGCFN has the strongest discriminative ability on C4/C5 and MSDesGAT on C2/C3. At the same time, the number of parameters (M) of MSDesGAT is much smaller than that of AMGCFN, and its FLOPs (G) are on the same order of magnitude as AMGCFN.

Figure 14. Normalized confusion matrix of MSDesGATnet on (a) IP (b) KSC.

Figure 15 displays the classification maps, with particular emphasis on the two magnified regions within each map. In the upper magnified region, the classification results of the GNN-based methods are smoother than those of the CNN-based methods, and MSDesGAT has fewer misclassifications. In the lower magnified region, the classification results of AMGCFN and the proposed MSDesGAT are closer to the ground truth than all other methods; however, AMGCFN misclassifies part of C3 as the distant C1. MSDesGAT, while having a global receptive field, alleviates this effect through its multi-scale dense graph spatial strategy, which is more in line with practical scenarios.

Figure 15. Classification maps of all methods on the KSC dataset. (a) Ground truth. (b) DCNN (OA = 88.23%). (c) DBDAnet (OA = 98.5%). (d) SSRN (OA = 94.53%). (e) EGNN (OA = 92.13%). (f) AMGCFN (OA = 98.25%). (g) MSDesGATnet (OA = 99.46%). In (a)-(g), zoomed-in views of the blue regions are shown at the top and bottom of each classification map.

The quantitative and visual results of PU

Table 5 presents the accuracy on the Pavia University dataset, where our proposed MSDesGATnet achieves the highest OA. Compared to methods such as AMGCFN, MSDesGATnet exhibits significant generalization performance not only on large-scale categories with many samples, like C2, but also on small-scale categories like C8. Concurrently, the subpar performance of EGNN and DCNN is due to the small sample size and the varying sizes of the ground features. While SSRN can extract spatial and spectral features from relatively clustered samples, both AMGCFN and MSDesGATnet demonstrated commendable performance using multi-scale feature extraction and attention mechanisms.

Table 5. OAs, AAs and Kappas obtained by all methods in PU. The bolded value in each row indicates the best performance.

Figure 17a presents the specific classification outcomes of MSDesGATnet on PU, while Figure 16 illustrates the classification maps of all methods. The misclassification rates between C4 and C1, as well as between C9 and C5, are relatively high. This is attributed to these categories lying near smaller targets, limiting the effectiveness of the attention mechanism under multi-scale conditions. However, the classification results for C7 and C6 outperform other methods such as AMGCFN.

Figure 16. Classification maps of all methods on the PU dataset. (a) Ground truth. (b) DCNN (OA = 52.80%). (c) DBDAnet (OA = 70.46%). (d) SSRN (OA = 83.68%). (e) EGNN (OA = 71.45%). (f) AMGCFN (OA = 84.04%). (g) MSDesGATnet (OA = 89.79%).

The quantitative and visual results of WHU-Hi-HongHu

Table 6 showcases the quantitative classification accuracies of all methods on the HongHu dataset. Our proposed MSDesGATnet achieved the highest AA, albeit slightly lower than AMGCFN in terms of OA and kappa. MSDesGATnet also excelled in more categories and exhibited relatively small standard deviations, with notably higher classification accuracies on C3 and C14 than other methods. Additionally, the performance of CNN-based methods was inferior to that of GNN-based methods. This is attributed to the fact that drawing a small number of labeled samples from a large scene can place training samples far apart, beyond the receptive field of CNNs; GNN-based methods, through multi-layer information propagation, can aggregate global information and are better suited to training with few samples.

Table 6. OAs, AAs and Kappas achieved by all methods in WHU-Hi-Honghu. The bolded value in each row indicates the best performance.

Based on Figure 17b, MSDesGATnet can also avoid misclassification between different categories of the same crop, such as C7 and C9; overall, the classification accuracies of most categories exceed 90%. Figure 18 displays the classification maps of all methods on the HongHu dataset, revealing that AMGCFN and the proposed MSDesGATnet have fewer misclassifications and similar classification outcomes. However, since AMGCFN also incorporates a CNN branch, its classification results exhibit more noise than MSDesGATnet. MSDesGATnet not only ensures accuracy but also produces smoother and more robust classification maps.

Figure 17. Normalized confusion matrix of MSDesGATnet on: (a) PU (b) HongHu.

Figure 18. Classification maps of all methods on the HongHu dataset. (a) Ground truth. (b) DCNN (OA = 43.7%). (c) DBDAnet (OA = 43.9%). (d) SSRN (OA = 40.3%). (e) EGNN (OA = 77.6%). (f) AMGCFN (OA = 90.6%). (g) MSDesGATnet (OA = 89.4%).

Overall, MSDesGAT outperforms other state-of-the-art methods in classification accuracy both on multi-scale feature datasets such as PU and on spatially continuous single-scale datasets such as IP. For the problem of impure superpixel nodes faced by GNNs, MSDesGAT uses a multi-scale connected graph space; compared to CNN-enhanced methods such as AMGCFN, its classification results are more in line with practical scenarios. At the same time, MSDesGAT introduces an exponential multi-scale feature fusion module to improve classification accuracy, the overall network has a small number of parameters (M), and its FLOPs (G) are on the same order of magnitude as other SOTA methods such as AMGCFN.

Ablation study

The proposed MSDesGATnet encompasses three crucial operations: the multi-scale dense connection, the dynamic graph attention mechanism, and the multi-scale feature fusion submodule. To demonstrate the effect of the dense connection, we substitute it with a simple sequential operation and name the result MSGATnet. To illustrate that the GNN extracts features more effectively than a CNN, we replace the GNN in MSGATnet with a spatial-spectral convolution operation and simulate its scale reduction with max-pooling, naming this variant Multi-scale CNN. Single-scale GATnet is designed to highlight the advantages of multiple scales, and MSDesGATnet− further removes the SSConv layer used for multi-scale feature map fusion.

In this section, the training set partitioning and cross-validation are kept consistent with the comparison experiments, including the selected samples; the results on the four datasets are shown in Tables 7-10. Among all the results, MSDesGATnet exhibits the best performance, and the dynamic graph attention mechanism proves more suitable for small samples than CNN. Furthermore, the SSConv layer enhances accuracy by approximately 0.4% to 8%. Notably, on datasets containing multiple densely distributed classes (e.g., PU), our densely connected, multi-scale, feature-fusion approach copes with the superpixel impurity problem better than any of its components used singly, with an improvement of about 10% in accuracy. On the other datasets (IP, KSC, HongHu), our method also improves accuracy by 1% to 5% over using dense connections or multiple scales alone. In summary, our proposed multi-scale dense connection, dynamic graph attention mechanism, and multi-layer feature fusion prove to be effective.

Table 7. Ablation Study Results on IP. The bolded value in each row indicates the best performance.

Table 8. Ablation Study Results on KSC. The bolded value in each row indicates the best performance.

Table 9. Ablation Study Results on PU. The bolded value in each row indicates the best performance.

Table 10. Ablation Study Results on HongHu. The bolded value in each row indicates the best performance.

Multi-scale sensitivity analysis

The values and number of scales often influence the final classification accuracy. We experimented with the classification accuracy at different scales on the four datasets using the proposed MSDesGATnet. Here, len represents the number of scales and k denotes the difference in the number of superpixels between scales. For different k and len values within the same dataset, all scale sequences commence from the smallest scale in Table 2.

Figure 19 illustrates the trend of classification accuracy with varying k and len values. An increase in the number of scales does not necessarily lead to a rise in classification accuracy, but there are one or two higher values. For the IP, PU, and HongHu datasets, there exist specific k and len values that make the classification accuracy locally optimal, whereas for the KSC dataset, accuracy tends to be higher at smaller len and k. Overall, the accuracy gap fluctuates differently across datasets.

Figure 19. Average accuracy at different multiple scales on (a) IP (b) KSC (c) PU (d) HongHu.

Running time

Table 11 showcases the average running times of all compared methods; the hyperparameters were set identically to the comparison experiments.

Table 11. Running time (seconds) of all methods on the IP, PU, KSC, and HongHu datasets.

Although the testing run time of MSDesGAT on the KSC and HongHu datasets is longer than that of other methods, because KSC and HongHu retain more channels after SELF, the running time of MSDesGAT is smaller on the other datasets, IP and PU. To further reduce the running time of the proposed MSDesGAT, a specific number of dimensionality-reduction dimensions can be fixed, though at some cost to classification accuracy. In summary, considering classification accuracy, results, and running time together, our proposed MSDesGAT is effective.

Conclusion

In this study, we introduce a novel end-to-end multi-scale graph network structure, MSDesGATnet, for hyperspectral classification. The innovation of MSDesGATnet resides in the fusion of graph attention mechanisms and multi-scale dense networks, with the networks at different scales interconnected and densely connected. The proposed method has demonstrated superior accuracy compared to numerous state-of-the-art methods. In future research, we aim to incorporate multi-scale superpixel segmentation into the end-to-end network. Moreover, it is feasible to select suitable graph spaces G based on the resolution of ground features, perform feature enhancement on the corresponding outputs X(G|), or strengthen the weights in the final SSConv feature fusion to further enhance classification accuracy. Most importantly, our framework enables fellow researchers to explore graph node feature fusion algorithms on new datasets without additional time-consuming searches for original datasets with compatible resolutions.

Additional information

Funding

This work was supported by the National Natural Science Foundation of China (61901471, 61421001), the Project of Construction and Support for high-level Innovative Teams of Beijing Municipal Institutions (BPHR20220123), College Students’ Innovative Entrepreneurial Training Plan Program of China (202211232031) and Beijing Natural Science Foundation (4214072).


References

  • Amini, S., Homayouni, S., and Safari, A. 2014. “Semi-supervised classification of hyperspectral image using random forest algorithm.” In 2014 IEEE Geoscience and Remote Sensing Symposium, Quebec City, Canada, July 2014.
  • Audebert, N., Le Saux, B., and Lefevre, S. 2019. “Deep learning for classification of hyperspectral data: A comparative review.” IEEE Geoscience and Remote Sensing Magazine, Vol. 7(No. 2): pp. 1–24. doi:10.1109/MGRS.2019.2912563.
  • Aydemir, M.S., and Bilgin, G. 2019. “Semisupervised hyperspectral image classification using deep features.” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, Vol. 12(No. 9): pp. 3615–3622. doi:10.1109/JSTARS.2019.2921033.
  • Brody, S., Alon, U., and Yahav, E. 2021. “How attentive are graph attention networks?” arXiv preprint arXiv:2105.14491, revised 31 Jan 2022, https://arxiv.org/abs/2105.14491.
  • Camps-Valls, G., and Bruzzone, L. 2005. “Kernel-based methods for hyperspectral image classification.” IEEE Transactions on Geoscience and Remote Sensing, Vol. 43(No. 6): pp. 1351–1362. doi:10.1109/TGRS.2005.846154.
  • Cao, J., and Wang, B. 2017. “Embedding learning on spectral–spatial graph for semisupervised hyperspectral image classification.” IEEE Geoscience and Remote Sensing Letters, Vol. 14(No. 10): pp. 1805–1809. doi:10.1109/LGRS.2017.2737020.
  • Chen, Z., Wu, G., Gao, H., Ding, Y., Hong, D., and Zhang, B. 2023. “Local aggregation and global attention network for hyperspectral image classification with spectral-induced aligned superpixel segmentation.” Expert Systems with Applications, Vol. 232: pp. 120828. doi:10.1016/j.eswa.2023.120828.
  • Datta, D., Mallick, P.K., Gupta, D., and Chae, G.-S. 2022. “Hyperspectral image classification based on novel hybridization of spatial-spectral-superpixelwise principal component analysis and dense 2D-3D convolutional neural network fusion architecture.” Canadian Journal of Remote Sensing, Vol. 48(No. 5): pp. 663–680. doi:10.1080/07038992.2022.2114440.
  • Ding, Y., Zhang, Z., Zhao, X., Hong, D., Cai, W., Yang, N., and Wang, B. 2023. “Multi-scale receptive fields: Graph attention neural network for hyperspectral image classification.” Expert Systems with Applications, Vol. 223: pp. 119858. doi:10.1016/j.eswa.2023.119858.
  • Hong, D., Gao, L., Yao, J., Zhang, B., Plaza, A., and Chanussot, J. 2021. “Graph convolutional networks for hyperspectral image classification.” IEEE Transactions on Geoscience and Remote Sensing, Vol. 59(No. 7): pp. 5966–5978. doi:10.1109/TGRS.2020.3015157.
  • Hong, D., Hu, J., Yao, J., Chanussot, J., and Zhu, X. 2021. “Multimodal remote sensing benchmark datasets for land cover classification with a shared and specific feature learning model.” ISPRS Journal of Photogrammetry and Remote Sensing, Vol. 178: pp. 68–80. doi:10.1016/j.isprsjprs.2021.05.011.
  • Hu, H., Yao, M., He, F., and Zhang, F. 2022. “Graph neural network via edge convolution for hyperspectral image classification.” IEEE Geoscience and Remote Sensing Letters, Vol. 19: pp. 1–5. doi:10.1109/LGRS.2021.3108883.
  • Huang, Y., Peng, J., Sun, W., Chen, N., Du, Q., Ning, Y., and Su, H. 2022. “Two-branch attention adversarial domain adaptation network for hyperspectral image classification.” IEEE Transactions on Geoscience and Remote Sensing, Vol. 60: pp. 1–13. doi:10.1109/TGRS.2022.3215677.
  • Kang, X., Li, S., and Benediktsson, J.A. 2014. “Spectral–spatial hyperspectral image classification with edge-preserving filtering.” IEEE Transactions on Geoscience and Remote Sensing, Vol. 52(No. 5): pp. 2666–2677. doi:10.1109/TGRS.2013.2264508.
  • Li, J., Du, Q., Xi, B., and Li, Y. 2018. “Hyperspectral image classification via sample expansion for convolutional neural network.” In IEEE 2018 9th Workshop on Hyperspectral Image and Signal Processing: Evolution in Remote Sensing (WHISPERS), Amsterdam, Netherlands, September 2018.
  • Li, L., Wang, C., Chen, J., and Ma, J. 2017. “Refinement of hyperspectral image classification with segment-tree filtering.” Remote Sensing, Vol. 9(No. 1): pp. 69. doi:10.3390/rs9010069.
  • Li, R., Zheng, S., Duan, C., Yang, Y., and Wang, X. 2020. “Classification of hyperspectral image based on double-branch dual-attention mechanism network.” Remote Sensing, Vol. 12(No. 3): pp. 582. doi:10.3390/rs12030582.
  • Li, W., Liu, J., and Du, Q. 2016. “Sparse and low-rank graph for discriminant analysis of hyperspectral imagery.” IEEE Transactions on Geoscience and Remote Sensing, Vol. 54(No. 7): pp. 4094–4105. doi:10.1109/TGRS.2016.2536685.
  • Li, W., Prasad, S., Fowler, J.E., and Bruce, L.M. 2012. “Locality-preserving dimensionality reduction and classification for hyperspectral image analysis.” IEEE Transactions on Geoscience and Remote Sensing, Vol. 50(No. 4): pp. 1185–1198. doi:10.1109/TGRS.2011.2165957.
  • Liao, J., and Wang, L. 2020. “Multiple spatial features extraction and fusion for hyperspectral images classification.” Canadian Journal of Remote Sensing, Vol. 46(No. 2): pp. 193–213. doi:10.1080/07038992.2020.1768837.
  • Liao, J., Wang, L., Hao, S., and Zhao, G. 2018. “Hyperspectral image classification based on fusion of guided filter and domain transform interpolated convolution filter.” Canadian Journal of Remote Sensing, Vol. 44(No. 5): pp. 476–490. doi:10.1080/07038992.2018.1546571.
  • Liu, J., Fang, L., Shen, H., and Zhou, S. 2022. “A multiscale joint deep neural network for glacier contour extraction.” Canadian Journal of Remote Sensing, Vol. 48(No. 1): pp. 93–106. doi:10.1080/07038992.2021.1986810.
  • Liu, L., Zuo, D., Wang, Y., and Qu, H. 2022. “Feedback-enhanced few-shot transformer learning for small sized hyperspectral image classification.” IEEE Geoscience and Remote Sensing Letters, Vol. 19: pp. 1–5. doi:10.1109/LGRS.2022.3202235.
  • Liu, Q., Peng, J., Ning, Y., Chen, N., Sun, W., Du, Q., and Zhou, Y. 2023. “Refined prototypical contrastive learning for few-shot hyperspectral image classification.” IEEE Transactions on Geoscience and Remote Sensing, Vol. 61: pp. 1–14. doi:10.1109/TGRS.2023.3257341.
  • Liu, Q., Xiao, L., Yang, J., and Wei, Z. 2021. “CNN-enhanced graph convolutional network with pixel- and superpixel-level feature fusion for hyperspectral image classification.” IEEE Transactions on Geoscience and Remote Sensing, Vol. 59(No. 10): pp. 8657–8671. doi:10.1109/TGRS.2020.3037361.
  • Makantasis, K., Karantzalos, K., Doulamis, A., and Doulamis, N. 2015. “Deep supervised learning for hyperspectral data classification through convolutional neural networks.” In 2015 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Milan, Italy, July 2015.
  • Mercier, G., and Lennon, M. 2003. “Support vector machines for hyperspectral image classification with spectral-based kernels.” In IGARSS 2003. 2003 IEEE International Geoscience and Remote Sensing Symposium, Toulouse, France, July 2003.
  • Qin, A., Shang, Z., Tian, J., Wang, Y., Zhang, T., and Tang, Y.Y. 2019. “Spectral–spatial graph convolutional networks for semisupervised hyperspectral image classification.” IEEE Geoscience and Remote Sensing Letters, Vol. 16(No. 2): pp. 241–245. doi:10.1109/LGRS.2018.2869563.
  • Roy, S.K., Krishna, G., Dubey, S.R., and Chaudhuri, B.B. 2020. “HybridSN: Exploring 3-D–2-D CNN feature hierarchy for hyperspectral image classification.” IEEE Geoscience and Remote Sensing Letters, Vol. 17(No. 2): pp. 277–281. doi:10.1109/LGRS.2019.2918719.
  • Tarabalka, Y., Fauvel, M., Chanussot, J., and Benediktsson, J.A. 2010. “SVM- and MRF-based method for accurate classification of hyperspectral images.” IEEE Geoscience and Remote Sensing Letters, Vol. 7(No. 4): pp. 736–740. doi:10.1109/LGRS.2010.2047711.
  • Wang, J., Liu, J., Cui, J., Luan, J., and Fu, Y. 2023. “Multiscale fusion network based on global weighting for hyperspectral feature selection.” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, Vol. 16: pp. 2977–2991. doi:10.1109/JSTARS.2023.3251442.
  • Yang, Y., Tang, X., Zhang, X., Ma, J., Liu, F., Jia, X., and Jiao, L. 2022. “Semi-supervised multiscale dynamic graph convolution network for hyperspectral image classification.” IEEE Transactions on Neural Networks and Learning Systems. doi:10.1109/TNNLS.2022.3212985.
  • Yao, D., Zhi-Li, Z., Xiao-Feng, Z., Wei, C., Fang, H., Yao-Ming, C., and Cai, W.W. 2023. “Deep hybrid: multi-graph neural network collaboration for hyperspectral image classification.” Defence Technology, Vol. 23: pp. 164–176. doi:10.1016/j.dt.2022.02.007.
  • Yao, J., Cao, X., Hong, D., Wu, X., Meng, D., Chanussot, J., and Xu, Z. 2022. “Semi-active convolutional neural networks for hyperspectral image classification.” IEEE Transactions on Geoscience and Remote Sensing, Vol. 60: pp. 1–15. doi:10.1109/TGRS.2022.3206208.
  • Yao, J., Hong, D., Wang, H., Liu, H., and Chanussot, J. 2023. “UCSL: Toward unsupervised common subspace learning for cross-modal image classification.” IEEE Transactions on Geoscience and Remote Sensing, Vol. 61: pp. 1–12. doi:10.1109/TGRS.2023.3282951.
  • Yao, J., Zhang, B., Li, C., Hong, D., and Chanussot, J. 2023. “Extended Vision Transformer (ExViT) for land use and land cover classification: A multimodal deep learning framework.” IEEE Transactions on Geoscience and Remote Sensing, Vol. 61: pp. 1–15. doi:10.1109/TGRS.2023.3284671.
  • Yu, L., Peng, J., Chen, N., Sun, W., and Du, Q. 2023. “Two-branch deeper graph convolutional network for hyperspectral image classification.” IEEE Transactions on Geoscience and Remote Sensing, Vol. 61: pp. 1–14. doi:10.1109/TGRS.2023.3257369.
  • Zhang, X., Zheng, Y., Liu, W., and Wang, Z. 2019. “A hyperspectral image classification algorithm based on atrous convolution.” EURASIP Journal on Wireless Communications and Networking, Vol. 2019(No. 1): pp. 1–12. doi:10.1186/s13638-019-1594-y.
  • Zhao, W., Wu, D., and Liu, Y. 2021. “Hyperspectral image classification with multi-scale graph convolution network.” International Journal of Remote Sensing, Vol. 42(No. 21): pp. 8380–8397. doi:10.1080/01431161.2021.1978585.
  • Zhao, Z., Wang, H., and Yu, X. 2022. “Spectral-spatial graph attention network for semisupervised hyperspectral image classification.” IEEE Geoscience and Remote Sensing Letters, Vol. 19: pp. 1–5. doi:10.1109/LGRS.2021.3059509.
  • Zheng, C., Wang, N., and Cui, J. 2019. “Hyperspectral image classification with small training sample size using superpixel-guided training sample enlargement.” IEEE Transactions on Geoscience and Remote Sensing, Vol. 57(No. 10): pp. 7307–7316. doi:10.1109/TGRS.2019.2912330.
  • Zhong, Z., Li, J., Luo, Z., and Chapman, M. 2018. “Spectral–spatial residual network for hyperspectral image classification: A 3-D deep learning framework.” IEEE Transactions on Geoscience and Remote Sensing, Vol. 56(No. 2): pp. 847–858. doi:10.1109/TGRS.2017.2755542.
  • Zhou, H., Luo, F., Zhuang, H., Weng, Z., Gong, X., and Lin, Z. 2023. “Attention multi-hop graph and multi-scale convolutional fusion network for hyperspectral image classification.” IEEE Transactions on Geoscience and Remote Sensing, Vol. 61: pp. 1–14. doi:10.1109/TGRS.2023.3265879.
  • Zhang, Y., Xu, S., Hong, D., Gao, H., Zhang, C., Bi, M., and Li, C. 2023. “Multimodal transformer network for hyperspectral and LiDAR classification.” IEEE Transactions on Geoscience and Remote Sensing, Vol. 61: pp. 1–17. doi:10.1109/TGRS.2023.3283508.