Latent Dirichlet Transformer VAE for Hyperspectral Unmixing with Bundled Endmembers


Giancarlo Giannetti
Visual Computing Lab
Faculty of Science
University of Ontario Institute of Technology
2000 Simcoe St. N., Oshawa ON L1G 0C5

Introduction

This work proposes a deep generative model and two ``stepping ston’’ models for hyperspectral unmixing that integrate transformer architectures with variational autoencoders. The proposed approach leverages self-attention mechanisms to capture global spectral–spatial dependencies within image patches, while a Dirichlet latent distribution enforces the physical constraints of abundance estimation. Building on this foundation, we introduce the Latent Dirichlet Transformer VAE (LDVAE-T), which models endmembers as distributions rather than fixed spectra, enabling the representation of intrinsic spectral variability through learned mean and covariance structures. We evaluate the proposed models on multiple benchmark datasets, including Samson, Jasper Ridge, and HYDICE Urban, and compare their performance against SOTA methods. Experimental results demonstrate improved accuracy in both abundance estimation and endmember extraction. Overall, this work presents a physically grounded and flexible framework for hyperspectral unmixing that effectively captures both global structure and material variability.

Model Architecture

Transformer based encoder constructs a latent encoding and concentration parameters for the Dirichlet distribution that represents abundances for the center pixel. The decoder takes the latent representation and abundances, sampled from the Dirichlet distribution, and predicts mean and covariances for each each endmember (modelled as a Gaussian distribution) and combine these with the abundances to reconstruct the center pixel.

Abundance Maps

Abundance maps for Hydice Urban Dataset.

Abundance maps for Jasper Ridge Dataset.

Abundance maps for Samson Dataset.

(So called) Pure Pixels

The need for modeling endmembers variability. These plots showcase the endmembers extracted from pure pixels, i.e., pixels that are dominated by a single endmember. Note how even within a single image, endmembers exhibit variation. This suggests that we need techniques and methods that explicitly model for variability in endmembers when performing pixel unmixing.


Pure pixels in Hydice Urban Datasets.

Pure pixels in Jasper Datasets.

Pure pixels in Samson Datasets.

Publications

For technical details please look at the following publications