PlanktonTNet: Rethinking Plankton Classification from a Global View with Swin-Transformer

Dekun Yuan, Yanping Qi, Jie Zhang, Zhongwei Li

OCEANS 2025 Brest, 2025

Abstract

Plankton, comprising both phytoplankton and zooplankton, are a crucial ecological group in marine ecosystems. They vary widely in shape and posture, zooplankton especially, and can be recognized from either local or global features. In this paper, we propose the Plankton Transformer Network (PlanktonTNet), which identifies plankton by extracting both local and global features. PlanktonTNet incorporates a Local Features Extraction Module (LFEM) and a Global Features Extraction Module (GFEM), which extract features using CNNs and the Swin Transformer, respectively, and a Feature Fusion Module (FFM) that combines these features. Among the CNN-based models, ResNet-34 achieved the best performance. PlanktonTNet achieves a top-1 accuracy of 97.90% on the Kaggle7 test dataset, a 2.10% improvement over ResNet-34, and a top-1 accuracy of 97.00% on the ZooScanNet6 test dataset, a 1.04% gain over ResNet-34. These results on Kaggle7 and ZooScanNet6 show that PlanktonTNet outperforms existing methods.
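The abstract does not say whether the reported gains over ResNet-34 are absolute percentage points or relative improvements. The following minimal Python sketch uses only the four figures quoted in the abstract to work out the ResNet-34 top-1 accuracy that each reading would imply. The names `REPORTED`, `implied_baseline`, and `baseline_table` are illustrative and do not come from the paper.

```python
# Figures quoted in the abstract: (PlanktonTNet top-1 accuracy %, reported gain %).
REPORTED = {
    "Kaggle7": (97.90, 2.10),
    "ZooScanNet6": (97.00, 1.04),
}


def implied_baseline(accuracy, gain, relative=False):
    """Return the ResNet-34 accuracy implied by a reported gain.

    Assumption: the abstract does not state which reading is intended.
    relative=False treats the gain as absolute percentage points;
    relative=True treats it as a percentage of the baseline accuracy.
    """
    if relative:
        return accuracy / (1.0 + gain / 100.0)
    return accuracy - gain


def baseline_table(reported=REPORTED):
    """Map each dataset to (absolute-reading, relative-reading) baselines."""
    rows = {}
    for name, (accuracy, gain) in reported.items():
        rows[name] = (
            round(implied_baseline(accuracy, gain), 2),
            round(implied_baseline(accuracy, gain, relative=True), 2),
        )
    return rows


if __name__ == "__main__":
    for name, (absolute, relative) in baseline_table().items():
        print(f"{name}: absolute reading {absolute:.2f}%, "
              f"relative reading {relative:.2f}%")
```

Under the absolute reading, the implied ResNet-34 accuracies are 95.80% on Kaggle7 and 95.96% on ZooScanNet6. The relative reading gives slightly higher baselines, so the two interpretations differ by less than a tenth of a point here.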

BibTeX

@inproceedings{Yuan2025PlanktonTNet,
  author = {Yuan, Dekun and Qi, Yanping and Zhang, Jie and Li, Zhongwei},
  booktitle = {OCEANS 2025 Brest},
  title = {{PlanktonTNet}: Rethinking Plankton Classification from a Global View with {Swin-Transformer}},
  year = {2025},
  pages = {1--5},
  abstract = {Plankton, comprising both phytoplankton and zooplankton, are a crucial ecological group in marine ecosystems. They vary widely in shape and posture, zooplankton especially, and can be recognized from either local or global features. In this paper, we propose the Plankton Transformer Network (PlanktonTNet), which identifies plankton by extracting both local and global features. PlanktonTNet incorporates a Local Features Extraction Module (LFEM) and a Global Features Extraction Module (GFEM), which extract features using CNNs and the Swin Transformer, respectively, and a Feature Fusion Module (FFM) that combines these features. Among the CNN-based models, ResNet-34 achieved the best performance. PlanktonTNet achieves a top-1 accuracy of 97.90\% on the Kaggle7 test dataset, a 2.10\% improvement over ResNet-34, and a top-1 accuracy of 97.00\% on the ZooScanNet6 test dataset, a 1.04\% gain over ResNet-34. These results on Kaggle7 and ZooScanNet6 show that PlanktonTNet outperforms existing methods.},
  keywords = {Accuracy;Shape;Plankton;Biological system modeling;Oceans;Feature extraction;Transformers;Phytoplankton;Marine ecosystems;Zooplankton;Classification;Swin-Transformer;Deep Learning;Convolutional Neural Networks},
  doi = {10.1109/OCEANS58557.2025.11104797},
  month = jun,
}