Plankton Classification on Imbalanced Dataset via Hybrid Resample Method with LightBGM

Yiran Liu, Xu Qiao, Rui Gao

2020 International Conference on Image, Vision and Computing

Abstract

Plankton monitoring plays an essential role in protecting the marine ecological environment: effectively identifying plankton species and their quantities makes it possible to assess the health of the marine ecosystem. It is therefore valuable to build an automatic classification system for plankton. However, plankton data naturally exhibit an imbalanced class distribution, so the class-imbalance problem must be taken into account in plankton classification. In this paper, we propose a classification model based on a hybrid resample method with a LightGBM classifier. Our hybrid resample method, BSFCM, combines borderline-SMOTE oversampling with Fuzzy C-means cluster-based undersampling and can handle both within-class and between-class imbalance. In addition, to eliminate irrelevant factors, dataset preprocessing and feature dimension reduction are applied to the in situ plankton images. The F1-measure and G-mean are used as the evaluation criteria to assess classification performance. The experimental results show that our BSFCM method with the LightGBM classifier outperforms the compared benchmark methods and achieves good performance on the imbalanced plankton dataset.
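
The abstract names the F1-measure and G-mean as evaluation criteria but does not give their exact definitions. As a minimal sketch, assuming the common multi-class variants (macro-averaged F1 and the geometric mean of per-class recalls), the two scores could be computed as follows. The function names and the toy labels are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def per_class_counts(y_true, y_pred):
    """Return {label: (tp, fp, fn)} for every label seen in either list."""
    labels = set(y_true) | set(y_pred)
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed
    return {c: (tp[c], fp[c], fn[c]) for c in labels}


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (assumed averaging scheme)."""
    scores = []
    for tp, fp, fn in per_class_counts(y_true, y_pred).values():
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def g_mean(y_true, y_pred):
    """Geometric mean of per-class recalls (assumed multi-class G-mean)."""
    recalls = [
        tp / (tp + fn)
        for tp, fp, fn in per_class_counts(y_true, y_pred).values()
        if tp + fn > 0  # skip labels that never occur in y_true
    ]
    return math.prod(recalls) ** (1 / len(recalls))


# Toy imbalanced example: a classifier that always predicts the majority class.
y_true = ["copepod"] * 8 + ["diatom"] * 2
y_pred = ["copepod"] * 10
print(macro_f1(y_true, y_pred), g_mean(y_true, y_pred))
```

In the toy example the always-majority classifier reaches 80% accuracy, yet its G-mean is zero because the minority class is never recalled, which is why such metrics are preferred over accuracy on imbalanced data.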

BibTeX
@inproceedings{Liu2020ShapeBased,
  author = {Liu, Zonghua and Watson, John},
  booktitle = {Global Oceans 2020: Singapore -- U.S. Gulf Coast},
  title = {Shape-based Image Classification and Identification System for Digital Holograms of Marine Particles and Plankton},
  year = {2020},
  pages = {1--5},
  abstract = {A shape-based image analysis system has been developed for classifying and identifying holographic images of plankton and other microscopic marine organisms. The system consists of two parts: imCLASS and imIDENT. imCLASS enables the user to rapidly prepare training data, which can then be used to train imIDENT so that it can automatically classify and identify new images. In our tests, imCLASS took 65.6 s to process 298 raw images, and then 20.1 s to extract and process the features of the images. Classifying the training data took roughly 17 min. Afterwards, the classified training data were used to train imIDENT; this procedure took 49.0 s. Once trained, imIDENT was able to identify 95 test images with 85.3% accuracy in 41.9 s.},
  keywords = {Oceans;Microscopy;Training data;Feature extraction;Organisms;Image classification;digital holography;image processing;particle classification and identification;machine learning},
  doi = {10.1109/IEEECONF38699.2020.9389156},
  issn = {0197-7385},
  month = {Oct},
}