Glossary

Key terms, methods, and concepts from the thesis on Machine Learning for REIMS Marine Biomass Analysis.

A

Autobots

Autobots (Multi-scale Ensemble Transformer)

A stacked voting ensemble of multi-scale Transformers introduced in this thesis. The name is a pun on the Autobots, a team of diverse Transformers. Three independent Transformer models with 2, 4, and 8 layers/heads respectively act as level-0 base classifiers, analyzing the spectrum at low, medium, and high resolution. Their outputs are fed into a level-1 meta-model, a learned weighted combination of each base model's predictions. Achieves 74.13% accuracy on fish body part identification.

Novel Contribution
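
The level-1 combination step in the Autobots entry can be pictured with a small sketch. This is an illustration under stated assumptions, not the thesis implementation: the function names, the softmax normalization of the weights, the three example classes, and all numbers are hypothetical.

```python
import math

def softmax(values):
    """Convert raw scores into positive weights that sum to 1."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def combine_predictions(base_probs, raw_weights):
    """Level-1 step: weighted average of the level-0 class probabilities."""
    weights = softmax(raw_weights)
    n_classes = len(base_probs[0])
    return [
        sum(w * probs[c] for w, probs in zip(weights, base_probs))
        for c in range(n_classes)
    ]

# Hypothetical outputs of the 2-, 4-, and 8-layer base models for one spectrum,
# over three illustrative body-part classes.
base_probs = [
    [0.6, 0.3, 0.1],
    [0.5, 0.4, 0.1],
    [0.2, 0.7, 0.1],
]
# Placeholder for parameters a meta-model would learn (assumption: equal here).
raw_weights = [0.0, 0.0, 0.0]

combined = combine_predictions(base_probs, raw_weights)
predicted_class = max(range(len(combined)), key=combined.__getitem__)
```

With equal weights the result is a plain average; training the weights would let the meta-model lean on whichever base model is most reliable.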

B

Batch Detection

The task of determining whether two fish samples originate from the same processing batch. Enables rapid recalls if contamination is discovered. Formulated as a pairwise classification task using contrastive learning in this thesis.

Analysis Task
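
To make the pairwise formulation of batch detection concrete, the sketch below builds every unordered pair of samples with a same-batch target. The batch identifiers are hypothetical and serve only to show what the pairwise target looks like; the function name is an assumption, not taken from the thesis.

```python
from itertools import combinations

def make_pairs(batch_ids):
    """Return (i, j, same_batch) for every unordered pair of sample indices."""
    return [
        (i, j, int(batch_ids[i] == batch_ids[j]))
        for i, j in combinations(range(len(batch_ids)), 2)
    ]

# Hypothetical batch identifiers for four fish samples.
batch_ids = ["A", "A", "B", "C"]
pairs = make_pairs(batch_ids)
```

A pairwise classifier would then take the two spectra of each pair as input and predict the third element.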

BCA

Balanced Classification Accuracy

An accuracy metric that averages per-class recall, compensating for class imbalance. Used as the primary evaluation metric in this thesis to fairly compare models across unbalanced datasets.

Method
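
A minimal sketch of the BCA computation described above, assuming string class labels and an illustrative function name. The example data are invented to show how BCA differs from plain accuracy on an unbalanced set.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    recalls = []
    for cls in sorted(set(y_true)):
        indices = [i for i, y in enumerate(y_true) if y == cls]
        correct = sum(1 for i in indices if y_pred[i] == cls)
        recalls.append(correct / len(indices))
    return sum(recalls) / len(recalls)

# Unbalanced toy data: a model that always predicts the majority class.
y_true = ["hoki"] * 8 + ["mackerel"] * 2
y_pred = ["hoki"] * 10

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
bca = balanced_accuracy(y_true, y_pred)
```

Here plain accuracy is 0.8 while BCA is 0.5, because recall is 1.0 for the majority class and 0.0 for the minority class.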

BERT

Bidirectional Encoder Representations from Transformers

A language model pre-trained with masked language modeling. Its masked-token objective inspired the Masked Spectra Modelling (MSM) technique in this thesis.

Model

Body Part Identification

Fish Body Part Identification

Classifying which anatomical body part a fish sample originates from (e.g., fillet, frame, offal). A multi-class problem where the Ensemble Transformer achieves 74.13% accuracy vs. 51.17% for OPLS-DA.

Analysis Task

C

Chemical Fingerprint

The characteristic pattern of molecular ions detected in a mass spectrum, unique to a given biological sample. REIMS produces chemical fingerprints of tissue that encode species, body part, and contamination information.

Instrumentation

Contrastive Learning

A self-supervised learning paradigm that trains models to pull representations of similar samples together and push dissimilar samples apart. Used in SpectroSim for batch traceability without requiring labeled batch data.

Method

Cross-species Adulteration

Food fraud where a high-value fish species (Hoki) is mixed with a cheaper species (Mackerel). Formulated as a 3-class problem: pure Hoki, pure Mackerel, or a 50/50 mixture. Deep learning achieves 91.97% accuracy vs. 79.96% for OPLS-DA.

Analysis Task

E

Ensemble Transformer

An architecture combining multiple Transformer models trained with different initializations or configurations. Achieves 74.13% body part identification accuracy by aggregating predictions from multiple specialized models.

Model

F

Food Fraud

Deliberate adulteration, mislabeling, or misrepresentation of food products for economic gain. Seafood fraud is estimated to affect 30% of commercially sold seafood globally, driving the need for rapid verification tools like REIMS.

Domain

G

Gone Phishing

Gone Phishing (MoE Transformer)

A novel Mixture of Experts (MoE) Transformer architecture introduced in this thesis for REIMS-based fish fraud detection. The name is a pun on fish and phishing. It replaces the standard feed-forward network inside each Transformer encoder block with an MoE layer: a learned gating mechanism routes each input token to the Top-k most relevant expert sub-networks, whose outputs are combined by a weighted sum. This allows the model to scale capacity without proportionally increasing compute. Achieves 100% accuracy on fish species identification.

Novel Contribution
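
The Top-k routing in the Gone Phishing entry can be sketched as follows. This is a toy illustration, not the thesis code: the experts are simple stand-in functions, the gate logits are fixed rather than produced by a learned network, and renormalizing with a softmax over only the selected experts is one common variant assumed here.

```python
import math

def top_k_route(gate_logits, k):
    """Pick the k highest-scoring experts and softmax-normalize their scores."""
    top = sorted(range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True)[:k]
    peak = max(gate_logits[i] for i in top)
    exps = {i: math.exp(gate_logits[i] - peak) for i in top}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}

def moe_layer(token, experts, gate_logits, k=2):
    """Weighted sum of the selected experts' outputs for one token."""
    weights = top_k_route(gate_logits, k)
    output = [0.0] * len(token)
    for i, w in weights.items():
        expert_out = experts[i](token)
        output = [o + w * e for o, e in zip(output, expert_out)]
    return output, weights

# Toy experts standing in for feed-forward sub-networks (hypothetical).
experts = [
    lambda x: [2.0 * v for v in x],
    lambda x: [v + 1.0 for v in x],
    lambda x: [-v for v in x],
    lambda x: [0.0 for _ in x],
]
token = [1.0, 2.0]
# In a real model these would come from a learned gating network.
gate_logits = [1.0, 1.0, -3.0, -5.0]

output, weights = moe_layer(token, experts, gate_logits, k=2)
```

Only the two selected experts are evaluated, which is why capacity can grow with the number of experts while per-token compute stays roughly fixed.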

Grad-CAM

Gradient-weighted Class Activation Mapping

An explainability technique that uses gradients flowing into the final convolutional or attention layer to produce a saliency map. Applied to Transformer models in this thesis to visualize spectral regions important for classification.

Method

H

Hoki

Hoki (Macruronus novaezelandiae)

A deep-sea fish species native to New Zealand waters, used as the high-value species in cross-species adulteration experiments. New Zealand is the world's largest exporter of Hoki, which makes the species a target for seafood fraud.

Domain

I

IUU Fishing

Illegal, Unreported and Unregulated Fishing

Fishing activities that contravene national and international laws, including fishing without authorization, under-reporting catches, and operating in restricted areas. REIMS-based species identification can help verify the provenance of seafood products.

Domain

K

KNN

K-Nearest Neighbours

A non-parametric classification algorithm that assigns a label based on the majority class among the k nearest training samples. Used as a baseline classifier in this thesis.

Model
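
A minimal sketch of the KNN rule described above, assuming Euclidean distance and two-dimensional toy features. Real REIMS spectra have far more features, and the distance metric and value of k used in the thesis are not stated here.

```python
import math
from collections import Counter

def knn_predict(train_x, train_y, query, k=3):
    """Label a query by majority vote among its k nearest training samples."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Toy two-feature training set (hypothetical values).
train_x = [[0.0, 0.1], [0.1, 0.0], [0.2, 0.1], [1.0, 0.9], [0.9, 1.0]]
train_y = ["hoki", "hoki", "hoki", "mackerel", "mackerel"]

label = knn_predict(train_x, train_y, [0.15, 0.05], k=3)
```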

L

LIME

Local Interpretable Model-agnostic Explanations

An explainability technique that approximates any black-box model locally with an interpretable surrogate. Applied to REIMS models in this thesis to identify which m/z features most influence individual predictions.

Method

M

m/z

Mass-to-Charge Ratio

The x-axis of a mass spectrum, representing the ratio of an ion's mass to its charge. The REIMS dataset spans 2,080 distinct m/z features from approximately 77.04 to 999.32 m/z, each corresponding to a different molecular ion.

Instrumentation

Mackerel

Mackerel (Scomber japonicus)

A pelagic fish species used as the adulterant in cross-species adulteration experiments. Less expensive than Hoki but similar in appearance when processed, making it a plausible substitute in seafood fraud.

Domain

MAE

Mean Absolute Error

The average absolute difference between predicted and true values. Used alongside BCA for ordinal classification tasks (oil contamination, adulteration) where the magnitude of classification error matters.

Method
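
The sketch below shows why MAE adds information for ordinal tasks. It assumes classes are encoded as ordered integer indices, which is an illustrative choice; the example predictions are invented.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def accuracy(y_true, y_pred):
    """Fraction of exact matches."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Ordered class indices, e.g. increasing contamination levels (hypothetical).
true_levels = [0, 1, 2, 3]
near_miss = [0, 1, 2, 2]   # one error, off by one level
far_miss = [0, 1, 2, 0]    # one error, off by three levels

mae_near = mean_absolute_error(true_levels, near_miss)
mae_far = mean_absolute_error(true_levels, far_miss)
```

Both prediction sets have the same accuracy (0.75), but MAE separates them (0.25 vs. 0.75) because it accounts for how far off the wrong prediction is.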

Marine Biomass

The total mass of all living marine organisms within a given area or ecosystem. In the context of this thesis, refers to the biological material (fish and shellfish) analyzed using REIMS for food quality and traceability applications.

Domain

MoE

Mixture of Experts

A neural network architecture where multiple specialized sub-networks (experts) process different inputs, gated by a learned router. The MoE Transformer achieves 100% species identification accuracy by routing different spectral regions to specialized expert networks.

Model

MSM

Masked Spectra Modelling

A novel self-supervised pre-training technique introduced in this thesis, adapting BERT's masked language modeling to sequential REIMS data. Random m/z features are masked and the model learns to reconstruct them, providing a useful initialization for downstream tasks.

Novel Contribution
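
The masking step of MSM can be sketched as below. The mask rate, the mask value of 0.0, and the mean-squared-error objective are assumptions made for illustration; the entry above does not specify them, and the helper names are hypothetical.

```python
import random

def mask_spectrum(spectrum, mask_rate, rng, mask_value=0.0):
    """Hide a random subset of features; return the masked copy and the hidden indices."""
    n_masked = max(1, round(mask_rate * len(spectrum)))
    masked_idx = sorted(rng.sample(range(len(spectrum)), n_masked))
    masked = list(spectrum)
    for i in masked_idx:
        masked[i] = mask_value
    return masked, masked_idx

def reconstruction_loss(original, reconstructed, masked_idx):
    """Mean squared error evaluated only at the masked positions."""
    return sum((original[i] - reconstructed[i]) ** 2 for i in masked_idx) / len(masked_idx)

rng = random.Random(0)
spectrum = [0.1, 0.0, 0.7, 0.2, 0.0, 0.4, 0.9, 0.3]  # toy intensities
masked, masked_idx = mask_spectrum(spectrum, mask_rate=0.25, rng=rng)

# A real model would predict the hidden values from the masked input;
# here the masked input itself stands in for an untrained model's output.
loss = reconstruction_loss(spectrum, masked, masked_idx)
```

During pre-training, the model's parameters would be updated to reduce this loss, so that it learns relationships between m/z features without any class labels.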

N

Negative Ionization Mode

A mass spectrometry acquisition mode in which negatively charged ions are detected. REIMS in negative ionization mode primarily detects lipid-related compounds (fatty acids, phospholipids) that form the chemical fingerprint of fish tissue.

Instrumentation

O

Oil Contamination

Oil Contamination Detection

Detection and quantification of oil (e.g., engine or processing equipment oil) introduced into fish samples. Formulated as a 7-class ordinal classification problem with oil concentrations from 0% to 50% in 10% increments.

Analysis Task

OPLS-DA

Orthogonal Partial Least Squares Discriminant Analysis

A supervised chemometrics classification method that separates variation correlated with class labels (predictive) from orthogonal variation. Used as the primary baseline in this thesis, achieving up to 96.39% accuracy on species identification.

Method

Ordinal Classification

Classification where the target classes have a natural ordering (e.g., 0%, 10%, 20% oil concentration). Standard classification ignores this ordering; ordinal methods exploit it. Used for oil contamination detection in this thesis.

Method

P

PCA

Principal Component Analysis

A linear dimensionality reduction technique that projects data onto axes of maximum variance. Used for visualization and as a preprocessing step, but outperformed by deep learning methods on REIMS classification tasks.

Method
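
As a rough illustration of the PCA entry, the sketch below finds the first principal component of a tiny dataset. It uses power iteration on the covariance matrix, a simple technique chosen to keep the example dependency-free; it is not necessarily how PCA was computed in the thesis, and the data are invented.

```python
import math

def first_principal_component(rows, iterations=100):
    """Approximate the direction of maximum variance via power iteration."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix of the centred data.
    cov = [
        [sum(r[a] * r[b] for r in centred) / (n - 1) for b in range(d)]
        for a in range(d)
    ]
    vec = [1.0] * d
    for _ in range(iterations):
        vec = [sum(cov[a][b] * vec[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(v * v for v in vec))
        vec = [v / norm for v in vec]
    return means, vec

def project(row, means, component):
    """Coordinate of one sample along the given component."""
    return sum((x - m) * c for x, m, c in zip(row, means, component))

# Toy data lying close to the line y = 2x.
rows = [[1.0, 2.0], [2.0, 4.1], [3.0, 5.9], [4.0, 8.0]]
means, pc1 = first_principal_component(rows)
scores = [project(r, means, pc1) for r in rows]
```

The recovered direction points along the line the data follow, and the projected scores give a one-dimensional summary of each sample.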

R

REIMS

Rapid Evaporative Ionization Mass Spectrometry

A direct-to-analysis technique that allows near-instantaneous chemical analysis of a sample with minimal to no preparation. A heated blade vaporizes tissue, and the resulting aerosol is directed into a mass spectrometer, producing a chemical fingerprint in seconds.

Instrumentation

S

Self-supervised Learning

A machine learning paradigm that generates supervisory signals from the data itself, without human-labeled annotations. Used in this thesis for MSM pre-training and SpectroSim contrastive learning on REIMS spectra.

Method

SimCLR

Simple Contrastive Learning of Representations

A contrastive learning framework by Chen et al. (2020) that learns representations by maximizing agreement between differently augmented views of the same sample. SpectroSim adapts this framework for mass spectra.

Method
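
The "maximizing agreement" objective in SimCLR is the normalized temperature-scaled cross-entropy (NT-Xent) loss. The sketch below computes it on toy two-dimensional embeddings; the temperature of 0.5, the pairing convention (consecutive entries are two views of the same sample), and the data are assumptions for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def nt_xent(embeddings, temperature=0.5):
    """NT-Xent loss; embeddings[2k] and embeddings[2k+1] are two views of sample k."""
    n = len(embeddings)
    total = 0.0
    for i in range(n):
        positive = i + 1 if i % 2 == 0 else i - 1
        sims = {
            j: math.exp(cosine(embeddings[i], embeddings[j]) / temperature)
            for j in range(n) if j != i
        }
        total += -math.log(sims[positive] / sum(sims.values()))
    return total / n

# Views of the same sample are close together: low loss expected.
aligned = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
# Same vectors, but paired so that the two views of a sample disagree.
shuffled = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]

loss_aligned = nt_xent(aligned)
loss_shuffled = nt_xent(shuffled)
```

The loss is lower when the two views of each sample are more similar to each other than to the other samples, which is the behavior the encoder is trained toward.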

Species Identification

Fish Species Identification

Classifying a REIMS spectrum to determine the fish species. A 2-class problem (Hoki vs. Mackerel) in this thesis, where the MoE Transformer achieves 100% accuracy vs. 96.39% for OPLS-DA.

Analysis Task

Species Substitution

A form of food fraud where a premium fish species is replaced with a cheaper alternative. Detected in this thesis using REIMS-based classification, achieving up to 100% accuracy with Transformer models.

Domain

SpectroSim

A novel contrastive learning framework introduced in this thesis for label-free batch traceability. Uses a Transformer encoder within the SimCLR framework to learn pairwise similarity between mass spectra, achieving 70.8% batch detection accuracy without labeled data.

Novel Contribution

SVM

Support Vector Machine

A supervised learning algorithm that finds the optimal hyperplane separating classes in a high-dimensional feature space. Used as a baseline classifier in this thesis.

Model

T

TIC

Total Ion Current

The sum of all detected ion intensities across all m/z values in a single mass spectrum. TIC normalization divides each feature by the total ion current to remove inter-sample variation in overall signal intensity.

Instrumentation
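
TIC normalization as described above amounts to a single division per feature. In this sketch the function name and the handling of an all-zero spectrum are assumptions, and the intensities are toy values.

```python
def tic_normalize(intensities):
    """Divide each intensity by the spectrum's total ion current."""
    tic = sum(intensities)
    if tic == 0:
        # Assumption: leave an empty spectrum unchanged rather than divide by zero.
        return list(intensities)
    return [value / tic for value in intensities]

# Two toy spectra with the same profile but a tenfold difference in signal.
weak = [10.0, 30.0, 60.0]
strong = [100.0, 300.0, 600.0]

weak_norm = tic_normalize(weak)
strong_norm = tic_normalize(strong)
```

After normalization both spectra have the same relative profile, so differences in overall signal intensity no longer dominate comparisons between samples.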

Transfer Learning

A machine learning approach where a model pre-trained on one task or dataset is fine-tuned on a different but related task. In this thesis, models trained on species identification are adapted to oil contamination detection, improving accuracy by up to 22.67%.

Method

Transformer

Transformer Neural Network

A deep learning architecture based on self-attention mechanisms, introduced by Vaswani et al. (2017). Processes all elements of a sequence in parallel, capturing long-range dependencies. Applied to REIMS spectra as sequences of m/z values in this thesis.

Model
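
The self-attention mechanism at the core of the Transformer can be sketched as scaled dot-product attention, following the formulation of Vaswani et al. (2017). For brevity this example uses the token vectors directly as queries, keys, and values; a real model applies learned projections first, and the toy tokens are invented.

```python
import math

def softmax(row):
    """Normalize scores into weights that sum to 1."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = len(keys[0])
    outputs, attention = [], []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        attention.append(weights)
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs, attention

# Three toy token embeddings (assumption: identity projections for Q, K, V).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, attention = self_attention(tokens, tokens, tokens)
```

Every token attends to every other token in one step, which is how the architecture captures dependencies between distant positions in a sequence.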