Chapter 1

Introduction


Introduction to the thesis. For methodology, see Chapter 3: Datasets and Processing.

This chapter provides an overview of the entire thesis. It introduces the global seafood industry and its challenges, e.g., species mislabeling, cross-species adulteration, oil contamination, and batch traceability, all of which undermine consumer trust in seafood products. The chapter highlights the limitations of traditional analytical methods and posits Rapid Evaporative Ionization Mass Spectrometry (REIMS) combined with Machine Learning (ML) as a promising solution. It then outlines the primary objectives of this research, including species and body part identification, oil contamination and cross-species adulteration detection, and contrastive learning for batch detection, followed by the major contributions of the thesis.

Introduction

Fish are a majestic and bountiful natural resource, serving as both an integral part of our marine ecosystems and food on our tables [Food, 2024]. The term marine biomass, used frequently throughout this thesis, refers to the collection of these aquatic creatures. Fishing, the harvesting of these natural resources by humans [Jennings, 2001], strives for greater efficiency, sustainability, and accurate monitoring. The diversity of fish species [Co-operation, 2021] means that evaluating fish stocks and the quality of harvested fish is not a straightforward process. Following the catch, fish processing encompasses critical procedures such as sorting, grading, and packaging [Fellows, 2017]. This chain transforms the raw catch into consumable products and is an area where modern advancements can offer transformative improvements.

Quality Assurance (QA) is essential to ensure that seafood products meet food safety requirements and customer expectations [Montgomery, 2019]. QA applies its tools systematically, and these range from simple checklists and manual inspection aids to sophisticated analytical techniques. One such advanced tool is Rapid Evaporative Ionization Mass Spectrometry (REIMS) [Schafer, 2009][Cafarella, 2024], which rapidly produces chemical fingerprints of samples. These fingerprints provide detailed information that can be crucial for identifying fish species [Black, 2017][Shen, 2020][Shen, 2022], body parts, and contaminants such as oil or cross-species adulteration [Premanandh, 2013], or for tracing food origins [Lu, 2024][Gao, 2025][Gkarane, 2025]. This data-rich environment creates a clear need for Automated QA, where intelligent systems are required to interpret complex chemical fingerprints, execute tests, and flag anomalies, thereby increasing efficiency and reliability in industrial fish processing settings [Xing, 2016].

Automated QA, which integrates data from tools like REIMS, can be greatly enhanced by Machine Learning (ML), a field of artificial intelligence (AI) that enables computers to learn from data without being explicitly programmed [Russell, 2016]. ML algorithms identify patterns in data and use them to make predictions.

Problem Statement

The global seafood industry, which plays an important part in food supply chains and the world economy, faces serious challenges from fraudulent practices and gaps in quality control. Product mislabeling, adulteration with lower-value products, and other forms of contamination pose significant food safety hazards and undermine consumer trust in seafood products by deceiving customers, while batch traceability and the waste utilization of byproducts remain difficult to manage. Recent meta-analyses show that the problem is current and widespread, with mislabeling rates reported as high as 39.1% in the US [Ahles, 2025] and 24.8% in the Eastern South Pacific [Marín, 2025]. The current state-of-the-art approach to seafood quality control relies on traditional analytical methods that involve time-consuming laboratory procedures and considerable domain expertise. This creates demand for rapid, accurate, and automated solutions that can be deployed efficiently within a factory setting.

These automated solutions must perform several crucial quality control tasks to ensure the efficient, safe, and effective monitoring of fish processing factories, including fish species and body part identification, oil contamination and cross-species adulteration detection, and batch traceability. For example, species identification prevents species substitution fraud [Australia, 2016], body part classification facilitates efficient waste utilization by enabling the re-use of byproducts [Ghaly, 2013], cross-species adulteration and oil contamination detection directly impacts food safety standards [Premanandh, 2013], and batch traceability allows for quick recalls if contamination occurs [Mai, 2010].

Motivations

Marine Biomass

Marine biomass is the total mass of all living marine organisms within a given area or ecosystem. Marine biomass is subject to mislabeling, species substitution, cross-species adulteration, and oil contamination. Studies have revealed an alarming amount of mislabeling within the global seafood industry [Pardo, 2016], a trend confirmed by recent large-scale meta-analyses in 2025 [Ahles, 2025][Marín, 2025]. This fraud can be complex, with recent studies uncovering hidden mammal and avian species (pork, chicken) in processed fish balls, posing serious ethical and religious consumer challenges [Zhang, 2025]. Whether these species substitutions are intentional and fraudulent, or systematic and prevalent, accurate methods for detecting fish species and cross-species adulteration within seafood products are essential for guaranteeing food authenticity in quality control in fish processing, especially in high-consumption regions like Asia [Do, 2025]. One example of fraudulent species substitution was a restaurant that was found serving Vietnamese Catfish under the pseudonym Australian Dory [Australia, 2016].

Waste utilization is a further motivation. Less than half of a fish can be processed into fillets, leaving the byproduct to be repurposed into products such as fertiliser, fish meal, or omega-3 concentrates. Omega-3 fatty acids [Simopoulos, 2011] are an essential, but often lacking, component of Western diets [FAO, 2020]. Methods for differentiating between fish body parts in marine biomass byproducts are crucial for maximizing waste utilization in fish processing [Stevens, 2018].

Contamination presents another challenge. Contamination is the presence of any harmful or undesirable foreign substance, such as chemicals, pollutants, or other species, that renders the biomass impure, unsafe for consumption, or otherwise unfit for its intended use. In this thesis, it covers both oil pollutants and cross-species adulteration. Adulteration is where a high-value product is mixed with a cheaper species or body part (e.g., offcuts/offals), which can undermine the integrity of the product and pose health risks [Black, 2019]. Again, this emphasises the importance of biomass species and body part identification systems. A famous example is the European Horse Meat Scandal of 2013 [Premanandh, 2013], where beef mince was fraudulently and intentionally mixed with horse meat and served to customers under the label of prime beef. Oil contamination can be introduced from the engine oil of boats or from processing equipment in the factory [Moens, 2003]. It is pivotal that effective systems for contamination detection, be it oil pollutants or cross-species adulteration, are developed for fish processing, to catch contaminants early, well before they reach the consumer's plate.

Not only do we need methods for contamination detection, but also batch detection, which we define as the task of using REIMS to determine if two fish samples originate from the same processing batch, allowing for quick recalls if contamination occurs [Mai, 2010]. This capability is critical for combating Illegal, Unreported, and Unregulated (IUU) fishing [Helyar, 2014] and enabling modern, transparent supply chains [Turkson, 2025]. The current state-of-the-art is rapidly moving toward digital traceability systems based on Blockchain [Dahariya, 2025], Digital Product Passports [Jiang, 2025], and mobile data-entry platforms [Untal, 2025][Gastaldi Garcia, 2025]. However, these systems are vulnerable to fraudulent data entry and face significant adoption barriers, such as cost and infrastructure [Untal, 2025]. This creates a critical need for an alternative, intrinsic verification method that can validate a sample's identity analytically, complementing these digital systems rather than relying on vulnerable physical tags like RFID chips [Mai, 2010].

Rapid Evaporative Ionization Mass Spectrometry

REIMS [Schafer, 2009][Cafarella, 2024] is a direct-to-analysis technique that allows for the near-instantaneous chemical analysis of a sample with minimal to no preparation [Henderson, 2025][Pruekprasert, 2025]. REIMS has the potential to revolutionize the field of seafood quality control [Cafarella, 2024][Black, 2017] by providing detailed molecular profiles that can differentiate between fish species [Black, 2017][Shen, 2020][Shen, 2022], fish body parts [Black, 2019], or trace food origins [Gao, 2025][Gkarane, 2025][Lu, 2024]. It can also be used to identify contaminants, detect adulterations [Premanandh, 2013], and, as this thesis proposes, perform batch traceability [Mai, 2010]. The nature of REIMS data presents challenges. Firstly, REIMS sample preparation requires significant domain expertise and resources, resulting in limited training data. Secondly, the limited training data has many features: 2,080, to be exact. Together, these two challenges yield a high-dimensional dataset with few samples, which suffers from the curse of dimensionality [Koppen, 2000].

A critical point is that a single measured mass-to-charge (\(\mathbf{m/z}\)) feature reflects an ion (such as a deprotonated molecule or adduct), which may represent multiple isomers or compounds. Because REIMS typically does not use mass fragmentation, the identification of the underlying chemical compounds is tentative, relying only on high-resolution mass measurements to narrow down possibilities. This data is best described as a complex chemical fingerprint or molecular profile rather than a definitive list of identified compounds.

In addition, REIMS data has inherent noise arising from sample preparation, instrumentation, and environmental factors. Traditional chemometric techniques such as PCA [Schafer, 2009], PCA-LDA [Balog, 2010][Balog, 2013][Balog, 2016], and OPLS-DA [Bylesjo, 2006][Boccard, 2013], which remain the standard in many recent studies [Black, 2017][Black, 2019][Verplanken, 2017][Gkarane, 2025][Shen, 2020][Shen, 2022], struggle to discern the signal from the noise and capture meaningful patterns in the data. Noise is such a prevalent issue that practitioners often discard principal components with relative standard deviations above a certain threshold outright, a practice seen in [Black, 2017][Black, 2019].

Finally, techniques like OPLS-DA are not capable of capturing the complex sequential patterns and feature interactions present in mass spectra. This has created an active debate in recent literature [Xue, 2025]. Some studies find that traditional chemometrics remain more robust than conventional machine learning (ML) methods, such as Random Forest or SVM, for large-scale industrial tasks [De Graeve, 2023]. Other studies find that conventional ML models like K-Nearest Neighbors (KNN) are superior for new tasks like geographical authentication [Lu, 2024]. These mixed findings indicate a clear need for more sophisticated machine learning techniques to be evaluated. This need is reinforced by the very latest studies, which show that deep learning methods like Artificial Neural Networks (ANNs) can outperform traditional OPLS-DA by better modeling the complex, non-linear relationships in REIMS data [Cardoso, 2025].

Deep Learning

Deep learning for REIMS data analysis refers to deep neural network-based methods that learn hierarchical feature representations directly from raw data [Goodfellow, 2016]. While traditional methods like Orthogonal Partial Least Squares Discriminant Analysis (OPLS-DA) [Bylesjo, 2006] have been the standard for REIMS biomass analysis, their capabilities are limited when faced with the inherent challenges of the data. REIMS mass spectra are high-dimensional, inherently sequential, and often generated from a limited number of prepared samples, creating significant hurdles for traditional machine learning. To overcome these limitations, this thesis turns to deep learning. The following paragraphs outline the key challenges of REIMS data and motivate the specific deep learning techniques chosen to address them.

First, a primary challenge is capturing the complex, sequential patterns present in mass spectra at multiple scales, from broad, low-resolution patterns to fine-grained, high-resolution details. To address the sequential nature and long-range dependencies in the data, this thesis employs Transformers [Vaswani, 2017], whose self-attention mechanism weighs the importance of every feature in a sequence relative to every other. To further enhance this capability, a multi-scale Ensemble Transformer [Wolpert, 1992] was developed. This architecture combines shallow, medium, and deep Transformers so that patterns at each of these scales are analyzed simultaneously, creating a more robust and comprehensive model than a single architecture could achieve.
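The self-attention computation referred to above can be sketched in a few lines. This is a minimal illustration and not the architecture used in this thesis: it assumes a single attention head, replaces the learned query, key, and value projections with the identity, and embeds a toy four-bin spectrum as hypothetical [intensity, position] pairs.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention.

    Assumption: identity projections stand in for learned Q/K/V matrices.
    tokens is a list of equal-length vectors, one per spectral bin.
    """
    d = len(tokens[0])
    outputs, weights = [], []
    for q in tokens:
        # Similarity of this bin to every bin, scaled by sqrt(d).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        w = softmax(scores)
        weights.append(w)
        # Output is the attention-weighted average of all bins.
        outputs.append([sum(w_j * v[i] for w_j, v in zip(w, tokens)) for i in range(d)])
    return outputs, weights


# Hypothetical toy spectrum: four bins embedded as [intensity, position].
spectrum = [[0.9, 0.0], [0.1, 0.33], [0.8, 0.66], [0.05, 1.0]]
attended, attn = self_attention(spectrum)
```

Each row of `attn` sums to one and is non-zero everywhere, so every bin draws on every other bin regardless of how far apart they are in the spectrum, which is the property that motivates attention for long-range dependencies.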

Second, the analysis is constrained by the scarcity of labeled data and the need to leverage knowledge across different but related tasks. The preparation of REIMS samples is resource-intensive, resulting in limited training datasets. This is mitigated with Unsupervised Pre-training [Devlin, 2018], a technique where a model first learns the fundamental structure of mass spectra from large amounts of unlabeled data. This process provides a powerful foundation for fine-tuning on smaller, labeled datasets for specific downstream tasks. To further leverage existing knowledge, Transfer Learning [Bozinovski, 1976][Tan, 2018] has also been explored, which allows insights gained from a source task to be applied to a related target task, enabling the model to build on previous learning.
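One common form of such pre-training, in the style of the masked language modelling of [Devlin, 2018], hides part of each input and trains the model to reconstruct it. The sketch below shows only the masking step and the reconstruction loss, under stated assumptions: the 15% mask fraction, the zero mask value, and the function names are illustrative choices, not details taken from this thesis.

```python
import random


def mask_spectrum(intensities, mask_fraction=0.15, mask_value=0.0, seed=0):
    """Hide a random subset of bins; return the masked copy and the hidden indices."""
    rng = random.Random(seed)
    n = len(intensities)
    n_masked = max(1, round(n * mask_fraction))
    masked_idx = sorted(rng.sample(range(n), n_masked))
    masked = list(intensities)
    for i in masked_idx:
        masked[i] = mask_value
    return masked, masked_idx


def masked_mse(predicted, original, masked_idx):
    """Mean squared error computed only on the hidden bins."""
    return sum((predicted[i] - original[i]) ** 2 for i in masked_idx) / len(masked_idx)


# Hypothetical 20-bin spectrum; no labels are needed for this objective.
spectrum = [i / 20 for i in range(20)]
masked, masked_idx = mask_spectrum(spectrum)

# A perfect reconstruction scores zero; a model would be trained to approach this.
perfect_loss = masked_mse(spectrum, spectrum, masked_idx)
```

Because the training target is the spectrum itself, the objective needs no class labels, which is what makes it suitable when labeled samples are scarce.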

Third, as analytical tasks become more complex, there is a need for architectures that can scale in capacity and solve novel problems. Mixture of Experts (MoE) Transformers [Jacobs, 1991][Kaiser, 2017] address the issue of scalability by allowing a model’s parameters to increase efficiently; this is achieved by routing inputs to specialized expert sub-networks, which increases model capacity without a proportional rise in computational cost. For novel challenges like batch traceability, which requires a pairwise comparison, Contrastive Learning [Chen, 2020] offers a more suitable framework than standard classification. The proposed "SpectroSim" method uses this approach to learn an embedding space where similar samples are grouped, enabling accurate pairwise comparison without relying on explicit class labels.
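The expert routing described above can be illustrated with a small top-k gating sketch. It is a simplification under stated assumptions: the gate logits are fixed numbers rather than the output of a learned gating network, the four "experts" are hypothetical scalar functions, and k = 2 is an arbitrary choice.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(gate_logits, k=2):
    """Pick the k highest-scoring experts and renormalise their weights."""
    order = sorted(range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True)
    top = order[:k]
    weights = softmax([gate_logits[i] for i in top])
    return list(zip(top, weights))


def moe_forward(x, experts, gate_logits, k=2):
    """Evaluate only the selected experts and mix their outputs."""
    return sum(w * experts[i](x) for i, w in route_top_k(gate_logits, k))


# Hypothetical experts and gate scores, for illustration only.
experts = [lambda x: x + 1.0, lambda x: 2.0 * x, lambda x: -x, lambda x: x * x]
gate_logits = [0.1, 2.0, -1.0, 2.0]

routes = route_top_k(gate_logits, k=2)
y = moe_forward(3.0, experts, gate_logits, k=2)
```

Only two of the four experts are evaluated for this input, so adding more experts increases the number of parameters without increasing the per-input computation in proportion.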

Finally, for these advanced models to be deployed in real-world settings, a crucial challenge is ensuring their interpretability and trustworthiness. A persistent limitation of deep learning models is their black-box nature, which can hinder adoption by domain experts who need to verify the reasoning behind a decision. This challenge has been explicitly identified in the latest food science literature applying ANNs to REIMS data [Cardoso, 2025]. To ensure the models developed in this thesis are verifiable, post-hoc explainability methods like LIME [Ribeiro, 2016] and Grad-CAM [Selvaraju, 2017] are employed. These techniques provide crucial insights into which \(m/z\) features are driving classification decisions, effectively opening the black box and bridging the gap between complex models and practical, real-world applications by providing chemically relevant hypotheses for domain experts to verify.

Tasks

This thesis addresses three tasks in REIMS-based marine biomass analysis, with a specific focus on species critical to the New Zealand seafood industry. The selected species, Hoki and Mackerel, represent two major commercial fisheries. Hoki, in particular, is New Zealand's largest and most valuable fishery [Industries, 2024], making its accurate identification and quality control a top national priority. The tasks are related through their shared objective, their data source, and the analytical methods used to solve them. All three directly address major challenges in the seafood industry, such as fraud, waste utilization, safety, and traceability, and all three rely on data generated by the same core technology, REIMS. This thesis systematically explores a set of advanced machine learning models to solve these problems. The three tasks are:

Fish Species and Body Part Identification: Firstly, a binary classification task between two species of fish: Hoki and Mackerel. Secondly, a multi-class classification task between seven body parts of fish: head, fillet, frame, skin, liver, guts, and gonads. These two tasks are related to species substitution (fish species identification) and waste utilization of byproducts (fish body part identification).

Oil Contamination and Cross-Species Adulteration Detection: These are two ordinal multi-class classification tasks. In the first, there are 7 different concentrations of oil-contaminated fish, ranging from 50% to 0%. In the second, there are three classes: pure Hoki, pure Mackerel, and 50-50 mixed Hoki-Mackerel. Both tasks are related to contamination detection, with the aim of preventing oil contamination and cross-species adulteration in fish supply chains.

Batch Detection: This is a pairwise comparison task where, given two fish samples, we wish to detect whether they originate from the same batch or from different batches. A batch is a group in which fish were processed during the REIMS-based sample analysis. Batch detection is a new method of batch traceability, which allows for quick and accurate recalls of contaminated products, should contamination occur.
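The ordinal framing of the oil contamination task can be made concrete with a small sketch. It assumes the seven concentration levels are indexed 0 to 6 in order (the concentration values themselves are not listed here, so indices stand in for them) and uses made-up predictions to contrast plain accuracy with a distance-aware metric; the cumulative target encoding shown is one common option, not necessarily the one used in this thesis.

```python
def accuracy(y_true, y_pred):
    """Fraction of exact matches; ignores how far off a wrong prediction is."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Average distance in class index; penalises far misses more than near ones."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def ordinal_encode(label, n_classes):
    """Cumulative encoding: class k becomes k ones followed by zeros."""
    return [1 if label > j else 0 for j in range(n_classes - 1)]


# Hypothetical labels over seven ordered concentration levels (0..6).
y_true = [0, 3, 6, 2]
near_misses = [1, 3, 5, 2]  # wrong answers are off by one level
far_misses = [6, 3, 0, 2]   # wrong answers are off by six levels

acc_near, acc_far = accuracy(y_true, near_misses), accuracy(y_true, far_misses)
mae_near, mae_far = mean_absolute_error(y_true, near_misses), mean_absolute_error(y_true, far_misses)
target = ordinal_encode(3, n_classes=7)
```

Both sets of predictions have the same accuracy, yet the distance-aware metric separates them, which is why ordinal tasks are usually judged on more than accuracy alone.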

Furthermore, these tasks are interconnected problems in real-world settings. In a fish processing plant, they are not isolated issues. An adulterated product (Task 2) is also a mislabeled one (Task 1); for example, a product sold as "Pure Hoki" that is found to be adulterated with Mackerel is by definition also mislabeled. If contamination (Task 2) is found, batch detection (Task 3) is required to execute an effective recall. Distinguishing body parts (Task 1) is essential for detecting certain kinds of adulteration, such as mixing high-value fillets with cheaper offal (Task 2). Together, these tasks form an automated, holistic approach to solving quality control challenges in marine biomass analysis.

Research Goals

The overall goal of this thesis is to develop and validate a suite of machine learning methods to enhance the analytical capabilities of REIMS-based marine biomass analysis, targeting rapid, automated, and in situ application in fish processing plants. In achieving this, the thesis is structured around the following three objectives:

Develop an Approach for Fish Species and Body Part Identification: This objective tackles the critical need for accurate product labeling to prevent consumer fraud and optimize the use of byproducts. Existing analytical methods are often too slow for real-time factory use, and methods like the state-of-the-art OPLS-DA are limited in their ability to model complex, non-linear spectral data [De Graeve, 2023][Cardoso, 2025]. The primary challenge lies in accurately classifying species and tissues from complex, high-dimensional REIMS spectra where biochemical differences can be subtle. To address this, the goal is to develop a potentially interpretable machine learning pipeline, exploring advanced deep learning architectures. Specifically, Transformer variants are chosen for their self-attention mechanism, which is hypothesized to be uniquely suited to capturing the complex, long-range dependencies between \(m/z\) features across the entire mass spectrum. A central aim is to ensure these models are verifiable by domain experts through post-hoc explainability methods (LIME and Grad-CAM) that identify the key biochemical features driving classification decisions. This work represents the first application of deep learning with Transformer models to REIMS biomass analysis.

Formalize and Solve Novel Contamination and Adulteration Detection Tasks: This objective addresses direct threats to food safety and economic integrity by formalizing two novel problems for REIMS analysis: oil contamination and cross-species adulteration. These tasks are crucial for consumer protection, but no rapid, in-situ methods currently exist to detect them. The state of the art relies on slow, lab-based methods, and the core challenge for an automated approach is identifying the faint chemical signals of contaminants, which are often masked by the dominant biochemical profile of the biomass itself. The goal is to develop highly sensitive models. Transformer variants are leveraged again, as their learned representations are hypothesized to be sensitive enough to detect these subtle signals. Furthermore, transfer learning is explored as a method to overcome data scarcity, a key challenge in food safety, by transferring knowledge from a data-rich source task to these more difficult, data-scarce detection tasks. Unsupervised pre-training is applied to these tasks as an alternative approach to overcoming data scarcity by learning general-purpose representations that can be reused for downstream tasks. Crucially, this objective includes using LIME explanations to validate that the models learn chemically relevant features, ensuring their decisions are transparent and reliable. This research is the first to apply REIMS analysis to oil contamination detection in biomass and the first to use it for cross-species adulteration detection in marine biomass.

Develop a Contrastive Learning Framework for Batch Detection: This objective addresses the need for robust batch traceability, which is essential for enabling swift product recalls in the event of a safety issue. The state of the art for traceability is no longer just costly physical tags [Mai, 2010], but an emerging field of digital systems like Blockchain [Dahariya, 2025] and Digital Product Passports [Jiang, 2025]. However, these systems are vulnerable to fraudulent data entry [Turkson, 2025] and face high adoption barriers [Untal, 2025]. The main challenge is to create an intrinsic, analytical verification method that can validate these digital records without relying on explicit labels. To overcome this, the goal is to develop "SpectroSim," a novel contrastive learning method. This framework is chosen specifically because it is designed to learn a similarity-based embedding space in a self-supervised manner. This directly addresses the core challenges by (a) performing a pairwise comparison ("are these two samples similar?") rather than standard classification, and (b) removing the need for impractical, manually created labels for every new batch. The aim is for this model to outperform standard classification approaches, establishing the first machine learning-based solution for analytical batch detection in this field.

As a whole, this thesis aims to demonstrate that ML models, in particular deep learning models (e.g., Transformers and MoE) combined with ML methodologies for enhanced representation learning (e.g., unsupervised pre-training and transfer learning), can be used in conjunction with REIMS technology for fast, accurate, and potentially explainable quality control in fish processing. By systematically addressing challenges, from simpler tasks such as species identification to more complex tasks such as oil contamination detection or batch detection, this work contributes to the development of rapid, accurate, and automated systems for ensuring food safety and quality by preventing fish fraud or contamination through mislabeling, adulteration, or other means. This supports consumer trust through the measurable integrity of seafood products in fish processing. The subsequent chapters elaborate on the methodologies, experimental results, and implications of each of these contributions in detail.

Major Contributions

The major contributions of this thesis are the development and application of novel deep learning methods for REIMS-based marine biomass analysis, for quality control in the seafood industry, particularly in the context of New Zealand. This work is the first to systematically apply advanced deep learning architectures—including Transformers, Mixture of Experts (MoE), and self-supervised contrastive learning—to the specific tasks and datasets explored in this thesis, consistently outperforming traditional analytical methods on this data. The key contributions are organized by the specific challenges addressed:

Novel Deep Learning Architectures for Foundational Biomass Classification Tasks: This thesis contributes two new deep learning models, a Mixture of Experts (MoE) Transformer and a multi-scale Ensemble Transformer, that significantly advance the state-of-the-art for identifying fish species and body parts from the REIMS data. The new knowledge generated shows that specialized Transformer-based architectures can achieve unprecedented accuracy on these foundational quality control tasks.
The contents of this chapter are based on the work presented in [Wood, 2025].
- Technical Contribution:
  A new Mixture of Experts (MoE) Transformer, named "Gone Phishing," was proposed to enhance model capacity and specialization for this domain. In addition, a Multi-scale Ensemble Transformer called "Autobots" was proposed for REIMS-based marine biomass analysis.
- Results and Analysis:
  The introduction of these models led to significant performance gains on the datasets tested. The results demonstrate a clear task-dependent trade-off between the two architectures. The MoE Transformer ("Gone Phishing"), which uses specialized sub-networks, proved superior for the binary fish species classification task, achieving 100% accuracy. Conversely, the Multi-scale Ensemble Transformer ("Autobots"), which combines models of different depths, was more robust for the more complex multi-class fish body part classification task, achieving the top accuracy of 74.13% (a significant improvement over the OPLS-DA benchmark of 51.17%).
The First Framework for Detecting Contamination and Adulteration with REIMS: This research contributes the first application of REIMS analysis to two critical food safety issues, oil contamination and cross-species adulteration, using the provided datasets. The contribution lies in formalizing these previously unaddressed issues as machine learning problems and demonstrating that deep learning can effectively solve them. Oil contamination was formulated as an ordinal classification task to identify its severity, while cross-species adulteration was framed as a multi-class problem to detect the type and presence of foreign species.
- Technical Contribution:
  A novel unsupervised pre-training technique, Masked Spectra Modeling (MSM), was developed by adapting BERT's masked language modeling for sequential REIMS data. Transfer learning approaches were explored for the MoE Transformer on REIMS data. The successful adaptation of the deep learning methods (i.e., MoE and Ensemble Transformers) from the previous chapter's classification tasks to these more complex problems is a key contribution, demonstrating the robustness and versatility of the proposed architectures for these data. An exploration of ordinal classification approaches to the oil contamination task is also presented.
- Results and Analysis:
  The models demonstrated strong performance on the test data, showing that pre-training improved accuracy. A key finding was that the difficult task of oil contamination detection consistently benefited from transfer learning, regardless of the source task, improving accuracy on this task by up to 3.59%. This highlights that transfer learning is most effective for more challenging tasks with subtle chemical signals within this dataset. The exploration of ordinal classification approaches for oil contamination concluded that these approaches improved performance when measured with distance-aware metrics.
A Novel Self-Supervised Framework for Chemical Batch Traceability: This work contributes a new method for batch traceability, addressing this novel challenge with a self-supervised learning approach that does not require labeled data. The novelty lies in creating a system that can learn to identify the unique chemical fingerprint of a production batch from the raw spectra alone.
- Technical Contribution: A novel contrastive learning framework named "SpectroSim" was developed. This method adapts the SimCLR framework by incorporating a Transformer-based encoder and a custom projection head, specifically designed for the pairwise comparison of mass spectra.
- Results and Analysis: SpectroSim achieved 70.8% accuracy in the task of batch detection. Crucially, it accomplished this without using any class labels during training, demonstrating the power of self-supervised learning to identify subtle, batch-specific chemical signatures from an imbalanced dataset, far surpassing traditional binary classification approaches, which struggled to achieve over 60% accuracy on this dataset.
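As background to the contrastive framework named above, SimCLR [Chen, 2020] is trained with the normalized temperature-scaled cross-entropy (NT-Xent) loss. The sketch below is a plain-Python illustration of that loss on hand-written two-dimensional embeddings; it is not the SpectroSim implementation, and the embeddings, the way positive pairs are assigned, and the temperature of 0.5 are all assumptions made for the example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nt_xent(embeddings, positive_of, temperature=0.5):
    """NT-Xent loss: pull each sample towards its positive, away from all others.

    positive_of[i] is the index of the sample treated as i's positive partner.
    """
    n = len(embeddings)
    total = 0.0
    for i in range(n):
        # Exponentiated, temperature-scaled similarity to every other sample.
        sims = {
            j: math.exp(cosine(embeddings[i], embeddings[j]) / temperature)
            for j in range(n)
            if j != i
        }
        total += -math.log(sims[positive_of[i]] / sum(sims.values()))
    return total / n


# Hypothetical embeddings: samples 0 and 1 lie close together, as do 2 and 3.
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]

aligned = nt_xent(embeddings, positive_of=[1, 0, 3, 2])     # positives are neighbours
misaligned = nt_xent(embeddings, positive_of=[2, 3, 0, 1])  # positives are far apart
```

The loss is lower when each sample's positive partner is also its nearest neighbour, so minimizing it shapes an embedding space in which a pairwise decision can be made by comparing similarities, without a classifier over fixed classes.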

Thesis Overview

In this thesis, we explore the design and development of new state-of-the-art deep learning methods for REIMS-based marine biomass analysis.

In the background chapter, we discuss the foundational materials necessary for the topic of REIMS marine biomass analysis, focusing on the application of ML techniques for analyzing REIMS data. We discuss: (1) The Challenges of Seafood Integrity, (2) Analytical Techniques for Seafood Authentication, and (3) Machine Learning for Enhanced REIMS Data Analysis. This background material is sufficient to understand the key contributions that comprise this thesis.

In Chapter 3, the datasets provided by AgResearch, New Zealand, are introduced. These comprise four classification tasks (fish species, fish body part, oil contamination, and cross-species adulteration) and batch detection, a pairwise comparison task. The chapter covers the following aspects of the datasets: (1) REIMS Data Acquisition and Characteristics, (2) Specific Datasets for Classification Tasks, and (3) Dataset Preprocessing: Normalization. The contents of this chapter encapsulate the ML tasks that are addressed throughout the remainder of this work.

In the first contribution chapter, we address the fundamental tasks of Fish Species and Fish Body Part Identification. This chapter details the initial application of machine learning methods, establishing their effectiveness over traditional OPLS-DA benchmarks for these REIMS marine biomass analysis tasks. Model interpretability using LIME and Grad-CAM is also explored. Building on this, the chapter introduces how more advanced methodologies developed in this thesis enhance these identifications: the Mixture of Experts (MoE) Transformer and Ensemble Transformer architectures are applied, and their impact on these specific tasks is evaluated.

In Section, we tackle the more challenging Oil Contamination and Cross-species Adulteration Detection tasks. The chapter outlines how methods like unsupervised pre-training and transfer learning with Transformers prove effective for these complex REIMS data scenarios, with LIME again used for model interpretability. This chapter further demonstrates the capabilities of the advanced techniques central to this thesis: the Mixture of Experts (MoE) Transformer and Ensemble architectures are applied to these difficult classifications, showcasing their performance benefits. Moreover, systematic investigations of transfer learning are presented, highlighting significant outcomes such as consistent performance improvements for oil contamination detection and confirming the task-dependent nature of transfer success. Finally, ordinal classification approaches are explored and evaluated with distance-aware metrics for comparison.
In Section, we develop Contrastive Learning for Batch Detection, a contrastive learning framework with a custom encoder and projection head for batch detection in REIMS marine biomass analysis. This contribution compares binary classification with contrastive learning for pairwise batch detection. The proposed method, SpectroSim, a SimCLR network with a transformer encoder and custom projection head, achieves the highest accuracy (70.8%) on the batch detection task, significantly outperforming binary classification methods.
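For readers unfamiliar with SimCLR, the sketch below shows the NT-Xent (normalized temperature-scaled cross-entropy) objective introduced with SimCLR [Chen, 2020], written in plain Python for small lists of embedding vectors. It is a minimal illustration and not the SpectroSim implementation: the function names, the temperature of 0.5, and the toy embeddings are all assumptions, and the transformer encoder and projection head are omitted.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nt_xent(view_a, view_b, temperature=0.5):
    """NT-Xent loss where view_a[i] and view_b[i] form a positive pair.

    Every other embedding in the combined set acts as a negative.
    The temperature value is an assumed default, not the thesis setting.
    """
    z = list(view_a) + list(view_b)
    n = len(view_a)
    total = 0.0
    for i in range(2 * n):
        pos = (i + n) % (2 * n)  # index of the matching view
        pos_logit = cosine(z[i], z[pos]) / temperature
        # Denominator sums over all embeddings except the anchor itself.
        denom = sum(
            math.exp(cosine(z[i], z[k]) / temperature)
            for k in range(2 * n)
            if k != i
        )
        total += math.log(denom) - pos_logit
    return total / (2 * n)


# Toy embeddings: in `aligned`, each pair points the same way;
# in `swapped`, the pairing is deliberately mismatched.
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]
swapped = [[0.1, 0.9], [0.9, 0.1]]
```

The loss is smaller when paired embeddings are similar and unpaired ones are dissimilar, so `nt_xent(anchors, aligned)` comes out lower than `nt_xent(anchors, swapped)`. For batch detection, a pair of spectra could then be scored by the similarity of their learned embeddings rather than by a separate binary classifier.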
In Section, the thesis concludes with a review of the key contributions and findings and remarks on the overall significance of the work to the field of REIMS marine biomass analysis. The limitations of the research and directions for future work are then discussed, followed by some concluding remarks.


References (60)

Ahles (2025) — A meta-analysis of seafood species mislabeling in the United States · 13 citations
Australia (2016) — Melbourne restaurant hunky dory accused of serving catfish to customers instead of dory
Balog (2010) — Identification of biological tissues by rapid evaporative ionization mass spectrometry · 208 citations
Balog (2013) — Intraoperative tissue identification using rapid evaporative ionization mass spectrometry · 551 citations
Balog (2016) — Identification of the Species of Origin for Meat Products by Rapid Evaporative Ionization Mass Spectrometry · 128 citations
Black (2017) — A real time metabolomic profiling approach to detecting fish fraud using rapid evaporative ionisation mass spectrometry · 94 citations
Black (2019) — Rapid detection and specific identification of offals within minced beef samples utilising ambient mass spectrometry · 49 citations
Boccard (2013) — A consensus orthogonal partial least squares discriminant analysis (OPLS-DA) strategy for multiblock Omics data fusion · 333 citations
Bozinovski (1976) — The influence of pattern similarity and transfer learning upon training of a base perceptron b2
Bylesjo (2006) — OPLS discriminant analysis: combining the strengths of PLS-DA and SIMCA classification · 1,379 citations
Cafarella (2024) — Rapid evaporative ionization mass spectrometry: A survey through 15 years of applications · 7 citations
Cardoso (2025) — Prediction of coffee traits by artificial neural networks and laser-assisted rapid evaporative ionization mass spectrometry · 3 citations
Chen (2020) — A simple framework for contrastive learning of visual representations · 23,578 citations
Co-operation (2021) — Fisheries and Aquaculture in Norway · 10 citations
Dahariya (2025) — Enhancing Livestock Supply Chains with Blockchain Traceability from Source to Market: A Survey
De Graeve (2023) — Multivariate versus machine learning-based classification of rapid evaporative Ionisation mass spectrometry spectra towards industry based large-scale fish speciation · 27 citations
Devlin (2018) — Bert: Pre-training of deep bidirectional transformers for language understanding · 112,446 citations
Do (2025) — The investigation of seafood mislabeling in Asia: a review and coping strategies · 5 citations
FAO (2020) — The State of World Fisheries and Aquaculture, 2020
Fellows (2017) — Food Processing Technology: Principles and Practice · 789 citations
Food (2024) — The State of World Fisheries and Aquaculture 2024
Gao (2025) — Accurate lamb origin identification and molecular differentiation analysis using rapid evaporative ionization mass spectrometry
Gastaldi Garcia (2025) — Technical traceability systems in the food and fisheries sectors: A comparative analysis of Spain and Sweden
Ghaly (2013) — Fish processing wastes as a potential source of proteins · 411 citations
Gkarane (2025) — Towards Real-Time Industry-Proof Pork Breed and Boar Taint Classification Using Rapid Evaporative Ionisation Mass Spectrometry (Reims) · 0 citations
Goodfellow (2016) — Deep Learning · 50,820 citations
Helyar (2014) — Fish product mislabelling: failings of traceability in the production chain and implications for illegal, unreported and unregulated (IUU) fishing · 168 citations
Henderson (2025) — Advancements in Ambient Ionisation Mass Spectrometry in 2024: An Annual Review · 12 citations
Industries (2024) — Hoki: New Zealand's largest fishery · 16 citations
Jacobs (1991) — Adaptive mixtures of local experts
Jennings (2001) — Marine Fisheries Ecology · 479 citations
Jiang (2025) — Traceability Data in the form of Digital Food Product Passports for Fish Supply Chains · 0 citations
Kaiser (2017) — One model to learn them all · 345 citations
Koppen (2000) — The curse of dimensionality · 0 citations
Lu (2024) — Comparative evaluating laser ionization and iKnife coupled with rapid evaporative ionization mass spectrometry and machine learning for geographical authentication of Larimichthys crocea · 9 citations
Mai (2010) — Benefits of traceability in fish supply chains–case studies · 128 citations
Marín (2025) — A meta-review of DNA-based identification methods and mislabeling analysis of Eastern South Pacific seafood · 2 citations
Moens (2003) — Production and use of food-grade lubricants · 13 citations
Montgomery (2019) — Introduction to Statistical Quality Control · 4,827 citations
Pardo (2016) — Misdescription incidents in seafood sector · 150 citations
Premanandh (2013) — Horse meat scandal–A wake-up call for regulatory authorities · 207 citations
Pruekprasert (2025) — Minimally Invasive Evaluation of Venous Leg Ulcers in an Outpatient Setting Using Rapid Evaporative Ionization Mass Spectrometry Coupled with a Carbon Dioxide Laser · 0 citations
Ribeiro (2016) — "Why should i trust you?" Explaining the predictions of any classifier · 21,085 citations
Russell (2016) — Artificial intelligence: a modern approach · 1,099 citations
Schafer (2009) — In vivo, in situ tissue analysis using rapid evaporative ionization mass spectrometry · 294 citations
Selvaraju (2017) — Grad-cam: Visual explanations from deep networks via gradient-based localization · 25,775 citations
Shen (2020) — Development of an intelligent surgical knife rapid evaporative ionization mass spectrometry based method for real-time differentiation of cod from oilfish · 22 citations
Shen (2022) — Detection of fish frauds (basa catfish and sole fish) via iKnife rapid evaporative ionization mass spectrometry: An in situ and real-time analytical method
Simopoulos (2011) — Evolutionary aspects of diet: the omega-6/omega-3 ratio and the brain · 600 citations
Stevens (2018) — Fish byproducts are an under-valued and untapped resource
Tan (2018) — A survey on deep transfer learning · 2,860 citations
Turkson (2025) — Digital technologies for traceability and transparency in the global fish supply chains: a systematic review and future directions · 26 citations
Untal (2025) — Fishers' preference for mobile traceability platform: challenges in achieving a digital tuna supply chain in Davao Region, Philippines · 4 citations
Vaswani (2017) — Attention is all you need · 171,736 citations
Verplanken (2017) — Rapid evaporative ionization mass spectrometry for high-throughput screening in food analysis: The case of boar taint · 78 citations
Wolpert (1992) — Stacked generalization · 6,303 citations
Wood (2025) — Hook, Line, and Spectra: Machine Learning for Fish Species Identification and Body Part Classification using Rapid Evaporative Ionization Mass Spectrometry · 0 citations
Xing (2016) — Automated Inspection of Food Products by Machine Vision · 13 citations
Xue (2025) — From Laboratory Exploration to Practice: Applications, Challenges, and Development Trends of Rapid Evaporative Ionization Mass Spectrometry Technology in Food Detection · 0 citations
Zhang (2025) — Beyond mislabelling: Chinese fish balls authentication by metabarcoding allows unveiling hidden mammal and avian species
