Genetic programming for multiple-feature construction on high-dimensional classification
Abstract
Data representation is an important factor in the performance of machine learning algorithms, including classification. Feature construction (FC) can combine original features to form high-level ones that help classification algorithms achieve better performance. Genetic programming (GP) has shown promise in FC due to its flexible representation. Most GP methods construct a single feature, which may not scale well to high-dimensional data. This paper investigates different approaches to constructing multiple features and analyses their effectiveness, efficiency, and underlying behaviours to give insight into multiple-feature construction using GP on high-dimensional data. The results show that multiple-feature construction achieves significantly better performance than single-feature construction. Within multiple-feature construction, the multi-tree GP representation is shown to be more effective than the single-tree representation, thanks to its ability to consider the interaction of the newly constructed features during the construction process. Class-dependent constructed features achieve better performance than class-independent ones. A visualisation of the constructed features also demonstrates the interpretability of the GP-based FC approach, which is important to many real-world applications.
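As a rough illustration of the multi-tree idea described in the abstract, the sketch below evaluates one individual made of several expression trees, each tree producing one constructed feature. It is only a minimal sketch under stated assumptions: the tuple-based tree encoding, the function set, and the names `FUNCTIONS`, `evaluate`, and `construct_features` are hypothetical and are not taken from the paper, and the evolutionary search that would produce such trees is omitted.

```python
import operator

# Assumed function set, chosen for illustration only.
FUNCTIONS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def evaluate(tree, row):
    """Evaluate one expression tree on one instance (a list of feature values)."""
    if isinstance(tree, int):
        # Terminal node: the index of an original feature.
        return row[tree]
    # Internal node: (operator symbol, left subtree, right subtree).
    op, left, right = tree
    return FUNCTIONS[op](evaluate(left, row), evaluate(right, row))


def construct_features(individual, data):
    """Return one constructed feature per tree for every instance in data."""
    return [[evaluate(tree, row) for tree in individual] for row in data]


# Hypothetical multi-tree individual: two trees, hence two constructed features.
individual = [("+", 0, ("*", 1, 2)), ("-", 3, 0)]
data = [[1.0, 2.0, 3.0, 4.0], [0.5, 0.0, 2.0, 1.5]]
constructed = construct_features(individual, data)
print(constructed)
```

Because all trees of an individual are evaluated together, a fitness function applied to `constructed` would score the feature set as a whole, which is one plausible reading of how a multi-tree representation can account for interactions between constructed features.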
Cited in this thesis
Frequently Cited Together
- One model to learn them all (2 chapters)
- From Laboratory Exploration to Practice: Applications, Challenges, and Developme (2 chapters)
- Deep Learning (2 chapters)
- Stacked generalization (2 chapters)
- Grad-CAM: Visual explanations from deep networks via gradient-based localization (2 chapters)
- BERT: Pre-training of deep bidirectional transformers for language understanding (2 chapters)
BibTeX
@article{Tran2019,
author = {Tran, Binh and Xue, Bing and Zhang, Mengjie},
journal = {Pattern Recognition},
title = {Genetic programming for multiple-feature construction on high-dimensional classification},
year = {2019},
pages = {404--417},
volume = {93},
publisher = {Elsevier},
}