Genetic programming for multiple-feature construction on high-dimensional classification

Binh Tran, Bing Xue, Mengjie Zhang

Pattern Recognition, 2019. Cited 95 times.

Abstract

Data representation is an important factor in the performance of machine learning algorithms, including classification. Feature construction (FC) combines original features into high-level features that can help classification algorithms achieve better performance. Genetic programming (GP) has shown promise in FC owing to its flexible representation. Most GP methods, however, construct a single feature, which may not scale well to high-dimensional data. This paper investigates different approaches to constructing multiple features and analyses their effectiveness, efficiency, and underlying behaviour to give insight into multiple-feature construction using GP on high-dimensional data. The results show that multiple-feature construction achieves significantly better performance than single-feature construction. Within multiple-feature construction, the multi-tree GP representation is more effective than the single-tree representation because it can account for interactions among the newly constructed features during the construction process. Class-dependent constructed features achieve better performance than class-independent ones. A visualisation of the constructed features also demonstrates the interpretability of the GP-based FC approach, which is important in many real-world applications.
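
To make the multi-tree idea concrete, the following is a minimal Python sketch and not the authors' implementation. It assumes a deliberately simple encoding: a tree is either an integer (the index of an original feature) or a tuple `(op, left, right)`, and an individual is a list of such trees, each producing one constructed feature. The names `OPS`, `evaluate`, and `construct_features`, the function set, and the toy data are all hypothetical choices made for illustration. Evolutionary search, fitness evaluation, and class-dependent construction are omitted.

```python
import operator

# Function set for internal nodes (an assumed, illustrative choice).
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def evaluate(tree, row):
    """Evaluate one expression tree on one instance.

    A tree is either an int (index of an original feature) or a
    tuple (op, left, right) whose children are trees.
    """
    if isinstance(tree, int):
        return row[tree]
    op, left, right = tree
    return OPS[op](evaluate(left, row), evaluate(right, row))


def construct_features(individual, X):
    """Map every instance to one constructed feature per tree."""
    return [[evaluate(tree, row) for tree in individual] for row in X]


# A hypothetical multi-tree individual: two trees, so two constructed features.
individual = [
    ("+", 0, ("*", 1, 2)),  # f0 + f1 * f2
    ("-", 3, 0),            # f3 - f0
]

# Toy data: two instances with four original features each.
X = [
    [1.0, 2.0, 3.0, 4.0],
    [0.5, 1.0, 2.0, 0.0],
]

X_new = construct_features(individual, X)
print(X_new)  # [[7.0, 3.0], [2.5, -0.5]]
```

In this sketch a single-tree individual is just the special case of a one-element list. Because all trees of a multi-tree individual transform the data together, a fitness function applied to `X_new` would score the constructed features jointly, which is the kind of interaction the abstract attributes to the multi-tree representation.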

BibTeX
@article{Tran2019,
  author = {Tran, Binh and Xue, Bing and Zhang, Mengjie},
  journal = {Pattern Recognition},
  title = {Genetic programming for multiple-feature construction on high-dimensional classification},
  year = {2019},
  pages = {404--417},
  volume = {93},
  publisher = {Elsevier},
}