Gaussian Error Linear Units (GELUs)

Dan Hendrycks, Kevin Gimpel

arXiv preprint arXiv:1606.08415, 2016. Cited 6,545 times.

Abstract

We propose the Gaussian Error Linear Unit (GELU), a high-performing neural network activation function. The GELU activation function is $x\Phi(x)$, where $\Phi(x)$ is the standard Gaussian cumulative distribution function. The GELU nonlinearity weights inputs by their value, rather than gating inputs by their sign as in ReLUs ($x\mathbf{1}_{x>0}$). We perform an empirical evaluation of the GELU nonlinearity against the ReLU and ELU activations and find performance improvements across all considered computer vision, natural language processing, and speech tasks.
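To make the contrast between weighting by value and gating by sign concrete, here is a minimal sketch in plain Python. It is not the authors' code; the function names `std_normal_cdf`, `gelu`, and `relu` are our own, and it assumes the exact form $x\Phi(x)$ computed through the standard identity $\Phi(x) = \tfrac{1}{2}\left[1 + \operatorname{erf}(x/\sqrt{2})\right]$.

```python
import math


def std_normal_cdf(x: float) -> float:
    """Standard Gaussian CDF Phi(x), expressed via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu(x: float) -> float:
    """GELU(x) = x * Phi(x): the input is scaled by Phi of its own value."""
    return x * std_normal_cdf(x)


def relu(x: float) -> float:
    """ReLU(x) = x * 1[x > 0]: the input is passed or zeroed by its sign."""
    return x if x > 0 else 0.0


if __name__ == "__main__":
    # Print both activations side by side on a few sample inputs.
    for x in (-3.0, -1.0, -0.5, 0.0, 0.5, 1.0, 3.0):
        print(f"x={x:+.1f}  relu={relu(x):+.4f}  gelu={gelu(x):+.4f}")
```

For example, `relu(-1.0)` returns 0.0, whereas `gelu(-1.0)` is approximately -0.1587: a negative input is scaled by its small but nonzero weight $\Phi(-1)$ instead of being zeroed out. For inputs of large magnitude, $\Phi(x)$ approaches 0 or 1 and the two functions nearly coincide.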

BibTeX
@article{Hendrycks2016,
  author = {Hendrycks, Dan and Gimpel, Kevin},
  journal = {arXiv preprint arXiv:1606.08415},
  title = {Gaussian Error Linear Units ({GELUs})},
  year = {2016},
}