One Model To Learn Them All

Lukasz Kaiser, Aidan N. Gomez, Noam Shazeer, Ashish Vaswani, Niki Parmar, Llion Jones et al.

2017, arXiv.org, cited 345 times

Abstract

Deep learning yields great results across many fields, from speech recognition, image classification, to translation. But for each problem, getting a deep model to work well involves research into the architecture and a long period of tuning. We present a single model that yields good results on a number of problems spanning multiple domains. In particular, this single model is trained concurrently on ImageNet, multiple translation tasks, image captioning (COCO dataset), a speech recognition corpus, and an English parsing task. Our model architecture incorporates building blocks from multiple domains. It contains convolutional layers, an attention mechanism, and sparsely-gated layers. Each of these computational blocks is crucial for a subset of the tasks we train on. Interestingly, even if a block is not crucial for a task, we observe that adding it never hurts performance and in most cases improves it on all tasks. We also show that tasks with less data benefit largely from joint training with other tasks, while performance on large tasks degrades only slightly if at all.

Cited in this thesis

Introduction
Literature Survey
Fish Species and Part Identification
Oil Contamination and Cross-Species Adulteration
Conclusions

Frequently Cited Together

A real time metabolomic profiling approach to detecting fish fraud using rapid e... (Black 2017), 5 chapters
Attention is all you need (Vaswani 2017), 5 chapters
From Laboratory Exploration to Practice: Applications, Challenges, and Developme... (Xue 2025), 5 chapters
Development of an intelligent surgical knife rapid evaporative ionization mass s... (Shen 2020), 5 chapters
Detection of fish frauds (basa catfish and sole fish) via iKnife rapid evaporati... (Shen 2022), 5 chapters
Multivariate versus machine learning-based classification of rapid evaporative I... (De Graeve 2023), 5 chapters

BibTeX

@article{Kaiser2017,
  author = {Kaiser, Lukasz and others},
  journal = {arXiv preprint arXiv:1706.05137},
  title = {One model to learn them all},
  year = {2017},
}