How Effective is Task-Agnostic Data Augmentation for Pre-trained Transformers?
Authors: Shayne Longpre, Yu Wang, Christopher DuBois
Task-agnostic forms of data augmentation have proven widely effective in computer vision, even on pretrained models. In NLP, similar results are reported most commonly for low-data regimes, non-pretrained models, or only situationally for pretrained models. In this paper we ask how effective these techniques really are when applied to pretrained transformers. Using two popular varieties of task-agnostic data augmentation (not tailored to any particular task), Easy Data Augmentation (Wei and Zou, 2019) and Back-Translation (Sennrich et al., 2015), we conduct a systematic examination of their effects across 5 classification tasks, 6 datasets, and 3 variants of modern pretrained transformers, including BERT, XLNet, and RoBERTa. We observe a negative result, finding that techniques which previously reported strong improvements for non-pretrained models fail to consistently improve performance for pretrained transformers, even when training data is limited. We hope this empirical analysis helps inform practitioners where data augmentation techniques may confer improvements.
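For readers unfamiliar with Easy Data Augmentation, the sketch below illustrates the general idea of its word-level perturbations (synonym replacement, random swap, random deletion). It is a minimal, self-contained example using a toy synonym table in place of WordNet, not the authors' implementation or the exact EDA recipe from Wei and Zou (2019).

```python
import random

# Toy synonym table; EDA proper draws synonyms from WordNet.
SYNONYMS = {
    "quick": ["fast", "rapid"],
    "happy": ["glad", "joyful"],
    "movie": ["film"],
}

def eda_augment(sentence, p=0.1, seed=None):
    """Apply EDA-style synonym replacement, random swap, and random deletion."""
    rng = random.Random(seed)
    words = sentence.split()

    # Synonym replacement: swap each word for a synonym with probability p.
    words = [rng.choice(SYNONYMS[w]) if w in SYNONYMS and rng.random() < p else w
             for w in words]

    # Random swap: exchange two word positions with probability p.
    if len(words) > 1 and rng.random() < p:
        i, j = rng.sample(range(len(words)), 2)
        words[i], words[j] = words[j], words[i]

    # Random deletion: drop each word with probability p, keeping at least one word.
    kept = [w for w in words if rng.random() > p]
    return " ".join(kept if kept else [rng.choice(words)])

print(eda_augment("the quick brown fox saw a happy dog in the movie", p=0.3, seed=0))
```

Back-translation, the other technique examined, instead paraphrases a sentence by machine-translating it into a pivot language and back into English.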
Apple sponsored the Empirical Methods in Natural Language Processing (EMNLP) conference, which was held virtually from November 16 to 20, 2020. EMNLP is a leading conference focused on natural language processing.