How Effective is Task-Agnostic Data Augmentation for Pre-trained Transformers?
Authors: Shayne Longpre, Yu Wang, Christopher DuBois
Task-agnostic forms of data augmentation have proven widely effective in computer vision, even on pretrained models. In NLP, similar results are reported most commonly for low-data regimes, non-pretrained models, or only situationally for pretrained models. In this paper we ask how effective these techniques really are when applied to pretrained transformers. Using two popular varieties of task-agnostic data augmentation (not tailored to any particular task), Easy Data Augmentation (Wei and Zou, 2019) and Back-Translation (Sennrich et al., 2015), we conduct a systematic examination of their effects across 5 classification tasks, 6 datasets, and 3 variants of modern pretrained transformers, including BERT, XLNet, and RoBERTa. We observe a negative result, finding that techniques which previously reported strong improvements for non-pretrained models fail to consistently improve performance for pretrained transformers, even when training data is limited. We hope this empirical analysis helps inform practitioners where data augmentation techniques may confer improvements.
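For readers unfamiliar with Easy Data Augmentation, the sketch below illustrates the general idea of its word-level perturbations (synonym replacement, random swap, random deletion). It is a minimal, self-contained example using a toy synonym table in place of WordNet, not the authors' implementation or the exact EDA recipe from Wei and Zou (2019).

```python
import random

# Toy synonym table; EDA proper draws synonyms from WordNet.
SYNONYMS = {
    "quick": ["fast", "rapid"],
    "happy": ["glad", "joyful"],
    "movie": ["film"],
}

def eda_augment(sentence, p=0.1, seed=None):
    """Apply EDA-style synonym replacement, random swap, and random deletion."""
    rng = random.Random(seed)
    words = sentence.split()

    # Synonym replacement: swap each word for a synonym with probability p.
    words = [rng.choice(SYNONYMS[w]) if w in SYNONYMS and rng.random() < p else w
             for w in words]

    # Random swap: exchange two word positions with probability p.
    if len(words) > 1 and rng.random() < p:
        i, j = rng.sample(range(len(words)), 2)
        words[i], words[j] = words[j], words[i]

    # Random deletion: drop each word with probability p, keeping at least one word.
    kept = [w for w in words if rng.random() > p]
    return " ".join(kept if kept else [rng.choice(words)])

print(eda_augment("the quick brown fox saw a happy dog in the movie", p=0.3, seed=0))
```

Back-translation, the other technique examined, instead paraphrases a sentence by machine-translating it into a pivot language and back into English.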
Apple sponsored the Empirical Methods in Natural Language Processing (EMNLP) conference, which was held virtually from November 16 to 20, 2020. EMNLP is a leading conference focused on natural language processing.