This paper was accepted to the Workshop on Distribution Shifts at NeurIPS 2023.

Large-scale training of models has become increasingly expensive. In an ever-changing world where petabytes of new data are generated every day, we want to be able to train models continually. In this paper, we create a benchmark for continual large-scale training of CLIP models where the data distribution varies only by time. Unlike the traditional continual learning literature, there is no hard separation of tasks: we assume that an infinite stream of data arrives in a canonical format and exhibits natural distribution shifts as time passes. We create multiple such benchmarks for CLIP training based on standard benchmarks such as DataComp and YFCC15M. We propose various evaluations and demonstrate that models trained on data up to a certain year lose performance on certain categories of rapidly changing data. We propose simple learning rate schedules and training with replay buffers to reduce the gap in forward transfer. We demonstrate that a simple baseline that continues training from the last checkpoint and replays old data can be competitive with an Oracle that receives all data up to the present in one pass and trains with a large budget.
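
The replay idea mentioned above can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `build_training_mix`, the `replay_fraction` parameter, and the equal split of the replay budget across earlier time steps are all assumptions made for illustration. It only shows one plausible way to combine newly arrived data with samples drawn from earlier time steps before continuing training from the last checkpoint.

```python
import random


def build_training_mix(new_data, old_buckets, replay_fraction=0.5, seed=0):
    """Mix current-time-step examples with examples replayed from earlier steps.

    new_data:        list of examples from the current time step.
    old_buckets:     list of lists, one per earlier time step.
    replay_fraction: hypothetical knob, the share of the mix taken from old data.
    """
    rng = random.Random(seed)
    total = len(new_data)
    # With no earlier data there is nothing to replay.
    n_old = int(total * replay_fraction) if old_buckets else 0
    n_new = total - n_old
    mix = rng.sample(new_data, n_new)
    # Assumption: split the replay budget equally across earlier time steps.
    per_bucket = n_old // len(old_buckets) if old_buckets else 0
    for bucket in old_buckets:
        mix.extend(rng.sample(bucket, min(per_bucket, len(bucket))))
    rng.shuffle(mix)
    return mix


# Toy usage: tag each example with the time step it came from.
new_data = [("new", i) for i in range(100)]
old_buckets = [
    [("2014", i) for i in range(200)],
    [("2015", i) for i in range(200)],
]
mix = build_training_mix(new_data, old_buckets, replay_fraction=0.5)
```

In this toy setup, half of the mix comes from the current time step and the other half is divided evenly between the two earlier steps. Other allocation rules, such as weighting recent steps more heavily, would fit the same interface.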

Related readings and updates.

This paper was accepted at the Scalable Continual Learning for Lifelong Foundation Models (SCLLFM) Workshop at NeurIPS 2024. Large Language Models (LLMs) trained on historical web data inevitably become outdated. We investigate evaluation strategies and update methods for LLMs as new data becomes available. We introduce a web-scale dataset for time-continual pretraining of LLMs derived from 114 dumps of Common Crawl (CC) - orders of magnitude…
Keeping large foundation models up to date on the latest data is inherently expensive. To avoid the prohibitive costs of constantly retraining, it is imperative to continually train these models. This problem is exacerbated by the lack of any large-scale continual learning benchmarks or baselines. We introduce the first set of web-scale Time-Continual (TiC) benchmarks for training vision-language models: TiC-DataComp, TiC-YFCC, and TiC-Redcaps…