View publication

*Equal Contributors

This paper was accepted at the International Workshop on Federated Learning in the Age of Foundation Models (FL@FM) at NeurIPS 2023.

Personalized federated learning (PFL) aims at learning personalized models for users in a federated setup. We focus on the problem of privately estimating histograms (in the KL metric) for each user in the network. Conventionally, for more general problems, learning a global model jointly via federated averaging, and then finetuning locally for each user has been a winning strategy. But this can be suboptimal if the user distribution observes diverse subpopulations, as one might expect with user vocabularies. To tackle this, we study an alternative PFL technique: clustering-based personalization that first identifies diverse subpopulations when present, enabling users to collaborate more closely with others from the same subpopulation. We motivate our algorithm via a stylized generative process mixture of Dirichlets, and propose initialization/pre-processing techniques that reduce the iteration complexity of clustering. This enables the application of privacy mechanisms at each step of our iterative procedure, making the algorithm user-level differentially private without a severe drop in utility due to added noise. Finally, we present empirical results on Reddit user's data, where we compare our method with other well-known PFL approaches applied to private histogram estimation.

Related readings and updates.

This paper presents an implementation of machine learning model training using private federated learning (PFL) on edge devices. We introduce a novel framework that uses PFL to address the challenge of training a model using users' private data. The framework ensures that user data remain on individual devices, with only essential model updates transmitted to a central server for aggregation with privacy guarantees. We detail the architecture of…
Read more
Motivated by the problem of next word prediction on user devices we introduce and study the problem of personalized frequency histogram estimation in a federated setting. In this problem, over some domain, each user observes a number of samples from a distribution which is specific to that user. The goal is to compute for all users a personalized estimate of the user's distribution with error measured in KL divergence. We focus on addressing two…
Read more