Summary of Dual-Personalizing Adapter for Federated Foundation Models, by Yiyuan Yang et al.
Dual-Personalizing Adapter for Federated Foundation Models
by Yiyuan Yang, Guodong Long, Tao Shen, Jing Jiang, Michael Blumenstein
First submitted to arXiv on: 28 Mar 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | Read the original abstract here |
| Medium | GrooveSquid.com (original content) | Recently, foundation models have demonstrated impressive adaptability to a wide range of tasks through fine-tuning on diverse instruction data. Federated foundation models (FedFM) have emerged as a privacy-preserving approach to collaborative model fine-tuning in federated learning settings, leveraging distributed datasets with non-IID data. To alleviate communication and computation overhead, parameter-efficient fine-tuning methods were introduced, and some research has adapted personalization methods to FedFM to better align models with user preferences. However, existing research neglects a critical issue in real-world applications: test-time distribution shifts. Conventional methods for handling test-time distribution shifts in personalized FL are less effective for FedFM, because they fail to adapt to complex distribution shift scenarios and require training all parameters. To bridge this gap, we refine the FedFM setting into what we term test-time personalization, which aims to learn personalized federated foundation models on clients while simultaneously handling test-time distribution shifts. We explore a simple yet effective solution, the Federated Dual-Personalizing Adapter (FedDPA) architecture, in which a foundation model works jointly with a global adapter and a local adapter to tackle test-time distribution shifts and client-specific personalization, respectively. Additionally, we introduce an instance-wise dynamic weighting mechanism that integrates the global and local adapters for each test instance during inference, facilitating effective test-time personalization; a code sketch of this dual-adapter inference appears below the table. The proposed method has been evaluated on benchmark datasets across different NLP tasks. |
| Low | GrooveSquid.com (original content) | Recently, a type of AI model called a foundation model has shown it can adapt to many different tasks by learning from large amounts of instruction data. This matters because it lets these models be used for many purposes without being retrained each time. One way to do this is through federated learning, where multiple devices or computers work together to train a model while keeping their own data private. However, there is a problem with this approach: when the model is used in real-world situations, it often performs poorly because the data seen at test time differs from the data used during training. To solve this issue, the researchers have developed a new setting called test-time personalization, which learns personalized models on individual devices while still handling these differences in data. This is achieved through an architecture called the Federated Dual-Personalizing Adapter (FedDPA), which combines two types of adapters: a local one for device-specific personalization and a global one for handling distribution shifts. The effectiveness of this approach has been evaluated on various natural language processing tasks. |
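To make the dual-adapter idea concrete, here is a minimal PyTorch sketch of FedDPA-style inference. It is a sketch under stated assumptions, not the paper's implementation: the adapters are modeled as LoRA-style low-rank updates on a frozen linear layer, and the weighting rule (`instance_weight`, its cosine-similarity form, and the `client_centroid` input) is a hypothetical stand-in, since the summary does not specify how the per-instance weight is computed.

```python
import torch
import torch.nn as nn

class DualAdapterLinear(nn.Module):
    """One FedDPA-style layer: a frozen base projection plus a shared
    global adapter and a client-local adapter (LoRA-style low-rank
    updates here, which is an assumption, not the paper's exact design)."""

    def __init__(self, d_in: int, d_out: int, rank: int = 8):
        super().__init__()
        self.base = nn.Linear(d_in, d_out)
        self.base.requires_grad_(False)  # foundation model weights stay frozen
        # Global adapter: shared across clients, targets distribution shifts.
        self.g_down = nn.Linear(d_in, rank, bias=False)
        self.g_up = nn.Linear(rank, d_out, bias=False)
        # Local adapter: kept on-device, targets client personalization.
        self.l_down = nn.Linear(d_in, rank, bias=False)
        self.l_up = nn.Linear(rank, d_out, bias=False)

    def forward(self, x: torch.Tensor, alpha: torch.Tensor) -> torch.Tensor:
        """alpha in [0, 1] weights the local adapter for this instance."""
        h = self.base(x)
        g = self.g_up(self.g_down(x))
        l = self.l_up(self.l_down(x))
        # Instance-wise dynamic weighting: blend the two adapter outputs.
        return h + (1.0 - alpha) * g + alpha * l


def instance_weight(x_embed: torch.Tensor,
                    client_centroid: torch.Tensor,
                    temperature: float = 1.0) -> torch.Tensor:
    """Hypothetical weighting rule: the closer a test instance is to the
    client's own training data, the more the local adapter is trusted."""
    sim = torch.cosine_similarity(x_embed, client_centroid, dim=-1)
    return torch.sigmoid(sim / temperature).unsqueeze(-1)


# Toy usage: route one test instance through the dual-adapter layer.
layer = DualAdapterLinear(d_in=16, d_out=16)
x = torch.randn(1, 16)                # a single test instance
centroid = torch.randn(16)            # mean embedding of local training data
alpha = instance_weight(x, centroid)  # shape (1, 1), broadcast over d_out
y = layer(x, alpha)
print(y.shape)  # torch.Size([1, 16])
```

One natural design consistent with the summary: during federated training, only the global adapter is aggregated by the server while each local adapter stays on its client, and the frozen base model keeps communication and computation costs low.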
Keywords
» Artificial intelligence » Alignment » Federated learning » Fine-tuning » Inference » Natural language processing » NLP » Parameter efficient