Summary of Fair Federated Data Clustering Through Personalization: Bridging the Gap Between Diverse Data Distributions, by Shivam Gupta et al.
Fair Federated Data Clustering through Personalization: Bridging the Gap between Diverse Data Distributions
by Shivam Gupta, Tarushi, Tsering Wangzes, Shweta Jain
First submitted to arXiv on: 5 Jul 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: None
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | High Difficulty Summary Read the original abstract here |
Medium | GrooveSquid.com (original content) | Medium Difficulty Summary Machine learning algorithms have seen significant performance boosts due to the rapid growth of data from edge devices. However, traditional machine learning paradigms face two major challenges: centralizing data for training and dealing with missing class labels in most generated data. There is little incentive for clients to manually label their data due to the high cost and their lack of expertise. To overcome these issues, initial attempts have been made to handle unlabelled data in a privacy-preserving distributed manner using unsupervised federated data clustering. The goal is to partition the data available on clients into k partitions (called clusters) without any actual exchange of data. Most existing algorithms are highly dependent on data distribution patterns across clients or are computationally expensive. We propose p-FClus, which addresses the goals of achieving lower clustering cost and uniform cost across clients in a single round of communication between the server and clients. We validate the efficacy of p-FClus against various federated datasets, showcasing its data-independent nature and applicability to any finite p-norm, while simultaneously achieving lower cost and variance. |
Low | GrooveSquid.com (original content) | Low Difficulty Summary Machine learning algorithms have gotten better because of lots of data from devices like smartphones. But there are two big problems: all the data needs to be sent to one place for training, and most of the time, we don't have labels for it. It's hard for people to label their own data because it takes too long and they're not experts. To fix this, some researchers have tried using unsupervised clustering on devices without sharing actual data. The goal is to group similar data together on each device. Many existing methods rely on knowing how the data is spread out, or are really slow. We came up with a new method called p-FClus that tries to balance two things: keeping the cost of clustering low on each device, and making sure that cost is about the same across all devices. We tested p-FClus on many different datasets and showed that it handles different kinds of data well and is fair. |
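To make the setup concrete, here is a minimal sketch of the general single-round federated clustering idea the summaries describe: each client clusters its own data locally and sends only its centroids to the server, which merges them into k global centers. This is an illustrative toy (plain k-means, made-up helper names), not the authors' actual p-FClus algorithm.

```python
import numpy as np

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means; returns k centroids. Used here only as a
    stand-in for whatever local/server clustering routine is actually used."""
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        # Assign each point to its nearest centroid (Euclidean distance).
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centroids; keep the old centroid if a cluster empties.
        for j in range(k):
            members = points[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    return centroids

def federated_cluster(client_datasets, k):
    """One communication round: each client clusters locally, then the
    server clusters the pooled centroids. Raw data never leaves a client."""
    local_centroids = [kmeans(data, k) for data in client_datasets]  # on-device
    pooled = np.vstack(local_centroids)  # only centroids are sent to the server
    return kmeans(pooled, k)             # server-side merge into k global centers

# Toy example: three clients with differently shifted Gaussian blobs.
rng = np.random.default_rng(1)
clients = [rng.normal(loc=shift, scale=0.3, size=(100, 2)) for shift in (0.0, 3.0, 6.0)]
global_centers = federated_cluster(clients, k=3)
print(global_centers.shape)  # (3, 2)
```

The fairness angle in the paper would additionally require the per-client clustering cost to be similar across clients, which this naive merge does not enforce.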
Keywords
» Artificial intelligence » Clustering » Machine learning » Unsupervised