Summary of Better Knowledge Enhancement for Privacy-Preserving Cross-Project Defect Prediction, by Yuying Wang et al.
Better Knowledge Enhancement for Privacy-Preserving Cross-Project Defect Prediction
by Yuying Wang, Yichen Li, Haozhao Wang, Lei Zhao, Xiaofang Zhang
First submitted to arXiv on: 23 Dec 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Software Engineering (cs.SE)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. The summaries below all cover the same AI paper, each written at a different level of difficulty. The medium-difficulty and low-difficulty versions are original summaries written by GrooveSquid.com, while the high-difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | Read the original abstract here |
| Medium | GrooveSquid.com (original content) | FedDP is a Federated Learning (FL) approach to Cross-Project Defect Prediction (CPDP) that leverages data from multiple projects while preserving privacy. Because data heterogeneity across proprietary projects hinders federated model training, FedDP introduces two solutions: Local Heterogeneity Awareness and Global Knowledge Distillation. Open-source project data serves as the distillation dataset, and the global model is optimized against a heterogeneity-aware ensemble of the local models via knowledge distillation (see the sketch after this table). Experimental results on 19 projects from two datasets show that FedDP outperforms the baselines. |
| Low | GrooveSquid.com (original content) | FedDP helps predict defects in software projects without companies having to share sensitive data. The problem is hard because different projects have different data, which makes it tough to train a single good model. To solve this, the researchers train models with Federated Learning and add two ideas: first, each local model is made aware of how its data differs from the others; second, knowledge is distilled from the local models into a global model. The combined local models act as a “teacher” for the global “student” model, with open-source project data used as the shared practice material. The results show that this method works better than existing ones. |
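
The global knowledge-distillation step described in the medium summary can be pictured with a short, hypothetical sketch. The function name, the client `weights`, and the assumption that `distill_loader` yields batches of open-source project features are illustrative, not the authors' implementation; the fixed weighting merely stands in for the paper's heterogeneity-aware ensemble.

```python
# Hypothetical sketch of federated knowledge distillation (not the authors' code):
# a global "student" model is trained to match a weighted ensemble of local
# "teacher" models on an unlabeled open-source distillation dataset.
import torch
import torch.nn.functional as F

def distill_global_model(global_model, local_models, weights, distill_loader,
                         temperature=2.0, epochs=1, lr=1e-3):
    """Optimize the global model against the weighted soft predictions of the
    local models; `weights` stands in for heterogeneity-aware weighting."""
    opt = torch.optim.Adam(global_model.parameters(), lr=lr)
    for m in local_models:
        m.eval()                                      # teachers stay frozen
    for _ in range(epochs):
        for x in distill_loader:                      # batches of open-source features
            with torch.no_grad():
                ensemble_logits = sum(w * m(x) for w, m in zip(weights, local_models))
                teacher_probs = F.softmax(ensemble_logits / temperature, dim=-1)
            student_log_probs = F.log_softmax(global_model(x) / temperature, dim=-1)
            loss = F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")
            opt.zero_grad()
            loss.backward()
            opt.step()
    return global_model
```

In a full federated round, each client would first train its local model on its own private defect data; only model predictions on the public distillation set would inform the global update, which is what keeps the raw project data private.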
Keywords
» Artificial intelligence » Distillation » Federated learning » Knowledge distillation