
Summary of Efficient Semi-supervised Inference For Logistic Regression Under Case-control Studies, by Zhuojun Quan et al.


Efficient semi-supervised inference for logistic regression under case-control studies

by Zhuojun Quan, Yuanyuan Lin, Kani Chen, Wen Yu

First submitted to arXiv on: 23 Feb 2024

Categories

  • Main: Machine Learning (stat.ML)
  • Secondary: Machine Learning (cs.LG)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

Summary difficulty: High
Written by: Paper authors
High Difficulty Summary: See the paper’s original abstract.

Summary difficulty: Medium
Written by: GrooveSquid.com (original content)
Medium Difficulty Summary: This paper studies semi-supervised learning, where a labeled dataset containing outcomes and covariates is paired with an unlabeled dataset containing only covariates. The authors focus on binary outcome data collected through case-control sampling, a design that helps alleviate imbalance in binary data. Under the logistic model, case-control data alone yield consistent estimates of the slope parameters but not of the intercept. Once the unlabeled data are incorporated, the intercept becomes identifiable. The authors develop a likelihood function and an iterative algorithm to obtain maximum likelihood estimators that are consistent, asymptotically normal, and semiparametrically efficient. Simulation studies assess the proposed method’s finite-sample performance and show improved estimation efficiency for the slope parameters and accurate estimates of the marginal case proportion.

Summary difficulty: Low
Written by: GrooveSquid.com (original content)
Low Difficulty Summary: Semi-supervised learning helps machines learn from a mix of labeled and unlabeled data. In this paper, researchers study how to use both types of data to predict a yes/no outcome, such as whether something will happen or not. They use a way of collecting data called “case-control sampling” that helps balance the amount of data for each outcome. With labeled data alone, they can estimate how each factor affects the outcome but not how common the outcome is overall; adding unlabeled data lets them estimate both. The researchers developed a new statistical method and an iterative algorithm to do this, and in simulations it gave accurate and more precise estimates.
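
The identifiability claim in the medium difficulty summary can be checked on a toy population. The sketch below is not the authors' likelihood or algorithm. It is a population-level illustration under assumed values (a five-point covariate distribution, true intercept -2, true slope 1, equal shares of cases and controls), and every name in it is hypothetical. It fits a logistic model to the expected cell weights of an idealized case-control sample, then uses the marginal covariate distribution, which is what unlabeled data would reveal, to solve for the intercept by bisection.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def logit(p):
    return math.log(p / (1.0 - p))

# Toy population; all values below are illustrative assumptions.
XS = [-2.0, -1.0, 0.0, 1.0, 2.0]   # covariate support
WX = [0.1, 0.2, 0.4, 0.2, 0.1]     # marginal P(X = x), as unlabeled data would reveal
ALPHA, BETA = -2.0, 1.0            # true intercept and slope
CASE_FRACTION = 0.5                # share of cases in the case-control sample

def case_proportion(alpha, beta):
    """Marginal P(Y = 1) implied by (alpha, beta) and the covariate distribution."""
    return sum(w * sigmoid(alpha + beta * x) for x, w in zip(XS, WX))

def case_control_cells():
    """Expected (x, y, weight) cells of an idealized, infinitely large case-control sample."""
    pi = case_proportion(ALPHA, BETA)
    cells = []
    for x, w in zip(XS, WX):
        p = sigmoid(ALPHA + BETA * x)
        cells.append((x, 1, CASE_FRACTION * w * p / pi))
        cells.append((x, 0, (1 - CASE_FRACTION) * w * (1 - p) / (1 - pi)))
    return cells

def fit_weighted_logistic(cells, iters=50):
    """Newton-Raphson for a one-covariate logistic model on weighted cells."""
    a = b = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y, w in cells:
            p = sigmoid(a + b * x)
            v = w * p * (1 - p)
            g0 += w * (y - p)
            g1 += w * (y - p) * x
            h00 += v
            h01 += v * x
            h11 += v * x * x
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b

def recover_intercept(a_cc, b, lo=-10.0, hi=10.0):
    """Bisection for alpha solving
    alpha - logit(P(Y = 1; alpha, b)) = a_cc - logit(CASE_FRACTION).
    The left side is increasing in alpha whenever b != 0."""
    target = a_cc - logit(CASE_FRACTION)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mid - logit(case_proportion(mid, b)) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

a_cc, b_hat = fit_weighted_logistic(case_control_cells())
alpha_hat = recover_intercept(a_cc, b_hat)
pi_hat = case_proportion(alpha_hat, b_hat)
print(f"case-control fit: intercept={a_cc:.4f}, slope={b_hat:.4f}")
print(f"recovered intercept={alpha_hat:.4f}, case proportion={pi_hat:.4f}")
```

Under these assumptions the case-control fit should return the true slope but a shifted intercept, while the bisection step should return the true intercept and the implied marginal case proportion, up to numerical error. With finite samples this plug-in step would be noisy; it only illustrates why the unlabeled covariates carry information about the intercept.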

Keywords

  • Artificial intelligence
  • Likelihood
  • Machine learning
  • Semi-supervised