Summary of Efficient Semi-supervised Inference For Logistic Regression Under Case-control Studies, by Zhuojun Quan et al.

Efficient semi-supervised inference for logistic regression under case-control studies

by Zhuojun Quan, Yuanyuan Lin, Kani Chen, Wen Yu

First submitted to arxiv on: 23 Feb 2024

GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

Summary difficulty	Written by	Summary
High	Paper authors	High Difficulty Summary Read the original abstract here
Medium	GrooveSquid.com (original content)	Medium Difficulty Summary This paper studies semi-supervised learning, in which a labeled dataset with outcomes and covariates is paired with an unlabeled dataset containing only covariates. The authors focus on binary outcome data collected through case-control sampling, a design that alleviates class imbalance by sampling cases and controls separately. Under the logistic model, case-control data alone yield consistent estimates of the slope parameters but not of the intercept. Once the unlabeled data are incorporated, the intercept becomes identifiable. The authors construct a likelihood function based on both datasets and develop an iterative algorithm to compute the maximum likelihood estimators, which are shown to be consistent, asymptotically normal, and semiparametrically efficient. Simulation studies assess the method's finite-sample performance and show improved estimation efficiency for the slope parameters along with accurate estimates of the marginal case proportion.
Low	GrooveSquid.com (original content)	Low Difficulty Summary Semi-supervised learning helps machines learn from a mix of labeled and unlabeled data. In this case, researchers are trying to figure out how to use both types of data to make predictions about whether something will happen or not (like a yes/no question). They’re using a special way of collecting data called “case-control sampling” that helps balance the amount of data they have for each outcome. With just labeled data, they can only estimate some things, but by adding in unlabeled data, they can get even more accurate results. The researchers came up with a new way to do this using math and computer algorithms, and it works really well.
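
The identifiability point in the summaries above can be illustrated with a small simulation. The sketch below is not the authors' maximum likelihood procedure; it swaps in a much simpler moment-matching plug-in, and every name and setting in it (`ALPHA`, `BETA`, the sample sizes, a single standard normal covariate) is an assumption made for illustration. It relies on the standard result that logistic regression fitted to case-control data recovers the slope, while the fitted intercept is shifted by log(n1/n0) minus the log-odds of the marginal case proportion. The unlabeled covariates are treated as draws from the whole population, so their mean is a mixture of the case mean and the control mean, and that mixture weight estimates the case proportion.

```python
import math
import random


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def fit_logistic(xs, ys, iters=25):
    """Newton-Raphson for a one-covariate logistic model; returns (intercept, slope)."""
    a, b = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(a + b * x)
            w = p * (1.0 - p)
            g0 += y - p            # score for the intercept
            g1 += (y - p) * x      # score for the slope
            h00 += w               # entries of the information matrix
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


def mean(values):
    return sum(values) / len(values)


# Hypothetical simulation settings, chosen only for this illustration.
ALPHA, BETA = -1.0, 1.0
N_CASES = N_CONTROLS = 5000
N_UNLABELED = 50000
rng = random.Random(0)

# Case-control sampling: draw from the population until both groups are full.
cases, controls = [], []
while len(cases) < N_CASES or len(controls) < N_CONTROLS:
    x = rng.gauss(0.0, 1.0)
    is_case = rng.random() < sigmoid(ALPHA + BETA * x)
    if is_case and len(cases) < N_CASES:
        cases.append(x)
    elif not is_case and len(controls) < N_CONTROLS:
        controls.append(x)

# Unlabeled data: covariates only, drawn from the whole population.
unlabeled = [rng.gauss(0.0, 1.0) for _ in range(N_UNLABELED)]

# Step 1: ordinary logistic regression on the labeled case-control sample.
alpha_cc, beta_hat = fit_logistic(cases + controls, [1] * N_CASES + [0] * N_CONTROLS)

# Step 2: moment-matching estimate of the marginal case proportion, using
# mean(unlabeled) = pi * mean(cases) + (1 - pi) * mean(controls).
pi_hat = (mean(unlabeled) - mean(controls)) / (mean(cases) - mean(controls))

# Step 3: undo the intercept shift induced by case-control sampling.
alpha_hat = alpha_cc - math.log(N_CASES / N_CONTROLS) + math.log(pi_hat / (1.0 - pi_hat))

print("slope estimate:", beta_hat)
print("case-control intercept (shifted):", alpha_cc)
print("estimated case proportion:", pi_hat)
print("corrected intercept:", alpha_hat)
```

With labeled data alone, only `beta_hat` is trustworthy; `alpha_cc` reflects the sampling design rather than the population. The correction in step 3 is possible only because the unlabeled sample carries information about the population covariate distribution. The paper's estimator uses that information more efficiently than this plug-in does.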

Keywords

* Artificial intelligence
* Likelihood
* Machine learning
* Semi-supervised
