Summary of An Experiment on Feature Selection Using Logistic Regression, by Raisa Islam et al.
An Experiment on Feature Selection using Logistic Regression
by Raisa Islam, Subhasish Mazumdar, Rakibul Islam
First submitted to arXiv on: 31 Jan 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: None
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. The summaries below all cover the same paper but are written at different levels of difficulty: the medium- and low-difficulty versions are original summaries written by GrooveSquid.com, while the high-difficulty version is the paper’s original abstract. Feel free to read the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | Read the original abstract here |
| Medium | GrooveSquid.com (original content) | The paper investigates feature selection for supervised machine learning using L1 and L2 regularization with logistic regression (LR). Findings from both penalties are synthesized into a single feature set to improve explainability and performance. The experiments use the CIC-IDS2018 dataset, which contains two classes that are hard to separate. The study compares LR+L1 against LR+L2 while varying the feature-set size for each ranking and finds no significant difference in accuracy between the two methods once the feature set is selected. The synthesized feature set is also tested on Decision Tree and Random Forest models, which achieve comparable accuracy despite the small feature-set size. (A minimal code sketch of this workflow follows the table.) |
| Low | GrooveSquid.com (original content) | The paper looks at how to choose the features that matter most in machine learning. It uses two techniques, L1 and L2 regularization, with logistic regression to do this. The study picks a big dataset with some tricky classes to separate, compares the results of using L1 or L2 regularization separately, and then combines them to see if that helps. It finds that combining the methods doesn't make much difference in how well the model performs. The combined feature set is also tested on more complex models like Decision Trees and Random Forests, which do well despite using fewer features. |
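To make the described workflow concrete, here is a minimal scikit-learn sketch of the general technique: rank features by the coefficient magnitudes of L1- and L2-regularized logistic regression, combine the two rankings, and evaluate the resulting feature set on downstream models. This is not the authors' code; the synthetic stand-in data (in place of CIC-IDS2018), the union-of-top-k synthesis rule, and the set size `k` are illustrative assumptions, not details from the paper.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary-classification stand-in for the two hard-to-separate
# CIC-IDS2018 classes (assumption: the paper uses the real dataset).
X, y = make_classification(n_samples=5000, n_features=30, n_informative=8,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Standardize so coefficient magnitudes are comparable across features.
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

def rank_features(penalty, solver):
    """Rank features by absolute coefficient size of a regularized LR."""
    lr = LogisticRegression(penalty=penalty, solver=solver, C=1.0,
                            max_iter=5000).fit(X_train, y_train)
    return np.argsort(-np.abs(lr.coef_[0]))  # indices, most important first

rank_l1 = rank_features("l1", "liblinear")  # LR + L1 (sparse coefficients)
rank_l2 = rank_features("l2", "lbfgs")      # LR + L2

k = 10  # feature-set size; the paper varies this for each ranking
# One plausible "synthesis" rule (an assumption): keep any feature that
# appears in the top-k of either ranking.
synth = sorted(set(rank_l1[:k]) | set(rank_l2[:k]))

# Evaluate the synthesized feature set on the downstream models.
for model in (LogisticRegression(max_iter=5000),
              DecisionTreeClassifier(random_state=0),
              RandomForestClassifier(random_state=0)):
    model.fit(X_train[:, synth], y_train)
    print(type(model).__name__, model.score(X_test[:, synth], y_test))
```

Swapping in the actual CIC-IDS2018 features and the paper's own combination rule would reproduce the comparison the summaries describe: similar accuracy from LR+L1 and LR+L2, and competitive Decision Tree and Random Forest accuracy from the much smaller synthesized feature set.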
Keywords
- Artificial intelligence
- Decision tree
- Feature selection
- Logistic regression
- Machine learning
- Random forest
- Regularization
- Supervised