Making Pre-trained Language Models Great on Tabular Prediction
by Jiahuan Yan, Bo Zheng, Hongxia Xu, Yiheng Zhu, Danny Z. Chen, Jimeng Sun, Jian Wu, Jintai Chen
First submitted to arXiv on: 4 Mar 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Machine Learning (cs.LG)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | The paper's original abstract serves as the high difficulty summary (available on the arXiv listing). |
Medium | GrooveSquid.com (original content) | This paper presents TP-BERTa, a deep learning model designed specifically for tabular prediction tasks such as regression and classification. Unlike image and language data, tabular data suffers from heterogeneity across tables, which limits the transferability of deep neural networks (DNNs). The authors leverage a pre-trained language model (LM), whose ability to comprehend feature names lets knowledge transfer across diverse tables. However, the discrete text representation space of LMs is incompatible with the numerical feature values found in tables. To bridge this gap, the authors propose relative magnitude tokenization, which converts scalar numerical feature values into finely discrete, high-dimensional tokens, and an intra-feature attention approach that integrates feature values with their corresponding feature names (see the sketch below the table). Experimental results show that TP-BERTa outperforms tabular DNNs and is competitive with Gradient Boosted Decision Tree models in the typical tabular data regime. |
Low | GrooveSquid.com (original content) | This paper creates a new way for computers to learn from tables of numbers (called tabular data). The problem is that these tables differ from one another, making it hard for computers to reuse what they have learned. To solve this, the authors use a type of language model that can read and understand the words in tables. Because these models have trouble with numbers, the authors developed a new way to turn numbers into word-like tokens the computer can understand. The results show that the new method works well and beats other approaches computers currently use to learn from tabular data. |
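To make the two mechanisms in the medium-difficulty summary more concrete, here is a minimal NumPy sketch of the general idea behind relative magnitude tokenization and intra-feature attention. Everything specific in it is an illustrative assumption rather than the authors' implementation: the bin count, the embedding size, quantile-based binning (a stand-in for whatever binning scheme the paper actually uses), randomly initialized embeddings (TP-BERTa learns these during pre-training), and a single-query softmax pooling standing in for the paper's intra-feature attention module.

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_BINS = 128   # assumed number of discrete magnitude tokens
EMBED_DIM = 768  # assumed LM hidden size (RoBERTa-base style)

# Shared magnitude-token embedding table. Random here for illustration;
# in TP-BERTa these token embeddings are learned during pre-training.
magnitude_embeddings = rng.normal(size=(NUM_BINS, EMBED_DIM))

def fit_bin_edges(train_values: np.ndarray, num_bins: int = NUM_BINS) -> np.ndarray:
    """Derive bin edges from a training column via quantiles (an assumed
    stand-in for the paper's binning procedure)."""
    interior_quantiles = np.linspace(0.0, 1.0, num_bins + 1)[1:-1]
    return np.quantile(train_values, interior_quantiles)

def tokenize_value(value: float, edges: np.ndarray) -> np.ndarray:
    """Map a scalar feature value to a discrete magnitude token, then to a
    high-dimensional embedding scaled by the value's magnitude."""
    bin_id = int(np.searchsorted(edges, value))  # discrete magnitude token id
    return value * magnitude_embeddings[bin_id]  # value-scaled token embedding

def intra_feature_attention(name_token_embs: np.ndarray,
                            value_emb: np.ndarray,
                            w_query: np.ndarray) -> np.ndarray:
    """Fuse a feature's name-token embeddings with its value embedding into
    one vector via a single learned attention query (a simplified stand-in
    for the paper's intra-feature attention module)."""
    tokens = np.vstack([name_token_embs, value_emb])  # (T+1, EMBED_DIM)
    scores = tokens @ w_query / np.sqrt(EMBED_DIM)    # (T+1,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                          # softmax over tokens
    return weights @ tokens                           # (EMBED_DIM,)

# Toy usage: tokenize the value 37.5 of a numeric column, then fuse it with
# (random placeholder) embeddings of the feature-name tokens.
train_column = rng.normal(loc=40.0, scale=10.0, size=1000)
edges = fit_bin_edges(train_column)
value_vec = tokenize_value(37.5, edges)

name_token_embs = rng.normal(size=(3, EMBED_DIM))  # e.g. a 3-token feature name
w_query = rng.normal(size=EMBED_DIM)
feature_vec = intra_feature_attention(name_token_embs, value_vec, w_query)
print(feature_vec.shape)  # (768,)
```

The key design point the sketch tries to capture is that nearby scalar values land in the same or adjacent bins, so they share (or neighbor) magnitude tokens, giving the LM a discrete vocabulary for numeric magnitude while the attention step ties each value to the feature name it belongs to.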
Keywords
- Artificial intelligence
- Attention
- Classification
- Decision tree
- Deep learning
- Language model
- Regression
- Tokenization
- Transferability