

House of Cards: Massive Weights in LLMs

by Jaehoon Oh, Seungjun Shin, Dokwan Oh

First submitted to arXiv on: 2 Oct 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
The high difficulty version is the paper's original abstract, available on arXiv.

Medium Difficulty Summary (written by GrooveSquid.com; original content)
This paper investigates massive activations in large language models (LLMs), which cause an overemphasis on specific tokens. The analysis shows that these activations originate from the intermediate states of feed-forward network modules in early layers, rather than from hidden states, and that a small set of massive weights is responsible for them. Setting these massive weights to zero destroys LLM functionality, whereas setting all non-massive weights to zero causes only a minor performance drop, suggesting that pre-training concentrates on learning the massive weights. To reduce over-reliance on them, the authors propose MacDrop (massive weights curriculum dropout), a simple method that applies dropout to the pre-trained massive weights during fine-tuning, starting from a high dropout probability and gradually decreasing it as training progresses. Experiments on zero-shot downstream tasks, long-context tasks, and ablation studies demonstrate the effectiveness of MacDrop.
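To make the idea concrete, here is a minimal sketch of MacDrop-style curriculum dropout in PyTorch. The selection rule (top-k by magnitude), the linear decay schedule, and the helper names (`find_massive_weights`, `macdrop_step`, `curriculum_p`) are illustrative assumptions, not the paper's exact implementation.

```python
import torch

# Hypothetical helpers: the paper's exact selection rule and decay schedule may differ.
def find_massive_weights(weight: torch.Tensor, k: int = 4) -> torch.Tensor:
    """Flat indices of the k largest-magnitude entries of a weight matrix."""
    return torch.topk(weight.abs().flatten(), k).indices

def macdrop_step(weight: torch.Tensor, massive_idx: torch.Tensor, p: float) -> torch.Tensor:
    """Copy of `weight` with each selected massive entry zeroed with probability p."""
    w = weight.flatten().clone()
    drop = torch.rand(massive_idx.numel(), device=weight.device) < p
    w[massive_idx[drop]] = 0.0
    return w.view_as(weight)

def curriculum_p(step: int, total_steps: int, p_init: float = 0.99) -> float:
    """Linearly decay the dropout probability from p_init toward 0 (assumed schedule)."""
    return p_init * (1.0 - step / total_steps)

# Toy usage: `w` stands in for an early-layer feed-forward projection matrix.
w = torch.randn(16, 16)
massive_idx = find_massive_weights(w, k=4)  # fixed once, from the pre-trained weights
for step in range(100):
    p = curriculum_p(step, total_steps=100)
    w_dropped = macdrop_step(w, massive_idx, p)
    # ... run the forward/backward pass with w_dropped in place of w ...
```

In an actual fine-tuning loop, the massive-weight indices would be identified once from the pre-trained model and the mask recomputed each step, so the model gradually learns to function without depending on those few weights.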
Low Difficulty Summary (written by GrooveSquid.com; original content)
Large language models (LLMs) are a type of artificial intelligence that can process and understand human language. But did you know that these models have a hidden problem? They tend to focus too much on certain words or tokens, which can make them less useful for real-world applications. This happens because a tiny number of the model's weights, learned during training, dominate its behavior. The researchers in this paper studied how LLMs work and found that these few "massive" weights are behind the overemphasis. They also proposed a simple fix called MacDrop. This method helps the model learn more general knowledge by gradually reducing its reliance on those weights during fine-tuning.

Keywords

» Artificial intelligence  » Dropout  » Fine tuning  » Zero shot