Summary of UniBias: Unveiling and Mitigating LLM Bias through Internal Attention and FFN Manipulation, by Hanzhang Zhou et al.
UniBias: Unveiling and Mitigating LLM Bias through Internal Attention and FFN Manipulation
by Hanzhang Zhou, Zijian Feng, Zixiao Zhu, Junlang Qian, Kezhi Mao
First submitted to arXiv on: 31 May 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Artificial Intelligence (cs.AI)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | Read the original abstract here |
Medium | GrooveSquid.com (original content) | This research paper investigates the internal mechanisms behind bias in large language models (LLMs) under the in-context learning (ICL) paradigm. The authors identify how feedforward neural networks (FFNs) and attention heads contribute to this bias, which manifests as prompt brittleness: sensitivity to design choices such as example selection, example order, and prompt formatting. To mitigate these biases, they propose UniBias, an inference-only method that identifies and eliminates biased FFN vectors and attention heads (a hedged sketch of this masking idea appears after the table). Experiments across 12 NLP datasets show that UniBias significantly improves ICL performance and alleviates the prompt brittleness of LLMs. |
Low | GrooveSquid.com (original content) | This study looks at how language models make predictions when given specific examples to work with. The researchers find that the way these models process information can be biased, meaning they may give inaccurate results for certain types of questions or prompts. They identify which parts of the model cause this bias and develop a new method, called UniBias, to correct it. This helps language models make more accurate predictions and reduces their sensitivity to how they're asked questions. |
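
To make the masking idea concrete, here is a minimal, self-contained sketch in PyTorch. All of the names in it (`head_outputs`, `ffn_values`, `label_unembed`, the skew threshold) are invented stand-ins for model internals, and the detection criterion is an illustrative assumption; the paper's actual method for identifying biased components differs. Treat this as a hedged illustration of inference-only component masking, not the authors' implementation.

```python
# Hedged sketch of the UniBias masking idea: flag internal components
# (attention heads, FFN vectors) whose content-free contribution skews
# the label logits, then zero them at inference. Toy tensors throughout.
import torch

torch.manual_seed(0)

n_heads, d_model, n_ffn, n_labels = 4, 32, 6, 2

# Toy stand-ins for internals: per-head residual-stream contributions,
# FFN value vectors, and an unembedding restricted to the label tokens.
head_outputs = torch.randn(n_heads, d_model)
ffn_values = torch.randn(n_ffn, d_model)
label_unembed = torch.randn(d_model, n_labels)

def label_logit_skew(vec: torch.Tensor) -> float:
    """Bias proxy: how unevenly a component pushes the label logits
    on a content-free probe (an assumption, not the paper's criterion)."""
    logits = vec @ label_unembed
    return (logits - logits.mean()).abs().max().item()

THRESHOLD = 1.5  # illustrative cutoff, not from the paper
biased_heads = [i for i in range(n_heads)
                if label_logit_skew(head_outputs[i]) > THRESHOLD]
biased_ffn = [j for j in range(n_ffn)
              if label_logit_skew(ffn_values[j]) > THRESHOLD]

# Inference-only mitigation: zero the flagged components so they no
# longer write their biased contribution into the residual stream.
head_mask = torch.ones(n_heads)
head_mask[biased_heads] = 0.0
ffn_mask = torch.ones(n_ffn)
ffn_mask[biased_ffn] = 0.0

debiased = (head_mask[:, None] * head_outputs).sum(0) \
         + (ffn_mask[:, None] * ffn_values).sum(0)
print("masked heads:", biased_heads, "masked FFN vectors:", biased_ffn)
print("label logits after masking:", (debiased @ label_unembed).tolist())
```

In a real model the masks would be applied inside the forward pass (e.g. via hooks on attention and FFN modules) rather than on precomputed vectors; the point here is only that no weights are retrained, which is what makes the method inference-only.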
Keywords
» Artificial intelligence » Attention » Inference » NLP » Prompt