Summary of UniBias: Unveiling and Mitigating LLM Bias through Internal Attention and FFN Manipulation, by Hanzhang Zhou et al.
UniBias: Unveiling and Mitigating LLM Bias through Internal Attention and FFN Manipulation
by Hanzhang Zhou, Zijian Feng, Zixiao Zhu, Junlang Qian, Kezhi Mao
First submitted to arXiv on: 31 May 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Artificial Intelligence (cs.AI)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | Read the original abstract here |
Medium | GrooveSquid.com (original content) | This research paper investigates the internal mechanisms behind bias in large language models (LLMs) under the in-context learning (ICL) paradigm. The authors identify how feedforward neural networks (FFNs) and attention heads contribute to this bias, which manifests as prompt brittleness: sensitivity to design choices such as example selection, example order, and prompt formatting. To mitigate these biases, they propose UniBias, an inference-only method that identifies and eliminates biased FFN vectors and attention heads (a hedged sketch of this masking idea appears after the table). Experiments across 12 NLP datasets show that UniBias significantly improves ICL performance and alleviates the prompt brittleness of LLMs. |
Low | GrooveSquid.com (original content) | This study looks at how language models make predictions when given specific examples to work with. The researchers find that the way these models process information can be biased, meaning they may give inaccurate results for certain types of questions or prompts. They identify which parts of the model cause this bias and develop a new method, called UniBias, to correct it. This helps language models make more accurate predictions and reduces their sensitivity to how they're asked questions. |
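
To make the masking idea concrete, here is a minimal, self-contained sketch in PyTorch. All of the names in it (`head_outputs`, `ffn_values`, `label_unembed`, the skew threshold) are invented stand-ins for model internals, and the detection criterion is an illustrative assumption; the paper's actual method for identifying biased components differs. Treat this as a hedged illustration of inference-only component masking, not the authors' implementation.

```python
# Hedged sketch of the UniBias masking idea: flag internal components
# (attention heads, FFN vectors) whose content-free contribution skews
# the label logits, then zero them at inference. Toy tensors throughout.
import torch

torch.manual_seed(0)

n_heads, d_model, n_ffn, n_labels = 4, 32, 6, 2

# Toy stand-ins for internals: per-head residual-stream contributions,
# FFN value vectors, and an unembedding restricted to the label tokens.
head_outputs = torch.randn(n_heads, d_model)
ffn_values = torch.randn(n_ffn, d_model)
label_unembed = torch.randn(d_model, n_labels)

def label_logit_skew(vec: torch.Tensor) -> float:
    """Bias proxy: how unevenly a component pushes the label logits
    on a content-free probe (an assumption, not the paper's criterion)."""
    logits = vec @ label_unembed
    return (logits - logits.mean()).abs().max().item()

THRESHOLD = 1.5  # illustrative cutoff, not from the paper
biased_heads = [i for i in range(n_heads)
                if label_logit_skew(head_outputs[i]) > THRESHOLD]
biased_ffn = [j for j in range(n_ffn)
              if label_logit_skew(ffn_values[j]) > THRESHOLD]

# Inference-only mitigation: zero the flagged components so they no
# longer write their biased contribution into the residual stream.
head_mask = torch.ones(n_heads)
head_mask[biased_heads] = 0.0
ffn_mask = torch.ones(n_ffn)
ffn_mask[biased_ffn] = 0.0

debiased = (head_mask[:, None] * head_outputs).sum(0) \
         + (ffn_mask[:, None] * ffn_values).sum(0)
print("masked heads:", biased_heads, "masked FFN vectors:", biased_ffn)
print("label logits after masking:", (debiased @ label_unembed).tolist())
```

In a real model the masks would be applied inside the forward pass (e.g. via hooks on attention and FFN modules) rather than on precomputed vectors; the point here is only that no weights are retrained, which is what makes the method inference-only.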
Keywords
» Artificial intelligence » Attention » Inference » NLP » Prompt