Summary of Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free, by Ziyue Li et al.


Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free

by Ziyue Li, Tianyi Zhou

First submitted to arXiv on: 14 Oct 2024

Categories

  • Main: Computation and Language (cs.CL)
  • Secondary: Machine Learning (cs.LG)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here.

Medium Difficulty Summary (written by GrooveSquid.com, original content)
MoE LLMs are known to excel at generation tasks, but their potential as embedding models has been held back by their decoder-only architecture. The study shows that the expert routers in MoE LLMs can serve as off-the-shelf embedding models, delivering promising performance on a diverse range of tasks without any further finetuning. The routing weights (RW) of these models turn out to be complementary to the hidden state (HS), a widely used embedding source: compared to HS, RW is more robust and captures higher-level semantics. The authors propose MoEE, which combines RW and HS and performs better than either alone (a rough sketch of the idea appears after the summaries below). Experiments on 6 tasks with 20 datasets from MTEB demonstrate the significant improvement brought by MoEE, again without further finetuning.

Low Difficulty Summary (written by GrooveSquid.com, original content)
MoE LLMs are really good at making things up, like sentences or stories, but they are not as good at turning text into representations of meaning unless someone trains them to do that. This research shows that a special part of these language models, the router that decides which "expert" handles each word, can be used to capture meaning without any extra training. That is useful because it lets us reuse these language models in new ways, like comparing or searching pieces of text. The idea was tested on 6 different kinds of tasks with 20 different datasets, and the results were really impressive!
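To make the RW-plus-HS idea concrete, here is a minimal, hypothetical sketch in PyTorch. It is not the paper's code: ToyMoELayer and moe_embedding are made-up names, the MoE layer is a toy stand-in for a pretrained MoE LLM, and simply concatenating the mean-pooled routing weights and hidden states is just one plausible way to combine them (the paper studies its own combination scheme, MoEE).

# Illustrative sketch (not the paper's implementation): combining routing
# weights (RW) and hidden states (HS) from a toy MoE layer into one embedding.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoELayer(nn.Module):
    """A minimal MoE feed-forward layer with a softmax router (hypothetical)."""
    def __init__(self, d_model: int, n_experts: int):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)  # produces routing logits per token
        self.experts = nn.ModuleList(
            [nn.Linear(d_model, d_model) for _ in range(n_experts)]
        )

    def forward(self, x):
        # x: (batch, seq_len, d_model)
        routing_weights = F.softmax(self.router(x), dim=-1)                  # (b, s, E)
        expert_outputs = torch.stack([e(x) for e in self.experts], dim=-1)   # (b, s, d, E)
        hidden = (expert_outputs * routing_weights.unsqueeze(2)).sum(-1)     # weighted mix
        return hidden, routing_weights

def moe_embedding(hidden, routing_weights):
    """Mean-pool HS and RW over the sequence, then concatenate them.
    Concatenation is only one simple choice of combination (an assumption here)."""
    hs = hidden.mean(dim=1)            # (batch, d_model)
    rw = routing_weights.mean(dim=1)   # (batch, n_experts)
    return torch.cat([F.normalize(hs, dim=-1), F.normalize(rw, dim=-1)], dim=-1)

# Toy usage: random "token states" stand in for a real MoE LLM's activations.
layer = ToyMoELayer(d_model=32, n_experts=8)
tokens = torch.randn(2, 10, 32)        # batch of 2 "sentences", 10 tokens each
hidden, rw = layer(tokens)
emb = moe_embedding(hidden, rw)
print(emb.shape)                       # torch.Size([2, 40])

In practice, the hidden states and routing weights would be read off a pretrained MoE LLM (for example via forward hooks on its router modules) with no finetuning involved; the toy layer above only stands in for such a model so the snippet runs on its own.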

Keywords

» Artificial intelligence  » Decoder  » Embedding  » Semantics